espnet2.svs.singing_tacotron.encoder.Encoder
class espnet2.svs.singing_tacotron.encoder.Encoder(idim, input_layer='embed', embed_dim=512, elayers=1, eunits=512, econv_layers=3, econv_chans=512, econv_filts=5, use_batch_norm=True, use_residual=False, dropout_rate=0.5, padding_idx=0)
Bases: Module
Encoder module of Spectrogram prediction network.
This is the encoder of the spectrogram prediction network in Singing Tacotron, which is described in "Singing-Tacotron: Global Duration Control Attention and Dynamic Filter for End-to-end Singing Voice Synthesis" (https://arxiv.org/abs/2202.07907). The encoder converts either a sequence of characters or acoustic features into a sequence of hidden states.
Initialize Singing Tacotron encoder module.
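- Parameters:
- idim (int) – Dimension of the inputs.
- input_layer (str) – Input layer type.
- embed_dim (int, optional) – Dimension of the character embedding.
- elayers (int, optional) – Number of encoder BLSTM layers.
- eunits (int, optional) – Number of encoder BLSTM units.
- econv_layers (int, optional) – Number of encoder convolution layers.
- econv_filts (int, optional) – Filter size of the encoder convolution layers.
- econv_chans (int, optional) – Number of channels in the encoder convolution layers.
- use_batch_norm (bool, optional) – Whether to use batch normalization.
- use_residual (bool, optional) – Whether to use residual connections.
- dropout_rate (float, optional) – Dropout rate.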
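The shapes documented for `forward` imply that the convolution stack leaves the time axis unchanged: an input of length Tmax yields encoder states of length Tmax. The following is a minimal sketch of that arithmetic in plain Python. It assumes stride-1 convolutions with padding `(econv_filts - 1) // 2`, which is an assumption about the implementation and is not stated on this page. The helper names `conv_output_length` and `encoder_output_shape` are hypothetical and are not part of ESPnet.

```python
def conv_output_length(t_in: int, kernel: int, padding: int, stride: int = 1) -> int:
    """Standard 1-D convolution output-length formula (dilation 1)."""
    return (t_in + 2 * padding - kernel) // stride + 1


def encoder_output_shape(batch: int, t_max: int, eunits: int = 512,
                         econv_layers: int = 3, econv_filts: int = 5) -> tuple:
    """Predict the shape of the encoder states for a padded batch.

    Assumption (not confirmed by this page): every convolution layer uses
    stride 1 and padding (econv_filts - 1) // 2, so an odd filter size
    preserves the sequence length.
    """
    padding = (econv_filts - 1) // 2
    t = t_max
    for _ in range(econv_layers):
        t = conv_output_length(t, econv_filts, padding)
    # The last dimension follows the documented return shape (B, Tmax, eunits).
    return (batch, t, eunits)


print(encoder_output_shape(batch=4, t_max=37))  # (4, 37, 512)
```

With the default `econv_filts=5`, the assumed padding is 2, so each layer maps a length of 37 to 37. Under the same assumption, an even filter size would shorten the sequence by one frame per layer.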
forward(xs, ilens=None)
Calculate forward propagation.
- Parameters:
- xs (Tensor) – Batch of the padded sequence. Either character ids (B, Tmax) or acoustic feature (B, Tmax, idim * encoder_reduction_factor). Padded value should be 0.
- ilens (LongTensor) – Batch of lengths of each input batch (B,).
- Returns: Batch of the sequences of encoder states (B, Tmax, eunits), and a batch of the lengths of each sequence (B,).
- Return type: Tensor, LongTensor
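The inputs described above, a zero-padded batch `xs` and its lengths `ilens`, can be prepared from variable-length id sequences as in the sketch below. It uses plain Python lists to stay self-contained. The helper `pad_batch` is a hypothetical name for illustration and is not an ESPnet function.

```python
from typing import List, Tuple


def pad_batch(seqs: List[List[int]], pad_value: int = 0) -> Tuple[List[List[int]], List[int]]:
    """Right-pad variable-length id sequences to a common length.

    Returns the padded batch with shape (B, Tmax) and the original
    lengths with shape (B,), matching the documented `xs` and `ilens`.
    """
    ilens = [len(s) for s in seqs]
    t_max = max(ilens)
    xs = [s + [pad_value] * (t_max - len(s)) for s in seqs]
    return xs, ilens


xs, ilens = pad_batch([[5, 2, 9], [7, 1], [3, 4, 6, 8]])
print(xs)     # [[5, 2, 9, 0], [7, 1, 0, 0], [3, 4, 6, 8]]
print(ilens)  # [3, 2, 4]
```

Because the padded value is 0 and the constructor default is `padding_idx=0`, real symbols should use ids starting from 1. In practice the two lists would then be converted to tensors, for example with `torch.tensor(xs, dtype=torch.long)`, before being passed to `forward`.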
inference(x, ilens)
Inference.
- Parameters: x (Tensor) – The sequence of character ids (T,) or acoustic feature (T, idim * encoder_reduction_factor).
- Returns: The sequences of encoder states (T, eunits).
- Return type: Tensor