espnet.nets.pytorch_backend.tacotron2.cbhg.CBHG
class espnet.nets.pytorch_backend.tacotron2.cbhg.CBHG(idim, odim, conv_bank_layers=8, conv_bank_chans=128, conv_proj_filts=3, conv_proj_chans=256, highway_layers=4, highway_units=128, gru_units=256)
Bases: Module
CBHG module to convert log Mel-filterbanks to linear spectrogram.
This module implements the CBHG block (1-D convolution bank, highway network, and bidirectional GRU) introduced in "Tacotron: Towards End-to-End Speech Synthesis". It converts a sequence of log Mel-filterbank features into a linear spectrogram.
Initialize CBHG module.
- Parameters:
- idim (int) – Dimension of the inputs.
- odim (int) – Dimension of the outputs.
- conv_bank_layers (int, optional) – The number of convolution bank layers.
- conv_bank_chans (int, optional) – The number of channels in each convolution bank layer.
- conv_proj_filts (int, optional) – Kernel size of the convolutional projection layers.
- conv_proj_chans (int, optional) – The number of channels in the convolutional projection layers.
- highway_layers (int, optional) – The number of highway network layers.
- highway_units (int, optional) – The number of highway network units.
- gru_units (int, optional) – The total number of GRU units, counted over both directions.
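To make the role of each constructor argument more concrete, the sketch below traces how the feature dimension might change through a CBHG-style stack. It is a rough illustration only: the stage ordering follows the general CBHG description in the Tacotron paper, the helper `cbhg_shape_trace` is hypothetical and not part of ESPnet, and the example sizes (80 Mel bins, 513 linear bins) are assumed values. The actual layer layout in ESPnet may differ in detail.

```python
def cbhg_shape_trace(batch, tmax, idim, odim,
                     conv_bank_layers=8, conv_bank_chans=128,
                     conv_proj_chans=256, highway_units=128, gru_units=256):
    """Return (stage, shape) pairs for a CBHG-style stack (illustrative only).

    Hypothetical helper: it performs no computation, it only records the
    shapes one would expect under the assumptions stated above.
    """
    return [
        ("input", (batch, tmax, idim)),
        # Outputs of all bank layers are concatenated along the channel axis.
        ("conv bank (concatenated)",
         (batch, tmax, conv_bank_layers * conv_bank_chans)),
        # Stride-1 max pooling is assumed to keep the time length unchanged.
        ("max pooling", (batch, tmax, conv_bank_layers * conv_bank_chans)),
        ("first conv projection", (batch, tmax, conv_proj_chans)),
        # Projecting back to idim allows a residual connection with the input.
        ("second conv projection + residual", (batch, tmax, idim)),
        ("highway network", (batch, tmax, highway_units)),
        # gru_units covers both directions, so the concatenated forward and
        # backward states are assumed to have gru_units features in total.
        ("bidirectional GRU", (batch, tmax, gru_units)),
        ("output projection", (batch, tmax, odim)),
    ]


# Assumed example sizes: 2 utterances, 100 frames, 80 Mel bins, 513 linear bins.
trace = cbhg_shape_trace(batch=2, tmax=100, idim=80, odim=513)
for stage, shape in trace:
    print(f"{stage:36s} {shape}")
```

Only `idim` and `odim` are visible from outside the module; the remaining arguments size the intermediate stages.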
forward(xs, ilens)
Calculate forward propagation.
- Parameters:
- xs (Tensor) – Batch of the padded sequences of inputs (B, Tmax, idim).
- ilens (LongTensor) – Batch of lengths of each input sequence (B,).
- Returns: A tuple of two items: the batch of padded output sequences (B, Tmax, odim) and the batch of lengths of each output sequence (B,).
- Return type: Tuple[Tensor, LongTensor]
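The relationship between `xs` and `ilens` can be illustrated with a small padding helper. This is a minimal sketch in plain Python: `pad_batch` is a hypothetical helper, not an ESPnet function, and the two toy utterances use an assumed `idim` of 2. It only shows the layout that `forward` expects: every sequence padded to the longest length `Tmax`, with the true lengths kept separately.

```python
def pad_batch(seqs, pad_value=0.0):
    """Pad variable-length sequences to a common length.

    seqs: list of B sequences, each a list of T_i frames of idim floats.
    Returns (xs, ilens), where xs is a nested list laid out as
    (B, Tmax, idim) and ilens holds the B original lengths.
    """
    ilens = [len(seq) for seq in seqs]
    tmax = max(ilens)
    idim = len(seqs[0][0])
    xs = [
        [list(frame) for frame in seq]
        + [[pad_value] * idim for _ in range(tmax - len(seq))]
        for seq in seqs
    ]
    return xs, ilens


# Two hypothetical utterances with 3 frames and 1 frame, idim = 2.
seqs = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[0.7, 0.8]],
]
xs, ilens = pad_batch(seqs)
print(ilens)                          # lengths before padding
print(len(xs), len(xs[0]), len(xs[0][0]))  # B, Tmax, idim
```

With PyTorch available, `torch.tensor(xs)` and `torch.tensor(ilens)` would give a float tensor of shape (B, Tmax, idim) and an integer tensor of shape (B,), matching the documented argument shapes.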
inference(x)
Run inference on a single, unbatched sequence.
- Parameters: x (Tensor) – The input sequence (T, idim).
- Returns: The sequence of outputs (T, odim).
- Return type: Tensor