espnet.nets.pytorch_backend.tacotron2.cbhg.CBHG

class espnet.nets.pytorch_backend.tacotron2.cbhg.CBHG(idim, odim, conv_bank_layers=8, conv_bank_chans=128, conv_proj_filts=3, conv_proj_chans=256, highway_layers=4, highway_units=128, gru_units=256)

Bases: Module

CBHG module to convert log Mel-filterbanks to a linear spectrogram.

This is the CBHG module introduced in Tacotron: Towards End-to-End Speech Synthesis. It converts a sequence of log Mel-filterbank features into a linear spectrogram.

Initialize CBHG module.

Parameters:
- idim (int) – Dimension of the inputs.
- odim (int) – Dimension of the outputs.
- conv_bank_layers (int, optional) – The number of convolution bank layers.
- conv_bank_chans (int, optional) – The number of channels in convolution bank.
- conv_proj_filts (int, optional) – Kernel size of convolutional projection layer.
- conv_proj_chans (int, optional) – The number of channels in convolutional projection layer.
- highway_layers (int, optional) – The number of highway network layers.
- highway_units (int, optional) – The number of highway network units.
- gru_units (int, optional) – The number of GRU units (for both directions).
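
As a rough aid for reading these parameters, the sketch below does some shape bookkeeping in plain Python. `CBHGConfig` and its methods are hypothetical helpers, not part of ESPnet, and `idim=80` / `odim=513` are illustrative values. It assumes the bank follows the Tacotron paper (one filter set per kernel width 1..K, outputs stacked along the channel axis) and that `gru_units` is split evenly between the two directions, as the parameter description suggests.

```python
from dataclasses import dataclass


@dataclass
class CBHGConfig:
    """Hypothetical container mirroring the CBHG constructor defaults."""

    idim: int
    odim: int
    conv_bank_layers: int = 8
    conv_bank_chans: int = 128
    conv_proj_filts: int = 3
    conv_proj_chans: int = 256
    highway_layers: int = 4
    highway_units: int = 128
    gru_units: int = 256

    def conv_bank_kernel_sizes(self):
        # Assumption (Tacotron paper): one filter set per width k = 1..K.
        return list(range(1, self.conv_bank_layers + 1))

    def conv_bank_out_chans(self):
        # Assumption: bank outputs are stacked along the channel axis.
        return self.conv_bank_layers * self.conv_bank_chans

    def gru_units_per_direction(self):
        # gru_units counts both directions, so each direction gets half.
        return self.gru_units // 2


cfg = CBHGConfig(idim=80, odim=513)  # illustrative dimensions
print(cfg.conv_bank_kernel_sizes())   # [1, 2, 3, 4, 5, 6, 7, 8]
print(cfg.conv_bank_out_chans())      # 1024
print(cfg.gru_units_per_direction())  # 128
```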

forward(xs, ilens)

Calculate forward propagation.

Parameters:
- xs (Tensor) – Batch of the padded sequences of inputs (B, Tmax, idim).
- ilens (LongTensor) – Batch of lengths of each input sequence (B,).
Returns: Batch of the padded sequences of outputs (B, Tmax, odim), and batch of lengths of each output sequence (B,).
Return type: Tuple[Tensor, LongTensor]
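
The inputs described above can be prepared from variable-length sequences roughly as follows. This is a minimal sketch using only PyTorch; the feature dimension and sequence lengths are illustrative, and the random tensors stand in for real log Mel-filterbank features.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

idim = 80  # illustrative feature dimension

# Three variable-length sequences, each of shape (T_i, idim).
seqs = [torch.randn(t, idim) for t in (50, 35, 42)]

# Zero-pad to the longest sequence: xs has shape (B, Tmax, idim).
xs = pad_sequence(seqs, batch_first=True)

# Unpadded length of each sequence: shape (B,), dtype long.
ilens = torch.tensor([s.size(0) for s in seqs], dtype=torch.long)

print(xs.shape)     # torch.Size([3, 50, 80])
print(ilens.shape)  # torch.Size([3])
```

Given a constructed module `cbhg`, these tensors would then be passed as `cbhg(xs, ilens)`, which returns the padded outputs together with the output lengths.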

inference(x)

Run inference on a single, unbatched sequence.

Parameters:
- x (Tensor) – The sequence of inputs (T, idim).
Returns: The sequence of outputs (T, odim).
Return type: Tensor
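
To relate the unbatched shapes here to the batched ones used by forward, the sketch below wraps a single sequence as a batch of size one. `as_single_batch` is a hypothetical helper, not an ESPnet function, and whether inference does exactly this internally is an assumption; the dimensions are illustrative.

```python
import torch


def as_single_batch(x):
    """Hypothetical helper: wrap one (T, idim) sequence as a batch of size 1."""
    xs = x.unsqueeze(0)                                  # (1, T, idim)
    ilens = torch.tensor([x.size(0)], dtype=torch.long)  # (1,)
    return xs, ilens


x = torch.randn(120, 80)  # illustrative (T, idim) input
xs, ilens = as_single_batch(x)
print(xs.shape)        # torch.Size([1, 120, 80])
print(ilens.tolist())  # [120]
```

Under that assumption, taking the first item of the padded output of `cbhg(xs, ilens)` would have the same (T, odim) shape that inference(x) is documented to return.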