espnet.nets.pytorch_backend.tacotron2.cbhg.CBHG
class espnet.nets.pytorch_backend.tacotron2.cbhg.CBHG(idim, odim, conv_bank_layers=8, conv_bank_chans=128, conv_proj_filts=3, conv_proj_chans=256, highway_layers=4, highway_units=128, gru_units=256)
Bases: Module
Convolutional Bank + Highway + bidirectional GRU (CBHG).
The CBHG block was introduced in “Tacotron: Towards End‑to‑End Speech Synthesis” (Wang et al., 2017). It is a general‑purpose sub‑network that appears at two points in the original Tacotron architecture:
- Encoder CBHG – converts an input sequence of character/phoneme embeddings into high‑level linguistic representations for the attention mechanism.
- Post‑net CBHG – converts a frame‑wise sequence of predicted log‑Mel filter‑bank features into a linear‑scale spectrogram for Griffin–Lim (or a neural vocoder).
This implementation follows the paper: https://arxiv.org/abs/1703.10135
Initialize CBHG module.
- Parameters:
- idim (int) – Dimension of the inputs.
- odim (int) – Dimension of the outputs.
- conv_bank_layers (int, optional) – The number of convolution bank layers.
- conv_bank_chans (int, optional) – The number of channels in convolution bank.
- conv_proj_filts (int, optional) – Kernel size of convolutional projection layer.
- conv_proj_chans (int, optional) – The number of channels in convolutional projection layer.
- highway_layers (int, optional) – The number of highway network layers.
- highway_units (int, optional) – The number of highway network units.
- gru_units (int, optional) – The total number of GRU units, counted over both directions.
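To see how the constructor parameters relate to each other, the following pure-Python sketch traces the feature dimension through a CBHG-style stack. It assumes the layer order described in the Tacotron paper (convolution bank, max pooling, two convolutional projections with a residual connection, highway layers, bidirectional GRU, output projection) and that `gru_units` is split evenly between the two GRU directions. `cbhg_shape_trace` is a hypothetical helper written for illustration, not part of ESPnet, and the chosen `idim`/`odim` values are arbitrary.

```python
def cbhg_shape_trace(batch, tmax, idim, odim,
                     conv_bank_layers=8, conv_bank_chans=128,
                     conv_proj_chans=256, highway_units=128, gru_units=256):
    """Hypothetical helper: list (stage, shape) pairs in (B, T, C) order.

    conv_proj_filts and highway_layers are omitted because, under the
    assumed "same" padding, they do not change any tensor shape.
    """
    stages = [
        ("input", idim),
        # K = conv_bank_layers filters with kernel sizes 1..K, concatenated
        ("convolution bank", conv_bank_layers * conv_bank_chans),
        ("max pooling over time (stride 1)", conv_bank_layers * conv_bank_chans),
        ("first conv projection", conv_proj_chans),
        # projected back to idim so it can be added to the input
        ("second conv projection + residual", idim),
        ("highway layers", highway_units),
        # assumed: each direction gets gru_units // 2 hidden units
        ("bidirectional GRU", 2 * (gru_units // 2)),
        ("output projection", odim),
    ]
    return [(name, (batch, tmax, chans)) for name, chans in stages]


# Example values only: 80-dim inputs, 513-dim outputs.
trace = cbhg_shape_trace(batch=2, tmax=50, idim=80, odim=513)
for name, shape in trace:
    print(f"{name}: {shape}")
```

The time axis stays at `Tmax` throughout, which is why the output lengths match the input lengths.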
forward(xs, ilens)
Calculate forward propagation.
- Parameters:
- xs (Tensor) – Batch of the padded sequences of inputs (B, Tmax, idim).
- ilens (LongTensor) – Batch of lengths of each input sequence (B,).
- Returns: Batch of the padded sequences of outputs (B, Tmax, odim), and batch of the lengths of each output sequence (B,).
- Return type: Tuple[Tensor, LongTensor]
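The input contract of `forward` can be illustrated with a short sketch that pads variable-length sequences into `xs` and builds the matching `ilens`. `make_batch` and `run_cbhg` are hypothetical helper names, and the dimensions are example values. The sequences are listed in descending length order as a conservative choice, since whether the module sorts them itself is not stated here. `run_cbhg` needs ESPnet installed and is only defined, not called.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def make_batch(seqs):
    """Pad a list of (T_i, idim) tensors to (B, Tmax, idim) and return lengths."""
    ilens = torch.tensor([s.size(0) for s in seqs], dtype=torch.long)
    xs = pad_sequence(seqs, batch_first=True)  # zero-padded along time
    return xs, ilens


def run_cbhg(xs, ilens, odim=513):
    """Hypothetical wrapper: build a CBHG with default settings and run it."""
    # Imported lazily so the padding helper works without ESPnet.
    from espnet.nets.pytorch_backend.tacotron2.cbhg import CBHG

    cbhg = CBHG(idim=xs.size(-1), odim=odim)
    cbhg.eval()
    with torch.no_grad():
        return cbhg(xs, ilens)  # (outputs, output lengths)


# Two example sequences of 5 and 3 frames with 80 features each.
xs, ilens = make_batch([torch.zeros(5, 80), torch.zeros(3, 80)])
print(xs.shape, ilens)
```

With ESPnet available, `ys, olens = run_cbhg(xs, ilens)` should give `ys` with shape (B, Tmax, odim) as documented above.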
inference(x)
Inference.
- Parameters: x (Tensor) – The sequence of inputs (T, idim).
- Returns: The sequence of outputs (T, odim).
- Return type: Tensor
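One way to think about `inference` is as a batch-of-one call to `forward`. The sketch below shows that relationship for any module with the `forward(xs, ilens)` contract. It is an assumption about the behaviour, not a copy of ESPnet's code: `infer_single` is a hypothetical helper and `_ToyModule` is a stand-in used so the example runs without ESPnet.

```python
import torch


def infer_single(module, x):
    """Run one unpadded (T, idim) sequence through a batched forward(xs, ilens)."""
    if x.dim() != 2:
        raise ValueError("expected a (T, idim) tensor")
    xs = x.unsqueeze(0)                                  # (1, T, idim)
    ilens = torch.tensor([x.size(0)], dtype=torch.long)  # (1,)
    module.eval()
    with torch.no_grad():
        ys, _ = module(xs, ilens)                        # (1, T, odim)
    return ys[0]                                         # (T, odim)


class _ToyModule(torch.nn.Module):
    """Stand-in with the same forward contract; not the ESPnet CBHG."""

    def __init__(self, idim, odim):
        super().__init__()
        self.proj = torch.nn.Linear(idim, odim)

    def forward(self, xs, ilens):
        return self.proj(xs), ilens


# Example values only: a 7-frame sequence with 80 features.
y = infer_single(_ToyModule(80, 513), torch.randn(7, 80))
print(y.shape)
```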