espnet2.gan_tts.vits.posterior_encoder.PosteriorEncoder
class espnet2.gan_tts.vits.posterior_encoder.PosteriorEncoder(in_channels: int = 513, out_channels: int = 192, hidden_channels: int = 192, kernel_size: int = 5, layers: int = 16, stacks: int = 1, base_dilation: int = 1, global_channels: int = -1, dropout_rate: float = 0.0, bias: bool = True, use_weight_norm: bool = True)
Bases: Module
Posterior encoder module in VITS.
This module implements the posterior encoder described in Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech.
Initialize PosteriorEncoder module.
- Parameters:
- in_channels (int) – Number of input channels.
- out_channels (int) – Number of output channels.
- hidden_channels (int) – Number of hidden channels.
- kernel_size (int) – Kernel size in WaveNet.
- layers (int) – Number of layers of WaveNet.
- stacks (int) – Number of repeat stacking of WaveNet.
- base_dilation (int) – Base dilation factor.
- global_channels (int) – Number of global conditioning channels.
- dropout_rate (float) – Dropout rate.
- bias (bool) – Whether to use bias parameters in conv.
- use_weight_norm (bool) – Whether to apply weight norm.
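The WaveNet parameters above (kernel_size, layers, stacks, base_dilation) jointly determine how many input frames each output frame can see. The following is a minimal sketch of that arithmetic, not ESPnet code. It assumes non-causal convolutions and a per-layer dilation of `base_dilation ** (layer_index % layers_per_stack)`, a common WaveNet convention that this page does not confirm; `dilation_schedule` and `receptive_field` are hypothetical helper names.

```python
from typing import List


def dilation_schedule(layers: int = 16, stacks: int = 1, base_dilation: int = 1) -> List[int]:
    """Return the assumed dilation of each WaveNet layer."""
    if layers % stacks != 0:
        raise ValueError("layers must be divisible by stacks")
    layers_per_stack = layers // stacks
    # Assumption: the dilation restarts from 1 at the beginning of every stack.
    return [base_dilation ** (i % layers_per_stack) for i in range(layers)]


def receptive_field(
    kernel_size: int = 5, layers: int = 16, stacks: int = 1, base_dilation: int = 1
) -> int:
    """Return the receptive field in frames for stacked dilated convolutions."""
    # Each layer widens the context by (kernel_size - 1) * dilation frames.
    dilations = dilation_schedule(layers, stacks, base_dilation)
    return 1 + (kernel_size - 1) * sum(dilations)


# With the documented defaults every dilation is 1, because 1 ** k == 1.
print(receptive_field())  # 65 under the assumptions stated above
```

Under these assumptions, the default base_dilation of 1 means the context grows only linearly with layers; a larger base_dilation would grow it exponentially within each stack.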
forward(x: Tensor, x_lengths: Tensor, g: Tensor | None = None) → Tuple[Tensor, Tensor, Tensor, Tensor]
Calculate forward propagation.
- Parameters:
- x (Tensor) – Input tensor (B, in_channels, T_feats).
- x_lengths (Tensor) – Length tensor (B,).
- g (Optional[Tensor]) – Global conditioning tensor (B, global_channels, 1).
- Returns:
  - Tensor: Encoded hidden representation tensor (B, out_channels, T_feats).
  - Tensor: Projected mean tensor (B, out_channels, T_feats).
  - Tensor: Projected scale tensor (B, out_channels, T_feats).
  - Tensor: Mask tensor for input tensor (B, 1, T_feats).
- Return type: Tuple[Tensor, Tensor, Tensor, Tensor]
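The relationship between the four returned tensors can be illustrated without PyTorch. The sketch below is a plain-Python illustration for a single channel, not the module's implementation. It assumes the usual VITS formulation, in which the projected scale is a log standard deviation and the latent is sampled as `z = (m + eps * exp(logs)) * mask`; `non_pad_mask` and `sample_latent` are hypothetical helper names.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # one channel, shape (B, T_feats)


def non_pad_mask(lengths: List[int], max_len: int) -> Matrix:
    """Return 1.0 for valid frames and 0.0 for padded frames."""
    return [[1.0 if t < n else 0.0 for t in range(max_len)] for n in lengths]


def sample_latent(m: Matrix, logs: Matrix, mask: Matrix, rng: random.Random) -> Matrix:
    """Sample z = (m + eps * exp(logs)) * mask with eps ~ N(0, 1)."""
    return [
        [
            (mu + rng.gauss(0.0, 1.0) * math.exp(log_scale)) * keep
            for mu, log_scale, keep in zip(m_row, logs_row, mask_row)
        ]
        for m_row, logs_row, mask_row in zip(m, logs, mask)
    ]


rng = random.Random(0)
lengths = [3, 5]                        # plays the role of x_lengths, shape (B,)
mask = non_pad_mask(lengths, max_len=5)
m = [[0.5] * 5, [-1.0] * 5]             # stand-in for the projected mean
logs = [[-20.0] * 5, [-20.0] * 5]       # very small scale, so z stays close to m
z = sample_latent(m, logs, mask, rng)
print(mask[0])  # [1.0, 1.0, 1.0, 0.0, 0.0]
```

Multiplying by the mask zeroes the padded frames of shorter sequences, which is why the mask is returned together with z, the mean, and the scale.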