espnet2.gan_svs.vits.pitch_predictor.Decoder
class espnet2.gan_svs.vits.pitch_predictor.Decoder(out_channels: int = 192, attention_dim: int = 192, attention_heads: int = 2, linear_units: int = 768, blocks: int = 6, pw_layer_type: str = 'conv1d', pw_conv_kernel_size: int = 3, pos_enc_layer_type: str = 'rel_pos', self_attention_layer_type: str = 'rel_selfattn', activation_type: str = 'swish', normalize_before: bool = True, use_macaron_style: bool = False, use_conformer_conv: bool = False, conformer_kernel_size: int = 7, dropout_rate: float = 0.1, positional_dropout_rate: float = 0.0, attention_dropout_rate: float = 0.0, global_channels: int = -1)
Bases: Module
Pitch or Mel decoder module in VISinger 2.
Initialize Decoder in VISinger 2.
- Parameters:
- out_channels (int) – The output dimension of the module.
- attention_dim (int) – The dimension of the attention mechanism.
- attention_heads (int) – The number of attention heads.
- linear_units (int) – The number of units in the linear layer.
- blocks (int) – The number of encoder blocks.
- pw_layer_type (str) – The type of position-wise layer to use.
- pw_conv_kernel_size (int) – The kernel size of the position-wise convolutional layer.
- pos_enc_layer_type (str) – The type of positional encoding layer to use.
- self_attention_layer_type (str) – The type of self-attention layer to use.
- activation_type (str) – The type of activation function to use.
- normalize_before (bool) – Whether to apply layer normalization before each sub-block (pre-norm) rather than after it.
- use_macaron_style (bool) – Whether to use the macaron style or not.
- use_conformer_conv (bool) – Whether to use Conformer style conv or not.
- conformer_kernel_size (int) – The kernel size of the conformer convolutional layer.
- dropout_rate (float) – The dropout rate to use.
- positional_dropout_rate (float) – The positional dropout rate to use.
- attention_dropout_rate (float) – The attention dropout rate to use.
- global_channels (int) – The number of channels to use for global conditioning.
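
A minimal construction sketch, assuming espnet2 with its GAN-SVS modules is installed. The `out_channels=1` (e.g. a single log-F0 track) and `global_channels=256` values below are illustrative choices, not defaults.

```python
from espnet2.gan_svs.vits.pitch_predictor import Decoder

# Illustrative configuration: a single-channel output with global
# conditioning enabled; the remaining arguments keep their defaults.
decoder = Decoder(
    out_channels=1,
    attention_dim=192,
    attention_heads=2,
    blocks=6,
    global_channels=256,
)
```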
forward(x, x_lengths, g=None)
Forward pass of the Decoder.
- Parameters:
- x (Tensor) – Input tensor (B, 2 + attention_dim, T).
- x_lengths (Tensor) – Length tensor (B,).
- g (Tensor , optional) – Global conditioning tensor (B, global_channels, 1).
- Returns: Output tensor (B, 1, T) and output mask (B, 1, T).
- Return type: Tuple[Tensor, Tensor]
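
A hedged usage sketch of the forward pass, following the shapes documented above. The dummy tensors and the `decoder` instance from the construction example above are assumptions for illustration only.

```python
import torch

B, T = 2, 100
attention_dim = 192

# Input carries 2 + attention_dim channels per frame, as documented above.
x = torch.randn(B, 2 + attention_dim, T)
x_lengths = torch.tensor([T, 80])  # valid frame counts per batch item
g = torch.randn(B, 256, 1)         # global conditioning (matches global_channels=256)

out, mask = decoder(x, x_lengths, g=g)
print(out.shape)   # expected: torch.Size([2, 1, 100])
print(mask.shape)  # expected: torch.Size([2, 1, 100])
```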