espnet2.gan_tts.melgan.melgan.MelGANMultiScaleDiscriminator

class espnet2.gan_tts.melgan.melgan.MelGANMultiScaleDiscriminator(in_channels: int = 1, out_channels: int = 1, scales: int = 3, downsample_pooling: str = 'AvgPool1d', downsample_pooling_params: Dict[str, Any] = {'count_include_pad': False, 'kernel_size': 4, 'padding': 1, 'stride': 2}, kernel_sizes: List[int] = [5, 3], channels: int = 16, max_downsample_channels: int = 1024, bias: bool = True, downsample_scales: List[int] = [4, 4, 4, 4], nonlinear_activation: str = 'LeakyReLU', nonlinear_activation_params: Dict[str, Any] = {'negative_slope': 0.2}, pad: str = 'ReflectionPad1d', pad_params: Dict[str, Any] = {}, use_weight_norm: bool = True)

Bases: Module

MelGAN multi-scale discriminator module.

Initialize MelGANMultiScaleDiscriminator module.

Parameters:
- in_channels (int) – Number of input channels.
- out_channels (int) – Number of output channels.
- scales (int) – Number of discriminator scales.
- downsample_pooling (str) – Pooling module name for downsampling of the inputs.
- downsample_pooling_params (Dict[str, Any]) – Parameters for the above pooling module.
- kernel_sizes (List[int]) – List of two kernel sizes. The sum will be used for the first conv layer, and the first and the second kernel sizes will be used for the last two layers.
- channels (int) – Initial number of channels for conv layer.
- max_downsample_channels (int) – Maximum number of channels for downsampling layers.
- bias (bool) – Whether to add bias parameter in convolution layers.
- downsample_scales (List[int]) – List of downsampling scales.
- nonlinear_activation (str) – Activation function module name.
- nonlinear_activation_params (Dict[str, Any]) – Hyperparameters for activation function.
- pad (str) – Padding function module name before dilated convolution layer.
- pad_params (Dict[str, Any]) – Hyperparameters for padding function.
- use_weight_norm (bool) – Whether to use weight norm.
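
As a rough illustration of how `scales`, `downsample_pooling_params`, `channels`, `downsample_scales`, and `max_downsample_channels` interact, the sketch below computes the input length seen at each scale and the channel count after each downsampling layer for the default arguments. It rests on two assumptions that the parameter descriptions do not state explicitly: that the first discriminator sees the raw input and each later one sees the input pooled once more, and that each downsampling layer multiplies the channel count by its scale, capped at `max_downsample_channels`. The helper names are hypothetical and not part of ESPnet; the length formula is the standard floor-mode one documented for PyTorch's `AvgPool1d`.

```python
from typing import List, Sequence


def pooled_length(length: int, kernel_size: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Output length of a floor-mode 1-D pooling layer (AvgPool1d formula)."""
    return (length + 2 * padding - kernel_size) // stride + 1


def per_scale_lengths(length: int, scales: int = 3) -> List[int]:
    """Input length at each scale, ASSUMING one extra pooling step per scale."""
    lengths = []
    for _ in range(scales):
        lengths.append(length)
        length = pooled_length(length)  # default pooling params from the signature
    return lengths


def downsample_channels(
    channels: int = 16,
    downsample_scales: Sequence[int] = (4, 4, 4, 4),
    max_downsample_channels: int = 1024,
) -> List[int]:
    """Channels after each downsampling layer, ASSUMING out = min(in * scale, max)."""
    outs = []
    in_chs = channels
    for scale in downsample_scales:
        in_chs = min(in_chs * scale, max_downsample_channels)
        outs.append(in_chs)
    return outs


print(per_scale_lengths(16000))  # [16000, 8000, 4000]
print(downsample_channels())     # [64, 256, 1024, 1024]
```

With the default pooling parameters, each scale halves the time resolution of an even-length input, so three scales on a 16000-sample input would operate on 16000, 8000, and 4000 samples under the assumption above.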

apply_weight_norm()

Apply weight normalization to all of the layers.

forward(x: Tensor) → List[List[Tensor]]

Calculate forward propagation.

Parameters: x (Tensor) – Input noise signal (B, 1, T).
Returns: List of per-discriminator outputs, each of which is a list of that discriminator's layer output tensors.
Return type: List[List[Tensor]]
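
The nested return value can be easier to follow with a small example. The sketch below assumes the inner lists are ordered by layer, so that the last entry of each inner list is that discriminator's final output (the value typically fed to an adversarial loss) and the earlier entries are intermediate feature maps (for example, for a feature matching loss). The helper `split_outputs` is hypothetical and not part of ESPnet, and plain strings stand in for tensors so the snippet runs without PyTorch.

```python
from typing import Any, List, Tuple


def split_outputs(outs: List[List[Any]]) -> Tuple[List[Any], List[List[Any]]]:
    """Split List[List[...]] into final outputs and intermediate feature maps.

    ASSUMES each inner list is ordered by layer, last entry = final output.
    """
    final_outputs = [layer_outs[-1] for layer_outs in outs]
    feature_maps = [layer_outs[:-1] for layer_outs in outs]
    return final_outputs, feature_maps


# Stand-in for the result of forward(x): 3 scales, 4 layer outputs each.
# The layer count here is illustrative only.
outs = [[f"scale{s}_layer{l}" for l in range(4)] for s in range(3)]

final_outputs, feature_maps = split_outputs(outs)
print(final_outputs)  # ['scale0_layer3', 'scale1_layer3', 'scale2_layer3']
print(len(feature_maps), len(feature_maps[0]))  # 3 3
```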

remove_weight_norm()

Remove weight normalization module from all of the layers.

reset_parameters()

Reset parameters.

This initialization follows the official MelGAN implementation: https://github.com/descriptinc/melgan-neurips/blob/master/mel2wav/modules.py