espnet2.gan_tts.vits.duration_predictor.StochasticDurationPredictor
class espnet2.gan_tts.vits.duration_predictor.StochasticDurationPredictor(channels: int = 192, kernel_size: int = 3, dropout_rate: float = 0.5, flows: int = 4, dds_conv_layers: int = 3, global_channels: int = -1)
Bases: Module
Stochastic duration predictor module.
This module implements the stochastic duration predictor described in "Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech".
Initialize StochasticDurationPredictor module.
- Parameters:
- channels (int) – Number of channels.
- kernel_size (int) – Kernel size.
- dropout_rate (float) – Dropout rate.
- flows (int) – Number of flows.
- dds_conv_layers (int) – Number of conv layers in DDS conv.
- global_channels (int) – Number of global conditioning channels.
forward(x: Tensor, x_mask: Tensor, w: Tensor | None = None, g: Tensor | None = None, inverse: bool = False, noise_scale: float = 1.0) → Tensor
Calculate forward propagation.
- Parameters:
- x (Tensor) – Input tensor (B, channels, T_text).
- x_mask (Tensor) – Mask tensor (B, 1, T_text).
- w (Optional[Tensor]) – Duration tensor (B, 1, T_text).
- g (Optional[Tensor]) – Global conditioning tensor (B, channels, 1).
- inverse (bool) – Whether to inverse the flow.
- noise_scale (float) – Noise scale value.
- Returns: If not inverse, negative log-likelihood (NLL) tensor (B,). If inverse, log-duration tensor (B, 1, T_text).
- Return type: Tensor
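The shapes above can be illustrated with a small sketch of what a caller might do with the inverse-mode output. It is written in plain Python, with nested lists standing in for tensors so that it runs without PyTorch or ESPnet. The helper names `make_mask` and `log_durations_to_frames`, the `length_scale` argument, and the sample values are all hypothetical, and the conversion rule (exponentiate, mask, scale, round up) is an assumption in the style of VITS inference rather than part of this class's documented API.

```python
import math


def make_mask(lengths, max_len):
    """Build a (B, 1, T_text) mask: 1.0 for valid tokens, 0.0 for padding."""
    return [[[1.0 if t < n else 0.0 for t in range(max_len)]] for n in lengths]


def log_durations_to_frames(logw, x_mask, length_scale=1.0):
    """Convert (B, 1, T_text) log-durations to integer frame counts.

    Assumption: frames = ceil(exp(logw) * mask * length_scale).
    Padded positions are zeroed by the mask before rounding.
    """
    frames = []
    for logw_b, mask_b in zip(logw, x_mask):
        row = [
            int(math.ceil(math.exp(lw) * m * length_scale))
            for lw, m in zip(logw_b[0], mask_b[0])
        ]
        frames.append([row])  # keep the singleton channel axis
    return frames


# Two sequences with 3 and 2 valid tokens, padded to T_text = 3.
x_mask = make_mask([3, 2], max_len=3)

# Hypothetical log-durations, shaped like the output of forward(..., inverse=True).
logw = [
    [[0.0, math.log(2.5), math.log(3.2)]],
    [[math.log(1.2), 0.0, 5.0]],  # last position is padding
]

frames = log_durations_to_frames(logw, x_mask)
print(frames)  # [[[1, 3, 4]], [[2, 1, 0]]]
```

Note that the padded position in the second sequence yields zero frames regardless of its log-duration value, which is why the mask is applied before rounding.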