espnet2.tts.prodiff.denoiser.SpectogramDenoiser
class espnet2.tts.prodiff.denoiser.SpectogramDenoiser(idim: int, adim: int = 256, layers: int = 20, channels: int = 256, cycle_length: int = 1, timesteps: int = 200, timescale: int = 1, max_beta: float = 40.0, scheduler: str = 'vpsde', dropout_rate: float = 0.05)
Bases: Module
Spectrogram denoiser.
Ref: https://arxiv.org/pdf/2207.06389.pdf.
Initialization.
- Parameters:
- idim (int) – Dimension of the inputs.
- adim (int, optional) – Dimension of the hidden states. Defaults to 256.
- layers (int, optional) – Number of layers. Defaults to 20.
- channels (int, optional) – Number of channels in each layer. Defaults to 256.
- cycle_length (int, optional) – Cycle length of the diffusion. Defaults to 1.
- timesteps (int, optional) – Number of diffusion timesteps. Defaults to 200.
- timescale (int, optional) – Timescale of the diffusion. Defaults to 1.
- max_beta (float, optional) – Maximum beta value for the scheduler. Defaults to 40.
- scheduler (str, optional) – Type of noise scheduler. Defaults to “vpsde”.
- dropout_rate (float, optional) – Dropout rate. Defaults to 0.05.
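As a rough illustration of how `timesteps`, `max_beta`, and a `vpsde`-style scheduler relate, the sketch below discretizes a linear variance-preserving SDE noise rate into per-step beta values. The function name `vpsde_betas` and the `min_beta` value of 0.1 are assumptions made for this example; the schedule the class actually builds may differ in detail.

```python
import math
from typing import List


def vpsde_betas(
    timesteps: int = 200,
    min_beta: float = 0.1,  # assumed value, not taken from this page
    max_beta: float = 40.0,
) -> List[float]:
    """Discretize beta(s) = min_beta + (max_beta - min_beta) * s for s in [0, 1]."""
    betas = []
    for t in range(1, timesteps + 1):
        # Integral of beta(s) over the interval [(t - 1) / T, t / T].
        integral = (
            min_beta / timesteps
            + 0.5 * (max_beta - min_beta) * (2 * t - 1) / timesteps**2
        )
        # alpha_t = exp(-integral), so beta_t = 1 - alpha_t.
        betas.append(1.0 - math.exp(-integral))
    return betas


betas = vpsde_betas()
print(len(betas), betas[0], betas[-1])
```

With these settings the per-step noise level grows monotonically from the first step to the last, which is why a larger `max_beta` leaves less of the original signal at the final timestep.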
diffusion(xs_ref: Tensor, steps: Tensor, noise: Tensor | None = None) → Tensor
Calculate diffusion process during training.
- Parameters:
- xs_ref (torch.Tensor) – Input tensor.
- steps (torch.Tensor) – Diffusion step indices.
- noise (Optional[torch.Tensor], optional) – Noise tensor. Defaults to None.
- Returns: Output tensor.
- Return type: torch.Tensor
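The training-time diffusion step can be pictured with the standard closed-form forward process, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * noise, where abar_t is the cumulative product of (1 - beta). The following is a dependency-free sketch on plain Python lists rather than the library's tensor code. The helper names `cumulative_alphas` and `diffuse` and the toy beta values are hypothetical, and it is an assumption that `diffusion` follows exactly this form.

```python
import math
import random
from typing import List, Optional, Sequence


def cumulative_alphas(betas: Sequence[float]) -> List[float]:
    """Return abar_t = prod_{i <= t} (1 - beta_i) for every step t."""
    out, acc = [], 1.0
    for beta in betas:
        acc *= 1.0 - beta
        out.append(acc)
    return out


def diffuse(
    x0: Sequence[float],
    step: int,
    alphas_cumprod: Sequence[float],
    noise: Optional[Sequence[float]] = None,
) -> List[float]:
    """Sample x_t from x_0 in one shot using the closed-form forward process."""
    if noise is None:
        # Mirror the optional `noise` argument: draw Gaussian noise when absent.
        noise = [random.gauss(0.0, 1.0) for _ in x0]
    abar = alphas_cumprod[step]
    return [
        math.sqrt(abar) * x + math.sqrt(1.0 - abar) * n for x, n in zip(x0, noise)
    ]


toy_betas = [0.1, 0.2, 0.3]  # toy schedule for illustration only
abar = cumulative_alphas(toy_betas)
print(diffuse([1.0, -1.0], step=2, alphas_cumprod=abar))
```

Passing an explicit `noise` makes the result deterministic, which is convenient when the same noise must later serve as a regression target or when testing.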
forward(xs: Tensor, ys: Tensor | None = None, masks: Tensor | None = None, is_inference: bool = False) → Tensor
Calculate forward propagation.
- Parameters:
- xs (torch.Tensor) – Phoneme-encoded tensor (#batch, time, dims)
- ys (Optional[torch.Tensor], optional) – Mel-based reference tensor (#batch, time, mels). Defaults to None.
- masks (Optional[torch.Tensor], optional) – Mask tensor (#batch, time). Defaults to None.
- is_inference (bool, optional) – Whether to run in inference mode. Defaults to False.
- Returns: Output tensor (#batch, time, dims).
- Return type: torch.Tensor
forward_denoise(xs_noisy: Tensor, step: Tensor, condition: Tensor) → Tensor
Calculate forward for denoising diffusion.
- Parameters:
- xs_noisy (torch.Tensor) – Input tensor.
- step (torch.Tensor) – Diffusion step index.
- condition (torch.Tensor) – Conditioning tensor.
- Returns: Denoised tensor.
- Return type: torch.Tensor
inference(condition: Tensor) → Tensor
Calculate forward during inference.
- Parameters:
- condition (torch.Tensor) – Conditioning tensor (#batch, time, dims).
- Returns: Output tensor.
- Return type: torch.Tensor
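Inference in a diffusion model of this kind typically starts from Gaussian noise and walks the timesteps backwards, refining the sample at each step. The sketch below shows one such reverse loop for a denoiser that predicts the clean signal directly, as the referenced ProDiff paper describes, using the standard Gaussian posterior q(x_{t-1} | x_t, x_0). It is a minimal stand-in on plain Python lists: `reverse_sample`, the `predict_x0` callable, and the toy values are hypothetical, and whether `inference` uses exactly this update rule is an assumption.

```python
import math
import random
from typing import Callable, List, Sequence


def reverse_sample(
    predict_x0: Callable[[List[float], int, Sequence[float]], List[float]],
    condition: Sequence[float],
    betas: Sequence[float],
    seed: int = 0,
) -> List[float]:
    """Run a reverse diffusion loop with a clean-data-predicting denoiser."""
    rng = random.Random(seed)
    # abar[t] = prod_{i <= t} (1 - beta_i)
    abar, acc = [], 1.0
    for beta in betas:
        acc *= 1.0 - beta
        abar.append(acc)
    # Start from pure Gaussian noise (one value per conditioning entry here).
    x = [rng.gauss(0.0, 1.0) for _ in condition]
    for t in reversed(range(len(betas))):
        x0 = predict_x0(x, t, condition)  # hypothetical denoiser call
        abar_prev = abar[t - 1] if t > 0 else 1.0
        # Mean and variance of the posterior q(x_{t-1} | x_t, x_0).
        coef_x0 = math.sqrt(abar_prev) * betas[t] / (1.0 - abar[t])
        coef_xt = math.sqrt(1.0 - betas[t]) * (1.0 - abar_prev) / (1.0 - abar[t])
        var = betas[t] * (1.0 - abar_prev) / (1.0 - abar[t])
        x = [
            coef_x0 * a + coef_xt * b + math.sqrt(var) * rng.gauss(0.0, 1.0)
            for a, b in zip(x0, x)
        ]
    return x


# Toy denoiser that simply returns the conditioning values as the clean signal.
sample = reverse_sample(lambda x, t, c: list(c), [0.5, -1.0, 2.0], [0.1, 0.2, 0.3])
print(sample)
```

At the last iteration (t = 0) the posterior variance is zero and the weight on the predicted clean signal is one, so the loop returns the denoiser's final prediction without added noise.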