espnet2.tts.gst.style_encoder.ReferenceEncoder
class espnet2.tts.gst.style_encoder.ReferenceEncoder(idim=80, conv_layers: int = 6, conv_chans_list: Sequence[int] = (32, 32, 64, 64, 128, 128), conv_kernel_size: int = 3, conv_stride: int = 2, gru_layers: int = 1, gru_units: int = 128)
Bases: Module
Reference encoder module.
This module is the reference encoder introduced in "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis".
- Parameters:
- idim (int, optional) – Dimension of the input mel-spectrogram.
- conv_layers (int, optional) – The number of conv layers in the reference encoder.
- conv_chans_list (Sequence[int], optional) – List of the number of channels of the conv layers in the reference encoder.
- conv_kernel_size (int, optional) – Kernel size of conv layers in the reference encoder.
- conv_stride (int, optional) – Stride size of conv layers in the reference encoder.
- gru_layers (int, optional) – The number of GRU layers in the reference encoder.
- gru_units (int, optional) – The number of GRU units in the reference encoder.
Initialize reference encoder module.
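To see how the constructor parameters interact, the following pure-Python sketch estimates the feature width that the conv stack passes on to the GRU. It relies on assumptions not stated on this page: each conv layer uses padding `(conv_kernel_size - 1) // 2` with dilation 1 (the standard `Conv2d` output-size formula), and the GRU input is the remaining feature width times the channel count of the last conv layer. The helper names `conv_output_size` and `gru_input_size` are hypothetical and are not part of ESPnet.

```python
from typing import Sequence


def conv_output_size(size: int, kernel_size: int = 3, stride: int = 2) -> int:
    """Length of one axis after a single conv layer.

    Assumption: padding is (kernel_size - 1) // 2 and dilation is 1.
    """
    padding = (kernel_size - 1) // 2
    return (size + 2 * padding - kernel_size) // stride + 1


def gru_input_size(
    idim: int = 80,
    conv_layers: int = 6,
    conv_chans_list: Sequence[int] = (32, 32, 64, 64, 128, 128),
    conv_kernel_size: int = 3,
    conv_stride: int = 2,
) -> int:
    """Estimate the per-frame feature size fed to the GRU (hypothetical helper)."""
    if len(conv_chans_list) != conv_layers:
        raise ValueError("conv_chans_list needs one entry per conv layer")
    feat = idim
    for _ in range(conv_layers):
        # Every strided conv layer shrinks the mel-frequency axis.
        feat = conv_output_size(feat, conv_kernel_size, conv_stride)
    # Remaining frequency bins times the channels of the last conv layer.
    return feat * conv_chans_list[-1]


print(gru_input_size())  # 80 -> 40 -> 20 -> 10 -> 5 -> 3 -> 2 bins, 2 * 128 = 256
```

Under the same assumptions, the time axis shrinks by the same rule, so very short reference clips leave only a handful of frames for the GRU to summarize.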
forward(speech: Tensor) → Tensor
Calculate forward propagation.
- Parameters: speech (Tensor) – Batch of padded target features (B, Lmax, idim).
- Returns: Reference embedding (B, gru_units)
- Return type: Tensor