espnet2.tts.gst.style_encoder.ReferenceEncoder

class espnet2.tts.gst.style_encoder.ReferenceEncoder(idim=80, conv_layers: int = 6, conv_chans_list: Sequence[int] = (32, 32, 64, 64, 128, 128), conv_kernel_size: int = 3, conv_stride: int = 2, gru_layers: int = 1, gru_units: int = 128)

Bases: torch.nn.Module

Reference encoder module.

This module is the reference encoder introduced in "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis". It maps a batch of variable-length mel-spectrograms to fixed-size reference embeddings.

Parameters:
- idim (int, optional) – Dimension of the input mel-spectrogram.
- conv_layers (int, optional) – Number of conv layers in the reference encoder.
- conv_chans_list (Sequence[int], optional) – Number of output channels of each conv layer in the reference encoder.
- conv_kernel_size (int, optional) – Kernel size of the conv layers in the reference encoder.
- conv_stride (int, optional) – Stride of the conv layers in the reference encoder.
- gru_layers (int, optional) – Number of GRU layers in the reference encoder.
- gru_units (int, optional) – Number of GRU units in the reference encoder.

Initialize the reference encoder module.
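The conv hyperparameters determine how many features per frame reach the GRU. The plain-Python sketch below works through that arithmetic. It assumes each conv layer pads by `(conv_kernel_size - 1) // 2`, which this page does not state. It also assumes the last conv layer's channel count is multiplied by the remaining mel bins. `conv_out_len` and `gru_input_units` are hypothetical helpers, not part of the ESPnet API.

```python
from typing import Sequence


def conv_out_len(length: int, kernel_size: int = 3, stride: int = 2) -> int:
    """Output length of one strided conv (hypothetical helper)."""
    padding = (kernel_size - 1) // 2  # assumed "same"-style padding rule
    return (length + 2 * padding - kernel_size) // stride + 1


def gru_input_units(
    idim: int = 80,
    conv_layers: int = 6,
    conv_chans_list: Sequence[int] = (32, 32, 64, 64, 128, 128),
    conv_kernel_size: int = 3,
    conv_stride: int = 2,
) -> int:
    """Flattened per-frame feature size fed to the GRU (hypothetical helper)."""
    # conv_chans_list is expected to give one channel count per conv layer.
    if len(conv_chans_list) != conv_layers:
        raise ValueError("conv_chans_list must have conv_layers entries")
    freq = idim
    for _ in range(conv_layers):
        freq = conv_out_len(freq, conv_kernel_size, conv_stride)
    # Channels of the last conv layer times the surviving mel bins.
    return conv_chans_list[-1] * freq


print(gru_input_units())  # 80 -> 40 -> 20 -> 10 -> 5 -> 3 -> 2 bins, so 128 * 2 = 256
```

With the defaults, six stride-2 convs reduce the 80 mel bins to 2. Under these assumptions, each frame therefore carries 128 × 2 = 256 features into the GRU.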

forward(speech: Tensor) → Tensor

Calculate forward propagation.

Parameters: speech (Tensor) – Batch of padded target features (B, Lmax, idim).
Returns: Reference embedding (B, gru_units).
Return type: Tensor
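The shapes involved in `forward` can be traced without running the model. This page only documents the input and output shapes, so the sketch below relies on the GST paper's description of the reference encoder for everything in between. It assumes the input is treated as a one-channel image `(B, 1, Lmax, idim)`. It also assumes the strided convs shrink both time and frequency, with padding `(conv_kernel_size - 1) // 2`. Next, each output frame's channels and mel bins are assumed to be flattened before the GRU. Finally, the last GRU hidden state is assumed to serve as the embedding. `trace_forward_shapes` is a hypothetical helper.

```python
from typing import Dict, Sequence, Tuple


def conv_out_len(length: int, kernel_size: int, stride: int) -> int:
    # Strided conv output length, assuming padding (kernel_size - 1) // 2.
    padding = (kernel_size - 1) // 2
    return (length + 2 * padding - kernel_size) // stride + 1


def trace_forward_shapes(
    batch: int,
    lmax: int,
    idim: int = 80,
    conv_chans_list: Sequence[int] = (32, 32, 64, 64, 128, 128),
    conv_kernel_size: int = 3,
    conv_stride: int = 2,
    gru_units: int = 128,
) -> Dict[str, Tuple[int, ...]]:
    """List tensor shapes at each assumed stage of forward (hypothetical helper)."""
    shapes: Dict[str, Tuple[int, ...]] = {"speech": (batch, lmax, idim)}
    shapes["conv_input"] = (batch, 1, lmax, idim)  # add a channel axis
    t, f = lmax, idim
    for _ in conv_chans_list:  # one conv per entry; time and frequency are both strided
        t = conv_out_len(t, conv_kernel_size, conv_stride)
        f = conv_out_len(f, conv_kernel_size, conv_stride)
    c = conv_chans_list[-1]
    shapes["conv_output"] = (batch, c, t, f)
    shapes["gru_input"] = (batch, t, c * f)  # flatten channels x mel bins per frame
    shapes["ref_emb"] = (batch, gru_units)  # final hidden state of the last GRU layer
    return shapes


for name, shape in trace_forward_shapes(batch=4, lmax=100).items():
    print(name, shape)
```

Consider `batch=4`, `lmax=100`, and the defaults. Under these assumptions, the 100 frames shrink to 2, so the GRU sees a `(4, 2, 256)` sequence. The returned embedding has shape `(4, 128)`, which matches the documented `(B, gru_units)`.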