espnet2.tts.gst.style_encoder.StyleTokenLayer
class espnet2.tts.gst.style_encoder.StyleTokenLayer(ref_embed_dim: int = 128, gst_tokens: int = 10, gst_token_dim: int = 256, gst_heads: int = 4, dropout_rate: float = 0.0)
Bases: torch.nn.Module
Style token layer module.
This module implements the style token layer introduced in "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis".
- Parameters:
- ref_embed_dim (int, optional) – Dimension of the input reference embedding. Defaults to 128.
- gst_tokens (int, optional) – Number of global style token (GST) embeddings. Defaults to 10.
- gst_token_dim (int, optional) – Dimension of each GST embedding. Defaults to 256.
- gst_heads (int, optional) – Number of heads in the GST multi-head attention. Defaults to 4.
- dropout_rate (float, optional) – Dropout rate in the multi-head attention. Defaults to 0.0.
Initialize the style token layer module.
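The constructor arguments fix the sizes the layer works with. The hypothetical GSTConfig dataclass below is not part of ESPnet; it only mirrors the signature's defaults and the input/output shapes documented for forward(). It also assumes, as is usual for multi-head attention, that gst_token_dim must split evenly across gst_heads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GSTConfig:
    """Hypothetical container mirroring the StyleTokenLayer constructor defaults."""

    ref_embed_dim: int = 128
    gst_tokens: int = 10
    gst_token_dim: int = 256
    gst_heads: int = 4
    dropout_rate: float = 0.0

    def __post_init__(self) -> None:
        # Assumption: the attention splits gst_token_dim evenly across heads.
        if self.gst_token_dim % self.gst_heads != 0:
            raise ValueError("gst_token_dim must be divisible by gst_heads")
        # A dropout probability must lie in [0, 1].
        if not 0.0 <= self.dropout_rate <= 1.0:
            raise ValueError("dropout_rate must be in [0, 1]")

    @property
    def head_dim(self) -> int:
        # Feature size handled by each attention head.
        return self.gst_token_dim // self.gst_heads

    def io_shapes(self, batch_size: int):
        # (input shape, output shape) as documented for forward().
        return (batch_size, self.ref_embed_dim), (batch_size, self.gst_token_dim)


cfg = GSTConfig()
print(cfg.head_dim)       # 64
print(cfg.io_shapes(8))   # ((8, 128), (8, 256))
```

With the defaults, each of the 4 heads works on 64 features, and a batch of 8 reference embeddings maps to 8 style embeddings of size 256.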
forward(ref_embs: Tensor) → Tensor
Calculate forward propagation.
- Parameters: ref_embs (Tensor) – Reference embeddings of shape (B, ref_embed_dim), where B is the batch size.
- Returns: Style token embeddings of shape (B, gst_token_dim).
- Return type: Tensor
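The forward pass can be pictured as multi-head attention in which each reference embedding is a query over a bank of learnable tokens that serve as keys and values. The NumPy sketch below is not the ESPnet code: the weight names (w_q, w_k, w_v, w_o), the tanh applied to the token bank, the token size of 64, and the omission of dropout are all assumptions made for illustration. It only shows how a (B, ref_embed_dim) input becomes a (B, gst_token_dim) output.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def style_token_forward(ref_embs, gst_embs, w_q, w_k, w_v, w_o, n_heads):
    """Hypothetical sketch of a style token forward pass (no dropout).

    ref_embs: (B, ref_embed_dim)  reference embeddings, used as queries
    gst_embs: (T, token_dim)      learnable token bank, used as keys and values
    w_q: (ref_embed_dim, D); w_k, w_v: (token_dim, D); w_o: (D, D), D = gst_token_dim
    Returns style embeddings (B, D) and attention weights (B, n_heads, 1, T).
    """
    batch, d_model = ref_embs.shape[0], w_q.shape[1]
    d_head = d_model // n_heads
    tokens = np.tanh(gst_embs)  # assumption: squash the tokens before attention
    # Project, then split the feature axis into heads.
    q = (ref_embs @ w_q).reshape(batch, n_heads, 1, d_head)             # (B, H, 1, d_head)
    k = (tokens @ w_k).reshape(-1, n_heads, d_head).transpose(1, 0, 2)  # (H, T, d_head)
    v = (tokens @ w_v).reshape(-1, n_heads, d_head).transpose(1, 0, 2)  # (H, T, d_head)
    # Scaled dot-product attention of each query over all tokens, per head.
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))       # (B, H, 1, T)
    heads = weights @ v                                                 # (B, H, 1, d_head)
    # Concatenate the heads and apply the output projection.
    style_embs = heads.reshape(batch, d_model) @ w_o                    # (B, D)
    return style_embs, weights


# Usage with the documented defaults: ref_embed_dim=128, gst_tokens=10,
# gst_token_dim=256, gst_heads=4. token_dim=64 is an arbitrary assumption.
rng = np.random.default_rng(0)
B, ref_dim, T, D, H, token_dim = 2, 128, 10, 256, 4, 64
style, attn = style_token_forward(
    rng.standard_normal((B, ref_dim)),
    rng.standard_normal((T, token_dim)),
    rng.standard_normal((ref_dim, D)) / np.sqrt(ref_dim),
    rng.standard_normal((token_dim, D)) / np.sqrt(token_dim),
    rng.standard_normal((token_dim, D)) / np.sqrt(token_dim),
    rng.standard_normal((D, D)) / np.sqrt(D),
    n_heads=H,
)
print(style.shape)  # (2, 256)
```

Each style embedding is a weighted combination of the (projected) tokens, so the attention weights can be read as how strongly each token contributes to the style of a given utterance.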