espnet2.s2st.synthesizer.translatotron2.Translatotron2

class espnet2.s2st.synthesizer.translatotron2.Translatotron2(idim: int, odim: int, synthesizer_type: str = 'rnn', layers: int = 2, units: int = 1024, prenet_layers: int = 2, prenet_units: int = 128, prenet_dropout_rate: float = 0.5, postnet_layers: int = 5, postnet_chans: int = 512, postnet_dropout_rate: float = 0.5, adim: int = 384, aheads: int = 4, conformer_rel_pos_type: str = 'legacy', conformer_pos_enc_layer_type: str = 'rel_pos', conformer_self_attn_layer_type: str = 'rel_selfattn', conformer_activation_type: str = 'swish', use_macaron_style_in_conformer: bool = True, use_cnn_in_conformer: bool = True, zero_triu: bool = False, conformer_enc_kernel_size: int = 7, conformer_dec_kernel_size: int = 31, duration_predictor_layers: int = 2, duration_predictor_type: str = 'rnn', duration_predictor_units: int = 128, spks: int | None = None, langs: int | None = None, spk_embed_dim: int | None = None, spk_embed_integration_type: str = 'add', init_type: str = 'xavier_uniform', init_enc_alpha: float = 1.0, init_dec_alpha: float = 1.0, use_masking: bool = False, use_weighted_masking: bool = False)

Bases: AbsSynthesizer

Translatotron2 module.

This is a module of the synthesizer in Translatotron2 described in "Translatotron 2: High-quality direct speech-to-speech translation with voice preservation" (https://arxiv.org/pdf/2107.08661v5.pdf).

Initializes internal Module state, shared by both nn.Module and ScriptModule.
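
The constructor takes many keyword arguments, so it can help to keep the documented defaults in one place and override only what differs. The following is a minimal sketch, not part of ESPnet: the helper `build_translatotron2_kwargs` and the values `idim=512`, `odim=80`, and `postnet_layers=3` are hypothetical and chosen only for illustration. The defaults are copied from the signature above. The final instantiation is left commented out because it assumes ESPnet is installed and that the class accepts these arguments as documented.

```python
from typing import Any, Dict

# Keyword defaults copied from the documented constructor signature.
TRANSLATOTRON2_DEFAULTS: Dict[str, Any] = {
    "synthesizer_type": "rnn",
    "layers": 2,
    "units": 1024,
    "prenet_layers": 2,
    "prenet_units": 128,
    "prenet_dropout_rate": 0.5,
    "postnet_layers": 5,
    "postnet_chans": 512,
    "postnet_dropout_rate": 0.5,
    "adim": 384,
    "aheads": 4,
    "conformer_rel_pos_type": "legacy",
    "conformer_pos_enc_layer_type": "rel_pos",
    "conformer_self_attn_layer_type": "rel_selfattn",
    "conformer_activation_type": "swish",
    "use_macaron_style_in_conformer": True,
    "use_cnn_in_conformer": True,
    "zero_triu": False,
    "conformer_enc_kernel_size": 7,
    "conformer_dec_kernel_size": 31,
    "duration_predictor_layers": 2,
    "duration_predictor_type": "rnn",
    "duration_predictor_units": 128,
    "spks": None,
    "langs": None,
    "spk_embed_dim": None,
    "spk_embed_integration_type": "add",
    "init_type": "xavier_uniform",
    "init_enc_alpha": 1.0,
    "init_dec_alpha": 1.0,
    "use_masking": False,
    "use_weighted_masking": False,
}


def build_translatotron2_kwargs(idim: int, odim: int, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides into the documented defaults."""
    # Reject names that do not appear in the documented signature.
    unknown = sorted(set(overrides) - set(TRANSLATOTRON2_DEFAULTS))
    if unknown:
        raise TypeError(f"unexpected keyword argument(s): {unknown}")
    return {"idim": idim, "odim": odim, **TRANSLATOTRON2_DEFAULTS, **overrides}


# Illustrative values only: idim and odim depend on the surrounding model.
kwargs = build_translatotron2_kwargs(idim=512, odim=80, postnet_layers=3)

# Assumes ESPnet is installed; not executed in this sketch.
# from espnet2.s2st.synthesizer.translatotron2 import Translatotron2
# synthesizer = Translatotron2(**kwargs)
```

Validating override names up front surfaces a misspelled argument as a `TypeError` before the model is built, which is the same error the constructor itself would raise for an unknown keyword.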