espnet2.asr.encoder.avhubert_encoder.FairseqAVHubertEncoder
class espnet2.asr.encoder.avhubert_encoder.FairseqAVHubertEncoder(input_size: int = 1, avhubert_url: str = './', avhubert_dir_path: str = './', freeze_finetune_updates: int = 0, encoder_embed_dim: int = 1024, encoder_layerdrop: float = 0.05, dropout_input: float = 0.1, dropout_features: float = 0.1, dropout: float = 0.1, attention_dropout: float = 0.1, feature_grad_mult: float = 0.1, activation_dropout: float = 0.0, wav_input: bool = False, layer_norm_first: bool = True, audio_feat_dim: int = 104, encoder_layers: int = 24, encoder_ffn_embed_dim: int = 4096, encoder_attention_heads: int = 16, extracted: bool = False, pretrain: bool = True, modality_dropout: float = 0.0, audio_dropout: float = 0.0, noise_augmentation: bool = False, noise_path: str = './data/babble_noise.pt', max_noise_weight: float = 0.5, audio_only: bool = False)
Bases: AbsEncoder
FairSeq AVHubert pretrained encoder module
- Parameters:
- input_size – input dimension
- avhubert_url – download URL for the pre-trained AVHubert model
- avhubert_dir_path – directory path for storing the downloaded pre-trained AVHubert model
Initializes internal Module state, shared by both nn.Module and ScriptModule.
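A hedged sketch of typical constructor arguments, collected in a plain dict so the snippet runs without espnet2 installed; in real use these would be passed as `FairseqAVHubertEncoder(**encoder_conf)`. The URL and the concrete values below are illustrative placeholders, not values prescribed by this API (the signature's own defaults are shown above).

```python
# Illustrative keyword arguments for FairseqAVHubertEncoder.
# NOTE: the URL is a placeholder; the numeric values mirror the
# "large"-style defaults visible in the signature above.
encoder_conf = {
    # where to fetch / cache the pre-trained AVHubert checkpoint
    "avhubert_url": "https://example.com/avhubert_large.pt",  # placeholder
    "avhubert_dir_path": "./downloads/avhubert",
    # architecture settings matching the defaults in the signature
    "encoder_embed_dim": 1024,
    "encoder_layers": 24,
    "encoder_ffn_embed_dim": 4096,
    "encoder_attention_heads": 16,
    "audio_feat_dim": 104,
    # keep the pre-trained weights frozen for the first N fine-tune updates
    "freeze_finetune_updates": 4000,  # illustrative value
}
```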
forward(xs_pad: Dict[str, Tensor], ilens: Tensor, prev_states: Tensor | None = None) → Tuple[Tensor, Tensor, Tensor | None]
Forward AVHubert Encoder.
- Parameters:
- xs_pad["video"] – input tensor (B, 1, L, H, W)
- xs_pad["audio"] – input tensor (B, D, L)
- ilens – input length (B)
- prev_states – currently not used
- Returns: encoded tensor, output lengths (B), and optional states (currently None)
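To make the input contract concrete, the sketch below builds the `xs_pad` dict and `ilens` that `forward()` expects, using plain Python nested lists as stand-ins for torch tensors so it runs without espnet2 or torch. The helper names (`make_input_batch`, `shape`) and the 88x88 frame size are illustrative assumptions, not part of the ESPnet API; `D` follows the default `audio_feat_dim` of 104.

```python
# Hedged sketch of the forward() input contract:
#   xs_pad["video"]: (B, 1, L, H, W)   lip-region video frames
#   xs_pad["audio"]: (B, D, L)         audio features (D = audio_feat_dim)
#   ilens:           (B,)              per-sample valid lengths
# Nested lists stand in for tensors so the example is self-contained.

def make_input_batch(batch, length, height=88, width=88, feat_dim=104):
    """Build a dummy (xs_pad, ilens) pair with the documented shapes."""
    video = [[[[[0.0] * width for _ in range(height)]   # (H, W)
               for _ in range(length)]]                 # (1, L, H, W)
             for _ in range(batch)]                     # (B, 1, L, H, W)
    audio = [[[0.0] * length for _ in range(feat_dim)]  # (D, L)
             for _ in range(batch)]                     # (B, D, L)
    ilens = [length] * batch
    return {"video": video, "audio": audio}, ilens

def shape(x):
    """Read the shape of a nested-list 'tensor'."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)

xs_pad, ilens = make_input_batch(batch=2, length=5)
print(shape(xs_pad["video"]))  # (2, 1, 5, 88, 88)
print(shape(xs_pad["audio"]))  # (2, 104, 5)
```

With real tensors, `xs_pad["video"]` and `xs_pad["audio"]` would be `torch.Tensor`s of the same shapes, and `ilens` a length tensor of shape (B,).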
forward_fusion(xs_pad: Dict[str, Tensor]) → Tensor
output_size() → int
reload_pretrained_parameters()