espnet2.asr.encoder.avhubert_encoder.AVHubertModel
class espnet2.asr.encoder.avhubert_encoder.AVHubertModel(cfg: AVHubertConfig, **kwargs)
Bases: Module
Initializes internal Module state, shared by both nn.Module and ScriptModule.
classmethod build_model(cfg: AVHubertConfig)
Build a new model instance.
extract_finetune(source, padding_mask=None, mask=False, ret_conv=False, output_layer=None)
Run the pretrained AV-HuBERT encoder on raw inputs, including the modality front-ends.
- Parameters:
  - source['video'] – input tensor (B, 1, L, H, W)
  - source['audio'] – input tensor (B, F, L)
  - padding_mask – input tensor (B, L)
- Returns: encoded tensor and mask
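The shape conventions listed above can be checked before calling the encoder. The helper below is hypothetical (it is not part of ESPnet) and works on plain shape tuples rather than tensors. It assumes, as the shared symbol L suggests, that audio, video, and padding mask all use the same frame count, and that at least one modality is provided.

```python
from typing import Optional, Tuple

Shape = Tuple[int, ...]


def check_finetune_shapes(
    video: Optional[Shape],
    audio: Optional[Shape],
    padding_mask: Optional[Shape] = None,
) -> Tuple[int, int]:
    """Validate shapes against the documented layouts and return (B, L).

    Hypothetical helper: video (B, 1, L, H, W), audio (B, F, L), padding_mask (B, L).
    """
    batch: Optional[int] = None
    frames: Optional[int] = None
    if video is not None:
        if len(video) != 5 or video[1] != 1:
            raise ValueError(f"video must be (B, 1, L, H, W), got {video}")
        batch, frames = video[0], video[2]
    if audio is not None:
        if len(audio) != 3:
            raise ValueError(f"audio must be (B, F, L), got {audio}")
        # Assumption: both modalities share the same B and L.
        if batch is not None and (audio[0], audio[2]) != (batch, frames):
            raise ValueError("audio and video disagree on B or L")
        batch, frames = audio[0], audio[2]
    if batch is None or frames is None:
        raise ValueError("at least one modality is required")
    if padding_mask is not None and tuple(padding_mask) != (batch, frames):
        raise ValueError(f"padding_mask must be (B, L) = ({batch}, {frames})")
    return batch, frames


print(check_finetune_shapes((2, 1, 50, 88, 88), (2, 104, 50), (2, 50)))  # (2, 50)
```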
forward_audio(source_audio)
forward_features(source: Tensor, modality: str) → Tensor
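The `modality: str` argument suggests that forward_features selects a modality-specific front-end, presumably the ones behind forward_audio and forward_video. A minimal dispatch sketch follows; the `extractors` mapping and the toy front-ends are hypothetical stand-ins, not the model's real networks.

```python
from typing import Any, Callable, Dict


def forward_features(source: Any, modality: str, extractors: Dict[str, Callable[[Any], Any]]) -> Any:
    """Run the front-end registered under `modality` on `source`.

    Hypothetical sketch: `extractors` stands in for per-modality front-ends,
    e.g. {"audio": ..., "video": ...}.
    """
    if modality not in extractors:
        raise ValueError(f"unknown modality {modality!r}; expected one of {sorted(extractors)}")
    return extractors[modality](source)


# Usage with toy front-ends in place of real networks.
toy = {"audio": lambda x: [2 * v for v in x], "video": lambda x: [sum(x)]}
print(forward_features([1.0, 2.0], "audio", toy))  # [2.0, 4.0]
```

Rejecting an unknown modality name early gives a clearer error than failing later inside a front-end.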
forward_padding_mask(features: Tensor, padding_mask: Tensor) → Tensor
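forward_padding_mask has no docstring. Assuming it follows the fairseq HuBERT convention, it reduces an input-rate padding mask (True marks padding) to the frame rate of `features`. Trailing positions that do not fill a whole group are dropped, the rest is split into equal groups, and a frame counts as padding only if its whole group is padding. A list-based sketch of that logic, with a hypothetical function name:

```python
from typing import List


def downsample_padding_mask(padding_mask: List[List[bool]], num_frames: int) -> List[List[bool]]:
    """Reduce a (B, L_in) padding mask to (B, num_frames); True marks padding."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    out = []
    for row in padding_mask:
        if len(row) < num_frames:
            raise ValueError("mask is shorter than the feature sequence")
        # Drop trailing positions that do not fill a whole group.
        extra = len(row) % num_frames
        if extra > 0:
            row = row[:-extra]
        group = len(row) // num_frames
        # A frame is padding only if every position in its group is padding.
        out.append([all(row[i * group:(i + 1) * group]) for i in range(num_frames)])
    return out


print(downsample_padding_mask([[False] * 6 + [True] * 4], 5))
# [[False, False, False, True, True]]
```

Under the same assumption, the tensor version corresponds to slicing off the remainder, reshaping to (B, num_frames, -1), and reducing the last axis with `all`.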
forward_transformer(source, padding_mask=None, output_layer=None)
Run the pretrained AV-HuBERT encoder without the front-ends.
Assumes source is an already-fused feature tensor.
- Parameters:
  - source – input tensor (B, L, D*2)
  - padding_mask – input tensor (B, L)
- Returns: encoded tensor and mask
forward_video(source_video)
modality_fusion(features_audio, features_video)
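modality_fusion is undocumented, but forward_transformer expects an already-fused (B, L, D*2) input. That width is consistent with concatenating D-dimensional audio and video features frame by frame. The sketch below shows this for a single utterance using nested lists. The "add" mode (element-wise sum) is included only as an assumed alternative; the function name and mode strings are hypothetical.

```python
from typing import List

Frames = List[List[float]]  # (L, D): one feature vector per frame


def fuse_modalities(audio: Frames, video: Frames, mode: str = "concat") -> Frames:
    """Fuse per-frame audio and video features of one utterance."""
    if len(audio) != len(video):
        raise ValueError("audio and video must have the same number of frames")
    if mode == "concat":
        # (L, D_a) and (L, D_v) -> (L, D_a + D_v); with D_a = D_v = D this is D*2.
        return [a + v for a, v in zip(audio, video)]
    if mode == "add":
        # Element-wise sum keeps the feature size at D.
        if any(len(a) != len(v) for a, v in zip(audio, video)):
            raise ValueError("'add' fusion needs equal feature sizes")
        return [[x + y for x, y in zip(a, v)] for a, v in zip(audio, video)]
    raise ValueError(f"unknown fusion mode: {mode!r}")


a = [[1.0, 2.0], [3.0, 4.0]]
v = [[10.0, 20.0], [30.0, 40.0]]
print(fuse_modalities(a, v))  # [[1.0, 2.0, 10.0, 20.0], [3.0, 4.0, 30.0, 40.0]]
```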