espnet2.asr.encoder.hubert_encoder.FairseqHubertEncoder
class espnet2.asr.encoder.hubert_encoder.FairseqHubertEncoder(input_size: int, hubert_url: str = './', hubert_dir_path: str = './', output_size: int = 256, normalize_before: bool = False, freeze_finetune_updates: int = 0, dropout_rate: float = 0.0, activation_dropout: float = 0.1, attention_dropout: float = 0.0, mask_length: int = 10, mask_prob: float = 0.75, mask_selection: str = 'static', mask_other: int = 0, apply_mask: bool = True, mask_channel_length: int = 64, mask_channel_prob: float = 0.5, mask_channel_other: int = 0, mask_channel_selection: str = 'static', layerdrop: float = 0.1, feature_grad_mult: float = 0.0)
Bases: AbsEncoder
FairSeq HuBERT encoder module for loading pretrained weights and fine-tuning.
- Parameters:
- input_size – input dimension
- hubert_url – URL to the HuBERT pretrained model
- hubert_dir_path – directory to download the HuBERT pretrained model
- output_size – dimension of attention (encoder output dimension)
- normalize_before – whether to apply layer_norm before the first block
- freeze_finetune_updates – number of updates during which all layers except the output layer are frozen before tuning the whole model (necessary to prevent overfitting)
- dropout_rate – dropout rate
- activation_dropout – dropout rate in the activation function
- attention_dropout – dropout rate in attention
HuBERT-specific arguments: please refer to https://github.com/pytorch/fairseq/blob/master/fairseq/models/hubert/hubert.py
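A minimal construction sketch, assuming fairseq is installed; the checkpoint URL, local paths, and parameter values here are illustrative, not prescribed by the API:

```python
# Sketch: build the encoder from a fairseq HuBERT checkpoint (downloaded on
# first use into hubert_dir_path). All concrete values below are assumptions.
from espnet2.asr.encoder.hubert_encoder import FairseqHubertEncoder

encoder = FairseqHubertEncoder(
    input_size=1,  # raw-waveform input; kept for interface compatibility
    hubert_url="https://dl.fbaipublicfiles.com/hubert/hubert_base_ls960.pt",
    hubert_dir_path="./hubert_pretrained_models",
    output_size=256,
    freeze_finetune_updates=10000,  # freeze pretrained layers for the first 10k updates
)
```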
forward(xs_pad: Tensor, ilens: Tensor, prev_states: Tensor | None = None) → Tuple[Tensor, Tensor, Tensor | None]
Forward Hubert ASR Encoder.
- Parameters:
- xs_pad – input tensor (B, L, D)
- ilens – input length (B)
- prev_states – not used in the current implementation
- Returns: encoded output tensor (B, T, D), output lengths (B), and None (no encoder states are returned)
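A hedged sketch of a forward pass, continuing the construction example above; the raw-waveform batch shapes are assumptions (the generic interface documents (B, L, D), but HuBERT consumes raw audio), and input masking (apply_mask) is only active in training mode:

```python
import torch

# Dummy batch: 4 padded 16 kHz one-second waveforms with their true lengths.
xs_pad = torch.randn(4, 16000)
ilens = torch.tensor([16000, 15000, 12000, 8000])

encoder.eval()  # disables the masking applied when apply_mask is set and the module is training
with torch.no_grad():
    out, olens, _ = encoder(xs_pad, ilens)
print(out.shape, olens)  # out: (B, T, output_size); olens: valid frames per utterance
```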
output_size() → int
Get the output dimension of the encoder.
reload_pretrained_parameters()
Reload the pretrained HuBERT parameters saved at construction, restoring the encoder to its pretrained state after the full model has been initialized.
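A short usage sketch of the remaining helpers, under the same assumptions as the examples above:

```python
# Restore the pretrained HuBERT weights captured at construction time, so
# fine-tuning starts from the pretrained state rather than a fresh
# initialization of the wrapping model.
encoder.reload_pretrained_parameters()
assert encoder.output_size() == 256  # matches the output_size given at construction
```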