espnet2.slu.espnet_model.ESPnetSLUModel

class espnet2.slu.espnet_model.ESPnetSLUModel(vocab_size: int, token_list: Tuple[str, ...] | List[str], frontend: AbsFrontend | None, specaug: AbsSpecAug | None, normalize: AbsNormalize | None, preencoder: AbsPreEncoder | None, encoder: AbsEncoder, postencoder: AbsPostEncoder | None, decoder: AbsDecoder | None, ctc: CTC, joint_network: Module | None, postdecoder: AbsPostDecoder | None = None, deliberationencoder: AbsPostEncoder | None = None, transcript_token_list: Tuple[str, ...] | List[str] | None = None, ctc_weight: float = 0.5, interctc_weight: float = 0.0, ignore_id: int = -1, lsm_weight: float = 0.0, length_normalized_loss: bool = False, report_cer: bool = True, report_wer: bool = True, sym_space: str = '<space>', sym_blank: str = '<blank>', extract_feats_in_collect_stats: bool = True, two_pass: bool = False, pre_postencoder_norm: bool = False, num_class: int = 0, ssl_input_size: int = 0, superb_setup: bool = False, use_only_last_correct: bool = False)

Bases: ESPnetASRModel

CTC-attention hybrid Encoder-Decoder model

Initializes internal Module state, shared by both nn.Module and ScriptModule.
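The `ctc_weight` argument suggests the usual hybrid CTC/attention interpolation of two losses. The sketch below shows that weighting under this assumption. `hybrid_loss` is a hypothetical helper, not part of ESPnet, and the actual model may add further terms (for example, an intermediate CTC term controlled by `interctc_weight`).

```python
def hybrid_loss(loss_ctc: float, loss_att: float, ctc_weight: float = 0.5) -> float:
    """Interpolate a CTC loss and an attention loss.

    Hypothetical helper for illustration only; not an ESPnet function.
    ctc_weight = 1.0 keeps only the CTC term, 0.0 keeps only the attention term.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att
```

With the default `ctc_weight = 0.5`, both terms contribute equally.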

collect_feats(speech: Tensor, speech_lengths: Tensor, text: Tensor, text_lengths: Tensor, transcript: Tensor | None = None, transcript_lengths: Tensor | None = None, **kwargs) → Dict[str, Tensor]

encode(speech: Tensor, speech_lengths: Tensor, transcript_pad: Tensor | None = None, transcript_pad_lens: Tensor | None = None) → Tuple[Tensor, Tensor]

Runs the frontend and the encoder. Note that this method is used by asr_inference.py.

Parameters:
- speech – (Batch, Length, …)
- speech_lengths – (Batch,)
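The shapes above imply a padded batch plus a vector of true lengths. A minimal sketch of how such a pair might be built, with plain Python lists standing in for tensors; `pad_batch` is a hypothetical helper, not an ESPnet function, and it assumes a non-empty batch.

```python
from typing import List, Tuple


def pad_batch(
    utterances: List[List[float]], pad_value: float = 0.0
) -> Tuple[List[List[float]], List[int]]:
    """Right-pad variable-length sequences to a common length.

    Hypothetical helper for illustration only. Returns the padded batch,
    shaped (Batch, Length), and the original lengths, shaped (Batch,).
    """
    lengths = [len(u) for u in utterances]
    max_len = max(lengths)  # raises ValueError on an empty batch
    padded = [u + [pad_value] * (max_len - len(u)) for u in utterances]
    return padded, lengths


# Two utterances of 3 and 1 samples become a 2 x 3 batch.
speech, speech_lengths = pad_batch([[0.1, 0.2, 0.3], [0.4]])
```

The length vector lets downstream code tell real samples from padding.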

forward(speech: Tensor, speech_lengths: Tensor, text: Tensor, text_lengths: Tensor, transcript: Tensor | None = None, transcript_lengths: Tensor | None = None, **kwargs) → Tuple[Tensor, Dict[str, Tensor], Tensor]

Frontend + Encoder + Decoder + loss calculation

Parameters:
- speech – (Batch, Length, …)
- speech_lengths – (Batch, )
- text – (Batch, Length)
- text_lengths – (Batch,)
- kwargs – additional keyword inputs; “utt_id” is among them.
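All four positional inputs share the leading Batch dimension, so a mismatch usually signals a collation bug. The sketch below checks that before calling `forward`; `check_batch_sizes` is a hypothetical helper, not part of ESPnet, and it uses `len()` on plain sequences as a stand-in for a tensor's first dimension.

```python
from typing import Sequence


def check_batch_sizes(
    speech: Sequence,
    speech_lengths: Sequence,
    text: Sequence,
    text_lengths: Sequence,
) -> int:
    """Return the shared batch size, or raise if the inputs disagree.

    Hypothetical helper for illustration only; not an ESPnet function.
    """
    sizes = (len(speech), len(speech_lengths), len(text), len(text_lengths))
    if len(set(sizes)) != 1:
        raise ValueError(f"batch sizes differ: {sizes}")
    return sizes[0]


# Two utterances, each paired with a token sequence.
batch_size = check_batch_sizes(
    speech=[[0.1, 0.2, 0.3], [0.4, 0.0, 0.0]],
    speech_lengths=[3, 1],
    text=[[5, 7], [9, -1]],
    text_lengths=[2, 1],
)
```

The `-1` in `text` mirrors the default `ignore_id` from the constructor signature, used here as the padding value by assumption.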