espnet2.s2t.espnet_ctc_model.ESPnetS2TCTCModel

class espnet2.s2t.espnet_ctc_model.ESPnetS2TCTCModel(vocab_size: int, token_list: Tuple[str, ...] | List[str], frontend: AbsFrontend | None, specaug: AbsSpecAug | None, normalize: AbsNormalize | None, encoder: AbsEncoder, prompt_encoder: AbsEncoder, ctc: CTC, interctc_weight: float = 0.0, ignore_id: int = -1, report_cer: bool = True, report_wer: bool = True, sym_space: str = '<space>', sym_blank: str = '<blank>', sym_sos: str = '<sos>', sym_eos: str = '<eos>', sym_sop: str = '<sop>', sym_na: str = '<na>', extract_feats_in_collect_stats: bool = True, ctc_asr_only: List[bool] = [False])

Bases: AbsESPnetModel

OWSM-CTC model

Initialize internal Module state, shared by both nn.Module and ScriptModule.

collect_feats(speech: Tensor, speech_lengths: Tensor, text: Tensor, text_lengths: Tensor, text_prev: Tensor, text_prev_lengths: Tensor, text_ctc: Tensor, text_ctc_lengths: Tensor, **kwargs) → Dict[str, Tensor]

encode(speech: Tensor, speech_lengths: Tensor, text_prev: Tensor, text_prev_lengths: Tensor, prefix: Tensor, prefix_lengths: Tensor)

Encode input speech.

forward(speech: Tensor, speech_lengths: Tensor, text: Tensor, text_lengths: Tensor, text_prev: Tensor, text_prev_lengths: Tensor, text_ctc: Tensor, text_ctc_lengths: Tensor, prefix: Tensor, prefix_lengths: Tensor, **kwargs) → Tuple[Tensor, Dict[str, Tensor], Tensor]

Frontend + Encoder + Calc loss

Parameters:
- speech – (Batch, Length, …)
- speech_lengths – (Batch,)
- text – (Batch, Length)
- text_lengths – (Batch,)
- text_prev – (Batch, Length)
- text_prev_lengths – (Batch,)
- text_ctc – (Batch, Length)
- text_ctc_lengths – (Batch,)
- prefix – (Batch, Length=2), <lang> and <task>
- prefix_lengths – (Batch,)
- kwargs – additional keyword arguments; “utt_id” is among the inputs.
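
The (Batch, Length) and (Batch,) pairs listed above can be illustrated with a minimal, dependency-free sketch. It assumes padded positions are filled with the documented default `ignore_id` of -1; that is an assumption about the data pipeline, not something this page states. The helper `pad_batch` and all token ids are hypothetical, and plain lists stand in for the tensors the real method expects.

```python
from typing import List, Sequence, Tuple

IGNORE_ID = -1  # mirrors the documented default of `ignore_id`


def pad_batch(
    seqs: Sequence[Sequence[int]], pad_value: int = IGNORE_ID
) -> Tuple[List[List[int]], List[int]]:
    """Pad variable-length token-id sequences to a (Batch, Length) grid.

    Returns the padded rows and the original lengths, shape (Batch,).
    Hypothetical helper for illustration; not part of ESPnet.
    """
    lengths = [len(s) for s in seqs]
    max_len = max(lengths) if lengths else 0
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]
    return padded, lengths


# Made-up token ids for a batch of two utterances.
text, text_lengths = pad_batch([[5, 9, 2], [7, 4]])

# prefix carries exactly two tokens per utterance (<lang> and <task>),
# so its Length dimension is always 2 and no padding is needed.
prefix, prefix_lengths = pad_batch([[101, 201], [102, 201]])
```

In real use, each padded list and its length list would be converted to tensors before being passed as `text` / `text_lengths` or `prefix` / `prefix_lengths`.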