espnet2.lid.espnet_model.ESPnetLIDModel
class espnet2.lid.espnet_model.ESPnetLIDModel(frontend: AbsFrontend | None, specaug: AbsSpecAug | None, normalize: AbsNormalize | None, encoder: AbsEncoder | None, pooling: AbsPooling | None, projector: AbsProjector | None, loss: AbsLoss | None, extract_feats_in_collect_stats: bool | None = None)
Bases: AbsESPnetModel
ESPnet LID model
Support for language identification and language embedding extraction.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
collect_feats(speech: Tensor, speech_lengths: Tensor, lid_labels: Tensor | None = None, **kwargs) → Dict[str, Tensor]
encode_frame(feats: Tensor) → Tensor
extract_feats(speech: Tensor, speech_lengths: Tensor) → Tuple[Tensor, Tensor]
forward(speech: Tensor, speech_lengths: Tensor, lid_labels: Tensor | None = None, extract_embd: bool = False, **kwargs) → Tuple[Tensor, Tensor] | Tuple[Tensor, Dict[str, Tensor], Tensor] | Tensor
Forward pass of the LID model.
Processes raw speech through frontend, encoder, pooling, and loss modules.
- Parameters:
- speech – Input waveform tensor (batch_size, num_samples)
- speech_lengths – Lengths of each input in the batch (batch_size,)
- lid_labels – Ground truth language labels (batch_size,). Optional; defaults to None
- extract_embd – If True, return language embeddings and predictions (inference mode). Defaults to False
- Returns:
- If extract_embd=True (inference mode): Tuple(lang_embd, pred_lids)
- If training: Tuple(loss, stats_dict, batch_weight)
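Because forward returns differently shaped results depending on the mode, calling code may find it convenient to label the values before using them. The following is a minimal sketch, not part of ESPnet: `unpack_forward_output` is a hypothetical helper, and plain Python values stand in for the tensors that forward would actually return. The signature also permits a bare Tensor return; the documentation above does not say when that occurs, so the sketch passes it through unlabeled.

```python
from typing import Any, Dict


def unpack_forward_output(output: Any, extract_embd: bool) -> Dict[str, Any]:
    """Label the values returned by forward() according to the documented modes.

    Hypothetical helper for illustration; it is not part of ESPnet.
    """
    if not isinstance(output, tuple):
        # The signature also allows a bare Tensor. The docs do not describe
        # when this happens, so the value is returned without interpretation.
        return {"output": output}
    if extract_embd:
        # Inference mode: Tuple(lang_embd, pred_lids)
        lang_embd, pred_lids = output
        return {"lang_embd": lang_embd, "pred_lids": pred_lids}
    # Training mode: Tuple(loss, stats_dict, batch_weight)
    loss, stats, batch_weight = output
    return {"loss": loss, "stats": stats, "batch_weight": batch_weight}


# Stand-in values; in real use these would be torch.Tensor objects.
inference = unpack_forward_output(([0.1, 0.2], [3]), extract_embd=True)
training = unpack_forward_output((0.7, {"loss": 0.7}, 4), extract_embd=False)
```

In real use, the first argument would be the result of calling the model with the same `extract_embd` flag that is passed to the helper.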
project_lang_embd(utt_level_feat: Tensor) → Tensor