espnet2.diar.espnet_model.ESPnetDiarizationModel
class espnet2.diar.espnet_model.ESPnetDiarizationModel(frontend: AbsFrontend | None, specaug: AbsSpecAug | None, normalize: AbsNormalize | None, label_aggregator: Module, encoder: AbsEncoder, decoder: AbsDecoder, attractor: AbsAttractor | None, diar_weight: float = 1.0, attractor_weight: float = 1.0)
Bases: AbsESPnetModel
Speaker Diarization model
If “attractor” is None, SA-EEND is used; otherwise, EEND-EDA is used. For details on SA-EEND and EEND-EDA, refer to the following papers:
- SA-EEND: https://arxiv.org/pdf/1909.06247.pdf
- EEND-EDA: https://arxiv.org/pdf/2005.09921.pdf, https://arxiv.org/pdf/2106.10654.pdf
Initializes internal Module state, shared by both nn.Module and ScriptModule.
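A minimal construction sketch: only the attractor argument switches between the two variants. The build_* helpers below are hypothetical placeholders for however the label aggregator, encoder, decoder, and attractor are actually built from a config; they are not ESPnet functions.

```python
from espnet2.diar.espnet_model import ESPnetDiarizationModel

# build_label_aggregator / build_encoder / build_decoder / build_attractor
# are hypothetical placeholders, not ESPnet functions.
sa_eend = ESPnetDiarizationModel(
    frontend=None,
    specaug=None,
    normalize=None,
    label_aggregator=build_label_aggregator(),
    encoder=build_encoder(),
    decoder=build_decoder(),
    attractor=None,               # attractor is None -> SA-EEND
)

eend_eda = ESPnetDiarizationModel(
    frontend=None,
    specaug=None,
    normalize=None,
    label_aggregator=build_label_aggregator(),
    encoder=build_encoder(),
    decoder=build_decoder(),
    attractor=build_attractor(),  # attractor is not None -> EEND-EDA
)
```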
attractor_loss(att_prob, label)
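For intuition, a minimal sketch of an EEND-EDA style attractor existence loss, assuming att_prob holds one existence logit per attractor (num_spk speaker attractors plus one terminating attractor); this is an illustration, not the exact ESPnet computation:

```python
import torch
import torch.nn.functional as F

def attractor_existence_loss(att_prob: torch.Tensor, num_spk: int) -> torch.Tensor:
    # att_prob: (Batch, num_spk + 1) existence logits.  The first num_spk
    # attractors should exist (target 1); the extra one should not (target 0).
    target = torch.zeros_like(att_prob)
    target[:, :num_spk] = 1.0
    return F.binary_cross_entropy_with_logits(att_prob, target)
```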
static calc_diarization_error(pred, label, length)
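As a rough illustration of the kind of frame-level statistics such a routine accumulates (DER-style scored / miss / false-alarm counts), here is a simplified sketch covering only a subset of the statistics; it is not the ESPnet implementation:

```python
import torch

def framewise_der_counts(pred, label, length):
    # Simplified sketch (not the ESPnet implementation).
    # pred, label: (Batch, Time, num_spk) binary speaker-activity tensors;
    # length: (Batch,) number of valid frames per utterance.
    scored = miss = falarm = 0
    for b in range(pred.shape[0]):
        t = int(length[b])
        n_ref = label[b, :t].sum(dim=-1)  # active reference speakers per frame
        n_sys = pred[b, :t].sum(dim=-1)   # active predicted speakers per frame
        scored += int(n_ref.sum())
        miss += int(torch.clamp(n_ref - n_sys, min=0).sum())
        falarm += int(torch.clamp(n_sys - n_ref, min=0).sum())
    return {"speaker_scored": scored, "speaker_miss": miss, "speaker_falarm": falarm}
```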
collect_feats(speech: Tensor, speech_lengths: Tensor, spk_labels: Tensor | None = None, spk_labels_lengths: Tensor | None = None, **kwargs) → Dict[str, Tensor]
create_length_mask(length, max_len, num_output)
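A minimal sketch of what a length mask of this kind looks like, assuming it marks valid frames with 1 and padded frames with 0 across num_output channels:

```python
import torch

def length_mask(length: torch.Tensor, max_len: int, num_output: int) -> torch.Tensor:
    # length: (Batch,) valid frame counts -> mask: (Batch, max_len, num_output)
    valid = torch.arange(max_len).unsqueeze(0) < length.unsqueeze(1)  # (B, max_len)
    return valid.unsqueeze(-1).expand(-1, -1, num_output).float()
```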
encode(speech: Tensor, speech_lengths: Tensor, bottleneck_feats: Tensor, bottleneck_feats_lengths: Tensor) → Tuple[Tensor, Tensor]
Frontend + Encoder
- Parameters:
- speech – (Batch, Length, …)
- speech_lengths – (Batch,)
- bottleneck_feats – (Batch, Length, …): used for enh + diar
- bottleneck_feats_lengths – (Batch,)
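A hypothetical call sketch for encode, assuming model, speech, and speech_lengths already exist and no enhancement front-end is used, so the bottleneck features are left as None:

```python
# `model`, `speech`, `speech_lengths` are assumed to exist (hypothetical setup).
encoder_out, encoder_out_lens = model.encode(
    speech,
    speech_lengths,
    bottleneck_feats=None,          # only needed for joint enh + diar
    bottleneck_feats_lengths=None,
)
```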
forward(speech: Tensor, speech_lengths: Tensor | None = None, spk_labels: Tensor | None = None, spk_labels_lengths: Tensor | None = None, **kwargs) → Tuple[Tensor, Dict[str, Tensor], Tensor]
Frontend + Encoder + Decoder + Calc loss
- Parameters:
- speech – (Batch, samples)
- speech_lengths – (Batch,); defaults to None for the chunk iterator, because the chunk iterator does not return speech_lengths (see espnet2/iterators/chunk_iter_factory.py)
- spk_labels – (Batch,)
- kwargs – “utt_id” is among the inputs.
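A hypothetical training-step sketch, assuming model is an already-constructed ESPnetDiarizationModel and the batch tensors come from a data loader; the (loss, stats, weight) return follows the signature above:

```python
# `model`, `speech`, `speech_lengths`, `spk_labels`, `spk_labels_lengths`
# are assumed to exist (hypothetical setup).
loss, stats, weight = model(
    speech=speech,                        # (Batch, samples)
    speech_lengths=speech_lengths,        # (Batch,)
    spk_labels=spk_labels,
    spk_labels_lengths=spk_labels_lengths,
)
loss.backward()  # `stats` holds monitoring values, e.g. loss terms and error rates
```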
pit_loss(pred, label, lengths)
pit_loss_single_permute(pred, label, length)
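A minimal sketch of permutation-invariant training (PIT) for diarization, assuming framewise logits and binary speaker-activity labels of shape (Batch, Time, num_spk): it evaluates a BCE loss under every speaker permutation and keeps the best one per utterance. This illustrates the technique; it is not the exact ESPnet implementation.

```python
import itertools

import torch
import torch.nn.functional as F

def pit_bce_loss(pred, label, lengths):
    # pred: (Batch, Time, num_spk) logits; label: same shape, 0/1 activity;
    # lengths: (Batch,) valid frame counts.
    batch, _, num_spk = pred.shape
    losses = []
    for b in range(batch):
        t = int(lengths[b])
        per_perm = []
        for perm in itertools.permutations(range(num_spk)):
            # Reorder the label's speaker columns according to this permutation.
            idx = torch.tensor(perm, device=label.device)
            permuted = label[b, :t].index_select(1, idx).float()
            per_perm.append(
                F.binary_cross_entropy_with_logits(pred[b, :t], permuted)
            )
        losses.append(torch.stack(per_perm).min())  # best permutation for this utterance
    return torch.stack(losses).mean()
```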