espnet2.diar.espnet_model.ESPnetDiarizationModel
class espnet2.diar.espnet_model.ESPnetDiarizationModel(frontend: AbsFrontend | None, specaug: AbsSpecAug | None, normalize: AbsNormalize | None, label_aggregator: Module, encoder: AbsEncoder, decoder: AbsDecoder, attractor: AbsAttractor | None, diar_weight: float = 1.0, attractor_weight: float = 1.0)
Bases: AbsESPnetModel
Speaker Diarization model
If “attractor” is None, SA-EEND is used; otherwise, EEND-EDA is used. For details on SA-EEND and EEND-EDA, refer to the following papers:
- SA-EEND: https://arxiv.org/pdf/1909.06247.pdf
- EEND-EDA: https://arxiv.org/pdf/2005.09921.pdf, https://arxiv.org/pdf/2106.10654.pdf
Initializes internal Module state, shared by both nn.Module and ScriptModule.
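A minimal construction sketch: only the attractor argument switches between the two variants. The build_* helpers below are hypothetical placeholders for however the label aggregator, encoder, decoder, and attractor are actually built from a config; they are not ESPnet functions.

```python
from espnet2.diar.espnet_model import ESPnetDiarizationModel

# build_label_aggregator / build_encoder / build_decoder / build_attractor
# are hypothetical placeholders, not ESPnet functions.
sa_eend = ESPnetDiarizationModel(
    frontend=None,
    specaug=None,
    normalize=None,
    label_aggregator=build_label_aggregator(),
    encoder=build_encoder(),
    decoder=build_decoder(),
    attractor=None,               # attractor is None -> SA-EEND
)

eend_eda = ESPnetDiarizationModel(
    frontend=None,
    specaug=None,
    normalize=None,
    label_aggregator=build_label_aggregator(),
    encoder=build_encoder(),
    decoder=build_decoder(),
    attractor=build_attractor(),  # attractor is not None -> EEND-EDA
)
```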
attractor_loss(att_prob, label)
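For intuition, a minimal sketch of an EEND-EDA style attractor existence loss, assuming att_prob holds one existence logit per attractor (num_spk speaker attractors plus one terminating attractor); this is an illustration, not the exact ESPnet computation:

```python
import torch
import torch.nn.functional as F

def attractor_existence_loss(att_prob: torch.Tensor, num_spk: int) -> torch.Tensor:
    # att_prob: (Batch, num_spk + 1) existence logits.  The first num_spk
    # attractors should exist (target 1); the extra one should not (target 0).
    target = torch.zeros_like(att_prob)
    target[:, :num_spk] = 1.0
    return F.binary_cross_entropy_with_logits(att_prob, target)
```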
static calc_diarization_error(pred, label, length)
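As a rough illustration of the kind of frame-level statistics such a routine accumulates (DER-style scored / miss / false-alarm counts), here is a simplified sketch covering only a subset of the statistics; it is not the ESPnet implementation:

```python
import torch

def framewise_der_counts(pred, label, length):
    # Simplified sketch (not the ESPnet implementation).
    # pred, label: (Batch, Time, num_spk) binary speaker-activity tensors;
    # length: (Batch,) number of valid frames per utterance.
    scored = miss = falarm = 0
    for b in range(pred.shape[0]):
        t = int(length[b])
        n_ref = label[b, :t].sum(dim=-1)  # active reference speakers per frame
        n_sys = pred[b, :t].sum(dim=-1)   # active predicted speakers per frame
        scored += int(n_ref.sum())
        miss += int(torch.clamp(n_ref - n_sys, min=0).sum())
        falarm += int(torch.clamp(n_sys - n_ref, min=0).sum())
    return {"speaker_scored": scored, "speaker_miss": miss, "speaker_falarm": falarm}
```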
collect_feats(speech: Tensor, speech_lengths: Tensor, spk_labels: Tensor | None = None, spk_labels_lengths: Tensor | None = None, **kwargs) → Dict[str, Tensor]
create_length_mask(length, max_len, num_output)
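A minimal sketch of what a length mask of this kind looks like, assuming it marks valid frames with 1 and padded frames with 0 across num_output channels:

```python
import torch

def length_mask(length: torch.Tensor, max_len: int, num_output: int) -> torch.Tensor:
    # length: (Batch,) valid frame counts -> mask: (Batch, max_len, num_output)
    valid = torch.arange(max_len).unsqueeze(0) < length.unsqueeze(1)  # (B, max_len)
    return valid.unsqueeze(-1).expand(-1, -1, num_output).float()
```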
encode(speech: Tensor, speech_lengths: Tensor, bottleneck_feats: Tensor, bottleneck_feats_lengths: Tensor) → Tuple[Tensor, Tensor]
Frontend + Encoder
- Parameters:
- speech – (Batch, Length, …)
- speech_lengths – (Batch,)
- bottleneck_feats – (Batch, Length, …): used for enh + diar
- bottleneck_feats_lengths – (Batch,)
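A hypothetical call sketch for encode, assuming model, speech, and speech_lengths already exist and no enhancement front-end is used, so the bottleneck features are left as None:

```python
# `model`, `speech`, `speech_lengths` are assumed to exist (hypothetical setup).
encoder_out, encoder_out_lens = model.encode(
    speech,
    speech_lengths,
    bottleneck_feats=None,          # only needed for joint enh + diar
    bottleneck_feats_lengths=None,
)
```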
forward(speech: Tensor, speech_lengths: Tensor | None = None, spk_labels: Tensor | None = None, spk_labels_lengths: Tensor | None = None, **kwargs) → Tuple[Tensor, Dict[str, Tensor], Tensor]
Frontend + Encoder + Decoder + Calc loss
- Parameters:
- speech – (Batch, samples)
- speech_lengths – (Batch,); defaults to None for the chunk iterator, because the chunk iterator does not return speech_lengths (see espnet2/iterators/chunk_iter_factory.py)
- spk_labels – (Batch,)
- kwargs – “utt_id” is among the inputs.
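A hypothetical training-step sketch, assuming model is an already-constructed ESPnetDiarizationModel and the batch tensors come from a data loader; the (loss, stats, weight) return follows the signature above:

```python
# `model`, `speech`, `speech_lengths`, `spk_labels`, `spk_labels_lengths`
# are assumed to exist (hypothetical setup).
loss, stats, weight = model(
    speech=speech,                        # (Batch, samples)
    speech_lengths=speech_lengths,        # (Batch,)
    spk_labels=spk_labels,
    spk_labels_lengths=spk_labels_lengths,
)
loss.backward()  # `stats` holds monitoring values, e.g. loss terms and error rates
```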
pit_loss(pred, label, lengths)
pit_loss_single_permute(pred, label, length)
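A minimal sketch of permutation-invariant training (PIT) for diarization, assuming framewise logits and binary speaker-activity labels of shape (Batch, Time, num_spk): it evaluates a BCE loss under every speaker permutation and keeps the best one per utterance. This illustrates the technique; it is not the exact ESPnet implementation.

```python
import itertools

import torch
import torch.nn.functional as F

def pit_bce_loss(pred, label, lengths):
    # pred: (Batch, Time, num_spk) logits; label: same shape, 0/1 activity;
    # lengths: (Batch,) valid frame counts.
    batch, _, num_spk = pred.shape
    losses = []
    for b in range(batch):
        t = int(lengths[b])
        per_perm = []
        for perm in itertools.permutations(range(num_spk)):
            # Reorder the label's speaker columns according to this permutation.
            idx = torch.tensor(perm, device=label.device)
            permuted = label[b, :t].index_select(1, idx).float()
            per_perm.append(
                F.binary_cross_entropy_with_logits(pred[b, :t], permuted)
            )
        losses.append(torch.stack(per_perm).min())  # best permutation for this utterance
    return torch.stack(losses).mean()
```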