espnet2.diar package

espnet2.diar.espnet_model

class espnet2.diar.espnet_model.ESPnetDiarizationModel(frontend: Optional[espnet2.asr.frontend.abs_frontend.AbsFrontend], specaug: Optional[espnet2.asr.specaug.abs_specaug.AbsSpecAug], normalize: Optional[espnet2.layers.abs_normalize.AbsNormalize], label_aggregator: torch.nn.modules.module.Module, encoder: espnet2.asr.encoder.abs_encoder.AbsEncoder, decoder: espnet2.diar.decoder.abs_decoder.AbsDecoder, attractor: Optional[espnet2.diar.attractor.abs_attractor.AbsAttractor], attractor_weight: float = 1.0)[source]

Bases: espnet2.train.abs_espnet_model.AbsESPnetModel

Speaker Diarization model

If attractor is None, SA-EEND is used; otherwise, EEND-EDA is used. For details on both models, see the following papers:
  • SA-EEND – https://arxiv.org/pdf/1909.06247.pdf

  • EEND-EDA – https://arxiv.org/pdf/2005.09921.pdf, https://arxiv.org/pdf/2106.10654.pdf
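The difference between the two modes can be pictured with a minimal sketch. It is not the ESPnet implementation: the helper name speaker_logits, the tensor sizes, and the use of a plain linear layer as the decoder are illustrative assumptions. Without attractors, a fixed-size decoder maps each encoder frame to speaker logits (SA-EEND style); with attractors, the logits are dot products between frames and attractors, as described in the EEND-EDA papers.

```python
from typing import Optional

import torch


def speaker_logits(
    encoder_out: torch.Tensor,
    decoder: torch.nn.Module,
    attractors: Optional[torch.Tensor] = None,
) -> torch.Tensor:
    """Hypothetical helper: frame-level speaker logits for either mode."""
    if attractors is None:
        # SA-EEND style: (Batch, T, F) -> (Batch, T, num_spk) via the decoder.
        return decoder(encoder_out)
    # EEND-EDA style: dot product of every frame with every attractor,
    # (Batch, T, F) x (Batch, F, num_spk) -> (Batch, T, num_spk).
    return torch.bmm(encoder_out, attractors.transpose(1, 2))


batch, frames, feat_dim, num_spk = 2, 50, 16, 3
encoder_out = torch.randn(batch, frames, feat_dim)
linear = torch.nn.Linear(feat_dim, 2)  # assumed 2-speaker decoder

sa_eend = speaker_logits(encoder_out, linear)
eend_eda = speaker_logits(
    encoder_out, linear, torch.randn(batch, num_spk, feat_dim)
)
```

In the attractor branch the number of output speakers follows the number of attractors rather than the decoder's fixed output size.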

attractor_loss(att_prob, label)[source]
static calc_diarization_error(pred, label, length)[source]
collect_feats(speech: torch.Tensor, speech_lengths: torch.Tensor, spk_labels: torch.Tensor = None, spk_labels_lengths: torch.Tensor = None) → Dict[str, torch.Tensor][source]
create_length_mask(length, max_len, num_output)[source]
encode(speech: torch.Tensor, speech_lengths: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]

Frontend + Encoder

Parameters
  • speech – (Batch, Length, …)

  • speech_lengths – (Batch,)

forward(speech: torch.Tensor, speech_lengths: torch.Tensor = None, spk_labels: torch.Tensor = None, spk_labels_lengths: torch.Tensor = None) → Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor][source]

Frontend + Encoder + Decoder + loss calculation

Parameters
  • speech – (Batch, samples)

  • speech_lengths – (Batch,). Defaults to None for the chunk iterator, which does not return speech_lengths; see espnet2/iterators/chunk_iter_factory.py

  • spk_labels – (Batch, )

pit_loss(pred, label, lengths)[source]
pit_loss_single_permute(pred, label, length)[source]
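The method names above refer to permutation-invariant training (PIT), where the loss is evaluated for every assignment of reference speakers to output channels and the smallest value is kept. The following is a minimal single-utterance sketch of that idea, assuming a binary cross-entropy criterion on logits; the function name pit_bce_loss and the toy tensors are hypothetical and do not reproduce the signatures listed above.

```python
from itertools import permutations

import torch
import torch.nn.functional as F


def pit_bce_loss(pred: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
    """Permutation-invariant BCE for one utterance (illustrative only).

    pred:  (T, num_spk) logits
    label: (T, num_spk) 0/1 speaker activity
    """
    num_spk = label.shape[-1]
    losses = []
    for perm in permutations(range(num_spk)):
        # Reorder the reference speakers and score this assignment.
        permuted = label[:, list(perm)]
        losses.append(F.binary_cross_entropy_with_logits(pred, permuted))
    # Keep the best (lowest-loss) assignment.
    return torch.stack(losses).min()


label = torch.tensor([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
logits = (label * 2 - 1) * 5.0      # confident, correct predictions
swapped_logits = logits[:, [1, 0]]  # same predictions, channels swapped

loss_direct = pit_bce_loss(logits, label)
loss_swapped = pit_bce_loss(swapped_logits, label)
plain_swapped = F.binary_cross_entropy_with_logits(swapped_logits, label)
```

Because the minimum is taken over all permutations, swapping the output channels leaves the PIT loss unchanged, whereas the plain loss on the swapped prediction is much larger. The loop visits num_spk! permutations, so this brute-force form is only practical for a small number of speakers.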

espnet2.diar.__init__

espnet2.diar.abs_diar

class espnet2.diar.abs_diar.AbsDiarization[source]

Bases: torch.nn.modules.module.Module, abc.ABC

Initializes internal Module state, shared by both nn.Module and ScriptModule.

abstract forward(input: torch.Tensor, ilens: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor, collections.OrderedDict][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
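The note above describes standard torch.nn.Module behaviour, which a small self-contained example can illustrate. The Doubler class is a toy module invented for this illustration and is not part of ESPnet.

```python
import torch


class Doubler(torch.nn.Module):
    """Toy module used only for this illustration."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return 2 * x


calls = []
module = Doubler()
# A forward hook runs after forward() when the module instance is called.
module.register_forward_hook(lambda mod, inputs, output: calls.append("hook"))

x = torch.ones(3)
module(x)                        # goes through __call__, so the hook fires
hooks_after_call = len(calls)
module.forward(x)                # bypasses __call__, so the hook is skipped
hooks_after_forward = len(calls)
```

After both calls the hook has run only once, which is why subclasses should be invoked as module(...) rather than module.forward(...).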

abstract forward_rawwav(input: torch.Tensor, ilens: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor, collections.OrderedDict][source]

espnet2.diar.label_processor

class espnet2.diar.label_processor.LabelProcessor(win_length: int = 512, hop_length: int = 128, center: bool = True)[source]

Bases: torch.nn.modules.module.Module

Label aggregator for speaker diarization

forward(input: torch.Tensor, ilens: torch.Tensor)[source]

Forward.

Parameters
  • input – (Batch, Nsamples, Label_dim)

  • ilens – (Batch)

Returns
  • output – (Batch, Frames, Label_dim)

  • olens – (Batch)
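The shape change from sample-level to frame-level labels can be sketched as below. This is an assumption-laden illustration rather than the ESPnet code: the function name aggregate_labels, the zero padding used when center is True, and the majority-vote rule (a frame is active if more than half of its samples are) are all choices made for this example, and the output lengths are not computed.

```python
import torch
import torch.nn.functional as F


def aggregate_labels(
    labels: torch.Tensor,
    win_length: int = 512,
    hop_length: int = 128,
    center: bool = True,
) -> torch.Tensor:
    """Sketch: (Batch, Nsamples, Label_dim) -> (Batch, Frames, Label_dim)."""
    if center:
        pad = win_length // 2
        # Zero-pad the time axis on both sides (assumed padding scheme).
        labels = F.pad(labels, (0, 0, pad, pad))
    # Slice the time axis into windows: (Batch, Frames, Label_dim, win_length).
    frames = labels.unfold(1, win_length, hop_length)
    # Majority vote (assumed rule): active if more than half the samples are.
    return (frames.sum(dim=-1) > win_length // 2).float()


# Speaker 0 talks in the first half, speaker 1 in the second half.
sample_labels = torch.zeros(1, 2048, 2)
sample_labels[0, :1024, 0] = 1.0
sample_labels[0, 1024:, 1] = 1.0
frame_labels = aggregate_labels(sample_labels)
```

With the default window and hop sizes, the 2048 samples in this example become 17 frames, so frame_labels has shape (1, 17, 2).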

espnet2.diar.attractor.rnn_attractor

class espnet2.diar.attractor.rnn_attractor.RnnAttractor(encoder_output_size: int, layer: int = 1, unit: int = 512, dropout: float = 0.1, attractor_grad: bool = True)[source]

Bases: espnet2.diar.attractor.abs_attractor.AbsAttractor

Encoder-decoder attractor for speaker diarization

forward(enc_input: torch.Tensor, ilens: torch.Tensor, dec_input: torch.Tensor)[source]

Forward.

Parameters
  • enc_input (torch.Tensor) – hidden_space [Batch, T, F]

  • ilens (torch.Tensor) – input lengths [Batch]

  • dec_input (torch.Tensor) – decoder input (zeros) [Batch, num_spk + 1, F]

Returns
  • attractor – [Batch, num_spk + 1, F]

  • att_prob – [Batch, num_spk + 1, 1]
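The shapes documented above can be reproduced with a small encoder-decoder sketch. TinyEDA is a hypothetical class written for this illustration, not the ESPnet RnnAttractor: it assumes single-layer LSTMs whose hidden size equals the feature size F, omits dropout, and uses a linear layer to produce one existence score per attractor.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence


class TinyEDA(torch.nn.Module):
    """Illustrative encoder-decoder attractor; not the ESPnet class."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.encoder = torch.nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.decoder = torch.nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.exists = torch.nn.Linear(feat_dim, 1)

    def forward(self, enc_input, ilens, dec_input):
        # Summarize the valid frames of each sequence in the LSTM state.
        packed = pack_padded_sequence(
            enc_input, ilens.cpu(), batch_first=True, enforce_sorted=False
        )
        _, state = self.encoder(packed)
        # Decode one attractor per zero-vector input step.
        attractor, _ = self.decoder(dec_input, state)
        # One existence score per attractor.
        att_prob = self.exists(attractor)
        return attractor, att_prob


batch, frames, feat_dim, num_spk = 2, 30, 8, 3
eda = TinyEDA(feat_dim)
enc_input = torch.randn(batch, frames, feat_dim)
ilens = torch.tensor([30, 20])
dec_input = torch.zeros(batch, num_spk + 1, feat_dim)
attractor, att_prob = eda(enc_input, ilens, dec_input)
```

The decoder is fed num_spk + 1 zero vectors, which matches the documented dec_input shape; in the EEND-EDA papers the extra step lets the model learn when to stop emitting attractors.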

espnet2.diar.attractor.__init__

espnet2.diar.attractor.abs_attractor

class espnet2.diar.attractor.abs_attractor.AbsAttractor[source]

Bases: torch.nn.modules.module.Module, abc.ABC

Initializes internal Module state, shared by both nn.Module and ScriptModule.

abstract forward(enc_input: torch.Tensor, ilens: torch.Tensor, dec_input: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

espnet2.diar.decoder.abs_decoder

class espnet2.diar.decoder.abs_decoder.AbsDecoder[source]

Bases: torch.nn.modules.module.Module, abc.ABC

Initializes internal Module state, shared by both nn.Module and ScriptModule.

abstract forward(input: torch.Tensor, ilens: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

abstract property num_spk

espnet2.diar.decoder.__init__

espnet2.diar.decoder.linear_decoder

class espnet2.diar.decoder.linear_decoder.LinearDecoder(encoder_output_size: int, num_spk: int = 2)[source]

Bases: espnet2.diar.decoder.abs_decoder.AbsDecoder

Linear decoder for speaker diarization

forward(input: torch.Tensor, ilens: torch.Tensor)[source]

Forward.

Parameters
  • input (torch.Tensor) – hidden_space [Batch, T, F]

  • ilens (torch.Tensor) – input lengths [Batch]

property num_spk
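A decoder of this kind can be sketched in a few lines. TinyLinearDecoder is a hypothetical stand-in written for illustration, assuming a single linear projection from the encoder feature size to one logit per speaker; it does not inherit from AbsDecoder and may differ from the actual LinearDecoder.

```python
import torch


class TinyLinearDecoder(torch.nn.Module):
    """Illustrative stand-in; not the ESPnet class."""

    def __init__(self, encoder_output_size: int, num_spk: int = 2):
        super().__init__()
        self._num_spk = num_spk
        self.linear = torch.nn.Linear(encoder_output_size, num_spk)

    def forward(self, input: torch.Tensor, ilens: torch.Tensor) -> torch.Tensor:
        # ilens is accepted for interface parity but unused in this sketch.
        # [Batch, T, F] -> [Batch, T, num_spk]
        return self.linear(input)

    @property
    def num_spk(self) -> int:
        return self._num_spk


decoder = TinyLinearDecoder(encoder_output_size=16, num_spk=2)
hidden = torch.randn(4, 100, 16)
logits = decoder(hidden, torch.full((4,), 100))
```

Exposing num_spk as a read-only property mirrors the abstract property declared on AbsDecoder, so callers can query the number of output speakers without inspecting the layer.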