espnet2.enh.separator.dccrn_separator.DCCRNSeparator
class espnet2.enh.separator.dccrn_separator.DCCRNSeparator(input_dim: int, num_spk: int = 1, rnn_layer: int = 2, rnn_units: int = 256, masking_mode: str = 'E', use_clstm: bool = True, bidirectional: bool = False, use_cbn: bool = False, kernel_size: int = 5, kernel_num: List[int] = [32, 64, 128, 256, 256, 256], use_builtin_complex: bool = True, use_noise_mask: bool = False)
Bases: AbsSeparator
DCCRN (Deep Complex Convolution Recurrent Network) separator.
- Parameters:
- input_dim (int) – input feature dimension.
- num_spk (int, optional) – number of speakers. Defaults to 1.
- rnn_layer (int, optional) – number of LSTM layers in the CRN. Defaults to 2.
- rnn_units (int, optional) – number of hidden units in each RNN layer. Defaults to 256.
- masking_mode (str, optional) – how the estimated mask is applied to the noisy spectrum. Defaults to "E".
- use_clstm (bool, optional) – whether to use a complex LSTM. Defaults to True.
- bidirectional (bool, optional) – whether to use a bidirectional LSTM (BLSTM). Defaults to False.
- use_cbn (bool, optional) – whether to use complex batch normalization. Defaults to False.
- kernel_size (int, optional) – convolution kernel size. Defaults to 5.
- kernel_num (list, optional) – number of output channels of each encoder layer. Defaults to [32, 64, 128, 256, 256, 256].
- use_builtin_complex (bool, optional) – if True, use PyTorch's built-in complex tensors (torch.complex); otherwise use ComplexTensor. Defaults to True.
- use_noise_mask (bool, optional) – whether to also estimate a mask for the noise. Defaults to False.
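The kernel_num list fixes the channel progression of the encoder, one entry per layer. The sketch below lists the (input, output) channel pair of each encoder layer. It assumes, which the parameter list does not state, that the first layer receives a 2-channel input holding the real and imaginary parts of the spectrum; encoder_channel_pairs is a hypothetical helper, not part of ESPnet.

```python
from typing import List, Tuple


def encoder_channel_pairs(kernel_num: List[int]) -> List[Tuple[int, int]]:
    """Return (in_channels, out_channels) for each encoder layer.

    Assumption (illustrative only): the encoder input has 2 channels
    (real and imaginary parts), and layer i maps channels[i] -> channels[i + 1].
    """
    channels = [2] + list(kernel_num)  # prepend the assumed 2-channel input
    return list(zip(channels[:-1], channels[1:]))


default_kernel_num = [32, 64, 128, 256, 256, 256]
for layer, (c_in, c_out) in enumerate(encoder_channel_pairs(default_kernel_num)):
    print(f"encoder layer {layer}: {c_in} -> {c_out} channels")
```

Under this assumption, the default six-entry kernel_num yields six encoder layers, the first mapping 2 channels to 32.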
apply_masks(masks: List[Tensor | ComplexTensor], real: Tensor, imag: Tensor)
Apply the estimated masks to the noisy spectrum.
- Parameters:
- masks (List[torch.Tensor or ComplexTensor]) – estimated masks, [(B, T, F), …]
- real (torch.Tensor) – real part of the noisy spectrum, (B, F, T)
- imag (torch.Tensor) – imaginary part of the noisy spectrum, (B, F, T)
- Returns: masked, the masked spectra, [(B, T, F), …]
- Return type: List[Union[torch.Tensor, ComplexTensor]]
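For the default masking_mode "E", the DCCRN paper bounds the mask magnitude with tanh and adds the mask phase to the noisy phase. The NumPy sketch below applies one such mask and also handles the layout change from the (B, F, T) noisy spectrum to the (B, T, F) mask and output. Treating this as exactly what apply_masks computes is an assumption, and apply_mask_e is a hypothetical helper.

```python
import numpy as np


def apply_mask_e(mask_real, mask_imag, real, imag):
    """Apply one DCCRN-E style mask (illustrative sketch, not ESPnet code).

    mask_real, mask_imag: (B, T, F) mask components
    real, imag: (B, F, T) noisy spectrum components
    Returns the enhanced complex spectrum with shape (B, T, F).
    """
    # Bring the noisy spectrum into the mask layout (B, T, F).
    noisy = (real + 1j * imag).transpose(0, 2, 1)
    mask = mask_real + 1j * mask_imag
    # Magnitude: noisy magnitude scaled by tanh(|mask|), which lies in [0, 1).
    mag = np.abs(noisy) * np.tanh(np.abs(mask))
    # Phase: the mask phase is added to the noisy phase.
    phase = np.angle(noisy) + np.angle(mask)
    return mag * np.exp(1j * phase)


rng = np.random.default_rng(0)
B, T, F = 2, 4, 3
real, imag = rng.standard_normal((2, B, F, T))
mask_r, mask_i = rng.standard_normal((2, B, T, F))
est = apply_mask_e(mask_r, mask_i, real, imag)
print(est.shape)  # (2, 4, 3)
```

Because tanh keeps the scaling factor below 1, this formulation can only attenuate each time-frequency bin, never amplify it.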
create_masks(mask_tensor: Tensor)
Create the estimated mask for each speaker.
- Parameters: mask_tensor (torch.Tensor) – output of the decoder, shape (B, 2*num_spk, F-1, T)
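Since the decoder emits 2*num_spk channels over F-1 frequency bins, each speaker's mask has to be split out and restored to F bins. The sketch below shows one way to do this in NumPy. Two details are assumptions not stated above: channel 2k holds the real part and channel 2k+1 the imaginary part for speaker k, and the missing bin is restored by zero-padding the lowest (DC) bin. split_masks is a hypothetical helper.

```python
import numpy as np


def split_masks(mask_tensor: np.ndarray, num_spk: int):
    """Split a decoder output into per-speaker (real, imag) masks.

    mask_tensor: (B, 2 * num_spk, F - 1, T)
    Returns num_spk tuples (mask_real, mask_imag), each of shape (B, F, T).
    Assumed layout: channel 2k = real part, channel 2k + 1 = imaginary part.
    """
    assert mask_tensor.shape[1] == 2 * num_spk, mask_tensor.shape
    pad = ((0, 0), (1, 0), (0, 0))  # assumed: add one zero bin at the start of the F axis
    masks = []
    for k in range(num_spk):
        mask_real = np.pad(mask_tensor[:, 2 * k], pad)
        mask_imag = np.pad(mask_tensor[:, 2 * k + 1], pad)
        masks.append((mask_real, mask_imag))
    return masks


decoder_out = np.ones((2, 4, 256, 10))  # B=2, 2*num_spk=4, F-1=256, T=10
masks = split_masks(decoder_out, num_spk=2)
print(len(masks), masks[0][0].shape)  # 2 (2, 257, 10)
```

The resulting (B, F, T) masks would still need transposing to the (B, T, F) layout that apply_masks expects.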
flatten_parameters()
forward(input: Tensor | ComplexTensor, ilens: Tensor, additional: Dict | None = None) → Tuple[List[Tensor | ComplexTensor], Tensor, OrderedDict]
Forward.
- Parameters:
- input (torch.Tensor or ComplexTensor) – encoded feature, [B, T, F]
- ilens (torch.Tensor) – input lengths, [Batch]
- additional (Dict or None) – other data included in the model. NOTE: not used in this model.
- Returns:
- masked (List[Union[torch.Tensor, ComplexTensor]]) – [(B, T, F), …]
- ilens (torch.Tensor) – (B,)
- others (OrderedDict) – other predicted data, e.g. masks: OrderedDict['mask_spk1': torch.Tensor(Batch, Frames, Freq), 'mask_spk2': torch.Tensor(Batch, Frames, Freq), …, 'mask_spkn': torch.Tensor(Batch, Frames, Freq)]
- Return type: Tuple[List[Union[torch.Tensor, ComplexTensor]], torch.Tensor, OrderedDict]
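The third value returned by forward stores one mask per speaker under the keys mask_spk1 through mask_spkn. The sketch below builds such an OrderedDict from a list of (B, T, F) masks, using NumPy arrays as stand-ins for tensors; pack_mask_dict is a hypothetical helper, not the separator's own code.

```python
from collections import OrderedDict

import numpy as np


def pack_mask_dict(masks):
    """Collect per-speaker masks under the keys 'mask_spk1', ..., 'mask_spkN'."""
    others = OrderedDict()
    for i, mask in enumerate(masks, start=1):  # speaker indices start at 1
        others[f"mask_spk{i}"] = mask
    return others


B, T, F = 2, 10, 257
masks = [np.zeros((B, T, F)) for _ in range(3)]
others = pack_mask_dict(masks)
print(list(others))  # ['mask_spk1', 'mask_spk2', 'mask_spk3']
```

With an actual separator instance, the three return values would be unpacked as `masked, olens, others = separator(feature, ilens)`, following the forward signature.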
property num_spk