espnet2.enh.separator.dc_crn_separator.DC_CRNSeparator
espnet2.enh.separator.dc_crn_separator.DC_CRNSeparator
class espnet2.enh.separator.dc_crn_separator.DC_CRNSeparator(input_dim: int, num_spk: int = 2, predict_noise: bool = False, input_channels: List = [2, 16, 32, 64, 128, 256], enc_hid_channels: int = 8, enc_kernel_size: Tuple = (1, 3), enc_padding: Tuple = (0, 1), enc_last_kernel_size: Tuple = (1, 4), enc_last_stride: Tuple = (1, 2), enc_last_padding: Tuple = (0, 1), enc_layers: int = 5, skip_last_kernel_size: Tuple = (1, 3), skip_last_stride: Tuple = (1, 1), skip_last_padding: Tuple = (0, 1), glstm_groups: int = 2, glstm_layers: int = 2, glstm_bidirectional: bool = False, glstm_rearrange: bool = False, mode: str = 'masking', ref_channel: int = 0)
Bases: AbsSeparator
Densely-Connected Convolutional Recurrent Network (DC-CRN) Separator
Reference: : Deep Learning Based Real-Time Speech Enhancement for Dual-Microphone Mobile Phones; Tan et al., 2020 https://web.cse.ohio-state.edu/~wang.77/papers/TZW.taslp21.pdf
- Parameters:
- input_dim – input feature dimension
- num_spk – number of speakers
- predict_noise – whether to output the estimated noise signal
- input_channels (list) – number of input channels for the stacked DenselyConnectedBlock layers Its length should be (number of DenselyConnectedBlock layers).
- enc_hid_channels (int) – common number of intermediate channels for all DenselyConnectedBlock of the encoder
- enc_kernel_size (tuple) – common kernel size for all DenselyConnectedBlock of the encoder
- enc_padding (tuple) – common padding for all DenselyConnectedBlock of the encoder
- enc_last_kernel_size (tuple) – common kernel size for the last Conv layer in all DenselyConnectedBlock of the encoder
- enc_last_stride (tuple) – common stride for the last Conv layer in all DenselyConnectedBlock of the encoder
- enc_last_padding (tuple) – common padding for the last Conv layer in all DenselyConnectedBlock of the encoder
- enc_layers (int) – common total number of Conv layers for all DenselyConnectedBlock layers of the encoder
- skip_last_kernel_size (tuple) – common kernel size for the last Conv layer in all DenselyConnectedBlock of the skip pathways
- skip_last_stride (tuple) – common stride for the last Conv layer in all DenselyConnectedBlock of the skip pathways
- skip_last_padding (tuple) – common padding for the last Conv layer in all DenselyConnectedBlock of the skip pathways
- glstm_groups (int) – number of groups in each Grouped LSTM layer
- glstm_layers (int) – number of Grouped LSTM layers
- glstm_bidirectional (bool) – whether to use BLSTM or unidirectional LSTM in Grouped LSTM layers
- glstm_rearrange (bool) – whether to apply the rearrange operation after each grouped LSTM layer
- output_channels (int) – number of output channels (even number)
- mode (str) – one of (“mapping”, “masking”) “mapping”: complex spectral mapping “masking”: complex masking
- ref_channel (int) – index of the reference microphone
forward(input: Tensor | ComplexTensor, ilens: Tensor, additional: Dict | None = None) → Tuple[List[Tensor | ComplexTensor], Tensor, OrderedDict]
DC-CRN Separator Forward.
Parameters:
- input (torch.Tensor or ComplexTensor) – Encoded feature [Batch, T, F] or [Batch, T, C, F]
- ilens (torch.Tensor) – input lengths [Batch,]
Returns: [(Batch, T, F), …] ilens (torch.Tensor): (B,) others predicted data, e.g. masks: OrderedDict[
’mask_spk1’: torch.Tensor(Batch, Frames, Freq), ‘mask_spk2’: torch.Tensor(Batch, Frames, Freq), … ‘mask_spkn’: torch.Tensor(Batch, Frames, Freq),
]
Return type: masked (List[Union(torch.Tensor, ComplexTensor)])
property num_spk