espnet2.enh.separator.dpcl_separator.DPCLSeparator
class espnet2.enh.separator.dpcl_separator.DPCLSeparator(input_dim: int, rnn_type: str = 'blstm', num_spk: int = 2, nonlinear: str = 'tanh', layer: int = 2, unit: int = 512, emb_D: int = 40, dropout: float = 0.0)
Bases: AbsSeparator
Deep Clustering Separator.
References
[1] Deep clustering: Discriminative embeddings for segmentation and separation; John R. Hershey et al., 2016; https://ieeexplore.ieee.org/document/7471631
[2] Manifold-Aware Deep Clustering: Maximizing Angles Between Embedding Vectors Based on Regular Simplex; Tanaka, K. et al., 2021; https://www.isca-speech.org/archive/interspeech_2021/tanaka21_interspeech.html
- Parameters:
- input_dim – int, input feature dimension
- rnn_type – string, RNN type, select from ‘blstm’, ‘lstm’, etc.
- num_spk – int, number of speakers
- nonlinear – the nonlinear function for mask estimation, select from ‘relu’, ‘tanh’, ‘sigmoid’
- layer – int, number of stacked RNN layers. Default is 2.
- unit – int, dimension of the hidden state.
- emb_D – int, dimension of the embedding vector for a T-F bin.
- dropout – float, dropout ratio. Default is 0.
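A minimal instantiation sketch (assuming an installed ESPnet; the values below, such as an input_dim of 129 for a 256-point STFT magnitude encoder, are illustrative and not taken from this documentation):

```python
from espnet2.enh.separator.dpcl_separator import DPCLSeparator

# Illustrative configuration: 129 frequency bins, 2 speakers,
# 40-dimensional T-F embeddings.
separator = DPCLSeparator(
    input_dim=129,
    rnn_type="blstm",
    num_spk=2,
    nonlinear="tanh",
    layer=2,
    unit=512,
    emb_D=40,
    dropout=0.0,
)
```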
forward(input: Tensor | ComplexTensor, ilens: Tensor, additional: Dict | None = None) → Tuple[List[Tensor | ComplexTensor], Tensor, OrderedDict]
Forward.
Parameters:
- input (torch.Tensor or ComplexTensor) – Encoded feature [B, T, F]
- ilens (torch.Tensor) – input lengths [Batch]
- additional (Dict or None) – other data included in the model. NOTE: not used in this model.
Returns:
- masked (List[Union[torch.Tensor, ComplexTensor]]) – separated features [(B, T, N), …]
- ilens (torch.Tensor) – (B,)
- others (OrderedDict) – other predicted data, e.g. ‘tf_embedding’: learned embedding of all T-F bins (B, T * F, D)
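A sketch of a forward pass on random features, reusing the separator instantiated above; the tensor shapes follow the docstring, and the eval-mode comment is an assumption about inference-time behaviour rather than something stated here:

```python
import torch

separator.eval()  # assumption: separated outputs are produced at inference time

B, T, F = 4, 100, 129
feature = torch.rand(B, T, F)                  # encoded feature (B, T, F)
ilens = torch.full((B,), T, dtype=torch.long)  # input lengths (B,)

with torch.no_grad():
    masked, olens, others = separator(feature, ilens)

# masked: list of num_spk separated features, each (B, T, F)
# others["tf_embedding"]: learned embedding of all T-F bins, (B, T * F, emb_D)
```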
property num_spk