espnet2.enh.separator.tcn_separator.TCNSeparator
class espnet2.enh.separator.tcn_separator.TCNSeparator(input_dim: int, num_spk: int = 2, predict_noise: bool = False, layer: int = 8, stack: int = 3, bottleneck_dim: int = 128, hidden_dim: int = 512, kernel: int = 3, causal: bool = False, norm_type: str = 'gLN', nonlinear: str = 'relu', pre_mask_nonlinear: str = 'prelu', masking: bool = True)
Bases: AbsSeparator
Temporal Convolution Separator
- Parameters:
- input_dim – input feature dimension
- num_spk – number of speakers
- predict_noise – whether to output the estimated noise signal
- layer – number of layers in each stack
- stack – number of stacks
- bottleneck_dim – bottleneck dimension
- hidden_dim – number of convolution channels
- kernel – kernel size
- causal – whether to use causal convolutions; default False
- norm_type – normalization type, chosen from 'BN', 'gLN', 'cLN'
- nonlinear – the nonlinear function for mask estimation, chosen from 'relu', 'tanh', 'sigmoid', 'linear'
- pre_mask_nonlinear – the nonlinear function applied before the mask estimation network
- masking – whether to use the masking-based method (True) or the mapping-based method (False)
forward(input: Tensor | ComplexTensor, ilens: Tensor, additional: Dict | None = None) → Tuple[List[Tensor | ComplexTensor], Tensor, OrderedDict]
Forward.
Parameters:
- input (torch.Tensor or ComplexTensor) – Encoded feature [B, T, N]
- ilens (torch.Tensor) – input lengths [Batch]
- additional (Dict or None) – other data included in the model. NOTE: not used in this model
Returns:
- masked (List[Union[torch.Tensor, ComplexTensor]]): separated features [(B, T, N), …]
- ilens (torch.Tensor): (B,)
- others (OrderedDict): predicted data, e.g. masks: OrderedDict[
'mask_spk1': torch.Tensor(Batch, Frames, Freq),
'mask_spk2': torch.Tensor(Batch, Frames, Freq),
…
'mask_spkn': torch.Tensor(Batch, Frames, Freq),
]
forward_streaming(input_frame: Tensor, buffer=None)
property num_spk