espnet2.diar.layers.multi_mask.MultiMask
class espnet2.diar.layers.multi_mask.MultiMask(input_dim: int, bottleneck_dim: int = 128, max_num_spk: int = 3, mask_nonlinear='relu')
Bases: AbsMask
Multiple 1x1 convolution layer Module.
This module corresponds to the final 1x1 conv block and non-linear function in TCNSeparator. It holds multiple 1x1 conv blocks and selects one of them according to the given num_spk, so that a variable number of speakers can be handled.
Parameters:
- input_dim – Number of filters in the autoencoder
- bottleneck_dim – Number of channels in the bottleneck 1x1 conv block
- max_num_spk – Number of mask_conv1x1 modules (must be >= the maximum number of speakers in the dataset)
- mask_nonlinear – Non-linear function used to generate the mask
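The head selection described above can be illustrated with a minimal, dependency-free sketch. It is not the ESPnet implementation: `ToyMultiMask`, its `masks` method, and the `seed` argument are hypothetical names, plain nested lists stand in for tensors, there is no batch axis, and the layout in which the head for z speakers emits z * input_dim values per frame is an assumption. Only the ReLU non-linearity is shown.

```python
import random


def relu(x):
    """Rectified linear unit on a single float."""
    return x if x > 0.0 else 0.0


class ToyMultiMask:
    """Hypothetical illustration only; not the ESPnet class."""

    def __init__(self, input_dim, bottleneck_dim=128, max_num_spk=3, seed=0):
        rng = random.Random(seed)
        self.input_dim = input_dim
        self.max_num_spk = max_num_spk
        # One head per possible speaker count z = 1..max_num_spk.
        # A 1x1 convolution acts on each frame independently, so each head
        # is modelled as a (z * input_dim) x bottleneck_dim weight matrix
        # (assumed layout).
        self.heads = [
            [
                [rng.uniform(-1.0, 1.0) for _ in range(bottleneck_dim)]
                for _ in range(z * input_dim)
            ]
            for z in range(1, max_num_spk + 1)
        ]

    def masks(self, bottleneck_feat, num_spk):
        """Return num_spk masks, each a list of K frames with input_dim values."""
        if not 1 <= num_spk <= self.max_num_spk:
            raise ValueError(f"num_spk must be in [1, {self.max_num_spk}]")
        weight = self.heads[num_spk - 1]  # pick the head built for num_spk
        n = self.input_dim
        out = [[] for _ in range(num_spk)]
        for feat in bottleneck_feat:  # feat: one frame of B bottleneck values
            score = [relu(sum(w * f for w, f in zip(row, feat))) for row in weight]
            for spk in range(num_spk):
                # Slice the per-frame score into one chunk per speaker.
                out[spk].append(score[spk * n:(spk + 1) * n])
        return out


K, N, B = 4, 5, 3  # frames, filters, bottleneck channels
rng = random.Random(1)
feats = [[rng.random() for _ in range(B)] for _ in range(K)]
model = ToyMultiMask(input_dim=N, bottleneck_dim=B, max_num_spk=3)
two_spk = model.masks(feats, num_spk=2)
three_spk = model.masks(feats, num_spk=3)
```

The same `model` serves both the two-speaker and three-speaker calls, which is why max_num_spk has to be at least the largest speaker count that will be requested.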
forward(input: Tensor | ComplexTensor, ilens: Tensor, bottleneck_feat: Tensor, num_spk: int) → Tuple[List[Tensor | ComplexTensor], Tensor, OrderedDict]
Keeps the same API as TasNet.
Parameters:
- input – [M, K, N], M is batch size
- ilens (torch.Tensor) – (M,)
- bottleneck_feat – [M, K, B]
- num_spk – number of speakers (training: oracle; inference: estimated by another module, e.g., EEND-EDA)
Returns:
- masked (List[Union[torch.Tensor, ComplexTensor]]) – [(M, K, N), …]
- ilens (torch.Tensor) – (M,)
- others – predicted data, e.g. masks: OrderedDict['mask_spk1': torch.Tensor(Batch, Frames, Freq), 'mask_spk2': torch.Tensor(Batch, Frames, Freq), …, 'mask_spkn': torch.Tensor(Batch, Frames, Freq)]
Return type: Tuple[List[Tensor | ComplexTensor], Tensor, OrderedDict]
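To make the shape of the return value concrete, here is a small sketch of how the three outputs relate. `apply_masks` is a hypothetical helper, not part of ESPnet. It assumes that each masked output is the element-wise product of the input with one speaker's mask, and it uses nested lists for a single utterance in place of batched tensors.

```python
from collections import OrderedDict


def apply_masks(frames, ilens, masks):
    """Hypothetical helper mirroring the documented return layout.

    frames: K x N nested list for one utterance (no batch axis here).
    masks:  one K x N nested list per speaker.
    """
    # One masked copy of the input per speaker (element-wise product).
    masked = [
        [
            [x * m for x, m in zip(frame, mask_frame)]
            for frame, mask_frame in zip(frames, mask)
        ]
        for mask in masks
    ]
    # Masks are exposed under the keys 'mask_spk1', 'mask_spk2', ...
    others = OrderedDict(
        (f"mask_spk{i + 1}", mask) for i, mask in enumerate(masks)
    )
    return masked, ilens, others


frames = [[1.0, 2.0], [3.0, 4.0]]  # K=2 frames, N=2 features
masks = [
    [[1.0, 0.0], [0.5, 0.5]],  # speaker 1
    [[0.0, 1.0], [0.5, 0.5]],  # speaker 2
]
masked, ilens, others = apply_masks(frames, [2], masks)
```

The length of `masked` and the number of keys in `others` both equal the number of speakers, and `ilens` is passed through unchanged.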
property max_num_spk : int