espnet2.enh.separator.fasnet_separator.FaSNetSeparator
class espnet2.enh.separator.fasnet_separator.FaSNetSeparator(input_dim: int, enc_dim: int, feature_dim: int, hidden_dim: int, layer: int, segment_size: int, num_spk: int, win_len: int, context_len: int, fasnet_type: str, dropout: float = 0.0, sr: int = 16000, predict_noise: bool = False)
Bases: AbsSeparator
Filter-and-sum Network (FaSNet) Separator
- Parameters:
- input_dim – required by AbsSeparator. Not used in this model.
- enc_dim – encoder dimension
- feature_dim – feature dimension
- hidden_dim – hidden dimension in DPRNN
- layer – number of DPRNN blocks in iFaSNet
- segment_size – dual-path segment size
- num_spk – number of speakers
- win_len – window length in milliseconds
- context_len – context length in milliseconds
- fasnet_type – 'fasnet' or 'ifasnet'; selects between the original FaSNet and the implicit FaSNet (iFaSNet). See the instantiation sketch after this list.
- dropout – dropout rate. Default is 0.
- sr – sample rate of the input audio
- predict_noise – whether to output the estimated noise signal
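A minimal instantiation sketch, assuming ESPnet is installed. The hyperparameter values below are illustrative assumptions, not recommended settings:

```python
from espnet2.enh.separator.fasnet_separator import FaSNetSeparator

# Illustrative configuration for a 2-speaker, 16 kHz model.
separator = FaSNetSeparator(
    input_dim=0,           # required by AbsSeparator but unused in this model
    enc_dim=64,            # encoder dimension
    feature_dim=64,        # feature dimension
    hidden_dim=128,        # hidden dimension in DPRNN
    layer=6,               # number of DPRNN blocks
    segment_size=50,       # dual-path segment size
    num_spk=2,             # number of speakers
    win_len=4,             # window length in milliseconds
    context_len=16,        # context length in milliseconds
    fasnet_type="fasnet",  # original FaSNet ("ifasnet" for implicit FaSNet)
    dropout=0.0,
    sr=16000,
    predict_noise=False,
)
```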
forward(input: Tensor, ilens: Tensor, additional: Dict | None = None) → Tuple[List[Tensor], Tensor, OrderedDict]
Forward.
Parameters:
- input (torch.Tensor) – (Batch, samples, channels)
- ilens (torch.Tensor) – input lengths [Batch]
- additional (Dict or None) – other data included in the model. Note: not used in this model.
Returns:
- separated (List[Union[torch.Tensor, ComplexTensor]]): [(B, T, N), …]
- ilens (torch.Tensor): (B,)
- others (OrderedDict): predicted data, e.g. masks:
  'mask_spk1': torch.Tensor(Batch, Frames, Freq),
  'mask_spk2': torch.Tensor(Batch, Frames, Freq),
  …
  'mask_spkn': torch.Tensor(Batch, Frames, Freq),
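A minimal sketch of calling forward on a randomly generated multi-channel mixture; the batch size, signal length, and channel count below are arbitrary assumptions for illustration, and `separator` is the instance from the sketch above:

```python
import torch

# Batch of 2 utterances, 1 second of 16 kHz audio from 4 microphones:
# input shape is (Batch, samples, channels) per the documentation above.
mixture = torch.randn(2, 16000, 4)
ilens = torch.tensor([16000, 16000])

separated, olens, others = separator(mixture, ilens)
# separated: list of num_spk tensors, one separated source per speaker
# olens: output lengths, shape (B,)
# others: OrderedDict of predicted data (e.g. per-speaker masks)
print(len(separated), separated[0].shape, list(others.keys()))
```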
property num_spk
Number of output speakers.