espnet2.enh.layers.dnn_beamformer.DNN_Beamformer
class espnet2.enh.layers.dnn_beamformer.DNN_Beamformer(bidim, btype: str = 'blstmp', blayers: int = 3, bunits: int = 300, bprojs: int = 320, num_spk: int = 1, use_noise_mask: bool = True, nonlinear: str = 'sigmoid', dropout_rate: float = 0.0, badim: int = 320, ref_channel: int = -1, beamformer_type: str = 'mvdr_souden', rtf_iterations: int = 2, mwf_mu: float = 1.0, eps: float = 1e-06, diagonal_loading: bool = True, diag_eps: float = 1e-07, mask_flooring: bool = False, flooring_thres: float = 1e-06, use_torch_solver: bool = True, use_torchaudio_api: bool = False, btaps: int = 5, bdelay: int = 3)
Bases: Module
DNN mask-based beamformer.
Citation: Multichannel End-to-end Speech Recognition; T. Ochiai et al., 2017; http://proceedings.mlr.press/v70/ochiai17a/ochiai17a.pdf
Initializes internal Module state, shared by both nn.Module and ScriptModule.
apply_beamforming(data, ilens, psd_n, psd_speech, psd_distortion=None, rtf_mat=None, spk=0)
Beamforming with the provided statistics.
- Parameters:
- data (torch.complex64/ComplexTensor) – (B, F, C, T)
- ilens (torch.Tensor) – (B,)
- psd_n (torch.complex64/ComplexTensor) – Noise covariance matrix for MVDR (B, F, C, C); observation covariance matrix for MPDR/wMPDR (B, F, C, C); stacked observation covariance matrix for WPD (B, F, (btaps+1)*C, (btaps+1)*C)
- psd_speech (torch.complex64/ComplexTensor) – Speech covariance matrix (B, F, C, C)
- psd_distortion (torch.complex64/ComplexTensor) – Distortion (noise) covariance matrix (B, F, C, C)
- rtf_mat (torch.complex64/ComplexTensor) – RTF matrix (B, F, C, num_spk)
- spk (int) – speaker index
- Returns:
  enhanced (torch.complex64/ComplexTensor): (B, F, T)
  ws (torch.complex64/ComplexTensor): (B, F) or (B, F, (btaps+1)*C)
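The core of apply_beamforming with the default beamformer_type='mvdr_souden' is a closed-form weight computation from the speech and noise covariance statistics. The following NumPy sketch illustrates it for a single frequency bin; it is a toy illustration under stated assumptions (a rank-1 speech model, sample covariances), not the ESPnet implementation, and the variable names mirror the arguments above:

```python
import numpy as np

rng = np.random.default_rng(0)
C, T = 4, 200  # channels, frames (one frequency bin for brevity)

# Rank-1 speech image (steering vector times source) plus diffuse noise
steer = rng.standard_normal(C) + 1j * rng.standard_normal(C)
source = rng.standard_normal(T) + 1j * rng.standard_normal(T)
speech = np.outer(steer, source)                      # (C, T)
noise = 0.3 * (rng.standard_normal((C, T)) + 1j * rng.standard_normal((C, T)))
data = speech + noise                                 # observed mixture (C, T)

# Spatial covariance statistics, analogous to psd_speech / psd_n above
psd_speech = speech @ speech.conj().T / T             # (C, C)
psd_n = noise @ noise.conj().T / T                    # (C, C)

# Diagonal loading for numerical stability (cf. diagonal_loading / diag_eps)
psd_n = psd_n + 1e-7 * (np.trace(psd_n).real / C) * np.eye(C)

# Souden MVDR weights: w = (psd_n^{-1} psd_speech / tr(psd_n^{-1} psd_speech)) u
numerator = np.linalg.solve(psd_n, psd_speech)        # psd_n^{-1} psd_speech
u = np.zeros(C)
u[0] = 1.0                                            # one-hot reference (cf. ref_channel)
ws = (numerator / np.trace(numerator)) @ u            # beamforming weights (C,)

# Apply the weights: enhanced[t] = w^H data[:, t]
enhanced = ws.conj() @ data                           # (T,)
```

Because the weights are distortionless toward the reference channel, the beamformed output keeps the channel-0 speech image while suppressing noise, which is why the residual error drops relative to the raw reference channel.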
forward(data: Tensor | ComplexTensor, ilens: LongTensor, powers: List[Tensor] | None = None, oracle_masks: List[Tensor] | None = None) → Tuple[Tensor | ComplexTensor, LongTensor, Tensor]
DNN_Beamformer forward function.
Notation: B: Batch; C: Channel; T: Time or Sequence length; F: Freq
- Parameters:
- data (torch.complex64/ComplexTensor) – (B, T, C, F)
- ilens (torch.Tensor) – (B,)
- powers (List[torch.Tensor] or None) – used for wMPDR or WPD (B, F, T)
- oracle_masks (List[torch.Tensor] or None) – oracle masks (B, F, C, T); if not None, oracle_masks will be used instead of self.mask
- Returns:
  enhanced (torch.complex64/ComplexTensor): (B, T, F)
  ilens (torch.Tensor): (B,)
  masks (torch.Tensor): (B, T, C, F)
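Note that forward takes and returns data in (B, T, C, F) layout, while the beamforming statistics are computed in (B, F, C, T). The NumPy sketch below traces those shape conversions and the mask-weighted covariance step; it is a schematic stand-in (a sigmoid of the magnitude plays the role of the mask network, and a trivial reference-channel weight plays the role of the solved beamformer), not the library code:

```python
import numpy as np

rng = np.random.default_rng(1)
B, T, C, F = 2, 50, 4, 8

# Toy complex STFT input in the forward() layout (B, T, C, F)
data = rng.standard_normal((B, T, C, F)) + 1j * rng.standard_normal((B, T, C, F))
ilens = np.full((B,), T, dtype=np.int64)

# Internally the statistics are computed in (B, F, C, T)
x = data.transpose(0, 3, 2, 1)

# Stand-in for the mask network: any values in [0, 1]; the real masks come
# from the RNN mask estimator (or from oracle_masks when provided)
mask = 1.0 / (1.0 + np.exp(-(np.abs(x) - 1.0)))       # (B, F, C, T)
m = mask.mean(axis=2)                                  # channel-averaged (B, F, T)

# Mask-weighted spatial covariance, the psd_speech fed to apply_beamforming
psd_speech = np.einsum('bft,bfct,bfdt->bfcd', m, x, x.conj()) / T  # (B, F, C, C)

# A trivial per-bin weight (reference channel only) just to show the shapes
ws = np.zeros((B, F, C), dtype=complex)
ws[..., 0] = 1.0
enhanced = np.einsum('bfc,bfct->bft', ws.conj(), x).transpose(0, 2, 1)  # (B, T, F)
```

The final transpose restores the documented (B, T, F) return layout, and the mask-weighted covariance is Hermitian by construction since the channel-averaged mask is real.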
predict_mask(data: Tensor | ComplexTensor, ilens: LongTensor) → Tuple[Tuple[Tensor, ...], LongTensor]
Predict masks for beamforming.
- Parameters:
- data (torch.complex64/ComplexTensor) – (B, T, C, F), double precision
- ilens (torch.Tensor) – (B,)
- Returns:
  masks (torch.Tensor): (B, T, C, F)
  ilens (torch.Tensor): (B,)
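The number of masks returned by predict_mask is governed by num_spk and use_noise_mask (one mask per speaker, plus an optional noise mask), each squashed into [0, 1] by the configured nonlinearity (sigmoid by default). A toy NumPy sketch of that output contract, using a random linear map on the magnitude as a hypothetical stand-in for the RNN mask estimator:

```python
import numpy as np

rng = np.random.default_rng(2)
B, T, C, F = 2, 30, 4, 8
num_spk, use_noise_mask = 2, True

# Toy complex input in the predict_mask() layout (B, T, C, F)
data = rng.standard_normal((B, T, C, F)) + 1j * rng.standard_normal((B, T, C, F))
ilens = np.full((B,), T, dtype=np.int64)

# One mask per speaker, plus a noise mask when use_noise_mask is True
n_masks = num_spk + int(use_noise_mask)

# Stand-in for the RNN mask estimator: magnitude features through a random
# linear layer per mask, squashed by the 'sigmoid' nonlinearity
feats = np.abs(data)                                   # (B, T, C, F)
weights = 0.1 * rng.standard_normal((n_masks, F, F))
masks = tuple(1.0 / (1.0 + np.exp(-(feats @ w))) for w in weights)
```

Each returned mask keeps the input's (B, T, C, F) shape, matching the tuple-of-tensors return type above.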