espnet2.enh.layers.tcndenseunet.TCNDenseUNet

class espnet2.enh.layers.tcndenseunet.TCNDenseUNet(n_spk=1, in_freqs=257, mic_channels=1, hid_chans=32, hid_chans_dense=32, ksz_dense=(3, 3), ksz_tcn=3, tcn_repeats=4, tcn_blocks=7, tcn_channels=384, activation=<class 'torch.nn.modules.activation.ELU'>)

Bases: Module

TCNDenseUNet block from iNeuBe.

Reference: Lu, Y. J., Cornell, S., Chang, X., Zhang, W., Li, C., Ni, Z., … & Watanabe, S. Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPnet-SE Submission to the L3DAS22 Challenge. ICASSP 2022, pp. 9201-9205.

Parameters:
- n_spk – number of output sources/speakers.
- in_freqs – number of complex STFT frequencies.
- mic_channels – number of microphone channels (only fixed-array geometry is supported).
- hid_chans – number of channels in the subsampling/upsampling conv layers.
- hid_chans_dense – number of channels in the densenet layers (reduce this to reduce VRAM requirements).
- ksz_dense – kernel size in the densenet layers throughout iNeuBe.
- ksz_tcn – kernel size in the TCN submodule.
- tcn_repeats – number of repetitions of blocks in the TCN submodule.
- tcn_blocks – number of blocks in the TCN submodule.
- tcn_channels – number of channels in the TCN submodule.
- activation – activation function used throughout the iNeuBe model. Any torch-supported activation can be used, e.g. 'relu' or 'elu'; the default shown in the signature is the torch.nn.ELU class.
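
As a small illustration of how in_freqs relates to the STFT front end, the sketch below assumes a one-sided STFT, for which the number of frequency bins is n_fft // 2 + 1. Under that assumption the default in_freqs=257 corresponds to an FFT size of 512. The helper name onesided_freq_bins and the choice of n_fft are hypothetical and not part of ESPnet; the remaining keyword values are copied from the class signature above.

```python
def onesided_freq_bins(n_fft: int) -> int:
    """Number of frequency bins of a one-sided STFT with FFT size n_fft."""
    return n_fft // 2 + 1


# Assumed FFT size of the STFT front end (hypothetical value for this sketch).
n_fft = 512
in_freqs = onesided_freq_bins(n_fft)

# Keyword arguments mirroring the defaults in the class signature,
# with in_freqs derived from the assumed FFT size.
model_kwargs = dict(
    n_spk=1,
    in_freqs=in_freqs,
    mic_channels=1,
    hid_chans=32,
    hid_chans_dense=32,
    ksz_dense=(3, 3),
    ksz_tcn=3,
    tcn_repeats=4,
    tcn_blocks=7,
    tcn_channels=384,
)
print(model_kwargs["in_freqs"])
```

If the front end uses a different FFT size, in_freqs would need to change accordingly so that it matches the last dimension of the tensor passed to forward.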

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(tf_rep)

Forward pass: maps the multi-channel complex STFT of the mixture to the complex STFT of the targets.

Parameters:
- tf_rep (torch.Tensor) – 4D complex tensor holding the multi-channel STFT of the mixture, of shape [B, T, C, F] (batch, frames, microphones, frequencies).

Returns: out, a 3D complex tensor holding the monaural STFT of the targets, of shape [B, T, F] (batch, frames, frequencies).
Return type: torch.Tensor
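
The [B, T, C, F] input layout can be produced from raw multi-channel waveforms in several ways. The following is a minimal sketch using torch.stft, not the ESPnet front end itself: the helper multichannel_stft, the Hann window, and the FFT and hop sizes are assumptions made for illustration only.

```python
import torch


def multichannel_stft(wave: torch.Tensor, n_fft: int = 512, hop: int = 128) -> torch.Tensor:
    """Convert real waveforms [B, C, L] into a complex STFT of shape [B, T, C, F]."""
    batch, mics, length = wave.shape
    window = torch.hann_window(n_fft)
    # torch.stft accepts [N, L] input, so fold the microphones into the batch axis.
    spec = torch.stft(
        wave.reshape(batch * mics, length),
        n_fft=n_fft,
        hop_length=hop,
        window=window,
        return_complex=True,
    )  # [B * C, F, T]
    n_freqs, n_frames = spec.shape[-2:]
    spec = spec.reshape(batch, mics, n_freqs, n_frames)  # [B, C, F, T]
    return spec.permute(0, 3, 1, 2)  # [B, T, C, F]


# Two utterances, four microphones, 16000 samples each (random placeholder data).
wave = torch.randn(2, 4, 16000)
tf_rep = multichannel_stft(wave)
print(tf_rep.shape)
```

A tensor built this way has F = n_fft // 2 + 1 frequencies and C microphones, so it would pair with a model constructed with in_freqs=257 and mic_channels=4.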