espnet2.enh.layers.uses2_comp.ATFBlock
class espnet2.enh.layers.uses2_comp.ATFBlock(input_size, input_resolution=(130, 64), window_size=(10, 8), mlp_ratio=4, qkv_bias=True, qk_scale=None, dropout=0.0, att_dropout=0.0, drop_path=0.0, use_checkpoint=False, rnn_type='lstm', hidden_size=128, att_heads=4, activation='relu', bidirectional=True, norm_type='cLN', ch_mode='att', ch_att_dim=256, eps=1e-05, with_channel_modeling=True)
Bases: Module
Container module for a single Attentive Time-Frequency Block.
- Parameters:
- input_size (int) – dimension of the input feature.
- input_resolution (tuple) – frequency and time dimension of the input feature. Only used for efficient training. Should be close to the actual spectrum size (F, T) of training samples.
- window_size (tuple) – size of the Time-Frequency window in Swin-Transformer.
- mlp_ratio (int) – ratio of the MLP hidden size to embedding size in BasicLayer.
- qkv_bias (bool) – If True, add a learnable bias to query, key, value in BasicLayer.
- qk_scale (float) – Override default qk scale of head_dim ** -0.5 in BasicLayer if set.
- dropout (float) – dropout ratio. Default is 0.
- att_dropout (float) – attention dropout ratio in BasicLayer. Default is 0.
- drop_path (float) – drop-path ratio in BasicLayer. Default is 0.
- use_checkpoint (bool) – whether to use checkpointing to save memory.
- rnn_type (str) – type of the RNN cell in the improved Transformer layer.
- hidden_size (int) – hidden dimension of the RNN cell.
- att_heads (int) – number of attention heads in Transformer.
- activation (str) – non-linear activation function applied in each block.
- bidirectional (bool) – whether the RNN layers are bidirectional.
- norm_type (str) – normalization type in the improved Transformer layer.
- ch_mode (str) – mode of channel modeling. Select from “att”, “tac”, and “att_tac”.
- ch_att_dim (int) – dimension of the channel attention.
- eps (float) – epsilon for layer normalization.
- with_channel_modeling (bool) – whether to use channel attention.
forward(input, ref_channel=None, mem_size=20)
Forward pass of the ATFBlock.
- Parameters:
- input (torch.Tensor) – feature sequence (batch, C, N, freq, time)
- ref_channel (None or int) – index of the reference channel.
- mem_size (int) – length of the memory tokens
- Returns: output sequence (batch, C, N, freq, time)
- Return type: torch.Tensor
freq_path_process(x)
Process the feature sequence along the frequency path.
pad_to_window_multiples(input, window_size)
Pad the input feature to multiples of the window size.
- Parameters:
- input (torch.Tensor) – input feature (…, freq, time)
- window_size (tuple) – size of the window (H, W).
- Returns: padded input feature (…, n * H, m * W)
- Return type: torch.Tensor
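The padding rule can be sketched in plain Python: each of the last two dimensions is rounded up to the nearest multiple of the corresponding window dimension. The helper name `padded_shape` is illustrative and not part of espnet2:

```python
import math

def padded_shape(freq, time, window_size):
    """Compute the (freq, time) sizes after pad_to_window_multiples.

    Each dimension is rounded up to the nearest multiple of the
    corresponding window dimension (H for freq, W for time).
    """
    H, W = window_size
    n = math.ceil(freq / H)   # number of windows along frequency
    m = math.ceil(time / W)   # number of windows along time
    return n * H, m * W

print(padded_shape(130, 64, (10, 8)))  # (130, 64): already multiples, no padding
print(padded_shape(133, 70, (10, 8)))  # (140, 72): padded up to the next multiples
```

This rounding is what makes the feature map divisible into non-overlapping Time-Frequency windows for the Swin-Transformer layers.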
time_freq_process(x)
Process the feature sequence jointly along the time and frequency paths.
time_path_process(x)
Process the feature sequence along the time path.