espnet2.spk.pooling.chn_attn_stat_pooling.ChnAttnStatPooling
class espnet2.spk.pooling.chn_attn_stat_pooling.ChnAttnStatPooling(input_size: int = 1536)
Bases: AbsPooling
Aggregates frame-level features into a single utterance-level feature vector using channel-attentive statistics pooling.
Reference: ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification https://arxiv.org/pdf/2005.07143
- Parameters: input_size – Dimension of the input frame-level embeddings. The output dimension is 2 × input_size, because the pooled mean and standard deviation are concatenated.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
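For orientation, the statistics that this kind of pooling concatenates can be written as below, following the formulation in the referenced ECAPA-TDNN paper. The symbols are notation chosen here for illustration, not identifiers from the ESPnet code: h_{t,c} is the value of channel c at frame t, e_{t,c} is the unnormalized attention score for that entry, and T is the number of valid frames.

```latex
\alpha_{t,c} = \frac{\exp(e_{t,c})}{\sum_{\tau=1}^{T} \exp(e_{\tau,c})}, \qquad
\tilde{\mu}_c = \sum_{t=1}^{T} \alpha_{t,c}\, h_{t,c}, \qquad
\tilde{\sigma}_c = \sqrt{\sum_{t=1}^{T} \alpha_{t,c}\, h_{t,c}^{2} - \tilde{\mu}_c^{2}}
```

With C input channels, stacking the C weighted means and the C weighted standard deviations gives the 2 × input_size output described above.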
forward(x: Tensor, feat_lengths: Tensor | None = None) → Tensor
Forward pass of channel-attentive statistical pooling.
- Parameters:
- x – Input feature tensor of shape (batch_size, feature_dim, seq_len)
- feat_lengths – Optional tensor of shape (batch_size,) containing the valid length of each sequence before padding
- Returns: Utterance-level embeddings of shape (batch_size, 2 × feature_dim)
- Return type: Tensor
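The shape contract above can be illustrated with a minimal, dependency-free sketch of attention-weighted statistics for a single utterance. This is not the ESPnet implementation: the function name `attentive_stats`, its `scores` argument, and the `floor` clamp are hypothetical, and the attention scores are simply passed in, whereas the real module computes them with a learned attention network.

```python
import math


def attentive_stats(frames, scores, valid_len=None, floor=1e-8):
    """Pool one utterance (hypothetical helper, for illustration only).

    frames[c][t] and scores[c][t] hold the feature value and the raw
    attention score of channel c at frame t. valid_len, if given, must be
    at least 1; frames at or beyond it are treated as padding.
    Returns a list of length 2 * len(frames): all means, then all stds.
    """
    means, stds = [], []
    for feats, logits in zip(frames, scores):
        n = len(feats) if valid_len is None else valid_len
        feats, logits = feats[:n], logits[:n]  # drop padded frames
        # Softmax over time, shifted by the max for numerical stability.
        peak = max(logits)
        exps = [math.exp(s - peak) for s in logits]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Attention-weighted first and second moments of this channel.
        mean = sum(w * h for w, h in zip(weights, feats))
        second = sum(w * h * h for w, h in zip(weights, feats))
        # Clamp the variance before the square root; the floor value is
        # an assumption made for this sketch.
        std = math.sqrt(max(second - mean * mean, floor))
        means.append(mean)
        stds.append(std)
    return means + stds


# Two channels, three frames, the last frame being padding.
pooled = attentive_stats(
    frames=[[1.0, 3.0, 100.0], [2.0, 2.0, -50.0]],
    scores=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    valid_len=2,
)
# 2 channels in, 4 values out: [mean_0, mean_1, std_0, std_1].
```

A batched version would apply the same steps along the last axis of a (batch_size, feature_dim, seq_len) tensor and mask each sequence with its own entry of feat_lengths.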
output_size()