espnet2.asr.encoder.beats_encoder.MultiheadAttention
class espnet2.asr.encoder.beats_encoder.MultiheadAttention(embed_dim, num_heads, kdim=None, vdim=None, dropout=0.0, bias=True, add_bias_kv=False, add_zero_attn=False, self_attention=False, encoder_decoder_attention=False, q_noise=0.0, qn_block_size=8, has_relative_attention_bias=False, num_buckets=32, max_distance=128, gru_rel_pos=False, rescale_init=False)
Bases: Module
Multi-headed attention.
See “Attention Is All You Need” for more details.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
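A minimal construction sketch (not part of the generated reference): the parameter names come from the signature above, while the specific values are illustrative choices only.

```python
from espnet2.asr.encoder.beats_encoder import MultiheadAttention

# Illustrative configuration; all values below are example choices.
attn = MultiheadAttention(
    embed_dim=512,
    num_heads=8,
    dropout=0.1,
    self_attention=True,
    has_relative_attention_bias=True,  # enables compute_bias()
    num_buckets=32,
    max_distance=128,
)
```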
apply_sparse_mask(attn_weights, tgt_len: int, src_len: int, bsz: int)
No-op; returns attn_weights unchanged.
compute_bias(query_length, key_length)
Compute relative position bias.
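A hedged sketch of calling compute_bias() directly. It assumes the module was built with has_relative_attention_bias=True; the (num_heads, query_length, key_length) output shape noted in the comment is inferred from the relative-attention-bias design, not stated in this reference.

```python
from espnet2.asr.encoder.beats_encoder import MultiheadAttention

attn = MultiheadAttention(
    embed_dim=512,
    num_heads=8,
    self_attention=True,
    has_relative_attention_bias=True,  # required for the bias embedding table
    num_buckets=32,
    max_distance=128,
)

# Bias added to the attention logits; assumed shape:
# (num_heads, query_length, key_length) -> here (8, 100, 100).
bias = attn.compute_bias(query_length=100, key_length=100)
```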
forward(query, key: Tensor | None, value: Tensor | None, key_padding_mask: Tensor | None = None, incremental_state: Dict[str, Dict[str, Tensor | None]] | None = None, need_weights: bool = True, static_kv: bool = False, attn_mask: Tensor | None = None, before_softmax: bool = False, need_head_weights: bool = False, position_bias: Tensor | None = None) → Tuple[Tensor, Tensor | None, Tensor | None]
Input shape: Time x Batch x Channel
- Parameters:
- key_padding_mask (ByteTensor, optional) – mask to exclude keys that are pads, of shape (batch, src_len), where padding elements are indicated by 1s.
- need_weights (bool, optional) – return the attention weights, averaged over heads (default: True).
- attn_mask (ByteTensor, optional) – typically used to implement causal attention, where the mask prevents the attention from looking forward in time (default: None).
- before_softmax (bool, optional) – return the raw attention weights and values before the attention softmax.
- need_head_weights (bool, optional) – return the attention weights for each head. Implies need_weights. Default: return the average attention weights over all heads.
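A hedged end-to-end usage sketch for forward() under the documented Time x Batch x Channel convention; the shapes and mask layout follow the signature and parameter notes above, and all concrete sizes are illustrative assumptions.

```python
import torch

from espnet2.asr.encoder.beats_encoder import MultiheadAttention

T, B, C = 100, 4, 512  # time steps, batch size, channels (= embed_dim)
attn = MultiheadAttention(embed_dim=C, num_heads=8, self_attention=True)

x = torch.randn(T, B, C)  # Time x Batch x Channel, as documented

# (batch, src_len) mask; 1s/True mark padded key positions to be excluded.
key_padding_mask = torch.zeros(B, T, dtype=torch.bool)
key_padding_mask[:, 90:] = True  # e.g. last 10 frames of each utterance are padding

attn_out, attn_weights, position_bias = attn(
    query=x,
    key=x,
    value=x,
    key_padding_mask=key_padding_mask,
    need_weights=True,  # averaged over heads unless need_head_weights=True
)
# attn_out: (T, B, C); position_bias is typically None when
# has_relative_attention_bias is left at its default of False.
```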
reset_parameters()
Initialize parameters in the transformer model.