espnet2.asr.encoder.beats_encoder.MultiheadAttention
class espnet2.asr.encoder.beats_encoder.MultiheadAttention(embed_dim, num_heads, kdim=None, vdim=None, dropout=0.0, bias=True, add_bias_kv=False, add_zero_attn=False, self_attention=False, encoder_decoder_attention=False, q_noise=0.0, qn_block_size=8, has_relative_attention_bias=False, num_buckets=32, max_distance=128, gru_rel_pos=False, rescale_init=False)
Bases: Module
Multi-headed attention.
See “Attention Is All You Need” for more details.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
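A minimal construction sketch (not part of the generated reference): the parameter names come from the signature above, while the specific values are illustrative choices only.

```python
from espnet2.asr.encoder.beats_encoder import MultiheadAttention

# Illustrative configuration; all values below are example choices.
attn = MultiheadAttention(
    embed_dim=512,
    num_heads=8,
    dropout=0.1,
    self_attention=True,
    has_relative_attention_bias=True,  # enables compute_bias()
    num_buckets=32,
    max_distance=128,
)
```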
apply_sparse_mask(attn_weights, tgt_len: int, src_len: int, bsz: int)
No-op; returns attn_weights unchanged.
compute_bias(query_length, key_length)
Compute relative position bias.
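A hedged sketch of calling compute_bias() directly. It assumes the module was built with has_relative_attention_bias=True; the (num_heads, query_length, key_length) output shape noted in the comment is inferred from the relative-attention-bias design, not stated in this reference.

```python
from espnet2.asr.encoder.beats_encoder import MultiheadAttention

attn = MultiheadAttention(
    embed_dim=512,
    num_heads=8,
    self_attention=True,
    has_relative_attention_bias=True,  # required for the bias embedding table
    num_buckets=32,
    max_distance=128,
)

# Bias added to the attention logits; assumed shape:
# (num_heads, query_length, key_length) -> here (8, 100, 100).
bias = attn.compute_bias(query_length=100, key_length=100)
```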
forward(query, key: Tensor | None, value: Tensor | None, key_padding_mask: Tensor | None = None, incremental_state: Dict[str, Dict[str, Tensor | None]] | None = None, need_weights: bool = True, static_kv: bool = False, attn_mask: Tensor | None = None, before_softmax: bool = False, need_head_weights: bool = False, position_bias: Tensor | None = None) → Tuple[Tensor, Tensor | None, Tensor | None]
Input shape: Time x Batch x Channel
- Parameters:
- key_padding_mask (ByteTensor, optional) – mask to exclude keys that are pads, of shape (batch, src_len), where padding elements are indicated by 1s.
- need_weights (bool, optional) – return the attention weights, averaged over heads (default: True).
- attn_mask (ByteTensor, optional) – typically used to implement causal attention, where the mask prevents the attention from looking forward in time (default: None).
- before_softmax (bool, optional) – return the raw attention weights and values before the attention softmax.
- need_head_weights (bool, optional) – return the attention weights for each head. Implies need_weights. Default: return the average attention weights over all heads.
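A hedged end-to-end usage sketch for forward() under the documented Time x Batch x Channel convention; the shapes and mask layout follow the signature and parameter notes above, and all concrete sizes are illustrative assumptions.

```python
import torch

from espnet2.asr.encoder.beats_encoder import MultiheadAttention

T, B, C = 100, 4, 512  # time steps, batch size, channels (= embed_dim)
attn = MultiheadAttention(embed_dim=C, num_heads=8, self_attention=True)

x = torch.randn(T, B, C)  # Time x Batch x Channel, as documented

# (batch, src_len) mask; 1s/True mark padded key positions to be excluded.
key_padding_mask = torch.zeros(B, T, dtype=torch.bool)
key_padding_mask[:, 90:] = True  # e.g. last 10 frames of each utterance are padding

attn_out, attn_weights, position_bias = attn(
    query=x,
    key=x,
    value=x,
    key_padding_mask=key_padding_mask,
    need_weights=True,  # averaged over heads unless need_head_weights=True
)
# attn_out: (T, B, C); position_bias is typically None when
# has_relative_attention_bias is left at its default of False.
```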
reset_parameters()
Initialize parameters in the transformer model.