espnet2.asr_transducer.encoder.modules.attention.RelPositionMultiHeadedAttention
class espnet2.asr_transducer.encoder.modules.attention.RelPositionMultiHeadedAttention(num_heads: int, embed_size: int, dropout_rate: float = 0.0, simplified_attention_score: bool = False)
Bases: Module
RelPositionMultiHeadedAttention definition.
- Parameters:
- num_heads – Number of attention heads.
- embed_size – Embedding size.
- dropout_rate – Dropout rate.
- simplified_attention_score – Whether to use the simplified attention score computation.
Construct a RelPositionMultiHeadedAttention object.
compute_attention_score(query: Tensor, key: Tensor, pos_enc: Tensor, left_context: int = 0) → Tensor
Attention score computation.
- Parameters:
- query – Transformed query tensor. (B, H, T_1, d_k)
- key – Transformed key tensor. (B, H, T_2, d_k)
- pos_enc – Positional embedding tensor. (B, 2 * T_1 - 1, size)
- left_context – Number of previous frames to use for current chunk attention computation.
- Returns: Attention score. (B, H, T_1, T_2)
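The score follows the Transformer-XL formulation: a content term plus a relative-position term, scaled by sqrt(d_k). A minimal NumPy sketch, with hypothetical names; `u` and `v` stand in for the module's learned positional biases, and `pos` is assumed to be already projected to per-head shape (B, H, 2 * T_1 - 1, d_k) (the real module projects the (B, 2 * T_1 - 1, size) embedding internally):

```python
import numpy as np

def attention_score(query, key, pos, u, v):
    """Transformer-XL style relative-position score (minimal sketch).

    query: (B, H, T, d_k), key: (B, H, T, d_k),
    pos:   (B, H, 2 * T - 1, d_k) transformed positional embeddings,
    u, v:  (H, d_k) learned biases added to the query.
    """
    b, h, t, d_k = query.shape
    # Content term: (query + u) . key^T  ->  (B, H, T, T)
    matrix_ac = np.einsum("bhqd,bhkd->bhqk", query + u[None, :, None, :], key)
    # Position term: (query + v) . pos^T, indexed by relative distance.
    q_v = query + v[None, :, None, :]
    matrix_bd = np.empty((b, h, t, t))
    for i in range(t):
        for j in range(t):
            # Column (j - i + T - 1) of pos holds relative position j - i.
            matrix_bd[:, :, i, j] = np.einsum(
                "bhd,bhd->bh", q_v[:, :, i], pos[:, :, j - i + t - 1]
            )
    return (matrix_ac + matrix_bd) / np.sqrt(d_k)
```

The explicit double loop makes the relative indexing visible; the module instead computes `(query + v) @ pos^T` in one matmul and realigns columns with `rel_shift` (see below).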
compute_simplified_attention_score(query: Tensor, key: Tensor, pos_enc: Tensor, left_context: int = 0) → Tensor
Simplified attention score computation.
Reference: https://github.com/k2-fsa/icefall/pull/458
- Parameters:
- query – Transformed query tensor. (B, H, T_1, d_k)
- key – Transformed key tensor. (B, H, T_2, d_k)
- pos_enc – Positional embedding tensor. (B, 2 * T_1 - 1, size)
- left_context – Number of previous frames to use for current chunk attention computation.
- Returns: Attention score. (B, H, T_1, T_2)
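Per the icefall reference above, the simplification drops the learned positional biases, leaving only the query-key and query-position terms. A NumPy sketch under the same assumptions as before (hypothetical names, `pos` pre-projected to per-head shape; the module's exact scaling of the positional term may differ):

```python
import numpy as np

def simplified_attention_score(query, key, pos):
    """Simplified relative-position score: the full formulation minus the
    learned positional biases (cf. k2-fsa/icefall PR #458). Sketch only.

    query, key: (B, H, T, d_k); pos: (B, H, 2 * T - 1, d_k).
    """
    b, h, t, d_k = query.shape
    # Content term: query . key^T  ->  (B, H, T, T)
    matrix_ac = np.einsum("bhqd,bhkd->bhqk", query, key)
    # Position term: query . pos^T, indexed by relative distance j - i.
    matrix_bd = np.empty((b, h, t, t))
    for i in range(t):
        for j in range(t):
            matrix_bd[:, :, i, j] = np.einsum(
                "bhd,bhd->bh", query[:, :, i], pos[:, :, j - i + t - 1]
            )
    return (matrix_ac + matrix_bd) / np.sqrt(d_k)
```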
forward(query: Tensor, key: Tensor, value: Tensor, pos_enc: Tensor, mask: Tensor, chunk_mask: Tensor | None = None, left_context: int = 0) → Tensor
Compute scaled dot product attention with relative positional encoding.
- Parameters:
- query – Query tensor. (B, T_1, size)
- key – Key tensor. (B, T_2, size)
- value – Value tensor. (B, T_2, size)
- pos_enc – Positional embedding tensor. (B, 2 * T_1 - 1, size)
- mask – Source mask. (B, T_2)
- chunk_mask – Chunk mask. (T_1, T_1)
- left_context – Number of previous frames to use for current chunk attention computation.
- Returns: Output tensor. (B, T_1, H * d_k)
forward_attention(value: Tensor, scores: Tensor, mask: Tensor, chunk_mask: Tensor | None = None) → Tensor
Compute attention context vector.
- Parameters:
- value – Transformed value. (B, H, T_2, d_k)
- scores – Attention score. (B, H, T_1, T_2)
- mask – Source mask. (B, T_2)
- chunk_mask – Chunk mask. (T_1, T_1)
- Returns: Transformed value weighted by attention score. (B, T_1, H * d_k)
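In this step the padded positions are masked out of the scores before the softmax, and the resulting weights combine the value frames; the heads are then merged back into (B, T_1, H * d_k). A NumPy sketch (hypothetical names; the mask is assumed True at padded positions, and dropout on the weights is omitted):

```python
import numpy as np

def forward_attention(value, scores, mask):
    """Softmax over masked scores, then attention-weighted sum of values.

    value:  (B, H, T_2, d_k), scores: (B, H, T_1, T_2),
    mask:   (B, T_2) boolean, True at padded positions (assumed convention).
    """
    b, h, t1, t2 = scores.shape
    d_k = value.shape[-1]
    # Suppress padded frames before the softmax.
    scores = np.where(mask[:, None, None, :], -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum over T_2, then merge heads: (B, T_1, H * d_k).
    context = np.einsum("bhqk,bhkd->bhqd", weights, value)
    return context.transpose(0, 2, 1, 3).reshape(b, t1, h * d_k)
```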
- Return type: attn_output
forward_qkv(query: Tensor, key: Tensor, value: Tensor) → Tuple[Tensor, Tensor, Tensor]
Transform query, key and value.
- Parameters:
- query – Query tensor. (B, T_1, size)
- key – Key tensor. (B, T_2, size)
- value – Value tensor. (B, T_2, size)
- Returns:
- q – Transformed query tensor. (B, H, T_1, d_k)
- k – Transformed key tensor. (B, H, T_2, d_k)
- v – Transformed value tensor. (B, H, T_2, d_k)
- Return type: (q, k, v)
rel_shift(x: Tensor, left_context: int = 0) → Tensor
Compute relative positional shift of the attention score matrix.
- Parameters:
- x – Input sequence. (B, H, T_1, 2 * T_1 - 1)
- left_context – Number of previous frames to use for current chunk attention computation.
- Returns: Output sequence. (B, H, T_1, T_2)
- Return type: x
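The shift is usually implemented with a pad-and-reshape trick so that `out[..., i, j] == x[..., i, j - i + T_1 - 1]`, i.e. output column j selects the score for relative position j - i. A NumPy sketch for the `left_context == 0` case (a larger left context would keep more trailing columns; exact slicing in the module may differ):

```python
import numpy as np

def rel_shift(x):
    """Pad-and-reshape trick: out[..., i, j] == x[..., i, j - i + T_1 - 1].

    x: (B, H, T_1, 2 * T_1 - 1); returns (B, H, T_1, T_1).
    Sketch for left_context == 0.
    """
    b, h, t1, pos_len = x.shape  # pos_len == 2 * t1 - 1
    zero_pad = np.zeros((b, h, t1, 1), dtype=x.dtype)
    x_padded = np.concatenate([zero_pad, x], axis=-1)   # (B, H, T_1, 2 * T_1)
    x_padded = x_padded.reshape(b, h, pos_len + 1, t1)  # realign rows by one step
    shifted = x_padded[:, :, 1:].reshape(b, h, t1, pos_len)
    return shifted[:, :, :, :t1]                        # keep the T_2 == T_1 columns
```

The padding column shifts each successive row of the flattened buffer by one position, which is exactly the per-row offset the relative indexing needs, without any explicit loop.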