espnet.nets.beam_search.BeamSearch
class espnet.nets.beam_search.BeamSearch(scorers: Dict[str, ScorerInterface], weights: Dict[str, float], beam_size: int, vocab_size: int, sos: int, eos: int, token_list: List[str] | None = None, pre_beam_ratio: float = 1.5, pre_beam_score_key: str | None = None, return_hs: bool = False, hyp_primer: List[int] | None = None, normalize_length: bool = False)
Bases: Module
Beam search implementation.
Initialize beam search.
- Parameters:
- scorers (dict[str, ScorerInterface]) – Dict of decoder modules, e.g., Decoder, CTCPrefixScorer, LM. A scorer is ignored if it is None.
- weights (dict[str, float]) – Dict of weights for each scorer. A scorer is ignored if its weight is 0.
- beam_size (int) – The number of hypotheses kept during search
- vocab_size (int) – The size of the vocabulary
- sos (int) – Start of sequence id
- eos (int) – End of sequence id
- token_list (list[str]) – List of tokens for the debug log
- pre_beam_score_key (str) – Key of scores used to perform the pre-beam search
- pre_beam_ratio (float) – beam size in the pre-beam search will be int(pre_beam_ratio * beam_size)
- return_hs (bool) – Whether to return hidden intermediates
- normalize_length (bool) – If true, select the best ended hypotheses based on length-normalized scores rather than the accumulated scores
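Example construction (a minimal sketch; `decoder`, `ctc`, `lm`, `token_list`, `sos_id`, and `eos_id` are assumed to come from a trained ESPnet ASR model, and the weights are illustrative only):

```python
from espnet.nets.beam_search import BeamSearch
from espnet.nets.scorers.ctc import CTCPrefixScorer
from espnet.nets.scorers.length_bonus import LengthBonus

# `decoder`, `ctc`, `lm`, `token_list`, `sos_id`, and `eos_id` are assumed to come
# from a trained ESPnet ASR model; the weights below are illustrative only.
beam_search = BeamSearch(
    scorers={
        "decoder": decoder,                       # attention decoder (full scorer)
        "ctc": CTCPrefixScorer(ctc, eos_id),      # CTC prefix scorer (partial scorer)
        "lm": lm,                                 # external LM, ignored if None
        "length_bonus": LengthBonus(len(token_list)),
    },
    weights={"decoder": 0.7, "ctc": 0.3, "lm": 0.3, "length_bonus": 0.1},
    beam_size=10,
    vocab_size=len(token_list),
    sos=sos_id,
    eos=eos_id,
    token_list=token_list,
)
```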
static append_token(xs: Tensor, x: int) → Tensor
Append new token to prefix tokens.
- Parameters:
- xs (torch.Tensor) – The prefix token
- x (int) – The new token to append
- Returns: A new tensor containing xs + [x] with the dtype and device of xs
- Return type: torch.Tensor
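Example (illustrative values):

```python
import torch

from espnet.nets.beam_search import BeamSearch

# Illustrative values: a prefix of token ids and a new id to append.
xs = torch.tensor([1, 23, 7], dtype=torch.long)
new_prefix = BeamSearch.append_token(xs, 42)
# new_prefix is tensor([1, 23, 7, 42]) with the same dtype and device as xs
```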
beam(weighted_scores: Tensor, ids: Tensor) → Tuple[Tensor, Tensor]
Compute topk full token ids and partial token ids.
- Parameters:
- weighted_scores (torch.Tensor) – The weighted sum of scores for each token. Its shape is (self.n_vocab,).
- ids (torch.Tensor) – The partial token ids to compute topk over
- Returns: The topk full token ids and partial token ids. Their shapes are (self.beam_size,)
- Return type: Tuple[torch.Tensor, torch.Tensor]
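Example (a toy sketch, assuming the `beam_search` instance and `token_list` from the construction example above):

```python
import torch

# Toy inputs: weighted scores over the full vocabulary and a pre-beam candidate set.
weighted_scores = torch.randn(len(token_list))   # shape (self.n_vocab,)
ids = torch.topk(weighted_scores, k=15)[1]       # e.g. pre-beam candidate ids
top_ids, part_ids = beam_search.beam(weighted_scores, ids)
# top_ids indexes the full vocabulary, part_ids indexes into `ids`;
# both have shape (beam_size,)
```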
forward(x: Tensor, maxlenratio: float = 0.0, minlenratio: float = 0.0, pre_x: Tensor | None = None) → List[Hypothesis]
Perform beam search.
- Parameters:
- x (torch.Tensor) – Encoded speech feature (T, D)
- maxlenratio (float) – Input length ratio to obtain max output length. If maxlenratio=0.0 (default), an end-detect function is used to automatically find the maximum hypothesis length. If maxlenratio<0.0, its absolute value is interpreted as a constant max output length.
- minlenratio (float) – Input length ratio to obtain min output length. If minlenratio<0.0, its absolute value is interpreted as a constant min output length.
- pre_x (torch.Tensor) – Encoded speech feature for sequential attn (T, D) Sequential attn computes attn first on pre_x then on x, thereby attending to two sources in sequence.
- Returns: N-best decoding results
- Return type: list[Hypothesis]
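Example (a sketch, assuming `enc_out` is an encoder output of shape (T, D) and `beam_search` was built as above):

```python
# `enc_out` is an assumed encoder output of shape (T, D).
nbest_hyps = beam_search(enc_out, maxlenratio=0.0, minlenratio=0.0)
best = nbest_hyps[0]                                    # hypotheses are sorted best-first
print("score:", float(best.score))
print("tokens:", [token_list[t] for t in best.yseq.tolist()])
```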
init_hyp(x: Tensor) → List[Hypothesis]
Get the initial hypothesis data.
- Parameters: x (torch.Tensor) – The encoder output feature
- Returns: The initial hypotheses.
- Return type: List[Hypothesis]
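Example (a sketch, continuing with the assumed `enc_out` from above):

```python
# The search starts from a single hypothesis whose yseq holds the sos token
# (or the hyp_primer, if one was set).
init_hyps = beam_search.init_hyp(enc_out)
assert len(init_hyps) == 1
print(init_hyps[0].yseq)    # e.g. tensor([sos_id])
print(init_hyps[0].score)   # 0.0
```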
static merge_scores(prev_scores: Dict[str, float], next_full_scores: Dict[str, Tensor], full_idx: int, next_part_scores: Dict[str, Tensor], part_idx: int) → Dict[str, Tensor]
Merge scores for new hypothesis.
- Parameters:
- prev_scores (Dict[str, float]) – The previous hypothesis scores by self.scorers
- next_full_scores (Dict[str, torch.Tensor]) – Scores by self.full_scorers
- full_idx (int) – The next token id for next_full_scores
- next_part_scores (Dict[str, torch.Tensor]) – Scores of partial tokens by self.part_scorers
- part_idx (int) – The new token id for next_part_scores
- Returns: The new score dict. Its keys are names of self.full_scorers and self.part_scorers; its values are scalar tensors produced by the scorers.
- Return type: Dict[str, torch.Tensor]
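Example (a toy sketch of the bookkeeping this performs; names and sizes are illustrative):

```python
import torch

from espnet.nets.beam_search import BeamSearch

prev_scores = {"decoder": -1.2, "ctc": -0.8}
next_full_scores = {"decoder": torch.randn(5000)}   # one score per vocabulary token
next_part_scores = {"ctc": torch.randn(3)}          # scores for 3 pre-beam candidates
merged = BeamSearch.merge_scores(
    prev_scores, next_full_scores, full_idx=42,
    next_part_scores=next_part_scores, part_idx=1,
)
# merged["decoder"] == prev_scores["decoder"] + next_full_scores["decoder"][42]
# merged["ctc"]     == prev_scores["ctc"] + next_part_scores["ctc"][1]
```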
merge_states(states: Any, part_states: Any, part_idx: int) → Any
Merge states for new hypothesis.
- Parameters:
- states – states of self.full_scorers
- part_states – states of self.part_scorers
- part_idx (int) – The new token id for part_scores
- Returns: The new state dict. Its keys are names of self.full_scorers and self.part_scorers; its values are the states of the scorers.
- Return type: Dict[str, Any]
post_process(i: int, maxlen: int, minlen: int, maxlenratio: float, running_hyps: List[Hypothesis], ended_hyps: List[Hypothesis]) → List[Hypothesis]
Perform post-processing of beam search iterations.
- Parameters:
- i (int) – The length of hypothesis tokens.
- maxlen (int) – The maximum length of tokens in beam search.
- minlen (int) – The minimum length of tokens in beam search.
- maxlenratio (float) – The maximum length ratio in beam search.
- running_hyps (List[Hypothesis]) – The running hypotheses in beam search.
- ended_hyps (List[Hypothesis]) – The ended hypotheses in beam search.
- Returns: The new running hypotheses.
- Return type: List[Hypothesis]
score_full(hyp: Hypothesis, x: Tensor, pre_x: Tensor | None = None) → Tuple[Dict[str, Tensor], Dict[str, Any]]
Score new hypothesis by self.full_scorers.
- Parameters:
- hyp (Hypothesis) – Hypothesis with prefix tokens to score
- x (torch.Tensor) – Corresponding input feature
- pre_x (torch.Tensor) – Encoded speech feature for sequential attn (T, D) Sequential attn computes attn first on pre_x then on x, thereby attending to two sources in sequence.
- Returns: Tuple of a score dict of hyp, with string keys of self.full_scorers and tensor score values of shape (self.n_vocab,), and a state dict with string keys and state values of self.full_scorers
- Return type: Tuple[Dict[str, torch.Tensor], Dict[str, Any]]
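Example (a sketch using the assumed `beam_search` and `enc_out` from the earlier examples):

```python
hyp = beam_search.init_hyp(enc_out)[0]
scores, states = beam_search.score_full(hyp, enc_out)
# scores maps each full scorer name (e.g. "decoder", "lm") to a tensor of
# shape (self.n_vocab,); states holds the corresponding scorer states.
```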
score_partial(hyp: Hypothesis, ids: Tensor, x: Tensor) → Tuple[Dict[str, Tensor], Dict[str, Any]]
Score new hypothesis by self.part_scorers.
- Parameters:
- hyp (Hypothesis) – Hypothesis with prefix tokens to score
- ids (torch.Tensor) – 1D tensor of new partial tokens to score
- x (torch.Tensor) – Corresponding input feature
- Returns: Tuple of a score dict of hyp, with string keys of self.part_scorers and tensor score values of shape (len(ids),), and a state dict with string keys and state values of self.part_scorers
- Return type: Tuple[Dict[str, torch.Tensor], Dict[str, Any]]
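Example (a sketch continuing from score_full above, assuming a CTCPrefixScorer registered as "ctc"):

```python
import torch

# Restrict partial scoring to a handful of pre-beam candidates.
ids = torch.topk(scores["decoder"], k=15)[1]   # int(pre_beam_ratio * beam_size) in practice
part_scores, part_states = beam_search.score_partial(hyp, ids, enc_out)
# part_scores maps each partial scorer name (e.g. "ctc") to a tensor of shape (len(ids),)
```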
search(running_hyps: List[Hypothesis], x: Tensor, pre_x: Tensor | None = None) → List[Hypothesis]
Search new tokens for running hypotheses and encoded speech x.
- Parameters:
- running_hyps (List[Hypothesis]) – Running hypotheses on beam
- x (torch.Tensor) – Encoded speech feature (T, D)
- pre_x (torch.Tensor) – Encoded speech feature for sequential attn (T, D) Sequential attn computes attn first on pre_x then on x, thereby attending to two sources in sequence.
- Returns: Best sorted hypotheses
- Return type: List[Hypothesis]
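Example (a single-step sketch with the assumed `enc_out`):

```python
# One beam-search step: each running hypothesis is extended by one token and the
# best beam_size extensions are kept.
running_hyps = beam_search.init_hyp(enc_out)
running_hyps = beam_search.search(running_hyps, enc_out)
# every Hypothesis in running_hyps now has one more token in .yseq
```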
set_hyp_primer(hyp_primer: List[int] | None = None) → None
Set the primer sequence for decoding.
Used for OpenAI Whisper models.
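Example (a sketch; the token ids are placeholders for Whisper's special tokens):

```python
# Placeholder ids standing in for Whisper's start-of-transcript, language, and
# task tokens; the primer replaces the plain sos prefix used by init_hyp.
beam_search.set_hyp_primer([sot_id, lang_id, task_id])
```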