espnet.nets.batch_beam_search_online_sim.BatchBeamSearchOnlineSim

class espnet.nets.batch_beam_search_online_sim.BatchBeamSearchOnlineSim(scorers: Dict[str, ScorerInterface], weights: Dict[str, float], beam_size: int, vocab_size: int, sos: int, eos: int, token_list: List[str] | None = None, pre_beam_ratio: float = 1.5, pre_beam_score_key: str | None = None, return_hs: bool = False, hyp_primer: List[int] | None = None, normalize_length: bool = False)

Bases: BatchBeamSearch

Online beam search implementation.

This class simulates streaming decoding. It requires the encoded features of the entire utterance and extracts them block by block, as would be done in streaming processing. It is based on Tsunoo et al., “Streaming Transformer ASR with Blockwise Synchronous Beam Search” (https://arxiv.org/abs/2006.14941).
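The block-by-block extraction can be pictured with a small sketch. This is not ESPnet's implementation: the function name `visible_prefix_lengths` is hypothetical, and the sketch assumes that the search first sees `block_size - look_ahead` frames and that the visible prefix then grows by `hop_size` frames per step until the whole utterance is covered.

```python
from typing import Iterator


def visible_prefix_lengths(
    total_frames: int, block_size: int, hop_size: int, look_ahead: int
) -> Iterator[int]:
    """Yield how many encoder frames are visible at each simulated step.

    Assumption (not taken from the ESPnet source): the visible prefix ends
    at hop_size * step + block_size - look_ahead, capped at total_frames.
    """
    if block_size <= look_ahead or hop_size <= 0:
        raise ValueError("need block_size > look_ahead and hop_size > 0")
    step = 0
    while True:
        end = hop_size * step + block_size - look_ahead
        if end >= total_frames:
            # The last step exposes the full utterance.
            yield total_frames
            return
        yield end
        step += 1


# Stand-in for a (T, D) encoder output: 100 frames, represented by indices.
encoded = list(range(100))
for n in visible_prefix_lengths(len(encoded), block_size=40, hop_size=16, look_ahead=16):
    prefix = encoded[:n]  # what the simulated streaming search would see
    print(len(prefix))
```

Slicing a growing prefix, rather than disjoint windows, mirrors the idea that a streaming decoder keeps everything it has already received.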

Initialize beam search.

Parameters:
- scorers (dict[str, ScorerInterface]) – Dict of decoder modules, e.g., Decoder, CTCPrefixScorer, LM. A scorer is ignored if it is None.
- weights (dict[str, float]) – Dict of weights for each scorer. A scorer is ignored if its weight is 0.
- beam_size (int) – The number of hypotheses kept during search
- vocab_size (int) – The size of the vocabulary
- sos (int) – Start-of-sequence id
- eos (int) – End-of-sequence id
- token_list (list[str]) – List of tokens, used for debug logging
- pre_beam_score_key (str) – Key of the scores used to perform the pre-beam search
- pre_beam_ratio (float) – Beam size in the pre-beam search will be int(pre_beam_ratio * beam_size)
- return_hs (bool) – Whether to return hidden intermediates
- normalize_length (bool) – If true, select the best ended hypotheses based on length-normalized scores rather than accumulated scores
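Three of the parameter rules above can be illustrated without ESPnet itself. The helpers below are hypothetical and are not part of the library. The pre-beam formula is the one stated in the parameter list. The scorer filter treats a missing weight as 0, and the length normalization divides the accumulated score by the token count; both are assumptions of this sketch.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def pre_beam_size(pre_beam_ratio: float, beam_size: int) -> int:
    """Number of candidates kept by the pre-beam search."""
    return int(pre_beam_ratio * beam_size)


def active_scorers(
    scorers: Dict[str, Optional[object]], weights: Dict[str, float]
) -> List[str]:
    """Names of scorers that are neither None nor weighted 0.

    Assumption: a scorer with no entry in `weights` counts as weight 0.
    """
    return [
        name
        for name, scorer in scorers.items()
        if scorer is not None and weights.get(name, 0.0) != 0.0
    ]


def best_ended(
    hyps: Sequence[Tuple[List[int], float]], normalize_length: bool
) -> Tuple[List[int], float]:
    """Pick the best (tokens, score) pair.

    Assumption: normalization divides the score by the number of tokens.
    """
    if normalize_length:
        return max(hyps, key=lambda h: h[1] / len(h[0]))
    return max(hyps, key=lambda h: h[1])


print(pre_beam_size(1.5, 10))  # 15

short = ([1, 2, 3], -3.0)
long = ([1, 2, 3, 4, 5, 6], -4.2)
print(best_ended([short, long], normalize_length=False) is short)  # True
print(best_ended([short, long], normalize_length=True) is long)    # True
```

The last two lines show why `normalize_length` matters: accumulated log-scores favor short hypotheses, while per-token scores can prefer a longer one.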

extend(x: Tensor, hyps: Hypothesis) → List[Hypothesis]

Extend probabilities and states with more encoded chunks.

Parameters:
- x (torch.Tensor) – The extended encoder output feature
- hyps (Hypothesis) – Current list of hypotheses
Returns: The extended hypotheses
Return type: Hypothesis

forward(x: Tensor, maxlenratio: float = 0.0, minlenratio: float = 0.0) → List[Hypothesis]

Perform beam search.

Parameters:
- x (torch.Tensor) – Encoded speech feature (T, D)
- maxlenratio (float) – Input length ratio used to obtain the maximum output length. If maxlenratio=0.0 (default), an end-detect function is used to find the maximum hypothesis length automatically.
- minlenratio (float) – Input length ratio to obtain min output length.
Returns: N-best decoding results
Return type: list[Hypothesis]
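As a rough illustration of how the two ratios relate to the input length T, consider the sketch below. The function `length_bounds` is hypothetical. It assumes that a positive ratio is multiplied by the number of encoder frames and truncated to an integer, and that with `maxlenratio=0.0` the input length serves as an upper bound while end detection stops the search earlier. The exact rules used by ESPnet may differ.

```python
from typing import Tuple


def length_bounds(
    num_frames: int, maxlenratio: float = 0.0, minlenratio: float = 0.0
) -> Tuple[int, int]:
    """Return (min_len, max_len) for the output sequence, in tokens."""
    if maxlenratio < 0 or minlenratio < 0:
        raise ValueError("this sketch only covers non-negative ratios")
    if maxlenratio > 0:
        max_len = max(1, int(maxlenratio * num_frames))
    else:
        # Assumption: no explicit cap, so fall back to the input length
        # and rely on end detection to finish sooner.
        max_len = num_frames
    min_len = int(minlenratio * num_frames)
    return min_len, max_len


print(length_bounds(200, maxlenratio=0.5, minlenratio=0.25))  # (50, 100)
print(length_bounds(200))  # (0, 200)
```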

set_block_size(block_size: int)

Set block size for streaming decoding.

Parameters: block_size (int) – The block size of the encoder

set_hop_size(hop_size: int)

Set hop size for streaming decoding.

Parameters: hop_size (int) – The hop size of the encoder

set_look_ahead(look_ahead: int)

Set look ahead size for streaming decoding.

Parameters: look_ahead (int) – The look-ahead size of the encoder

set_streaming_config(asr_config: str)

Set config file for streaming decoding.

Parameters: asr_config (str) – Path to the config file used for ASR training
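The sketch below shows one plausible way the three streaming sizes could be pulled from an already-parsed training config. It is not the library's code: the helper `streaming_sizes` is hypothetical, and the key names `encoder_conf`, `block_size`, `hop_size`, and `look_ahead` are assumptions about the config layout.

```python
from typing import Any, Dict, Optional, Tuple


def streaming_sizes(
    config: Dict[str, Any],
) -> Tuple[Optional[int], Optional[int], Optional[int]]:
    """Return (block_size, hop_size, look_ahead); None where a key is absent."""
    encoder_conf = config.get("encoder_conf") or {}
    return (
        encoder_conf.get("block_size"),
        encoder_conf.get("hop_size"),
        encoder_conf.get("look_ahead"),
    )


# Hypothetical parsed config, standing in for the contents of a training YAML.
config = {"encoder_conf": {"block_size": 40, "hop_size": 16, "look_ahead": 16}}
block_size, hop_size, look_ahead = streaming_sizes(config)
print(block_size, hop_size, look_ahead)
```

Values obtained this way correspond to what `set_block_size`, `set_hop_size`, and `set_look_ahead` accept individually.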