espnet2.asr.transducer.rnnt_multi_blank.rnnt.multiblank_rnnt_loss_gpu

espnet2.asr.transducer.rnnt_multi_blank.rnnt.multiblank_rnnt_loss_gpu(acts: Tensor, labels: Tensor, input_lengths: Tensor, label_lengths: Tensor, costs: Tensor, grads: Tensor, blank_label: int, big_blank_durations: list, fastemit_lambda: float, clamp: float, num_threads: int, sigma: float)

Wrapper method for accessing the GPU Multi-blank RNNT loss (https://arxiv.org/pdf/2211.03541.pdf).

CUDA implementation ported from HawkAaron/warp-transducer (https://github.com/HawkAaron/warp-transducer).

Parameters:
- acts – Activation tensor of shape [B, T, U, V + num_big_blanks + 1].
- labels – Ground truth labels of shape [B, U].
- input_lengths – Lengths of the acoustic sequence as a vector of ints [B].
- label_lengths – Lengths of the target sequence as a vector of ints [B].
- costs – Zero vector of length [B] in which costs will be set.
- grads – Zero tensor of shape [B, T, U, V + num_big_blanks + 1] where the gradient will be set.
- blank_label – Index of the standard blank token in the vocabulary.
- big_blank_durations – A list of supported durations for big blank symbols in the model, e.g. [2, 4, 8]. Note that only durations for "big blanks" are included here; the list should not include 1 for the standard blank. Those big blanks have vocabulary indices after the standard blank index.
- fastemit_lambda – Float scaling factor for FastEmit regularization. Refer to FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization.
- clamp – Float value. When set to value >= 0.0, will clamp the gradient to [-clamp, clamp].
- num_threads – Number of threads for OpenMP.
- sigma – Logit-undernormalization weight used in the multi-blank model. Refer to the multi-blank paper (https://arxiv.org/pdf/2211.03541) for detailed explanations.
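
The shapes in the parameter list can be checked before any tensors are allocated. The following is a minimal sketch, not part of ESPnet: `multiblank_buffer_shapes` is a hypothetical helper that only restates the documented shapes and the rule that `big_blank_durations` must not contain 1. Rejecting durations below 1 as well is an assumption made here. The `labels` tensor is left out because its shape does not depend on the vocabulary size.

```python
from typing import Dict, List, Tuple


def multiblank_buffer_shapes(
    batch: int,
    max_t: int,
    max_u: int,
    vocab_size: int,
    big_blank_durations: List[int],
) -> Dict[str, Tuple[int, ...]]:
    """Hypothetical helper: shapes of the buffers described in the list above."""
    # The list holds big-blank durations only; duration 1 belongs to the
    # standard blank. Rejecting values below 1 too is an assumption.
    if any(d <= 1 for d in big_blank_durations):
        raise ValueError("big_blank_durations must only contain durations > 1")

    # Last dimension: V regular labels + one entry per big blank + standard blank.
    last_dim = vocab_size + len(big_blank_durations) + 1

    return {
        "acts": (batch, max_t, max_u, last_dim),
        "grads": (batch, max_t, max_u, last_dim),  # same shape as acts, zero-filled
        "costs": (batch,),                          # zero-filled, one cost per utterance
        "input_lengths": (batch,),
        "label_lengths": (batch,),
    }


shapes = multiblank_buffer_shapes(
    batch=2, max_t=50, max_u=10, vocab_size=28, big_blank_durations=[2, 4, 8]
)
print(shapes["acts"])  # (2, 50, 10, 32), since 28 + 3 + 1 = 32
```

With these shapes, `costs` and `grads` would be allocated as zero tensors on the same CUDA device as `acts` before being passed to the wrapper.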