espnet2.asr.transducer.rnnt_multi_blank.rnnt.multiblank_rnnt_loss_gpu
espnet2.asr.transducer.rnnt_multi_blank.rnnt.multiblank_rnnt_loss_gpu(acts: Tensor, labels: Tensor, input_lengths: Tensor, label_lengths: Tensor, costs: Tensor, grads: Tensor, blank_label: int, big_blank_durations: list, fastemit_lambda: float, clamp: float, num_threads: int, sigma: float)
Wrapper method for accessing GPU Multi-blank RNNT loss
(https://arxiv.org/pdf/2211.03541.pdf).
CUDA implementation ported from HawkAaron/warp-transducer (https://github.com/HawkAaron/warp-transducer).
- Parameters:
acts – Activation tensor of shape [B, T, U, V + num_big_blanks + 1].
labels – Ground truth labels of shape [B, U].
input_lengths – Lengths of the acoustic sequence as a vector of ints [B].
label_lengths – Lengths of the target sequence as a vector of ints [B].
costs – Zero vector of length [B] in which costs will be set.
grads – Zero tensor of shape [B, T, U, V + num_big_blanks + 1] where the gradient will be set.
blank_label – Index of the standard blank token in the vocabulary.
big_blank_durations – A list of supported durations for big blank symbols in the model, e.g. [2, 4, 8]. Note that only durations for "big blanks" are included here; the list should not include 1 for the standard blank. Those big blanks have vocabulary indices after the standard blank index.
fastemit_lambda – Float scaling factor for FastEmit regularization. Refer to the paper "FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization".
clamp – Float value. When set to a value >= 0.0, the gradient is clamped to [-clamp, clamp].
num_threads – Number of threads for OpenMP.
sigma – Logit-undernormalization weight used in the multi-blank model. Refer to the multi-blank paper (https://arxiv.org/pdf/2211.03541) for a detailed explanation.
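The shapes in the parameter list can be checked before allocating the `acts`, `costs`, and `grads` buffers. The sketch below is pure Python and does not call ESPnet. The helper name `multiblank_buffer_shapes` and its arguments are hypothetical, and it follows the shapes exactly as written in the parameter list above (using `U` for both `acts` and `labels`).

```python
def multiblank_buffer_shapes(batch, max_t, max_u, vocab_size, big_blank_durations):
    """Return the documented tensor shapes for a multi-blank RNNT loss call.

    Hypothetical helper for illustration only; it is not part of ESPnet.
    vocab_size is V, the number of regular (non-blank) tokens.
    """
    # The documentation states that 1 (the standard blank) must not be listed.
    if 1 in big_blank_durations:
        raise ValueError("big_blank_durations must not include 1")

    num_big_blanks = len(big_blank_durations)
    # Last dimension: V regular tokens + big blanks + 1 standard blank.
    out_dim = vocab_size + num_big_blanks + 1

    return {
        "acts": (batch, max_t, max_u, out_dim),
        "labels": (batch, max_u),
        "input_lengths": (batch,),
        "label_lengths": (batch,),
        "costs": (batch,),  # expected to be zero-initialized
        "grads": (batch, max_t, max_u, out_dim),  # expected to be zero-initialized
    }


shapes = multiblank_buffer_shapes(
    batch=2, max_t=50, max_u=10, vocab_size=28, big_blank_durations=[2, 4, 8]
)
print(shapes["acts"])
```

With three big blank durations and `V = 28`, the last dimension of `acts` and `grads` is `28 + 3 + 1 = 32`.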