espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.compute_multiblank_betas_kernel

About 1 min

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.compute_multiblank_betas_kernel

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.compute_multiblank_betas_kernel(acts: Tensor, denom: Tensor, sigma: float, betas: Tensor, llBackward: Tensor, xlen: Tensor, ylen: Tensor, mlabels: Tensor, minibatch: int, maxT: int, maxU: int, alphabet_size: int, blank_: int, big_blank_duration: Tensor, num_big_blanks: int)

Compute beta (backward variable) probabilities for multi-blank transducer loss

(https://arxiv.org/pdf/2211.03541).

Parameters:
- acts – Tensor of shape [B, T, U, V + 1 + num-big-blanks] flattened. Represents the logprobs activation tensor.
- denom – Tensor of shape [B, T, U] flattened. Represents the denominator of the logprobs activation tensor across entire vocabulary.
- sigma – Hyper-parameter for logit-undernormalization technique for training multi-blank transducers.
- betas – Zero tensor of shape [B, T, U]. Will be updated inside the kernel with the backward variable probabilities.
- llBackward – Zero tensor of shape [B]. Represents the log-likelihood of the backward pass. Returned as the backward pass loss that is reduced by the optimizer.
- xlen – Vector of length B which contains the actual acoustic sequence lengths in the padded activation tensor.
- ylen – Vector of length B which contains the actual target sequence lengths in the padded activation tensor.
- mlabels – Matrix of shape [B, U+1] (+1 here is due to <SOS> token
  - usually the RNNT blank). The matrix contains the padded target transcription that must be predicted.
- minibatch – Int representing the batch size.
- maxT – The maximum possible acoustic sequence length. Represents T in the logprobs tensor.
- maxU – The maximum possible target sequence length. Represents U in the logprobs tensor.
- alphabet_size – The vocabulary dimension V+1 (inclusive of RNNT blank).
- blank – Index of the RNNT standard blank token in the vocabulary.
- big_blank_durations – Vector of supported big blank durations of the model.
- num_big_blanks – Number of big blanks of the model.

Updates: : Kernel inplace updates the following inputs:

betas: backward variable scores.
llBackward: log-likelihood of backward variable.