espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt.MultiblankGPURNNT
class espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt.MultiblankGPURNNT(sigma: float, num_big_blanks: int, minibatch: int, maxT: int, maxU: int, alphabet_size: int, workspace, big_blank_workspace, blank: int, fastemit_lambda: float, clamp: float, num_threads: int, stream)
Bases: GPURNNT
Helper class to launch the CUDA kernels that compute the Multi-blank
Transducer loss (https://arxiv.org/pdf/2211.03541).
- Parameters:
- sigma – Hyper-parameter related to the logit-normalization method in training multi-blank transducers.
- num_big_blanks – Number of big blank symbols the model has. This should not include the standard blank symbol.
- minibatch – Int representing the batch size.
- maxT – The maximum possible acoustic sequence length. Represents T in the logprobs tensor.
- maxU – The maximum possible target sequence length. Represents U in the logprobs tensor.
- alphabet_size – The vocabulary dimension, V + 1 + num_big_blanks: V labels, one standard blank, and the big blank symbols.
- workspace – An allocated chunk of memory that will be sliced off and reshaped into required blocks used as working memory.
- big_blank_workspace – An allocated chunk of memory that will be sliced off and reshaped into required blocks used as working memory specifically for the multi-blank related computations.
- blank – Index of the RNNT blank token in the vocabulary. Generally the first or last token in the vocab.
- fastemit_lambda – Float scaling factor for FastEmit regularization. Refer to "FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization".
- clamp – Float. When set to a value >= 0.0, the gradient is clamped to [-clamp, clamp].
- num_threads – Number of OMP threads to launch.
- stream – Numba CUDA stream.
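As a rough illustration of the alphabet_size parameter, the sketch below computes the vocabulary dimension from the number of labels and the number of big blanks. The helper name `multiblank_alphabet_size` and its arguments are hypothetical and not part of ESPnet; the function only restates the V + 1 + num_big_blanks formula from the parameter list.

```python
def multiblank_alphabet_size(num_labels: int, num_big_blanks: int) -> int:
    """Return V + 1 + num_big_blanks (hypothetical helper, not an ESPnet API).

    num_labels is V, the number of non-blank output labels; the "+ 1"
    accounts for the standard blank symbol.
    """
    if num_labels < 0 or num_big_blanks < 0:
        raise ValueError("num_labels and num_big_blanks must be non-negative")
    return num_labels + 1 + num_big_blanks


# Example: 500 labels and 3 big blanks give a vocabulary dimension of 504.
example_size = multiblank_alphabet_size(num_labels=500, num_big_blanks=3)
```

With num_big_blanks set to 0 the result reduces to V + 1, the vocabulary dimension of a standard transducer.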
compute_cost_and_score(acts: Tensor, grads: Tensor | None, costs: Tensor, labels: Tensor, label_lengths: Tensor, input_lengths: Tensor) → RNNTStatus
Compute both the loss and the gradients.
- Parameters:
- acts – A flattened tensor of shape [B, T, U, alphabet_size] representing the activation matrix.
- grads – A flattened zero tensor of the same shape as acts.
- costs – A zero vector of length B which will be updated inplace with the log probability costs.
- labels – A flattened matrix of labels of shape [B, U].
- label_lengths – A vector of length B that contains the original lengths of the target sequence.
- input_lengths – A vector of length B that contains the original lengths of the acoustic sequence.
Updates: This launches kernels that update the following variables in place:
- grads: Gradients of the costs with respect to the activation matrix.
- costs: Negative log likelihood of the forward variable.
- Returns: An enum that either represents a successful RNNT operation or failure.
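The shape conventions of these arguments can be sanity-checked before the kernels are launched. The following is a minimal sketch in plain Python, assuming a row-major (C-contiguous) flattening of acts and a last dimension equal to alphabet_size. `LossShapes` and `check_lengths` are hypothetical names used only for illustration and are not part of ESPnet.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LossShapes:
    """Hypothetical container for the dimensions of the acts tensor."""

    batch: int     # B
    max_t: int     # T
    max_u: int     # U
    alphabet: int  # last dimension of acts (assumed equal to alphabet_size)

    def acts_numel(self) -> int:
        # Number of elements in the flattened acts (and grads) tensor.
        return self.batch * self.max_t * self.max_u * self.alphabet

    def flat_index(self, b: int, t: int, u: int, v: int) -> int:
        # Offset of acts[b, t, u, v], assuming row-major flattening.
        return ((b * self.max_t + t) * self.max_u + u) * self.alphabet + v


def check_lengths(
    shapes: LossShapes,
    input_lengths: Sequence[int],
    label_lengths: Sequence[int],
) -> None:
    """Raise ValueError if the per-utterance lengths cannot fit the shapes."""
    if len(input_lengths) != shapes.batch or len(label_lengths) != shapes.batch:
        raise ValueError("length vectors must have one entry per batch item")
    for b, (t_len, u_len) in enumerate(zip(input_lengths, label_lengths)):
        # Acoustic lengths are bounded by T, target lengths by U.
        if not 1 <= t_len <= shapes.max_t:
            raise ValueError(f"input_lengths[{b}]={t_len} not in [1, {shapes.max_t}]")
        if not 0 <= u_len <= shapes.max_u:
            raise ValueError(f"label_lengths[{b}]={u_len} not in [0, {shapes.max_u}]")


shapes = LossShapes(batch=2, max_t=4, max_u=3, alphabet=6)
check_lengths(shapes, input_lengths=[4, 3], label_lengths=[2, 1])
```

The bounds checked here are necessary conditions only; they do not verify that the tensors passed to compute_cost_and_score are contiguous or reside on the expected device.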
cost_and_grad(acts: Tensor, grads: Tensor, costs: Tensor, pad_labels: Tensor, label_lengths: Tensor, input_lengths: Tensor)
score_forward(acts: Tensor, costs: Tensor, pad_labels: Tensor, label_lengths: Tensor, input_lengths: Tensor)