espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt.GPURNNT
class espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt.GPURNNT(minibatch: int, maxT: int, maxU: int, alphabet_size: int, workspace, blank: int, fastemit_lambda: float, clamp: float, num_threads: int, stream)
Bases: object
Helper class to launch the CUDA kernels that compute the Transducer loss.
- Parameters:
- minibatch – Int representing the batch size.
- maxT – The maximum possible acoustic sequence length. Represents T in the logprobs tensor.
- maxU – The maximum possible target sequence length. Represents U in the logprobs tensor.
- alphabet_size – The vocabulary dimension V+1 (inclusive of RNNT blank).
- workspace – An allocated chunk of memory that will be sliced off and reshaped into required blocks used as working memory.
- blank – Index of the RNNT blank token in the vocabulary. Generally the first or last token in the vocab.
- fastemit_lambda – Float scaling factor for FastEmit regularization. Refer to FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization.
- clamp – Float value. When set to a value >= 0.0, the gradient will be clamped to [-clamp, clamp].
- num_threads – Number of OMP threads to launch.
- stream – Numba Cuda Stream.
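A minimal construction sketch follows. The shapes, blank index, scratch size, and stream choice are all hypothetical; a real caller computes the exact workspace size needed for the internal denominator/alpha/beta blocks and reuses the current CUDA stream.

```python
import torch
from numba import cuda

from espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt import GPURNNT

B, T, U, V = 2, 8, 4, 29  # batch, max acoustic length, max target length + 1, vocab size

# Hypothetical over-allocated float32 scratch buffer; the real caller sizes this
# precisely for the working-memory blocks described above.
workspace = torch.zeros(1 << 20, dtype=torch.float32, device="cuda")

rnnt = GPURNNT(
    minibatch=B,
    maxT=T,
    maxU=U,
    alphabet_size=V + 1,                      # vocabulary plus the RNNT blank
    workspace=cuda.as_cuda_array(workspace),  # zero-copy Numba view of the buffer
    blank=V,                                  # blank placed last in the vocab here
    fastemit_lambda=0.0,                      # no FastEmit regularization
    clamp=0.0,                                # no gradient clamping
    num_threads=4,
    stream=cuda.default_stream(),
)
```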
compute_cost_and_score(acts: Tensor, grads: Tensor | None, costs: Tensor, labels: Tensor, label_lengths: Tensor, input_lengths: Tensor) → RNNTStatus
Compute both the loss and the gradients.
- Parameters:
- acts – A flattened tensor of shape [B, T, U, V+1] representing the activation matrix.
- grads – A flattened zero tensor of the same shape as acts, or None when gradients are not required.
- costs – A zero vector of length B which will be updated in place with the log probability costs.
- labels – A flattened matrix of labels of shape [B, U].
- label_lengths – A vector of length B that contains the original lengths of the target sequences.
- input_lengths – A vector of length B that contains the original lengths of the acoustic sequences.
Updates: This will launch kernels that update the following variables in place:
- grads: Gradients of the activation matrix with respect to the costs vector.
- costs: Negative log likelihood of the forward variable.
- Returns: An enum that either represents a successful RNNT operation or failure.
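As a hedged illustration, continuing the construction sketch above, a training-mode call might look like this. Every shape, dtype, and length value here is an assumption; all tensors must live on the same CUDA device as the workspace.

```python
acts = torch.randn(B * T * U * (V + 1), device="cuda")      # flattened [B, T, U, V+1] activations
grads = torch.zeros_like(acts)                              # filled in place with d(costs)/d(acts)
costs = torch.zeros(B, dtype=torch.float32, device="cuda")  # filled with per-utterance -log p(y|x)

# Hypothetical padded target labels, flattened, plus per-utterance lengths.
labels = torch.randint(0, V, (B, U - 1), dtype=torch.int64, device="cuda").flatten()
label_lengths = torch.tensor([U - 1, U - 2], dtype=torch.int64, device="cuda")
input_lengths = torch.tensor([T, T - 1], dtype=torch.int64, device="cuda")

status = rnnt.compute_cost_and_score(acts, grads, costs, labels, label_lengths, input_lengths)
loss = costs.sum()  # costs now holds the negative log-likelihoods
```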
cost_and_grad(acts: Tensor, grads: Tensor, costs: Tensor, pad_labels: Tensor, label_lengths: Tensor, input_lengths: Tensor)
Validates that no input is None, then computes both the costs and the gradients by delegating to compute_cost_and_score().
log_softmax(acts: Tensor, denom: Tensor)
Computes the log softmax denominator of the input activation tensor and stores the result in denom.
- Parameters:
- acts – Activation tensor of shape [B, T, U, V+1]. The input must be represented as a flat tensor of shape [B * T * U * (V+1)] to allow pointer indexing.
- denom – A zero tensor with one element per (b, t, u) position, i.e. a flat tensor of length B * T * U, which will receive the log softmax denominators.
Updates: This kernel updates the denom tensor in place.
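For intuition, here is a plain-PyTorch sketch of the quantity the reduction stores; it assumes the convention that the per-position log-probability consumed by the forward/backward kernels is acts + denom. This is a reference computation, not the CUDA kernel itself.

```python
import torch

B, T, U, V = 2, 8, 4, 29
acts = torch.randn(B, T, U, V + 1)

# denom[b, t, u] = -log(sum_v exp(acts[b, t, u, v])), so that
# acts[b, t, u, v] + denom[b, t, u] is the per-position log-softmax.
denom = -torch.logsumexp(acts, dim=-1)

assert torch.allclose(
    acts + denom.unsqueeze(-1),
    torch.log_softmax(acts, dim=-1),
    atol=1e-5,
)
```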
score_forward(acts: Tensor, costs: Tensor, pad_labels: Tensor, label_lengths: Tensor, input_lengths: Tensor)
Validates that no input is None, then computes only the costs (no gradients), delegating to compute_cost_and_score() with grads set to None.
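A hypothetical scoring-only call, reusing the tensors from the training sketch above; no gradient buffer is needed.

```python
costs.zero_()  # reuse the costs buffer from the earlier sketch
status = rnnt.score_forward(acts, costs, labels, label_lengths, input_lengths)
print(costs)   # per-utterance negative log-likelihoods
```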