espnet2.asr.transducer.rnnt_multi_blank.utils.cpu_utils.cpu_rnnt.CPURNNT


class espnet2.asr.transducer.rnnt_multi_blank.utils.cpu_utils.cpu_rnnt.CPURNNT(minibatch: int, maxT: int, maxU: int, alphabet_size: int, workspace: Tensor, blank: int, fastemit_lambda: float, clamp: float, num_threads: int, batch_first: bool)

Bases: object

Helper class to compute the Transducer Loss on CPU.

Parameters:
- minibatch – Size of the minibatch b.
- maxT – The maximum possible acoustic sequence length. Represents T in the logprobs tensor.
- maxU – The maximum possible target sequence length. Represents U in the logprobs tensor.
- alphabet_size – The vocabulary dimension V+1 (inclusive of RNNT blank).
- workspace – An allocated chunk of memory that will be sliced off and reshaped into required blocks used as working memory.
- blank – Index of the RNNT blank token in the vocabulary. Generally the first or last token in the vocab.
- fastemit_lambda – Float scaling factor for FastEmit regularization. Refer to FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization.
- clamp – Float value. When set to a value >= 0.0, the gradient is clamped to [-clamp, clamp].
- num_threads – Number of OMP threads to launch.
- batch_first – Bool that decides whether the batch dimension is first or third.
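Because the tensors passed to this class are flattened, every element is addressed by a single offset. The sketch below shows how such an offset could be computed for both `batch_first` settings. It assumes that `batch_first=True` means a contiguous row-major layout [B, T, U, V+1] and that `batch_first=False` means [T, U, B, V+1] (batch dimension third); `flat_offset` is a hypothetical helper for illustration, not a method of `CPURNNT`.

```python
def flat_offset(b, t, u, v, minibatch, maxT, maxU, alphabet_size, batch_first):
    """Hypothetical helper: offset of element (b, t, u, v) in a flat buffer.

    Assumption: batch_first=True  -> row-major layout [B, T, U, V+1]
                batch_first=False -> row-major layout [T, U, B, V+1]
    """
    if batch_first:
        return ((b * maxT + t) * maxU + u) * alphabet_size + v
    # Batch dimension sits third, between U and the vocabulary axis.
    return ((t * maxU + u) * minibatch + b) * alphabet_size + v


# minibatch=2, maxT=3, maxU=4, alphabet_size=5
print(flat_offset(1, 0, 0, 0, 2, 3, 4, 5, True))   # -> 60
print(flat_offset(1, 0, 0, 0, 2, 3, 4, 5, False))  # -> 5
```

With the batch dimension first, each utterance occupies one contiguous block of maxT * maxU * (V+1) values; with it third, the values for all utterances at a given (t, u) are interleaved.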

compute_alphas(log_probs: Tensor, T: int, U: int, alphas: Tensor)

Compute the probability of the forward variable alpha.

Parameters:
- log_probs – Flattened tensor [B, T, U, V+1]
- T – Length of the acoustic sequence T (not padded).
- U – Length of the target sequence U (not padded).
- alphas – Working space memory for alpha of shape [B, T, U].
Returns: Loglikelihood of the forward variable alpha.
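The forward recursion can be sketched for a single utterance in plain Python. This assumes the standard RNN-T lattice: `log_probs` is a nested list indexed `[t][u][v]`, `U = len(labels) + 1` (the extra row is the start position before any label), and every path ends with a blank emitted from node (T-1, U-1). The `log_add` helper and this `compute_alphas` signature are illustrative only; the real method works on flattened tensors and workspace memory.

```python
import math


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b)) that tolerates -inf."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def compute_alphas(log_probs, labels, blank):
    """Forward variables for one utterance (illustrative sketch).

    Assumptions: log_probs is a nested list indexed [t][u][v] with
    U = len(labels) + 1 entries per frame, and every path ends with a
    blank emitted from the last lattice node (T-1, U-1).
    Returns (alphas as a [T][U] list, log-likelihood of the labels).
    """
    T, U = len(log_probs), len(labels) + 1
    alphas = [[-math.inf] * U for _ in range(T)]
    alphas[0][0] = 0.0  # log(1): every path starts at (0, 0)
    for t in range(T):
        for u in range(U):
            if t == 0 and u == 0:
                continue
            # Reach (t, u) by emitting blank at (t-1, u): advance one frame.
            from_blank = (
                alphas[t - 1][u] + log_probs[t - 1][u][blank] if t > 0 else -math.inf
            )
            # Reach (t, u) by emitting label u-1 at (t, u-1): advance one token.
            from_label = (
                alphas[t][u - 1] + log_probs[t][u - 1][labels[u - 1]]
                if u > 0
                else -math.inf
            )
            alphas[t][u] = log_add(from_blank, from_label)
    # A final blank at (T-1, U-1) terminates every path.
    loglike = alphas[T - 1][U - 1] + log_probs[T - 1][U - 1][blank]
    return alphas, loglike
```

Filling the lattice in increasing (t, u) order guarantees that both predecessors of a node are ready before it is visited; the loss for the utterance is then `-loglike`.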

compute_betas_and_grads(grad: Tensor, log_probs: Tensor, T: int, U: int, alphas: Tensor, betas: Tensor, labels: Tensor, logll: Tensor)

Compute the backward variable beta as well as the gradients of the activation matrix with respect to the loglikelihood of the forward variable.

Parameters:
- grad – Working space memory of flattened shape [B, T, U, V+1]
- log_probs – Activation tensor of flattened shape [B, T, U, V+1]
- T – Length of the acoustic sequence T (not padded).
- U – Length of the target sequence U (not padded).
- alphas – Working space memory for alpha of shape [B, T, U].
- betas – Working space memory for beta of shape [B, T, U].
- labels – Ground-truth labels of shape [B, U]
- logll – Loglikelihood of the forward variable.
Returns: Loglikelihood of the forward variable. The grad tensor is updated in place.
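The backward pass and the gradient can be sketched for one utterance in the same plain-Python style. Assumptions: `log_probs` is a nested list indexed `[t][u][v]` with `U = len(labels) + 1`, the log-probabilities are treated as free variables, and the returned `grad` is the gradient of the negative log-likelihood, i.e. minus the posterior probability of each lattice transition, `-exp(alpha[t][u] + log_probs[t][u][k] + beta[next] - loglike)`. The optional `fastemit_lambda` scaling of label-transition gradients by `(1 + fastemit_lambda)` is our reading of the FastEmit paper, not a confirmed description of how `CPURNNT` applies it. All function names here are hypothetical.

```python
import math


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b)) that tolerates -inf."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def compute_alphas(log_probs, labels, blank):
    """Forward variables alpha[t][u] in the log domain."""
    T, U = len(log_probs), len(labels) + 1
    alphas = [[-math.inf] * U for _ in range(T)]
    alphas[0][0] = 0.0
    for t in range(T):
        for u in range(U):
            if t == 0 and u == 0:
                continue
            a = alphas[t - 1][u] + log_probs[t - 1][u][blank] if t > 0 else -math.inf
            b = alphas[t][u - 1] + log_probs[t][u - 1][labels[u - 1]] if u > 0 else -math.inf
            alphas[t][u] = log_add(a, b)
    return alphas, alphas[T - 1][U - 1] + log_probs[T - 1][U - 1][blank]


def compute_betas_and_grads(log_probs, labels, blank, alphas, fastemit_lambda=0.0):
    """Backward variables and d(-loglike)/d(log_probs) (illustrative sketch)."""
    T, U = len(log_probs), len(labels) + 1
    betas = [[-math.inf] * U for _ in range(T)]
    # The last node can only emit the terminating blank.
    betas[T - 1][U - 1] = log_probs[T - 1][U - 1][blank]
    for t in range(T - 1, -1, -1):
        for u in range(U - 1, -1, -1):
            if t == T - 1 and u == U - 1:
                continue
            via_blank = betas[t + 1][u] + log_probs[t][u][blank] if t < T - 1 else -math.inf
            via_label = betas[t][u + 1] + log_probs[t][u][labels[u]] if u < U - 1 else -math.inf
            betas[t][u] = log_add(via_blank, via_label)
    loglike = betas[0][0]  # equals the forward log-likelihood

    vocab = len(log_probs[0][0])
    grad = [[[0.0] * vocab for _ in range(U)] for _ in range(T)]
    for t in range(T):
        for u in range(U):
            if t < T - 1:  # blank transition (t, u) -> (t+1, u)
                grad[t][u][blank] = -math.exp(
                    alphas[t][u] + log_probs[t][u][blank] + betas[t + 1][u] - loglike)
            if u < U - 1:  # label transition (t, u) -> (t, u+1)
                # Assumption: FastEmit scales label gradients by (1 + lambda).
                grad[t][u][labels[u]] = -(1.0 + fastemit_lambda) * math.exp(
                    alphas[t][u] + log_probs[t][u][labels[u]] + betas[t][u + 1] - loglike)
    # The terminating blank is used by every path, so its gradient is -1.
    grad[T - 1][U - 1][blank] = -math.exp(
        alphas[T - 1][U - 1] + log_probs[T - 1][U - 1][blank] - loglike)
    return betas, grad, loglike
```

Entries of `grad` that correspond to tokens not on any allowed transition stay at zero, which matches the fact that perturbing those log-probabilities does not change the loglikelihood.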

cost_and_grad(log_probs: Tensor, grads: Tensor, costs: Tensor, flat_labels: Tensor, label_lengths: Tensor, input_lengths: Tensor) → RNNTStatus

cost_and_grad_kernel(log_probs: Tensor, grad: Tensor, labels: Tensor, mb: int, T: int, U: int, bytes_used: int)

score_forward(log_probs: Tensor, costs: Tensor, flat_labels: Tensor, label_lengths: Tensor, input_lengths: Tensor)