espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.reduce.ReduceHelper
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.reduce.ReduceHelper(I_opid: int, R_opid: int, acts: Tensor, output: Tensor, num_rows: int, num_cols: int, minus: bool, stream)
CUDA warp reduction kernel helper that reduces via R_Op.Add and writes
the result to output according to the I_Op ID.
Each block's result is stored at the position given by its blockIdx.
NOTE
Warp reduction is most efficient when the input shape is a power of two (2 ^ K).
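To illustrate why power-of-two sizes suit this kind of reduction, here is a minimal pure-Python sketch of a halving-stride tree sum, the pattern commonly used by warp-level reductions. It is not the kernel in this module. The function name `tree_reduce_sum` is hypothetical, and treating non-power-of-two inputs as padded with idle lanes is an assumption made for illustration.

```python
def tree_reduce_sum(values):
    """Sum `values` with a halving-stride tree (illustrative only).

    Returns (total, idle_lanes), where idle_lanes counts the padding
    slots needed to reach the next power of two.
    """
    lanes = list(values)
    n = len(lanes)

    # Round the lane count up to the next power of two.
    width = 1
    while width < n:
        width *= 2

    # Pad with the additive identity; these lanes contribute no useful work.
    lanes += [0.0] * (width - n)

    # At each step, lane i accumulates the value held by lane i + offset.
    offset = width // 2
    while offset > 0:
        for lane in range(offset):
            lanes[lane] += lanes[lane + offset]
        offset //= 2

    return lanes[0], width - n


total, idle = tree_reduce_sum([1, 2, 3, 4])        # exact power of two
total_odd, idle_odd = tree_reduce_sum([1, 2, 3, 4, 5])  # padded up to 8 lanes
```

In this sketch, an input of length 4 uses every lane, while an input of length 5 leaves three of the eight lanes idle.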
References
- Warp Primitives [https://developer.nvidia.com/blog/using-cuda-warp-level-primitives/]
Parameters:
- I_opid – Operator ID for input. See I_Op for more information.
- R_opid – Operator ID for reduction. See R_Op for more information.
- acts – Flattened activation matrix of shape [B * T * U * (V+1)].
- output – Flattened output matrix of shape [B * T * U * (V+1)]. Data will be overwritten.
- num_rows – Vocabulary size (including blank token) - V+1. Represents the number of threads per block.
- num_cols – Flattened shape of activation matrix, without vocabulary dimension (B * T * U). Represents number of blocks per grid.
- minus – Bool flag selecting whether the reduction adds or subtracts. If minus is set, the _reduce_minus kernel is called; otherwise the _reduce_rows kernel is called.
- stream – CUDA Stream.
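The relationship between `num_rows`, `num_cols`, and the flattened `acts` buffer can be sketched on the CPU as follows. This is a pure-Python illustration of the layout only, assuming the vocabulary dimension is the fastest-varying one, so that each block reads `num_rows` contiguous values. The helper `reference_block_reduce` is hypothetical and does not reproduce the `I_Op`/`R_Op` handling or the `minus` flag of the real kernels.

```python
def reference_block_reduce(acts, num_rows, num_cols, reduce_fn=sum):
    """CPU sketch: one reduction result per block of `num_rows` values.

    acts      : flat sequence of length num_rows * num_cols
    num_rows  : V + 1 (values reduced by each block)
    num_cols  : B * T * U (number of blocks)
    reduce_fn : reduction applied to each block (assumed: sum or max)
    """
    if len(acts) != num_rows * num_cols:
        raise ValueError("acts must contain num_rows * num_cols elements")

    results = []
    for col in range(num_cols):
        # Block `col` covers a contiguous slice of the flattened buffer.
        start = col * num_rows
        results.append(reduce_fn(acts[start:start + num_rows]))
    return results


# Toy sizes: B=1, T=2, U=2, V=2, so num_rows = 3 and num_cols = 4.
B, T, U, V = 1, 2, 2, 2
acts = [float(i) for i in range(B * T * U * (V + 1))]

sums = reference_block_reduce(acts, num_rows=V + 1, num_cols=B * T * U)
maxima = reference_block_reduce(acts, V + 1, B * T * U, reduce_fn=max)
```

Each entry of `sums` or `maxima` corresponds to one block index, mirroring the one-result-per-block behaviour described above.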