espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.reduce.reduce_max
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.reduce.reduce_max(acts: Tensor, denom, rows: int, cols: int, minus: bool, stream)
Helper method to call the Warp Reduction Kernel to perform max reduction.
NOTE
The warp reduction is most efficient when input shapes are powers of two (2 ^ K).
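As a rough illustration of the note above, the following sketch checks whether a size is a power of two and finds the next one up. Both helper names (`is_power_of_two`, `next_power_of_two`) are hypothetical and are not part of ESPnet; whether padding to a power of two is worthwhile for a given model is an assumption to verify by profiling.

```python
def is_power_of_two(n: int) -> bool:
    """Return True when n == 2**K for some integer K >= 0."""
    # A positive power of two has exactly one bit set, so n & (n - 1) is zero.
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (requires n >= 1)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # (n - 1).bit_length() is the number of bits needed to reach n.
    return 1 << (n - 1).bit_length()
```

For example, a vocabulary of 1024 entries (including blank) already satisfies the condition, while 1025 entries would round up to 2048.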
References
- Warp Primitives [https://developer.nvidia.com/blog/using-cuda-warp-level-primitives/]
- Parameters:
- acts – Flattened activation matrix of shape [B * T * U * (V+1)].
- denom – Flattened output matrix of shape [B * T * U * (V+1)]. Data will be overwritten.
- rows – Vocabulary size (including blank token) - V+1. Represents the number of threads per block.
- cols – Flattened shape of activation matrix, without vocabulary dimension (B * T * U). Represents number of blocks per grid.
- minus – Bool flag selecting whether the reduction adds or subtracts. If minus is set, calls the _reduce_minus kernel; otherwise calls the _reduce_rows kernel.
- stream – CUDA Stream.
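The relationship between the tensor dimensions and the `rows` / `cols` arguments can be sketched in plain Python. This is a CPU-side illustration only, not the CUDA kernel: `launch_geometry` and `reference_max` are hypothetical helpers, and the sketch assumes the vocabulary index varies fastest in the flattened buffer, so that each group of `rows` consecutive values belongs to one block.

```python
from typing import List, Tuple


def launch_geometry(B: int, T: int, U: int, V: int) -> Tuple[int, int]:
    """Return (rows, cols) as described in the parameter list.

    rows = V + 1      (vocabulary size including blank; threads per block)
    cols = B * T * U  (flattened non-vocabulary dimensions; blocks per grid)
    """
    return V + 1, B * T * U


def reference_max(acts: List[float], rows: int, cols: int) -> List[float]:
    """Compute one max per block of `rows` consecutive values.

    ASSUMPTION: the flattened buffer stores the vocabulary dimension
    contiguously, so block c covers acts[c * rows : (c + 1) * rows].
    """
    if len(acts) != rows * cols:
        raise ValueError("acts must contain exactly rows * cols values")
    return [max(acts[c * rows:(c + 1) * rows]) for c in range(cols)]
```

A reference like this can be useful for checking kernel output on small inputs, for example comparing `reference_max` against the values written by the GPU reduction for a tiny batch.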