espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.reduce.reduce_max

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.reduce.reduce_max(acts: Tensor, denom, rows: int, cols: int, minus: bool, stream)

Helper method to call the Warp Reduction Kernel to perform max reduction.

Warp reduction is most efficient when the input shape is a power of two (2 ^ K).

Parameters:
- acts – Flattened activation matrix of shape [B * T * U * (V+1)].
- denom – Flattened output matrix of shape [B * T * U * (V+1)]. Data will be overwritten.
- rows – Vocabulary size (including blank token) - V+1. Represents the number of threads per block.
- cols – Flattened shape of the activation matrix without the vocabulary dimension (B * T * U). Represents the number of blocks per grid.
- minus – Boolean flag indicating whether the reduction adds or subtracts. If set, the _reduce_minus kernel is called; otherwise the _reduce_rows kernel is called.
- stream – CUDA Stream.
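
The rows/cols layout described above can be illustrated with a small CPU-only sketch in NumPy. It does not call ESPnet or CUDA. The helper name `reduce_max_reference` is hypothetical, and the sketch assumes that each of the `cols` positions occupies `rows` contiguous entries of the flattened buffer (vocabulary as the fastest-varying dimension) and that the reduction yields one maximum per column. The `minus` path is not modeled.

```python
import numpy as np


def reduce_max_reference(acts: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Hypothetical CPU reference: maximum over the `rows` entries of each column."""
    # Assumption: the flat buffer stores each (b, t, u) position as `rows`
    # contiguous values, so reshaping to (cols, rows) recovers the columns.
    return acts.reshape(cols, rows).max(axis=1)


# Toy sizes: batch B, time T, target length U, vocabulary V (plus one blank).
B, T, U, V = 1, 2, 2, 3
rows = V + 1       # vocabulary size including blank; a power of two here
cols = B * T * U   # number of (b, t, u) positions

# Flattened activations of shape [B * T * U * (V+1)].
acts = np.arange(rows * cols, dtype=np.float32)

maxima = reduce_max_reference(acts, rows, cols)
print(maxima)
```

In this toy setup `rows` is 4, which matches the power-of-two case the description calls efficient. A reference like this may be useful for checking the GPU result on small inputs, provided the layout assumption holds for the buffers being compared.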