espnet.utils.spec_augment.specaug

espnet.utils.spec_augment.specaug(spec, W=5, F=30, T=40, num_freq_masks=2, num_time_masks=2, replace_with_zero=False)

SpecAugment data augmentation: applies time warping, frequency masking, and time masking to a spectrogram.

Reference: SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition (https://arxiv.org/pdf/1904.08779.pdf)

Parameters:
- spec (torch.Tensor) – input tensor with the shape (T, dim), i.e. time frames by feature dimension (this T is the frame count, not the T parameter below)
- W (int) – time warp parameter
- F (int) – maximum width of each freq mask
- T (int) – maximum width of each time mask
- num_freq_masks (int) – number of frequency masks
- num_time_masks (int) – number of time masks
- replace_with_zero (bool) – if True, masked parts are filled with 0; if False, they are filled with the mean
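
To make the masking parameters concrete, here is a minimal sketch of frequency and time masking in plain Python, so it runs without any dependencies. It is not ESPnet's implementation: `mask_spec` and its `rng` argument are hypothetical names, the input is a list of lists rather than a torch.Tensor, time warping (W) is omitted, and taking the fill mean over the whole input is an assumption.

```python
import random


def mask_spec(spec, F=30, T=40, num_freq_masks=2, num_time_masks=2,
              replace_with_zero=False, rng=None):
    """Mask a (frames, dim) list of lists. Hypothetical helper, illustration only."""
    rng = rng or random.Random()
    n_frames = len(spec)
    n_bins = len(spec[0])
    out = [row[:] for row in spec]  # copy so the input is left untouched

    # Fill value: 0, or the mean of the whole input (an assumption of this sketch).
    if replace_with_zero:
        fill = 0.0
    else:
        fill = sum(sum(row) for row in spec) / (n_frames * n_bins)

    # Frequency masks: blank f consecutive feature bins in every frame.
    for _ in range(num_freq_masks):
        f = rng.randint(0, min(F, n_bins))   # mask width, at most F
        f0 = rng.randint(0, n_bins - f)      # first masked bin
        for row in out:
            row[f0:f0 + f] = [fill] * f

    # Time masks: blank t consecutive frames across all feature bins.
    for _ in range(num_time_masks):
        t = rng.randint(0, min(T, n_frames))  # mask width, at most T
        t0 = rng.randint(0, n_frames - t)     # first masked frame
        for i in range(t0, t0 + t):
            out[i] = [fill] * n_bins

    return out


# Usage: 100 frames of 80 feature bins, all ones, masked with zeros.
spec = [[1.0] * 80 for _ in range(100)]
masked = mask_spec(spec, replace_with_zero=True, rng=random.Random(0))
```

Because F and T are maximum widths, each individual mask may be narrower than the limit, or even empty. The output always keeps the input's shape.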