espnet.utils.spec_augment.specaug
espnet.utils.spec_augment.specaug(spec, W=5, F=30, T=40, num_freq_masks=2, num_time_masks=2, replace_with_zero=False)
SpecAugment data augmentation.
Reference: SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition (https://arxiv.org/pdf/1904.08779.pdf)
This implementation is modified from https://github.com/zcaceres/spec_augment
- Parameters:
- spec (torch.Tensor) – input tensor of shape (T, dim); in this shape, T is the number of time frames, not the mask-width parameter T
- W (int) – time warp parameter
- F (int) – maximum width of each freq mask
- T (int) – maximum width of each time mask
- num_freq_masks (int) – number of frequency masks
- num_time_masks (int) – number of time masks
- replace_with_zero (bool) – if True, masked regions are filled with 0; if False, they are filled with the mean
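
To make the masking parameters concrete, here is a minimal, dependency-free sketch of frequency and time masking. It is not the ESPnet implementation: `mask_sketch` and its `rng` argument are hypothetical names, it works on a list of lists rather than a `torch.Tensor`, it omits time warping (`W`) entirely, and it assumes "mean" refers to the mean over the whole input.

```python
import random


def mask_sketch(spec, F=30, T=40, num_freq_masks=2, num_time_masks=2,
                replace_with_zero=False, rng=None):
    """Hypothetical helper: mask random bands of a (frames, dim) matrix."""
    rng = rng or random.Random()
    n_frames, n_dim = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # copy so the input is left untouched

    # Assumption: the non-zero fill value is the mean over the whole input.
    total = sum(sum(row) for row in spec)
    fill = 0.0 if replace_with_zero else total / (n_frames * n_dim)

    # Frequency masks: overwrite a band of feature columns in every frame.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(F, n_dim))   # band is at most F wide
        start = rng.randint(0, n_dim - width)
        for row in out:
            row[start:start + width] = [fill] * width

    # Time masks: overwrite a run of consecutive frames.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(T, n_frames))  # run is at most T long
        start = rng.randint(0, n_frames - width)
        for t in range(start, start + width):
            out[t] = [fill] * n_dim

    return out


# Usage: 100 frames of 80-dimensional features, masked with zeros.
features = [[1.0] * 80 for _ in range(100)]
masked = mask_sketch(features, replace_with_zero=True, rng=random.Random(0))
```

The output has the same shape as the input. Each mask width is drawn independently, so with the defaults at most 2 × 30 feature columns and 2 × 40 frames are overwritten, and fewer when masks overlap or a zero width is drawn.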