espnet2.layers.sinc_conv.SincConv
class espnet2.layers.sinc_conv.SincConv(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1, window_func: str = 'hamming', scale_type: str = 'mel', fs: int | float = 16000)
Bases: Module
Sinc Convolution.
This module performs a convolution that uses Sinc filters in the time domain as kernels. Sinc filters act as band-pass filters in the spectral domain. Because the filtering is done as a convolution in the time domain, no transformation to the spectral domain is necessary.
This implementation of the Sinc convolution is heavily inspired by Ravanelli et al. https://github.com/mravanelli/SincNet, and adapted for the ESPnet toolkit. Combine Sinc convolutions with a log compression activation function, as in: https://arxiv.org/abs/2010.07597
Notes: Currently, the same filters are applied to all input channels. The windowing function is applied to the kernel to obtain a smoother filter, not to the input values, which differs from traditional ASR.
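As a rough illustration of the idea described above, the following pure-Python sketch builds one windowed band-pass kernel as the difference of two ideal low-pass (sinc) responses and applies it by direct convolution in the time domain. The helper names (`bandpass_kernel`, `convolve_valid`, `rms`), the band edges, and the test tones are hypothetical choices made for this example. This is not the ESPnet implementation, which operates on torch tensors.

```python
import math


def bandpass_kernel(f_low, f_high, kernel_size, fs):
    """Windowed band-pass FIR kernel with pass band [f_low, f_high] in Hz."""
    assert kernel_size % 2 == 1 and kernel_size >= 3, "use an odd kernel size"
    half = kernel_size // 2
    kernel = []
    for n in range(-half, half + 1):
        if n == 0:
            # Limit of the expression below as n -> 0.
            h = 2.0 * (f_high - f_low) / fs
        else:
            # Difference of two ideal low-pass (sinc) impulse responses.
            h = (math.sin(2 * math.pi * f_high * n / fs)
                 - math.sin(2 * math.pi * f_low * n / fs)) / (math.pi * n)
        # Hamming window applied to the kernel tap, not to the input signal.
        w = 0.54 + 0.46 * math.cos(math.pi * n / half)
        kernel.append(h * w)
    return kernel


def convolve_valid(signal, kernel):
    """Direct convolution without padding, stride 1."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[k - 1 - j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def rms(values):
    """Root mean square of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


fs = 16000
kernel = bandpass_kernel(1000.0, 2000.0, kernel_size=101, fs=fs)

# One tone inside the pass band (1500 Hz) and one outside it (5000 Hz).
tone_in = [math.sin(2 * math.pi * 1500 * t / fs) for t in range(800)]
tone_out = [math.sin(2 * math.pi * 5000 * t / fs) for t in range(800)]

gain_in = rms(convolve_valid(tone_in, kernel)) / rms(tone_in)
gain_out = rms(convolve_valid(tone_out, kernel)) / rms(tone_out)
print(f"in-band gain: {gain_in:.3f}, out-of-band gain: {gain_out:.5f}")
```

The in-band tone passes with a gain close to 1, while the out-of-band tone is strongly attenuated, without any transform to the spectral domain.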
Initialize Sinc convolutions.
- Parameters:
- in_channels – Number of input channels.
- out_channels – Number of output channels.
- kernel_size – Sinc filter kernel size (needs to be an odd number).
- stride – See torch.nn.functional.conv1d.
- padding – See torch.nn.functional.conv1d.
- dilation – See torch.nn.functional.conv1d.
- window_func – Window function on the filter, one of ["hamming", "none"].
- scale_type – Scale type of the filters (default: "mel").
- fs (str, int, float) – Sample rate of the input data.
forward(xs: Tensor) → Tensor
Sinc convolution forward function.
- Parameters: xs – Batch in form of torch.Tensor (B, C_in, D_in).
- Returns: Batch in form of torch.Tensor (B, C_out, D_out).
- Return type: Tensor
get_odim(idim: int) → int
Obtain the output dimension of the filter.
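A minimal sketch of how such an output length can be computed, assuming the standard output-length formula documented for `torch.nn.Conv1d`. The function name `conv1d_out_len` is hypothetical; its defaults mirror the constructor signature above.

```python
def conv1d_out_len(idim, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution (standard Conv1d formula)."""
    # Number of input samples covered by one (possibly dilated) kernel.
    span = dilation * (kernel_size - 1) + 1
    return (idim + 2 * padding - span) // stride + 1


print(conv1d_out_len(400, kernel_size=101))              # 300
print(conv1d_out_len(400, kernel_size=101, stride=2))    # 150
print(conv1d_out_len(400, kernel_size=101, padding=50))  # 400
```

With an odd kernel size and `padding = kernel_size // 2` at stride 1, the output length equals the input length.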
static hamming_window(x: Tensor) → Tensor
Hamming Windowing function.
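For reference, the textbook symmetric Hamming window can be sketched in plain Python as follows. The function name `hamming` is hypothetical, and the exact indexing and length convention used by `SincConv.hamming_window` may differ from this definition.

```python
import math


def hamming(length):
    """Textbook symmetric Hamming window with `length` taps (length >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


window = hamming(5)
print([round(w, 2) for w in window])  # [0.08, 0.54, 1.0, 0.54, 0.08]
```

The window tapers towards both ends, which smooths the truncated filter when it is multiplied with the kernel.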
init_filters()
Initialize filters with filterbank values.
static none_window(x: Tensor) → Tensor
Identity-like windowing function.
static sinc(x: Tensor) → Tensor
Sinc function.
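A minimal sketch of an unnormalized sinc, sin(x)/x, with the removable singularity at x = 0 handled explicitly. Whether `SincConv.sinc` uses the unnormalized or the normalized form, and how it treats x = 0, is not stated here, so the convention below is an assumption; `sinc_unnormalized` is a hypothetical name.

```python
import math


def sinc_unnormalized(x):
    """sin(x) / x, returning the limit value 1.0 at x == 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


print(sinc_unnormalized(0.0))  # 1.0
```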