espnet2.asr.frontend.asteroid_frontend.AsteroidFrontend
class espnet2.asr.frontend.asteroid_frontend.AsteroidFrontend(sinc_filters: int = 256, sinc_kernel_size: int = 251, sinc_stride: int = 16, preemph_coef: float = 0.97, log_term: float = 1e-06)
Bases: AbsFrontend
Asteroid Filterbank Frontend.
Provides a sinc-convolution-based audio feature extractor. The same functionality can be achieved by combining the sliding_window frontend with the sinc preencoder.
NOTE(jiatong): this frontend is used in sentence-level classification tasks (e.g., speaker (spk) recognition). Other uses have not been fully investigated.
NOTE(jeeweon): this function implements the parameterized analytic filterbank layer in M. Pariente, S. Cornell, A. Deleforge and E. Vincent, “Filterbank design for end-to-end speech separation,” in Proc. ICASSP, 2020
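To illustrate what a sinc-parameterized filter looks like, the sketch below builds one Hamming-windowed sinc band-pass kernel from just two cutoff frequencies, using the classic windowed-sinc design that sinc-convolution layers parameterize. It rests on several assumptions: a 16 kHz sampling rate, fixed (not learned) cutoffs, a Hamming window, and the hypothetical helper names `sinc_bandpass_kernel` and `gain_at`. In the actual layer the cutoffs are trainable, and the analytic filterbank from the cited paper differs in details this sketch omits.

```python
import math


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def sinc_bandpass_kernel(low_hz: float, high_hz: float,
                         kernel_size: int = 251, sample_rate: int = 16000) -> list:
    """Hypothetical helper: Hamming-windowed sinc band-pass between low_hz and high_hz."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the filter is centred on zero")
    f1 = low_hz / sample_rate   # lower cutoff in cycles per sample
    f2 = high_hz / sample_rate  # upper cutoff in cycles per sample
    half = (kernel_size - 1) // 2
    kernel = []
    for m in range(kernel_size):
        n = m - half  # time index centred on zero
        # difference of two ideal low-pass responses = ideal band-pass response
        ideal = 2 * f2 * sinc(2 * f2 * n) - 2 * f1 * sinc(2 * f1 * n)
        # Hamming window truncates the infinite ideal response smoothly
        window = 0.54 - 0.46 * math.cos(2 * math.pi * m / (kernel_size - 1))
        kernel.append(ideal * window)
    return kernel


def gain_at(kernel: list, freq_hz: float, sample_rate: int = 16000) -> float:
    """Hypothetical helper: magnitude response of a zero-centred symmetric kernel."""
    half = (len(kernel) - 1) // 2
    f = freq_hz / sample_rate
    # for a symmetric kernel the frequency response is purely real (a cosine sum)
    return abs(sum(h * math.cos(2 * math.pi * f * (m - half))
                   for m, h in enumerate(kernel)))


h = sinc_bandpass_kernel(300.0, 1000.0)
print(round(gain_at(h, 0.0), 3), round(gain_at(h, 650.0), 3))
```

Because each filter is described by only two cutoffs rather than by every tap, a sinc layer has very few parameters per filter while still producing interpretable band-pass responses.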
Initialize.
- Parameters:
- sinc_filters – the number of sinc filters.
- sinc_kernel_size – the kernel size of the sinc filters.
- sinc_stride – the stride (in samples) of the first sinc-conv layer. It determines the temporal compression rate: the output frame rate is the sampling rate divided by the stride.
- preemph_coef – the coefficient for pre-emphasis.
- log_term – a small constant added before taking the logarithm to prevent log(0) = -inf.
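As a rough guide to how sinc_kernel_size and sinc_stride interact, the arithmetic below counts frames for an unpadded ("valid") strided convolution and computes the resulting frame rate. The 16 kHz sampling rate, the no-padding assumption, and the helper names `num_frames` and `frame_rate_hz` are illustrative choices of this sketch; the frontend's actual padding may change the count slightly.

```python
def num_frames(num_samples: int, kernel_size: int = 251, stride: int = 16) -> int:
    """Frames produced by an unpadded strided convolution (assumption: no padding)."""
    if num_samples < kernel_size:
        return 0  # the kernel does not fit even once
    return (num_samples - kernel_size) // stride + 1


def frame_rate_hz(sample_rate: int, stride: int = 16) -> float:
    """One output frame every `stride` samples."""
    return sample_rate / stride


# one second of 16 kHz audio with the default kernel size and stride
print(num_frames(16000), frame_rate_hz(16000))
```

With the defaults, each frame spans 251 samples (about 15.7 ms at 16 kHz) and a new frame starts every 16 samples (1 ms), i.e., roughly 1000 frames per second.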
forward(input: Tensor, input_length: Tensor) → Tuple[Tensor, Tensor]
Apply the Asteroid filterbank frontend to the input.
- Parameters:
- input – Input (B, T).
- input_length – Input length (B,).
- Returns: Frame-wise output (B, T’, D) and the corresponding output lengths (B,).
- Return type: Tuple[Tensor, Tensor]
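The toy, pure-Python sketch below follows the data flow for a single utterance: pre-emphasis (controlled by preemph_coef), a strided convolution with a bank of D kernels, and log-magnitude compression (stabilized by log_term), mapping a waveform of length T to a (T', D) feature matrix. It assumes that order of operations and uses the hypothetical helpers `preemphasis` and `toy_frontend`. The real frontend works on batched torch tensors with learned sinc kernels and may include normalization steps this sketch omits.

```python
import math
from typing import List, Sequence


def preemphasis(x: Sequence[float], coef: float = 0.97) -> List[float]:
    """y[n] = x[n] - coef * x[n-1]; this sketch simply keeps y[0] = x[0]."""
    if not x:
        return []
    return [x[0]] + [x[n] - coef * x[n - 1] for n in range(1, len(x))]


def toy_frontend(x: Sequence[float], kernels: List[List[float]], stride: int,
                 preemph_coef: float = 0.97, log_term: float = 1e-6) -> List[List[float]]:
    """Hypothetical: map one waveform (T,) to frame-wise features (T', D), D = len(kernels)."""
    y = preemphasis(x, preemph_coef)
    k = len(kernels[0])
    frames = []
    # slide a window of k samples with the given stride (no padding)
    for start in range(0, len(y) - k + 1, stride):
        segment = y[start:start + k]
        # one feature per kernel: log magnitude of its response on this segment
        frames.append([
            math.log(abs(sum(h * s for h, s in zip(kernel, segment))) + log_term)
            for kernel in kernels
        ])
    return frames


x = [math.sin(0.3 * n) for n in range(100)]
kernels = [[1.0, 0.0, -1.0, 0.5, 0.25], [0.2] * 5, [0.0, 1.0, 0.0, 0.0, 0.0]]
feats = toy_frontend(x, kernels, stride=4)
print(len(feats), len(feats[0]))  # T' frames, D features per frame
```

Applying `toy_frontend` to each of the B utterances in a batch yields the (B, T', D) layout described above. Silent input maps to log(log_term) instead of -inf, which is the purpose of that constant.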
output_size() → int
Return the size of the output feature dimension D.