espnet2.asr.frontend.asteroid_frontend.AsteroidFrontend

class espnet2.asr.frontend.asteroid_frontend.AsteroidFrontend(sinc_filters: int = 256, sinc_kernel_size: int = 251, sinc_stride: int = 16, preemph_coef: float = 0.97, log_term: float = 1e-06)

Bases: AbsFrontend

Asteroid Filterbank Frontend.

Provides a sinc-convolution-based audio feature extractor. The same functionality can be achieved by combining the sliding_window frontend with the sinc preencoder.

NOTE(jiatong): this function is used in sentence-level classification tasks (e.g., spk). Other usages are not fully investigated.

NOTE(jeeweon): this function implements the parameterized analytic filterbank layer in M. Pariente, S. Cornell, A. Deleforge and E. Vincent, “Filterbank design for end-to-end speech separation,” in Proc. ICASSP, 2020.

Initialize.

Parameters:
- sinc_filters – the number of sinc filters.
- sinc_kernel_size – the kernel size of the sinc filters.
- sinc_stride – the stride of the first sinc-conv layer, which determines the compression rate (Hz).
- preemph_coef – the coefficient for pre-emphasis.
- log_term – a small constant added before taking the logarithm to prevent infinite values.
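
As a rough illustration of what preemph_coef and log_term control, the following pure-Python sketch applies first-order pre-emphasis and a floored logarithm to a short list of samples. The helper names preemphasis and log_compress are hypothetical and are not part of ESPnet. The handling of the first sample is an assumption, and the actual frontend operates on batched tensors rather than Python lists.

```python
import math


def preemphasis(samples, coef=0.97):
    """First-order pre-emphasis: y[t] = x[t] - coef * x[t-1].

    Assumption: the first sample is passed through unchanged.
    """
    if not samples:
        return []
    out = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - coef * prev)
    return out


def log_compress(magnitudes, log_term=1e-6):
    """Log of non-negative magnitudes; log_term keeps log(0) finite."""
    return [math.log(abs(m) + log_term) for m in magnitudes]


# A constant signal is strongly attenuated; the final drop to zero is kept.
emphasized = preemphasis([1.0, 1.0, 1.0, 0.0])

# Without log_term, the zero magnitude would produce a math domain error.
compressed = log_compress([0.0, 1.0])
```

A larger log_term raises the floor of the log output, which trades dynamic range in quiet regions for numerical stability.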

forward(input: Tensor, input_length: Tensor) → Tuple[Tensor, Tensor]

Apply the Asteroid filterbank frontend to the input.

Parameters:
- input – Input (B, T).
- input_length – Input length (B,).
Returns: Frame-wise output (B, T’, D).
Return type: Tensor
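
The relationship between the input length T and the number of output frames T’ can be estimated with the standard output-length formula for a strided 1-D convolution. The sketch below assumes no padding and no dilation, which may not match the frontend's internal configuration exactly. num_frames is a hypothetical helper, not an ESPnet function, and the 16 kHz sample rate is chosen only for illustration.

```python
def num_frames(num_samples, kernel_size=251, stride=16):
    """Output length of an unpadded, undilated strided 1-D convolution."""
    if num_samples < kernel_size:
        return 0
    return (num_samples - kernel_size) // stride + 1


# One second of audio at an assumed 16 kHz sample rate.
frames = num_frames(16000)
```

Under these assumptions, the default sinc_stride of 16 yields roughly 1000 frames per second of 16 kHz audio, slightly fewer once the kernel length is accounted for.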

output_size() → int

Return output length of feature dimension D.