espnet2.asr.frontend.windowing.SlidingWindow



class espnet2.asr.frontend.windowing.SlidingWindow(win_length: int = 400, hop_length: int = 160, channels: int = 1, padding: int | None = None, fs=None)

Bases: AbsFrontend

Sliding Window.

Provides a sliding window over a batched, continuous raw audio tensor. Padding is intended to be optional but is currently not implemented. Combine this module with a pre-encoder that accepts raw audio data, for example Sinc convolutions.

Known issues:
- The output length is calculated incorrectly if the audio is shorter than win_length.
- WARNING: trailing values that do not fill a complete frame are discarded, because padding is not implemented yet.
- No window function is applied to the input values; each frame is a plain (rectangular) slice of the signal.
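To see why trailing samples are lost and how a short input can produce a bad length, here is a small pure-Python sketch. It assumes the frame count follows the usual no-padding formula floor((L - win_length) / hop_length) + 1 for a signal of L samples; num_frames and trailing_samples are hypothetical helpers, not part of ESPnet.

```python
def num_frames(num_samples: int, win_length: int = 400, hop_length: int = 160) -> int:
    """Number of full frames, under the assumed no-padding formula."""
    # Python's // floors toward negative infinity, so short inputs go negative.
    return (num_samples - win_length) // hop_length + 1


def trailing_samples(num_samples: int, win_length: int = 400, hop_length: int = 160) -> int:
    """Samples after the end of the last full frame (meaningful for L >= win_length)."""
    n = num_frames(num_samples, win_length, hop_length)
    last_end = (n - 1) * hop_length + win_length
    return num_samples - last_end


print(num_frames(16000))        # 98
print(trailing_samples(16000))  # 80 samples are dropped
print(num_frames(300))          # 0: no full frame fits
print(num_frames(100))          # -1: a negative "length"
```

Under this assumed formula the count reaches zero and then turns negative as the audio gets shorter than win_length, which is one way the reported miscount can arise.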

Initialize.

Parameters:
- win_length – Length of each frame, in samples (default 400).
- hop_length – Shift between the starting points of consecutive frames, in samples (default 160).
- channels – Number of input channels.
- padding – Padding (placeholder, currently not implemented).
- fs – Sampling rate (placeholder for compatibility, not used).
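A minimal sketch of how win_length and hop_length lay frames over a signal, assuming only full frames are kept (padding is not implemented). frame_spans and samples_to_ms are hypothetical helpers. Since fs is not used by the module, the millisecond conversion assumes a 16 kHz signal, at which the defaults correspond to a 25 ms window with a 10 ms hop.

```python
from typing import List, Tuple


def frame_spans(num_samples: int, win_length: int = 400, hop_length: int = 160) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of every full frame."""
    spans = []
    start = 0
    # A frame is kept only if it fits entirely inside the signal.
    while start + win_length <= num_samples:
        spans.append((start, start + win_length))
        start += hop_length
    return spans


def samples_to_ms(num_samples: int, fs: int) -> float:
    """Convert a length in samples to milliseconds at sampling rate fs."""
    return 1000.0 * num_samples / fs


print(frame_spans(1000))  # [(0, 400), (160, 560), (320, 720), (480, 880)]
print(samples_to_ms(400, 16000), samples_to_ms(160, 16000))  # 25.0 10.0
```

With the defaults, consecutive frames overlap by win_length - hop_length = 240 samples.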

forward(input: Tensor, input_lengths: Tensor) → Tuple[Tensor, Tensor]

Apply a sliding window on the input.

Parameters:
- input – Raw audio input of shape (B, T, C*D) or (B, T*C*D), with D=C=1, i.e. effectively (B, T, 1) or (B, T).
- input_lengths – Length of each utterance within the batch, in samples.
Returns:
- Tensor – Output with dimensions (B, T, C, D), with D=win_length; T now indexes frames rather than samples.
- Tensor – Output lengths within batch, in frames.
Return type: Tuple[Tensor, Tensor]
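The shape bookkeeping of forward can be illustrated with a stand-alone PyTorch sketch. It is not ESPnet's implementation: sliding_window is a hypothetical helper, it assumes a single channel (C=1), and it assumes frames are cut with torch.Tensor.unfold and lengths computed with the no-padding formula floor((L - win_length) / hop_length) + 1.

```python
from typing import Tuple

import torch


def sliding_window(
    x: torch.Tensor, lengths: torch.Tensor, win_length: int = 400, hop_length: int = 160
) -> Tuple[torch.Tensor, torch.Tensor]:
    """Hypothetical helper: frame a single-channel waveform batch."""
    B, T = x.shape[0], x.shape[1]
    # Accept (B, T) or (B, T, 1) and normalize to (B, T, C) with C=1.
    x = x.reshape(B, T, 1)
    # unfold the time axis: (B, T, 1) -> (B, T', 1, win_length);
    # samples after the last full frame are dropped.
    frames = x.unfold(1, win_length, hop_length)
    # Per-utterance frame counts with explicit floor division.
    out_lengths = torch.div(lengths - win_length, hop_length, rounding_mode="floor") + 1
    return frames, out_lengths


x = torch.randn(2, 16000)
lengths = torch.tensor([16000, 12000])
frames, n = sliding_window(x, lengths)
print(frames.shape)  # torch.Size([2, 98, 1, 400])
print(n)             # tensor([98, 73])
```

Frame t covers samples t * hop_length up to t * hop_length + win_length, so frames[0, 1, 0] equals x[0, 160:560].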

output_size() → int

Return the size of the feature dimension D, i.e. the window length (win_length).