espnet2.asr.frontend.windowing.SlidingWindow
class espnet2.asr.frontend.windowing.SlidingWindow(win_length: int = 400, hop_length: int = 160, channels: int = 1, padding: int | None = None, fs=None)
Bases: AbsFrontend
Sliding Window.
Provides a sliding window over a batched continuous raw audio tensor. Optionally provides padding (currently not implemented). Combine this module with a pre-encoder compatible with raw audio data, for example Sinc convolutions.
Known issues:
- The output length is calculated incorrectly if the audio is shorter than win_length.
- WARNING: trailing values are discarded, because padding is not implemented yet.
- No additional window function is currently applied to the input values.
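The effect of the missing padding can be illustrated with a small pure-Python sketch. It assumes the conventional no-padding framing rule, frames = (T - win_length) // hop_length + 1; this is an assumption about the behaviour, not a quotation of the module's code. `frame_count` and `frame_signal` are hypothetical helpers introduced only for this illustration.

```python
def frame_count(num_samples: int, win_length: int = 400, hop_length: int = 160) -> int:
    """Number of full frames under the assumed no-padding rule."""
    return (num_samples - win_length) // hop_length + 1


def frame_signal(samples, win_length, hop_length):
    """Cut a 1-D sequence into full frames; leftover trailing samples are dropped."""
    n = max(frame_count(len(samples), win_length, hop_length), 0)
    return [samples[i * hop_length : i * hop_length + win_length] for i in range(n)]


# One second of 16 kHz audio with the default settings.
print(frame_count(16000))  # 98

# 11 samples, window 4, hop 3: frames start at 0, 3 and 6; sample 10 is discarded.
frames = frame_signal(list(range(11)), win_length=4, hop_length=3)
print(frames)  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]

# Audio shorter than the window: the assumed rule yields a non-positive count.
print(frame_count(300))  # 0
print(frame_count(100))  # -1
```

Under this rule, an input shorter than win_length produces a zero or negative length rather than an error, which is one plausible reading of the first known issue.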
Initialize.
- Parameters:
- win_length – Length of frame.
- hop_length – Hop length, i.e. the offset from the start of one frame to the start of the next.
- channels – Number of input channels.
- padding – Padding (placeholder, currently not implemented).
- fs – Sampling rate (placeholder for compatibility, not used).
forward(input: Tensor, input_lengths: Tensor) → Tuple[Tensor, Tensor]
Apply a sliding window on the input.
- Parameters:
- input – Input (B, T, C*D) or (B, T*C*D), with D=C=1.
- input_lengths – Input lengths within batch.
- Returns: Output with dimensions (B, T, C, D), where T is the number of frames and D=win_length, together with the output lengths within the batch.
- Return type: Tuple[Tensor, Tensor]
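The shape change described for forward() can be reproduced with plain PyTorch. The following is a minimal sketch built on `Tensor.unfold`, not the module's actual implementation; the variable names and the length formula (the usual no-padding rule) are assumptions made for illustration.

```python
import torch

B, T, C = 2, 1000, 1                 # batch size, samples, channels
win_length, hop_length = 400, 160    # the documented defaults

x = torch.randn(B, T, C)             # raw audio, (B, T, C*D) with D=1
x_lengths = torch.tensor([1000, 800])

# Tensor.unfold(dimension, size, step) slides a window along one dimension
# and appends the window contents as a new last dimension.
windows = x.permute(0, 2, 1).unfold(2, win_length, hop_length)  # (B, C, T', D)
output = windows.permute(0, 2, 1, 3).contiguous()               # (B, T', C, D)

# Assumed no-padding length rule: (length - win_length) // hop_length + 1.
output_lengths = torch.div(x_lengths - win_length, hop_length, rounding_mode="floor") + 1

print(output.shape)     # torch.Size([2, 4, 1, 400])
print(output_lengths)   # tensor([4, 3])
```

Frame i of the output covers input samples i * hop_length up to i * hop_length + win_length, so consecutive frames overlap by win_length - hop_length samples.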
output_size() → int
Return output length of feature dimension D, i.e. the window length.