espnet2.enh.encoder.stft_encoder.STFTEncoder
class espnet2.enh.encoder.stft_encoder.STFTEncoder(n_fft: int = 512, win_length: int | None = None, hop_length: int = 128, window='hann', center: bool = True, normalized: bool = False, onesided: bool = True, use_builtin_complex: bool = True, default_fs: int = 16000, spec_transform_type: str | None = None, spec_factor: float = 0.15, spec_abs_exponent: float = 0.5)
Bases: AbsEncoder
STFT encoder for speech enhancement and separation
Initializes internal Module state, shared by both nn.Module and ScriptModule.
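A minimal construction sketch, assuming the module path shown in the class signature above; the parameter values are only the illustrative defaults:

```python
from espnet2.enh.encoder.stft_encoder import STFTEncoder

# 512-point STFT with a 128-sample hop and Hann window (the defaults above).
encoder = STFTEncoder(n_fft=512, hop_length=128, window="hann", default_fs=16000)

# With onesided=True, the feature dimension is expected to be n_fft // 2 + 1 = 257.
print(encoder.output_dim)
```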
forward(input: Tensor, ilens: Tensor, fs: int = None)
Forward.
- Parameters:
- input (torch.Tensor) – mixed speech [Batch, sample]
- ilens (torch.Tensor) – input lengths [Batch]
- fs (int) – sampling rate in Hz. If not None, reconfigure the STFT window and hop lengths for the new sampling rate while keeping their durations fixed.
- Returns:
  - spectrum (ComplexTensor) – [Batch, T, (C,) F]
  - flens (torch.Tensor) – [Batch]
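A hedged usage sketch of forward based on the shapes documented above; random tensors stand in for real mixtures:

```python
import torch
from espnet2.enh.encoder.stft_encoder import STFTEncoder

encoder = STFTEncoder(n_fft=512, hop_length=128)

mix = torch.randn(2, 16000)            # [Batch, sample] mixed speech
ilens = torch.tensor([16000, 12000])   # [Batch] valid lengths per utterance

# spectrum: complex tensor of shape [Batch, T, F]; flens: [Batch] frame counts
spectrum, flens = encoder(mix, ilens)

# Passing fs reconfigures the window/hop lengths for 8 kHz input while keeping
# their durations (in seconds) unchanged.
spectrum_8k, flens_8k = encoder(torch.randn(2, 8000), torch.tensor([8000, 8000]), fs=8000)
```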
forward_streaming(input: Tensor)
Forward.
- Parameters:input (torch.Tensor) – mixed speech [Batch, frame_length]
- Returns: output spectrum of shape [Batch, 1, F]
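A small sketch of forward_streaming on a single frame; here the frame length is assumed to equal win_length (which defaults to n_fft):

```python
import torch
from espnet2.enh.encoder.stft_encoder import STFTEncoder

encoder = STFTEncoder(n_fft=512, hop_length=128)

frame = torch.randn(2, 512)                    # [Batch, frame_length] single frame
spec_frame = encoder.forward_streaming(frame)  # expected shape: [Batch, 1, F]
print(spec_frame.shape)
```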
property output_dim : int
spec_transform_func(spec)
streaming_frame(audio)
streaming_frame. Splits continuous audio into frame-level chunks for streaming simulation. Note that this function takes the entire long audio as input for the simulation; in a real streaming application, it can serve as a reference for managing the streaming input buffer. A combined example follows the list below.
- Parameters: audio – continuous audio of shape (B, T)
- Returns: list of frame-level chunks, each of shape (B, frame_size)
- Return type: chunked (list)
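A streaming-simulation sketch that combines streaming_frame with forward_streaming, as suggested by the description above (shapes are illustrative):

```python
import torch
from espnet2.enh.encoder.stft_encoder import STFTEncoder

encoder = STFTEncoder(n_fft=512, hop_length=128)

audio = torch.randn(1, 16000)            # (B, T) entire long audio for the simulation

# Split the full utterance into frame-level chunks, then encode frame by frame,
# as one would when managing a streaming input buffer.
frames = encoder.streaming_frame(audio)   # list of (B, frame_size) chunks
spec_frames = [encoder.forward_streaming(f) for f in frames]

print(len(spec_frames), spec_frames[0].shape)  # each chunk -> [B, 1, F]
```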