espnet2.enh.encoder.stft_encoder.STFTEncoder
class espnet2.enh.encoder.stft_encoder.STFTEncoder(n_fft: int = 512, win_length: int | None = None, hop_length: int = 128, window='hann', center: bool = True, normalized: bool = False, onesided: bool = True, use_builtin_complex: bool = True, default_fs: int = 16000, spec_transform_type: str | None = None, spec_factor: float = 0.15, spec_abs_exponent: float = 0.5)
Bases: AbsEncoder
STFT encoder for speech enhancement and separation
Initializes internal Module state, shared by both nn.Module and ScriptModule.
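A minimal construction sketch, assuming the module path shown in the class signature above; the parameter values are only the illustrative defaults:

```python
from espnet2.enh.encoder.stft_encoder import STFTEncoder

# 512-point STFT with a 128-sample hop and Hann window (the defaults above).
encoder = STFTEncoder(n_fft=512, hop_length=128, window="hann", default_fs=16000)

# With onesided=True, the feature dimension is expected to be n_fft // 2 + 1 = 257.
print(encoder.output_dim)
```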
forward(input: Tensor, ilens: Tensor, fs: int = None)
Forward.
- Parameters:
- input (torch.Tensor) – mixed speech [Batch, sample]
- ilens (torch.Tensor) – input lengths [Batch]
- fs (int) – sampling rate in Hz. If not None, reconfigure the STFT window and hop lengths for the new sampling rate while keeping their durations fixed.
- Returns:
  - spectrum (ComplexTensor) – [Batch, T, (C,) F]
  - flens (torch.Tensor) – [Batch]
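A hedged usage sketch of forward based on the shapes documented above; random tensors stand in for real mixtures:

```python
import torch
from espnet2.enh.encoder.stft_encoder import STFTEncoder

encoder = STFTEncoder(n_fft=512, hop_length=128)

mix = torch.randn(2, 16000)            # [Batch, sample] mixed speech
ilens = torch.tensor([16000, 12000])   # [Batch] valid lengths per utterance

# spectrum: complex tensor of shape [Batch, T, F]; flens: [Batch] frame counts
spectrum, flens = encoder(mix, ilens)

# Passing fs reconfigures the window/hop lengths for 8 kHz input while keeping
# their durations (in seconds) unchanged.
spectrum_8k, flens_8k = encoder(torch.randn(2, 8000), torch.tensor([8000, 8000]), fs=8000)
```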
forward_streaming(input: Tensor)
Forward.
- Parameters:input (torch.Tensor) – mixed speech [Batch, frame_length]
- Returns: output spectrum of shape [Batch, 1, F]
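A small sketch of forward_streaming on a single frame; here the frame length is assumed to equal win_length (which defaults to n_fft):

```python
import torch
from espnet2.enh.encoder.stft_encoder import STFTEncoder

encoder = STFTEncoder(n_fft=512, hop_length=128)

frame = torch.randn(2, 512)                    # [Batch, frame_length] single frame
spec_frame = encoder.forward_streaming(frame)  # expected shape: [Batch, 1, F]
print(spec_frame.shape)
```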
property output_dim : int
spec_transform_func(spec)
streaming_frame(audio)
streaming_frame. Splits continuous audio into frame-level chunks for streaming simulation. Note that this function takes the entire long audio as input for the simulation; in a real streaming application, it can serve as a reference for managing the streaming input buffer. A combined example follows the list below.
- Parameters: audio – continuous audio of shape (B, T)
- Returns: list of frame-level chunks, each of shape (B, frame_size)
- Return type: chunked (list)
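A streaming-simulation sketch that combines streaming_frame with forward_streaming, as suggested by the description above (shapes are illustrative):

```python
import torch
from espnet2.enh.encoder.stft_encoder import STFTEncoder

encoder = STFTEncoder(n_fft=512, hop_length=128)

audio = torch.randn(1, 16000)            # (B, T) entire long audio for the simulation

# Split the full utterance into frame-level chunks, then encode frame by frame,
# as one would when managing a streaming input buffer.
frames = encoder.streaming_frame(audio)   # list of (B, frame_size) chunks
spec_frames = [encoder.forward_streaming(f) for f in frames]

print(len(spec_frames), spec_frames[0].shape)  # each chunk -> [B, 1, F]
```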