espnet2.enh.decoder.stft_decoder.STFTDecoder
class espnet2.enh.decoder.stft_decoder.STFTDecoder(n_fft: int = 512, win_length: int | None = None, hop_length: int = 128, window='hann', center: bool = True, normalized: bool = False, onesided: bool = True, default_fs: int = 16000, spec_transform_type: str | None = None, spec_factor: float = 0.15, spec_abs_exponent: float = 0.5)
Bases: AbsDecoder
STFT decoder for speech enhancement and separation
Initializes internal Module state, shared by both nn.Module and ScriptModule.
forward(input: ComplexTensor, ilens: Tensor, fs: int = None)
Forward.
- Parameters:
- input (ComplexTensor) – spectrum [Batch, T, (C,) F]
- ilens (torch.Tensor) – input lengths [Batch]
- fs (int) – sampling rate in Hz. If not None, the iSTFT window and hop lengths are reconfigured for the new sampling rate while keeping their duration fixed.
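The forward pass reconstructs a waveform from the spectrum via an inverse STFT with overlap-add. As a rough illustration of that reconstruction (a minimal NumPy sketch, not ESPnet's implementation, which operates on batched torch ComplexTensors and handles channels and length masking), assuming a Hann window, onesided spectra, and the default n_fft=512 / hop_length=128:

```python
import numpy as np

def istft(spec, n_fft=512, hop=128):
    """Minimal inverse STFT via overlap-add with a Hann window.

    spec: complex array of shape [T_frames, n_fft // 2 + 1] (onesided).
    Sketch of what an STFT decoder does internally, not ESPnet's code.
    """
    win = np.hanning(n_fft)
    n_frames = spec.shape[0]
    out_len = n_fft + hop * (n_frames - 1)
    wav = np.zeros(out_len)
    norm = np.zeros(out_len)
    for t in range(n_frames):
        # Invert each frame, re-apply the synthesis window, overlap-add.
        frame = np.fft.irfft(spec[t], n=n_fft) * win
        wav[t * hop : t * hop + n_fft] += frame
        norm[t * hop : t * hop + n_fft] += win ** 2
    # Compensate for the accumulated analysis/synthesis windowing.
    return wav / np.maximum(norm, 1e-8)
```

Dividing by the accumulated squared window compensates for the double windowing, so interior samples (away from the edges, where frames fully overlap) are reconstructed exactly.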
forward_streaming(input_frame: Tensor)
Forward streaming: decode a single spectrum frame into a time-domain frame.
- Parameters:
- input_frame (ComplexTensor) – single spectrum frame [Batch, 1, F]
- Returns: wavs [Batch, 1, self.win_length]
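Per frame, streaming decoding amounts to an inverse real FFT followed by the synthesis window. A minimal NumPy sketch of that step (assuming a Hann window and onesided spectra; the real method also undoes the spectral transform and applies iSTFT normalization):

```python
import numpy as np

def decode_frame(frame_spec, n_fft=512):
    """Decode one onesided spectrum frame [B, F] into a windowed
    time-domain frame [B, n_fft], where F = n_fft // 2 + 1.

    Simplified sketch only, not ESPnet's forward_streaming.
    """
    win = np.hanning(n_fft)  # assumed synthesis window
    return np.fft.irfft(frame_spec, n=n_fft) * win
```

Each decoded frame still overlaps its neighbors by n_fft - hop samples; merging them back into a waveform is what streaming_merge (below) simulates.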
spec_back(spec)
Invert the spectral transformation selected by spec_transform_type (see the class arguments).
streaming_merge(chunks, ilens=None)
streaming_merge. Merge the frame-level processed audio chunks in the streaming simulation. Note that in real applications the processed audio should be sent to the output channel frame by frame; you may refer to this function to manage your streaming output buffer.
- Parameters:
- chunks – list of processed chunks, each of shape (B, frame_size)
- ilens (torch.Tensor) – input lengths [Batch]
- Returns: merged audio [B, T]
- Return type: torch.Tensor
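The merge is an overlap-add of chunks spaced one hop apart. A simplified NumPy sketch (hypothetical helper; ESPnet's streaming_merge additionally normalizes by the accumulated synthesis window and trims to ilens):

```python
import numpy as np

def merge_chunks(chunks, hop=128):
    """Overlap-add frame-level chunks, each of shape (B, frame_size),
    spaced `hop` samples apart, into a waveform of shape (B, T) with
    T = frame_size + hop * (len(chunks) - 1).

    Sketch only: window normalization and length trimming are omitted.
    """
    B, frame_size = chunks[0].shape
    T = frame_size + hop * (len(chunks) - 1)
    out = np.zeros((B, T))
    for i, c in enumerate(chunks):
        out[:, i * hop : i * hop + frame_size] += c
    return out
```

In a real streaming setup each chunk would be emitted as soon as its non-overlapping portion is complete, rather than buffering the whole list as done here.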