espnet2.enh.decoder.stft_decoder.STFTDecoder
class espnet2.enh.decoder.stft_decoder.STFTDecoder(n_fft: int = 512, win_length: int | None = None, hop_length: int = 128, window='hann', center: bool = True, normalized: bool = False, onesided: bool = True, default_fs: int = 16000, spec_transform_type: str | None = None, spec_factor: float = 0.15, spec_abs_exponent: float = 0.5)
Bases: AbsDecoder
STFT decoder for speech enhancement and separation
Initializes internal Module state, shared by both nn.Module and ScriptModule.
forward(input: ComplexTensor, ilens: Tensor, fs: int = None)
Forward.
- Parameters:
- input (ComplexTensor) – spectrum [Batch, T, (C,) F]
- ilens (torch.Tensor) – input lengths [Batch]
- fs (int) – sampling rate in Hz. If not None, the iSTFT window and hop lengths are reconfigured for the new sampling rate while keeping their duration fixed.
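The forward pass reconstructs a waveform from the spectrum via an inverse STFT with overlap-add. As a rough illustration of that reconstruction (a minimal NumPy sketch, not ESPnet's implementation, which operates on batched torch ComplexTensors and handles channels and length masking), assuming a Hann window, onesided spectra, and the default n_fft=512 / hop_length=128:

```python
import numpy as np

def istft(spec, n_fft=512, hop=128):
    """Minimal inverse STFT via overlap-add with a Hann window.

    spec: complex array of shape [T_frames, n_fft // 2 + 1] (onesided).
    Sketch of what an STFT decoder does internally, not ESPnet's code.
    """
    win = np.hanning(n_fft)
    n_frames = spec.shape[0]
    out_len = n_fft + hop * (n_frames - 1)
    wav = np.zeros(out_len)
    norm = np.zeros(out_len)
    for t in range(n_frames):
        # Invert each frame, re-apply the synthesis window, overlap-add.
        frame = np.fft.irfft(spec[t], n=n_fft) * win
        wav[t * hop : t * hop + n_fft] += frame
        norm[t * hop : t * hop + n_fft] += win ** 2
    # Compensate for the accumulated analysis/synthesis windowing.
    return wav / np.maximum(norm, 1e-8)
```

Dividing by the accumulated squared window compensates for the double windowing, so interior samples (away from the edges, where frames fully overlap) are reconstructed exactly.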
forward_streaming(input_frame: Tensor)
Forward streaming: decode a single spectrum frame into a time-domain frame.
- Parameters:
- input_frame (ComplexTensor) – single spectrum frame [Batch, 1, F]
- Returns: wavs [Batch, 1, self.win_length]
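Per frame, streaming decoding amounts to an inverse real FFT followed by the synthesis window. A minimal NumPy sketch of that step (assuming a Hann window and onesided spectra; the real method also undoes the spectral transform and applies iSTFT normalization):

```python
import numpy as np

def decode_frame(frame_spec, n_fft=512):
    """Decode one onesided spectrum frame [B, F] into a windowed
    time-domain frame [B, n_fft], where F = n_fft // 2 + 1.

    Simplified sketch only, not ESPnet's forward_streaming.
    """
    win = np.hanning(n_fft)  # assumed synthesis window
    return np.fft.irfft(frame_spec, n=n_fft) * win
```

Each decoded frame still overlaps its neighbors by n_fft - hop samples; merging them back into a waveform is what streaming_merge (below) simulates.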
spec_back(spec)
Invert the spectral transformation selected by spec_transform_type (see the class arguments).
streaming_merge(chunks, ilens=None)
streaming_merge. Merge the frame-level processed audio chunks in the streaming simulation. Note that in real applications the processed audio should be sent to the output channel frame by frame; you may refer to this function to manage your streaming output buffer.
- Parameters:
- chunks – list of processed chunks, each of shape (B, frame_size)
- ilens (torch.Tensor) – input lengths [Batch]
- Returns: merged audio [B, T]
- Return type: torch.Tensor
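The merge is an overlap-add of chunks spaced one hop apart. A simplified NumPy sketch (hypothetical helper; ESPnet's streaming_merge additionally normalizes by the accumulated synthesis window and trims to ilens):

```python
import numpy as np

def merge_chunks(chunks, hop=128):
    """Overlap-add frame-level chunks, each of shape (B, frame_size),
    spaced `hop` samples apart, into a waveform of shape (B, T) with
    T = frame_size + hop * (len(chunks) - 1).

    Sketch only: window normalization and length trimming are omitted.
    """
    B, frame_size = chunks[0].shape
    T = frame_size + hop * (len(chunks) - 1)
    out = np.zeros((B, T))
    for i, c in enumerate(chunks):
        out[:, i * hop : i * hop + frame_size] += c
    return out
```

In a real streaming setup each chunk would be emitted as soon as its non-overlapping portion is complete, rather than buffering the whole list as done here.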