espnet2.enh.encoder.conv_encoder.ConvEncoder


class espnet2.enh.encoder.conv_encoder.ConvEncoder(channel: int, kernel_size: int, stride: int)

Bases: AbsEncoder

Convolutional encoder for speech enhancement and separation

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input: Tensor, ilens: Tensor, fs: int | None = None)

Forward pass. Encodes the mixed waveform into frame-level features.

Parameters:
- input (torch.Tensor) – mixed speech [Batch, sample]
- ilens (torch.Tensor) – input lengths [Batch]
- fs (int, optional) – sampling rate in Hz (not used)
Returns: feature – mixed feature after encoder [Batch, flens, channel]
Return type: torch.Tensor
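
The size of the `flens` axis depends on `kernel_size` and `stride`. As a rough sketch, assuming the encoder applies a single 1-D convolution with no padding and no dilation (an assumption; this page does not state the padding behavior), the frame count follows the standard convolution output-length formula. The helper name `num_frames` is hypothetical and not part of ESPnet.

```python
def num_frames(num_samples: int, kernel_size: int, stride: int) -> int:
    """Frames produced by an unpadded, undilated strided 1-D convolution.

    Assumption: no padding and dilation 1; not confirmed by the docs above.
    """
    if num_samples < kernel_size:
        return 0  # too short to fill a single window
    return (num_samples - kernel_size) // stride + 1


# One second of 16 kHz audio with a 16-sample window and an 8-sample hop.
frames = num_frames(16000, kernel_size=16, stride=8)  # 1999
```

Under the same assumption, applying this per utterance to `ilens` gives the number of valid frames for each item in the batch.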

forward_streaming(input: Tensor)

property output_dim : int

streaming_frame(audio: Tensor)

Stream frame.

Splits continuous audio into frame-level chunks for streaming simulation. Note that this function takes the entire long audio as input. In a real streaming application, you can use it as a reference for managing your own input buffer.

Parameters: audio – (B, T)
Returns: chunked – list of frames, each of shape (B, frame_size)
Return type: list
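
The chunking described above can be sketched without any dependencies. This is a minimal illustration, not the library's implementation: it uses nested Python lists in place of `torch.Tensor`, the function name `split_into_frames` is hypothetical, and it assumes that `frame_size` corresponds to `kernel_size`, that the hop between frames corresponds to `stride`, and that trailing samples which do not fill a whole frame are dropped.

```python
from typing import List


def split_into_frames(
    audio: List[List[float]], frame_size: int, hop_size: int
) -> List[List[List[float]]]:
    """Chunk (B, T) audio into a list of (B, frame_size) frames.

    Assumption: trailing samples that do not fill a whole frame are dropped.
    """
    audio_len = len(audio[0])
    n_frames = max(0, (audio_len - frame_size) // hop_size + 1)
    return [
        # Take the same time window from every item in the batch.
        [row[i * hop_size : i * hop_size + frame_size] for row in audio]
        for i in range(n_frames)
    ]


# Batch of two 10-sample signals, 4-sample frames, 2-sample hop.
batch = [list(range(10)), list(range(100, 110))]
chunks = split_into_frames(batch, frame_size=4, hop_size=2)
```

In a real streaming setup, the same windowing would be applied to a rolling buffer: keep the last `frame_size - hop_size` samples, append each new hop of audio, and emit one frame at a time.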