espnet2.enh.encoder.conv_encoder.ConvEncoder
class espnet2.enh.encoder.conv_encoder.ConvEncoder(channel: int, kernel_size: int, stride: int)
Bases: AbsEncoder
Convolutional encoder for speech enhancement and separation
Initializes internal Module state, shared by both nn.Module and ScriptModule.
forward(input: Tensor, ilens: Tensor, fs: int | None = None)
Forward pass. Encodes the mixed speech waveform into frame-level features.
- Parameters:
  - input (torch.Tensor) – mixed speech [Batch, sample]
  - ilens (torch.Tensor) – input lengths [Batch]
  - fs (int | None) – sampling rate in Hz (not used)
- Returns: feature, the mixed feature after the encoder [Batch, flens, channel]
- Return type: torch.Tensor
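The `flens` dimension of the returned feature depends on `kernel_size` and `stride`. As a rough sketch, assuming the encoder behaves like an unpadded, undilated 1-D convolution over the waveform (an assumption, not something this page states), the number of output frames can be estimated with the standard convolution length formula. The helper name `expected_num_frames` and the example settings are hypothetical.

```python
def expected_num_frames(num_samples: int, kernel_size: int, stride: int) -> int:
    """Estimate the number of frames for a waveform of `num_samples` samples.

    Assumption: the encoder is an unpadded, undilated 1-D convolution,
    so the standard output-length formula applies.
    """
    if num_samples < kernel_size:
        # Not enough samples to fill a single kernel window.
        return 0
    return (num_samples - kernel_size) // stride + 1


# Hypothetical settings: 16000 samples, 32-sample kernel, 16-sample hop.
num_frames = expected_num_frames(16000, kernel_size=32, stride=16)
print(num_frames)  # 999
```

Under the same assumption, applying the formula to each entry of `ilens` gives the valid frame count per batch item.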
forward_streaming(input: Tensor)
property output_dim : int
streaming_frame(audio: Tensor)
Stream frame.
Splits continuous audio into frame-level chunks for streaming simulation. Note that this function takes the entire long audio as input, so it only simulates streaming. In a real streaming application, you can use it as a reference for managing your input buffer.
- Parameters: audio – input audio of shape (B, T)
- Returns: chunked, a list of frame-level chunks, each of shape (B, frame_size)
- Return type: List[Tensor]
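The chunking described above can be sketched in plain Python to show how a streaming input buffer might be sliced. This is an illustration only, not the ESPnet implementation: the helper `split_into_frames` is hypothetical, nested lists stand in for a (B, T) tensor, and it assumes the frame size and hop size correspond to the encoder's `kernel_size` and `stride`, with trailing samples that do not fill a whole frame dropped.

```python
from typing import List, Sequence


def split_into_frames(
    audio: Sequence[Sequence[float]], frame_size: int, hop_size: int
) -> List[List[List[float]]]:
    """Split (B, T) audio into a list of (B, frame_size) chunks.

    Assumptions: frame_size and hop_size play the role of the encoder's
    kernel_size and stride; leftover samples at the end are discarded.
    """
    if not audio or len(audio[0]) < frame_size:
        return []
    num_samples = len(audio[0])
    num_frames = (num_samples - frame_size) // hop_size + 1
    return [
        # Take the same time window from every item in the batch.
        [list(row[i * hop_size : i * hop_size + frame_size]) for row in audio]
        for i in range(num_frames)
    ]


# Toy batch of two 8-sample signals, 4-sample frames, 2-sample hop.
audio = [[0, 1, 2, 3, 4, 5, 6, 7], [10, 11, 12, 13, 14, 15, 16, 17]]
chunks = split_into_frames(audio, frame_size=4, hop_size=2)
print(len(chunks))  # 3
print(chunks[0])    # [[0, 1, 2, 3], [10, 11, 12, 13]]
```

With real tensors, the equivalent slice for each frame would be `audio[:, start:start + frame_size]`. In a real streaming setup, each chunk would arrive over time rather than being cut from a complete recording.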