espnet2.asr.frontend.cnn.CNNFrontend
class espnet2.asr.frontend.cnn.CNNFrontend(norm_mode: str, conv_mode: str, bias: bool, shapes: List[Tuple[int, int, int]] = [(512, 10, 5), (512, 3, 2), (512, 3, 2), (512, 3, 2), (512, 3, 2), (512, 2, 2), (512, 2, 2)], fs: int | str = 16000, normalize_audio: bool = False, normalize_output: bool = False)
Bases: AbsFrontend
Convolutional feature extractor.
Typically used in SSL models. Uses raw waveforms as input.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
forward(x: Tensor, length: Tensor | None) → Tuple[Tensor, Tensor | None]
- Parameters:
  - x (Tensor) – Input Tensor representing a batch of audio, shape: [batch, time].
  - length (Tensor or None, optional) – Valid length of each input sample, shape: [batch, ].
- Returns:
  - Tensor: The resulting feature, shape: [batch, frame, feature].
  - Optional[Tensor]: Valid length of each output sample, shape: [batch, ].
- Return type: Tuple[Tensor, Optional[Tensor]]
output_size() → int
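To give a feel for how the `time` axis of the input relates to the `frame` axis of the output, the sketch below estimates the number of frames produced by the default `shapes`. It rests on two assumptions that this page does not state: each tuple is read as (channels, kernel size, stride), and the convolutions are unpadded 1-D convolutions applied in sequence. The helper names `estimate_output_frames` and `total_stride` are hypothetical and not part of ESPnet.

```python
from typing import List, Tuple

# Default `shapes` copied from the class signature.
# Assumption: each tuple is (channels, kernel_size, stride).
DEFAULT_SHAPES: List[Tuple[int, int, int]] = [
    (512, 10, 5), (512, 3, 2), (512, 3, 2), (512, 3, 2),
    (512, 3, 2), (512, 2, 2), (512, 2, 2),
]


def estimate_output_frames(
    num_samples: int, shapes: List[Tuple[int, int, int]] = DEFAULT_SHAPES
) -> int:
    """Hypothetical helper: frames left after a stack of unpadded 1-D convs."""
    frames = num_samples
    for _channels, kernel, stride in shapes:
        if frames < kernel:
            # Input too short for this layer; no valid output frame.
            return 0
        # Standard output-length formula for a convolution without padding.
        frames = (frames - kernel) // stride + 1
    return frames


def total_stride(shapes: List[Tuple[int, int, int]] = DEFAULT_SHAPES) -> int:
    """Hypothetical helper: overall downsampling factor (product of strides)."""
    product = 1
    for _channels, _kernel, stride in shapes:
        product *= stride
    return product


if __name__ == "__main__":
    # One second of audio at the default fs of 16000.
    print(estimate_output_frames(16000))  # 49
    print(total_stride())                 # 320
```

Under these assumptions, one second of 16 kHz audio maps to 49 frames, which is roughly one frame every 20 ms. The actual lengths returned by `forward` should be taken as authoritative if they differ.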