espnet2.sds.asr.espnet_asr.ESPnetASRModel
class espnet2.sds.asr.espnet_asr.ESPnetASRModel(tag: str = 'espnet/simpleoier_librispeech_asr_train_asr_conformer7_wavlm_large_raw_en_bpe5000_sp', device: str = 'cuda', dtype: str = 'float16')
Bases: AbsASR
ESPnet ASR
Args:
tag (str, optional): The pre-trained model tag (on Hugging Face). Defaults to "espnet/simpleoier_librispeech_asr_train_asr_conformer7_wavlm_large_raw_en_bpe5000_sp".
device (str, optional): The computation device for running inference. Defaults to "cuda". Common options include "cuda" or "cpu".
dtype (str, optional): The floating-point precision to use. Defaults to "float16".
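A minimal instantiation sketch (assumes espnet2 is installed; passing device="cpu" and dtype="float32" here is only an illustration for machines without a GPU, not a recommended configuration):

>>> from espnet2.sds.asr.espnet_asr import ESPnetASRModel
>>> asr = ESPnetASRModel(device="cpu", dtype="float32")  # the default LibriSpeech tag is used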
forward(array: ndarray) → str
Perform a forward pass on the given audio data and return the transcribed text.
- Parameters:array (np.ndarray) – The input audio data to be transcribed, as a NumPy array.
- Returns: The transcribed text from the audio input, as returned by the speech-to-text model.
- Return type: str
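A usage sketch, reusing the asr instance from the instantiation example above; the 16 kHz mono float32 waveform is an assumption typical of LibriSpeech-trained models and is not part of this signature:

>>> import numpy as np
>>> speech = np.zeros(16000, dtype=np.float32)  # one second of silence at an assumed 16 kHz
>>> text = asr.forward(speech)
>>> isinstance(text, str)
True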
warmup()
Perform a single forward pass with dummy input to pre-load and warm up the model.
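Typically called once at startup so the first real utterance is not delayed by model and cache initialization; a sketch, again reusing the asr instance from above:

>>> asr.warmup()                 # runs a dummy forward pass to pre-load the model
>>> text = asr.forward(speech)   # subsequent calls return with lower first-call latency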