espnet2.sds.asr.owsm_ctc_asr.OWSMCTCModel
class espnet2.sds.asr.owsm_ctc_asr.OWSMCTCModel(tag: str = 'pyf98/owsm_ctc_v3.1_1B', device: str = 'cuda', dtype: str = 'float16')
Bases: AbsASR
OWSM CTC ASR
- Parameters:
  - tag (str, optional) – The pre-trained model tag on Hugging Face. Defaults to “pyf98/owsm_ctc_v3.1_1B”.
  - device (str, optional) – The computation device for running inference, typically “cuda” or “cpu”. Defaults to “cuda”.
  - dtype (str, optional) – The floating-point precision to use. Defaults to “float16”.
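The constructor arguments above can be chosen at startup depending on whether a GPU is present. The following is a minimal sketch, not part of ESPnet: `pick_device_and_dtype` and `build_model` are hypothetical helper names, and the use of “float32” on CPU is an assumption (this page only documents “float16” as the default).

```python
from typing import Tuple


def pick_device_and_dtype(cuda_available: bool) -> Tuple[str, str]:
    """Hypothetical helper: choose constructor arguments for OWSMCTCModel."""
    if cuda_available:
        return "cuda", "float16"  # the documented defaults
    # Assumption: full precision is the safer choice when running on CPU.
    return "cpu", "float32"


def build_model(tag: str = "pyf98/owsm_ctc_v3.1_1B"):
    """Hypothetical helper: construct the model and warm it up once."""
    # Imported lazily so pick_device_and_dtype works without ESPnet installed.
    import torch
    from espnet2.sds.asr.owsm_ctc_asr import OWSMCTCModel

    device, dtype = pick_device_and_dtype(torch.cuda.is_available())
    model = OWSMCTCModel(tag=tag, device=device, dtype=dtype)
    model.warmup()  # one dummy forward pass, as documented for warmup()
    return model
```

Calling `build_model()` downloads the pre-trained weights on first use, so it is best done once when the application starts rather than per request.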
forward(array: ndarray) → str
Run a forward pass on the given audio data and return the transcribed text.
- Parameters: array (np.ndarray) – The input audio data to be transcribed.
- Returns: The transcribed text from the audio input, as returned by the OWSM ASR model.
- Return type: str
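This page does not state the sample rate, channel layout, or value range that `forward` expects. The sketch below assumes 16 kHz mono floating-point audio, which is an assumption to verify against your model. `to_mono_float32` is a hypothetical helper, not an ESPnet function.

```python
import numpy as np


def to_mono_float32(pcm: np.ndarray) -> np.ndarray:
    """Hypothetical helper: convert PCM audio to a mono float32 array.

    Accepts shape (samples,) or (samples, channels). int16 input is
    scaled into [-1, 1); other dtypes are only cast to float32.
    """
    audio = pcm.astype(np.float32)
    if pcm.dtype == np.int16:
        audio /= 32768.0  # full-scale int16 maps to [-1, 1)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # average the channels down to mono
    return audio


# Usage, assuming `model` is an OWSMCTCModel instance and `pcm` holds
# 16 kHz audio (both are assumptions of this sketch):
#     text = model.forward(to_mono_float32(pcm))
```

Resampling is deliberately left out. If the source audio is not already at the rate the model expects, resample it before calling `forward`.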
warmup()
Perform a single forward pass with dummy input to pre-load and warm up the model.