espnet2.sds.asr.owsm_ctc_asr.OWSMCTCModel


class espnet2.sds.asr.owsm_ctc_asr.OWSMCTCModel(tag: str = 'pyf98/owsm_ctc_v3.1_1B', device: str = 'cuda', dtype: str = 'float16')

OWSM CTC ASR

Initializer method.

Args:

tag (str, optional): The pre-trained model tag (on Hugging Face). Defaults to “pyf98/owsm_ctc_v3.1_1B”.

device (str, optional): The computation device for running inference. Defaults to “cuda”. Common options include “cuda” or “cpu”.

dtype (str, optional): The floating-point precision to use. Defaults to “float16”.
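
A minimal sketch of choosing constructor arguments when a GPU may not be present. The helper names `choose_device_and_dtype` and `build_model` are hypothetical, and pairing “cpu” with “float32” is an assumption rather than something this page documents.

```python
from typing import Tuple


def choose_device_and_dtype(cuda_available: bool) -> Tuple[str, str]:
    """Return a (device, dtype) pair for the constructor (hypothetical helper)."""
    if cuda_available:
        # These are the documented defaults.
        return "cuda", "float16"
    # Assumption: full precision is the safer choice on CPU.
    return "cpu", "float32"


def build_model(tag: str = "pyf98/owsm_ctc_v3.1_1B"):
    """Construct the model; imports are local so this file loads without ESPnet."""
    import torch
    from espnet2.sds.asr.owsm_ctc_asr import OWSMCTCModel

    device, dtype = choose_device_and_dtype(torch.cuda.is_available())
    return OWSMCTCModel(tag=tag, device=device, dtype=dtype)
```

Calling `build_model()` downloads the pre-trained weights on first use, so it is best done once at start-up.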

forward(array: ndarray) → str

Perform a forward pass on the given audio data, returning the transcribed text prompt.

Parameters: array (np.ndarray) – The input audio data to be transcribed, as a NumPy array.
Returns: The transcribed text from the audio input, as returned by the OWSM ASR model.
Return type: str
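
A hedged sketch of preparing audio before calling `forward`. It assumes the input starts as 16-bit PCM and that the model accepts a one-dimensional float array scaled to roughly [-1, 1). This page does not state the expected sample rate or value range, so treat both as assumptions. The helpers `pcm16_to_float` and `transcribe` are hypothetical, and no resampling is done here.

```python
import numpy as np


def pcm16_to_float(samples: np.ndarray) -> np.ndarray:
    """Convert int16 PCM (mono, or shape [n, channels]) to mono float32."""
    # Scale the int16 range to [-1, 1).
    audio = samples.astype(np.float32) / 32768.0
    if audio.ndim == 2:
        # Downmix by averaging channels (assumption: mono input is expected).
        audio = audio.mean(axis=1)
    return audio


def transcribe(model, samples: np.ndarray) -> str:
    """Pass prepared audio to any object exposing forward(array) -> str."""
    return model.forward(pcm16_to_float(samples))
```

Here `model` would be an `OWSMCTCModel` instance. Keeping the conversion separate from the model call makes it easy to check the array on its own.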

warmup()

Perform a single forward pass with dummy input to pre-load and warm up the model.
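
One way to use `warmup` is to call it once at start-up so that the first real request does not pay the initial loading cost. The sketch below shows that ordering and times the following `forward` call. `warm_and_time` is a hypothetical helper that works with any object exposing `warmup()` and `forward(array)`.

```python
import time


def warm_and_time(model, array):
    """Call warmup() once, then time a single forward() call."""
    model.warmup()
    start = time.perf_counter()
    text = model.forward(array)
    elapsed = time.perf_counter() - start
    return text, elapsed
```

The returned `elapsed` value is in seconds and covers only the `forward` call, not the warm-up itself.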