espnet2.sds.asr.whisper_asr.WhisperASRModel

class espnet2.sds.asr.whisper_asr.WhisperASRModel(tag: str = 'large', device: str = 'cuda', dtype: str = 'float16')

Whisper ASR

Initializer method.

Parameters:
tag (str, optional) – The Whisper model tag. Defaults to “large”.
device (str, optional) – The computation device for running inference. Defaults to “cuda”. Common options include “cuda” or “cpu”.
dtype (str, optional) – The floating-point precision to use. Defaults to “float16”.
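The constructor arguments can be chosen at run time depending on whether a GPU is present. The following is a minimal sketch, not part of ESPnet: the helper name whisper_asr_kwargs is hypothetical, and the use of “float32” as the CPU precision is an assumption, since this page only documents the default “float16”.

```python
def whisper_asr_kwargs(cuda_available, tag="large"):
    """Build keyword arguments for WhisperASRModel (hypothetical helper).

    The "cuda"/"float16" pair mirrors the documented defaults. The
    "cpu"/"float32" pair is an assumption; check which dtype strings
    your installed version accepts.
    """
    if cuda_available:
        return {"tag": tag, "device": "cuda", "dtype": "float16"}
    return {"tag": tag, "device": "cpu", "dtype": "float32"}
```

The resulting dictionary could then be unpacked into the constructor, for example WhisperASRModel(**whisper_asr_kwargs(torch.cuda.is_available())), assuming PyTorch is installed.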

forward(array: ndarray) → str

Perform a forward pass on the given audio data, returning the transcribed text prompt.

Parameters: array (np.ndarray) – The input audio data to be transcribed.
Returns: The transcribed text from the audio input, as returned by the Whisper ASR model.
Return type: str

warmup()

Perform a single forward pass with dummy input to pre-load and warm up the model.
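A typical call sequence is to construct the model, call warmup() once, and then pass audio to forward(). The sketch below illustrates that order. The helper names transcribe and demo are hypothetical, and the audio format used in demo (one second of mono float32 samples at 16 kHz) is an assumption, because this page does not state the expected sample rate or value range.

```python
def transcribe(asr, audio, warm=True):
    """Optionally warm up `asr`, then return the text from forward()."""
    if warm:
        asr.warmup()  # one dummy pass so later calls avoid start-up cost
    return asr.forward(audio)


def demo():
    """Run against the real model; requires ESPnet, Whisper and a GPU."""
    # Heavy imports are kept local so this module loads without them.
    import numpy as np
    from espnet2.sds.asr.whisper_asr import WhisperASRModel

    asr = WhisperASRModel(tag="large", device="cuda", dtype="float16")
    # Assumption: 1 s of silence, mono, float32, 16 kHz.
    audio = np.zeros(16000, dtype=np.float32)
    return transcribe(asr, audio)
```

Because transcribe only relies on the warmup() and forward() methods, it can be exercised with any stand-in object that provides them, which is convenient for testing without downloading model weights.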