espnet2.sds.asr.whisper_asr.WhisperASRModel
class espnet2.sds.asr.whisper_asr.WhisperASRModel(tag: str = 'large', device: str = 'cuda', dtype: str = 'float16')
Bases: AbsASR
Whisper ASR
Args:
tag (str, optional): The Whisper model tag. Defaults to “large”.
device (str, optional): The computation device for running inference. Defaults to “cuda”. Common options include “cuda” or “cpu”.
dtype (str, optional): The floating-point precision to use. Defaults to “float16”.
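The documented defaults assume a CUDA GPU. As a minimal sketch, the helper below picks constructor arguments depending on whether a GPU is present. The name `pick_whisper_kwargs` is hypothetical and not part of ESPnet, and the fallback to “float32” on CPU is an assumption made here (half precision is often poorly supported on CPU); the source does not list which dtype strings are accepted.

```python
from typing import Dict


def pick_whisper_kwargs(cuda_available: bool, tag: str = "large") -> Dict[str, str]:
    """Build keyword arguments for WhisperASRModel (hypothetical helper)."""
    if cuda_available:
        # These values match the documented constructor defaults.
        return {"tag": tag, "device": "cuda", "dtype": "float16"}
    # Assumption: float32 is the safer precision when running on CPU.
    return {"tag": tag, "device": "cpu", "dtype": "float32"}


if __name__ == "__main__":
    print(pick_whisper_kwargs(cuda_available=False))
    # With ESPnet and PyTorch installed, the result could be passed on as:
    #   import torch
    #   from espnet2.sds.asr.whisper_asr import WhisperASRModel
    #   model = WhisperASRModel(**pick_whisper_kwargs(torch.cuda.is_available()))
```

Keeping the choice in a pure function makes it easy to check without loading any model weights.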
forward(array: ndarray) → str
Perform a forward pass on the given audio data, returning the transcribed text prompt.
- Parameters: array (np.ndarray) – The input audio data to be transcribed.
- Returns: The transcribed text from the audio input, as returned by the Whisper ASR model.
- Return type: str
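The source does not state the sample rate or value range that `forward` expects. The sketch below assumes mono float32 audio at 16 kHz scaled to [-1.0, 1.0), which is the format the reference Whisper implementation works with, and shows one way to convert 16-bit PCM before calling `forward`. `StubASR`, `pcm16_to_float32`, and `transcribe_pcm16` are hypothetical names used for illustration; the stub only mimics the documented `forward(array) → str` signature so the example runs without ESPnet.

```python
import numpy as np


class StubASR:
    """Hypothetical stand-in exposing the same forward() signature."""

    def forward(self, array: np.ndarray) -> str:
        # A real model would return the transcription here.
        return f"<{array.shape[0]} samples, {array.dtype}>"


def pcm16_to_float32(pcm: np.ndarray) -> np.ndarray:
    """Scale int16 PCM samples to float32 in [-1.0, 1.0)."""
    return pcm.astype(np.float32) / 32768.0


def transcribe_pcm16(model, pcm: np.ndarray) -> str:
    """Convert int16 audio and pass it to model.forward()."""
    return model.forward(pcm16_to_float32(pcm))


if __name__ == "__main__":
    one_second = np.zeros(16000, dtype=np.int16)  # assumes 16 kHz mono
    print(transcribe_pcm16(StubASR(), one_second))
```

Replacing `StubASR()` with a `WhisperASRModel` instance should work the same way, provided the assumed audio format holds.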
warmup()
Perform a single forward pass with dummy input to pre-load and warm up the model.
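A warm-up call is typically made once at startup so that the first real request does not pay the model's initialization cost. The sketch below times such a call. `StubASR` and `warm_up_once` are hypothetical names; the stub only mimics the documented `warmup()` method so the example runs without ESPnet.

```python
import time


class StubASR:
    """Hypothetical stand-in exposing warmup(); counts calls for illustration."""

    def __init__(self) -> None:
        self.warmup_calls = 0

    def warmup(self) -> None:
        # A real model would run one forward pass on dummy input here.
        self.warmup_calls += 1


def warm_up_once(model) -> float:
    """Call model.warmup() and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    model.warmup()
    return time.perf_counter() - start


if __name__ == "__main__":
    asr = StubASR()
    elapsed = warm_up_once(asr)
    print(f"warmup took {elapsed:.3f} s")
```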