espnet2.sds.end_to_end.mini_omni_e2e.MiniOmniE2EModel


class espnet2.sds.end_to_end.mini_omni_e2e.MiniOmniE2EModel(device: str = 'cuda', dtype: str = 'float16')

Mini-OMNI E2E

A class to initialize and manage the OmniInference client for end-to-end dialogue systems.

Parameters:
- device (Literal["cuda", "cpu"], optional) – The device to run inference on. Defaults to "cuda".
- dtype (str, optional) – The data type, given as a string. Defaults to "float16".
Raises: ImportError – If required dependencies (Pydub, Huggingface Hub, or OmniInference) are not installed.
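
Because the constructor can raise ImportError when Pydub, Huggingface Hub, or OmniInference is missing, callers may want to handle that case explicitly. The following is a minimal sketch, not part of ESPnet: `try_build` is a hypothetical helper that takes any constructor (here it would be `MiniOmniE2EModel`) and returns None instead of propagating the ImportError.

```python
from typing import Any, Callable, Optional


def try_build(factory: Callable[..., Any], **kwargs: Any) -> Optional[Any]:
    """Call factory(**kwargs); return None if it raises ImportError.

    Hypothetical helper for illustration only. `factory` is expected to be
    a class such as MiniOmniE2EModel whose constructor raises ImportError
    when an optional dependency is not installed.
    """
    try:
        return factory(**kwargs)
    except ImportError as err:
        # Report which dependency is missing and let the caller fall back.
        print(f"Model unavailable: {err}")
        return None


# Intended usage (requires ESPnet and the Mini-Omni dependencies):
# from espnet2.sds.end_to_end.mini_omni_e2e import MiniOmniE2EModel
# model = try_build(MiniOmniE2EModel, device="cuda", dtype="float16")
```

Returning None keeps the fallback decision with the caller, for example switching to a different dialogue backend.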

forward(array: ndarray, orig_sr: int) → Tuple[str, bytes]

Processes audio input to generate synthesized speech and the corresponding text response.

Parameters:
- array (np.ndarray) – The input audio array to be processed.
- orig_sr (int) – The sample rate of the input audio.
Returns: A tuple containing:
- text_str (str): The generated text response.
- audio_output (bytes): The synthesized speech as an MP3 byte stream.
Return type: Tuple[str, bytes]
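
As an illustration of how the returned tuple might be consumed, here is a minimal sketch. `respond_and_save` and the output path are hypothetical names introduced for this example; the only behavior assumed about the model is the documented `forward(array, orig_sr)` signature returning `(text_str, audio_output)`.

```python
from pathlib import Path
from typing import Any, Tuple


def respond_and_save(model: Any, array: Any, orig_sr: int, out_path: str) -> Tuple[str, int]:
    """Run one dialogue turn and write the MP3 reply to out_path.

    Hypothetical helper for illustration only. `model` is any object with
    the documented forward(array, orig_sr) -> (str, bytes) method, and
    `array` should be the np.ndarray of input audio samples.
    """
    text_str, audio_output = model.forward(array, orig_sr)
    # audio_output is documented as an MP3 byte stream, so it can be
    # written to disk unchanged.
    Path(out_path).write_bytes(audio_output)
    return text_str, len(audio_output)


# Intended usage (requires ESPnet, NumPy, and the Mini-Omni dependencies):
# import numpy as np
# from espnet2.sds.end_to_end.mini_omni_e2e import MiniOmniE2EModel
# model = MiniOmniE2EModel(device="cuda")
# audio = np.zeros(16000, dtype=np.float32)  # one second of silence at 16 kHz
# text, n_bytes = respond_and_save(model, audio, 16000, "reply.mp3")
```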

warmup()

Perform a single forward pass with dummy input to pre-load and warm up the model.
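
Warm-up is typically run once at startup so that the first real request does not pay the loading cost. The sketch below shows one way to measure that cost; `timed_warmup` is a hypothetical helper, and it assumes only that the model exposes the documented `warmup()` method.

```python
import time
from typing import Any


def timed_warmup(model: Any) -> float:
    """Call model.warmup() once and return the elapsed time in seconds.

    Hypothetical helper for illustration only.
    """
    start = time.perf_counter()
    model.warmup()
    return time.perf_counter() - start


# Intended usage (requires ESPnet and the Mini-Omni dependencies):
# from espnet2.sds.end_to_end.mini_omni_e2e import MiniOmniE2EModel
# model = MiniOmniE2EModel(device="cuda")
# print(f"warm-up took {timed_warmup(model):.2f} s")
```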