espnet2.sds.end_to_end.mini_omni_e2e.MiniOmniE2EModel
class espnet2.sds.end_to_end.mini_omni_e2e.MiniOmniE2EModel(device: str = 'cuda', dtype: str = 'float16')
Bases: AbsE2E
Mini-OMNI E2E
A class to initialize and manage the OmniInference client for end-to-end dialogue systems.
- Parameters: device (Literal["cuda", "cpu"], optional) – The device to run inference on. Defaults to "cuda".
- Raises: ImportError – If a required dependency (Pydub, Hugging Face Hub, or OmniInference) is not installed.
forward(array: ndarray, orig_sr: int) → Tuple[str, bytes]
Processes audio input to generate synthesized speech and the corresponding text response.
- Parameters:
- array (np.ndarray) – The input audio array to be processed.
- orig_sr (int) – The sample rate of the input audio.
- Returns: A tuple containing:
- text_str (str): The generated text response.
- audio_output (bytes): The synthesized speech as an MP3 byte stream.
- Return type: Tuple[str, bytes]
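The call pattern described above can be sketched as follows. This is a minimal illustration, not the ESPnet implementation: StubE2EModel is a hypothetical stand-in that only mimics the documented forward signature and return types, so the snippet runs without a GPU or the optional dependencies. The float32 dtype, the 16 kHz sample rate, and the run_turn helper are assumptions made for the example.

```python
import os
import tempfile
from typing import Tuple

import numpy as np


class StubE2EModel:
    """Hypothetical stand-in for MiniOmniE2EModel (performs no inference)."""

    def forward(self, array: np.ndarray, orig_sr: int) -> Tuple[str, bytes]:
        # Mimic the documented return value: (text_str, audio_output).
        duration = len(array) / orig_sr
        text_str = f"received {duration:.1f} s of audio"
        audio_output = b"\x00" * 4  # placeholder bytes, NOT valid MP3 data
        return text_str, audio_output


def run_turn(model, array: np.ndarray, orig_sr: int, out_path: str) -> str:
    """Run one dialogue turn and write the returned audio bytes to out_path."""
    text_str, audio_output = model.forward(array, orig_sr)
    with open(out_path, "wb") as f:
        f.write(audio_output)
    return text_str


# One second of silence at 16 kHz (dtype and sample rate are assumptions).
silence = np.zeros(16000, dtype=np.float32)
out_path = os.path.join(tempfile.gettempdir(), "mini_omni_reply.mp3")
reply = run_turn(StubE2EModel(), silence, 16000, out_path)
print(reply)
```

With the real class, the stub would be replaced by an instance of MiniOmniE2EModel; the bytes returned as audio_output are documented as an MP3 stream, so writing them to a .mp3 file in binary mode should be sufficient.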
warmup()
Perform a single forward pass with dummy input to pre-load and warm up the model.
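A warm-up of this kind can be sketched as a single forward call on a dummy array before serving real requests. The example below is illustrative only: StubModel is a hypothetical stand-in, and the choice of one second of silence at 16 kHz as the dummy input is an assumption, since the documentation does not specify the dummy input's shape or sample rate.

```python
import time

import numpy as np


class StubModel:
    """Hypothetical stand-in; counts forward calls instead of running inference."""

    def __init__(self) -> None:
        self.calls = 0

    def forward(self, array: np.ndarray, orig_sr: int):
        self.calls += 1
        return "", b""

    def warmup(self) -> None:
        # Assumed dummy input: one second of silence at 16 kHz.
        dummy = np.zeros(16000, dtype=np.float32)
        self.forward(dummy, 16000)


def timed_warmup(model) -> float:
    """Call model.warmup() once and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    model.warmup()
    return time.perf_counter() - start


model = StubModel()
elapsed = timed_warmup(model)
print(f"warm-up took {elapsed:.4f} s after {model.calls} forward call(s)")
```

Calling warmup() once at startup moves one-time initialization cost out of the first user-facing request; timing it, as above, is one way to check that later calls are faster.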