espnet2.sds.tts.espnet_tts.ESPnetTTSModel
class espnet2.sds.tts.espnet_tts.ESPnetTTSModel(tag: str = 'kan-bayashi/ljspeech_vits', device: str = 'cuda')
Bases: AbsTTS
ESPnet TTS.
A class to initialize and manage an ESPnet pre-trained text-to-speech (TTS) model.
This class:
- Downloads and sets up a pre-trained TTS model using the ESPnet Model Zoo.
- Supports various TTS configurations, including multi-speaker TTS using speaker embeddings and speaker IDs.
- Parameters:
- tag (str, optional) – The model tag for the pre-trained TTS model. Defaults to “kan-bayashi/ljspeech_vits”.
- device (str, optional) – The computation device for running inference. Defaults to “cuda”.
- Raises: ImportError – If the required espnet_model_zoo library is not installed.
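A minimal construction sketch (assuming the espnet_model_zoo package is installed; the device="cpu" fallback mentioned in the comment is an assumption for machines without a CUDA-capable GPU):

```python
from espnet2.sds.tts.espnet_tts import ESPnetTTSModel

# Download and set up the default LJSpeech VITS model from the ESPnet Model Zoo.
# Pass device="cpu" if no CUDA-capable GPU is available.
tts = ESPnetTTSModel(tag="kan-bayashi/ljspeech_vits", device="cuda")
```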
forward(transcript: str) → Tuple[int, ndarray]
Converts a text transcript into an audio waveform using a pre-trained ESPnet-TTS model.
- Parameters: transcript (str) – The input text to be converted into speech.
- Returns: A tuple containing:
- The sample rate of the audio (int).
- The generated audio waveform as a NumPy array of type int16.
- Return type: Tuple[int, np.ndarray]
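A hedged usage sketch for forward(); the soundfile dependency and the output filename are illustrative assumptions, not part of this API:

```python
import soundfile as sf  # assumed, only used here to save the result to disk

rate, wav = tts.forward("Hello from ESPnet text-to-speech.")  # wav is an int16 NumPy array
sf.write("tts_output.wav", wav, rate)
```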
warmup()
Perform a single forward pass with dummy input to pre-load and warm up the model.
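Typically invoked once at startup so the first real synthesis request does not pay the model-loading cost, e.g.:

```python
tts.warmup()  # runs one dummy forward pass to pre-load the model
```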