espnet2.sds.tts.espnet_tts.ESPnetTTSModel
class espnet2.sds.tts.espnet_tts.ESPnetTTSModel(tag: str = 'kan-bayashi/ljspeech_vits', device: str = 'cuda')
Bases: AbsTTS
ESPnet TTS.
A class to initialize and manage an ESPnet pre-trained text-to-speech (TTS) model.
This class:
- Downloads and sets up a pre-trained TTS model using the ESPnet Model Zoo.
- Supports various TTS configurations, including multi-speaker TTS using speaker embeddings and speaker IDs.
- Parameters:
- tag (str, optional) – The model tag for the pre-trained TTS model. Defaults to “kan-bayashi/ljspeech_vits”.
- device (str, optional) – The computation device for running inference. Defaults to “cuda”.
- Raises: ImportError – If the required espnet_model_zoo library is not installed.
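A minimal construction sketch (assuming the espnet_model_zoo package is installed; the device="cpu" fallback mentioned in the comment is an assumption for machines without a CUDA-capable GPU):

```python
from espnet2.sds.tts.espnet_tts import ESPnetTTSModel

# Download and set up the default LJSpeech VITS model from the ESPnet Model Zoo.
# Pass device="cpu" if no CUDA-capable GPU is available.
tts = ESPnetTTSModel(tag="kan-bayashi/ljspeech_vits", device="cuda")
```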
forward(transcript: str) → Tuple[int, ndarray]
Converts a text transcript into an audio waveform using a pre-trained ESPnet-TTS model.
- Parameters: transcript (str) – The input text to be converted into speech.
- Returns: A tuple containing:
- The sample rate of the audio (int).
- The generated audio waveform as a NumPy array of type int16.
- Return type: Tuple[int, np.ndarray]
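A hedged usage sketch for forward(); the soundfile dependency and the output filename are illustrative assumptions, not part of this API:

```python
import soundfile as sf  # assumed, only used here to save the result to disk

rate, wav = tts.forward("Hello from ESPnet text-to-speech.")  # wav is an int16 NumPy array
sf.write("tts_output.wav", wav, rate)
```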
warmup()
Perform a single forward pass with dummy input to pre-load and warm up the model.
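Typically invoked once at startup so the first real synthesis request does not pay the model-loading cost, e.g.:

```python
tts.warmup()  # runs one dummy forward pass to pre-load the model
```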