synth_wav.sh
Usage:
$ synth_wav.sh <text>
Note:
This script does not include a text frontend, so please clean the input
text manually. You also need to adjust the feature configuration to match
the model. The default settings are for the ljspeech models; if you want
to use other pretrained models, modify the parameters yourself.
The provided models are listed in the tables at
https://github.com/espnet/espnet#tts-demo.
If you are a beginner, I strongly recommend first trying the following
Colab notebook instead of this script. It covers the whole procedure:
text frontend, feature generation, and waveform generation.
https://colab.research.google.com/github/espnet/notebook/blob/master/tts_realtime_demo.ipynb
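As a rough illustration of the manual cleaning mentioned in the note, the sketch below upper-cases a sentence and tidies its whitespace before it is written to the input file. The function name `clean_text` is hypothetical and not part of synth_wav.sh, and this is no substitute for a real text frontend (it does not expand numbers or abbreviations, for example).

```shell
#!/bin/sh
# Hypothetical helper, not part of synth_wav.sh: prepare one sentence for
# the default ljspeech char-based models, which take upper-case input.
clean_text() {
    # 1) upper-case, 2) collapse runs of spaces/tabs, 3) trim both ends
    printf '%s\n' "$1" \
        | tr '[:lower:]' '[:upper:]' \
        | sed -e 's/[[:blank:]]\{1,\}/ /g' -e 's/^ //' -e 's/ $//'
}

clean_text "  This is a   demonstration of text to speech. "
# prints: THIS IS A DEMONSTRATION OF TEXT TO SPEECH.
```

The output can be redirected to a file (for example `clean_text "..." > example.txt`) and that file passed to synth_wav.sh.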
Example:
# make a text file and then synthesize speech from it
# (the default ljspeech model takes an upper-case character sequence as input)
echo "THIS IS A DEMONSTRATION OF TEXT TO SPEECH." > example.txt
synth_wav.sh example.txt
# you can also put multiple sentences in one text file
echo "THIS IS A DEMONSTRATION OF TEXT TO SPEECH." > example.txt
echo "TEXT TO SPEECH IS A TECHNIQUE TO CONVERT TEXT INTO SPEECH." >> example.txt
synth_wav.sh example.txt
# you can specify the pretrained model
synth_wav.sh --models ljspeech.transformer.v3 example.txt
# you can also specify the vocoder model
synth_wav.sh --vocoder_models ljspeech.wavenet.mol.v1 example.txt
Available models:
- ljspeech.tacotron2.v1
- ljspeech.tacotron2.v2
- ljspeech.tacotron2.v3
- ljspeech.transformer.v1
- ljspeech.transformer.v2
- ljspeech.transformer.v3
- ljspeech.fastspeech.v1
- ljspeech.fastspeech.v2
- ljspeech.fastspeech.v3
- libritts.tacotron2.v1
- libritts.transformer.v1
- jsut.transformer.v1
- jsut.tacotron2.v1
- csmsc.transformer.v1
- csmsc.fastspeech.v3
Available vocoder models:
- ljspeech.wavenet.softmax.ns.v1
- ljspeech.wavenet.mol.v1
- ljspeech.parallel_wavegan.v1
- libritts.wavenet.mol.v1
- jsut.wavenet.mol.v1
- jsut.parallel_wavegan.v1
- csmsc.wavenet.mol.v1
- csmsc.parallel_wavegan.v1
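Since a mistyped model name is an easy mistake, a small pre-check against the two lists above may help. The following is only a sketch: the variable and function names (`AVAILABLE_MODELS`, `AVAILABLE_VOCODERS`, `is_listed`) are hypothetical, the names are copied by hand from this page, and synth_wav.sh may perform its own validation.

```shell
#!/bin/sh
# Names copied by hand from the "Available models" lists on this page.
AVAILABLE_MODELS="ljspeech.tacotron2.v1 ljspeech.tacotron2.v2 ljspeech.tacotron2.v3 \
ljspeech.transformer.v1 ljspeech.transformer.v2 ljspeech.transformer.v3 \
ljspeech.fastspeech.v1 ljspeech.fastspeech.v2 ljspeech.fastspeech.v3 \
libritts.tacotron2.v1 libritts.transformer.v1 \
jsut.transformer.v1 jsut.tacotron2.v1 \
csmsc.transformer.v1 csmsc.fastspeech.v3"

AVAILABLE_VOCODERS="ljspeech.wavenet.softmax.ns.v1 ljspeech.wavenet.mol.v1 \
ljspeech.parallel_wavegan.v1 libritts.wavenet.mol.v1 \
jsut.wavenet.mol.v1 jsut.parallel_wavegan.v1 \
csmsc.wavenet.mol.v1 csmsc.parallel_wavegan.v1"

# Hypothetical helper: succeed if name $1 is in the space-separated list $2.
is_listed() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# "ljspeech.tacotron2.v9" is a deliberately nonexistent name.
for name in ljspeech.transformer.v3 ljspeech.tacotron2.v9; do
    if is_listed "$name" "$AVAILABLE_MODELS"; then
        echo "$name: listed"
    else
        echo "$name: not listed"
    fi
done
```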
Model details:
| Model name | Lang | Fs [Hz] | Mel range [Hz] | FFT / Shift / Win [pt] | Input type |
| ----------------------- | ---- | ------- | -------------- | ---------------------- | ---------- |
| ljspeech.tacotron2.v1 | EN | 22.05k | None | 1024 / 256 / None | char |
| ljspeech.tacotron2.v2 | EN | 22.05k | None | 1024 / 256 / None | char |
| ljspeech.tacotron2.v3 | EN | 22.05k | None | 1024 / 256 / None | char |
| ljspeech.transformer.v1 | EN | 22.05k | None | 1024 / 256 / None | char |
| ljspeech.transformer.v2 | EN | 22.05k | None | 1024 / 256 / None | char |
| ljspeech.transformer.v3 | EN | 22.05k | None | 1024 / 256 / None | phn |
| ljspeech.fastspeech.v1 | EN | 22.05k | None | 1024 / 256 / None | char |
| ljspeech.fastspeech.v2 | EN | 22.05k | None | 1024 / 256 / None | char |
| ljspeech.fastspeech.v3 | EN | 22.05k | None | 1024 / 256 / None | phn |
| libritts.tacotron2.v1 | EN | 24k | 80-7600 | 1024 / 256 / None | char |
| libritts.transformer.v1 | EN | 24k | 80-7600 | 1024 / 256 / None | char |
| jsut.tacotron2.v1 | JP | 24k | 80-7600 | 2048 / 300 / 1200 | phn |
| jsut.transformer.v1 | JP | 24k | 80-7600 | 2048 / 300 / 1200 | phn |
| csmsc.transformer.v1 | ZH | 24k | 80-7600 | 2048 / 300 / 1200 | pinyin |
| csmsc.fastspeech.v3 | ZH | 24k | 80-7600 | 2048 / 300 / 1200 | pinyin |
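Because the feature configuration has to match the chosen model, it can be handy to look up the values from the table above by model name. The sketch below transcribes the table into a `case` statement and also converts the frame shift from points to milliseconds. The function names and the `key=value` labels are illustrative assumptions and are not guaranteed to match the option names of synth_wav.sh.

```shell
#!/bin/sh
# Hypothetical lookup, transcribed from the "Model details" table only.
# It is not meant for vocoder names, whose table differs for libritts.
feature_config() {
    case "$1" in
        ljspeech.*)
            echo "fs=22050 n_fft=1024 n_shift=256 win_length=None fmin=None fmax=None" ;;
        libritts.*)
            echo "fs=24000 n_fft=1024 n_shift=256 win_length=None fmin=80 fmax=7600" ;;
        jsut.*|csmsc.*)
            echo "fs=24000 n_fft=2048 n_shift=300 win_length=1200 fmin=80 fmax=7600" ;;
        *)
            echo "unknown model: $1" >&2
            return 1 ;;
    esac
}

# Frame shift in milliseconds: shift [pt] / sampling rate [Hz] * 1000.
frame_shift_ms() {
    awk -v fs="$1" -v shift_pt="$2" 'BEGIN { printf "%.2f\n", shift_pt / fs * 1000 }'
}

feature_config jsut.tacotron2.v1
frame_shift_ms 22050 256   # ljspeech models: 11.61
frame_shift_ms 24000 300   # jsut and csmsc models: 12.50
```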
Vocoder model details:
| Model name | Lang | Fs [Hz] | Mel range [Hz] | FFT / Shift / Win [pt] | Model type |
| ------------------------------ | ---- | ------- | -------------- | ---------------------- | ---------------- |
| ljspeech.wavenet.softmax.ns.v1 | EN | 22.05k | None | 1024 / 256 / None | Softmax WaveNet |
| ljspeech.wavenet.mol.v1 | EN | 22.05k | None | 1024 / 256 / None | MoL WaveNet |
| ljspeech.parallel_wavegan.v1 | EN | 22.05k | None | 1024 / 256 / None | Parallel WaveGAN |
| libritts.wavenet.mol.v1 | EN | 24k | None | 1024 / 256 / None | MoL WaveNet |
| jsut.wavenet.mol.v1 | JP | 24k | 80-7600 | 2048 / 300 / 1200 | MoL WaveNet |
| jsut.parallel_wavegan.v1 | JP | 24k | 80-7600 | 2048 / 300 / 1200 | Parallel WaveGAN |
| csmsc.wavenet.mol.v1 | ZH | 24k | 80-7600 | 2048 / 300 / 1200 | MoL WaveNet |
| csmsc.parallel_wavegan.v1 | ZH | 24k | 80-7600 | 2048 / 300 / 1200 | Parallel WaveGAN |
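The two tables suggest that vocoders share the sampling rate and FFT settings of the acoustic models trained on the same corpus. Assuming that a model and a vocoder should therefore come from the same corpus (this page does not state that explicitly), a quick sanity check can compare the part of each name before the first dot. `corpus_of` and `same_corpus` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: print the corpus prefix of a model name,
# e.g. "ljspeech.transformer.v3" -> "ljspeech".
corpus_of() {
    printf '%s\n' "${1%%.*}"
}

# Hypothetical helper: succeed if both names start with the same corpus.
same_corpus() {
    [ "$(corpus_of "$1")" = "$(corpus_of "$2")" ]
}

model=ljspeech.transformer.v3
vocoder=ljspeech.parallel_wavegan.v1
if same_corpus "$model" "$vocoder"; then
    echo "ok: $model and $vocoder share a corpus"
else
    echo "warning: $model and $vocoder come from different corpora" >&2
fi
```

This only compares names; it does not inspect the feature settings themselves.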