This is a TTS demo for the LJ Speech Dataset [0].
The tts1 recipe is based on Tacotron2 [1] (spectrogram prediction network) without WaveNet.
Tacotron2 generates log mel-filter bank features from text, which are then converted to a linear spectrogram using the inverse mel-basis.
Finally, phase components are recovered with Griffin-Lim.
(2019/06/16) We also support TTS-Transformer [3].
(2019/06/17) We also support Feed-forward Transformer [4].
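The tts1 post-processing described above (inverse mel-basis, then Griffin-Lim) can be illustrated with a minimal NumPy sketch. This is not the recipe's actual code: the function names, the HTK-style mel scale, the base-10 log, the Hann window, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_basis(sr, n_fft, n_mels):
    """Triangular mel filter bank of shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    basis = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        basis[i] = np.maximum(0.0, np.minimum(rising, falling))
    return basis


def logmel_to_linear(logmel, basis, floor=1e-10):
    """Map (frames, n_mels) log10 mel features to (frames, n_fft // 2 + 1)."""
    inv_basis = np.linalg.pinv(basis)  # pseudo-inverse, shape (n_bins, n_mels)
    linear = np.power(10.0, logmel) @ inv_basis.T
    return np.maximum(floor, linear)  # the pseudo-inverse can produce negatives


def stft(x, n_fft, hop, window):
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape (frames, n_fft // 2 + 1)


def istft(spec, n_fft, hop, window):
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = n_fft + hop * (spec.shape[0] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):  # weighted overlap-add
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)  # avoid division by zero at the edges


def griffin_lim(magnitude, n_fft, hop, n_iter=32, seed=0):
    """Estimate a waveform from a (frames, bins) magnitude spectrogram."""
    rng = np.random.default_rng(seed)
    window = np.hanning(n_fft)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random start
    for _ in range(n_iter):
        signal = istft(magnitude * phase, n_fft, hop, window)
        rebuilt = stft(signal, n_fft, hop, window)
        phase = np.exp(1j * np.angle(rebuilt))  # keep phase, discard magnitude
    return istft(magnitude * phase, n_fft, hop, window)
```

Given a predicted feature matrix `logmel` of shape (frames, n_mels), a call such as `griffin_lim(logmel_to_linear(logmel, mel_basis(sr, n_fft, n_mels)), n_fft, hop)` would return a waveform estimate. In practice a library filter bank and Griffin-Lim routine (for example `librosa.filters.mel` and `librosa.griffinlim`) would likely replace the hand-written helpers here.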
The tts2 recipe is based on Tacotron2’s spectrogram prediction network [1] and Tacotron’s CBHG module [2].
Instead of using the inverse mel-basis, a CBHG module converts the log mel-filter bank features to a linear spectrogram.
The phase components are recovered in the same way as in tts1.
v.0.4.0: transformer.v1
Sun Jun 16 10:03:47 JST 2019
Linux huracan.sp.m.is.nagoya-u.ac.jp 4.4.0-142-generic #168-Ubuntu SMP Wed Jan 16 21:00:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Python 3.7.3
espnet 0.3.1
chainer 5.0.0
pytorch 1.0.1.post2
267da3161cefeae72e9a44bd15e74c0d18591fb6
conf/tuning/train_pytorch_transformer.v1.yaml
conf/decode.yaml
data/train_no_dev/cmvn.ark
exp/train_no_dev_pytorch_train_pytorch_transformer.v1/results/model.last1.avg.best
exp/train_no_dev_pytorch_train_pytorch_transformer.v1/results/model.json
data/lang_1char/train_no_dev_units.txt
* Recommended browser for the audio player: Google Chrome
LJ050-0029 “THAT IS REFLECTED IN DEFINITE AND COMPREHENSIVE OPERATING PROCEDURES.”
ground_truth | transformer.v1_GL | transformer.v1_WNV |
---|---|---|
Attention weight | Probability |
---|---|
LJ050-0030 “THE COMMISSION ALSO RECOMMENDS”
ground_truth | transformer.v1_GL | transformer.v1_WNV |
---|---|---|
Attention weight | Probability |
---|---|
LJ050-0031 “THAT THE SECRET SERVICE CONSCIOUSLY SET ABOUT THE TASK OF INCULCATING AND MAINTAINING THE HIGHEST STANDARD OF EXCELLENCE AND ESPRIT, FOR ALL OF ITS PERSONNEL.”
ground_truth | transformer.v1_GL | transformer.v1_WNV |
---|---|---|
Attention weight | Probability |
---|---|
LJ050-0032 “THIS INVOLVES TIGHT AND UNSWERVING DISCIPLINE AS WELL AS THE PROMOTION OF AN OUTSTANDING DEGREE OF DEDICATION AND LOYALTY TO DUTY.”
ground_truth | transformer.v1_GL | transformer.v1_WNV |
---|---|---|
Attention weight | Probability |
---|---|
LJ050-0033 “THE COMMISSION EMPHASIZES THAT IT FINDS NO CAUSAL CONNECTION BETWEEN THE ASSASSINATION”
ground_truth | transformer.v1_GL | transformer.v1_WNV |
---|---|---|
Attention weight | Probability |
---|---|
https://drive.google.com/open?id=14EboYVsMVcAq__dFP1p6lyoZtdobIL1X
* Recommended browser for Google Colab: Google Chrome
Please change the TTS model option in the synthesis command as follows:
Before: !../../../utils/synth_wav.sh --models ljspeech.fastspeech.v1 example.txt
After: !../../../utils/synth_wav.sh --models ljspeech.transformer.v1 example.txt