Pretrained Model
This example notebook shows how to recognize and synthesize speech using pretrained ESPnet models.
See also:
- Tutorial: https://github.com/espnet/espnet/blob/master/doc/tutorial.md
- Github: https://github.com/espnet
Author: Takenori Yoshimura
Last update: 2019/07/28
Setup environment
Let's set up the environment for the demonstration. It takes around 10 minutes, so please wait for a while.
# OS setup
!sudo apt-get install bc tree sox
!cat /etc/os-release
# espnet setup
!git clone https://github.com/espnet/espnet
!cd espnet; pip install -e .
# warp ctc setup
!git clone https://github.com/espnet/warp-ctc -b pytorch-1.1
!cd warp-ctc && mkdir build && cd build && cmake .. && make -j
!cd warp-ctc/pytorch_binding && python setup.py install
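To confirm that the warp-ctc binding built correctly, you can try importing it from Python. The module name warpctc_pytorch is an assumption based on the pytorch_binding setup above; if the import fails, re-check the build logs.
# verify the warp-ctc build (module name assumed from the pytorch_binding setup above)
import warpctc_pytorch
print(warpctc_pytorch.__file__)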
# kaldi setup
!cd /content/espnet/tools; git clone https://github.com/kaldi-asr/kaldi
!echo "" > ./espnet/tools/kaldi/tools/extras/check_dependencies.sh
!chmod +x ./espnet/tools/kaldi/tools/extras/check_dependencies.sh
!cd ./espnet/tools/kaldi/tools; make sph2pipe sclite
!rm -rf espnet/tools/kaldi/tools/python
!wget https://18-198329952-gh.circle-artifacts.com/0/home/circleci/repo/ubuntu16-featbin.tar.gz
!tar -xf ./ubuntu16-featbin.tar.gz
!cp featbin/* espnet/tools/kaldi/src/featbin/
# sentencepiece setup
!cd espnet/tools; make sentencepiece.done
# make a dummy activate script (espnet recipes source tools/venv/bin/activate)
!mkdir -p espnet/tools/venv/bin
!touch espnet/tools/venv/bin/activate
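Before moving on, a quick sanity check that the key tools are in place can save debugging time later. The paths below assume the default Colab working directory /content.
# quick sanity check of the setup (paths assume the Colab working directory /content)
!sox --version
!ls espnet/tools/kaldi/src/featbin | head
!ls espnet/tools/venv/bin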
Recognize speech using pretrained models
Let's recognize a 7-minute-long speech recording as an example. Go to a recipe directory and run recog_wav.sh there.
Available models are summarized here.
!cd espnet/egs/tedlium2/asr1; bash ../../../utils/recog_wav.sh --models tedlium2.rnn.v1
You can see the progress of the recognition.
!cat espnet/egs/tedlium2/asr1/decode/TomWujec_2010U/log/decode.log
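If you only want the recognized text, you can grep the log. The "prediction" tag below is an assumption about the espnet decode-log format; adjust the pattern to whatever the log above actually shows.
# show only the recognized text (the "prediction" tag is an assumption about the decode-log format)
!grep -i "prediction" espnet/egs/tedlium2/asr1/decode/TomWujec_2010U/log/decode.log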
You can change the E2E model, language model, decoding parameters, etc. For details, see recog_wav.sh.
!cat espnet/utils/recog_wav.sh
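For example, you could try another pretrained model by changing the --models value. The tag tedlium2.transformer.v1 below is an assumption based on the model list inside recog_wav.sh; check the cat output above before running.
# rerun recognition with a different pretrained model (model tag assumed from the list in recog_wav.sh)
!cd espnet/egs/tedlium2/asr1; bash ../../../utils/recog_wav.sh --models tedlium2.transformer.v1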
Synthesize speech using pretrained models
Let's synthesize speech using an E2E model. Go to a recipe directory and run synth_wav.sh there.
Available models are summarized here.
!cd espnet/egs/ljspeech/tts1; \
echo "THIS IS A DEMONSTRATION OF TEXT TO SPEECH." > example.txt; \
bash ../../../utils/synth_wav.sh --models ljspeech.tacotron2.v1 example.txt
Let's listen to the synthesized speech!
from google.colab import files
files.download('espnet/egs/ljspeech/tts1/decode/example/wav/example.wav')
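Alternatively, the audio can be played inline in the notebook with IPython's Audio display class instead of downloading it:
# play the synthesized wav inline in the notebook
from IPython.display import Audio, display
display(Audio('espnet/egs/ljspeech/tts1/decode/example/wav/example.wav'))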
You can change the E2E model, decoding parameters, etc. For details, see synth_wav.sh.
!cat espnet/utils/synth_wav.sh
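For instance, you can synthesize your own sentence by writing it to a text file and passing it to the script, just as in the example above. The output path is assumed to mirror the input file name, i.e. decode/example2/wav/example2.wav here.
# synthesize a custom sentence with the same pretrained model
!cd espnet/egs/ljspeech/tts1; \
echo "END TO END SPEECH PROCESSING IS FUN." > example2.txt; \
bash ../../../utils/synth_wav.sh --models ljspeech.tacotron2.v1 example2.txt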
We provide web storage for hosting well-trained models. If you would like to share yours, please contact Shinji Watanabe (shinjiw@ieee.org).