ESPnet Notebooks
Demo
ASR (Speech recognition)
asr_realtime_demo.ipynb: ASR realtime inference with various pre-trained models.
asr_transfer_learning_demo.ipynb: Demo on how to use pre-trained ASR models for fine-tuning.
streaming_asr_demo.ipynb: Streaming ASR realtime inference with pre-trained models.
SE (Speech enhancement/separation)
se_demo.ipynb: Speech enhancement/separation inference with various pre-trained models.
se_demo_for_waspaa_2021.ipynb: WASPAA2021 version of ESPnet-SE demo.
SLU (Spoken language understanding)
2pass_slu_demo.ipynb: Examples of pre-trained two-pass spoken language understanding models.
TTS (Text-to-speech)
tts_realtime_demo.ipynb: TTS realtime inference with various pre-trained models.
Other utilities
onnx_conversion_demo.ipynb: How to convert ESPnet models into ONNX format.
ESPnet-EZ
ASR (Speech recognition)
train_from_scratch.ipynb: Training an ASR model with ESPnet-EZ on LibriSpeech-100.
ASR_finetune_owsm.ipynb: Fine-tuning the weakly-supervised model (OWSM) with ESPnet-EZ on a custom dataset.
ST (Speech-to-text translation)
integrate_huggingface.ipynb: Integrating the weakly-supervised model (OWSM) and Hugging Face's pre-trained language model with ESPnet-EZ on MuST-C-v2.
ST_finetune_owsm.ipynb: Fine-tuning the weakly-supervised model (OWSM) with ESPnet-EZ on MuST-C-v2.
SLU (Spoken language understanding)
SLU_finetune_owsm.ipynb: Fine-tuning the weakly-supervised model (OWSM) with ESPnet-EZ on SLURP.
TTS (Text-to-speech)
TTS_finetune_vctk_dump.ipynb: Fine-tuning a pre-trained VITS model with ESPnet-EZ on the VCTK dataset.
SVS (Singing voice synthesis)
SVS_finetune_ace-kising.ipynb: Fine-tuning a pre-trained VISinger 2 model with ESPnet-EZ on ACE-KiSing.
Course
CMU SpeechProcessing Spring2023
assignment0_data-prep.ipynb: Course assignment on how to prepare ESPnet-format data.
assignment1_espnet-tutorial.ipynb: A simplified version of the previous year's new task tutorial.
assignemnt3_spk.ipynb: Examples of using ESPnet to extract speaker embeddings and conduct speaker recognition.
assignment4_ssl.ipynb: Exploration of using self-supervised speech representations in ESPnet ASR training.
assignment5_st.ipynb: Examples of state-of-the-art speech translation models in ESPnet.
assignment6_slu.ipynb: Examples of state-of-the-art spoken language understanding models in ESPnet.
assignment7_se.ipynb: Examples of state-of-the-art speech enhancement/separation models in ESPnet.
assignment8_tts.ipynb: A student version of the espnet2-tts realtime demonstration.
s2st_demo.ipynb: An example of an existing speech-to-speech translation model in ESPnet.
CMU SpeechRecognition Fall2022
recipe_tutorial.ipynb: A general tutorial with a stage-by-stage explanation of ESPnet2 recipes (with new functions).
new_task_tutorial.ipynb: A tutorial on how to add new models/tasks to the ESPnet framework.
CMU SpeechRecognition Fall2021
general_tutorial.ipynb: A general tutorial with a stage-by-stage explanation of ESPnet2 recipes.
ESPnet1 (Legacy)
asr_library.ipynb: Speech recognition library explanation with network training.
asr_recipe.ipynb: Speech recognition recipe explanation.
pretrained.ipynb: Tutorial on how to use pre-trained models.
st_demo.ipynb: Speech translation demonstration with a TTS model to achieve speech-to-speech translation.
tts_realtime_demo.ipynb: TTS demonstration with different pre-trained TTS models.
tts_recipe.ipynb: Stage explanation for TTS recipes.
