CMU 11751/18781 Fall 2022: ESPnet Tutorial
ESPnet is a widely used end-to-end speech processing toolkit that supports various speech processing tasks. ESPnet uses PyTorch as its main deep learning engine and follows Kaldi-style data processing, feature extraction/formats, and recipes to provide a complete setup for speech recognition and other speech processing experiments.
Main references:
- ESPnet repository
- ESPnet documentation
- ESPnet tutorial in Speech Recognition and Understanding (Fall 2021)
- Recitation in Multilingual NLP (Spring 2022)
Author: Siddhant Arora (siddhana@andrew.cmu.edu) This notebook was modified from the material made by Yifan Peng (yifanpen@andrew.cmu.edu)
❗Important Notes❗
- We are using Colab to show the demo. However, Colab has some constraints on the total GPU runtime. If you use too much GPU, you may fail to connect to a GPU backend for some time.
- There are multiple in-class checkpoints ✅ throughout this tutorial. There will also be some after-class exercises 📗 after the tutorial. Your participation points are based on these tasks. Please try your best to follow all the steps! If you encounter issues, please notify the TAs as soon as possible so that we can make an adjustment for you.
- Please submit PDF files of your completed notebooks to Gradescope. You can print the notebook using File -> Print in the menu bar.
- This tutorial covers the basics of ESPnet, which will be the foundation of the next tutorial on Wednesday.
Objectives
After this tutorial, you are expected to know:
- How to run existing recipes (data prep, training, inference and scoring) in ESPnet2
- How to change the training and decoding configurations
- How to create a new recipe from scratch
- Where to find resources if you encounter an issue
Useful links
- Installation https://espnet.github.io/espnet/installation.html
- Usage https://espnet.github.io/espnet/espnet2_tutorial.html
Install ESPnet
This is a full installation method to perform data preprocessing, training, inference, scoring, and so on.
We provide several installation methods. Please read https://espnet.github.io/espnet/installation.html#step-2-installation-espnet for more details.
Function to print date and time
We first define a function to print the current date and time, which will be used in multiple places below.
def print_date_and_time():
    from datetime import datetime
    import pytz

    now = datetime.now(pytz.timezone("America/New_York"))
    print("=" * 60)
    print(f'  Current date and time: {now.strftime("%m/%d/%Y %H:%M:%S")}')
    print("=" * 60)
# example output
print_date_and_time()
Check GPU type
Let's check the GPU type of this allocated environment.
!nvidia-smi
Download ESPnet
We use git clone to download the source code of ESPnet and then go to a specific commit.
Important: In other versions of ESPnet, you may encounter errors related to incompatible package versions (e.g., numba). Please use the same commit to avoid such issues.
# It takes a few seconds
!git clone --depth 5 https://github.com/espnet/espnet
Set up the Python environment based on anaconda
There are several other installation methods, but we highly recommend the anaconda-based one.
# It takes 30 seconds
%cd /content/espnet/tools
!./setup_anaconda.sh anaconda espnet 3.9
Install ESPnet (same procedure as your first tutorial)
This step installs PyTorch and other required tools.
We specify CUDA_VERSION=11.6
for PyTorch 1.12.1. We also support many other versions. Please check https://github.com/espnet/espnet/blob/master/tools/installers/install_torch.sh for the detailed version list.
# It may take 12 minutes
%cd /content/espnet/tools
!make TH_VERSION=1.12.1 CUDA_VERSION=11.6
If other listed packages are necessary, install any of them using
. ./activate_python.sh && ./installers/install_xxx.sh
We show two examples below. The s3prl installation is needed later in this demo for SSL feature extraction; fairseq is not used here.
# s3prl and fairseq are necessary if you want to use self-supervised pre-trained models
# It takes 50s
%cd /content/espnet/tools
!. ./activate_python.sh && ./installers/install_s3prl.sh
!. ./activate_python.sh && ./installers/install_fairseq.sh # install fairseq to use the Wav2Vec2 / HuBERT model series through fairseq
Run an existing recipe
ESPnet has a number of recipes (130 recipes on Sep. 11, 2022). Please refer to https://github.com/espnet/espnet/blob/master/egs2/README.md for a complete list.
Please also check the general usage of the recipe in https://espnet.github.io/espnet/espnet2_tutorial.html#recipes-using-espnet2
CMU AN4 recipe
In this tutorial, we will use the CMU an4
recipe. This is a small-scale speech recognition task mainly used for testing.
First, let's go to the recipe directory.
%cd /content/espnet/egs2/an4/asr1
!ls
egs2/an4/asr1/
- conf/ # Configuration files for training, inference, etc.
- scripts/ # Bash utilities of espnet2
- pyscripts/ # Python utilities of espnet2
- steps/ # From Kaldi utilities
- utils/ # From Kaldi utilities
- db.sh # The directory path of each corpus
- path.sh # Setup script for environment variables
- cmd.sh # Configuration for your backend of job scheduler
- run.sh # Entry point
- asr.sh # Invoked by run.sh
⭕ [SSL] Get the dump_ssl_feature.sh script and the training config ready.
- GitHub: https://github.com/simpleoier/ESPnet_SSL_ASR_tutorial_misc.git
!rm -r ESPnet_SSL_ASR_tutorial_misc
!git clone https://github.com/simpleoier/ESPnet_SSL_ASR_tutorial_misc.git
!cp ESPnet_SSL_ASR_tutorial_misc/dump_ssl_feature.sh ./local
!cp ESPnet_SSL_ASR_tutorial_misc/dump_feats.py ./local
!cp ESPnet_SSL_ASR_tutorial_misc/feats_loaders.py ./local
!chmod +x local/dump_ssl_feature.sh
!cp ESPnet_SSL_ASR_tutorial_misc/train_asr_demo_branchformer.yaml ./conf
ESPnet is designed for various use cases (local machines or cluster machines) based on Kaldi tools. If you use it on cluster machines, please also check https://kaldi-asr.org/doc/queue.html
The main stages can be parallelized across multiple jobs.
!cat run.sh
!ls conf
!ls local
run.sh calls asr.sh, which carries out the entire speech recognition experiment, including data preparation, training, inference, and scoring. The pipeline is separated into multiple stages (16 in total).
Instead of executing the entire pipeline with run.sh, let's run it stage by stage to understand what happens in each stage.
Data preparation
Stage 1: Data preparation: download raw data, split the entire set into train/dev/test, and prepare them in the Kaldi format
Note that --stage <N> starts the pipeline from that stage and --stop_stage <N> stops it after that stage. We also need to specify the train, dev, and test sets.
# a few seconds
!./asr.sh --stage 1 --stop_stage 1 --train_set train_nodev --valid_set train_dev --test_sets "train_dev test"
After this stage is finished, please check the newly created data
directory:
!ls data
In this recipe, we use train_nodev as the training set and train_dev as the validation set (to monitor training progress via the validation score). We also use the test and train_dev sets for the final speech recognition evaluation.
Let's check one of the training data directories:
!ls -1 data/train_nodev/
These are the speech and corresponding text and speaker information in the Kaldi format. To understand their meanings, please check https://github.com/espnet/espnet/tree/master/egs2/TEMPLATE#about-kaldi-style-data-directory.
Please also check the official documentation of Kaldi: https://kaldi-asr.org/doc/data_prep.html
spk2utt # Speaker-to-utterance mapping
text # Transcription file
utt2spk # Utterance-to-speaker mapping
wav.scp # Audio file list
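As a quick sanity check, the sketch below (not part of the recipe; the directory path is just one of the prepared sets) loads these files and verifies that the utterance IDs are consistent across them.
# Minimal sketch: read a Kaldi-style data directory and check that every
# utterance in wav.scp also appears in text and utt2spk.
from pathlib import Path

def read_kaldi_map(path):
    """Each line is '<key> <value ...>'; return {key: value}."""
    mapping = {}
    for line in Path(path).read_text().splitlines():
        key, value = line.strip().split(maxsplit=1)
        mapping[key] = value
    return mapping

data_dir = Path("data/train_nodev")            # any prepared data directory
wav_scp = read_kaldi_map(data_dir / "wav.scp")
text = read_kaldi_map(data_dir / "text")
utt2spk = read_kaldi_map(data_dir / "utt2spk")

assert wav_scp.keys() == text.keys() == utt2spk.keys(), "utterance IDs must match"
utt = next(iter(wav_scp))
print(utt, "|", utt2spk[utt], "|", text[utt])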
Stage 2: Speed perturbation (one of the data augmentation methods)
We do not use speed perturbation in this demo, but you can turn it on by adding the argument --speed_perturb_factors "0.9 1.0 1.1" to the shell script.
Note that we perform speed perturbation and save the augmented data on disk before training. Another approach is to perform data augmentation on the fly during training, such as SpecAug.
!./asr.sh --stage 2 --stop_stage 2 --train_set train_nodev --valid_set train_dev --test_sets "train_dev test"
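To build intuition for what the factors mean, here is a minimal numpy sketch (for illustration only; the recipe applies the perturbation with standard Kaldi/sox utilities inside asr.sh): a factor of 0.9 makes the utterance longer and lower-pitched, while 1.1 makes it shorter and higher-pitched.
# Illustration only (not how the recipe implements it): speed perturbation by
# a factor f is roughly equivalent to resampling, changing both duration and pitch.
import numpy as np

def speed_perturb(wave, factor):
    """Resample so the signal plays back 'factor' times faster (linear interp.)."""
    old_idx = np.arange(len(wave))
    new_idx = np.arange(0, len(wave), factor)   # step > 1 shortens the signal
    return np.interp(new_idx, old_idx, wave)

wave = np.random.randn(16000)                    # 1 second of fake 16 kHz audio
for f in (0.9, 1.0, 1.1):
    print(f, len(speed_perturb(wave, f)), "samples")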
Stage 3: Format wav.scp: data/ -> dump/raw
We dump the data in the specified format (flac in this case) for efficient use of the data.
# ====== Recreating "wav.scp" ======
# A Kaldi wav.scp entry can describe the file path with a unix pipe, like "cat /some/path |",
# but such pipe-style entries shouldn't be used in the training process.
# "format_wav_scp.sh" dumps such pipe-style wavs to real audio files,
# and it can also change the audio format and sampling rate.
# If nothing needs to be changed, then format_wav_scp.sh does nothing:
# i.e. the input file format and rate are the same as the output.
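For illustration, here is a hedged sketch of how a pipe-style entry could be resolved in Python (assuming soundfile is available; ESPnet uses its own readers internally, and the path below is hypothetical).
# Sketch: resolve a wav.scp value that is either a plain path or a unix pipe.
import io
import subprocess
import soundfile as sf  # assumption: installed in the active environment

def load_wav_scp_entry(value):
    if value.rstrip().endswith("|"):
        # pipe-style entry, e.g. "cat /some/path.wav |": run it and read stdout
        raw = subprocess.run(value.rstrip()[:-1], shell=True,
                             capture_output=True, check=True).stdout
        return sf.read(io.BytesIO(raw))
    return sf.read(value)  # plain file path

# hypothetical entry; replace with a real line from data/*/wav.scp
wave, rate = load_wav_scp_entry("cat /content/example.wav |")
print(wave.shape, rate)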
Note that --nj <N>
means the number of CPU jobs. Please set it appropriately by considering your CPU resources and disk access.
# 25 seconds
!./asr.sh --stage 3 --stop_stage 3 --train_set train_nodev --valid_set train_dev --test_sets "train_dev test" --nj 4
⭕ [SSL] Stage 3.5: Extract SSL features
We dump the SSL features of the data in the specified format (Kaldi matrices in this case) for efficient use of the data.
First, we need to prepare the pretrained SSL models. In this Colab, we use HuBERT models. We have three choices:
- HuBERT through FairSeq API; Model choices can be found from fairseq/hubert pretrained models
Example usage:
mkdir -p downloads/hubert_pretrained_models
wget https://dl.fbaipublicfiles.com/hubert/hubert_large_ll60k.pt -O ./downloads/hubert_pretrained_models/hubert_large_ll60k.pt
Append the following arguments: --feature_type hubert --hubert_type fairseq --hubert_url "https://dl.fbaipublicfiles.com/hubert/hubert_large_ll60k.pt" --hubert_dir_path "./downloads/hubert_pretrained_models" --layer 23
- HuBERT from ESPnet;
Example usage:
# Download model
./asr.sh --skip_data_prep true --skip_train true --skip_eval true --skip_upload true --download_model simpleoier/simpleoier_librispeech_hubert_iter1_train_ssl_torchaudiohubert_base_960h_pretrain_it1_raw --train_set train_nodev --valid_set train_dev --test_sets "train_dev test"
Append the following arguments: --feature_type hubert --hubert_type espnet --hubert_dir_path "/content/espnet/tools/anaconda/envs/espnet/lib/python3.9/site-packages/espnet_model_zoo/models--simpleoier--simpleoier_librispeech_hubert_iter1_train_ssl_torchaudiohubert_base_960h_pretrain_it1_raw/snapshots/4256c702685249202f333348a87c13143985b90b/exp/hubert_iter1_train_ssl_torchaudiohubert_base_960h_pretrain_it1_raw/valid.loss.ave.pth" --layer 12
- HuBERT through S3PRL API. S3PRL also supports many other SSL models. Model choices can be found from s3prl_upstream_names here.
Append the following arguments: --feature_type s3prl --s3prl_upstream_name hubert_large_ll60k --layer 24
Second, we extract the SSL features and copy the feats.scp into the data directories.
# ====== Creating "feats.scp" ======
# A Kaldi feats.scp entry describes the file path (ark file) and offset.
Note that --nj <N> means the number of CPU / GPU jobs. Please set it appropriately by considering your CPU resources and disk access. local/dump_ssl_feature.sh is the entry script.
📗 Check the shape of the dumped feature [1.0 pt]
Finally, we read the dumped feature and print its shape to check whether the extraction succeeded. The expected output is
fkai-an311-b (155, 1024)
# 5 min
# 'dump_ssl_feature.sh' reads wave files from a common dir, so we symbolically link dump/raw/test into dump/raw/org
!ln -s /content/espnet/egs2/an4/asr1/dump/raw/test /content/espnet/egs2/an4/asr1/dump/raw/org
!rm -r ssl_feats/
# Fairseq HuBERT large example
# !mkdir -p downloads/hubert_pretrained_models
# !wget https://dl.fbaipublicfiles.com/hubert/hubert_large_ll60k.pt -O ./downloads/hubert_pretrained_models/hubert_large_ll60k.pt
# !local/dump_ssl_feature.sh --feat_dir ssl_feats --datadir dump/raw/org --train_set train_nodev --dev_set train_dev --test_sets "test" --use_gpu true --nj 1 --feature_type hubert --hubert_type fairseq --hubert_url "https://dl.fbaipublicfiles.com/hubert/hubert_large_ll60k.pt" --hubert_dir_path "./downloads/hubert_pretrained_models" --layer 23
# S3PRL HuBERT large example
!local/dump_ssl_feature.sh --feat_dir ssl_feats --datadir dump/raw/org --train_set train_nodev --dev_set train_dev --test_sets "test" --use_gpu true --nj 1 --feature_type s3prl --s3prl_upstream_name wavlm_large --layer 24
#!local/dump_ssl_feature.sh --feat_dir ssl_feats --datadir dump/raw/org --train_set train_nodev --dev_set train_dev --test_sets "test" --use_gpu true --nj 1 --feature_type s3prl --s3prl_upstream_name hubert_large_ll60k --layer 24
# copy the feats.scp to data/*
!cp ssl_feats/s3prl/train_nodev/feats.scp data/train_nodev
!cp ssl_feats/s3prl/train_dev/feats.scp data/train_dev
!cp ssl_feats/s3prl/test/feats.scp data/test
# Print the shape of dumped features.
!/content/espnet/tools/anaconda/envs/espnet/bin/python3 -c "import kaldiio; reader=kaldiio.ReadHelper('scp:data/train_nodev/feats.scp'); key, array = next(reader.generator); print(key, array.shape)"
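For reference, an equivalent but more readable version of the same check (assuming it is run with the Python interpreter of the espnet conda environment, where kaldiio is installed):
# Iterate over the dumped SSL features and print the first few shapes.
import kaldiio

with kaldiio.ReadHelper("scp:data/train_nodev/feats.scp") as reader:
    for i, (utt_id, feats) in enumerate(reader):
        print(utt_id, feats.shape)   # e.g. (num_frames, 1024) for *_large models
        if i == 2:                   # only inspect the first three utterances
            break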
⭕ [SSL] Stage 3: Format feats.scp: data/ -> dump/extracted
Because we want to use the extracted features instead of raw waveforms, we need to run stage 3 again. It only constructs a new dump/extracted folder with a few simple copy operations.
👀 From now on, --feats_type "extracted" will be added.
# 25 seconds
!./asr.sh --stage 3 --stop_stage 3 --train_set train_nodev --valid_set train_dev --test_sets "train_dev test" --feats_type "extracted" --nj 4
Stage 4: Remove long/short data: dump/extracted/org -> dump/extracted
Utterances that are too long or too short are harmful to efficient training, so they are removed from the training data. For inference and scoring, we still use the full data, which is important for a fair comparison.
!./asr.sh --stage 4 --stop_stage 4 --feats_type "extracted" --train_set train_nodev --valid_set train_dev --test_sets "train_dev test"
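Conceptually, the filtering looks like the sketch below (the real thresholds and bookkeeping live inside asr.sh; the path and frame-count limits here are illustrative assumptions).
# Sketch of stage 4: drop utterances whose feature length is outside a range.
import kaldiio

min_frames, max_frames = 10, 2000    # hypothetical thresholds
kept, removed = [], []
with kaldiio.ReadHelper("scp:dump/extracted/org/train_nodev/feats.scp") as reader:
    for utt_id, feats in reader:
        (kept if min_frames <= len(feats) < max_frames else removed).append(utt_id)
print(f"kept {len(kept)} utterances, removed {len(removed)}")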
Stage 5: Generate token_list from dump/extracted/train_nodev/text using BPE.
This is important for text processing. Here, we build a small subword vocabulary (essentially the English characters) using the sentencepiece toolkit developed by Google.
!./asr.sh --stage 5 --stop_stage 5 --feats_type "extracted" --train_set train_nodev --valid_set train_dev --test_sets "train_dev test"
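As a rough sketch of what this stage does with sentencepiece (the paths, vocabulary size, and output names below are illustrative; asr.sh handles the real ones and strips the utterance-ID column first):
# Sketch: train a tiny BPE model on the training transcriptions with sentencepiece.
import sentencepiece as spm

# Keep only the transcriptions (drop the leading utterance IDs).
with open("dump/extracted/train_nodev/text") as fin, open("train_text.txt", "w") as fout:
    for line in fin:
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            fout.write(parts[1])

spm.SentencePieceTrainer.train(
    input="train_text.txt",
    model_prefix="bpe_demo",
    vocab_size=30,              # an4 uses a very small vocabulary (bpe30)
    model_type="bpe",
)
sp = spm.SentencePieceProcessor(model_file="bpe_demo.model")
print(sp.encode("YES", out_type=str))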
Language modeling (skipped in this tutorial)
Stages 6--9: Stages related to language modeling.
We skip the language modeling part in the recipe (stages 6 -- 9) in this tutorial.
How to change the configs?
Let's revisit the configs, since this is probably the most important part for improving performance.
Config file based
All training options can be changed in the config file.
Please check https://espnet.github.io/espnet/espnet2_training_option.html
Let's first check config files prepared in the an4
recipe
- LSTM-based E2E ASR /content/espnet/egs2/an4/asr1/conf/train_asr_rnn.yaml
- Transformer based E2E ASR /content/espnet/egs2/an4/asr1/conf/train_asr_transformer.yaml
You can run
RNN
./asr.sh --stage 10 \
--feats_type "extracted" \
--train_set train_nodev \
--valid_set train_dev \
--test_sets "train_dev test" \
--nj 4 \
--inference_nj 4 \
--use_lm false \
--asr_config conf/train_asr_rnn.yaml
Transformer
./asr.sh --stage 10 \
--feats_type "extracted" \
--train_set train_nodev \
--valid_set train_dev \
--test_sets "train_dev test" \
--nj 4 \
--inference_nj 4 \
--use_lm false \
--asr_config conf/train_asr_transformer.yaml
You can also find various configs in other recipes espnet/egs2/*/asr1/conf/
, including
- Conformer
egs2/librispeech/asr1/conf/tuning/train_asr_conformer10_hop_length160.yaml
- Branchformer
egs2/librispeech/asr1/conf/tuning/train_asr_branchformer_hop_length160_e18_linear3072.yaml
Command line argument based
You can also customize it by passing the command line arguments, e.g.,
./run.sh --stage 10 --asr_args "--model_conf ctc_weight=0.3"
./run.sh --stage 10 --asr_args "--optim_conf lr=0.1"
This approach has the highest priority. Thus, the arguments passed on the command line will overwrite those defined in the config file. This is convenient if you only want to change a few arguments.
Please refer to https://espnet.github.io/espnet/espnet2_tutorial.html#change-the-configuration-for-training for more details.
📗 Exercise 1
Run training, inference, and scoring on AN4 using a new config. Here is an example config using Branchformer (Peng et al., ICML 2022).
⭕ [SSL] Config modifications:
- Frontend is set to null.
- A preencoder is added to reduce the input dimension.
- In the encoder, the subsampling is reduced to 2 (input_layer is conv2d2).
⭕ [SSL] Normalization
- Global mean normalization
  - Compute the statistics (mean / var) on the full training set. This is done in stage 10. Both mean and variance are used.
  - This is the default in asr.sh, set by the argument --feats_normalize global_mvn.
- Utterance mean normalization
  - Compute the statistics (mean / var) on each single utterance. By default, ESPnet only normalizes the mean.
  - This can be specified to asr.sh by --feats_normalize utt_mvn (in fact, any value other than global_mvn works).
- No normalization
  - Nothing is done to the features.
  - This can be specified by --feats_normalize null --asr_args "--normalize null"
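A minimal numpy sketch contrasting the first two options (the array here is random and stands in for one dumped utterance; the real global MVN statistics are accumulated over the whole training set in stage 10):
# Sketch: utterance-level vs. global mean/variance normalization.
import numpy as np

feats = np.random.randn(155, 1024).astype(np.float32)   # frames x feature_dim

# Utterance MVN: statistics from this utterance only (mean only, by default).
utt_mvn = feats - feats.mean(axis=0)

# Global MVN: mean and variance from the whole training set (approximated
# here with a single utterance for illustration).
global_mean, global_std = feats.mean(axis=0), feats.std(axis=0)
global_mvn = (feats - global_mean) / (global_std + 1e-8)

print(utt_mvn.mean(), global_mvn.std())   # ~0 and ~1, respectively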
Similarly, we create a config file named train_asr_demo_branchformer.yaml
and start training.
batch_type: numel
batch_bins: 4000000
accum_grad: 1           # gradient accumulation steps
max_epoch: 40
patience: 10
init: xavier_uniform
best_model_criterion:   # criterion to save best models
- - valid
  - acc
  - max
keep_nbest_models: 10   # save nbest models and average these checkpoints
use_amp: true           # whether to use automatic mixed precision
num_att_plot: 0         # do not save attention plots to save time in the demo
num_workers: 2          # number of workers in dataloader

frontend: null          # Since extracted features are used, frontend is not used.

preencoder: linear
preencoder_conf:
  input_size: 1024
  output_size: 128

encoder: branchformer
encoder_conf:
  output_size: 256
  use_attn: true
  attention_heads: 4
  attention_layer_type: rel_selfattn
  pos_enc_layer_type: rel_pos
  rel_pos_type: latest
  use_cgmlp: true
  cgmlp_linear_units: 1024
  cgmlp_conv_kernel: 31
  use_linear_after_conv: false
  gate_activation: identity
  merge_method: concat
  cgmlp_weight: 0.5           # used only if merge_method is "fixed_ave"
  attn_branch_drop_rate: 0.0  # used only if merge_method is "learned_ave"
  num_blocks: 12
  dropout_rate: 0.1
  positional_dropout_rate: 0.1
  attention_dropout_rate: 0.1
  input_layer: conv2d2
  stochastic_depth_rate: 0.0

decoder: transformer
decoder_conf:
  attention_heads: 4
  linear_units: 1024
  num_blocks: 3
  dropout_rate: 0.1
  positional_dropout_rate: 0.1
  self_attention_dropout_rate: 0.1
  src_attention_dropout_rate: 0.1

model_conf:
  ctc_weight: 0.3               # joint CTC/attention training
  lsm_weight: 0.1               # label smoothing weight
  length_normalized_loss: false

optim: adam
optim_conf:
  lr: 0.0002
scheduler: warmuplr             # linearly increase and exponentially decrease
scheduler_conf:
  warmup_steps: 200
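To visualize the "linearly increase and exponentially decrease" comment above, here is a small sketch of the warm-up schedule. It assumes warmuplr follows the usual Noam-style rule lr * sqrt(warmup_steps) * min(step^-0.5, step * warmup_steps^-1.5); the exact formula in ESPnet may differ slightly.
# Sketch: plot the assumed warm-up learning-rate curve for lr=0.0002, warmup_steps=200.
import matplotlib.pyplot as plt

base_lr, warmup_steps = 0.0002, 200

def warmup_lr(step):
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

steps = range(1, 2001)
plt.plot(steps, [warmup_lr(s) for s in steps])
plt.xlabel("step")
plt.ylabel("learning rate")
plt.title("warmuplr (sketch)")
plt.show()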
My result is shown below:
## exp/asr_train_asr_demo_branchformer_extracted_bpe30
### WER
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_asr_model_valid.acc.ave/test|130|773|95.9|2.6|1.6|0.0|4.1|16.9|
|decode_asr_asr_model_valid.acc.ave/train_dev|100|591|92.0|5.9|2.0|0.2|8.1|28.0|
### CER
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_asr_model_valid.acc.ave/test|130|2565|98.1|0.1|1.8|0.1|2.0|16.9|
|decode_asr_asr_model_valid.acc.ave/train_dev|100|1915|95.5|0.7|3.8|0.2|4.7|28.0|
### TER
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_asr_model_valid.acc.ave/test|130|2695|98.1|0.1|1.7|0.1|1.9|16.9|
|decode_asr_asr_model_valid.acc.ave/train_dev|100|2015|95.7|0.7|3.6|0.1|4.5|28.0|
# ~10 min
# Run multiple stages
!rm -r exp/asr_train_asr_demo_branchformer_extracted_bpe30
!./asr.sh --stage 10 --stop_stage 13 --feats_type "extracted" --feats_normalize utt_mvn --train_set train_nodev --valid_set train_dev --test_sets "train_dev test" --nj 4 --ngpu 1 --use_lm false --gpu_inference true --inference_nj 1 --asr_config conf/train_asr_demo_branchformer.yaml --inference_config conf/decode_asr.yaml
# Load the TensorBoard notebook extension
%load_ext tensorboard
# Launch tensorboard before training
%tensorboard --logdir /content/espnet/egs2/an4/asr1/exp/asr_train_asr_demo_branchformer_extracted_bpe30/tensorboard
# NOTE: Exercise 1 Result 1 (HuBERT)
!scripts/utils/show_asr_result.sh exp
from IPython.display import Image, display
display(Image('exp/asr_train_asr_demo_branchformer_extracted_bpe30/images/acc.png', width=400))
print_date_and_time()
# NOTE: Exercise 1 Result 2 (WavLM)
!scripts/utils/show_asr_result.sh exp
from IPython.display import Image, display
display(Image('exp/asr_train_asr_demo_branchformer_extracted_bpe30/images/acc.png', width=400))
print_date_and_time()
# NOTE: Exercise 1 Result 3 (WavLM utt_mvn)
!scripts/utils/show_asr_result.sh exp
from IPython.display import Image, display
display(Image('exp/asr_train_asr_demo_branchformer_extracted_bpe30/images/acc.png', width=400))
print_date_and_time()
📗 Questions
WavLM is a newer model which uses masked speech denoising to create an embedding applicable to multiple downstream tasks, not just ASR.
- Get the ASR performance of one more SSL feature, WavLM, and show the results. [1 pt]
Hint: change the s3prl_upstream_name to wavlm_large
at stage 3.5 and run the following stages.
# RESULTS
## Environments
- date: `Sat Feb 25 03:26:54 UTC 2023`
- python version: `3.9.16 (main, Jan 11 2023, 16:05:54) [GCC 11.2.0]`
- espnet version: `espnet 202301`
- pytorch version: `pytorch 1.12.1`
- Git hash: `15a6dc1501b65211725a4fb514fcf5dd24f7ae95`
- Commit date: `Thu Feb 23 22:04:23 2023 -0500`
## exp/asr_train_asr_demo_branchformer_extracted_bpe30
### WER
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_asr_model_valid.acc.ave/test|130|773|63.5|13.6|22.9|2.2|38.7|79.2|
|decode_asr_asr_model_valid.acc.ave/train_dev|100|591|59.6|18.1|22.3|2.4|42.8|82.0|
### CER
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_asr_model_valid.acc.ave/test|130|2565|80.6|2.6|16.8|1.4|20.8|79.2|
|decode_asr_asr_model_valid.acc.ave/train_dev|100|1915|78.0|4.6|17.4|0.8|22.8|82.0|
### TER
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_asr_model_valid.acc.ave/test|130|2695|81.6|2.4|16.0|1.3|19.8|79.2|
|decode_asr_asr_model_valid.acc.ave/train_dev|100|2015|79.1|4.4|16.6|0.7|21.7|82.0|
============================================================
Current date and time: 02/24/2023 22:26:55
============================================================
- Compare the performance between HuBERT, WavLM and MFCC features. Which is better? How much is it? Why do you think it is better in one sentence? [1 pt]
It seems that HuBERT performed slightly better than WavLM, probably because HuBERT's pre-training is more specifically focused on ASR.
- Explore the normalization options mentioned in Stage 10 for either the HuBERT or WavLM features. Report the performance. [1 pt]
Hint: you may change the number of epochs to get better performance.
# RESULTS
## Environments
- date: `Sat Feb 25 04:31:27 UTC 2023`
- python version: `3.9.16 (main, Jan 11 2023, 16:05:54) [GCC 11.2.0]`
- espnet version: `espnet 202301`
- pytorch version: `pytorch 1.12.1`
- Git hash: `15a6dc1501b65211725a4fb514fcf5dd24f7ae95`
- Commit date: `Thu Feb 23 22:04:23 2023 -0500`
## exp/asr_train_asr_demo_branchformer_extracted_bpe30
### WER
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_asr_model_valid.acc.ave/test|130|773|63.5|13.6|22.9|2.2|38.7|79.2|
|decode_asr_asr_model_valid.acc.ave/train_dev|100|591|59.6|18.1|22.3|2.4|42.8|82.0|
### CER
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_asr_model_valid.acc.ave/test|130|2565|80.6|2.6|16.8|1.4|20.8|79.2|
|decode_asr_asr_model_valid.acc.ave/train_dev|100|1915|78.0|4.6|17.4|0.8|22.8|82.0|
### TER
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_asr_model_valid.acc.ave/test|130|2695|81.6|2.4|16.0|1.3|19.8|79.2|
|decode_asr_asr_model_valid.acc.ave/train_dev|100|2015|79.1|4.4|16.6|0.7|21.7|82.0|
============================================================
  Current date and time: 02/24/2023 23:31:28
============================================================
## Contribute to ESPnet
Please follow https://github.com/espnet/espnet/blob/master/CONTRIBUTING.md to upload your pre-trained model to [Hugging Face](https://huggingface.co/espnet) and make a pull request in the [ESPnet repository](https://github.com/espnet/espnet/pulls).