Classification


This is a template of cls1 recipe for ESPnet2.

Recipe flow
How to run
Evaluation
About data directory
Problems you might encounter
- 1. Torcheval not found
Supported Models

Recipe flow

CLS recipe consists of 10 stages.

Database-dependent data preparation

Data preparation stage. It calls local/data.sh to create Kaldi-style data directories for the training, validation, and evaluation sets.
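As a rough illustration of what such a script produces, the sketch below builds wav.scp and text for a training set. The input listing (labels.txt, with an utterance id, a wav path, and one or more labels per line), the label names, and all paths are assumptions made up for this example; the actual local/data.sh of a recipe depends on the corpus.

```shell
set -eu

# Hypothetical corpus listing: "<uttid> <wav path> <label> [<label> ...]".
# A scratch directory is used so the sketch runs without a real corpus.
workdir=$(mktemp -d)
cat > "${workdir}/labels.txt" <<'EOF'
uttidB /path/to/uttidB.wav class_b1 class_b2
uttidA /path/to/uttidA.wav class_a
EOF

outdir="${workdir}/data/train"
mkdir -p "${outdir}"

# Kaldi-style files are expected to be sorted by utterance id.
LC_ALL=C sort "${workdir}/labels.txt" > "${workdir}/labels.sorted"

# wav.scp: "<uttid> <wav path>"
awk '{print $1, $2}' "${workdir}/labels.sorted" > "${outdir}/wav.scp"

# text: "<uttid> <label> [<label> ...]" (everything from the third field on)
awk '{printf "%s", $1; for (i = 3; i <= NF; i++) printf " %s", $i; printf "\n"}' \
    "${workdir}/labels.sorted" > "${outdir}/text"

cat "${outdir}/text"
```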

How to run

TODO(shikhar): Change this to a recipe which downloads data (perhaps beans) later.

Here, we show the procedure to run the recipe using egs2/as20k/cls1.

Move to the recipe directory.

$ cd egs2/as20k/cls1

Modify the AUDIOSET variable in db.sh to point to the location of your copy of the AudioSet dataset.

$ vim db.sh
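After editing, the relevant line in db.sh might look like the following. The path is a placeholder for illustration only, not a real location.

```shell
# db.sh (excerpt): replace the placeholder with the directory that holds your AudioSet copy
AUDIOSET=/path/to/AudioSet
```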

Modify cmd.sh and conf/*.conf if you want to use a job scheduler. See the details in "Using job scheduling system".

$ vim cmd.sh

Run run.sh, which conducts all of the stages explained above.

$ ./run.sh

When running the recipe for the first time, we recommend executing the stages one at a time via the --stage and --stop_stage options.

$ ./run.sh --stage 1 --stop_stage 1
$ ./run.sh --stage 2 --stop_stage 2
...
$ ./run.sh --stage 7 --stop_stage 7

This might help you understand each stage's processing and directory structure.
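The stage-by-stage run can also be scripted as a loop. This is a minimal sketch: RUN_CMD and run_stages are hypothetical names introduced here, and RUN_CMD defaults to a dry run that only prints the commands. The stage range 1 to 7 follows the example above.

```shell
set -eu

# RUN_CMD is a hypothetical hook for this sketch. By default it only prints
# the commands; set RUN_CMD=./run.sh inside the recipe directory to run them.
RUN_CMD=${RUN_CMD:-"echo ./run.sh"}

# Run each stage in [first, last] on its own; set -e stops at the first failure.
run_stages() {
    local first="$1" last="$2" stage
    for stage in $(seq "${first}" "${last}"); do
        ${RUN_CMD} --stage "${stage}" --stop_stage "${stage}"
    done
}

run_stages 1 7
```

Inspecting the output directories between iterations is easier if the loop is run over a narrower range, for example run_stages 1 2.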

Evaluation

Here we show the example command to calculate classification metrics:


cd egs2/<recipe_name>/cls1
. ./path.sh

python3 pyscripts/utils/cls_score.py \
    -gtxt data/text \
    -ptxt exp/cls_<split>/text \
    -pscore exp/cls_<split>/score \
    -tok data/token_list
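To make the inputs concrete, the following sketch computes plain accuracy for the multi-class case by matching a reference text file against a predicted one by utterance id. It only illustrates how the two files line up and is not the implementation of cls_score.py, which may report different metrics. The file names and contents are made up.

```shell
set -eu

workdir=$(mktemp -d)
# Reference labels, in the same "<uttid> <class>" format as the text file.
cat > "${workdir}/ref_text" <<'EOF'
uttidA class_a
uttidB class_b
uttidC class_a
uttidD class_b
EOF
# Predicted labels in the same format (made up for this example).
cat > "${workdir}/hyp_text" <<'EOF'
uttidA class_a
uttidB class_a
uttidC class_a
uttidD class_b
EOF

# First file: remember the reference label of each utterance id.
# Second file: count predictions that match the stored reference.
accuracy=$(awk '
    NR == FNR { ref[$1] = $2; next }
    ($1 in ref) { total++; if (ref[$1] == $2) correct++ }
    END { if (total > 0) printf "%.2f", correct / total; else print "nan" }
' "${workdir}/ref_text" "${workdir}/hyp_text")

echo "accuracy: ${accuracy}"   # prints "accuracy: 0.75" (3 of 4 correct)
```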

About data directory

The training, development, and evaluation set directories all share the same structure. See also https://github.com/espnet/espnet/tree/master/egs2/TEMPLATE#about-kaldi-style-data-directory for details about the Kaldi-style data directory.

Directory structure

data/
├── train/     # Training set directory
│   ├── text       # Class label(s) of each utterance
│   ├── wav.scp    # Wave file path
│   ├── utt2spk    # A file mapping utterance-id to speaker-id
│   └── spk2utt    # A file mapping speaker-id to utterance-id
├── dev/
│   ...
├── eval/
│   ...
└── token_list   # token list file
    ...

text format
```
uttidA <class_a>
uttidB <class_b1> <class_b2>
...
```
Note that for multi-class classification each uttid should be associated with exactly one class. For multi-label classification, each uttid should have at least one label. (TODO) We will support the case with no label in the future with the <blank> symbol.
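Both constraints can be checked with awk before training. This is a minimal sketch on a made-up text file: a line is valid for multi-class classification only if it has exactly two fields (utterance id plus one class), and valid for multi-label classification only if it has at least two fields.

```shell
set -eu

workdir=$(mktemp -d)
# Made-up example: uttidB has two labels, uttidC has none.
cat > "${workdir}/text" <<'EOF'
uttidA class_a
uttidB class_b1 class_b2
uttidC
EOF

# Collect the utterance ids that violate each constraint, space-separated.
bad_multiclass=$(awk 'NF != 2 {printf "%s%s", sep, $1; sep = " "} END {print ""}' "${workdir}/text")
bad_multilabel=$(awk 'NF < 2 {printf "%s%s", sep, $1; sep = " "} END {print ""}' "${workdir}/text")

echo "not valid for multi-class: ${bad_multiclass}"
echo "not valid for multi-label: ${bad_multilabel}"
```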

wav.scp format

uttidA /path/to/uttidA.wav
uttidB /path/to/uttidB.wav
...

utt2spk format

uttidA speakerA
uttidB speakerB
uttidC speakerA
uttidD speakerB
...

spk2utt format
```
speakerA uttidA uttidC ...
speakerB uttidB uttidD ...
...
```
Note that spk2utt can be generated from utt2spk, and utt2spk can be generated from spk2utt, so it is enough to create either one of them.
```
utils/utt2spk_to_spk2utt.pl data/train/utt2spk > data/train/spk2utt
utils/spk2utt_to_utt2spk.pl data/train/spk2utt > data/train/utt2spk
```
If your corpus doesn't include speaker information, either use the utterance id as the speaker id, or assign the same speaker id to all utterances, to satisfy the directory format. (The cls1 recipe does not currently use speaker information.)
```
uttidA uttidA
uttidB uttidB
...
```
OR
```
uttidA dummy
uttidB dummy
...
```
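Either variant can be generated from wav.scp with awk. A minimal sketch, using a scratch directory and made-up entries; the output file names and the speaker name dummy are arbitrary choices for this example.

```shell
set -eu

workdir=$(mktemp -d)
cat > "${workdir}/wav.scp" <<'EOF'
uttidA /path/to/uttidA.wav
uttidB /path/to/uttidB.wav
EOF

# Variant 1: the speaker id is the utterance id itself.
awk '{print $1, $1}' "${workdir}/wav.scp" > "${workdir}/utt2spk"

# Variant 2: one shared dummy speaker for all utterances.
awk '{print $1, "dummy"}' "${workdir}/wav.scp" > "${workdir}/utt2spk.dummy"

cat "${workdir}/utt2spk"
```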

Once you have finished creating the data directory, it is a good idea to check it with utils/validate_data_dir.sh.

utils/validate_data_dir.sh --no-feats data/train
utils/validate_data_dir.sh --no-feats data/dev
utils/validate_data_dir.sh --no-feats data/eval

Problems you might encounter

Below are some common errors to watch out for:

Torcheval not found

Run pip install torcheval

Supported Models

TODO(shikhar): Add details about BEATs once it is trained.
