# Self-supervised Learning

This is a template of the `ssl1` recipe for ESPnet2, designed for general-purpose SSL.

## Differences from other recipes

ESPnet2 provides two different recipes for Self-Supervised Learning (SSL): `ssl1` (this one) and `hubert1`.

`hubert1` is the original implementation of SSL under the HuBERT pre-training framework. The recipe takes care of everything needed for pre-training, such as k-means pseudo-labelling and discrete token evaluation. This is very important for reproducibility. However, it is quite complicated due to the multiple offline stages required for HuBERT, and is therefore difficult to hack or adapt to new training methods or other scenarios.

We created the new `ssl1` recipe to future-proof the codebase and accommodate other pre-training techniques that are purely end-to-end, such as DinoSR, SpeechFlow, or w2v-BERT. This recipe is designed to be easily customizable and more scalable to large-scale pre-training setups.

## HuBERT Pre-training in SSL1

The `ssl1` codebase also supports HuBERT pre-training, but the steps to create the pseudo-labels are not included in the recipe. Users will either need to run the `hubert1` recipe to obtain the labels, or generate them themselves.

To use the labels from `hubert1`, follow these steps. We assume a training set called `train_ssl` and a dev set called `dev_ssl`.

1. Run `hubert1/hubert.sh` from stages 1 to 5 for a single iteration (a command sketch follows below). This will generate:
   1. A token vocabulary list. It will be called something like `hubert1/data/en_token_list_kmeans_iter1_espnet_hubert_500clusters/word/tokens.txt`; the exact name depends on your hyperparameters.
   2. A pseudo-label text file for both sets. The exact path also depends on your hyperparameters, but will look something like `hubert1/dump/<feat_type>/espnet_hubert/layer_<x>/<data split name>/pseudo_labels_km<num>.txt`.
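
   As a minimal sketch, assuming `hubert.sh` follows the standard ESPnet `--stage`/`--stop_stage` and `--train_set`/`--valid_set` conventions (check `hubert1/run.sh` for the exact arguments your setup uses):

   ```bash
   # Run stages 1-5 of the first iteration; this produces the token
   # list and the pseudo-label files described above.
   cd hubert1
   ./hubert.sh \
       --stage 1 \
       --stop_stage 5 \
       --train_set train_ssl \
       --valid_set dev_ssl
   ```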

2. Copy each `pseudo_labels_km<num>.txt` to the respective Kaldi-style data directory in `ssl1` as `text`. For example:

   ```bash
   cp hubert1/dump/ssl_feats/espnet_hubert/layer_9/train_ssl/pseudo_labels_km500.txt ssl1/dump/train_ssl/text
   ```
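
   The dev set needs the same copy. Assuming the same (hypothetical) layer and cluster settings as above:

   ```bash
   # Paths are illustrative; the layer index and cluster count
   # depend on your hubert1 hyperparameters.
   cp hubert1/dump/ssl_feats/espnet_hubert/layer_9/dev_ssl/pseudo_labels_km500.txt ssl1/dump/dev_ssl/text
   ```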

3. In `ssl1/run.sh`, add the following flags: `--token_type word` and `--token_list <path to token list from step 1.1>`.
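
   Since ESPnet `run.sh` wrappers usually forward extra command-line arguments, an equivalent invocation could look like the following sketch (the token list path is the hypothetical example from step 1.1):

   ```bash
   cd ssl1
   ./run.sh \
       --token_type word \
       --token_list ../hubert1/data/en_token_list_kmeans_iter1_espnet_hubert_500clusters/word/tokens.txt
   ```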

4. Update your training config with the k-means vocabulary size you used:

   ```yaml
   loss:
     - name: hubert
       conf:
         num_classes: <update this>
   ```
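
   Here `num_classes` should match the number of k-means clusters used to create the pseudo-labels, e.g. 500 for the `pseudo_labels_km500.txt` example above.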