Self-supervised Learning
This is a template of the `ssl1` recipe for ESPnet2, designed for general-purpose SSL.
Differences from other recipes
ESPnet2 provides two different recipes for Self-Supervised Learning (SSL): `ssl1` (this one) and `hubert1`.
`hubert1` is the original SSL implementation, built around the HuBERT pre-training framework. The recipe takes care of everything needed for pre-training, such as k-means pseudo-labelling and discrete-token evaluation, which is very important for reproducibility. However, it is quite complicated due to the multiple offline stages HuBERT requires, and is therefore difficult to hack or adapt to new training methods or other scenarios.
We created the new `ssl1` recipe to future-proof the codebase to accommodate other pre-training techniques that are purely end-to-end, such as DinoSR, SpeechFlow, or w2v-BERT. The recipe is designed to be easily customizable and more scalable to large-scale pre-training setups.
HuBERT Pre-training in SSL1
The `ssl1` codebase also supports HuBERT pre-training, but the steps to create the pseudo-labels are not included in the recipe. Users either need to run the `hubert1` recipe to obtain the labels or generate them themselves.
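If you generate the labels yourself, the essential ingredients are frame-level features and a k-means quantizer, with the labels written in the same format as the `hubert1` pseudo-label files. The sketch below is a minimal, hypothetical illustration using scikit-learn, not the recipe's actual labelling pipeline: the feature matrices are random stand-ins, and the output format (`<utt_id> <tok> <tok> ...`) is inferred from the fact that the file is dropped in as a Kaldi `text` file in step 2 below.

```python
# Hypothetical sketch of DIY HuBERT-style pseudo-labelling, NOT the
# recipe's actual pipeline. Replace the random stand-ins with real
# frame-level features (e.g. MFCCs or hidden states of an SSL model).
import numpy as np
from sklearn.cluster import MiniBatchKMeans

num_clusters = 500  # must match num_classes in the training config

# utt_id -> (num_frames, feat_dim) feature matrix
feats = {
    "utt1": np.random.randn(1200, 39),
    "utt2": np.random.randn(900, 39),
}

# Fit k-means on all frames, then label each utterance frame by frame.
kmeans = MiniBatchKMeans(n_clusters=num_clusters, batch_size=10_000)
kmeans.fit(np.concatenate(list(feats.values()), axis=0))

# Write labels in the Kaldi text format expected by step 2 below:
# "<utt_id> <tok> <tok> ...".
with open("pseudo_labels_km500.txt", "w") as f:
    for utt_id, mat in feats.items():
        f.write(utt_id + " " + " ".join(map(str, kmeans.predict(mat))) + "\n")
```

You will also need a token list compatible with the `--token_list` flag described below; the safest route is to mirror the format of the `tokens.txt` that `hubert1` generates.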
To use the labels from `hubert1`, follow these steps. Given a training set called `train_ssl` and a dev set called `dev_ssl`:

1. Run `hubert1/hubert.sh` from stages 1 to 5 for a single iteration. This will generate:
   1. A token vocabulary list. It will be called something like `hubert1/data/en_token_list_kmeans_iter1_espnet_hubert_500clusters/word/tokens.txt`. The exact name depends on your hyperparameters.
   2. A pseudo-label text file for each set. The exact path depends on your hyperparameters, but will look something like `hubert1/dump/<feat_type>/espnet_hubert/layer_<x>/<data split name>/pseudo_labels_km<num>.txt`.
2. Copy each `pseudo_labels_km<num>.txt` to the respective Kaldi-style data directory in `ssl1` as `text`. For example:
   ```sh
   cp hubert1/dump/ssl_feats/espnet_hubert/layer_9/train_ssl/pseudo_labels_km500.txt ssl1/dump/train_ssl/text
   ```
3. In `ssl1/run.sh`, add the following flags: `--token_type word --token_list <path to token list from step 1.1>`
4. Update your training config with the k-means vocabulary size you used (a concrete example follows this list):
   ```yaml
   loss:
     - name: hubert
       conf:
         num_classes: <update this>
   ```
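For instance, if the labels came from a 500-cluster k-means model (the `pseudo_labels_km500.txt` case above), the loss section would read as follows. This is a sketch of just the loss entry; the rest of your config stays as it is:

```yaml
loss:
  - name: hubert
    conf:
      num_classes: 500  # equal to the number of k-means clusters
```

If `num_classes` is smaller than the largest label ID, training will fail on out-of-range targets; if it is larger, output capacity is wasted, so it is worth double-checking the value. Finally, before launching training, a quick sanity check that every utterance received a pseudo-label line (the `wav.scp` location is an assumption about your data layout):

```sh
# Line counts should match: one line per utterance in each file.
# (The wav.scp path assumes the usual Kaldi-style dump layout.)
wc -l ssl1/dump/train_ssl/wav.scp ssl1/dump/train_ssl/text
```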
