Speechlm
Catalog
espnet2.speechlm.core_lm.abs_core_lm.AbsCoreLM
espnet2.speechlm.core_lm.abs_core_lm.SpeechLMInferenceOptions
espnet2.speechlm.core_lm.ar_multiscale.MultiScaleLM
espnet2.speechlm.core_lm.valle.ValleLM
espnet2.speechlm.definitions.Modality
espnet2.speechlm.definitions.pad_until
espnet2.speechlm.definitions.SpeechLMTask
espnet2.speechlm.espnet_model.ESPnetSpeechLMModel
espnet2.speechlm.module.transformer.LayerNorm
espnet2.speechlm.module.transformer.Linear
espnet2.speechlm.module.transformer.MultiHeadAttention
espnet2.speechlm.module.transformer.ResidualAttentionBlock
espnet2.speechlm.module.transformer.TransformerDecoder
espnet2.speechlm.module.valle.AdaLN
espnet2.speechlm.module.valle.ResidualAttentionBlockAdaLM
espnet2.speechlm.module.valle.ValleNARDecoder
espnet2.speechlm.net_utils.causal_mask
espnet2.speechlm.net_utils.ce_loss
espnet2.speechlm.net_utils.install_kv_cache_hook
espnet2.speechlm.net_utils.length_mask
espnet2.speechlm.net_utils.logits_to_tokens
espnet2.speechlm.tokenizer.abs_tokenizer.AbsTokenizer
espnet2.speechlm.tokenizer.beats_tokenizer.BeatsRandomTokenizer
espnet2.speechlm.tokenizer.beats_tokenizer.BeatsTokenizer
espnet2.speechlm.tokenizer.beats_tokenizer.BeatsTokenizerConfig
espnet2.speechlm.tokenizer.beats_tokenizer.EmbeddingEMA
espnet2.speechlm.tokenizer.beats_tokenizer.l2norm
espnet2.speechlm.tokenizer.beats_tokenizer.NormEMAVectorQuantizer
espnet2.speechlm.tokenizer.codec_tokenizer.CodecTokenizer
espnet2.speechlm.tokenizer.random_tokenizer.RandomProjectionQuantizer
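
Each catalog entry is a fully qualified dotted name. As a minimal sketch, the snippet below groups a few entries copied from the list by module and prints the corresponding import statements. It assumes the last dotted component is the class or function name and everything before it is the module path. The helpers `split_entry` and `group_by_module` are hypothetical names used only for illustration and are not part of ESPnet. The sketch does not import ESPnet itself, so whether each name resolves in a given installed version is not checked here.

```python
from collections import defaultdict

# A few fully qualified names copied from the catalog above.
CATALOG = [
    "espnet2.speechlm.core_lm.valle.ValleLM",
    "espnet2.speechlm.definitions.Modality",
    "espnet2.speechlm.net_utils.causal_mask",
    "espnet2.speechlm.net_utils.length_mask",
    "espnet2.speechlm.tokenizer.codec_tokenizer.CodecTokenizer",
]


def split_entry(entry):
    """Split 'pkg.module.Name' into ('pkg.module', 'Name').

    Assumption: the last dotted component is the attribute name.
    """
    module, _, name = entry.rpartition(".")
    return module, name


def group_by_module(entries):
    """Map each module path to the sorted names listed under it."""
    groups = defaultdict(list)
    for entry in entries:
        module, name = split_entry(entry)
        groups[module].append(name)
    return {module: sorted(names) for module, names in groups.items()}


if __name__ == "__main__":
    # Print one import line per module found in the sample entries.
    for module, names in group_by_module(CATALOG).items():
        print(f"from {module} import {', '.join(names)}")
```

With ESPnet installed, the printed lines can be pasted into a script as ordinary imports. Any `ImportError` would indicate that the installed version differs from the one this catalog documents.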