espnet.bin package
Initialize sub package.
espnet.bin.st_trans
End-to-end speech translation model decoding script.
espnet.bin.__init__
Initialize sub package.
espnet.bin.mt_train
Neural machine translation model training script.
espnet.bin.vc_decode
VC decoding script.
espnet.bin.mt_trans
Neural machine translation model decoding script.
espnet.bin.asr_enhance
espnet.bin.tts_train
Text-to-speech model training script.
espnet.bin.asr_align
This program performs CTC segmentation to align utterances within audio files.
- Inputs:
  - --data-json: A JSON file containing a list of utterances and audio files.
  - --model: An already trained ASR model.
- Output:
  - --output: A plain segments file with the utterance positions in the audio files.
- Selected parameters:
  - --min-window-size: Minimum window size considered for a single utterance. The current default value should be fine in most cases. Larger values might give better results, but values that are too large cause IndexErrors.
  - --subsampling-factor: If the encoder subsamples its input, the number of frames at the CTC layer is reduced by this factor.
  - --frame-duration: The non-overlapping duration of a single frame in milliseconds (the inverse of frames per millisecond).
  - --set-blank: In the rare case that the blank token does not have index 0 in the character dictionary, this parameter sets the index of the blank token.
  - --gratis-blank: Sets the transition cost for blank tokens to zero. Useful if the audio contains longer unrelated segments between the utterances.
  - --replace-spaces-with-blanks: Replaces spaces with blanks, which helps to model pauses between words. This may increase the length of the ground truth and may lead to misaligned segments when combined with the option --gratis-blank.
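The timing and ground-truth options above boil down to simple arithmetic and token mapping. The following Python sketch illustrates both under stated assumptions: it assumes that a CTC frame index maps to seconds as index × subsampling factor × frame duration / 1000, and that --set-blank and --replace-spaces-with-blanks only affect how the character-level ground truth is built. The helpers `index_to_seconds` and `build_ground_truth` and the toy character dictionary are hypothetical and are not part of ESPnet's API.

```python
from typing import Dict, List


def index_to_seconds(ctc_index: int, subsampling_factor: int, frame_duration_ms: float) -> float:
    """Convert a frame index at the CTC layer into a time in seconds.

    Assumption: one CTC frame spans `subsampling_factor` input frames,
    each lasting `frame_duration_ms` milliseconds.
    """
    return ctc_index * subsampling_factor * frame_duration_ms / 1000.0


def build_ground_truth(
    text: str,
    char_dict: Dict[str, int],
    blank_index: int = 0,
    replace_spaces_with_blanks: bool = False,
) -> List[int]:
    """Map an utterance to character token ids (hypothetical helper).

    With replace_spaces_with_blanks, each space becomes the blank token
    (index `blank_index`, cf. --set-blank) so that pauses between words can
    be modeled. Otherwise this sketch simply drops spaces (an assumption).
    """
    ids: List[int] = []
    for char in text:
        if char == " ":
            if replace_spaces_with_blanks:
                ids.append(blank_index)
            continue
        ids.append(char_dict[char])
    return ids


# Toy dictionary in which the blank token has index 0.
char_dict = {"<blank>": 0, "a": 1, "b": 2}
print(build_ground_truth("ab a", char_dict))  # [1, 2, 1]
print(build_ground_truth("ab a", char_dict, replace_spaces_with_blanks=True))  # [1, 2, 0, 1]
# For example, with 4x subsampling and 10 ms frames, CTC frame 25 starts at 1.0 s.
print(index_to_seconds(25, subsampling_factor=4, frame_duration_ms=10))  # 1.0
```

The second call shows why --replace-spaces-with-blanks lengthens the ground truth: every space adds one blank token that must be placed somewhere in the alignment.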
espnet.bin.asr_align.ctc_align(args, device)
ESPnet-specific interface for CTC segmentation.
Parses the configuration, infers the CTC posterior probabilities, and then aligns the start and end of each utterance using CTC segmentation. Results are written to the output file given in args.
- Parameters:
  - args – the parsed configuration
  - device – device used for inference; one of 'cuda' or 'cpu'
- Returns:
  0 on success
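To illustrate the idea of deriving utterance start and end positions from CTC posteriors, here is a minimal Python sketch. It deliberately swaps in a plain Viterbi CTC forced alignment; it is not the CTC segmentation algorithm used by ctc_align and omits its window handling. The functions `ctc_forced_align` and `utterance_spans` and the toy posteriors are hypothetical, and the `gratis_blank` switch only loosely mirrors --gratis-blank by making blank frames cost nothing.

```python
from typing import List, Sequence, Tuple

import numpy as np


def ctc_forced_align(
    log_probs: np.ndarray,
    tokens: Sequence[int],
    blank: int = 0,
    gratis_blank: bool = False,
) -> List[int]:
    """Simplified Viterbi CTC forced alignment (hypothetical helper).

    log_probs: (T, V) CTC log posteriors; tokens: ground truth of all
    utterances concatenated (must not contain the blank id).
    Returns, per frame, the index into `tokens` emitted there, or -1 for blank.
    """
    num_frames = log_probs.shape[0]
    # Extended label sequence: blank, t0, blank, t1, ..., blank.
    ext = [blank]
    for tok in tokens:
        ext.extend([tok, blank])
    num_states = len(ext)

    def emit(t: int, s: int) -> float:
        # Loosely mirrors --gratis-blank: blank frames are free.
        if gratis_blank and ext[s] == blank:
            return 0.0
        return float(log_probs[t, ext[s]])

    score = np.full((num_frames, num_states), -np.inf)
    back = np.zeros((num_frames, num_states), dtype=int)
    score[0, 0] = emit(0, 0)
    if num_states > 1:
        score[0, 1] = emit(0, 1)
    for t in range(1, num_frames):
        for s in range(num_states):
            # Predecessors: stay, advance by one, or skip a blank between different labels.
            prev = [s]
            if s >= 1:
                prev.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                prev.append(s - 2)
            best = max(prev, key=lambda p: score[t - 1, p])
            score[t, s] = score[t - 1, best] + emit(t, s)
            back[t, s] = best

    # A valid path ends in the final blank or the final label.
    finals = [num_states - 1] + ([num_states - 2] if num_states > 1 else [])
    state = max(finals, key=lambda s: score[num_frames - 1, s])
    if np.isneginf(score[num_frames - 1, state]):
        raise ValueError("too few frames to align all tokens")

    path = [0] * num_frames
    for t in range(num_frames - 1, -1, -1):
        path[t] = state
        state = int(back[t, state])
    # Odd states are labels; map them back to positions in `tokens`.
    return [(s - 1) // 2 if s % 2 == 1 else -1 for s in path]


def utterance_spans(frame_tokens: List[int], utt_lengths: Sequence[int]) -> List[Tuple[int, int]]:
    """First and last CTC frame (inclusive) of each utterance.

    utt_lengths[i] is the (positive) number of tokens of utterance i, in the
    same order in which the tokens were concatenated for ctc_forced_align.
    """
    owner = [i for i, n in enumerate(utt_lengths) for _ in range(n)]
    spans = [[-1, -1] for _ in utt_lengths]
    for frame, tok in enumerate(frame_tokens):
        if tok < 0:
            continue
        span = spans[owner[tok]]
        if span[0] < 0:
            span[0] = frame
        span[1] = frame
    return [(start, end) for start, end in spans]


# Toy posteriors over {0: blank, 1, 2}; two one-token utterances.
probs = np.array([
    [0.8, 0.1, 0.1],  # blank
    [0.1, 0.8, 0.1],  # token 1
    [0.8, 0.1, 0.1],  # blank
    [0.1, 0.1, 0.8],  # token 2
    [0.8, 0.1, 0.1],  # blank
])
frames = ctc_forced_align(np.log(probs), tokens=[1, 2])
print(frames)  # [-1, 0, -1, 1, -1]
print(utterance_spans(frames, [1, 1]))  # [(1, 1), (3, 3)]
```

The resulting frame spans refer to the CTC layer; converting them to seconds requires multiplying by the subsampling factor and the frame duration, as described for --subsampling-factor and --frame-duration.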
espnet.bin.asr_train
Automatic speech recognition model training script.
espnet.bin.tts_decode
TTS decoding script.
espnet.bin.st_train
End-to-end speech translation model training script.
espnet.bin.vc_train
Voice conversion model training script.
espnet.bin.asr_recog
End-to-end speech recognition model decoding script.
espnet.bin.lm_train
Language model training script.