espnet.bin package

Initialize sub package.

espnet.bin.st_trans

End-to-end speech translation model decoding script.

espnet.bin.st_trans.get_parser(): Get default arguments.

espnet.bin.st_trans.main(args): Run the main decoding function.

espnet.bin.asr_enhance

espnet.bin.asr_enhance.get_parser()

espnet.bin.asr_enhance.main(args)

espnet.bin.__init__

Initialize sub package.

espnet.bin.st_train

End-to-end speech translation model training script.

espnet.bin.st_train.get_parser(parser=None, required=True): Get default arguments.

espnet.bin.st_train.main(cmd_args): Run the main training function.
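The training scripts on this page share the same pair of entry points: a get_parser(parser=None, required=True) function and a main(cmd_args) function that takes a list of command-line strings. The following is a minimal, self-contained sketch of that pattern using only argparse. It is not the ESPnet implementation; the option names (--outdir, --epochs), their defaults, and the return value of main are assumptions chosen for illustration.

```python
import argparse


def get_parser(parser=None, required=True):
    """Build a parser, or extend one passed in by the caller.

    The options below are hypothetical placeholders, not ESPnet's real ones.
    """
    if parser is None:
        parser = argparse.ArgumentParser(description="Illustrative training script")
    # `required` lets a caller relax mandatory options, e.g. to read defaults only.
    parser.add_argument("--outdir", type=str, required=required, help="Output directory")
    parser.add_argument("--epochs", type=int, default=30, help="Number of epochs")
    return parser


def main(cmd_args):
    """Parse a list of command-line strings and return the namespace."""
    parser = get_parser()
    args = parser.parse_args(cmd_args)
    # A real script would start training here; this sketch only returns the arguments.
    return args


# Passing an explicit list makes the entry point easy to call from tests.
args = main(["--outdir", "exp/demo", "--epochs", "5"])

# With required=False, an empty command line parses and exposes the defaults.
defaults = get_parser(required=False).parse_args([])
```

Taking the argument list as a parameter, rather than reading sys.argv inside main, is what allows such a script to be driven programmatically as well as from the shell.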

espnet.bin.tts_train

Text-to-speech model training script.

espnet.bin.tts_train.get_parser(): Get parser of training arguments.

espnet.bin.tts_train.main(cmd_args): Run training.

espnet.bin.mt_trans

Neural machine translation model decoding script.

espnet.bin.mt_trans.get_parser(): Get default arguments.

espnet.bin.mt_trans.main(args): Run the main decoding function.

espnet.bin.asr_train

Automatic speech recognition model training script.

espnet.bin.asr_train.get_parser(parser=None, required=True): Get default arguments.

espnet.bin.asr_train.main(cmd_args): Run the main training function.

espnet.bin.asr_train.setup_logging(verbose): Make logging setup with a given log level.
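As an illustration of setting up logging from an integer verbosity argument, here is a minimal sketch built on the standard logging module. The function name configure_logging, the mapping from verbosity to level, and the message format are all assumptions; the actual behavior of setup_logging may differ.

```python
import logging


def configure_logging(verbose):
    """Configure the root logger from an integer verbosity (assumed mapping)."""
    if verbose > 1:
        level = logging.DEBUG
    elif verbose > 0:
        level = logging.INFO
    else:
        level = logging.WARNING
    # force=True (Python 3.8+) replaces handlers installed by earlier calls.
    logging.basicConfig(
        level=level,
        format="%(asctime)s (%(module)s:%(lineno)d) %(levelname)s: %(message)s",
        force=True,
    )
    return level


level = configure_logging(1)
logging.info("shown at verbosity 1")
logging.debug("hidden at verbosity 1")
```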

espnet.bin.mt_train

Neural machine translation model training script.

espnet.bin.mt_train.get_parser(parser=None, required=True): Get default arguments.

espnet.bin.mt_train.main(cmd_args): Run the main training function.

espnet.bin.asr_align

This program performs CTC segmentation to align utterances within audio files.

Inputs:

--data-json: A JSON file containing the list of utterances and audio files
--model: An already trained ASR model

Output:

--output: A plain segments file with utterance positions in the audio files.

Selected parameters:

--min-window-size: Minimum window size considered for a single utterance. The current default value should be OK in most cases. Larger values might give better results; values that are too large cause IndexErrors.
--subsampling-factor: If the encoder sub-samples its input, the number of frames at the CTC layer is reduced by this factor.
--frame-duration: The non-overlapping duration of a single frame in milliseconds (the inverse of frames per millisecond).
--set-blank: In the rare case that the blank token does not have index 0 in the character dictionary, this parameter sets the index of the blank token.
--gratis-blank: Sets the transition cost for blank tokens to zero. Useful if there are longer unrelated segments between segments.
--replace-spaces-with-blanks: Replaces spaces with blanks, which helps to model pauses between words. This may increase the length of the ground truth and may lead to misaligned segments when combined with the option --gratis-blank.
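The relationship between the subsampling factor, the frame duration, and positions in the audio can be sketched with a little arithmetic. The helper names and the example values (a factor of 4 and 10 ms frames) are assumptions for illustration, and the floor division is a simplification: the exact frame count at the CTC layer depends on the encoder.

```python
def num_ctc_frames(num_input_frames, subsampling_factor=4):
    """Approximate number of frames left at the CTC layer after sub-sampling."""
    return num_input_frames // subsampling_factor


def ctc_frame_to_seconds(frame_index, subsampling_factor=4, frame_duration_ms=10):
    """Convert a CTC frame index to a time offset in seconds.

    Each CTC frame spans `subsampling_factor` input frames, and each input
    frame advances the audio by `frame_duration_ms` milliseconds.
    """
    return frame_index * subsampling_factor * frame_duration_ms / 1000.0


frames = num_ctc_frames(1000)       # 1000 input frames of 10 ms each = 10 s of audio
offset = ctc_frame_to_seconds(25)   # 25 CTC frames * 4 * 10 ms
```

With these example values, 10 seconds of audio yields 250 CTC frames, and CTC frame 25 starts one second into the recording.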

espnet.bin.asr_align.ctc_align(args, device)

ESPnet-specific interface for CTC segmentation.

Parses configuration, infers the CTC posterior probabilities, and then aligns start and end of utterances using CTC segmentation. Results are written to the output file given in the args.

Parameters:

args – given configuration
device – for inference; one of [‘cuda’, ‘cpu’]

Returns:

0 on success

espnet.bin.asr_align.get_parser(): Get default arguments.

espnet.bin.asr_align.main(args): Run the main decoding function.

espnet.bin.tts_decode

TTS decoding script.

espnet.bin.tts_decode.get_parser(): Get parser of decoding arguments.

espnet.bin.tts_decode.main(args): Run decoding.

espnet.bin.vc_train

Voice conversion model training script.

espnet.bin.vc_train.get_parser(): Get parser of training arguments.

espnet.bin.vc_train.main(cmd_args): Run training.

espnet.bin.vc_decode

VC decoding script.

espnet.bin.vc_decode.get_parser(): Get parser of decoding arguments.

espnet.bin.vc_decode.main(args): Run decoding.

espnet.bin.asr_recog

End-to-end speech recognition model decoding script.

espnet.bin.asr_recog.get_parser(): Get default arguments.

espnet.bin.asr_recog.main(args): Run the main decoding function.

espnet.bin.lm_train

Language model training script.

espnet.bin.lm_train.get_parser(parser=None, required=True): Get parser.

espnet.bin.lm_train.main(cmd_args): Train LM.