Python utility tools¶

ESPnet provides several command-line Python tools under utils/:

addjson.py: add multiple json values to an input or output value
apply-cmvn.py: apply mean-variance normalization to files
average_checkpoints.py: average models from snapshot
calculate_rtf.py: calculate real time factor (RTF)
change_yaml.py: change specified attributes of a YAML file
compute-cmvn-stats.py: Compute cepstral mean and variance normalization statistics. If wspecifier is provided: per-utterance by default, or per-speaker if the spk2utt option is provided; if wxfilename: global
compute-fbank-feats.py: compute FBANK feature from WAV
compute-stft-feats.py: compute STFT feature from WAV
concat_json_multiref.py: concatenate multiple json files for data augmentation
concatjson.py: concatenate json files
convert_fbank_to_wav.py: convert FBANK to WAV using Griffin-Lim algorithm
copy-feats.py: copy feature with preprocessing
dump-pcm.py: dump PCM files from a WAV scp file
eval-source-separation.py: Evaluate enhanced speech. e.g. eval-source-separation.py --ref ref.scp --enh enh.scp --outdir outputdir or eval-source-separation.py --ref ref.scp ref2.scp --enh enh.scp enh2.scp --outdir outputdir
eval_perm_free_error.py: evaluate permutation-free error
feat-to-shape.py: convert feature to its shape
feats2npy.py: Convert kaldi-style features to numpy arrays
filt.py: filter words in a text file
generate_wav_from_fbank.py: generate wav from FBANK using wavenet vocoder
get_yaml.py: get a specified attribute from a YAML file
json2sctm.py: convert json to sctm
json2text.py: convert ASR recognized json to text
json2trn.py: convert a json to a transcription file with a token dictionary
json2trn_mt.py: convert json to machine translation transcription
json2trn_wo_dict.py: convert a json to a transcription file without a token dictionary
make_pair_json.py: Merge source and target data.json files into one json file.
mcd_calculate.py: calculate MCD.
merge_scp2json.py: Each file path is given in the format <key>:<file>:<type>. <type> can be omitted, and the default is “str”. e.g. merge_scp2json.py --input-scps feat:data/feats.scp shape:data/utt2feat_shape:shape --input-scps feat:data/feats2.scp shape:data/utt2feat2_shape:shape --output-scps text:data/text shape:data/utt2text_shape:shape --scps utt2spk:data/utt2spk
mergejson.py: merge json files
mix-mono-wav-scp.py: Mixing wav.scp files into a multi-channel wav.scp using sox.
result2json.py: convert sclite’s result.txt file to json
score_lang_id.py: language identification scoring
scp2json.py: convert scp to json
splitjson.py: split a json file for parallel processing
text2token.py: convert raw text to tokenized text
text2vocabulary.py: create a vocabulary file from text files
trim_silence.py: Trim silence with simple power thresholding and make segments file.
trn2ctm.py: convert trn to ctm
trn2stm.py: convert trn to stm

addjson.py¶

add multiple json values to an input or output value

usage: addjson.py [-h] [-i IS_INPUT] [--verbose VERBOSE] jsons [jsons ...]

Positional Arguments¶

jsons: json files

Named Arguments¶

-i, --is-input

If true, add to input. If false, add to output

Default: True

--verbose, -V

Verbose option

Default: 0

apply-cmvn.py¶

apply mean-variance normalization to files

usage: apply-cmvn.py [-h] [--verbose VERBOSE]
                     [--in-filetype {mat,hdf5,sound.hdf5,sound}]
                     [--stats-filetype {mat,hdf5,npy}]
                     [--out-filetype {mat,hdf5}] [--norm-means NORM_MEANS]
                     [--norm-vars NORM_VARS] [--reverse REVERSE]
                     [--spk2utt SPK2UTT] [--utt2spk UTT2SPK]
                     [--write-num-frames WRITE_NUM_FRAMES]
                     [--compress COMPRESS]
                     [--compression-method COMPRESSION_METHOD]
                     stats_rspecifier_or_rxfilename rspecifier wspecifier

Positional Arguments¶

stats_rspecifier_or_rxfilename: Input stats. e.g. ark:stats.ark or stats.mat
rspecifier: Read specifier id. e.g. ark:some.ark
wspecifier: Write specifier id. e.g. ark:some.ark

Named Arguments¶

--verbose, -V

Verbose option

Default: 0

--in-filetype

Possible choices: mat, hdf5, sound.hdf5, sound

Specify the file format for the rspecifier. “mat” is the matrix format in kaldi

Default: “mat”

--stats-filetype

Possible choices: mat, hdf5, npy

Specify the file format for the input stats. “mat” is the matrix format in kaldi

Default: “mat”

--out-filetype

Possible choices: mat, hdf5

Specify the file format for the wspecifier. “mat” is the matrix format in kaldi

Default: “mat”

--norm-means

Do mean normalization or not.

Default: True

--norm-vars

Do variance normalization or not.

Default: False

--reverse

Do reverse mode or not

Default: False

--spk2utt

A text file of speaker to utterance-list map. (Don’t give rspecifier format, such as “ark:spk2utt”)

--utt2spk

A text file of utterance to speaker map. (Don’t give rspecifier format, such as “ark:utt2spk”)

--write-num-frames

Specify wspecifier for utt2num_frames

--compress

Save in compressed format

Default: False

--compression-method

Specify the method(if mat) or gzip-level(if hdf5)

Default: 2
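
To make the effect of --norm-means and --norm-vars concrete, here is a minimal sketch of mean and variance normalization on plain Python lists. The function name `apply_cmvn` and the list-based data layout are assumptions for illustration only; the actual script reads statistics and features from Kaldi-style or HDF5 files, which is not reproduced here.

```python
import math


def apply_cmvn(frames, mean, var, norm_means=True, norm_vars=False):
    """Normalize each frame with per-dimension mean and variance.

    frames: list of frames, each a list of floats (hypothetical layout).
    mean, var: per-dimension statistics as lists of floats.
    """
    out = []
    for frame in frames:
        row = []
        for x, m, v in zip(frame, mean, var):
            if norm_means:
                x = x - m  # subtract the mean
            if norm_vars:
                x = x / math.sqrt(v)  # divide by the standard deviation
            row.append(x)
        out.append(row)
    return out


feats = [[1.0, 10.0], [3.0, 14.0]]
normalized = apply_cmvn(feats, mean=[2.0, 12.0], var=[1.0, 4.0], norm_vars=True)
```

With both flags enabled, each dimension of `normalized` has zero mean and unit variance with respect to the supplied statistics.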

average_checkpoints.py¶

average models from snapshot

usage: average_checkpoints.py [-h] --snapshots SNAPSHOTS [SNAPSHOTS ...] --out
                              OUT [--num NUM] [--backend BACKEND]
                              [--log [LOG]]
                              [--metric [{acc,bleu,cer,cer_ctc,loss,perplexity}]]
                              [--max-epoch [MAX_EPOCH]]

Named Arguments¶

--snapshots

--out

--num

Default: 10

--backend

Default: “chainer”

--log

--metric

Possible choices: acc, bleu, cer, cer_ctc, loss, perplexity

Default: “”

--max-epoch

Default: 10000000
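
As a rough illustration of what averaging snapshots means, the sketch below averages parameter values element-wise across several snapshots. The function name `average_params` and the dict-of-lists layout are hypothetical stand-ins; the actual script works on backend-specific snapshot files, and its selection logic (--log, --metric, --max-epoch) is not shown.

```python
def average_params(snapshots):
    """Average parameters element-wise across snapshots.

    Each snapshot is a dict mapping a parameter name to a flat list of
    floats (an assumed, simplified stand-in for real model tensors).
    """
    if not snapshots:
        raise ValueError("at least one snapshot is required")
    n = len(snapshots)
    averaged = {}
    for name in snapshots[0]:
        # Group the i-th value of this parameter from every snapshot.
        columns = zip(*(snap[name] for snap in snapshots))
        averaged[name] = [sum(values) / n for values in columns]
    return averaged


snaps = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
avg = average_params(snaps)
```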

calculate_rtf.py¶

calculate real time factor (RTF)

usage: calculate_rtf.py [-h] [--log-dir LOG_DIR]
                        [--log-name {decode,asr_inference}]
                        [--input-shift INPUT_SHIFT]
                        [--start-times-marker {input lengths,speech length}]
                        [--end-times-marker {prediction,best hypo}]
                        [--inf-num INF_NUM]

Named Arguments¶

--log-dir

path to logging directory

--log-name

Possible choices: decode, asr_inference

name of logfile, e.g., ‘decode’ (espnet1) and ‘asr_inference’ (espnet2)

Default: “decode”

--input-shift

shift of inputs in milliseconds

Default: 10.0

--start-times-marker

Possible choices: input lengths, speech length

String marking start of decoding in logfile, e.g., ‘input lengths’ (espnet1) and ‘speech length’ (espnet2)

Default: “input lengths”

--end-times-marker

Possible choices: prediction, best hypo

String marking end of decoding in logfile, e.g., ‘prediction’ (espnet1) and ‘best hypo’ (espnet2)

Default: “prediction”

--inf-num

number of inference hypotheses for each utterance, e.g. >1 in multi-speaker asr.

Default: 1
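
The real time factor is the ratio of processing time to audio duration. The sketch below shows only that ratio, assuming the audio duration is the number of input frames multiplied by the input shift in milliseconds; the function name `real_time_factor` is hypothetical, and how the script extracts times and lengths from the log file is not reproduced.

```python
def real_time_factor(decode_seconds, num_frames, input_shift_ms=10.0):
    """Return decoding time divided by audio duration (both in seconds)."""
    audio_seconds = num_frames * input_shift_ms / 1000.0
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return decode_seconds / audio_seconds


# 1000 frames at a 10 ms shift is 10 s of audio, decoded in 2.5 s.
rtf = real_time_factor(decode_seconds=2.5, num_frames=1000)
```

An RTF below 1 means decoding runs faster than real time.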

change_yaml.py¶

change specified attributes of a YAML file

usage: change_yaml.py [-h] [-o OUTYAML | --outdir OUTDIR] [-a ARG] [-d DELETE]
                      [inyaml]

Positional Arguments¶

inyaml

Named Arguments¶

-o, --outyaml

--outdir

-a, --arg

e.g. -a a.b.c=4 -> {‘a’: {‘b’: {‘c’: 4}}}

Default: []

-d, --delete

e.g. -d a -> “a” is removed from the input yaml

Default: []
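
The dotted-key notation used by -a can be sketched with a small helper that walks a nested dictionary. The names `set_by_path` and `delete_key` are hypothetical and operate on an in-memory dict; reading and writing the YAML file itself is left out.

```python
def set_by_path(config, dotted_key, value):
    """Set config[a][b][c] = value for a dotted key such as "a.b.c"."""
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        # Create intermediate dictionaries as needed.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


def delete_key(config, key):
    """Remove a top-level key if it exists."""
    config.pop(key, None)
    return config


cfg = set_by_path({}, "a.b.c", 4)
cfg = set_by_path(cfg, "x", 1)
cfg = delete_key(cfg, "x")
```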

compute-cmvn-stats.py¶

Compute cepstral mean and variance normalization statistics. If wspecifier is provided: per-utterance by default, or per-speaker if the spk2utt option is provided; if wxfilename: global

usage: compute-cmvn-stats.py [-h] [--spk2utt SPK2UTT] [--verbose VERBOSE]
                             [--in-filetype {mat,hdf5,sound.hdf5,sound}]
                             [--out-filetype {mat,hdf5,npy}]
                             [--preprocess-conf PREPROCESS_CONF]
                             rspecifier wspecifier_or_wxfilename

Positional Arguments¶

rspecifier: Read specifier for feats. e.g. ark:some.ark
wspecifier_or_wxfilename: Write specifier. e.g. ark:some.ark

Named Arguments¶

--spk2utt

A text file of speaker to utterance-list map. (Don’t give rspecifier format, such as “ark:utt2spk”)

--verbose, -V

Verbose option

Default: 0

--in-filetype

Possible choices: mat, hdf5, sound.hdf5, sound

Specify the file format for the rspecifier. “mat” is the matrix format in kaldi

Default: “mat”

--out-filetype

Possible choices: mat, hdf5, npy

Specify the file format for the wspecifier. “mat” is the matrix format in kaldi

Default: “mat”

--preprocess-conf

The configuration file for the pre-processing
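
The three grouping modes (per-utterance, per-speaker, global) differ only in which frames are pooled before accumulating statistics. The following sketch illustrates this with plain lists; `accumulate_stats` and the in-memory `feats` and `spk2utt` dictionaries are assumptions for illustration, and the on-disk statistics format written by the script is not reproduced.

```python
def accumulate_stats(utterances):
    """Return (count, sums, sums_of_squares) over all frames given."""
    count = 0
    sums = None
    sq_sums = None
    for frames in utterances:
        for frame in frames:
            if sums is None:
                sums = [0.0] * len(frame)
                sq_sums = [0.0] * len(frame)
            for i, x in enumerate(frame):
                sums[i] += x
                sq_sums[i] += x * x
            count += 1
    return count, sums, sq_sums


feats = {"utt1": [[1.0], [3.0]], "utt2": [[5.0]]}
spk2utt = {"spkA": ["utt1", "utt2"]}

# Per-utterance: one set of statistics for each utterance.
per_utt = {utt: accumulate_stats([frames]) for utt, frames in feats.items()}
# Per-speaker: pool the utterances listed for each speaker.
per_spk = {
    spk: accumulate_stats([feats[u] for u in utts])
    for spk, utts in spk2utt.items()
}
# Global: pool everything.
global_stats = accumulate_stats(feats.values())
```

The mean and variance follow from these sums as `sums / count` and `sq_sums / count - mean ** 2`.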

compute-fbank-feats.py¶

compute FBANK feature from WAV

usage: compute-fbank-feats.py [-h] [--fs FS] [--fmax [FMAX]] [--fmin [FMIN]]
                              [--n_mels N_MELS] [--n_fft N_FFT]
                              [--n_shift N_SHIFT] [--win_length [WIN_LENGTH]]
                              [--window {hann,hamming}]
                              [--write-num-frames WRITE_NUM_FRAMES]
                              [--filetype {mat,hdf5}] [--compress COMPRESS]
                              [--compression-method COMPRESSION_METHOD]
                              [--verbose VERBOSE] [--normalize {1,16,24,32}]
                              [--segments SEGMENTS]
                              rspecifier wspecifier

Positional Arguments¶

rspecifier: WAV scp file
wspecifier: Write specifier

Named Arguments¶

--fs

Sampling frequency

--fmax

Maximum frequency

--fmin

Minimum frequency

--n_mels

Number of mel basis

Default: 80

--n_fft

FFT length in point

Default: 1024

--n_shift

Shift length in point

Default: 512

--win_length

Analysis window length in point

--window

Possible choices: hann, hamming

Type of window

Default: “hann”

--write-num-frames

Specify wspecifier for utt2num_frames

--filetype

Possible choices: mat, hdf5

Specify the file format for output. “mat” is the matrix format in kaldi

Default: “mat”

--compress

Save in compressed format

Default: False

--compression-method

Specify the method(if mat) or gzip-level(if hdf5)

Default: 2

--verbose, -V

Verbose option

Default: 0

--normalize

Possible choices: 1, 16, 24, 32

Bit depth of the PCM. If given, the data is normalized to the range [-1, 1]

--segments

segments-file format: each line is <segment-id> <recording-id> <start-time> <end-time>, e.g. call-861225-A-0050-0065 call-861225-A 5.0 6.5
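
The segments-file format described for --segments can be parsed with a few lines of Python. This is a minimal sketch; the helper names `parse_segments` and `to_sample_range` are hypothetical, and the 16 kHz sampling rate in the usage lines is only an example value.

```python
def parse_segments(lines):
    """Map each segment id to (recording_id, start_seconds, end_seconds)."""
    segments = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        segment_id, recording_id, start, end = line.split()[:4]
        segments[segment_id] = (recording_id, float(start), float(end))
    return segments


def to_sample_range(start, end, fs):
    """Convert start and end times in seconds to sample indices."""
    return int(start * fs), int(end * fs)


segs = parse_segments(["call-861225-A-0050-0065 call-861225-A 5.0 6.5"])
rec, start, end = segs["call-861225-A-0050-0065"]
sample_range = to_sample_range(start, end, fs=16000)
```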

compute-stft-feats.py¶

compute STFT feature from WAV

usage: compute-stft-feats.py [-h] [--fs FS] [--n_fft N_FFT]
                             [--n_shift N_SHIFT] [--win_length [WIN_LENGTH]]
                             [--window {hann,hamming}]
                             [--write-num-frames WRITE_NUM_FRAMES]
                             [--filetype {mat,hdf5}] [--compress COMPRESS]
                             [--compression-method COMPRESSION_METHOD]
                             [--verbose VERBOSE] [--normalize {1,16,24,32}]
                             [--segments SEGMENTS]
                             rspecifier wspecifier

Positional Arguments¶

rspecifier: WAV scp file
wspecifier: Write specifier

Named Arguments¶

--fs

Sampling frequency

--n_fft

FFT length in point

Default: 1024

--n_shift

Shift length in point

Default: 512

--win_length

Analysis window length in point

--window

Possible choices: hann, hamming

Type of window

Default: “hann”

--write-num-frames

Specify wspecifier for utt2num_frames

--filetype

Possible choices: mat, hdf5

Specify the file format. “mat” is the matrix format in kaldi

Default: “mat”

--compress

Save in compressed format

Default: False

--compression-method

Specify the method(if mat) or gzip-level(if hdf5)

Default: 2

--verbose, -V

Verbose option

Default: 0

--normalize

Possible choices: 1, 16, 24, 32

Bit depth of the PCM. If given, the data is normalized to the range [-1, 1]

--segments

segments-file format: each line is <segment-id> <recording-id> <start-time> <end-time>, e.g. call-861225-A-0050-0065 call-861225-A 5.0 6.5
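
For the --normalize option, a plausible reading is that integer PCM samples are divided by 2^(bit depth - 1), so that the largest negative value maps to -1. The sketch below illustrates that arithmetic under this assumption; `normalize_pcm` is a hypothetical helper, not the script's own function.

```python
def normalize_pcm(samples, bit_depth):
    """Scale integer PCM samples into [-1, 1] for the given bit depth.

    Assumption: the scale factor is 2 ** (bit_depth - 1), so a bit depth
    of 1 leaves the values unchanged.
    """
    scale = float(1 << (bit_depth - 1))
    return [s / scale for s in samples]


scaled = normalize_pcm([-32768, 0, 16384], bit_depth=16)
```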

concat_json_multiref.py¶

concatenate multiple json files for data augmentation

usage: concat_json_multiref.py [-h] jsons [jsons ...]

Positional Arguments¶

jsons: json files

concatjson.py¶

concatenate json files

usage: concatjson.py [-h] jsons [jsons ...]

Positional Arguments¶

jsons: json files

convert_fbank_to_wav.py¶

convert FBANK to WAV using Griffin-Lim algorithm

usage: convert_fbank_to_wav.py [-h] [--fs FS] [--fmax [FMAX]] [--fmin [FMIN]]
                               [--n_fft N_FFT] [--n_shift N_SHIFT]
                               [--win_length [WIN_LENGTH]] [--n_mels [N_MELS]]
                               [--window {hann,hamming}] [--iters ITERS]
                               [--filetype {mat,hdf5}]
                               rspecifier outdir

Positional Arguments¶

rspecifier: Input feature
outdir: Output directory

Named Arguments¶

--fs

Sampling frequency

Default: 22050

--fmax

Maximum frequency

--fmin

Minimum frequency

--n_fft

FFT length in point

Default: 1024

--n_shift

Shift length in point

Default: 512

--win_length

Analysis window length in point

--n_mels

Number of mel basis

--window

Possible choices: hann, hamming

Type of window

Default: “hann”

--iters

Number of iterations in Griffin-Lim

Default: 100

--filetype

Possible choices: mat, hdf5

Specify the file format for the rspecifier. “mat” is the matrix format in kaldi

Default: “mat”

copy-feats.py¶

copy feature with preprocessing

usage: copy-feats.py [-h] [--verbose VERBOSE]
                     [--in-filetype {mat,hdf5,sound.hdf5,sound}]
                     [--out-filetype {mat,hdf5,sound.hdf5,sound}]
                     [--write-num-frames WRITE_NUM_FRAMES]
                     [--compress COMPRESS]
                     [--compression-method COMPRESSION_METHOD]
                     [--preprocess-conf PREPROCESS_CONF]
                     rspecifier wspecifier

Positional Arguments¶

rspecifier: Read specifier for feats. e.g. ark:some.ark
wspecifier: Write specifier. e.g. ark:some.ark

Named Arguments¶

--verbose, -V

Verbose option

Default: 0

--in-filetype

Possible choices: mat, hdf5, sound.hdf5, sound

Specify the file format for the rspecifier. “mat” is the matrix format in kaldi

Default: “mat”

--out-filetype

Possible choices: mat, hdf5, sound.hdf5, sound

Specify the file format for the wspecifier. “mat” is the matrix format in kaldi

Default: “mat”

--write-num-frames

Specify wspecifier for utt2num_frames

--compress

Save in compressed format

Default: False

--compression-method

Specify the method(if mat) or gzip-level(if hdf5)

Default: 2

--preprocess-conf

The configuration file for the pre-processing

dump-pcm.py¶

dump PCM files from a WAV scp file

usage: dump-pcm.py [-h] [--write-num-frames WRITE_NUM_FRAMES]
                   [--filetype {mat,hdf5,sound.hdf5,sound}] [--format FORMAT]
                   [--compress COMPRESS]
                   [--compression-method COMPRESSION_METHOD]
                   [--verbose VERBOSE] [--normalize {1,16,24,32}]
                   [--preprocess-conf PREPROCESS_CONF]
                   [--keep-length KEEP_LENGTH] [--segments SEGMENTS]
                   rspecifier wspecifier

Positional Arguments¶

rspecifier: WAV scp file
wspecifier: Write specifier

Named Arguments¶

--write-num-frames

Specify wspecifier for utt2num_frames

--filetype

Possible choices: mat, hdf5, sound.hdf5, sound

Specify the file format for output. “mat” is the matrix format in kaldi

Default: “mat”

--format

The file format for output pcm. This option is only valid when “--filetype” is “sound.hdf5” or “sound”

--compress

Save in compressed format

Default: False

--compression-method

Specify the method(if mat) or gzip-level(if hdf5)

Default: 2

--verbose, -V

Verbose option

Default: 0

--normalize

Possible choices: 1, 16, 24, 32

Bit depth of the PCM. If given, the data is normalized to the range [-1, 1]

--preprocess-conf

The configuration file for the pre-processing

--keep-length

Truncating or zero padding if the output length is changed from the input by preprocessing

Default: True

--segments

segments-file format: each line is <segment-id> <recording-id> <start-time> <end-time>, e.g. call-861225-A-0050-0065 call-861225-A 5.0 6.5

eval-source-separation.py¶

Evaluate enhanced speech. e.g. eval-source-separation.py --ref ref.scp --enh enh.scp --outdir outputdir or eval-source-separation.py --ref ref.scp ref2.scp --enh enh.scp enh2.scp --outdir outputdir

usage: eval-source-separation.py [-h] [--verbose VERBOSE] --ref REFFILES
                                 [REFFILES ...] --enh ENHFILES [ENHFILES ...]
                                 --outdir OUTDIR [--keylist KEYLIST]
                                 [--evaltypes {SDR,STOI,ESTOI,PESQ} [{SDR,STOI,ESTOI,PESQ} ...]]
                                 [--permutation PERMUTATION]
                                 [--bss-eval-images BSS_EVAL_IMAGES]
                                 [--bss-eval-version {v3,v4}]

Named Arguments¶

--verbose, -V

Verbose option

Default: 0

--ref

WAV file lists for reference

--enh

WAV file lists for enhanced speech

--outdir

--keylist

Specify the target samples. By default, using all keys in the first reference file

--evaltypes

Possible choices: SDR, STOI, ESTOI, PESQ

Default: [‘SDR’, ‘STOI’, ‘ESTOI’, ‘PESQ’]

--permutation

Compute all permutations or use the pair of input order

Default: True

--bss-eval-images

Use bss_eval_images or bss_eval_sources. For more detail, see the museval source code.

Default: True

--bss-eval-version

Possible choices: v3, v4

Specify bss-eval-version: v3 or v4

Default: “v3”

eval_perm_free_error.py¶

evaluate permutation-free error

usage: eval_perm_free_error.py [-h] [--num-spkrs NUM_SPKRS]
                               results [results ...]

Positional Arguments¶

results: the scores between references and hypotheses, in ascending order of references (1st) and hypotheses (2nd), e.g. [r1h1, r1h2, r2h1, r2h2] in 2-speaker-mix case.

Named Arguments¶

--num-spkrs

number of mixed speakers.

Default: 2
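
A permutation-free error picks the assignment of hypotheses to references that gives the lowest total score. The sketch below assumes the scores are error values (lower is better) given in the flat order described for `results`, e.g. [r1h1, r1h2, r2h1, r2h2]; the function name `best_permutation` is hypothetical, and the script's actual input parsing is not shown.

```python
from itertools import permutations


def best_permutation(scores, num_spkrs=2):
    """Return (total, perm), where perm[r] is the hypothesis index
    assigned to reference r and total is the minimal summed score."""
    if len(scores) != num_spkrs ** 2:
        raise ValueError("expected num_spkrs ** 2 scores")
    best = None
    for perm in permutations(range(num_spkrs)):
        # scores is reference-major: index = ref * num_spkrs + hyp.
        total = sum(scores[r * num_spkrs + h] for r, h in enumerate(perm))
        if best is None or total < best[0]:
            best = (total, perm)
    return best


# r1 matches h2 and r2 matches h1 best in this example.
total, perm = best_permutation([0.9, 0.1, 0.2, 0.8])
```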

feat-to-shape.py¶

convert feature to its shape

usage: feat-to-shape.py [-h] [--verbose VERBOSE]
                        [--filetype {mat,hdf5,sound.hdf5,sound}]
                        [--preprocess-conf PREPROCESS_CONF]
                        rspecifier [out]

Positional Arguments¶

rspecifier

Read specifier for feats. e.g. ark:some.ark

out

The output filename. If omitted, then output to sys.stdout

Default: sys.stdout

Named Arguments¶

--verbose, -V

Verbose option

Default: 0

--filetype

Possible choices: mat, hdf5, sound.hdf5, sound

Specify the file format for the rspecifier. “mat” is the matrix format in kaldi

Default: “mat”

--preprocess-conf

The configuration file for the pre-processing

feats2npy.py¶

Convert kaldi-style features to numpy arrays

usage: feats2npy.py [-h] scp_file out_dir

Positional Arguments¶

scp_file: scp file
out_dir: output directory

filt.py¶

filter words in a text file

usage: filt.py [-h] [--exclude] filt infile

Positional Arguments¶

filt: filter list
infile: input file

Named Arguments¶

--exclude, -v

exclude filter words

Default: False

generate_wav_from_fbank.py¶

generate wav from FBANK using wavenet vocoder

usage: generate_wav_from_fbank.py [-h] [--fs FS] [--n_fft N_FFT]
                                  [--n_shift N_SHIFT] [--model MODEL]
                                  [--filetype {mat,hdf5}]
                                  rspecifier outdir

Positional Arguments¶

rspecifier: Input feature e.g. scp:feat.scp
outdir: Output directory

Named Arguments¶

--fs

Sampling frequency

Default: 22050

--n_fft

FFT length in point

Default: 1024

--n_shift

Shift length in point

Default: 256

--model

WaveNet model

--filetype

Possible choices: mat, hdf5

Specify the file format for the rspecifier. “mat” is the matrix format in kaldi

Default: “mat”

get_yaml.py¶

get a specified attribute from a YAML file

usage: get_yaml.py [-h] inyaml attr

Positional Arguments¶

inyaml
attr: foo.bar will access yaml.load(inyaml)[“foo”][“bar”]
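
The dotted attribute lookup can be sketched as a simple walk through nested dictionaries. The helper name `get_by_path` is hypothetical, and the example uses an in-memory dict in place of a loaded YAML file.

```python
def get_by_path(config, dotted_attr):
    """Return config["foo"]["bar"] for a dotted attribute "foo.bar"."""
    node = config
    for key in dotted_attr.split("."):
        node = node[key]
    return node


value = get_by_path({"foo": {"bar": 42}}, "foo.bar")
```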

json2sctm.py¶

convert json to sctm

usage: json2sctm.py [-h] [--num-spkrs [NUM_SPKRS]] [--refs [REFS [REFS ...]]]
                    [--hyps [HYPS [HYPS ...]]] [--orig-stm [ORIG_STM]]
                    [--stm STM [STM ...]] [--ctm CTM [CTM ...]] [--bpe [BPE]]
                    [json] dict

Positional Arguments¶

json: input trn
dict: dict

Named Arguments¶

--num-spkrs

number of speakers

Default: 1

--refs

ref for all speakers

--hyps

hyp for all outputs

--orig-stm

orig stm

--stm

output stm

--ctm

output ctm

--bpe

BPE model if applicable

json2text.py¶

convert ASR recognized json to text

usage: json2text.py [-h] json dict ref hyp

Positional Arguments¶

json: json files
dict: dict
ref: ref
hyp: hyp

json2trn.py¶

convert a json to a transcription file with a token dictionary

usage: json2trn.py [-h] [--num-spkrs NUM_SPKRS] [--refs REFS [REFS ...]]
                   [--hyps HYPS [HYPS ...]]
                   json dict

Positional Arguments¶

json: json files
dict: dict

Named Arguments¶

--num-spkrs

number of speakers

Default: 1

--refs

ref for all speakers

--hyps

hyp for all outputs

json2trn_mt.py¶

convert json to machine translation transcription

usage: json2trn_mt.py [-h] [--refs REFS [REFS ...]] [--hyps HYPS [HYPS ...]]
                      [--srcs SRCS [SRCS ...]] [--dict-src [DICT_SRC]]
                      json dict

Positional Arguments¶

json: json files
dict: dict for target language

Named Arguments¶

--refs

ref for all speakers

--hyps

hyp for all outputs

--srcs

src for all outputs

--dict-src

dict for source language

Default: False

json2trn_wo_dict.py¶

convert a json to a transcription file without a token dictionary

usage: json2trn_wo_dict.py [-h] [--num-spkrs NUM_SPKRS]
                           [--refs REFS [REFS ...]] [--hyps HYPS [HYPS ...]]
                           json

Positional Arguments¶

json: json files

Named Arguments¶

--num-spkrs

number of speakers

Default: 1

--refs

ref for all speakers

--hyps

hyp for all outputs

make_pair_json.py¶

Merge source and target data.json files into one json file.

usage: make_pair_json.py [-h] [--src-json SRC_JSON] [--trg-json TRG_JSON]
                         [--num_utts NUM_UTTS] [--verbose VERBOSE] [--out OUT]

Named Arguments¶

--src-json

Json file for the source speaker

--trg-json

Json file for the target speaker. If not specified, use source only.

--num_utts

Number of utterances (take from head)

Default: -1

--verbose, -V

Verbose option

Default: 1

--out, -O

The output filename. If omitted, then output to sys.stdout

mcd_calculate.py¶

calculate MCD.

usage: mcd_calculate.py [-h] --wavdir WAVDIR --gtwavdir GTWAVDIR
                        [--mcep_dim MCEP_DIM] [--mcep_alpha MCEP_ALPHA]
                        [--fftl FFTL] [--shiftms SHIFTMS] --f0min F0MIN
                        --f0max F0MAX [--n_jobs N_JOBS]

Named Arguments¶

--wavdir

path of directory for converted waveforms

--gtwavdir

path of directory for ground truth waveforms

--mcep_dim

dimension of mel cepstrum coefficient

Default: 41

--mcep_alpha

all-pass constant

Default: 0.41

--fftl

fft length

Default: 1024

--shiftms

frame shift (ms)

Default: 5

--f0min

F0 search range (min)

--f0max

F0 search range (max)

--n_jobs

number of parallel jobs

Default: 40
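
Mel-cepstral distortion (MCD) is commonly defined per frame as (10 / ln 10) * sqrt(2 * sum of squared coefficient differences), averaged over frames. The sketch below computes that quantity, assuming the two sequences are already time-aligned and that the 0th (energy) coefficient has been dropped by the caller; the function name is hypothetical, and the script's own feature extraction and alignment steps are not reproduced.

```python
import math


def mel_cepstral_distortion(ref_frames, conv_frames):
    """Mean MCD in dB over paired, time-aligned mel-cepstrum frames."""
    if not ref_frames or len(ref_frames) != len(conv_frames):
        raise ValueError("need two non-empty sequences of equal length")
    const = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        sq_diff = sum((r - c) ** 2 for r, c in zip(ref, conv))
        total += const * math.sqrt(sq_diff)
    return total / len(ref_frames)


mcd_same = mel_cepstral_distortion([[0.1, 0.2]], [[0.1, 0.2]])
mcd_one = mel_cepstral_distortion([[0.0, 0.0]], [[1.0, 0.0]])
```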

merge_scp2json.py¶

Each file path is given in the format <key>:<file>:<type>. <type> can be omitted, and the default is “str”. e.g. merge_scp2json.py --input-scps feat:data/feats.scp shape:data/utt2feat_shape:shape --input-scps feat:data/feats2.scp shape:data/utt2feat2_shape:shape --output-scps text:data/text shape:data/utt2text_shape:shape --scps utt2spk:data/utt2spk

usage: merge_scp2json.py [-h] [--input-scps [INPUT_SCPS [INPUT_SCPS ...]]]
                         [--output-scps [OUTPUT_SCPS [OUTPUT_SCPS ...]]]
                         [--scps SCPS [SCPS ...]] [--verbose VERBOSE]
                         [--allow-one-column ALLOW_ONE_COLUMN] [--out OUT]

Named Arguments¶

--input-scps

scp files for the inputs

Default: []

--output-scps

scp files for the outputs

Default: []

--scps

The scp files other than the inputs and outputs

Default: []

--verbose, -V

Verbose option

Default: 1

--allow-one-column

Allow one column in input scp files. In this case, the value will be empty string.

Default: False

--out, -O

The output filename. If omitted, then output to sys.stdout
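
The <key>:<file>:<type> argument format, with its default type of “str”, can be parsed as in the following sketch. The function name `parse_scp_spec` is hypothetical; it only splits the argument and does not read the scp file.

```python
def parse_scp_spec(spec):
    """Split "<key>:<file>[:<type>]" into (key, path, type)."""
    parts = spec.split(":")
    if len(parts) == 2:
        key, path = parts
        type_ = "str"  # default when <type> is omitted
    elif len(parts) == 3:
        key, path, type_ = parts
    else:
        raise ValueError("expected <key>:<file>[:<type>], got %r" % spec)
    return key, path, type_


feat_spec = parse_scp_spec("feat:data/feats.scp")
shape_spec = parse_scp_spec("shape:data/utt2feat_shape:shape")
```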

mergejson.py¶

merge json files

usage: mergejson.py [-h] [--input-jsons INPUT_JSONS [INPUT_JSONS ...]]
                    [--output-jsons OUTPUT_JSONS [OUTPUT_JSONS ...]]
                    [--jsons JSONS [JSONS ...]] [--verbose VERBOSE]
                    [-O OUTPUT]

Named Arguments¶

--input-jsons

Json files for the inputs

Default: []

--output-jsons

Json files for the outputs

Default: []

--jsons

The json files except for the input and outputs

Default: []

--verbose, -V

Verbose option

Default: 0

-O

Output json file

mix-mono-wav-scp.py¶

Mixing wav.scp files into a multi-channel wav.scp using sox.

usage: mix-mono-wav-scp.py [-h] scp [scp ...] [out]

Positional Arguments¶

scp

Give wav.scp

out

The output filename. If omitted, then output to sys.stdout

Default: sys.stdout

result2json.py¶

convert sclite’s result.txt file to json

usage: result2json.py [-h] [--key KEY]

Named Arguments¶

--key, -k: key

score_lang_id.py¶

language identification scoring

usage: score_lang_id.py [-h] --ref REF --hyp HYP [--out OUT]

Named Arguments¶

--ref

input reference

--hyp

input hypotheses

--out

The output filename. If omitted, then output to sys.stdout

Default: sys.stdout

scp2json.py¶

convert scp to json

usage: scp2json.py [-h] [--key KEY]

Named Arguments¶

--key, -k: key

splitjson.py¶

split a json file for parallel processing

usage: splitjson.py [-h] [--parts PARTS] json

Positional Arguments¶

json: json file

Named Arguments¶

--parts, -p

Number of subparts to be prepared

Default: 0
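
Splitting for parallel processing amounts to dividing the utterance entries into roughly equal chunks. The sketch below does this for an in-memory dictionary of utterances; the function name `split_utts`, the contiguous chunking order, and the requirement that `parts` be at least 1 are assumptions, and the script's actual output layout is not shown.

```python
def split_utts(utts, parts):
    """Split a dict of utterances into `parts` contiguous chunks whose
    sizes differ by at most one."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    keys = list(utts)
    size, remainder = divmod(len(keys), parts)
    chunks = []
    start = 0
    for i in range(parts):
        # The first `remainder` chunks receive one extra utterance.
        end = start + size + (1 if i < remainder else 0)
        chunks.append({k: utts[k] for k in keys[start:end]})
        start = end
    return chunks


utts = {"utt%d" % i: {} for i in range(5)}
chunks = split_utts(utts, 2)
```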

text2token.py¶

convert raw text to tokenized text

usage: text2token.py [-h] [--nchar NCHAR] [--skip-ncols SKIP_NCOLS]
                     [--space SPACE] [--non-lang-syms NON_LANG_SYMS]
                     [--trans_type {char,phn}]
                     [text]

Positional Arguments¶

text

input text

Default: False

Named Arguments¶

--nchar, -n

number of characters to split, i.e., aabb -> a a b b with -n 1 and aa bb with -n 2

Default: 1

--skip-ncols, -s

skip first n columns

Default: 0

--space

space symbol

Default: “<space>”

--non-lang-syms, -l

list of non-linguistic symbols, e.g., <NOISE> etc.

--trans_type, -t

Possible choices: char, phn

Transcript type: char or phn. For example, for TIMIT utterance FADG0_SI1279: if trans_type is char, the transcript is read from the SI1279.WRD file -> “bricks are an alternative”; if trans_type is phn, it is read from the SI1279.PHN file -> “sil b r ih sil k s aa r er n aa l sil t er n ih sil t ih v sil”

Default: “char”
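
The interaction of -n, -s, and --space can be illustrated with a short sketch. The function name `text_to_tokens` is hypothetical, and the handling of non-linguistic symbols (-l) and phoneme transcripts is deliberately left out.

```python
def text_to_tokens(line, nchar=1, skip_ncols=0, space="<space>"):
    """Split text into chunks of `nchar` characters, keeping the first
    `skip_ncols` columns (e.g. an utterance id) untouched."""
    columns = line.split()
    head = columns[:skip_ncols]
    text = " ".join(columns[skip_ncols:])
    chunks = [text[i:i + nchar] for i in range(0, len(text), nchar)]
    # A chunk that is exactly one blank becomes the space symbol.
    tokens = [space if chunk == " " else chunk for chunk in chunks]
    return " ".join(head + tokens)


one = text_to_tokens("aabb")
two = text_to_tokens("aabb", nchar=2)
with_id = text_to_tokens("utt1 hi you", skip_ncols=1)
```

Here `one` is "a a b b", `two` is "aa bb", and `with_id` keeps the utterance id while tokenizing the rest as "h i <space> y o u".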

text2vocabulary.py¶

create a vocabulary file from text files

usage: text2vocabulary.py [-h] [--output OUTPUT] [--cutoff CUTOFF]
                          [--vocabsize VOCABSIZE]
                          [text_files [text_files ...]]

Positional Arguments¶

text_files: input text files

Named Arguments¶

--output, -o

output a vocabulary file

Default: “”

--cutoff, -c

cut-off frequency

Default: 0

--vocabsize, -s

vocabulary size

Default: 20000
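
A vocabulary with a frequency cut-off and a size limit can be built from word counts, as in the sketch below. The function name `build_vocabulary` is hypothetical. Treating the cut-off as strict (a word is kept only if its count exceeds it) and breaking ties alphabetically are assumptions, and any special symbols the script may add are not modeled.

```python
from collections import Counter


def build_vocabulary(lines, cutoff=0, vocabsize=20000):
    """Return up to `vocabsize` words whose count exceeds `cutoff`,
    most frequent first."""
    counts = Counter(word for line in lines for word in line.split())
    kept = [(w, c) for w, c in counts.items() if c > cutoff]
    # Sort by descending count, then alphabetically for stable output.
    kept.sort(key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in kept[:vocabsize]]


vocab = build_vocabulary(["a b a", "c a b"], cutoff=1)
top1 = build_vocabulary(["a b a", "c a b"], vocabsize=1)
```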

trim_silence.py¶

Trim silence with simple power thresholding and make segments file.

usage: trim_silence.py [-h] [--fs FS] [--threshold THRESHOLD]
                       [--win_length WIN_LENGTH] [--shift_length SHIFT_LENGTH]
                       [--min_silence MIN_SILENCE] [--figdir FIGDIR]
                       [--verbose VERBOSE] [--normalize {1,16,24,32}]
                       rspecifier wspecifier

Positional Arguments¶

rspecifier: WAV scp file.
wspecifier: Segments file.

Named Arguments¶

--fs

Sampling frequency.

--threshold

Threshold in decibels.

Default: 60

--win_length

Analysis window length in point.

Default: 1200

--shift_length

Shift length in point.

Default: 300

--min_silence

Minimum silence length in sec.

Default: 0.01

--figdir

Directory to save figures.

--verbose

Verbosity level.

Default: 0

--normalize

Possible choices: 1, 16, 24, 32

Bit depth of the PCM. If given, the data is normalized to the range [-1, 1].
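
Simple power thresholding can be sketched as follows: compute the power of each analysis frame in decibels and keep the region between the first and last frame that lies within `threshold` dB of the loudest frame. This interpretation of the threshold is an assumption, as is the function name `find_speech_region`; the --min_silence handling and figure output are not modeled.

```python
import math


def find_speech_region(samples, fs, win_length=1200, shift_length=300,
                       threshold=60.0):
    """Return (start_sec, end_sec) of the region whose frame power is
    within `threshold` dB of the peak frame power."""
    if not samples:
        raise ValueError("samples must not be empty")
    last_start = max(len(samples) - win_length, 0)
    powers_db = []
    for start in range(0, last_start + 1, shift_length):
        frame = samples[start:start + win_length]
        power = sum(x * x for x in frame) / len(frame)
        # A small floor avoids log10(0) on digital silence.
        powers_db.append(10.0 * math.log10(power + 1e-10))
    peak = max(powers_db)
    active = [i for i, p in enumerate(powers_db) if p > peak - threshold]
    start_sec = active[0] * shift_length / fs
    end_sec = (active[-1] * shift_length + win_length) / fs
    return start_sec, end_sec


# One second of silence, one second of signal, one second of silence.
signal = [0.0] * 1200 + [1.0] * 1200 + [0.0] * 1200
region = find_speech_region(signal, fs=1200, win_length=300, shift_length=300)
```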

trn2ctm.py¶

convert trn to ctm

usage: trn2ctm.py [-h] [trn] [ctm]

Positional Arguments¶

trn: input trn
ctm: output ctm

trn2stm.py¶

convert trn to stm

usage: trn2stm.py [-h] [--orig-stm [ORIG_STM]] [trn] [stm]

Positional Arguments¶

trn: input trn
stm: output stm

Named Arguments¶

--orig-stm: Original stm file to add additional information to the generated one