espnet.utils.cli_writers.file_writer_helper
espnet.utils.cli_writers.file_writer_helper(wspecifier: str, filetype: str = 'mat', write_num_frames: str | None = None, compress: bool = False, compression_method: int = 2, pcm_format: str = 'wav')
Write matrices in kaldi style.
- Parameters:
- wspecifier – e.g. ark,scp:out.ark,out.scp
- filetype – “mat” for Kaldi matrix, “hdf5” for HDF5
- write_num_frames – e.g. ‘ark,t:num_frames.txt’
- compress – Compress or not
- compression_method – Specify compression level
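To make the wspecifier format concrete, here is a minimal sketch of how a string such as ark,scp:out.ark,out.scp could be split into its parts. parse_wspecifier is a hypothetical helper written for illustration only. It is not ESPnet's or Kaldi's actual parser, and it assumes that any token before the colon other than ark or scp is an option flag (such as t).

```python
def parse_wspecifier(wspecifier):
    """Split a Kaldi-style wspecifier into file kinds, paths, and option flags.

    Hypothetical helper for illustration; not part of ESPnet.
    """
    head, sep, tail = wspecifier.partition(":")
    if not sep:
        raise ValueError("wspecifier must contain ':': %r" % wspecifier)

    tokens = head.split(",")
    # Assumption: only 'ark' and 'scp' name output files; the rest are flags.
    kinds = [t for t in tokens if t in ("ark", "scp")]
    options = [t for t in tokens if t not in ("ark", "scp")]

    paths = tail.split(",")
    if len(kinds) != len(paths):
        raise ValueError("expected %d path(s), got %d" % (len(kinds), len(paths)))

    spec = dict(zip(kinds, paths))
    spec["options"] = options
    return spec


print(parse_wspecifier("ark,scp:out.ark,out.scp"))
print(parse_wspecifier("ark,t:num_frames.txt"))
```

The first call pairs ark with out.ark and scp with out.scp; the second treats t as an option flag and pairs ark with num_frames.txt.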
Write in kaldi-matrix-ark with “kaldi-scp” file:
>>> with file_writer_helper('ark,scp:out.ark,out.scp') as f:
...     f['uttid'] = array
This “scp” has the following format:
uttidA out.ark:1234
uttidB out.ark:2222
where 1234 and 2222 are the starting byte offsets of the corresponding matrices in the ark file. (For details, see the official Kaldi documentation.)
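As an illustration of the scp format described above, the following sketch splits one scp line into its utterance ID, ark path, and byte offset. parse_scp_line is a hypothetical helper, not an ESPnet or Kaldi function, and it assumes the simple "key path:offset" layout shown here.

```python
from typing import Tuple


def parse_scp_line(line: str) -> Tuple[str, str, int]:
    """Parse 'uttid path:offset' into (uttid, path, offset).

    Hypothetical helper for illustration; not part of ESPnet.
    """
    key, value = line.strip().split(maxsplit=1)
    # Split on the last ':' so that paths containing ':' are kept intact.
    path, sep, offset = value.rpartition(":")
    if not sep:
        raise ValueError("no ':offset' suffix in scp entry: %r" % line)
    return key, path, int(offset)


for entry in ["uttidA out.ark:1234", "uttidB out.ark:2222"]:
    uttid, path, offset = parse_scp_line(entry)
    print(uttid, path, offset)
```

A reader could then open the ark file at `path` and seek to `offset` to locate the matrix for that utterance.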
Write in HDF5 with “scp” file:
>>> with file_writer_helper('ark,scp:out.h5,out.scp', 'hdf5') as f:
...     f['uttid'] = array
This “scp” file is created as:
uttidA out.h5:uttidA
uttidB out.h5:uttidB
Unlike a Kaldi ark, an HDF5 file can be accessed directly by any key, so an “scp” file is not strictly required for random reads. We nevertheless create an “scp” for HDF5 because it is useful for some use cases, e.g. concatenation and splitting.
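The splitting and concatenation use cases can be sketched directly on the scp lines, since each line is a self-contained pointer to one entry. The helpers split_scp and concat_scp below are hypothetical and written for illustration only; they assume the scp has already been read into a list of strings and do not touch the underlying HDF5 or ark files.

```python
def split_scp(lines, num_parts):
    """Split scp lines into num_parts contiguous chunks of near-equal size.

    Hypothetical helper for illustration; not part of ESPnet.
    """
    if num_parts <= 0:
        raise ValueError("num_parts must be positive")
    lines = [ln for ln in lines if ln.strip()]  # drop blank lines
    base, extra = divmod(len(lines), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        # The first `extra` chunks each take one additional line.
        size = base + (1 if i < extra else 0)
        parts.append(lines[start:start + size])
        start += size
    return parts


def concat_scp(parts):
    """Concatenate several lists of scp lines into a single list."""
    return [ln for part in parts for ln in part]


scp_lines = [
    "uttidA out.h5:uttidA",
    "uttidB out.h5:uttidB",
    "uttidC out.h5:uttidC",
]
chunks = split_scp(scp_lines, 2)
print(chunks)
print(concat_scp(chunks) == scp_lines)
```

Because only the index is divided, each chunk can be handed to a separate job while all of them still refer to the same data file.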