espnet2.fileio package

espnet2.fileio.__init__

espnet2.fileio.sound_scp

class espnet2.fileio.sound_scp.SoundScpReader(fname, dtype=None, always_2d: bool = False, multi_columns: bool = False, concat_axis=1)

Bases: collections.abc.Mapping

Reader class for ‘wav.scp’.

Examples

wav.scp is a text file that looks like the following:

    key1 /some/path/a.wav
    key2 /some/path/b.wav
    key3 /some/path/c.wav
    key4 /some/path/d.wav
    …

>>> reader = SoundScpReader('wav.scp')
>>> rate, array = reader['key1']

If multi_columns=True is given and a line lists multiple files separated by spaces, the loaded arrays are concatenated along the channel axis:

    key1 /some/path/a.wav /some/path/a2.wav
    key2 /some/path/b.wav /some/path/b2.wav
    …

>>> reader = SoundScpReader('wav.scp', multi_columns=True)
>>> rate, array = reader['key1']

In the above case, a.wav and a2.wav are concatenated.
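To make "concatenated along the channel axis" concrete, here is a pure-Python sketch. It assumes each file holds a mono signal, represented as a plain list of samples; `concat_channels` is a hypothetical helper, not part of ESPnet, and the real reader works on NumPy arrays. Joining the files column by column gives one row per time step and one column per file, i.e. a (Time, Channel) layout.

```python
from typing import List, Sequence


def concat_channels(signals: Sequence[Sequence[float]]) -> List[List[float]]:
    """Join mono signals of shape (Time,) into a (Time, Channel) table.

    Hypothetical helper: behaves like concatenating (Time, 1) columns
    along axis 1.
    """
    if not signals:
        raise ValueError("at least one signal is required")
    num_frames = len(signals[0])
    if any(len(s) != num_frames for s in signals):
        # Assumption of this sketch: mismatched lengths are an error.
        raise ValueError("all signals must have the same number of frames")
    # zip(*signals) yields one tuple per time step, one sample per channel.
    return [list(frame) for frame in zip(*signals)]


a = [0.1, 0.2, 0.3]   # made-up samples of a.wav
a2 = [0.4, 0.5, 0.6]  # made-up samples of a2.wav
stereo = concat_channels([a, a2])
print(stereo)  # [[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]], i.e. shape (3, 2)
```

For equal-length 1-D NumPy arrays, the same layout is what `np.stack(signals, axis=1)` produces.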

Note that even with multi_columns=True, SoundScpReader still supports a normal wav.scp with one wav file per line. The option is disabled by default because it requires keeping a dict[str, list[str]] object in memory, which increases memory usage.

get_path(key)

keys() → a set-like object providing a view on D's keys

class espnet2.fileio.sound_scp.SoundScpWriter(outdir: Union[pathlib.Path, str], scpfile: Union[pathlib.Path, str], format='wav', multi_columns: bool = False, output_name_format: str = '{key}.{audio_format}', output_name_format_multi_columns: str = '{key}-CH{channel}.{audio_format}', subtype: Optional[str] = None)

Bases: object

Writer class for ‘wav.scp’

Parameters:

outdir – Directory in which the audio files are written
scpfile – Path of the scp file to write
format – The output audio format
multi_columns – Save multi-channel data as multiple monaural audio files
output_name_format – The naming format of generated audio files
output_name_format_multi_columns – The naming format of generated audio files when multi_columns is given
subtype – Audio subtype of the written files (optional)

Examples

>>> writer = SoundScpWriter('./data/', './data/wav.scp')
>>> writer['aa'] = 16000, numpy_array
>>> writer['bb'] = 16000, numpy_array

    aa ./data/aa.wav
    bb ./data/bb.wav

>>> writer = SoundScpWriter(
    './data/', './data/feat.scp', multi_columns=True,
)
>>> numpy_array.shape
(100, 2)
>>> writer['aa'] = 16000, numpy_array

    aa ./data/aa-CH0.wav ./data/aa-CH1.wav
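The two name templates in the signature are ordinary Python format strings with `{key}`, `{channel}` and `{audio_format}` placeholders. Assuming the writer fills them with `str.format` (an assumption; only the placeholder names come from the defaults above), the sketch below reproduces the paths in the example scp lines. `scp_line` is a hypothetical helper.

```python
import posixpath
from typing import List

# Default templates copied from the SoundScpWriter signature.
OUTPUT_NAME_FORMAT = "{key}.{audio_format}"
OUTPUT_NAME_FORMAT_MULTI_COLUMNS = "{key}-CH{channel}.{audio_format}"


def scp_line(outdir: str, key: str, num_channels: int,
             audio_format: str = "wav", multi_columns: bool = False) -> str:
    """Build a hypothetical scp line: the key followed by its file path(s)."""
    if multi_columns:
        # One monaural file per channel, numbered from 0.
        names: List[str] = [
            OUTPUT_NAME_FORMAT_MULTI_COLUMNS.format(
                key=key, channel=ch, audio_format=audio_format
            )
            for ch in range(num_channels)
        ]
    else:
        # A single file holds all channels.
        names = [OUTPUT_NAME_FORMAT.format(key=key, audio_format=audio_format)]
    return " ".join([key] + [posixpath.join(outdir, name) for name in names])


print(scp_line("./data/", "bb", num_channels=1))
# bb ./data/bb.wav
print(scp_line("./data/", "aa", num_channels=2, multi_columns=True))
# aa ./data/aa-CH0.wav ./data/aa-CH1.wav
```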

close()

get_path(key)

espnet2.fileio.sound_scp.soundfile_read(wavs: Union[str, List[str]], dtype=None, always_2d: bool = False, concat_axis: int = 1, start: int = 0, end: int = None, return_subtype: bool = False) → Tuple[numpy.array, int]

espnet2.fileio.npy_scp

class espnet2.fileio.npy_scp.NpyScpReader(fname: Union[pathlib.Path, str])

Bases: collections.abc.Mapping

Reader class for an scp file that lists numpy (.npy) files.

Examples

    key1 /some/path/a.npy
    key2 /some/path/b.npy
    key3 /some/path/c.npy
    key4 /some/path/d.npy
    …

>>> reader = NpyScpReader('npy.scp')
>>> array = reader['key1']

get_path(key)

keys() → a set-like object providing a view on D's keys
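The readers in this package derive from `collections.abc.Mapping`, which is why `keys()` is the standard set-like view and why `reader[key]` can load data lazily. A minimal sketch of that pattern, assuming the scp file has already been parsed into a key-to-path dict and using a caller-supplied `load` function in place of `numpy.load` so that only the standard library is needed; `LazyScpReader` is a hypothetical name.

```python
from collections.abc import Mapping
from typing import Callable, Dict, Iterator


class LazyScpReader(Mapping):
    """Hypothetical scp reader: keeps only key -> path, loads data on access."""

    def __init__(self, paths: Dict[str, str], load: Callable[[str], object]):
        self._paths = dict(paths)
        self._load = load

    def get_path(self, key: str) -> str:
        return self._paths[key]

    def __getitem__(self, key: str) -> object:
        # Nothing is read from disk until an item is requested.
        return self._load(self._paths[key])

    def __iter__(self) -> Iterator[str]:
        return iter(self._paths)

    def __len__(self) -> int:
        return len(self._paths)


# A stand-in loader; with NumPy installed this could be numpy.load.
reader = LazyScpReader(
    {"key1": "/some/path/a.npy", "key2": "/some/path/b.npy"},
    load=lambda path: f"<array loaded from {path}>",
)
print(list(reader.keys()))      # ['key1', 'key2']
print(reader.get_path("key2"))  # /some/path/b.npy
print(reader["key1"])           # <array loaded from /some/path/a.npy>
```

Implementing only `__getitem__`, `__iter__` and `__len__` is enough; `Mapping` supplies `keys()`, `items()`, `get()` and `in` on top of them.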

class espnet2.fileio.npy_scp.NpyScpWriter(outdir: Union[pathlib.Path, str], scpfile: Union[pathlib.Path, str])

Bases: object

Writer class for an scp file that lists numpy (.npy) files.

Examples

    key1 /some/path/a.npy
    key2 /some/path/b.npy
    key3 /some/path/c.npy
    key4 /some/path/d.npy
    …

>>> writer = NpyScpWriter('./data/', './data/feat.scp')
>>> writer['aa'] = numpy_array
>>> writer['bb'] = numpy_array

close()

get_path(key)

espnet2.fileio.read_text

class espnet2.fileio.read_text.RandomTextReader(text_and_scp: str)

Bases: collections.abc.Mapping

Reader class for random access to text.

A simple text reader for non-paired text data (for unsupervised ASR, UASR). Instead of loading the whole text into memory (often large for UASR), the reader consumes an scp file that stores the byte offsets of each line of the text file, and uses mmap to read randomly selected unpaired text from it for training.

Examples:

text

    text1line
    text2line
    text3line

scp

    11
    00000000000000000010
    00000000110000000020
    00000000210000000030

scp explanation

    (number of digits per int value)
    (text starts at byte 0 and ends at byte 10 (including "\n"))
    (text starts at byte 11 and ends at byte 20 (including "\n"))
    (text starts at byte 21 and ends at byte 30 (including "\n"))

keys() → a set-like object providing a view on D's keys
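The idea behind the byte-offset scp is that fixed-width start/end values let a reader jump straight to any line without scanning the text. The sketch below is a self-consistent illustration under its own assumed layout (a header line holding the digit width, then one zero-padded start/end pair per line, end exclusive); it does not reproduce ESPnet's exact file layout, and `build_offset_index` and `RandomLineSampler` are hypothetical names. It slices `bytes` for simplicity; the `text` argument could equally be an `mmap.mmap` object, since `line()` only slices it.

```python
import random
from typing import Optional


def build_offset_index(text: bytes, num_digits: int = 10) -> bytes:
    """Hypothetical index: header with num_digits, then start/end per line."""
    records = [f"{num_digits}\n".encode()]
    start = 0
    for line in text.splitlines(keepends=True):
        end = start + len(line)  # exclusive end; the newline is included
        records.append(f"{start:0{num_digits}d}{end:0{num_digits}d}\n".encode())
        start = end
    return b"".join(records)


class RandomLineSampler:
    """Return random lines by looking up fixed-width offset records."""

    def __init__(self, text: bytes, index: bytes, seed: Optional[int] = None):
        header_end = index.index(b"\n")
        self.num_digits = int(index[:header_end])
        self.records = index[header_end + 1:]
        self.stride = 2 * self.num_digits + 1  # two offsets plus "\n"
        self.num_lines = len(self.records) // self.stride
        self.text = text
        self.rng = random.Random(seed)

    def line(self, i: int) -> str:
        rec = self.records[i * self.stride:(i + 1) * self.stride]
        start = int(rec[:self.num_digits])
        end = int(rec[self.num_digits:2 * self.num_digits])
        return self.text[start:end].decode("utf-8")

    def sample(self) -> str:
        return self.line(self.rng.randrange(self.num_lines))


text = b"text1line\ntext2line\ntext3line\n"
index = build_offset_index(text)
sampler = RandomLineSampler(text, index, seed=0)
print(index.splitlines()[1])  # b'00000000000000000010'
print(repr(sampler.line(1)))  # 'text2line\n'
print(repr(sampler.sample()))
```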

espnet2.fileio.read_text.load_num_sequence_text(path: Union[pathlib.Path, str], loader_type: str = 'csv_int') → Dict[str, List[Union[float, int]]]

Read a text file that maps each key to a sequence of numbers.

Examples

    key1 1,2,3
    key2 34,5,6

>>> d = load_num_sequence_text('text')
>>> np.testing.assert_array_equal(d["key1"], np.array([1, 2, 3]))
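A dependency-free sketch of how a `loader_type` can select the delimiter and element type. Only `csv_int` (comma-separated integers, as in the example above) appears in the documented signature; the other table entries and the helper `parse_num_sequences` are assumptions for illustration.

```python
from typing import Dict, List, Union

# Assumed loader_type -> (delimiter, element type) table. Only "csv_int"
# appears in the documented signature; the other entries are hypothetical.
LOADERS = {
    "csv_int": (",", int),
    "csv_float": (",", float),
    "text_int": (" ", int),
    "text_float": (" ", float),
}


def parse_num_sequences(lines, loader_type: str = "csv_int") -> Dict[str, List[Union[int, float]]]:
    if loader_type not in LOADERS:
        raise ValueError(f"unsupported loader_type: {loader_type}")
    delimiter, cast = LOADERS[loader_type]
    result: Dict[str, List[Union[int, float]]] = {}
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or key-only lines in this sketch
        key, value = parts
        result[key] = [cast(v) for v in value.split(delimiter)]
    return result


print(parse_num_sequences(["key1 1,2,3", "key2 34,5,6"]))
# {'key1': [1, 2, 3], 'key2': [34, 5, 6]}
print(parse_num_sequences(["uttA 0.5 1.5"], loader_type="text_float"))
# {'uttA': [0.5, 1.5]}
```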

espnet2.fileio.read_text.read_2columns_text(path: Union[pathlib.Path, str]) → Dict[str, str]

Read a text file with 2 columns as a dict object.

Examples

wav.scp:

    key1 /some/path/a.wav
    key2 /some/path/b.wav

>>> read_2columns_text('wav.scp')
{'key1': '/some/path/a.wav', 'key2': '/some/path/b.wav'}
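A minimal sketch of two-column parsing, assuming each line is split at its first run of whitespace into key and value. `parse_two_columns` is hypothetical, and treating a repeated key as an error is an assumption of the sketch rather than documented behavior.

```python
from typing import Dict, Iterable


def parse_two_columns(lines: Iterable[str]) -> Dict[str, str]:
    """Hypothetical parser: split each line at the first run of whitespace."""
    data: Dict[str, str] = {}
    for line in lines:
        parts = line.rstrip().split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        key = parts[0]
        value = parts[1] if len(parts) == 2 else ""
        if key in data:
            # Assumption of this sketch: a repeated key is an error.
            raise ValueError(f"duplicated key: {key}")
        data[key] = value
    return data


print(parse_two_columns(["key1 /some/path/a.wav\n", "key2 /some/path/b.wav\n"]))
# {'key1': '/some/path/a.wav', 'key2': '/some/path/b.wav'}
```

Because only the first whitespace splits, everything after the key is kept as one string, so a value may itself contain spaces.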

espnet2.fileio.read_text.read_label(path: Union[pathlib.Path, str]) → Dict[str, List[List[Union[str, float, int]]]]

Read a label file that lists (start time, end time, phone) triplets for each key.

Examples

    key1 start_time_1 end_time_1 phone_1 start_time_2 end_time_2 phone_2 ….
    key2 start_time_1 end_time_1 phone_1

>>> d = read_label('label')
>>> start_time, end_time, phone = d["key1"][0]  # e.g. 0.1, 0.2, "啊"
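A sketch of the triplet grouping described above, using made-up input lines (only the 0.1, 0.2, 啊 triplet comes from the original example). `parse_labels` is hypothetical; since the return annotation allows strings, floats or ints, the sketch simply keeps every field as a string and leaves numeric conversion to the caller.

```python
from typing import Dict, Iterable, List


def parse_labels(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Hypothetical parser: group the fields after the key into triplets."""
    result: Dict[str, List[List[str]]] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        key, rest = fields[0], fields[1:]
        if len(rest) % 3 != 0:
            # Assumption of this sketch: an incomplete triplet is an error.
            raise ValueError(f"{key}: expected (start, end, phone) triplets")
        # Fields stay strings; apply float() to start/end where needed.
        result[key] = [rest[i:i + 3] for i in range(0, len(rest), 3)]
    return result


labels = parse_labels(["key1 0.1 0.2 啊 0.2 0.5 b", "key2 0.0 0.3 a"])
print(labels["key1"])  # [['0.1', '0.2', '啊'], ['0.2', '0.5', 'b']]
```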

espnet2.fileio.read_text.read_multi_columns_text(path: Union[pathlib.Path, str], return_unsplit: bool = False) → Tuple[Dict[str, List[str]], Optional[Dict[str, str]]]

Read a text file with 2 or more columns as a dict object.

Examples

wav.scp:

    key1 /some/path/a1.wav /some/path/a2.wav
    key2 /some/path/b1.wav /some/path/b2.wav /some/path/b3.wav
    key3 /some/path/c1.wav
    …

>>> d, _ = read_multi_columns_text('wav.scp')
>>> d
{'key1': ['/some/path/a1.wav', '/some/path/a2.wav'],
 'key2': ['/some/path/b1.wav', '/some/path/b2.wav', '/some/path/b3.wav'],
 'key3': ['/some/path/c1.wav']}
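The return type is a pair: the split columns, plus an optional dict of the unsplit text after each key. A sketch under the assumption that the second element is `None` unless `return_unsplit=True`; `parse_multi_columns` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def parse_multi_columns(
    lines: Iterable[str], return_unsplit: bool = False
) -> Tuple[Dict[str, List[str]], Optional[Dict[str, str]]]:
    """Hypothetical parser: each column after the key becomes a list item."""
    data: Dict[str, List[str]] = {}
    unsplit: Dict[str, str] = {}
    for line in lines:
        parts = line.rstrip().split(maxsplit=1)
        if not parts:
            continue
        key = parts[0]
        rest = parts[1] if len(parts) == 2 else ""
        data[key] = rest.split()
        unsplit[key] = rest
    # Assumption: the second element is None unless return_unsplit=True.
    return data, (unsplit if return_unsplit else None)


lines = [
    "key1 /some/path/a1.wav /some/path/a2.wav",
    "key3 /some/path/c1.wav",
]
d, raw = parse_multi_columns(lines, return_unsplit=True)
print(d["key1"])    # ['/some/path/a1.wav', '/some/path/a2.wav']
print(raw["key1"])  # /some/path/a1.wav /some/path/a2.wav
```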

espnet2.fileio.multi_sound_scp

class espnet2.fileio.multi_sound_scp.MultiSoundScpReader(fname, dtype=None, always_2d: bool = False, stack_axis=0, pad=nan)

Bases: collections.abc.Mapping

Reader class for ‘wav.scp’ containing multiple sounds.

This is useful when different samples contain different numbers of audio files.

Examples

wav.scp is a text file that looks like the following:

    key1 /some/path/a1.wav /another/path/a2.wav /yet/another/path/a3.wav
    key2 /some/path/b1.wav /another/path/b2.wav
    key3 /some/path/c1.wav /another/path/c2.wav /yet/another/path/c3.wav
    key4 /some/path/d1.wav
    …

>>> reader = MultiSoundScpReader('wav.scp', stack_axis=0)
>>> rate, stacked_arrays = reader['key1']
>>> assert stacked_arrays.shape[0] == 3

Note: All audios in each sample must have the same sampling rate. Audios of different lengths within a sample are right-padded with np.nan to the same length.

get_path(key)

keys() → a set-like object providing a view on D's keys

pad_to_same_length(arrays, pad=nan, axis=0)

Right-pad arrays to the same length.

Parameters:

arrays (List[np.ndarray]) – List of arrays to pad
pad (float) – Value to pad
axis (int) – Axis to pad

Returns:

Padded array

Return type:

np.ndarray
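The padding rule can be sketched without NumPy: each array is extended on the right with the pad value (NaN by default) until it matches the longest one, after which the arrays can be stacked. `pad_right` is a hypothetical pure-Python stand-in that handles 1-D lists only; the documented method operates on np.ndarray and also takes an axis argument.

```python
import math
from typing import List, Sequence


def pad_right(arrays: Sequence[Sequence[float]], pad: float = math.nan) -> List[List[float]]:
    """Right-pad 1-D sequences with `pad` up to the longest length.

    Pure-Python stand-in for padding along axis 0 only.
    """
    max_len = max(len(a) for a in arrays)
    return [list(a) + [pad] * (max_len - len(a)) for a in arrays]


# Made-up mono signals of different lengths, one per file of a sample.
stacked = pad_right([[0.1, 0.2, 0.3], [0.4], [0.5, 0.6]])
print(len(stacked))  # 3, like stacked_arrays.shape[0] == 3
print(stacked[1])    # [0.4, nan, nan]
```

Padding with NaN rather than zero keeps padded positions distinguishable from genuine silence, e.g. with `math.isnan`.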

espnet2.fileio.rand_gen_dataset

class espnet2.fileio.rand_gen_dataset.FloatRandomGenerateDataset(shape_file: Union[pathlib.Path, str], dtype: Union[str, numpy.dtype] = 'float32', loader_type: str = 'csv_int')

Bases: collections.abc.Mapping

Generate float array from shape.txt.

Examples

shape.txt:

    uttA 123,83
    uttB 34,83

>>> dataset = FloatRandomGenerateDataset("shape.txt")
>>> array = dataset["uttA"]
>>> assert array.shape == (123, 83)
>>> array = dataset["uttB"]
>>> assert array.shape == (34, 83)

class espnet2.fileio.rand_gen_dataset.IntRandomGenerateDataset(shape_file: Union[pathlib.Path, str], low: int, high: int = None, dtype: Union[str, numpy.dtype] = 'int64', loader_type: str = 'csv_int')

Bases: collections.abc.Mapping

Generate int array from shape.txt.

Examples

shape.txt:

    uttA 123,83
    uttB 34,83

>>> dataset = IntRandomGenerateDataset("shape.txt", low=0, high=10)
>>> array = dataset["uttA"]
>>> assert array.shape == (123, 83)
>>> array = dataset["uttB"]
>>> assert array.shape == (34, 83)
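Both datasets turn each shape.txt entry into a random array of that shape. A dependency-free sketch of the integer case, assuming the comma-separated shape lines shown above and drawing values from [low, high); `parse_shape` and `random_ints` are hypothetical helpers, and the real datasets return NumPy arrays of the configured dtype rather than nested lists.

```python
import random
from typing import List, Optional, Tuple, Union


def parse_shape(value: str) -> Tuple[int, ...]:
    """'123,83' -> (123, 83), matching the comma-separated shape.txt lines."""
    return tuple(int(v) for v in value.split(","))


def random_ints(shape: Tuple[int, ...], low: int, high: Optional[int] = None,
                rng: Optional[random.Random] = None) -> Union[int, List]:
    """Nested lists of random ints drawn from [low, high).

    Assumption: as with numpy.random.randint, high=None means [0, low).
    """
    rng = rng or random.Random()
    if high is None:
        low, high = 0, low
    if not shape:
        return rng.randrange(low, high)
    return [random_ints(shape[1:], low, high, rng) for _ in range(shape[0])]


shapes = dict(line.split(maxsplit=1) for line in ["uttA 123,83", "uttB 34,83"])
array = random_ints(parse_shape(shapes["uttB"]), low=0, high=10, rng=random.Random(0))
print(len(array), len(array[0]))  # 34 83
```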

espnet2.fileio.vad_scp

class espnet2.fileio.vad_scp.VADScpReader(fname, dtype=<class 'numpy.float32'>)

Bases: collections.abc.Mapping

Reader class for ‘vad.scp’.

Unlike a segments file, which is expected to cover a whole session, vad.scp works at the utterance level. Its main use in ESPnet is to guide silence trimming for UASR.

Examples

    key1 0:1.2000
    key2 3.0000:4.5000 7.0000:9.0000
    …

>>> reader = VADScpReader('vad.scp')
>>> array = reader['key1']

keys() → a set-like object providing a view on D's keys
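Each vad.scp line is a key followed by `start:end` tokens. A sketch of parsing one line into (start, end) tuples; `parse_vad_line` is hypothetical. The reader itself returns a NumPy array (dtype float32 by default), and how it arranges the pairs inside that array is not described here, so the sketch simply returns tuples.

```python
from typing import List, Tuple


def parse_vad_line(line: str) -> Tuple[str, List[Tuple[float, float]]]:
    """'key2 3.0000:4.5000 7.0000:9.0000' -> ('key2', [(3.0, 4.5), (7.0, 9.0)])."""
    key, *segments = line.split()
    intervals = []
    for segment in segments:
        start, end = segment.split(":")  # each segment is "start:end"
        intervals.append((float(start), float(end)))
    return key, intervals


print(parse_vad_line("key1 0:1.2000"))
# ('key1', [(0.0, 1.2)])
print(parse_vad_line("key2 3.0000:4.5000 7.0000:9.0000"))
# ('key2', [(3.0, 4.5), (7.0, 9.0)])
```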

class espnet2.fileio.vad_scp.VADScpWriter(scpfile: Union[pathlib.Path, str], dtype=None)

Bases: object

Writer class for ‘vad.scp’

Examples

    key1 0:1.2000
    key2 3.0000:4.5000 7.0000:9.0000
    …

>>> writer = VADScpWriter('./data/vad.scp')
>>> writer['aa'] = [(0.0, 1.2)]  # list of (start, end) tuples
>>> writer['bb'] = [(3.0, 4.5), (7.0, 9.0)]
>>> writer.close()

close()
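Going the other way, a writer only needs to turn each (start, end) tuple into a `start:end` token. This sketch assumes four decimal places because that is what the example lines show; `format_vad_line` is hypothetical, not ESPnet's formatting code, and it would write the first example's start as 0.0000 rather than 0.

```python
from typing import Iterable, Tuple


def format_vad_line(key: str, intervals: Iterable[Tuple[float, float]], digits: int = 4) -> str:
    """Hypothetical formatter: 'start:end' pairs with a fixed number of decimals."""
    segments = [f"{start:.{digits}f}:{end:.{digits}f}" for start, end in intervals]
    return " ".join([key] + segments)


print(format_vad_line("bb", [(3.0, 4.5), (7.0, 9.0)]))
# bb 3.0000:4.5000 7.0000:9.0000
```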

espnet2.fileio.score_scp

class espnet2.fileio.score_scp.MIDReader(fname, add_rest=True, dtype=<class 'numpy.int16'>)

Bases: collections.abc.Mapping

Reader class for ‘mid.scp’.

Examples

    key1 /some/path/a.mid
    key2 /some/path/b.mid
    key3 /some/path/c.mid
    key4 /some/path/d.mid
    …

>>> reader = MIDReader('mid.scp')
>>> tempo, note_list = reader['key1']

get_path(key)

keys() → a set-like object providing a view on D's keys

class espnet2.fileio.score_scp.NOTE(lyric, midi, st, et)

Bases: object

class espnet2.fileio.score_scp.SingingScoreReader(fname, dtype=<class 'numpy.int16'>)

Bases: collections.abc.Mapping

Reader class for ‘score.scp’.

Examples

    key1 /some/path/score.json
    key2 /some/path/score.json
    key3 /some/path/score.json
    key4 /some/path/score.json
    …

>>> reader = SingingScoreReader('score.scp')
>>> score = reader['key1']

get_path(key)

keys() → a set-like object providing a view on D's keys

class espnet2.fileio.score_scp.SingingScoreWriter(outdir: Union[pathlib.Path, str], scpfile: Union[pathlib.Path, str])

Bases: object

Writer class for ‘score.scp’

Examples

    key1 /some/path/score.json
    key2 /some/path/score.json
    key3 /some/path/score.json
    key4 /some/path/score.json
    …

>>> writer = SingingScoreWriter('./data/', './data/score.scp')
>>> writer['aa'] = score_obj
>>> writer['bb'] = score_obj

close()

get_path(key)

class espnet2.fileio.score_scp.XMLReader(fname, dtype=<class 'numpy.int16'>)

Bases: collections.abc.Mapping

Reader class for ‘xml.scp’.

Examples

    key1 /some/path/a.xml
    key2 /some/path/b.xml
    key3 /some/path/c.xml
    key4 /some/path/d.xml
    …

>>> reader = XMLReader('xml.scp')
>>> tempo, note_list = reader['key1']

get_path(key)

keys() → a set-like object providing a view on D's keys

class espnet2.fileio.score_scp.XMLWriter(outdir: Union[pathlib.Path, str], scpfile: Union[pathlib.Path, str])

Bases: object

Writer class for ‘xml.scp’

Examples

    key1 /some/path/a.musicxml
    key2 /some/path/b.musicxml
    key3 /some/path/c.musicxml
    key4 /some/path/d.musicxml
    …

>>> writer = XMLWriter('./data/', './data/xml.scp')
>>> writer['aa'] = xml_obj
>>> writer['bb'] = xml_obj

close()

get_path(key)

espnet2.fileio.rttm

class espnet2.fileio.rttm.RttmReader(fname: str)

Bases: collections.abc.Mapping

Reader class for ‘rttm.scp’.

Examples

    SPEAKER file1 1 0 1023 <NA> <NA> spk1 <NA>
    SPEAKER file1 2 4000 3023 <NA> <NA> spk2 <NA>
    SPEAKER file1 3 500 4023 <NA> <NA> spk1 <NA>
    END file1 <NA> 4023 <NA> <NA> <NA> <NA>

This is an extended version of the standard RTTM format for ESPnet. The differences are:

1. Sample numbers are used instead of absolute time.
2. An END label represents the duration of a recording.
3. The duration (5th field) is replaced with the end time.

(For the standard RTTM format, see https://catalog.ldc.upenn.edu/docs/LDC2004T12/RTTM-format-v13.pdf)

…

>>> reader = RttmReader('rttm')
>>> spk_label = reader["file1"]

keys() → a set-like object providing a view on D's keys

espnet2.fileio.rttm.load_rttm_text(path: Union[pathlib.Path, str]) → Dict[str, List[Tuple[str, float, float]]]

Read an RTTM file.

Note: currently only speaker information is supported.
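Assuming the extended field layout described above, grouping the SPEAKER lines of such a file into the documented return shape, Dict[str, List[Tuple[str, float, float]]], can be sketched as follows. `parse_speaker_segments` is hypothetical; it skips END lines and does no validation, and how load_rttm_text itself treats END lines is not described here.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

Segment = Tuple[str, float, float]  # (speaker, start, end)


def parse_speaker_segments(lines: Iterable[str]) -> Dict[str, List[Segment]]:
    """Group SPEAKER lines by recording id.

    Field positions follow the example above (1-based): 2 = recording id,
    4 = start, 5 = end (both in samples), 8 = speaker. END lines are skipped.
    """
    segments: DefaultDict[str, List[Segment]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue
        recording, start, end, speaker = fields[1], fields[3], fields[4], fields[7]
        segments[recording].append((speaker, float(start), float(end)))
    return dict(segments)


rttm = [
    "SPEAKER file1 1 0 1023 <NA> <NA> spk1 <NA>",
    "SPEAKER file1 2 4000 3023 <NA> <NA> spk2 <NA>",
    "SPEAKER file1 3 500 4023 <NA> <NA> spk1 <NA>",
    "END file1 <NA> 4023 <NA> <NA> <NA> <NA>",
]
print(parse_speaker_segments(rttm)["file1"][0])  # ('spk1', 0.0, 1023.0)
```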

espnet2.fileio.datadir_writer¶

class espnet2.fileio.datadir_writer.DatadirWriter(p: Union[pathlib.Path, str])

Bases: object

Writer class to create a Kaldi-like data directory.

Examples

>>> with DatadirWriter("output") as writer:
...     # output/sub.txt is created here
...     subwriter = writer["sub.txt"]
...     # Write "uttidA some/where/a.wav"
...     subwriter["uttidA"] = "some/where/a.wav"
...     subwriter["uttidB"] = "some/where/b.wav"

close()
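The nested-writer pattern in this example can be sketched with the standard library: `writer[name]` returns a child writer for one file, assigning `child[key] = value` appends a "key value" line, and closing the parent closes every child. `MiniDatadirWriter` is a hypothetical stand-in, not ESPnet's class; in particular it opens each file lazily on the first write, whereas the example comment says output/sub.txt is created when the sub-writer is requested.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional, TextIO, Union


class MiniDatadirWriter:
    """Hypothetical stand-in for a Kaldi-style data directory writer."""

    def __init__(self, path: Union[Path, str]):
        self.path = Path(path)
        self._children: Dict[str, "MiniDatadirWriter"] = {}
        self._fd: Optional[TextIO] = None

    def __getitem__(self, name: str) -> "MiniDatadirWriter":
        # One child writer per file name inside the directory.
        if name not in self._children:
            self._children[name] = MiniDatadirWriter(self.path / name)
        return self._children[name]

    def __setitem__(self, key: str, value: str) -> None:
        # Open the file lazily on the first write, creating parent dirs.
        if self._fd is None:
            self.path.parent.mkdir(parents=True, exist_ok=True)
            self._fd = self.path.open("w", encoding="utf-8")
        self._fd.write(f"{key} {value}\n")

    def close(self) -> None:
        for child in self._children.values():
            child.close()
        if self._fd is not None:
            self._fd.close()
            self._fd = None

    def __enter__(self) -> "MiniDatadirWriter":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()


with tempfile.TemporaryDirectory() as tmp:
    with MiniDatadirWriter(Path(tmp) / "output") as writer:
        subwriter = writer["sub.txt"]
        subwriter["uttidA"] = "some/where/a.wav"
        subwriter["uttidB"] = "some/where/b.wav"
    content = (Path(tmp) / "output" / "sub.txt").read_text(encoding="utf-8")

print(content, end="")
# uttidA some/where/a.wav
# uttidB some/where/b.wav
```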
