espnet2.fileio.multi_sound_scp.MultiSoundScpReader
class espnet2.fileio.multi_sound_scp.MultiSoundScpReader(fname, dtype=None, always_2d: bool = False, stack_axis=0, pad=nan)
Bases: Mapping
Reader class for a ‘wav.scp’ file in which each key maps to multiple sound files.
This is useful when different samples have different numbers of audio files.
Examples
wav.scp is a text file that looks like the following:
key1 /some/path/a1.wav /another/path/a2.wav /yet/another/path/a3.wav
key2 /some/path/b1.wav /another/path/b2.wav
key3 /some/path/c1.wav /another/path/c2.wav /yet/another/path/c3.wav
key4 /some/path/d1.wav
…
>>> reader = MultiSoundScpReader('wav.scp', stack_axis=0)
>>> rate, stacked_arrays = reader['key1']
>>> assert stacked_arrays.shape[0] == 3
Note: All audio files in a sample must have the same sampling rate. Audio files of different lengths within a sample are right-padded to the same length with the pad value (np.nan by default).
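To make the wav.scp layout from the example concrete, here is a minimal sketch of how such a file could be turned into a key-to-paths table. The helper name `parse_multi_scp` is hypothetical and not part of ESPnet. The sketch assumes that fields are separated by whitespace and that paths contain no spaces.

```python
def parse_multi_scp(text):
    """Hypothetical helper: map each key to the list of paths on its line."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # First field is the key; every remaining field is one audio path.
        key, *paths = line.split()
        if key in table:
            raise ValueError(f"duplicate key: {key}")
        table[key] = paths
    return table


example = """\
key1 /some/path/a1.wav /another/path/a2.wav /yet/another/path/a3.wav
key2 /some/path/b1.wav /another/path/b2.wav
"""
table = parse_multi_scp(example)
print(len(table["key1"]))  # 3
print(table["key2"])
```

The number of paths per key is what determines the size of the stacked axis in the reader's output, which is why `stacked_arrays.shape[0] == 3` holds for `key1` in the example.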
get_path(key)
keys() → a set-like object providing a view on D's keys
pad_to_same_length(arrays, pad=nan, axis=0)
Right-pad arrays to the same length.
- Parameters:
- arrays (List[np.ndarray]) – List of arrays to pad
- pad (float) – Value to pad
- axis (int) – Axis to pad
- Returns: Padded array
- Return type: np.ndarray
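The right-padding behaviour described above can be illustrated with plain NumPy. This is a sketch under stated assumptions, not ESPnet's implementation: the function name `right_pad_and_stack` and its `stack_axis` argument are hypothetical, and the input is cast to float so that NaN can be stored.

```python
import numpy as np


def right_pad_and_stack(arrays, pad=np.nan, axis=0, stack_axis=0):
    """Hypothetical helper: right-pad along `axis`, then stack the results."""
    max_len = max(a.shape[axis] for a in arrays)
    padded = []
    for a in arrays:
        # Pad only at the end of the chosen axis; leave other axes untouched.
        widths = [(0, 0)] * a.ndim
        widths[axis] = (0, max_len - a.shape[axis])
        padded.append(
            np.pad(a.astype(float), widths, mode="constant", constant_values=pad)
        )
    return np.stack(padded, axis=stack_axis)


a = np.array([0.1, 0.2, 0.3])
b = np.array([0.4, 0.5])
stacked = right_pad_and_stack([a, b])

# With NaN padding, the original lengths can be recovered by counting
# the non-NaN entries in each row.
valid_lengths = (~np.isnan(stacked)).sum(axis=1)
print(stacked.shape)   # (2, 3)
print(valid_lengths)   # [3 2]
```

Padding with NaN rather than zero keeps padded positions distinguishable from genuine silence, at the cost of requiring a floating-point dtype.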