espnetez.data.dump.join_dumps

espnetez.data.dump.join_dumps(dump_paths: List[str], dump_prefix: List[str], output_dir: str | Path)

Create a joined dump file from a list of dump paths.

This function takes multiple dump paths and prefixes, reads the corresponding dump files, and creates a new dump file in the specified output directory. Each line from the original dump files is prefixed with the corresponding prefix from the dump_prefix list.

Parameters:
- dump_paths (List[str]) – A list of paths to the dump directories. Each path should contain the dump files to be joined.
- dump_prefix (List[str]) – A list of prefixes for the dump files. Each prefix will be added to the beginning of the corresponding lines in the joined output file.
- output_dir (Union[str, Path]) – The output directory where the joined dump file will be saved. If the directory does not exist, it will be created.

Raises: ValueError – If any of the expected dump files do not exist in the specified dump paths.
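
The behavior described above can be sketched in plain Python. This is an illustrative re-implementation, not the espnetez source: the name join_dumps_sketch is hypothetical, the "-" separator is taken from the example on this page, and taking the expected file names from the first dump directory is an assumption.

```python
from pathlib import Path
from typing import List, Union


def join_dumps_sketch(
    dump_paths: List[str],
    dump_prefix: List[str],
    output_dir: Union[str, Path],
) -> None:
    """Illustrative sketch of joining dump files (hypothetical helper)."""
    output_dir = Path(output_dir)
    # Create the output directory if it does not exist yet.
    output_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: the first dump directory defines the expected file names.
    file_names = sorted(
        p.name for p in Path(dump_paths[0]).iterdir() if p.is_file()
    )

    for name in file_names:
        joined = []
        for path, prefix in zip(dump_paths, dump_prefix):
            src = Path(path) / name
            if not src.exists():
                raise ValueError(f"Dump file {src} does not exist.")
            with src.open(encoding="utf-8") as f:
                for line in f:
                    # Prefix every line; normalize the trailing newline so
                    # lines from different files never run together.
                    joined.append(f"{prefix}-{line.rstrip(chr(10))}\n")
        (output_dir / name).write_text("".join(joined), encoding="utf-8")
```

Under these assumptions, a file wav.scp present in both dump directories yields a single wav.scp in the output directory, with the lines from the first directory (prefixed "dataset1-") followed by the lines from the second (prefixed "dataset2-").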

Examples

>>> join_dumps(
...     dump_paths=["/path/to/dump1", "/path/to/dump2"],
...     dump_prefix=["dataset1", "dataset2"],
...     output_dir="/path/to/output"
... )
This will read dump files from "/path/to/dump1" and "/path/to/dump2",
prefix the lines with "dataset1-" and "dataset2-", and write the joined
content to "/path/to/output".

NOTE

It is assumed that all dump directories contain the same set of dump file names. If the dump files have different names, a ValueError will be raised.
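
Because of this assumption, it can help to compare the dump directories before joining them. The helper below is a hypothetical pre-check, not part of espnetez: it reports, for each dump directory, the file names that appear in some other directory but are missing from it.

```python
from pathlib import Path
from typing import Dict, List, Set


def find_missing_dump_files(dump_paths: List[str]) -> Dict[str, Set[str]]:
    """Map each dump directory to the file names it lacks (hypothetical helper)."""
    # Collect the plain file names found directly inside each directory.
    names = {
        path: {f.name for f in Path(path).iterdir() if f.is_file()}
        for path in dump_paths
    }
    # The union of all names is what every directory is expected to contain.
    all_names = set().union(*names.values())
    # Keep only the directories that are missing at least one file.
    return {
        path: all_names - found
        for path, found in names.items()
        if all_names - found
    }
```

An empty result means every directory holds the same set of file names; a non-empty result lists what to fix before calling join_dumps.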