espnetez.data.dump.create_dump_file
espnetez.data.dump.create_dump_file(dump_dir: str | Path, dataset: Dict[str, Dict] | List[Dict], data_inputs: Dict[str, Dict])
Create a dump file for a dataset.
This function writes dump files for the given dataset into the specified directory. Despite the singular name, it writes one dump file per input variable listed in the data_inputs argument, and each file holds that variable's data from the dataset.
- Parameters:
- dump_dir (Union[str, Path]) – The output directory where the dump files will be saved. It is created if it does not exist.
- dataset (Union[Dict[str, Dict], List[Dict]]) – The dataset to dump. It can be either a dictionary that maps each entry's key to that entry's data or a list of dictionaries, one per entry.
- data_inputs (Dict[str, Dict]) – Information about each input variable. Each key is a variable name, and its value is a list whose first element is the name of the output file for that variable.
- Raises: ValueError – If dataset is neither a dictionary nor a list, or if an expected data entry is missing.
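The behavior described by the parameters and the ValueError conditions can be sketched in plain Python. This is a hypothetical re-implementation for illustration, not ESPnet-EZ's code: the name sketch_create_dump_file is made up, the one-"<key> <value>"-line-per-entry (Kaldi-style) file format is an assumption, and so is keying list entries by their position.

```python
from pathlib import Path
from typing import Dict, List, Union


def sketch_create_dump_file(
    dump_dir: Union[str, Path],
    dataset: Union[Dict, List[Dict]],
    data_inputs: Dict[str, list],
) -> List[Path]:
    """Hypothetical sketch of the documented behavior (not the library code).

    ASSUMPTION: each dump file holds one "<key> <value>" line per entry.
    """
    dump_dir = Path(dump_dir)
    # Create the output directory (and any parents) if it does not exist.
    dump_dir.mkdir(parents=True, exist_ok=True)

    # Dict datasets keep their own keys; list datasets use their indices
    # (ASSUMPTION: the real function may key list entries differently).
    if isinstance(dataset, dict):
        items = list(dataset.items())
    elif isinstance(dataset, list):
        items = list(enumerate(dataset))
    else:
        raise ValueError("dataset must be a dict or a list")

    written = []
    for name, info in data_inputs.items():
        # The first element of each data_inputs value is the output file name.
        path = dump_dir / info[0]
        lines = []
        for key, entry in items:
            if name not in entry:
                raise ValueError(f"entry {key!r} has no field {name!r}")
            lines.append(f"{key} {entry[name]}")
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Under these assumptions, the dictionary example below would put the lines `0 value1` and `1 value3` into feature1_dump.txt; inspect the files the real function writes before relying on that layout.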
Examples
Creating a dump file from a dictionary dataset:
>>> dump_dir = "output/dump"
>>> dataset = {
... 0: {"feature1": "value1", "feature2": "value2"},
... 1: {"feature1": "value3", "feature2": "value4"},
... }
>>> data_inputs = {
... "feature1": ["feature1_dump.txt"],
... "feature2": ["feature2_dump.txt"],
... }
>>> create_dump_file(dump_dir, dataset, data_inputs)
This creates two files, feature1_dump.txt and feature2_dump.txt, in the output/dump directory; each holds the values of its feature for every entry in the dataset.
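To check what a call like the one above produced, a dump file can be loaded back into a dictionary. The helper read_dump_file is hypothetical and assumes each line has the form "<key> <value>"; the actual file layout is not specified on this page, so adjust the parsing if it differs.

```python
from pathlib import Path
from typing import Dict, Union


def read_dump_file(path: Union[str, Path]) -> Dict[str, str]:
    """Hypothetical reader; ASSUMES one "<key> <value>" pair per line."""
    result = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first space only, so values may themselves contain spaces.
        key, _, value = line.partition(" ")
        result[key] = value
    return result


# Example (after running the dictionary example above):
# read_dump_file("output/dump/feature1_dump.txt")
```

Keys come back as strings, since the file itself is plain text.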
Creating a dump file from a list dataset:
>>> dump_dir = "output/dump"
>>> dataset = [
... {"feature1": "value1", "feature2": "value2"},
... {"feature1": "value3", "feature2": "value4"},
... ]
>>> data_inputs = {
... "feature1": ["feature1_dump.txt"],
... "feature2": ["feature2_dump.txt"],
... }
>>> create_dump_file(dump_dir, dataset, data_inputs)
As with the dictionary example, this creates feature1_dump.txt and feature2_dump.txt in the output/dump directory.
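The list form does not let you choose entry keys. If you want explicit string IDs, matching the Dict[str, Dict] annotation of dataset, one option is to convert the list to a dictionary before calling create_dump_file. The helper index_by_id and its "utt" prefix are hypothetical; zero-padding keeps the lexicographic order of the IDs equal to the original list order.

```python
from typing import Dict, List


def index_by_id(entries: List[Dict], prefix: str = "utt") -> Dict[str, Dict]:
    """Hypothetical helper: turn a list dataset into a dict with string IDs."""
    # Pad indices to a common width so sorted IDs follow the list order.
    width = len(str(max(len(entries) - 1, 0)))
    return {f"{prefix}{i:0{width}d}": entry for i, entry in enumerate(entries)}


# dataset = index_by_id(dataset)  # e.g. keys "utt0", "utt1" for two entries
# create_dump_file(dump_dir, dataset, data_inputs)
```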
NOTE
Ensure that the process has write permission for the output directory (or, if the directory does not exist yet, for its nearest existing parent, where it will be created) to avoid I/O errors during file creation.
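A best-effort pre-flight check along these lines can be done with os.access. The helper can_write_dump_dir is hypothetical and not part of ESPnet-EZ; os.access can disagree with the actual outcome (for example under ACLs or when running as root), so the call to create_dump_file should still be prepared to handle an OSError.

```python
import os
from pathlib import Path
from typing import Union


def can_write_dump_dir(dump_dir: Union[str, Path]) -> bool:
    """Hypothetical best-effort check that dump_dir can be written or created."""
    path = Path(dump_dir).resolve()
    # Walk up to the nearest existing ancestor: a missing directory
    # would have to be created inside it.
    while not path.exists():
        if path.parent == path:
            return False
        path = path.parent
    # Writing new entries into a directory needs write and search (execute) access.
    return path.is_dir() and os.access(path, os.W_OK | os.X_OK)
```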