espnet.lm.lm_utils.load_dataset
espnet.lm.lm_utils.load_dataset(path, label_dict, outdir=None)
Load and save an HDF5 file that contains a dataset and its statistics for the language model (LM).
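The load-or-save behaviour can be sketched roughly as follows. This is a minimal illustration, not ESPnet's code. ESPnet writes HDF5, but this sketch uses NumPy's `.npz` format instead so that it needs only NumPy. The helper `load_or_build`, its `build` callback, and the cache file name are hypothetical. The sketch also assumes that when `outdir` is `None`, no cache is read or written.

```python
import os

import numpy as np


def load_or_build(path, build, outdir=None):
    """Hypothetical helper mirroring a load-or-save cache flow.

    build(path) must return (list of np.int32 arrays, n_tokens, n_oovs).
    """
    if outdir is None:
        # Assumption: without an output directory, parse the text every time
        # and write no cache file.
        return build(path)
    os.makedirs(outdir, exist_ok=True)
    # Hypothetical cache name: <outdir>/<basename of path>.npz
    cache = os.path.join(outdir, os.path.basename(path) + ".npz")
    if os.path.exists(cache):
        # Cache hit: rebuild the per-sentence arrays from one flat array.
        with np.load(cache) as f:
            flat, lengths = f["flat"], f["lengths"]
            n_tokens, n_oovs = int(f["n_tokens"]), int(f["n_oovs"])
        bounds = np.cumsum(lengths)[:-1]
        data = list(np.split(flat, bounds)) if len(lengths) else []
        return data, n_tokens, n_oovs
    # Cache miss: parse the text once and store everything for the next call.
    data, n_tokens, n_oovs = build(path)
    flat = np.concatenate(data) if data else np.zeros(0, dtype=np.int32)
    lengths = np.array([len(a) for a in data], dtype=np.int64)
    np.savez(cache, flat=flat, lengths=lengths, n_tokens=n_tokens, n_oovs=n_oovs)
    return data, n_tokens, n_oovs
```

h5py can store ragged arrays directly as variable-length datasets. A `.npz` archive cannot, so the sketch flattens the sentences into one array and keeps their lengths alongside it.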
- Parameters:
  - path (str) – The path of an input text dataset file.
  - label_dict (dict[str, int]) – A dictionary that maps each token label string to its ID number.
  - outdir (str, optional) – The path of an output directory. Defaults to None.
- Returns: A tuple of the token IDs as np.int32 arrays converted by read_tokens, the number of tokens counted by count_tokens, and the number of OOVs counted by count_tokens.
- Return type: tuple[list[np.ndarray], int, int]
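The three returned values can be illustrated with the hypothetical helpers below. These are illustrative re-implementations, not ESPnet's read_tokens and count_tokens. They make three assumptions:

- Each line is split on whitespace.
- Labels missing from `label_dict` map to the ID of `"<unk>"`.
- OOVs are counted as occurrences of that ID.

```python
import numpy as np


def text_to_ids(lines, label_dict, unk="<unk>"):
    """Hypothetical stand-in for read_tokens: one np.int32 array per line."""
    unk_id = label_dict[unk]
    return [
        # Unknown labels fall back to the <unk> ID (an assumption).
        np.array([label_dict.get(tok, unk_id) for tok in line.split()], dtype=np.int32)
        for line in lines
    ]


def count_ids(data, unk_id):
    """Hypothetical stand-in for count_tokens: total tokens and <unk> hits."""
    n_tokens = sum(len(a) for a in data)
    n_oovs = sum(int(np.count_nonzero(a == unk_id)) for a in data)
    return n_tokens, n_oovs


label_dict = {"<unk>": 1, "hello": 2, "world": 3}
data = text_to_ids(["hello world", "hello there"], label_dict)
n_tokens, n_oovs = count_ids(data, label_dict["<unk>"])
print(n_tokens, n_oovs)  # 4 1 ("there" is out of vocabulary)
```

Together, `(data, n_tokens, n_oovs)` has the same shape as the documented return type `tuple[list[np.ndarray], int, int]`.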