espnet.utils.training.batchfy.batchfy_by_seq
espnet.utils.training.batchfy.batchfy_by_seq(sorted_data, batch_size, max_length_in, max_length_out, min_batch_size=1, shortest_first=False, ikey='input', iaxis=0, okey='output', oaxis=0)
Make a set of batches from a JSON dictionary.
- Parameters:
- sorted_data (Dict[str, Dict[str, Any]]) – dictionary loaded from data.json
- batch_size (int) – batch size
- max_length_in (int) – maximum length of input to decide adaptive batch size
- max_length_out (int) – maximum length of output to decide adaptive batch size
- min_batch_size (int) – minimum batch size (for multi-GPU)
- shortest_first (bool) – if True, order batches from shortest samples to longest; otherwise, from longest to shortest
- ikey (str) – key to access input (for ASR, ikey="input"; for TTS and MT, ikey="output")
- iaxis (int) – dimension to access input (for ASR and TTS, iaxis=0; for MT, iaxis=1)
- okey (str) – key to access output (for ASR and MT, okey="output"; for TTS, okey="input")
- oaxis (int) – dimension to access output (for ASR, TTS, and MT, oaxis=0; reserved for future research; -1 means all axes)
- Returns: List[List[Tuple[str, dict]]] list of batches
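The parameter list above does not say how max_length_in and max_length_out "decide adaptive batch size". The following is a minimal, self-contained sketch of one plausible rule, not the ESPnet implementation: it assumes the batch size is divided by 1 plus the number of whole times the first utterance in the batch exceeds either length limit. The names `adaptive_batch_size`, `sketch_batchfy_by_seq`, and `make_utt` are hypothetical, as is the assumption that each entry stores its length in `info[key][axis]["shape"][0]` and that the data arrives as a list of `(utt_id, info)` pairs sorted longest first. The sketch ignores shortest_first and does not pad a final batch that falls below min_batch_size.

```python
from typing import Any, Dict, List, Tuple

Utterance = Tuple[str, Dict[str, Any]]


def adaptive_batch_size(ilen: int, olen: int, batch_size: int,
                        max_length_in: int, max_length_out: int,
                        min_batch_size: int = 1) -> int:
    """Shrink batch_size when the input or output exceeds its length limit (assumed rule)."""
    factor = max(ilen // max_length_in, olen // max_length_out)
    # max(...) keeps the result from dropping below min_batch_size.
    return max(min_batch_size, batch_size // (1 + factor))


def sketch_batchfy_by_seq(sorted_data: List[Utterance], batch_size: int,
                          max_length_in: int, max_length_out: int,
                          min_batch_size: int = 1,
                          ikey: str = "input", iaxis: int = 0,
                          okey: str = "output", oaxis: int = 0) -> List[List[Utterance]]:
    """Group utterances into batches whose size depends on the first utterance's length."""
    batches: List[List[Utterance]] = []
    start = 0
    while start < len(sorted_data):
        _, info = sorted_data[start]
        # Assumed layout: each entry carries a "shape" whose first value is the length.
        ilen = int(info[ikey][iaxis]["shape"][0])
        olen = int(info[okey][oaxis]["shape"][0])
        bs = adaptive_batch_size(ilen, olen, batch_size,
                                 max_length_in, max_length_out, min_batch_size)
        batches.append(sorted_data[start:start + bs])
        start += bs
    return batches


def make_utt(name: str, ilen: int, olen: int) -> Utterance:
    """Build a toy entry; the feature sizes 83 and 52 are arbitrary placeholders."""
    return (name, {"input": [{"shape": [ilen, 83]}],
                   "output": [{"shape": [olen, 52]}]})


# Toy data, sorted from longest to shortest input.
toy = [
    make_utt("utt_long", 1700, 40),
    make_utt("utt_mid_a", 900, 30),
    make_utt("utt_mid_b", 850, 28),
    make_utt("utt_short_a", 300, 10),
    make_utt("utt_short_b", 250, 9),
    make_utt("utt_short_c", 200, 8),
    make_utt("utt_short_d", 100, 5),
]

batches = sketch_batchfy_by_seq(toy, batch_size=4,
                                max_length_in=800, max_length_out=150)
batch_sizes = [len(b) for b in batches]
print(batch_sizes)  # [1, 2, 4]
```

With batch_size=4 and max_length_in=800, the 1700-frame utterance is batched alone, the two utterances near 900 frames share a batch of two, and the short utterances fill a full batch of four. Under this rule, batches of long sequences hold fewer samples, which keeps the padded size of each batch roughly bounded.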