espnet2.iterators.category_chunk_iter_factory.CategoryChunkIterFactory
class espnet2.iterators.category_chunk_iter_factory.CategoryChunkIterFactory(dataset, batch_size: int, batches: AbsSampler | Sequence[Sequence[Any]], chunk_length: int | str, chunk_shift_ratio: float = 0.5, num_cache_chunks: int = 1024, num_samples_per_epoch: int | None = None, seed: int = 0, shuffle: bool = False, num_workers: int = 0, collate_fn=None, pin_memory: bool = False, excluded_key_prefixes: List[str] | None = None, discard_short_samples: bool = True, default_fs: int | None = None, chunk_max_abs_length: int | None = None)
Bases: AbsIterFactory
Creates chunks from a sequence
Examples
>>> batches = [["id1"], ["id2"], ...]
>>> batch_size = 128
>>> chunk_length = 1000
>>> iter_factory = CategoryChunkIterFactory(dataset, batch_size, batches, chunk_length)
>>> it = iter_factory.build_iter(epoch)
>>> for ids, batch in it:
... ...
This class is a modified version of ChunkIterFactory.
- Gets category-balanced chunks for each batch rather than per category
- TODO(jiatong): add additional setup to save/load shuffled chunk information
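How chunk_length and chunk_shift_ratio might interact can be illustrated with a small standalone sketch. It is not the library's implementation: the helper name `split_into_chunks` is hypothetical, and it assumes that the shift between chunk starts is `int(chunk_length * chunk_shift_ratio)` and that sequences shorter than one chunk produce no chunks.

```python
from typing import List, Sequence


def split_into_chunks(
    sequence: Sequence[int],
    chunk_length: int,
    chunk_shift_ratio: float = 0.5,
) -> List[List[int]]:
    """Cut `sequence` into fixed-length, possibly overlapping chunks.

    Hypothetical helper for illustration only; not part of espnet2.
    """
    # Assumption: the hop between chunk starts is a fraction of the chunk length.
    shift = max(1, int(chunk_length * chunk_shift_ratio))
    if len(sequence) < chunk_length:
        # Assumption: a sequence shorter than one chunk yields no chunks.
        return []
    # Number of full chunks that fit when stepping by `shift`.
    num_chunks = (len(sequence) - chunk_length) // shift + 1
    return [
        list(sequence[i * shift : i * shift + chunk_length])
        for i in range(num_chunks)
    ]


chunks = split_into_chunks(list(range(10)), chunk_length=4, chunk_shift_ratio=0.5)
print(chunks)
```

With a ratio of 0.5, consecutive chunks overlap by half their length; a ratio of 1.0 would give non-overlapping chunks under the same assumptions.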
build_iter(epoch: int, shuffle: bool | None = None) → Iterator[Tuple[List[str], Dict[str, Tensor]]]
prepare_for_collate(id_list, batches)