espnet2.iterators.chunk_iter_factory.ChunkIterFactory
class espnet2.iterators.chunk_iter_factory.ChunkIterFactory(dataset, batch_size: int, batches: AbsSampler | Sequence[Sequence[Any]], chunk_length: int | str, chunk_shift_ratio: float = 0.5, num_cache_chunks: int = 1024, num_samples_per_epoch: int | None = None, seed: int = 0, shuffle: bool = False, num_workers: int = 0, collate_fn=None, pin_memory: bool = False, excluded_key_prefixes: List[str] | None = None, discard_short_samples: bool = True, default_fs: int | None = None, chunk_max_abs_length: int | None = None)
Bases: AbsIterFactory
Creates chunks from a sequence
Examples
>>> batches = [["id1"], ["id2"], ...]
>>> batch_size = 128
>>> chunk_length = 1000
>>> iter_factory = ChunkIterFactory(dataset, batch_size, batches, chunk_length)
>>> it = iter_factory.build_iter(epoch)
>>> for ids, batch in it:
... ...
- The number of mini-batches varies from epoch to epoch and cannot be known in advance, because the IterFactory is not given length information.
- For this reason, “num_iters_per_epoch” cannot be implemented for this iterator; “num_samples_per_epoch” is implemented instead.
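To illustrate the chunking idea, the sketch below shows one plausible way a long sequence can be cut into fixed-length, overlapping chunks. This is a hypothetical illustration, not ESPnet’s actual implementation; the assumption here is that chunk_shift_ratio scales chunk_length to give the hop between consecutive chunk start positions, and that trailing samples shorter than chunk_length are discarded (cf. discard_short_samples).

```python
def split_into_chunks(sequence, chunk_length, chunk_shift_ratio=0.5):
    """Yield fixed-length chunks from a sequence.

    Consecutive chunks start chunk_length * chunk_shift_ratio samples
    apart, so a ratio < 1.0 produces overlapping chunks. A trailing
    segment shorter than chunk_length is dropped (not padded).
    """
    shift = max(1, int(chunk_length * chunk_shift_ratio))
    chunks = []
    for start in range(0, len(sequence) - chunk_length + 1, shift):
        chunks.append(sequence[start:start + chunk_length])
    return chunks


# 10 samples, chunks of 4, hop of 2 -> starts at 0, 2, 4, 6
chunks = split_into_chunks(list(range(10)), chunk_length=4, chunk_shift_ratio=0.5)
```

With chunk_shift_ratio=1.0 the chunks would be non-overlapping; smaller ratios trade more training examples for redundancy between adjacent chunks.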
build_iter(epoch: int, shuffle: bool | None = None) → Iterator[Tuple[List[str], Dict[str, Tensor]]]
prepare_for_collate(id_list, batches)