espnet2.iterators.category_iter_factory.CategoryIterFactory
class espnet2.iterators.category_iter_factory.CategoryIterFactory(dataset, batches: AbsSampler | Sequence[Sequence[Any]], num_iters_per_epoch: int | None = None, seed: int = 0, sampler_args: dict | None = None, batch_type: str = 'catbel', shuffle: bool = False, num_workers: int = 0, collate_fn=None, pin_memory: bool = False)
Bases: AbsIterFactory
Build an iterator for each epoch.
This class creates a PyTorch DataLoader, with the following differences:
- The random seed is determined by the epoch number. This guarantees reproducibility when training is resumed partway through.
- The number of samples per epoch can be restricted. This controls the interval between training and evaluation.
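The two points above can be illustrated with a minimal, standalone sketch. It is not the ESPnet implementation: the helper name `batches_for_epoch` is hypothetical, and both the seeding rule (`seed + epoch`) and the truncation to the first `num_iters_per_epoch` batches are simplifying assumptions chosen only to show why an epoch-dependent seed makes a resumed run reproduce the same batch order.

```python
import random


def batches_for_epoch(batches, epoch, seed=0, shuffle=True, num_iters_per_epoch=None):
    """Return the batches for one epoch in a deterministic order (illustrative only)."""
    # Copy so the caller's batch list is never modified in place.
    batches = list(batches)
    if shuffle:
        # Assumption: the generator is seeded from (seed + epoch), so the order
        # depends only on the epoch number, not on how many epochs ran before.
        random.Random(seed + epoch).shuffle(batches)
    if num_iters_per_epoch is not None:
        # Assumption: an "epoch" is simply cut off after this many batches.
        batches = batches[:num_iters_per_epoch]
    return batches


all_batches = [("utt1", "utt2"), ("utt3",), ("utt4", "utt5"), ("utt6",)]

# A run resumed at epoch 3 sees the same order as the original run did.
first_run = batches_for_epoch(all_batches, epoch=3, seed=1)
resumed_run = batches_for_epoch(all_batches, epoch=3, seed=1)
```

Because the order is a pure function of `seed` and `epoch`, no random-number-generator state needs to be saved in a checkpoint for the batch order to be reproduced.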
Parameters:
dataset – The dataset to iterate over
batches – The batches to iterate over
num_iters_per_epoch – The number of iterations per epoch
seed – The random seed
sampler_args – The arguments to pass to the batch sampler
batch_type – The type of batch sampler to use:
- catbel: Category-balanced batch sampler; ensures equal representation of all categories in each batch.
- catpow: Category-power batch sampler; applies power-law sampling based on category frequency to address class imbalance.
- catpow_dataset: Category-power batch sampler with dataset-level upsampling; performs dataset-level upsampling before applying power-law sampling on categories within each dataset.
shuffle – Whether to shuffle the batches
num_workers – The number of workers to use
collate_fn – The collate function to use
pin_memory – Whether to pin the memory
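To make the idea of power-law sampling over categories concrete, here is a small, generic sketch. The exact formula used by the `catpow` samplers is not stated here, so this assumes the common form in which a category with `n` samples is drawn with probability proportional to `n ** power`; the function name `power_law_probs` and the `power` argument are hypothetical and are not part of the ESPnet API.

```python
from collections import Counter


def power_law_probs(categories, power=0.5):
    """Map each category to a sampling probability proportional to count ** power."""
    counts = Counter(categories)
    # power=1 keeps the natural frequencies; power=0 makes all categories equal;
    # values in between flatten the distribution and favor rare categories.
    weights = {cat: n ** power for cat, n in counts.items()}
    total = sum(weights.values())
    return {cat: w / total for cat, w in weights.items()}


# Nine samples of category "a" and one of category "b".
labels = ["a"] * 9 + ["b"]
probs = power_law_probs(labels, power=0.5)
```

With `power=0.5`, the weights are 3 and 1, so the rare category "b" is drawn with probability 0.25 instead of its natural frequency of 0.1.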
build_iter(epoch: int, shuffle: bool | None = None) → DataLoader