espnet2.samplers.category_power_sampler.CategoryPowerSampler
class espnet2.samplers.category_power_sampler.CategoryPowerSampler(batch_bins: int, shape_files: Tuple[str, ...] | List[str], min_batch_size: int = 1, max_batch_size: int | None = None, upsampling_factor: float = 1.0, dataset_scaling_factor: float = 1.2, drop_last: bool = False, category2utt_file: str | None = None, epoch: int = 1, **kwargs)
Bases: AbsSampler
A category-balanced batch sampler with power-law sampling.
Reference: Scaling Speech Technology to 1,000+ Languages, https://arxiv.org/pdf/2305.13516
This sampler constructs mini-batches by balancing samples across categories (e.g., language IDs), using a power-law distribution to control the sampling frequency. Originally developed for language identification, it can be applied to any dataset that provides a mapping from category (e.g., language) to utterances.
Sampling Strategy:
Given:
- l ∈ {1, 2, …, L}, the set of category labels
- n_l: total duration (number of bins) of category l
- N: total duration (number of bins) of all categories in the dataset
- β: upsampling factor
- k_l: the number of utterances in category l
We define:
- Category-level sampling probability: P(l) = (n_l / N)^β
- Utterance-level conditional sampling probability: P(x | l) = 1 / k_l
- Combined sampling probability: P(x) = P(l) * P(x | l) = (n_l / N)^β * (1 / k_l)
Where β ∈ [0, 1] is the upsampling_factor:
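- β → 0 emphasizes low-resource categories (strong upsampling); at β = 0 every category has the same weight regardless of its size
- β → 1 follows the natural data distribution, where P(l) is proportional to n_l, which approximates uniform sampling over all utterances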
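The effect of β on the category-level probabilities can be sketched as follows. This is a minimal illustration rather than the ESPnet implementation: the helper `category_power_probs` and the toy `bins` data are hypothetical, and the explicit renormalization (so that the probabilities sum to 1) is an assumption that the formulas above do not state.

```python
from typing import Dict


def category_power_probs(
    category_bins: Dict[str, float], beta: float
) -> Dict[str, float]:
    """Return P(l) proportional to (n_l / N) ** beta, normalized to sum to 1."""
    total = sum(category_bins.values())  # N: total bins over all categories
    weights = {c: (n / total) ** beta for c, n in category_bins.items()}
    # Assumption: renormalize so the result is a valid probability distribution.
    norm = sum(weights.values())
    return {c: w / norm for c, w in weights.items()}


# Hypothetical toy data: total number of bins per category.
bins = {"eng": 900.0, "swa": 90.0, "yor": 10.0}

for beta in (1.0, 0.5, 0.0):
    probs = category_power_probs(bins, beta)
    print(beta, {c: round(p, 3) for c, p in probs.items()})
```

With β = 1 the probabilities follow the data proportions, and as β decreases the low-resource categories receive a larger share, until all categories are equally likely at β = 0.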
Note:
- Batches are constructed based on batch_bins, similar to LengthBatchSampler.
- Set batch_type=catpow in your configuration to use this sampler.
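The role of batch_bins can be illustrated with a simplified greedy packing sketch. It is not ESPnet's batching logic: the helper `make_batches` and the toy `utt2bins` mapping are hypothetical, and the sketch ignores padding and min_batch_size for brevity.

```python
from typing import Dict, List, Optional


def make_batches(
    utt_ids: List[str],
    utt2bins: Dict[str, int],
    batch_bins: int,
    max_batch_size: Optional[int] = None,
) -> List[List[str]]:
    """Greedily pack utterances so that each batch stays within batch_bins."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_bins = 0
    for utt in utt_ids:
        n = utt2bins[utt]
        # Close the current batch if adding this utterance would exceed the budget.
        too_many_bins = len(current) > 0 and current_bins + n > batch_bins
        too_many_utts = max_batch_size is not None and len(current) >= max_batch_size
        if too_many_bins or too_many_utts:
            batches.append(current)
            current, current_bins = [], 0
        current.append(utt)
        current_bins += n
    if current:
        batches.append(current)
    return batches


# Hypothetical toy data: number of bins per utterance.
utt2bins = {"a": 4, "b": 3, "c": 5, "d": 2, "e": 6}
batches = make_batches(list(utt2bins), utt2bins, batch_bins=8)
print(batches)
```

An utterance that is larger than batch_bins on its own still forms a single-utterance batch in this sketch; capping the utterance count with max_batch_size bounds memory use when many short utterances fit into the bin budget.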
Parameters:
- batch_bins – The approximate maximum number of bins (e.g., audio samples) in a batch.
- shape_files – A list or tuple of shape file paths. Only one shape file is supported, but the list format is retained for compatibility with other samplers.
- min_batch_size – Minimum number of utterances in a batch.
- max_batch_size – Maximum number of utterances in a batch (recommended for memory safety).
- upsampling_factor – β in the sampling formula; controls how strongly to upsample low-resource categories.
- dataset_scaling_factor – A multiplier that determines the total number of utterances sampled. Values > 1 simulate more frequent use of low-resource utterances across batches. Must be ≥ 1.
- drop_last – Whether to drop the final batch.
- category2utt_file – Path to a file mapping each category to its utterance IDs.
- epoch – Epoch number used to seed the random number generator, so that sampling is reproducible for a given epoch but varies across epochs.
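How dataset_scaling_factor and epoch might interact can be sketched with the standard library alone. This is a hedged illustration, not the sampler's actual code: the function `sample_utterances`, the toy data, sampling with replacement, and rounding the number of draws are all assumptions made for the example.

```python
import random
from typing import Dict, List


def sample_utterances(
    category2utts: Dict[str, List[str]],
    category_probs: Dict[str, float],
    dataset_scaling_factor: float,
    epoch: int,
) -> List[str]:
    """Draw utterances with P(x) = P(l) * (1 / k_l), seeded by the epoch."""
    rng = random.Random(epoch)  # same epoch gives the same draws
    utts: List[str] = []
    weights: List[float] = []
    for cat, cat_utts in category2utts.items():
        for utt in cat_utts:
            utts.append(utt)
            weights.append(category_probs[cat] / len(cat_utts))
    # Assumption: the number of draws is the dataset size times the scaling factor.
    num_samples = round(len(utts) * dataset_scaling_factor)
    # Assumption: sampling with replacement, so utterances may repeat.
    return rng.choices(utts, weights=weights, k=num_samples)


# Hypothetical toy data: 8 high-resource and 2 low-resource utterances.
category2utts = {
    "eng": [f"eng_{i}" for i in range(8)],
    "yor": [f"yor_{i}" for i in range(2)],
}
category_probs = {"eng": 0.7, "yor": 0.3}

first = sample_utterances(category2utts, category_probs, 1.2, epoch=1)
again = sample_utterances(category2utts, category_probs, 1.2, epoch=1)
other = sample_utterances(category2utts, category_probs, 1.2, epoch=2)
print(len(first), first == again)
```

Repeating the call with the same epoch reproduces the same list, while a different epoch seeds a different sequence of draws.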