espnet2.speechlm.dataloader.batch.batchfy_bucket
espnet2.speechlm.dataloader.batch.batchfy_bucket(keys: List[T], key_to_length: Dict[T, int], batch_token: int) → List[List[T]]
Create batches using bucket batching strategy.
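Samples are sorted by length and grouped into buckets so that the padded token count of each bucket (max_length * batch_size, where max_length is the length of the longest sample in the bucket and batch_size is the number of samples in it) does not exceed the batch_token limit.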
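The strategy described above can be illustrated with a minimal greedy sketch. This is not the ESPnet source. The name batchfy_bucket_sketch is hypothetical, and three details are assumptions: keys are sorted in ascending length order, a bucket is closed as soon as adding the next key would push max_length * batch_size over batch_token, and a single sample longer than batch_token is placed in a bucket of its own rather than dropped.

```python
from typing import Dict, Hashable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def batchfy_bucket_sketch(
    keys: List[T], key_to_length: Dict[T, int], batch_token: int
) -> List[List[T]]:
    """Greedy bucket batching (hypothetical sketch, not the ESPnet implementation)."""
    # Assumption: sort ascending so each bucket holds samples of similar
    # length, which keeps padding waste low.
    sorted_keys = sorted(keys, key=lambda k: key_to_length[k])

    buckets: List[List[T]] = []
    current: List[T] = []
    current_max = 0
    for key in sorted_keys:
        length = key_to_length[key]
        new_max = max(current_max, length)
        # Padded token count of the bucket if this key were added to it.
        cost = new_max * (len(current) + 1)
        if current and cost > batch_token:
            # Close the current bucket; this key starts a new one.
            buckets.append(current)
            current, new_max = [], length
        # Assumption: an oversized sample still gets its own bucket.
        current.append(key)
        current_max = new_max
    if current:
        buckets.append(current)
    return buckets


lengths = {"a": 2, "b": 3, "c": 5, "d": 8, "e": 10}
print(batchfy_bucket_sketch(list(lengths), lengths, batch_token=20))
# [['a', 'b', 'c'], ['d', 'e']]
```

With batch_token=20, the first bucket costs 5 * 3 = 15 tokens. Adding "d" would cost 8 * 4 = 32, so a new bucket starts, and "d" plus "e" cost 10 * 2 = 20, which is exactly at the limit. Sorting first means the padding inside each bucket stays small, at the cost of buckets that are not randomly mixed by length.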
- Parameters:
  - keys – List of sample keys to batch.
  - key_to_length – Dictionary mapping each key to its length.
  - batch_token – Maximum number of tokens allowed per batch.
 
- Returns: List of buckets, where each bucket is a list of keys.
