espnet2.speechlm.dataloader.batch.synchronize_batches
espnet2.speechlm.dataloader.batch.synchronize_batches(batches: List[List[T]]) → List[List[T]]
Synchronize batches across all GPU ranks in distributed training.
Ensures all GPU ranks have the same number of batches by duplicating the last few batches on ranks with fewer batches. This is useful for distributed training where each rank may have different numbers of batches due to data sharding.
- Parameters: batches – List of batches to synchronize.
- Returns: Synchronized list of batches with duplicates added if necessary.
Notes
- If torch.distributed is not initialized, the batches are returned unchanged
- If CUDA is not available, the batches are returned unchanged
- Duplicates are taken from the end of the batch list
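
Below is a minimal sketch of the behavior described above, assuming the ranks agree on the maximum batch count via an all-reduce and then pad by duplicating trailing batches. The function name `synchronize_batches_sketch` and the exact padding logic are illustrative, not the library's implementation.

```python
from typing import List, TypeVar

import torch
import torch.distributed as dist

T = TypeVar("T")


def synchronize_batches_sketch(batches: List[List[T]]) -> List[List[T]]:
    # No-op when distributed training or CUDA is not in use.
    if not dist.is_available() or not dist.is_initialized():
        return batches
    if not torch.cuda.is_available():
        return batches

    # Agree on the maximum number of batches across all ranks.
    count = torch.tensor([len(batches)], device="cuda")
    dist.all_reduce(count, op=dist.ReduceOp.MAX)
    max_batches = int(count.item())

    # Pad this rank by duplicating batches from the end of the list
    # (assumes the shortfall is no larger than the existing batch count).
    shortfall = max_batches - len(batches)
    if shortfall > 0 and batches:
        batches = batches + batches[-shortfall:]
    return batches
```

Because every rank pads up to the same maximum count, all ranks step through the same number of batches per epoch, which avoids collective-communication hangs caused by uneven data sharding.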
