espnet2.speechlm.model.speechlm.parallel_utils.parallel_dims.init_parallel_dims
espnet2.speechlm.model.speechlm.parallel_utils.parallel_dims.init_parallel_dims(titan_config: Dict[str, Any]) → Tuple[ParallelDims, int, int]
Create ParallelDims for distributed training.
Supports FSDP2 (dp_shard), HSDP (dp_replicate), pipeline parallelism (pp), and expert parallelism (ep).
The constraint dp_replicate * dp_shard * pp == world_size is enforced by TorchTitan; dp_shard=-1 auto-computes the remainder.
EP borrows from the FSDP dimension; it does NOT consume additional world_size. TorchTitan internally computes efsdp = dp_shard / ep for the expert FSDP mesh. For example, with 8 GPUs and ep=8: dense parameters use fsdp=8, while expert parameters use efsdp=1 and ep=8.
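For concreteness, a minimal arithmetic sketch of the world_size constraint and the EP example above; the values are illustrative only and not taken from any default configuration:

    # Constraint enforced by TorchTitan: dp_replicate * dp_shard * pp == world_size
    world_size = 8
    dp_replicate, dp_shard, pp = 1, 8, 1
    assert dp_replicate * dp_shard * pp == world_size

    # EP borrows from the FSDP dimension, so no extra world_size is needed.
    ep = 8
    efsdp = dp_shard // ep   # = 1: expert-FSDP mesh size for expert params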
This function assumes:
- torch.distributed is already initialized (via dist.init_process_group)
- CUDA device is already set (via torch.cuda.set_device)
Parameters: titan_config – TorchTitan configuration dictionary containing:
- dp_replicate: HSDP replicate degree (default: 1)
- dp_shard: FSDP sharding degree (-1 = auto, default: -1)
- pp_degree: Pipeline parallel degree (default: 1)
- ep: Expert parallel degree (default: 1). Must divide dp_shard evenly.
Returns:
- parallel_dims: ParallelDims object with device meshes built
- local_rank: Local rank within the node (current CUDA device)
- global_rank: Global rank across all nodes
Return type: Tuple of (parallel_dims, local_rank, global_rank)
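Example: a minimal usage sketch, assuming the script is launched with torchrun (so LOCAL_RANK is set in the environment) and that the import path below matches the installed package; the config values are illustrative only.

    import os
    import torch
    import torch.distributed as dist

    from espnet2.speechlm.model.speechlm.parallel_utils.parallel_dims import (
        init_parallel_dims,
    )

    # Preconditions stated above: process group initialized, CUDA device set.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # Illustrative config: 8 GPUs, FSDP2 with expert parallelism borrowing
    # the FSDP dimension (no HSDP replication, no pipeline parallelism).
    titan_config = {
        "dp_replicate": 1,   # HSDP replicate degree
        "dp_shard": -1,      # -1 = auto-compute the remaining degree
        "pp_degree": 1,      # pipeline parallel degree
        "ep": 8,             # expert parallel degree; must divide dp_shard
    }

    parallel_dims, local_rank, global_rank = init_parallel_dims(titan_config)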
