espnet.distributed package

This is a helper module for distributed training.

The code uses the official implementation of the distributed data parallel launcher only as a reference. The main difference is that this code focuses on launching a simple function with given arguments.

exception espnet.distributed.pytorch_backend.launch.MainProcessError(*, signal_no)[source]

Bases: multiprocessing.context.ProcessError

An error that occurred in the main process.

property signal_no

Return the signal number that stopped the main process.

exception espnet.distributed.pytorch_backend.launch.WorkerError(*, msg, exitcode, worker_id)[source]

Bases: multiprocessing.context.ProcessError

An error that occurred within a worker.

property exitcode

Return the exit code of the worker process.

property worker_id

Return the ID of the worker whose process caused this error.
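The keyword-only constructor and read-only properties documented above can be illustrated with a stand-in class. This is a minimal sketch, not ESPnet's implementation: `WorkerErrorSketch` is a hypothetical name, and only the constructor signature, the base class, and the two property names are taken from this page.

```python
import multiprocessing


class WorkerErrorSketch(multiprocessing.ProcessError):
    """Hypothetical stand-in that mirrors the documented WorkerError interface."""

    def __init__(self, *, msg, exitcode, worker_id):
        # All arguments are keyword-only, as in the documented signature.
        super().__init__(msg)
        self._exitcode = exitcode
        self._worker_id = worker_id

    @property
    def exitcode(self):
        """Exit code of the worker process."""
        return self._exitcode

    @property
    def worker_id(self):
        """ID of the worker whose process caused the error."""
        return self._worker_id


try:
    raise WorkerErrorSketch(msg="worker 1 failed", exitcode=1, worker_id=1)
except multiprocessing.ProcessError as err:
    # Catching the base class is enough to reach the extra properties.
    summary = f"worker {err.worker_id} exited with code {err.exitcode}"
```

Because both exceptions on this page derive from ProcessError, a caller can catch that base class once and then inspect the properties that apply.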


Find a free port using bind().

There is an interval between finding this port and using it, and another process might take the port in the meantime. Thus it is not guaranteed that the port is still free when it is used.
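The bind() technique can be sketched with the standard library alone. The function name `find_free_port` and the default host are assumptions made for illustration; this is not necessarily how the module itself is written.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0 (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the OS to pick any port that is currently free.
        sock.bind((host, 0))
        # getsockname() returns (address, port) for an AF_INET socket.
        return sock.getsockname()[1]


port = find_free_port()
```

The socket is closed when the `with` block exits, so the returned port is released before the caller uses it. That is the window in which another process can claim the same port.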

espnet.distributed.pytorch_backend.launch.launch(func, args, nprocs, master_addr='localhost', master_port=None)[source]

Launch processes with a given function and given arguments.


The current implementation supports only the single-node case.
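On a single node, each worker's rank within the job and its rank on the machine coincide. The sketch below builds the environment variables that PyTorch's `env://` initialization reads (`MASTER_ADDR`, `MASTER_PORT`, `WORLD_SIZE`, `RANK`) plus the conventional `LOCAL_RANK`. Whether `launch` sets exactly these variables is an assumption, and `worker_env` and the default port value are hypothetical.

```python
def worker_env(rank, nprocs, master_addr="localhost", master_port=29500):
    """Build per-worker environment variables for a single-node job (hypothetical helper)."""
    if not 0 <= rank < nprocs:
        raise ValueError(f"rank must be in [0, {nprocs}), got {rank}")
    return {
        "MASTER_ADDR": master_addr,
        "MASTER_PORT": str(master_port),
        "WORLD_SIZE": str(nprocs),
        "RANK": str(rank),
        # Single node: the local rank equals the global rank.
        "LOCAL_RANK": str(rank),
    }


envs = [worker_env(rank, nprocs=2) for rank in range(2)]
```

A multi-node launcher would additionally need a node index to compute the global rank, which is why the single-node restriction keeps the interface small.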


Set the multiprocessing start method.
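For reference, the standard library exposes the start method through `multiprocessing.set_start_method` and `multiprocessing.get_start_method`. The sketch below uses those calls directly. The choice of "spawn" is an assumption for illustration, and this page does not state which method the module selects.

```python
import multiprocessing

# "spawn" starts each child in a fresh interpreter and is available on all
# major platforms. force=True allows overriding a previously set method.
multiprocessing.set_start_method("spawn", force=True)

current_method = multiprocessing.get_start_method()
```

The start method should be set once, early in the main program, before any worker process is created.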