QXel.distributed.distributed module

High-level wrappers for the distributed runtime exposed by QXel._C.

QXel.distributed.distributed.initialize(allow_print=False)

Initialize the distributed runtime once for the current process.

QXel.distributed.distributed.finalize()

Finalize the distributed runtime.
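The initialize/finalize pair suggests a bracketed lifecycle. The following is a minimal sketch of that pattern, not part of the module: run_distributed is a hypothetical helper, and dist stands for any object exposing initialize(), finalize(), rank() and world_size() with the signatures listed here. Calling finalize() even when the body raises is an assumption about the intended usage.

```python
def run_distributed(body, dist):
    """Run body(rank, world_size) between dist.initialize() and dist.finalize().

    `dist` is any object with the four functions documented above; this
    helper itself is hypothetical and not provided by the module.
    """
    dist.initialize()
    try:
        return body(dist.rank(), dist.world_size())
    finally:
        # Assumption: the runtime should be finalized even if body raises.
        dist.finalize()
```

With the package installed, the module QXel.distributed.distributed itself would presumably be passed as dist.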

QXel.distributed.distributed.rank()

Return the MPI rank of the current process.

QXel.distributed.distributed.world_size()

Return the total number of MPI ranks in the communicator.
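A common use of rank() and world_size() is to split work evenly between processes. The sketch below shows one way to do that in plain Python; shard_range is a hypothetical helper, and its rank and world_size arguments are meant to be the values returned by the two functions above.

```python
def shard_range(n_items, rank, world_size):
    """Return the contiguous slice of range(n_items) owned by `rank`.

    The first (n_items % world_size) ranks receive one extra item, so the
    shards differ in size by at most one and together cover every index.
    """
    base, extra = divmod(n_items, world_size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)
```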

QXel.distributed.distributed.local_rank()

Return the rank of the current process within its node.

QXel.distributed.distributed.local_size()

Return the number of ranks that share the current node.
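To illustrate how local_rank() and local_size() relate to rank(), the sketch below computes a node index and local rank under one assumed layout: ranks are placed on nodes in consecutive blocks of equal size. The real mapping depends on how the job is launched, and local_placement is a hypothetical helper, not a function of this module.

```python
def local_placement(rank, ranks_per_node):
    """Return (node_index, local_rank) for a global rank.

    Assumes block placement: ranks 0..ranks_per_node-1 share node 0,
    the next ranks_per_node ranks share node 1, and so on.
    """
    node_index, local_rank = divmod(rank, ranks_per_node)
    return node_index, local_rank
```

Under this assumption, ranks_per_node plays the role of local_size() and the second element of the result plays the role of local_rank().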

QXel.distributed.distributed.Allreduce(tensor, op_type, data_type, stream=None)

Reduce tensor across all ranks using the operation selected by op_type, so that every rank ends up with the reduced result.
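The all-reduce semantics can be sketched without the runtime by modelling each rank's tensor as a Python list. simulate_allreduce is a hypothetical single-process model: the op callable stands in for op_type, whose accepted values are not listed here, and addition is used only as an example.

```python
import operator
from functools import reduce


def simulate_allreduce(per_rank, op=operator.add):
    """Model an all-reduce over per_rank[r], the buffer held by rank r.

    Buffers are combined element by element with `op`, and every rank
    receives its own copy of the combined buffer.
    """
    reduced = [reduce(op, column) for column in zip(*per_rank)]
    return [list(reduced) for _ in per_rank]
```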

QXel.distributed.distributed.Gather(src, dst, root, data_type, stream=None)

Gather the equally sized src buffers from all ranks into dst on the root rank.
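A single-process model of the gather pattern is sketched below. simulate_gather is hypothetical, and concatenating the contributions in rank order follows the usual MPI convention, which is an assumption about this wrapper rather than something stated here.

```python
def simulate_gather(per_rank_src, root):
    """Model a gather: only `root` receives the concatenated buffers.

    per_rank_src[r] is the src buffer of rank r. The result holds, for each
    rank, its dst contents: the concatenation on root, None elsewhere.
    """
    gathered = [x for src in per_rank_src for x in src]
    return [gathered if rank == root else None
            for rank in range(len(per_rank_src))]
```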

QXel.distributed.distributed.Allgather(src, dst, count, data_type, stream=None)

Gather the equally sized src buffers from all ranks into dst on every rank.
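The sketch below models the all-gather pattern in one process and makes the buffer sizes explicit. It assumes that count is the number of elements contributed by each rank, so that dst holds world_size * count elements in rank order; that reading of count is not confirmed here, and simulate_allgather is a hypothetical helper.

```python
def simulate_allgather(per_rank_src, count):
    """Model an all-gather where each rank contributes `count` elements.

    Returns the dst buffer of every rank; all of them are equal and have
    length len(per_rank_src) * count.
    """
    if any(len(src) != count for src in per_rank_src):
        raise ValueError("every rank must contribute exactly `count` elements")
    gathered = [x for src in per_rank_src for x in src]
    return [list(gathered) for _ in per_rank_src]
```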

QXel.distributed.distributed.Barrier()

Block until every rank reaches the same synchronization point.
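The guarantee a barrier provides can be shown with threads standing in for ranks. The sketch uses Python's threading.Barrier, not this module; run_with_barrier is a hypothetical demonstration in which every "rank" publishes a value, waits at the barrier, and only then reads the values of the others.

```python
import threading


def run_with_barrier(world_size):
    """Run `world_size` threads that synchronize between two phases."""
    barrier = threading.Barrier(world_size)
    written = [None] * world_size
    seen = [None] * world_size

    def worker(rank):
        written[rank] = rank * 10   # phase 1: publish this rank's value
        barrier.wait()              # nobody continues until all have published
        seen[rank] = list(written)  # phase 2: every published value is visible

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Without the wait() call, a fast thread could read written before a slower one has filled in its slot.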

QXel.distributed.distributed.Bcast(tensor, root, data_type, stream=None)

Broadcast tensor from root to every rank.
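As a single-process model of the broadcast pattern, the sketch below replaces every rank's buffer with a copy of the root's. simulate_bcast is a hypothetical helper and does not call the runtime.

```python
def simulate_bcast(per_rank, root):
    """Model a broadcast: every rank ends up with a copy of root's buffer.

    per_rank[r] is the buffer held by rank r before the call.
    """
    return [list(per_rank[root]) for _ in per_rank]
```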

QXel.distributed.distributed.Alltoall(src, dst, count, data_type, stream=None)

Exchange equally sized chunks between every pair of ranks.
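The all-to-all exchange amounts to a transpose of chunks, which the sketch below models in one process. It assumes that count is the number of elements sent to each rank and that chunks are laid out in rank order in both src and dst, as in the usual MPI convention; neither point is confirmed here, and simulate_alltoall is a hypothetical helper.

```python
def simulate_alltoall(per_rank_src, count):
    """Model an all-to-all with `count` elements per (sender, receiver) pair.

    Chunk j of rank i's src becomes chunk i of rank j's dst.
    Returns the dst buffer of every rank.
    """
    world_size = len(per_rank_src)
    if any(len(src) != world_size * count for src in per_rank_src):
        raise ValueError("each src must hold world_size * count elements")
    return [
        [x
         for src in per_rank_src
         for x in src[dst_rank * count:(dst_rank + 1) * count]]
        for dst_rank in range(world_size)
    ]
```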