QXel.distributed.distributed module

High-level wrappers for the distributed runtime exposed by QXel._C.

QXel.distributed.distributed.initialize(allow_print=False): Initialize the distributed runtime once for the current process.

QXel.distributed.distributed.finalize(): Finalize the distributed runtime.

QXel.distributed.distributed.rank(): Return the MPI rank of the current process.

QXel.distributed.distributed.world_size(): Return the total number of MPI ranks in the communicator.

QXel.distributed.distributed.local_rank(): Return the local device rank within the current node.

QXel.distributed.distributed.local_size(): Return the number of ranks that share the current node.
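
The relationship between the global and node-local rank queries can be illustrated with a small pure-Python sketch that does not call QXel. It assumes ranks are assigned to nodes in contiguous, equally sized blocks, which is a common launcher layout but is not confirmed by this module. `rank_layout` and `ranks_per_node` are hypothetical names introduced only for illustration.

```python
def rank_layout(rank, world_size, ranks_per_node):
    """Hypothetical helper: derive node-local values from a global rank.

    Assumption: ranks are packed onto nodes in contiguous blocks of
    `ranks_per_node`. The real runtime may place ranks differently.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    if ranks_per_node <= 0 or world_size % ranks_per_node != 0:
        raise ValueError("world_size must be a positive multiple of ranks_per_node")
    return {
        "node": rank // ranks_per_node,        # index of the node hosting this rank
        "local_rank": rank % ranks_per_node,   # position of the rank within its node
        "local_size": ranks_per_node,          # number of ranks sharing the node
    }


# Rank 5 in an 8-rank job with 4 ranks per node sits on the second node.
layout = rank_layout(rank=5, world_size=8, ranks_per_node=4)
print(layout)
```

Under this assumed layout, the local rank is a natural index for choosing a device on the node, while the global rank identifies the process across the whole job.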

QXel.distributed.distributed.Allreduce(tensor, op_type, data_type, stream=None): Reduce tensor across all ranks using the requested operation.
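
As a rough illustration of what an all-reduce computes, the following pure-Python sketch models each rank's buffer as a list and combines them element-wise. It does not call QXel, and `simulate_allreduce` is a hypothetical name. Whether the real call writes the result back into `tensor`, and which values `op_type` accepts, are not stated here.

```python
from functools import reduce
import operator


def simulate_allreduce(per_rank_values, op=operator.add):
    """Model an all-reduce: combine buffers element-wise across ranks.

    `per_rank_values[r]` stands in for the buffer held by rank r.
    Every rank receives an identical copy of the reduced result.
    """
    lengths = {len(buf) for buf in per_rank_values}
    if len(lengths) != 1:
        raise ValueError("all ranks must contribute buffers of the same length")
    # zip(*...) groups the i-th element of every rank's buffer together.
    reduced = [reduce(op, column) for column in zip(*per_rank_values)]
    return [list(reduced) for _ in per_rank_values]


buffers = [[1, 2], [10, 20], [100, 200]]    # three ranks, two elements each
result = simulate_allreduce(buffers)         # sum reduction
print(result)  # [[111, 222], [111, 222], [111, 222]]
```

Passing a different binary function, for example `max`, models other reduction operations in the same way.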

QXel.distributed.distributed.Gather(src, dst, root, data_type, stream=None): Gather equally sized buffers from all ranks onto root.

QXel.distributed.distributed.Allgather(src, dst, count, data_type, stream=None): Gather equally sized buffers from all ranks onto every rank.
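
The difference between the two gather variants is only where the result lands. The sketch below models this with plain lists and does not call QXel. It assumes the gathered data is concatenated in rank order, as in MPI, which this module does not explicitly confirm. Both function names are hypothetical.

```python
def _concat_in_rank_order(per_rank_src):
    """Concatenate equally sized per-rank buffers in rank order."""
    lengths = {len(buf) for buf in per_rank_src}
    if len(lengths) != 1:
        raise ValueError("all ranks must contribute buffers of the same length")
    return [item for buf in per_rank_src for item in buf]


def simulate_gather(per_rank_src, root):
    """Model a gather: only `root` ends up with the concatenated data."""
    gathered = _concat_in_rank_order(per_rank_src)
    # Non-root ranks are shown as None: their destination is not filled.
    return [list(gathered) if r == root else None
            for r in range(len(per_rank_src))]


def simulate_allgather(per_rank_src):
    """Model an all-gather: every rank ends up with the concatenated data."""
    gathered = _concat_in_rank_order(per_rank_src)
    return [list(gathered) for _ in per_rank_src]


src = [[0, 1], [2, 3], [4, 5]]               # three ranks, two elements each
print(simulate_gather(src, root=1))          # [None, [0, 1, 2, 3, 4, 5], None]
print(simulate_allgather(src)[0])            # [0, 1, 2, 3, 4, 5]
```

In both cases the destination must hold `world_size` times as many elements as one source buffer.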

QXel.distributed.distributed.Barrier(): Block until every rank reaches the same synchronization point.

QXel.distributed.distributed.Bcast(tensor, root, data_type, stream=None): Broadcast tensor from root to every rank.
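
A broadcast can be pictured as copying the root's buffer over every other rank's buffer. The following list-based sketch shows that data movement only. It does not call QXel, and `simulate_bcast` is a hypothetical name.

```python
def simulate_bcast(per_rank_values, root):
    """Model a broadcast: every rank receives a copy of root's buffer."""
    if not 0 <= root < len(per_rank_values):
        raise ValueError("root must be a valid rank index")
    source = per_rank_values[root]
    # Each rank gets its own copy, so later local edits stay local.
    return [list(source) for _ in per_rank_values]


before = [[7, 8, 9], [0, 0, 0], [0, 0, 0]]   # only rank 0 holds real data
after = simulate_bcast(before, root=0)
print(after)  # [[7, 8, 9], [7, 8, 9], [7, 8, 9]]
```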

QXel.distributed.distributed.Alltoall(src, dst, count, data_type, stream=None): Exchange equally sized chunks between every pair of ranks.
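
An all-to-all exchange behaves like a transpose over chunks: chunk j of rank i's source becomes chunk i of rank j's destination. The sketch below models that with plain lists and does not call QXel. It assumes `count` is the number of elements sent to each individual rank, following the MPI convention. This module does not state that, and `simulate_alltoall` is a hypothetical name.

```python
def simulate_alltoall(per_rank_src, count):
    """Model an all-to-all exchange with `count` elements per rank pair.

    Assumption: each source buffer holds world_size * count elements,
    laid out as one chunk per destination rank, in rank order.
    """
    world = len(per_rank_src)
    for buf in per_rank_src:
        if len(buf) != world * count:
            raise ValueError("each source buffer needs world_size * count elements")
    per_rank_dst = []
    for receiver in range(world):
        received = []
        for sender in range(world):
            start = receiver * count           # chunk addressed to `receiver`
            received.extend(per_rank_src[sender][start:start + count])
        per_rank_dst.append(received)
    return per_rank_dst


src = [["a0", "a1", "a2"],
       ["b0", "b1", "b2"],
       ["c0", "c1", "c2"]]
dst = simulate_alltoall(src, count=1)
print(dst)  # [['a0', 'b0', 'c0'], ['a1', 'b1', 'c1'], ['a2', 'b2', 'c2']]
```

Applying the same exchange twice returns the original layout, which is a quick sanity check for buffer sizing and chunk order.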