QXel.distributed.distributed module
High-level wrappers for the distributed runtime exposed by QXel._C.
- QXel.distributed.distributed.initialize(allow_print=False)
Initialize the distributed runtime once for the current process.
- QXel.distributed.distributed.finalize()
Finalize the distributed runtime.
- QXel.distributed.distributed.rank()
Return the MPI rank of the current process.
- QXel.distributed.distributed.world_size()
Return the total number of MPI ranks in the communicator.
- QXel.distributed.distributed.local_rank()
Return the local device rank within the current node.
- QXel.distributed.distributed.local_size()
Return the number of ranks that share the current node.
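How the four queries above relate to each other can be illustrated without the library. The sketch below assumes ranks are placed on nodes in contiguous, equally sized blocks, which is a common launcher default but is not stated by this module. The helper describe_rank is hypothetical and is not part of QXel.

```python
def describe_rank(rank: int, world_size: int, local_size: int) -> dict:
    """Hypothetical helper: derive node-local coordinates from a global rank.

    Assumes block placement: ranks 0..local_size-1 sit on node 0,
    the next local_size ranks on node 1, and so on.
    """
    if local_size <= 0 or world_size % local_size != 0:
        raise ValueError("world_size must be a positive multiple of local_size")
    if not 0 <= rank < world_size:
        raise ValueError("rank must lie in [0, world_size)")
    return {
        "rank": rank,
        "world_size": world_size,
        "node": rank // local_size,        # index of the node hosting this rank
        "local_rank": rank % local_size,   # position of the rank within its node
        "local_size": local_size,
    }


# Two nodes with four ranks each; inspect global rank 5.
info = describe_rank(rank=5, world_size=8, local_size=4)
```

In a real job the values would come from rank(), world_size(), local_rank() and local_size() rather than being computed, since the launcher decides the actual placement.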
- QXel.distributed.distributed.Allreduce(tensor, op_type, data_type, stream=None)
Reduce tensor across all ranks using the requested operation.
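The effect of an all-reduce can be sketched in plain Python by treating each list as the buffer held by one rank. This is a single-process simulation of the semantics only; it does not call QXel. The operation names in OPS are placeholders, since the values accepted by op_type are not listed in this section.

```python
from functools import reduce
import operator

# Placeholder operation names (an assumption, not the library's op_type values).
OPS = {"sum": operator.add, "max": max, "min": min}


def simulate_allreduce(per_rank, op):
    """Combine equally sized buffers element-wise; every rank gets the result."""
    combine = OPS[op]
    # zip(*per_rank) groups the i-th element of every rank's buffer together.
    reduced = [reduce(combine, column) for column in zip(*per_rank)]
    # Each simulated rank receives its own copy of the reduced buffer.
    return [list(reduced) for _ in per_rank]


buffers = [[1, 2], [3, 4], [5, 6]]  # one buffer per simulated rank
result = simulate_allreduce(buffers, "sum")
```

After the call, all three simulated ranks hold the same element-wise sum, which is the property that distinguishes an all-reduce from a reduce onto a single root.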
- QXel.distributed.distributed.Gather(src, dst, root, data_type, stream=None)
Gather equally sized buffers from all ranks onto root.
- QXel.distributed.distributed.Allgather(src, dst, count, data_type, stream=None)
Gather equally sized buffers from all ranks onto every rank.
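The difference between Gather and Allgather is only in who receives the result. The single-process sketch below simulates both, assuming MPI-style behaviour in which buffers are concatenated in rank order. That ordering, and the use of None to stand for an untouched dst on non-root ranks, are assumptions made for illustration.

```python
def _concatenate(per_rank):
    """Join equally sized per-rank buffers in rank order."""
    if len({len(buf) for buf in per_rank}) > 1:
        raise ValueError("all ranks must contribute buffers of the same size")
    return [item for buf in per_rank for item in buf]


def simulate_gather(per_rank, root):
    """Only the root rank receives the concatenated data."""
    gathered = _concatenate(per_rank)
    return [gathered if r == root else None for r in range(len(per_rank))]


def simulate_allgather(per_rank):
    """Every rank receives its own copy of the concatenated data."""
    gathered = _concatenate(per_rank)
    return [list(gathered) for _ in per_rank]


src = [[10, 11], [20, 21], [30, 31]]  # one src buffer per simulated rank
on_root = simulate_gather(src, root=1)
everywhere = simulate_allgather(src)
```

In both cases the destination buffer has to hold world_size times the per-rank element count.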
- QXel.distributed.distributed.Barrier()
Block until every rank reaches the same synchronization point.
- QXel.distributed.distributed.Bcast(tensor, root, data_type, stream=None)
Broadcast tensor from root to every rank.
- QXel.distributed.distributed.Alltoall(src, dst, count, data_type, stream=None)
Exchange equally sized chunks between every pair of ranks.
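An all-to-all exchange amounts to a transpose of chunks across ranks: chunk j of rank i's src ends up as chunk i of rank j's dst. The single-process sketch below simulates that pattern. It assumes count is the number of elements sent to each peer, as in MPI; this section does not define count, so treat that reading as an assumption.

```python
def simulate_alltoall(per_rank, count):
    """Chunk j of rank i's buffer becomes chunk i of rank j's result."""
    n = len(per_rank)
    for buf in per_rank:
        if len(buf) != n * count:
            raise ValueError("each src buffer must hold world_size * count items")
    received = []
    for dst_rank in range(n):
        out = []
        for src_rank in range(n):
            # Take the chunk that src_rank addressed to dst_rank.
            start = dst_rank * count
            out.extend(per_rank[src_rank][start:start + count])
        received.append(out)
    return received


send = [
    ["a0", "a1", "a2"],  # rank 0 sends a0 to rank 0, a1 to rank 1, a2 to rank 2
    ["b0", "b1", "b2"],
    ["c0", "c1", "c2"],
]
recv = simulate_alltoall(send, count=1)
```

Each simulated rank ends up with one chunk from every peer, ordered by the sender's rank.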