QXel.simulation.state_vector_simulation module

class QXel.simulation.state_vector_simulation.StateVectorSimulation(plan)

Bases: BaseSimulation

Execute partitioned state-vector circuits from a SimulationPlan.

__init__(plan)

Allocate the buffers required by a state-vector execution plan.

Parameters:

plan (SimulationPlan) – Execution plan describing qubit layout, compute backend, offloading mode, and MPI topology.

init_single(plan)

Allocate buffers and transfer streams for a single-rank run.

Parameters:

plan (SimulationPlan) – State-vector execution plan. The method inspects compute_type and offload_type to decide whether the working state lives in host memory, device memory, or a staged storage tier.

init_multi(plan)

Allocate buffers and streams for a distributed MPI run.

Parameters:

plan (SimulationPlan) – State-vector execution plan. Distributed runs keep one logical state shard per rank and optionally allocate double buffers when overlap is enabled.

static alltoall_if_required(src, dst, partition_index, stream=None)

Exchange state shards when execution crosses a partition boundary.

static Allreduce(tensor, op_type, data_type, stream=None)

Reduce tensor across MPI ranks when distributed execution is active.

static amplitude_to_result(result, positions, qubit_count, final_perm)

Materialize an amplitude query in canonical bitstring order.

Parameters:
  • result (Amplitude) – Accumulated amplitude buffer.

  • positions – Basis-state indices requested by the caller, expressed in the logical (pre-transpilation) qubit order.

  • qubit_count (int) – Total logical qubit count.

  • final_perm (list[int]) – Final permutation applied during transpilation.

Returns:

Mapping from big-endian bitstrings to complex amplitudes.

Return type:

dict[str, complex]
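The return shape described above can be illustrated with a minimal sketch. It only shows how a basis-state index becomes a fixed-width big-endian bitstring key; it does not apply final_perm and is not the library's implementation. The helper names index_to_bitstring and amplitudes_to_dict are hypothetical.

```python
def index_to_bitstring(index: int, qubit_count: int) -> str:
    """Format a basis-state index as a fixed-width, big-endian bitstring."""
    if qubit_count < 1:
        raise ValueError("qubit_count must be positive")
    if not 0 <= index < (1 << qubit_count):
        raise ValueError(f"index {index} does not fit in {qubit_count} qubits")
    # Zero-pad on the left so the most significant bit comes first.
    return format(index, f"0{qubit_count}b")


def amplitudes_to_dict(positions, amplitudes, qubit_count):
    """Pair each requested index with its amplitude, keyed by bitstring.

    Hypothetical helper: assumes amplitudes[i] belongs to positions[i].
    """
    if len(positions) != len(amplitudes):
        raise ValueError("positions and amplitudes must have equal length")
    return {
        index_to_bitstring(p, qubit_count): complex(a)
        for p, a in zip(positions, amplitudes)
    }
```

With four qubits, index 5 maps to the key "0101", so a query for positions [0, 5] yields a dictionary with the keys "0000" and "0101".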

static statevector_to_result(state_vector, qubit_count, qubit_count_local, num_partitions, final_perm)

Gather and reorder the full state vector for the caller.

Parameters:
  • state_vector (list[StateVector]) – Local state-vector shard or gathered state-vector buffer.

  • qubit_count (int) – Total logical qubit count.

  • qubit_count_local (int) – Number of qubits stored locally on one rank.

  • num_partitions (int) – Number of partition segments produced by the transpiler.

  • final_perm (list[int]) – Final transpiler permutation.

Returns:

Flattened state vector in canonical qubit order.

Return type:

np.ndarray
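The reordering step can be sketched in plain Python, leaving out the cross-rank gather. The convention used here is an assumption, not something this reference states: perm[i] is taken to be the stored bit position that holds logical qubit i, with position 0 as the most significant bit. The function names are hypothetical, and the sketch returns a list rather than an np.ndarray to stay dependency-free.

```python
def stored_to_canonical_index(index, perm):
    """Map a basis-state index from stored bit order to canonical order.

    Assumption: perm[logical] is the stored position of that logical qubit,
    and position 0 is the most significant bit.
    """
    n = len(perm)
    canonical = 0
    for logical, stored in enumerate(perm):
        bit = (index >> (n - 1 - stored)) & 1      # read the stored bit
        canonical |= bit << (n - 1 - logical)      # write it at its logical place
    return canonical


def reorder_state_vector(state, perm):
    """Return a copy of `state` with amplitudes moved to canonical indices."""
    n = len(perm)
    if len(state) != 1 << n:
        raise ValueError("state length must equal 2 ** len(perm)")
    out = [0j] * len(state)
    for stored_index, amplitude in enumerate(state):
        out[stored_to_canonical_index(stored_index, perm)] = amplitude
    return out
```

Under this assumed convention, swapping two qubits (perm = [1, 0]) exchanges the amplitudes at indices 1 and 2 and leaves indices 0 and 3 in place.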

static transpose_to_perm(arr, perm)

Reorder a per-qubit result array back to canonical qubit order.

static bitlist_transpose_to_perm(arr, perm)

Reorder one measured bitstring from permuted to canonical order.
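As a rough illustration of reordering a single bitstring, the sketch below assumes that perm[i] names the position in the permuted bit list that holds canonical bit i. That convention is an assumption made for the example; the library may define the permutation in the opposite direction. The function name bits_to_canonical is hypothetical.

```python
def bits_to_canonical(bits, perm):
    """Reorder a list of measured bits from permuted to canonical order.

    Assumption: perm[i] is the index in `bits` that holds canonical bit i.
    """
    if sorted(perm) != list(range(len(bits))):
        raise ValueError("perm must be a permutation of range(len(bits))")
    return [bits[stored] for stored in perm]
```

For example, with bits [1, 0, 0] and perm [2, 0, 1], canonical bit 0 is read from position 2, giving [0, 1, 0].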

static expectation_to_result(result, final_perm)

Materialize an expectation result after cross-rank reduction.

static variance_to_result(result, final_perm)

Materialize a variance result after cross-rank reduction.

static sample_to_result(result, qubit_count, perm)

Materialize sampled measurement outcomes in canonical bit order.

static probability_to_result(result)

Materialize a probability vector after cross-rank reduction.

static convert_result_type(result, qubit_count, qubit_count_local, num_partitions, final_perm)

Convert one backend result buffer into the public return shape.

Parameters:
  • result (Result) – Backend result buffer to convert.

  • qubit_count (int) – Total logical qubit count.

  • qubit_count_local (int) – Number of qubits stored in one local working set.

  • num_partitions (int) – Number of transpiler partitions executed for the circuit.

  • final_perm (list[int]) – Final permutation emitted by the transpiler.

Returns:

Public result payload matching the concrete Result type.

Return type:

Any

static convert_result_type_single(results, qubit_count, qubit_count_local, num_partitions, final_perm)

Collapse per-slice single-rank results into the public return shape.

Parameters:
  • results (list[Result]) – One result buffer per processed slice or device.

  • qubit_count (int) – Total logical qubit count.

  • qubit_count_local (int) – Number of qubits stored in one local working set.

  • num_partitions (int) – Number of transpiler partitions executed for the circuit.

  • final_perm (list[int]) – Final permutation emitted by the transpiler.

Returns:

Public result payload matching the concrete Result type.

Return type:

Any

static permute_index(index, qubit_count_nonslice, qubit_count_dist)

Map a logical slice index to its storage location after all-to-all.

upload_states_if_required(state_vector, state_vector_local, slice_size, slice_index, partition_index, sparse_map, ready_event=None, stream=None)

Stage the next slice from host or storage into the local buffer.

Parameters:
  • state_vector (Tensor) – Full offloaded state store.

  • state_vector_local (Tensor) – Active local working buffer or staging buffer.

  • slice_size (int) – Number of amplitudes in one transferred slice.

  • slice_index (int) – Global slice index of the subcircuit being executed.

  • partition_index (int) – Partition index of the current subcircuit.

  • sparse_map (dict[int, bool]) – Tracks which offloaded slices are known to contain only zeros and can therefore skip I/O.

  • ready_event (Optional[Event], optional) – Event that marks the local staging buffers as reusable. When provided, each upload stream waits for it before touching the destination slice.

  • stream (Optional[Stream], optional) – Stream used for asynchronous transfers. When omitted, the method blocks on its internal transfer streams.

download_states_if_required(state_vector, state_vector_local, slice_size, slice_index, partition_index, sparse_map, stream=None)

Write the updated local slice back to host or storage.

Parameters:
  • state_vector (Tensor) – Full offloaded state store.

  • state_vector_local (Tensor) – Active local working buffer or staging buffer.

  • slice_size (int) – Number of amplitudes in one transferred slice.

  • slice_index (int) – Global slice index of the subcircuit being executed.

  • partition_index (int) – Partition index of the current subcircuit.

  • sparse_map (dict[int, bool]) – Zero-slice map, updated to mark which offloaded slices are materialized after execution.

  • stream (Optional[Stream], optional) – Stream used for asynchronous transfers. When omitted, the method blocks on its internal transfer streams.
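The role of sparse_map in the upload and download steps can be sketched with plain Python lists standing in for tensors. This is a simplified model under stated assumptions: a value of True is taken to mean "known to be all zeros", slices are contiguous, and transfers are synchronous. The names upload_slice and download_slice are hypothetical and do not mirror the real method signatures.

```python
def upload_slice(store, local, slice_size, slice_index, sparse_map):
    """Fill `local` with one slice; return True if the store was actually read.

    Assumption: sparse_map[slice_index] is True when the slice is all zeros.
    """
    if sparse_map.get(slice_index, False):
        # Known-zero slice: skip the read and zero the buffer directly.
        for i in range(slice_size):
            local[i] = 0j
        return False
    start = slice_index * slice_size
    local[:slice_size] = store[start:start + slice_size]
    return True


def download_slice(store, local, slice_size, slice_index, sparse_map):
    """Write `local` back to the store and mark the slice as materialized."""
    start = slice_index * slice_size
    store[start:start + slice_size] = local[:slice_size]
    sparse_map[slice_index] = False
```

In this model, the first upload of an untouched slice costs no I/O, and the slice only starts being read from the store after a download has written real data to it.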

run(circuits, results, shots)

Dispatch to the execution strategy selected by the plan.

Parameters:
  • circuits (list[QuantumCircuit]) – Transpiled circuits or subcircuits ready for execution.

  • results (list[Result]) – Result buffers allocated by the caller.

  • shots (int) – Number of samples requested for measurement results.

Returns:

Materialized result payloads.

Return type:

list

run_single(circuits, results, shots)

Execute a partitioned circuit on one rank.

This path can still stripe work across multiple local devices when the active backend exposes them.

Parameters:
  • circuits (list[QuantumCircuit]) – Partitioned subcircuits emitted by the transpiler.

  • results (list[Result]) – Per-device result buffers to update.

  • shots (int) – Number of requested samples.

Returns:

Materialized result payloads after local slice reduction.

Return type:

list

run_multi(circuits, results, shots)

Execute a partitioned circuit across multiple MPI ranks.

Parameters:
  • circuits (list[QuantumCircuit]) – Partitioned subcircuits assigned to this rank.

  • results (list[Result]) – Rank-local result buffers to update.

  • shots (int) – Number of requested samples.

Returns:

Materialized result payloads after cross-rank reduction.

Return type:

list

run_multi_overlap_sync(circuits, results, shots)

Execute a distributed run with overlap instrumentation enabled.

This variant keeps explicit synchronization points and timing printouts so that the overlap between transfers and compute is easier to inspect.

Parameters:
  • circuits (list[QuantumCircuit]) – Partitioned subcircuits assigned to this rank.

  • results (list[Result]) – Rank-local result buffers to update.

  • shots (int) – Number of requested samples.

Returns:

Materialized result payloads after cross-rank reduction.

Return type:

list

run_multi_overlap(circuits, results, shots)

Execute a distributed run with asynchronous double buffering.

Parameters:
  • circuits (list[QuantumCircuit]) – Partitioned subcircuits assigned to this rank.

  • results (list[Result]) – Rank-local result buffers to update.

  • shots (int) – Number of requested samples.

Returns:

Materialized result payloads after cross-rank reduction.

Return type:

list
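The idea behind asynchronous double buffering can be shown with a small, single-process sketch. It substitutes a background thread for the device streams and MPI transfers a real distributed run would use, so it illustrates only the scheduling pattern: the next slice is loaded while the current one is being processed. The names load_slice and process_with_double_buffering are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def load_slice(store, index, slice_size):
    """Copy one slice out of the backing store (stand-in for a transfer)."""
    start = index * slice_size
    return list(store[start:start + slice_size])


def process_with_double_buffering(store, slice_size, compute):
    """Apply `compute` to each slice while prefetching the following one."""
    num_slices = len(store) // slice_size
    results = []
    if num_slices == 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start loading the first slice before entering the loop.
        pending = pool.submit(load_slice, store, 0, slice_size)
        for index in range(num_slices):
            current = pending.result()  # wait for the buffer in flight
            if index + 1 < num_slices:
                # Kick off the next transfer before computing on this one.
                pending = pool.submit(load_slice, store, index + 1, slice_size)
            results.append(compute(current))
    return results
```

Calling process_with_double_buffering(list(range(6)), 2, sum) processes three slices in order while each following slice loads in the background.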