Design of a Fault-Tolerant-Metric-Aware, Reversible n-bit Quantum Arithmetic Logic Unit using IBM Qiskit

Agniswar Banerjee

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8959757/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1 posted 10

Abstract

This paper presents a reversible, n-bit Quantum Arithmetic Logic Unit (QALU) implemented in Qiskit that supports a classical ALU-like instruction set: ADD, SUB, CMP, AND/OR/XOR and their negations, unary NOT, shifts and rotates, and operand passthrough. The QALU outputs a result register and status flags N, Z, C, V (negative, zero, carry/no-borrow, signed overflow) and emits explicit comparison outputs EQ, LT_u, LT_s. The design is modular and operation-selectable at compile time, enabling clean verification and resource accounting.
To evaluate fault-tolerant efficiency, the circuits are decomposed into a Clifford + T basis, and T-count is measured alongside CX-count and depth. Exhaustive verification on Aer (matrix-product-state simulation) confirms correctness for all input pairs for n = 4 across 16 operations. Hardware experiments on IBM Quantum (limited to n = 2) demonstrate end-to-end execution with dynamical decoupling and gate twirling, and readout mitigation via mthree. An analytical scaling discussion grounded in ripple-carry adder theory and known Toffoli/T-gate constructions is also provided. These results show that a practical, verifiable QALU with flags and compare can be implemented within Qiskit while exposing meaningful fault-tolerant metrics and hardware-realistic performance characterization.

Key findings from the provided implementation artifacts:

(1) Aer exhaustive verification (n = 4): the QALU was exhaustively tested over all \(2^{2n} = 256\) input pairs for each of 16 operations (4096 total tests), with 0 mismatches against a classical reference model.
(2) Resource metrics (n = 4): after transpilation to a Clifford + T-style basis {cx, h, s, sdg, x, z, t, tdg} at optimization level 3, the highest-cost operations (ADD, SUB, CMP) report T-count = 86, CX-count ≈ 102–106, and depth ≈ 138 on a 24-qubit circuit instance (including flags and MCX ancillae).
(3) IBM hardware test (n = 2): on an automatically selected IBM backend (ibm_torino) using dynamical decoupling and gate twirling plus mthree readout mitigation, an exhaustive run over all 256 circuits yielded raw correctness 91.41% and mitigated correctness 91.02% (slightly worse post-mitigation, consistent with practical tradeoffs where mitigation noise/calibration drift can dominate at small sizes).
Subject areas: Physical sciences/Engineering; Physical sciences/Mathematics and computing; Physical sciences/Physics

Keywords: Quantum Arithmetic Logic Unit (QALU); reversible computing; Clifford+T synthesis; T-count scaling; Toffoli decomposition; CDKM ripple-carry adder; MCX zero-flag logic; Qiskit transpilation; fault-tolerant quantum computing

Introduction

Classical ALUs are multi-operator units producing results and condition flags used by the control flow of von Neumann machines. Reproducing this capability in a quantum circuit is nontrivial because quantum operations must be reversible/unitary and cannot discard information without measurement. The reversibility requirement connects directly to Landauer's principle relating logical irreversibility to thermodynamic cost, and to Bennett's demonstration that irreversible computations can be embedded reversibly (often at the expense of extra workspace and later "uncomputation"). Reversible logic primitives (e.g., Toffoli/Fredkin/Peres-style constructions) underpin many quantum arithmetic designs, since Toffoli is universal for classical reversible computation and maps cleanly to a quantum gate.

Early quantum arithmetic networks for addition and modular arithmetic were provided by Vedral, Barenco, and Ekert, motivating adders as core building blocks for algorithms such as Shor's. Ripple-carry addition remains a central design point because of its linear depth and modest ancilla requirements. The Cuccaro–Draper–Kutin–Moulton (CDKM) "MAJ/UMA" ripple-carry adder uses \(2n-1\) Toffoli gates and \(5n-3\) CNOTs (no incoming carry), with linear depth; variants support incoming carry and comparator-style "high-bit only" computations. Importantly, the CDKM paper explicitly notes that the "high bit only" configuration can be adapted into a comparator by subtracting and extracting the high bit.
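As a rough worked example of what these counts imply, the block below evaluates them at n = 4 (the width used for the simulations reported later) and converts the Toffoli count into T gates under two assumed per-Toffoli costs: the conventional 7-T decomposition and a hypothetical 4-T envelope. Both conversion factors are assumptions of this sketch rather than properties of the adder itself.

```latex
\[
N_{\mathrm{Toffoli}} = 2n - 1 = 7, \qquad N_{\mathrm{CNOT}} = 5n - 3 = 17 \qquad (n = 4)
\]
\[
T_{7\mathrm{T}} = 7\,(2n - 1) = 49, \qquad T_{4\mathrm{T}} = 4\,(2n - 1) = 28
\]
```

These figures cover the bare adder only. Flag logic, operand copying, and compiler choices add to them, so they should not be compared directly with transpiled T-counts.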
QFT-based adders (Draper) avoid explicit carry bits by operating in a Fourier basis, at the cost of controlled phase rotations. Subsequent work surveys and extends QFT arithmetic, including modular and signed variants. However, from a fault-tolerant perspective, arbitrary rotations typically require compilation to Clifford + T sequences with T-count scaling as a function of target approximation error (e.g., Ross–Selinger, Kliuchnikov–Maslov–Mosca). This often makes "rotation-heavy" QFT adders less competitive on strict Clifford + T metrics unless approximations are coarse or dedicated synthesis/approximation is carefully optimized.

Because non-Clifford operations dominate many fault-tolerant cost models, modern arithmetic research focuses directly on T gates. Gidney shows how to reduce an n-bit adder's T-count from \(8n+O\left(1\right)\) to \(4n+O\left(1\right)\) using "temporary logical-AND" constructions. Complementarily, Jones presents Toffoli constructions using four T gates (instead of the "conventional" seven-T decompositions) via teleportation-style techniques and discusses extensions for adding controls. Selinger studies T-depth reduction and Toffoli representations with additional ancillae. These results motivate using T-count and T-depth as evaluation metrics for QALUs intended for fault-tolerant execution.

Reversible/quantum ALUs have been proposed in multiple architectural styles:

- Ancilla-free integrated reversible ALUs: Thomsen, Glück, and Axelsen present a garbage-free reversible ALU requiring only 6n elementary reversible gates for five basic operations and no ancillae, achieved by nested "V-shape" structures.
- Control-line driven reversible ALUs: Zhou et al. (2011) propose a reversible ALU supporting multiple operations including ADD/SUB/XOR/NOT and other variants via several control lines.
- Optimization-focused reversible ALUs: Moallem et al. (2014) and Haghparast & Bolhassani (2016) propose 1-bit/"one-digit" reversible ALUs and evaluate them using reversible-logic cost metrics (quantum cost, garbage, constant inputs), reporting function sets up to 16 operations.
- High-functionality multi-bit ALUs: Slimani & Achour (2017) describe an "optimized 4-bit" reversible ALU claiming up to 28 functions with reduced quantum cost/delay compared with existing designs.
- QFT-based qALUs on IBM hardware: Çakmak et al. (2023) propose a "primitive" QFT-based qALU supporting ADD and NAND and execute it on IBM hardware.
- Explicit fault-tolerant QALUs: Biswal et al. describe a Clifford + T-group QALU and optimization strategies emphasizing T-gate parallelism and circuit optimization, though the accessible preview emphasizes architecture rather than full numerical tables.
- Recent T-metric-centric QALUs: Keshavarz et al. (2024) explicitly frame QALU design as a low-T-cost problem and report percentage improvements in T-count/T-depth/qubits/functions compared to prior work in the accessible preview.

Across this literature, a persistent gap is the combination of: (i) a broad ALU-like function set, (ii) classical-style flags and compare outputs, (iii) explicit Clifford + T (T-count) accounting, and (iv) exhaustive correctness verification plus hardware validation with mitigation/suppression, all in a single, reproducible Qiskit implementation. This paper targets that intersection.

Literature Review

Early work on quantum arithmetic operations established two main paradigms: quantum ripple-carry adders based on reversible gate networks, and quantum Fourier transform (QFT) adders operating via phase rotations. Draper's seminal 2000 work introduced addition in Fourier space, showing that two n-bit numbers can be added by applying a quantum Fourier transform, then performing controlled phase shifts proportional to the addend, and finally applying the inverse transform.
This approach avoids carry propagation in the computational basis, allowing addition in linear depth (though with many small rotation gates). Follow-up research expanded this to subtraction, multiplication, and handling of signed numbers using QFT-based techniques. In parallel, reversible logic researchers developed ripple-carry adders using Toffoli (CCX) and CNOT gates that mimic classical binary addition with carries. Cuccaro et al. (2004) presented an n-bit ripple adder requiring only one ancilla (carry qubit) and achieving lower depth than earlier designs. Such adders use a majority and unmajority (MAJ/UMA) gate sequence to propagate carries, and they can be combined with other reversible circuits to form a full arithmetic unit.

Building on reversible adders, reversible ALUs were proposed to integrate multiple operations. Thomsen, Glück, and Axelsen (2010) designed a reversible ALU that could perform five basic arithmetic/logic functions on two n-bit operands (e.g., addition, bitwise AND, OR, XOR, etc.) within a single circuit. Crucially, their design produced no garbage output and used no extra ancillas, by reusing the existing qubits for multiple purposes in different operations. The authors reported that this optimized ALU can be built with only 6n reversible gates (scaling linearly in bit width), thanks to a novel V-shape circuit structure originally developed for adders. This result showed that, in theory, highly efficient quantum-compatible ALUs are attainable; however, the abstract reversible gates in their design (multi-bit-controlled gates) need decomposition into elementary quantum gates when implemented on real hardware.

Recently, Phillip et al. (2023) began implementing small-scale quantum ALUs on real quantum systems, demonstrating a prototype 2-bit quantum ALU using IBM Qiskit. Their ALU accepted a 2-bit input (with an extra carry bit) and could compute a limited set of functions (NOT, AND, OR, and a 2-bit addition).
Although limited in width and functionality, this experiment showed that basic ALU operations can be realized on quantum circuits and hinted that a quantum ALU could even surpass a classical one in speed for certain tasks or enable new quantum operations.

In a more advanced design, Çakmak et al. (2023) introduced a quantum Fourier transform-based qALU and ran it on IBM's quantum hardware. Their design is a primitive ALU capable of performing two operations: an arithmetic addition and a logical NAND. The operation executed by the ALU is determined by a single control qubit, |S⟩. When |S⟩ = |0⟩, the ALU adds its n-bit inputs, and when |S⟩ = |1⟩, a bitwise NAND operation is performed. In principle, if the control qubit is in a superposition of |0⟩ and |1⟩, the ALU could execute both operations simultaneously in a quantum superposition, demonstrating the concept of quantum-controlled computation. They implemented both a serial version (reusing a 1-bit adder sequentially) and a parallel version (employing an n-qubit QFT for addition) for up to 4-bit numbers and compared gate counts. The parallel QFT approach, while using more qubits, reduced the depth of addition by exploiting the QFT's ability to apply carries through phase rotations. Importantly, their experiments on a real IBM quantum processor indicated that the qALU operations could be carried out with a high success probability despite hardware noise. This suggests the building blocks for quantum ALUs are on the right track.

The available research suggests that merging arithmetic and logical functions within a single quantum circuit is feasible, yet every existing methodology has drawbacks. Reversible logic designs aim for minimal overhead (no ancillas, few gates) but can be complex to map to quantum gates.
QFT-based designs handle arithmetic elegantly yet require non-classical operations (rotations) that are susceptible to noise on present devices. Prior demonstrations have also been limited in the variety of operations. This work extends this frontier by constructing a more comprehensive quantum ALU and examining a hybrid approach (dual basis) to leverage the best of both worlds.

Methods and experimental procedure

Assumptions and scope constraints

Because "n-bit" is unspecified, the following explicit assumptions are made:

- Assumption on n for experiments. Exhaustive Aer results for n=4 are reported (captured in the provided execution artifacts). Analytical estimates for n=3 are also provided using standard ripple-carry adder formulas; these are labeled as estimates rather than measured points.
- Assumption on hardware feasibility. Hardware runs are limited by (i) the total qubit count required by the QALU (inputs, output register, flags/comparator, and MCX ancillas) and (ii) the noise sensitivity of deeper circuits. Therefore n=2 is used for IBM hardware experiments, consistent with common NISQ practice for exhaustive validation.

Architectural overview and reversible semantics

The QALU is implemented as a family of circuits parameterized by operation code (chosen classically at compile time) and bit width n. Each circuit acts on the following quantum registers:

- Operand registers A (n qubits) and B (n qubits), representing unsigned n-bit integers in little-endian bit order.
- Output register R (n qubits), initialized to |0^n⟩ and receiving the selected operation's result.
- Flag register emitting NZCV-style outputs:
  - N: negative flag, taken as the MSB of R.
  - Z: zero flag, indicating R=0.
  - C: carry-out (ADD) or no-borrow (SUB/CMP) behavior following common ISA conventions.
  - V: signed overflow flag for addition/subtraction.
- Comparator outputs EQ, LT_u, LT_s, derived from flags in the CMP operation using standard condition-code identities (e.g., signed LT uses N≠V after subtraction).
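The flag and comparator semantics listed above can be summarised in a small classical reference model. The sketch below is illustrative only and is not the paper's reference implementation: the function name `alu_ref`, the dictionary output, the key names `LTu`/`LTs`, and the choice to model SUB/CMP as A + (~B) + 1 are assumptions made here. Returning R for CMP is also an assumption, made for convenience.

```python
def alu_ref(op, a, b, n):
    """Classical NZCV reference for ADD/SUB/CMP on n-bit operands (illustrative)."""
    mask = (1 << n) - 1
    msb = 1 << (n - 1)
    a &= mask
    b &= mask
    if op == "ADD":
        b_eff = b
        wide = a + b_eff                 # carry-out lands in bit n
    elif op in ("SUB", "CMP"):
        b_eff = (~b) & mask              # two's complement: a - b = a + ~b + 1
        wide = a + b_eff + 1
    else:
        raise ValueError(f"unsupported op: {op}")
    r = wide & mask
    out = {
        "R": r,
        "N": int(bool(r & msb)),         # MSB of the result
        "Z": int(r == 0),                # all result bits zero
        "C": (wide >> n) & 1,            # carry-out (ADD) / no-borrow (SUB, CMP)
        # Signed overflow: both addends share a sign and the result's sign differs.
        "V": int(bool(~(a ^ b_eff) & (a ^ r) & msb)),
    }
    if op == "CMP":
        out["EQ"] = out["Z"]             # equal        <=> Z == 1
        out["LTu"] = 1 - out["C"]        # unsigned LT  <=> borrow, i.e. C == 0
        out["LTs"] = out["N"] ^ out["V"] # signed LT    <=> N != V
    return out


print(alu_ref("ADD", 7, 1, 4))  # signed overflow case: 7 + 1 wraps to -8
print(alu_ref("CMP", 3, 12, 4))
```

For example, `alu_ref("CMP", 3, 12, 4)` reports an unsigned less-than (3 < 12) but not a signed one, since 12 reads as -4 in 4-bit two's complement.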
Below is the circuit diagram of the proposed Quantum Arithmetic Logic Unit.

Figure 1. Proposed QALU Circuit Layout.

The key reversible-computing discipline is:

1. Compute the selected operation into R without destroying A or B.
2. Compute flags and compare outputs into dedicated qubits.
3. Ensure that scratch/temporary ancillae are uncomputed back to |0⟩, so the only non-input outputs are R and the status/compare bits.

This aligns with the general reversible embedding \(U_f : |x\rangle|y\rangle \mapsto |x\rangle|y \oplus f(x)\rangle\) discussed in reversible computing and quantum circuit design.

Arithmetic core

For ADD and SUB (and CMP), the QALU uses a ripple-carry adder design consistent with the CDKM "MAJ/UMA" approach, which has linear depth and well-characterized Toffoli/CNOT counts.

- ADD: R ← A + B (mod 2^n), with C as carry-out.
- SUB: R ← A − B (mod 2^n), realized via a two's-complement addition structure; subtraction is discussed as a minor modification of ripple-carry addition in the CDKM paper.
- CMP: computes flags and comparator outputs from subtraction without requiring a distinct comparator circuit, consistent with the observation that comparator functionality can be obtained from subtraction-derived high bits/flags.

Logic and data-movement core

Bitwise operations {AND, OR, XOR, NAND, NOR, XNOR}, unary NOT (A), shifts, rotates, and passthrough are computed into R using CNOT and Toffoli gates where needed while keeping A and B unchanged. These are "classical reversible" embeddings implemented on computational-basis states using standard quantum gates.

Flag computation

Flag definitions follow widely deployed ISA conventions:

- N: MSB of the result, \(R_{n-1}\).
- Z: asserted if and only if all bits of R are zero; implemented by inverting R, applying an n-controlled X (MCX) into Z, then undoing the inversion. Qiskit provides MCX constructions (e.g., v-chain / recursive styles) whose decomposition choices affect CX/T cost.
- C: carry from addition; for subtraction and compare, the common convention that C=1 signifies no borrow (as in Arm A32/T32 compare/subtract semantics) is used.
- V: signed overflow (two's complement); for comparison, the overflow of subtraction is recorded so that signed LT can be computed from N and V. The Arm condition-code mapping makes explicit that signed comparisons use combinations such as LT ⇔ N≠V and GE ⇔ N==V.

T-count measurement method

To evaluate fault-tolerant efficiency, circuits are compiled into a Clifford+T-style basis and the following quantities are counted:

- T-count: number of t plus tdg gates after transpilation to the chosen basis.
- CX-count: number of cx gates after transpilation.
- Depth: circuit depth after transpilation.

Qiskit's transpile function supports specifying basis gates and optimization level, and Qiskit provides analysis utilities such as CountOps to count operations in a circuit. This measurement is meaningful because (i) T gates are widely considered cost-dominant in fault-tolerant architectures, and (ii) Toffoli decompositions can vary in T-count (e.g., 7-T "conventional" decompositions versus 4-T low-overhead constructions).

Aer simulation and exhaustive verification protocol

Exhaustive verification on Aer used a matrix-product-state simulation method to execute the deterministic QALU circuits on computational-basis inputs. Aer's MPS method is a documented simulation mode intended to enable larger-circuit simulation under an MPS representation, with user-configurable method options. For each supported operation and for each ordered input pair in the n-bit input space, the procedure was:

1. The operand registers were prepared in the computational basis.
2. The corresponding operation-specific QALU circuit was executed.
3. The relevant output qubits (result register and status/comparator outputs) were measured.
4. A classical reference model computed the expected result and flags.
5. A mismatch was recorded if any measured output bit differed from expectation.
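Because the QALU circuits act as classical reversible functions on computational-basis inputs, the shape of this protocol can be illustrated without a quantum simulator. The sketch below is a stand-in under stated assumptions, not the Aer/MPS campaign: it models a textbook MAJ/UMA ripple-carry adder on a list of classical bits, in place (B ← A + B) and with an explicit carry-in ancilla, then checks every input pair. The helper names (`cx`, `ccx`, `maj`, `uma`, `ripple_add`, `verify`) and the bit layout are hypothetical, and the QALU itself writes into a separate R register instead.

```python
from itertools import product


def cx(s, c, t, counts):
    """CNOT on classical bits: flip target t when control c is 1."""
    s[t] ^= s[c]
    counts["cx"] += 1


def ccx(s, c1, c2, t, counts):
    """Toffoli on classical bits: flip t when both controls are 1."""
    s[t] ^= s[c1] & s[c2]
    counts["ccx"] += 1


def maj(s, c, b, a, counts):
    """MAJ block: leaves the carry-out on line a."""
    cx(s, a, b, counts)
    cx(s, a, c, counts)
    ccx(s, c, b, a, counts)


def uma(s, c, b, a, counts):
    """UMA block: restores lines a and c, leaves the sum bit on line b."""
    ccx(s, c, b, a, counts)
    cx(s, a, c, counts)
    cx(s, c, b, counts)


def ripple_add(a_val, b_val, n):
    # Assumed layout: [c0 | a_0..a_{n-1} | b_0..b_{n-1} | z]
    s = [0] * (2 * n + 2)
    A = [1 + i for i in range(n)]
    B = [1 + n + i for i in range(n)]
    Z = 2 * n + 1
    for i in range(n):
        s[A[i]] = (a_val >> i) & 1
        s[B[i]] = (b_val >> i) & 1
    counts = {"cx": 0, "ccx": 0}
    carry = [0] + A[:-1]              # line that holds the carry into bit i
    for i in range(n):
        maj(s, carry[i], B[i], A[i], counts)
    cx(s, A[-1], Z, counts)           # copy the final carry into z
    for i in reversed(range(n)):
        uma(s, carry[i], B[i], A[i], counts)
    a_out = sum(s[A[i]] << i for i in range(n))
    total = sum(s[B[i]] << i for i in range(n))
    return total, s[Z], a_out, s[0], counts


def verify(n):
    """Exhaustively compare the bit-level adder with integer arithmetic."""
    mismatches = 0
    for a, b in product(range(1 << n), repeat=2):
        total, carry_out, a_out, anc, _ = ripple_add(a, b, n)
        ok = (total == (a + b) % (1 << n) and carry_out == (a + b) >> n
              and a_out == a and anc == 0)
        mismatches += not ok
    return mismatches


print(verify(4))  # 0 mismatches over all 256 input pairs
```

This unoptimized variant uses 2n Toffoli and 4n + 1 CNOT gates. The 2n − 1 and 5n − 3 figures quoted for CDKM refer to the paper's optimized circuit without an incoming carry, which this sketch does not reproduce.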
The test volume scaled as "number of operations × \(2^{2n}\)" input pairs (carry-in conventions and any handling of undefined/ISA-specific flag behavior were unspecified and should be pinned down in the reproducible reference-model definition).

IBM hardware execution and noise suppression/mitigation

Hardware experiments are executed using Qiskit Runtime primitives and IBM backends selected by least_busy filtering (operational and with sufficient qubit count). IBM documents the least_busy selection method and execution modes such as Batch for multi-job workloads.

Noise suppression included:

- Dynamical decoupling (DD), enabled at the primitive options level and motivated by dynamical suppression of decoherence ("bang-bang"/DD) techniques.
- Gate twirling / randomized compiling-style randomization, enabled via runtime twirling options; the conceptual basis is that randomized compiling can tailor coherent errors into more stochastic channels.

Readout error mitigation used mthree, which targets scalable measurement mitigation and is designed for sparse outcome regimes common in large quantum circuits.

Important methodological note. These are error mitigation and suppression techniques suitable for NISQ-era hardware runs; they do not constitute full quantum error correction (QEC). Full QEC requires encoding logical qubits and syndrome extraction (e.g., stabilizer/surface codes), which introduces substantial overhead but is the expected route to large-scale fault tolerance.

Results and comparative evaluation

Aer exhaustive correctness and representative output visualizations

Exhaustive Aer correctness (n=4). The Aer campaign reports: n=4, total tests = 4096, mismatches = 0 (i.e., every operation for every input pair matches the classical reference). This constitutes strong functional evidence because it is not sampling-based: it is a complete truth-table verification over the full input domain for the tested n.
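As a worked check of the test-volume formula, the block below evaluates it for the two widths used in this paper, assuming the 16-operation set in both cases. The symbols \(N_{\mathrm{tests}}\) and \(N_{\mathrm{ops}}\) are introduced here for illustration.

```latex
\[
N_{\mathrm{tests}}(n) = N_{\mathrm{ops}} \cdot 2^{2n}
\]
\[
N_{\mathrm{tests}}(4) = 16 \cdot 2^{8} = 4096, \qquad N_{\mathrm{tests}}(2) = 16 \cdot 2^{4} = 256
\]
```

The first value matches the Aer campaign total and the second matches the number of hardware circuits. Each extra operand bit multiplies the volume by four, which is why exhaustive checking is practical only at small n.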
Below are representative Aer heat maps (n=4) illustrating deterministic correctness across all input pairs.

Figure 2. Aer (n=4) ADD output heatmap (R)

Figure 3. Aer (n=4) ADD flags heatmap (NZCV packed)

For CMP, the circuit emits explicit compare bits (in addition to flags), enabling direct use by downstream reversible control logic.

Figure 4. Aer (n=4) CMP compare outputs heatmap (EQ, LT_u, LT_s)

Fault-tolerant resource metrics on Aer-compiled circuits (n=4)

The table below summarizes T-count, CX-count, and depth (after transpiling to a Clifford+T basis) for each supported operation at n=4. T-count is computed as #T + #T†. Qiskit's transpiler optimization level impacts these values; the evaluation uses a high optimization level consistent with Qiskit's "heavier" optimization regimes.

| Operation | T-count | CX-count | Depth | Qubits |
|-----------|---------|----------|-------|--------|
| ADD       | 86      | 102      | 138   | 24     |
| SUB       | 86      | 102      | 138   | 24     |
| CMP       | 86      | 106      | 138   | 24     |
| AND       | 51      | 43       | 52    | 24     |
| OR        | 51      | 51       | 54    | 24     |
| NAND      | 51      | 43       | 50    | 24     |
| NOR       | 51      | 51       | 52    | 24     |
| XOR       | 23      | 27       | 43    | 24     |
| XNOR      | 23      | 27       | 42    | 24     |
| NOTA      | 23      | 23       | 42    | 24     |
| SHL       | 23      | 23       | 42    | 24     |
| SHR       | 23      | 23       | 42    | 24     |
| ROTL      | 23      | 24       | 42    | 24     |
| ROTR      | 23      | 24       | 42    | 24     |
| PASSA     | 23      | 23       | 42    | 24     |
| PASSB     | 23      | 23       | 42    | 24     |

Table 1. Clifford+T transpilation cost metrics for the proposed reversible n=4 quantum ALU operations (Qiskit Aer evaluation), showing T-count (#T + #T†), CX-count, circuit depth, and total qubits under high-level transpiler optimization.

Interpretation. The dominant contributors to T-count are (i) arithmetic Toffoli layers in ADD/SUB/CMP and (ii) the multi-controlled X used to compute the Z flag. This is consistent with the general observation that Toffoli-rich logic dominates non-Clifford cost and motivates low-T Toffoli/addition research (Jones; Gidney).

Hardware execution results with suppression and mitigation (n=2)

Backend and execution mode. The hardware run selected ibm_torino and executed the full circuit suite in manageable batches with a fixed shot budget per circuit. Backend selection and batch execution are consistent with IBM's documented runtime workflow.
Aggregate correctness. For n=2, there are \(2^{2n} = 16\) input pairs per operation and 16 operations, hence 256 circuits in total. The run reported:

- RAW mismatches: 22 / 256 → P(correct) = 234/256 ≈ 0.9141
- MITIGATED mismatches (mthree): 23 / 256 → P(correct) = 233/256 ≈ 0.9102

A slight degradation after mitigation can occur when calibration overhead, time-varying readout, or statistical effects outweigh the benefits, especially at small n where logical success is already high and mitigation matrices may amplify variance. This behavior is documented as a practical caveat in measurement mitigation literature emphasizing calibration accuracy/scalability tradeoffs.

Representative per-input success probability heatmaps for ADD on hardware:

Figure 5. IBM hardware (n=2) ADD raw P(correct) heatmap

Figure 6. IBM hardware (n=2) ADD mitigated P(correct) heatmap (mthree)

Noise suppression configuration rationale. Dynamical decoupling is motivated by the theory of suppressing decoherence under pulse sequences, while gate twirling/randomized compiling is motivated by the goal of tailoring coherent noise into effectively stochastic channels. Both approaches are supported as configurable runtime options.

Analytical scaling and T-count plot

For the scaling discussion, the CDKM ripple-carry Toffoli count is combined with the fact that Toffoli→Clifford+T decompositions have multiple cost regimes: "conventional" 7-T constructions (often used as a baseline) and 4-T low-overhead constructions (Jones) that require additional ingredients (teleportation/measurement). The figure below plots an ADD T-count scaling envelope (an analytical baseline assuming 7 T gates per Toffoli plus a linear-Toffoli MCX-based Z flag) and a hypothetical envelope at 4 T gates per Toffoli, alongside the measured n=4 point from Qiskit transpilation.

Figure 7. T-count scaling plot for ADD

Takeaway.
Even within a ripple-carry architecture, replacing Toffoli/MCX subcircuits with lower-T equivalents (Jones 4T Toffoli; Gidney temporary AND; relative-phase Toffoli optimizations) can significantly reduce T-count at scale, suggesting clear upgrade paths for future QALU iterations.

Comparison against prior QALU / reversible ALU designs

The table below contrasts the proposed QALU against eight prior designs. Many prior ALU works report "quantum cost," garbage outputs, or reversible gate counts rather than Clifford+T T-count/CX-count; where T-count is not explicitly available, the entry is marked NR (not reported), and the comparison should be read as qualitative.

| Design | Qubits / ancillas | T-count focus | Function set | Flags + compare | "Garbage" handling | Exhaustive verification | Notes |
|---|---|---|---|---|---|---|---|
| This work (Qiskit QALU) | data + flags/compare + MCX ancillas (e.g., 24 total at n=4) | Yes (measured) | 16 ops incl. shifts/rotates + CMP | NZCV + EQ/LT_u/LT_s | Work ancillas cleaned; outputs explicit | Yes on Aer (n=4); HW (n=2) | Qiskit-native, FT-metric-aware. |
| Thomsen et al. (2010) [26] | No ancillae, n-bit ALU | No | "Five basics" ops; ALU integrated | No explicit NZCV/CMP | Garbage-free, no ancillae | NR | Extremely low reversible gate count (6n) but smaller function set. |
| Zhou et al. (2011) [27] | n-bit ALU with multiple control lines | No | ≥8 ops (ADD/SUB/XOR/NOT/OR/AND etc.) | No explicit NZCV/CMP | Discusses reversibility of OR/AND via controls | NR | Control-line ALU; focuses on structural design more than FT cost. |
| Moallem et al. (2014) [28] | 1-bit ALUs | No | 6 / 8 / 16 ops variants | No | Minimize garbage/const inputs | NR | Evaluated via "quantum cost" and reversible metrics. |
| Haghparast & Bolhassani (2016) [29] | 1-digit ALUs | No | Multi designs; comparative tables | No explicit NZCV/CMP | Emphasizes garbage / const inputs | NR | Highlights Thomsen 1-bit cost metrics and broader design space. |
| Slimani & Achour (2017) [30] | 4-bit ALU (reversible) | No | Claimed up to 28 functions | No explicit NZCV/CMP | Focus on garbage/quantum cost | NR | High functionality in reversible-logic metrics; FT T-count not reported. |
| Biswal et al. (2020/2021) [32] | 1-bit→general module | Yes (Clifford+T framing) | Several logical ops tested | Not emphasized | FT + optimization rules | NR | Focuses on FT Clifford+T design and T parallelism (preview). |
| Keshavarz et al. (2024) [33] | Multiple ALU designs | Yes (T-count/T-depth) | Multiple ops; improved functionality count | Not emphasized | FT lemma & multiplexer integration | NR | Reports % improvements in preview; full numeric tables not accessible here. |
| Çakmak et al. (2023) [31] | QFT qALU variants | Mixed | ADD + NAND | No | QFT-domain operations | Partial | Demonstrates on IBM hardware but limited op set; rotation-heavy. |

Table 2. Comparison Table

Comparative conclusions. The proposed QALU is not "best" on every axis: ancilla-free designs (Thomsen) can use dramatically fewer qubits and reversible gates for a smaller operation set. Instead, this work is positioned as "better" in the specific sense of (a) broader ALU-like functionality with explicit NZCV and compare outputs, (b) direct Clifford+T-metric visibility (T-count), and (c) exhaustive correctness validation with hardware demonstration using contemporary mitigation/suppression tooling.

Discussion

Novelty and design tradeoffs

The main novelty is system-level integration under fault-tolerant metrics: building a practical, verifiable QALU with flags and compare within Qiskit and quantifying its non-Clifford cost after realistic compilation. Many prior ALU works optimize for reversible "quantum cost" metrics or propose ALU architectures without presenting Clifford + T T-count data and exhaustive formal verification. A key tradeoff is qubits vs T-count vs control flexibility.
Designs that integrate many operations into a single "opcode-controlled" circuit may require large reversible multiplexers/control structures, increasing depth and ancilla use. By contrast, this work's compile-time operation selection produces separate circuits per operation, which is a natural fit for quantum algorithms that know the operation sequence at compile time. This mirrors how many quantum arithmetic blocks (e.g., adders) are used as subroutines rather than as dynamically opcode-decoded units.

Scalability and fault-tolerant implications

From the CDKM ripple-carry perspective, arithmetic depth scales linearly with n, and Toffoli count scales linearly as well; thus T-count scales linearly when Toffoli is decomposed into Clifford + T. However, the literature provides strong evidence that significant constant-factor improvements are possible:

- Adder improvements: Gidney's temporary logical-AND yields 4n + O(1) T-count adders, suggesting that the arithmetic core in this QALU could be swapped out to reduce T-count asymptotically by roughly 2×.
- Toffoli improvements: Jones shows four-T Toffoli constructions and discusses adding controls at 4n T rather than 8n, suggesting potential reductions for both Toffoli and multi-control flag logic.
- MCX optimizations: relative-phase Toffoli identities can reduce T/CX counts for multi-controlled gates, relevant to Z-flag computation.

Thus, the current QALU should be seen as a platform architecture whose arithmetic and MCX submodules can be upgraded as synthesis improves.

Hardware results and mitigation limitations

The IBM hardware campaign demonstrates that even for small n, circuit families that include arithmetic and multi-controlled structures can achieve more than 90% correctness across exhaustive inputs.
However, the lack of improvement under mthree mitigation in this particular run highlights a key limitation: mitigation efficacy depends on stable calibration, appropriate error models, and sufficient statistics; in some regimes mitigation can increase variance or amplify errors. Noise suppression choices (DD and twirling) are theoretically well motivated (Viola–Lloyd; Wallman–Emerson) and exposed in IBM runtime options, but their interaction with circuit scheduling and backend specifics can produce variable outcomes.

Limitations

This paper has three notable limitations:

- Hardware scale is limited to n = 2 due to qubit overhead and noise sensitivity; this is consistent with NISQ constraints and not a fundamental restriction of the design.
- T-count is compiler-dependent: reported T/CX/depth values depend on Qiskit's chosen decompositions and optimization passes; this is why the basis and transpilation method are explicitly defined.
- "Better than prior designs" is metric-conditional: ancilla-free designs may dominate qubit count; QFT designs may dominate in qubit savings for addition; and state-of-the-art low-T adders can dominate arithmetic T-count. The principal claim here is improved end-to-end reproducibility, verification, and FT-metric reporting for a broad ALU feature set.

Conclusion

A reversible n-bit QALU implemented in Qiskit that supports a broad classical ALU function set together with NZCV flags and explicit compare outputs is presented and evaluated. The design is verified exhaustively on Aer at n = 4 with zero mismatches and tested on an IBM backend at n = 2 under dynamical decoupling, gate twirling, and mthree measurement mitigation, achieving roughly 91% end-to-end correctness.
Beyond correctness, the work provides a fault‑tolerant‑metric‑aware evaluation by measuring T‑count, CX‑count, and depth after Clifford + T transpilation, and it situates the design within a research landscape spanning reversible computing foundations, ripple‑carry and QFT adders, and low‑T arithmetic constructions.

Declarations

Author contributions statement
A.B. conceived the experiment(s), conducted the experiment(s), and analysed the results.

Competing Interests
The author declares no competing interests.

Funding Declaration
This research received no external funding.

Data Availability
The datasets generated and/or analysed during the current study are available in the Zenodo repository, [https://doi.org/10.5281/zenodo.18866275](https://github.com/AgniswarBanerjee05/Quantum-ALU).

References

Landauer, R. Irreversibility and Heat Generation in the Computing Process. IBM J. Res. Dev. (1961).
Bennett, C. H. Logical Reversibility of Computation. IBM J. Res. Dev. (1973).
Toffoli, T. Reversible Computing. MIT/LCS/TM-151 (1980).
Fredkin, E. & Toffoli, T. Conservative Logic. Int. J. Theor. Phys. (1982).
Barenco, A. et al. Elementary gates for quantum computation. Phys. Rev. A (1995).
Vedral, V., Barenco, A. & Ekert, A. Quantum networks for elementary arithmetic operations. arXiv / Phys. Rev. A (1995/1996).
Draper, T. G. Addition on a Quantum Computer. arXiv:quant-ph/0008033 (2000).
Cuccaro, S. A., Draper, T. G., Kutin, S. A. & Moulton, D. P. A new quantum ripple-carry addition circuit (2004).
Ruiz-Perez, L. & Garcia-Escartin, J. C. Quantum arithmetic with the quantum Fourier transform. Quantum Inf. Process. (2017).
Selinger, P. Quantum circuits of T-depth one. Phys. Rev. A (2013).
Jones, C. Low-overhead constructions for the fault-tolerant Toffoli gate. Phys. Rev. A (2013).
Gidney, C. Halving the cost of quantum addition. Quantum (2018).
Ross, N. J. & Selinger, P. Optimal ancilla-free Clifford + T approximation of z-rotations. arXiv:1403.2975 (2014).
Kliuchnikov, V., Maslov, D. & Mosca, M. Practical/asymptotically optimal approximation of single-qubit unitaries by Clifford + T. arXiv / PRL (2012/2013).
Wallman, J. J. & Emerson, J. Noise tailoring for scalable quantum computation via randomized compiling. Phys. Rev. (2016). arXiv:1512.01098.
Viola, L. & Lloyd, S. Dynamical suppression of decoherence in two-state quantum systems. arXiv / Phys. Rev. A (1998).
Rudinger, K. et al. Scalable mitigation of measurement errors on quantum computers (mthree-related). arXiv:2108.12518 (2021).
Nation, P. D. et al. Efficient measurement error mitigation for sparse outcomes. arXiv:2201.11046 (2022).
Qiskit Documentation. CDKMRippleCarryAdder API reference.
Qiskit Documentation. AerSimulator / matrix_product_state method references.
Qiskit Documentation. Transpile / basis_gates / optimization levels.
Qiskit Documentation. generate_preset_pass_manager / transpiler stages.
IBM Quantum Documentation. least_busy backend selection and QPU info.
IBM Quantum Documentation. Batch execution mode.
IBM Quantum Documentation. Runtime options for dynamical decoupling and twirling.
Thomsen, M. K., Glück, R. & Axelsen, H. B. Reversible arithmetic logic unit for quantum arithmetic. J. Phys. A (2010).
Zhou, R., Shi, Y. & Zhang, M. Reversible arithmetic logic unit. arXiv:1107.3924 (2011).
Moallem, P., Ehsanpour, M., Bolhasani, A. & Montazeri, M. Optimized reversible arithmetic logic units. J. Electron. (China) (2014).
Haghparast, M. & Bolhassani, A. Optimization Approaches for Designing Quantum Reversible Arithmetic Logic Unit. Int. J. Theor. Phys. (2016).
Slimani, A. & Benslama, A. Optimized 4-bit Quantum Reversible Arithmetic Logic Unit. Int. J. Theor. Phys. (2017).
Çakmak, Z. et al. QFT based quantum arithmetic logic unit on IBM quantum computer. arXiv:2306.09560 (2023).
Biswal, L., Bandyopadhyay, C., Ghosh, S. & Rahaman, H. Fault-Tolerant Implementation of QALU Using Clifford + T-Group. Springer proceedings chapter (preview) (2020/2021).
Keshavarz, S., Reshadinezhad, M. R. & Moghimi, S. T-count and T-depth efficient fault-tolerant quantum arithmetic and logic unit. Quantum Inf. Process. (preview) (2024).

Supplementary Files
QALUFinal.ipynb

Editorial decision: Revision requested 06 Apr, 2026
Reviews received at journal 02 Apr, 2026
Reviews received at journal 02 Apr, 2026
Reviewers agreed at journal 25 Mar, 2026
Reviewers agreed at journal 25 Mar, 2026
Reviewers invited by journal 25 Mar, 2026
Editor assigned by journal 24 Mar, 2026
Editor invited by journal 09 Mar, 2026
Submission checks completed at journal 04 Mar, 2026
First submitted to journal 04 Mar, 2026
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-8959757","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":601811474,"identity":"a69897d7-5b6c-4793-b4e2-3839dc01ece8","order_by":0,"name":"Agniswar Banerjee","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABKklEQVRIie2PQUuEUBDHnwR6eYvXt7C4X2FE0BZiP4sieHpb16BoDcGT1FXoMwRBIHRTHrkXw+uDOtV1WViCWC9b6iE8KHUMer/DMH+YHzODkEDwR0mbAm2LEVJJ25HJ75VxjCS/VvCPq+C74a2CBhX1+jbLqnB+YilP2dvp5EgznoPVKz8/xEhhj3c9CuEbm41Cd/YQHbt6gT3DfMmdgOb1YdjzeN8aXgCTkgOAlJpjHzMn4VQPqFwrBJt9yrRWsipZApRrq/Lx5/I+bpT9sAJlBOkoYQCcmpKPUxtIrSzCYUXnGNhov4JZvDbqw1w95p5zs7giWB74RSsLY7srzsBSqb69jOZTNXbTd/pxoakKy3vfJ3YnSFEnyH3jDWraTbuhMYFAIPjPfAFSRWgVV9rvWgAAAABJRU5ErkJggg==","orcid":"","institution":"Future Institute of Engineering and Management","correspondingAuthor":true,"prefix":"","firstName":"Agniswar","middleName":"","lastName":"Banerjee","suffix":""}],"badges":[],"createdAt":"2026-02-24 16:54:46","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-8959757/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8959757/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":104262287,"identity":"c884084b-f21e-4929-b3e5-7abcd1b2ed49","added_by":"auto","created_at":"2026-03-09 18:40:51","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":82593,"visible":true,"origin":"","legend":"\u003cp\u003eProposed QALU Circuit 
Layout.\u003c/p\u003e","description":"","filename":"Picture1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8959757/v1/83f6028cd714d9c04f0bf6cb.jpg"},{"id":104405239,"identity":"23049aca-1245-4549-b9ec-ded8107f6564","added_by":"auto","created_at":"2026-03-11 12:22:15","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":74375,"visible":true,"origin":"","legend":"\u003cp\u003eAer (n=4) ADD output heatmap (R)\u003c/p\u003e","description":"","filename":"Picture2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8959757/v1/437007dd938ac9a7b33474b7.jpg"},{"id":104262289,"identity":"18c73867-a81e-42d3-89d0-47ecd7866d59","added_by":"auto","created_at":"2026-03-09 18:40:51","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":70789,"visible":true,"origin":"","legend":"\u003cp\u003eAer (n=4) ADD flags heatmap (NZCV packed)\u003c/p\u003e","description":"","filename":"Picture3.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8959757/v1/c3c524fb722c382978187431.jpg"},{"id":104262290,"identity":"e2c15de3-d362-427c-9c7e-a93412630cb6","added_by":"auto","created_at":"2026-03-09 18:40:51","extension":"jpg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":60272,"visible":true,"origin":"","legend":"\u003cp\u003eAer (n=4) CMP compare outputs heatmap (EQ, LT\u003csub\u003eu\u003c/sub\u003e, LT\u003csub\u003es\u003c/sub\u003e)\u003c/p\u003e","description":"","filename":"Picture4.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8959757/v1/79d30dd24ee36502edf68648.jpg"},{"id":104404994,"identity":"80c8255d-f78b-4c88-98fb-1560b4a43870","added_by":"auto","created_at":"2026-03-11 12:21:32","extension":"jpg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":30065,"visible":true,"origin":"","legend":"\u003cp\u003eIBM hardware (n=2) ADD raw P(correct) 
heatmap\u003c/p\u003e","description":"","filename":"Picture5.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8959757/v1/6e77b81089b75f697241a747.jpg"},{"id":104404780,"identity":"876b2d81-e573-46eb-b6b0-12fbd6177308","added_by":"auto","created_at":"2026-03-11 12:21:03","extension":"jpg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":31768,"visible":true,"origin":"","legend":"\u003cp\u003eIBM hardware (n=2) ADD mitigated P(correct) heatmap (mthree)\u003c/p\u003e","description":"","filename":"Picture6.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8959757/v1/7256c5eda9d47be61f66b5b5.jpg"},{"id":104262291,"identity":"d32a3bcf-d292-4254-81dc-a34c7a4e4b64","added_by":"auto","created_at":"2026-03-09 18:40:52","extension":"jpg","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":45864,"visible":true,"origin":"","legend":"\u003cp\u003eT-count scaling plot for ADD\u003c/p\u003e","description":"","filename":"Picture7.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8959757/v1/e52435fde7c000066a9152aa.jpg"},{"id":104409283,"identity":"48ab69c4-0557-4b0c-9ed5-348572aef5b2","added_by":"auto","created_at":"2026-03-11 12:44:37","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1379706,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8959757/v1/23939a33-15f3-428c-960a-cea31f05d399.pdf"},{"id":104262294,"identity":"46395607-b466-4dcc-95fe-caecf4f3012e","added_by":"auto","created_at":"2026-03-09 18:40:52","extension":"ipynb","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":8494147,"visible":true,"origin":"","legend":"","description":"","filename":"QALUFinal.ipynb","url":"https://assets-eu.researchsquare.com/files/rs-8959757/v1/f7f6be106f14e666e725dfc4.ipynb"}],"financialInterests":"No competing interests 
reported.","formattedTitle":"Design of a Fault‑Tolerant‑Metric‑Aware, Reversible n‑bit Quantum Arithmetic Logic Unit using IBM Qiskit","fulltext":[{"header":"Introduction","content":"\u003cp\u003eClassical ALUs are multi‑operator units producing results and condition flags used by the control flow of von Neumann machines. Reproducing this capability in a quantum circuit is nontrivial because quantum operations must be reversible/unitary and cannot discard information without measurement. The reversibility requirement connects directly to Landauer\u0026rsquo;s principle relating logical irreversibility to thermodynamic cost, and to Bennett\u0026rsquo;s demonstration that irreversible computations can be embedded reversibly (often at the expense of extra workspace and later \u0026ldquo;uncomputation\u0026rdquo;).\u003c/p\u003e \u003cp\u003eReversible logic primitives (e.g., Toffoli/Fredkin/Peres‑style constructions) underpin many quantum arithmetic designs, since Toffoli is universal for classical reversible computation and maps cleanly to a quantum gate. Early quantum arithmetic networks for addition and modular arithmetic were provided by Vedral, Barenco, and Ekert, motivating adders as core building blocks for algorithms such as Shor\u0026rsquo;s.\u003c/p\u003e \u003cp\u003eRipple‑carry addition remains a central design point because of its linear depth and modest ancilla requirements. The Cuccaro\u0026ndash;Draper\u0026ndash;Kutin\u0026ndash;Moulton (CDKM) \u0026ldquo;MAJ/UMA\u0026rdquo; ripple‑carry adder uses 2n-1 Toffoli gates and 5n-3 CNOTs (no incoming carry), with linear depth; variants support incoming carry and comparator‑style \u0026ldquo;high‑bit only\u0026rdquo; computations. 
Importantly, the CDKM paper explicitly notes that the \u0026ldquo;high bit only\u0026rdquo; configuration can be adapted into a comparator by subtracting and extracting the high bit.\u003c/p\u003e \u003cp\u003eQFT‑based adders (Draper) avoid explicit carry bits by operating in a Fourier basis, at the cost of controlled phase rotations. Subsequent work surveys and extends QFT arithmetic, including modular and signed variants. However, from a fault‑tolerant perspective, arbitrary rotations typically require compilation to Clifford\u0026thinsp;+\u0026thinsp;T sequences with T‑count scaling as a function of target approximation error (e.g., Ross\u0026ndash;Selinger, Kliuchnikov\u0026ndash;Maslov\u0026ndash;Mosca). This often makes \u0026ldquo;rotation‑heavy\u0026rdquo; QFT adders less competitive on strict Clifford\u0026thinsp;+\u0026thinsp;T metrics unless approximations are coarse or dedicated synthesis/approximation is carefully optimized.\u003c/p\u003e \u003cp\u003eBecause non‑Clifford operations dominate many fault‑tolerant cost models, modern arithmetic research focuses directly on T gates. Gidney shows how to reduce an n‑bit adder\u0026rsquo;s T‑count from \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(8n+O\\left(1\\right)\\)\u003c/span\u003e\u003c/span\u003e to \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(4n+O\\left(1\\right)\\)\u003c/span\u003e\u003c/span\u003e using \u0026ldquo;temporary logical‑AND\u0026rdquo; constructions. Complementarily, Jones presents Toffoli constructions using four T gates (instead of the \u0026ldquo;conventional\u0026rdquo; seven‑T decompositions) via teleportation‑style techniques and discusses extensions for adding controls. Selinger studies T‑depth reduction and Toffoli representations with additional ancillae. 
These results motivate using \u003cb\u003eT‑count\u003c/b\u003e and \u003cb\u003eT‑depth\u003c/b\u003e as evaluation metrics for QALUs intended for fault‑tolerant execution.\u003c/p\u003e \u003cp\u003eReversible/quantum ALUs have been proposed in multiple architectural styles:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eAncilla‑free integrated reversible ALUs: Thomsen, Gl\u0026uuml;ck, and Axelsen present a garbage‑free reversible ALU requiring only 6n elementary reversible gates for five basic operations and no ancillae, achieved by nested \u0026ldquo;V‑shape\u0026rdquo; structures.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eControl‑line driven reversible ALUs: Zhou et al. (2011) propose a reversible ALU supporting multiple operations including ADD/SUB/XOR/NOT and other variants via several control lines.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eOptimization‑focused reversible ALUs: Moallem et al. (2014) and Haghparast \u0026amp; Bolhassani (2016) propose 1‑bit/\u0026ldquo;one‑digit\u0026rdquo; reversible ALUs and evaluate them using reversible‑logic cost metrics (quantum cost, garbage, constant inputs), reporting function sets up to 16 operations.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eHigh‑functionality multi‑bit ALUs: Slimani \u0026amp; Achour (2017) describe an \u0026ldquo;optimized 4‑bit\u0026rdquo; reversible ALU claiming up to 28 functions with reduced quantum cost/delay compared with existing designs.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eQFT‑based qALUs on IBM hardware: \u0026Ccedil;akmak et al. (2023) propose a \u0026ldquo;primitive\u0026rdquo; QFT‑based qALU supporting ADD and NAND and execute on IBM hardware.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eExplicit fault‑tolerant QALUs: Biswal et al. 
describe a Clifford\u0026thinsp;+\u0026thinsp;T‑group QALU and optimization strategies emphasizing T‑gate parallelism and circuit optimization, though the accessible preview emphasizes architecture rather than full numerical tables.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eRecent T‑metric‑centric QALUs: Keshavarz et al. (2024) explicitly frame QALU design as a low‑T‑cost problem and report percentage improvements in T‑count/T‑depth/qubits/functions compared to prior work in the accessible preview.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eAcross this literature, a persistent gap is the combination of: (i) a broad ALU‑like function set, (ii) classical‑style flags and compare outputs, (iii) explicit Clifford\u0026thinsp;+\u0026thinsp;T (T‑count) accounting, and (iv) exhaustive correctness verification plus hardware validation with mitigation/suppression\u0026mdash;all in a single, reproducible Qiskit implementation. This paper targets that intersection.\u003c/p\u003e"},{"header":"Literature Review","content":"\u003cp\u003eEarly work on quantum arithmetic operations established two main paradigms: quantum ripple-carry adders based on reversible gate networks, and quantum Fourier transform (QFT) adders operating via phase rotations. Draper\u0026rsquo;s 2000 seminal work introduced addition in Fourier space, showing that two n-bit numbers can be added by applying a quantum Fourier transform, then performing controlled phase shifts proportional to the addend, and finally inverse transforming. This approach avoids carrying propagation in a computational basis, allowing addition in linear depth (though with many small rotation gates). Follow-up research expanded this to subtraction, multiplication and handling of signed numbers using QFT-based techniques. In parallel, reversible logic researchers developed ripple-carry adders using Toffoli (CCX) and CNOT gates that mimic classical binary addition with carries. Cuccaro et al. 
(2004) presented an n-bit ripple adder requiring only one ancilla (carry qubit) and achieving lower depth than earlier designs. Such adders use a majority and unmajority (MAJ/UMA) gate sequence to propagate carries, and they can be combined with other reversible circuits for a full arithmetic unit.\u003c/p\u003e \u003cp\u003eBuilding reversible adders, reversible ALUs were proposed to integrate multiple operations. Thomsen, Gl\u0026uuml;ck, and Axelsen (2010) designed a reversible ALU that could perform five basic arithmetic/logic functions on two n-bit operands (e.g., addition, bitwise AND, OR XOR, etc.) within a single circuit. Crucially, their design produced no garbage output and used no extra ancillas, by reusing the existing qubits for multiple purposes in different operations. The authors reported that this optimized ALU can be built with only 6n reversible gates (scaling linearly in bit-width), thanks to a novel V-shape circuit structure originally developed for adders. This result highlighted that in theory, highly efficient quantum-compatible ALUs are attainable; however, the abstract reversible gates in their design (multi-bit-controlled gates) need decomposition to elementary quantum gates when implemented on real hardware.\u003c/p\u003e \u003cp\u003eRecently, Phillip et al. (2023) began implementing small-scale quantum ALUs on real quantum systems, demonstrated a prototype 2-bit quantum ALU using IBM Qiskit. Their ALU accepted a 2-bit input (with an extra carry bit) and could compute a limited set of functions (NOT, AND, OR, and a 2-bit addition). Although limited in width and functionality, this experiment showed that basic ALU operations can be realized on quantum circuits and hinted that a quantum ALU could even surpass a classical one in speed for certain tasks or enable new quantum operations. More advanced, Cakmak et al. (2023) introduced a Quantum Fourier Transform-based qALU and ran it on IBM\u0026rsquo;s quantum hardware. 
Their design showed a primitive Arithmetic Logic Unit (ALU) capable of performing two operations: an arithmetic addition and a logical NAND. The operation executed by the ALU is determined by a single control qubit, ∣S⟩. When ∣S⟩=∣0⟩, the Arithmetic logic unit adds n-bits of its inputs, and when ∣S⟩=∣1⟩, a bitwise NAND operation is performed. In principle, if the control qubit is in a superposition of ∣0⟩ and ∣1⟩, the ALU could execute both operations simultaneously in a quantum superposition, demonstrating the concept of quantum-controlled computation. They implemented both a serial version (reusing a 1-bit adder sequentially) and a parallel version (employing an n-qubit QFT for addition) for up to 4-bit numbers and compared gate counts. The parallel QFT approach, while using more qubits, reduced the depth of addition by exploiting the QFT\u0026rsquo;s ability to apply carries through phase rotations. Importantly, their experiments on a real IBM quantum processor indicated the qALU operations could be carried out with a high success probability despite hardware noise. This suggests the building blocks for quantum ALUs are on the right track.\u003c/p\u003e \u003cp\u003eFrom the available research, it seems that the challenge of merging both arithmetic and logical functions within the same quantum circuit is indeed possible, yet every methodology, as it stands, seems to have its drawbacks. Reversible logic designs aim for minimal overhead (no ancillas, few gates) but can be complex to map to quantum gates. QFT-based designs handle arithmetic elegantly yet require non-classical operations (rotations) that are susceptible to noise on present devices. Prior demonstrations have also been limited in the variety of operations. 
This work extends this frontier by constructing a more comprehensive quantum ALU and examining a hybrid approach (dual basis) to leverage the best of both worlds.\u003c/p\u003e "},{"header":"Methods and experimental procedure","content":"\u003cp\u003e\u003cstrong\u003eAssumptions and scope constraints\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eBecause \u0026ldquo;n‑bit\u0026rdquo; is unspecified, the following explicit assumptions are avoided:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003eAssumption on n for experiments. Exhaustive Aer results for n=4 are reported (captured in the provided execution artifacts). Analytical estimates for n=3 are also provided using standard ripple‑carry adder formulas; these are labeled as estimates rather than measured points.\u003c/li\u003e\n \u003cli\u003eAssumption on hardware feasibility. Hardware runs are limited by (i) total qubit count required by the QALU (inputs, output register, flags/comparator, and MCX ancillas) and (ii) noise sensitivity of deeper circuits. Therefore n=2 is used for IBM hardware experiments, consistent with common NISQ practice for exhaustive validation.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e\u003cstrong\u003eArchitectural overview and reversible semantics\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe QALU is implemented as a family of circuits parameterized by operation code (chosen classically/at compile time) and bit width n. 
Each circuit acts on quantum registers:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003eOperand registers A (n qubits) and B (n qubits) representing unsigned n‑bit integers in little‑endian.\u003c/li\u003e\n \u003cli\u003eOutput register R (n qubits) initialized to |0^n\u0026rang;, receiving the selected operation\u0026rsquo;s result.\u003c/li\u003e\n \u003cli\u003eFlag register emitting NZCV‑style outputs:\u003cul\u003e\n \u003cli\u003eN: negative flag, taken as MSB of R.\u003c/li\u003e\n \u003cli\u003eZ: zero flag, indicating R=0.\u003c/li\u003e\n \u003cli\u003eC: carry‑out (ADD) or no‑borrow (SUB/CMP) behavior following common ISA conventions.\u003c/li\u003e\n \u003cli\u003eV: signed overflow flag for addition/subtraction.\u003c/li\u003e\n \u003c/ul\u003e\n \u003c/li\u003e\n \u003cli\u003eComparator outputs EQ, LT\u003csub\u003eu\u003c/sub\u003e, LT\u003csub\u003es\u003c/sub\u003e, derived from flags in the CMP operation using standard condition‑code identities (e.g., signed LT uses N\u0026ne;V after subtraction).\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eBelow is the circuit diagram of the Proposed Quantum Arithmetic Logic Unit.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFigure\u0026nbsp;1.\u003c/strong\u003e Proposed QALU Circuit Layout.\u003c/p\u003e\n\u003cp\u003eThe key reversible‑computing discipline is:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003eCompute the selected operation into R without destroying A or B.\u003c/li\u003e\n \u003cli\u003e2.\u0026nbsp;Compute flags and compare outputs into dedicated qubits.\u003c/li\u003e\n \u003cli\u003eEnsure that scratch/temporary ancillae are uncomputed back to |0\u0026rang;, so the only non‑input outputs are R and the status/compare bits.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThis aligns with the general reversible embedding U\u003csub\u003ef\u003c/sub\u003e:|x\u0026rang;|y\u0026rang;↦|x\u0026rang;|y\u0026oplus;f(x)\u0026rang;\u0026nbsp;philosophy discussed in reversible computing and quantum circuit 
design.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eArithmetic core\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFor ADD and SUB (and CMP), the QALU uses a ripple‑carry adder design consistent with the CDKM \u0026ldquo;MAJ/UMA\u0026rdquo; approach, which has linear depth and well‑characterized Toffoli/CNOT counts.\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003eADD: R\u0026larr;A+B (mod 2^n) with C as carry‑out.\u003c/li\u003e\n \u003cli\u003eSUB: R\u0026larr;A-B (mod 2^n) realized via two\u0026rsquo;s‑complement addition structure; subtraction is discussed as a minor modification of ripple‑carry addition in the CDKM paper.\u003c/li\u003e\n \u003cli\u003eCMP: computes flags and comparator outputs from subtraction without requiring a distinct comparator circuit, consistent with the observation that comparator functionality can be obtained from subtraction‐derived high bits/flags.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e\u003cstrong\u003eLogic and data\u003c/strong\u003e\u003cstrong\u003e‑\u003c/strong\u003e\u003cstrong\u003emovement core\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eBitwise operations {AND, OR XOR, NAND, NOR XNOR}, unary NOT (A), shifts, rotates, and passthrough are computed into R using CNOT and Toffoli where needed while keeping A, B unchanged. These are \u0026ldquo;classical reversible\u0026rdquo; embeddings implemented on computational basis states using standard quantum gates.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFlag computation\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFlag definitions follow widely deployed ISA conventions:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003eN: MSB of result R\u003csub\u003en-1\u003c/sub\u003e.\u003c/li\u003e\n \u003cli\u003eZ: asserted if and only if all bits of Rare zero; implemented by inverting R, applying an n‑controlled X (MCX) into Z, then undoing. 
Qiskit provides MCX constructions (e.g., v‑chain / recursive styles) whose decomposition choices affect CX/T cost.\u003c/li\u003e\n \u003cli\u003eC: carry from addition; for subtraction and compare, the common convention that C=1 signifies no borrow (as in Arm A32/T32 compare/subtract semantics) is used.\u003c/li\u003e\n \u003cli\u003eV: signed overflow (two\u0026rsquo;s complement); for comparison, the overflow of subtraction is recorded so that signed LT can be computed from N and V. The Arm condition‑code mapping makes explicit that signed comparisons use combinations such as LT\u0026hArr;N\u0026ne;V and GE\u0026hArr;N==V.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e\u003cstrong\u003eT\u003c/strong\u003e\u003cstrong\u003e‑\u003c/strong\u003e\u003cstrong\u003ecount measurement method\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo evaluate fault‑tolerant efficiency, circuits are compiled into a Clifford+T‑style basis and count operations:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003eT‑count: number of t plus tdg gates after transpilation to the chosen basis.\u003c/li\u003e\n \u003cli\u003eCX‑count: number of cx gates after transpilation.\u003c/li\u003e\n \u003cli\u003eDepth: circuit depth after transpilation.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eQiskit\u0026rsquo;s transpile function supports specifying basis gates and optimization level, and Qiskit provides analysis utilities such as CountOps to count operations in a circuit.\u003c/p\u003e\n\u003cp\u003eThis measurement is meaningful because (i) T gates are widely considered cost‑dominant in fault‑tolerant architectures, and (ii) Toffoli decompositions can vary in T‑count (e.g., 7‑T \u0026ldquo;conventional\u0026rdquo; decompositions versus 4‑T low‑overhead constructions).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAer simulation and exhaustive verification protocol\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eExhaustive verification on Aer used a matrix-product-state simulation method to 
execute the deterministic QALU circuits on computational-basis inputs. Aer\u0026rsquo;s MPS method is a documented simulation mode intended to enable larger-circuit simulation under an MPS representation, with user-configurable method options.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eFor each supported operation and for each ordered input pair in the (n)-bit input space, the procedure was: operand registers were prepared in the computational basis, the corresponding operation-specific QALU circuit was executed, relevant output qubits (result register and status/comparator outputs) were measured, a classical reference model computed the expected result and flags, and a mismatch was recorded if any measured output bit differed from expectation. The test volume scaled as \u0026ldquo;number of operations \u0026times; (2\u003csup\u003e2n\u003c/sup\u003e)\u0026rdquo; input pairs (carry-in conventions and any handling of undefined/ISA-specific flag behavior were unspecified and should be pinned down in the reproducible reference-model definition).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eIBM hardware execution and noise suppression/mitigation\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eHardware experiments are executed using Qiskit Runtime primitives and IBM backends selected by least_busy filtering (operational and sufficient qubit count). 
IBM documents the least_busy selection method and execution modes such as Batch for multi‑job workloads.\u003c/p\u003e\n\u003cp\u003eNoise suppression included:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003eDynamical decoupling (DD) enabled at the primitive options level, motivated by dynamical suppression of decoherence (\u0026ldquo;bang‑bang\u0026rdquo;/DD) techniques.\u003c/li\u003e\n \u003cli\u003eGate twirling / randomized compiling‑style randomization, enabled via runtime twirling options; the conceptual basis is that randomized compiling can tailor coherent errors into more stochastic channels.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eReadout error mitigation used mthree, which targets scalable measurement mitigation and is designed for sparse outcome regimes common in large quantum circuits.\u003c/p\u003e\n\u003cp\u003eImportant methodological note. These are error mitigation and suppression techniques suitable for NISQ‑era hardware runs; they do not constitute full quantum error correction (QEC). Full QEC requires encoding logical qubits and syndrome extraction (e.g., stabilizer/surface codes), which introduces substantial overhead but is the expected route to large‑scale fault tolerance.\u003c/p\u003e"},{"header":"Results and comparative evaluation","content":"\u003cp\u003e\u003cstrong\u003eAer exhaustive correctness and representative output visualizations\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eExhaustive Aer correctness (n=4). 
The Aer campaign reports:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003en=4, total tests = 4096, mismatches = 0 (i.e., every operation for every input pair matches the classical reference).\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThis constitutes strong functional evidence because it is not sampling‑based: it is a complete truth‑table verification over the full input domain for the tested n.\u003c/p\u003e\n\u003cp\u003eBelow are representative Aer heat maps (n=4) illustrating deterministic correctness across all input pairs.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFigure\u0026nbsp;2.\u003c/strong\u003e Aer (n=4) ADD output heatmap (R)\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFigure\u0026nbsp;3.\u003c/strong\u003e Aer (n=4) ADD flags heatmap (NZCV packed)\u003c/p\u003e\n\u003cp\u003eFor CMP, the circuit emits explicit compare bits (in addition to flags), enabling direct use by downstream reversible control logic.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFigure\u0026nbsp;4.\u003c/strong\u003e Aer (n=4) CMP compare outputs heatmap (EQ, LT\u003csub\u003eu\u003c/sub\u003e, LT\u003csub\u003es\u003c/sub\u003e)\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFault\u003c/strong\u003e\u003cstrong\u003e‑\u003c/strong\u003e\u003cstrong\u003etolerant resource metrics on Aer\u003c/strong\u003e\u003cstrong\u003e‑\u003c/strong\u003e\u003cstrong\u003ecompiled circuits (n=4)\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe table below summarizes T‑count, CX‑count, and depth (after transpiling to a Clifford+T basis) for each supported operation at n=4. T‑count is computed as T+T\u003csup\u003e\u0026dagger;\u003c/sup\u003e. 
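\u003c/p\u003e
\u003cp\u003eThe T‑count definition above can be sketched as a small helper that operates on a gate tally, such as the dictionary returned by Qiskit\u0026rsquo;s QuantumCircuit.count_ops(). The helper names (t_count, summarize) are hypothetical, and the example tally is the textbook 7‑T Toffoli decomposition rather than data from this work.\u003c/p\u003e

```python
from collections import Counter


def t_count(op_counts) -> int:
    """T-count defined as (#T + #T-dagger) in a gate tally."""
    return op_counts.get("t", 0) + op_counts.get("tdg", 0)


def summarize(op_counts, depth: int) -> dict:
    """Collect the three metrics reported per operation."""
    return {
        "t_count": t_count(op_counts),
        "cx_count": op_counts.get("cx", 0),
        "depth": depth,
    }


# Gate tally of the standard Clifford+T Toffoli decomposition
# (6 CX, 2 H, 4 T, 3 T-dagger); used here only as an illustration.
toffoli_tally = Counter({"cx": 6, "h": 2, "t": 4, "tdg": 3})
```

\u003cp\u003eWith a transpiled circuit in hand, the same helper would be called as summarize(circuit.count_ops(), circuit.depth()); the resulting numbers depend on the basis and optimization level chosen for transpilation.\u003c/p\u003e
\u003cp\u003e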
Qiskit\u0026rsquo;s transpiler optimization level impacts these values; the evaluation uses a high optimization level consistent with Qiskit\u0026rsquo;s \u0026ldquo;heavier\u0026rdquo; optimization regimes.\u003c/p\u003e\n\u003cdiv\u003e\n \u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eOperation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eT‑count\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eCX‑count\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eDepth\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eQubits\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eADD\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e86\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e102\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e138\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e24\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eSUB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e86\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e102\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e138\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e24\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eCMP\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e86\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e106\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n 
\u003cp\u003e138\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e24\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eAND\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e51\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e43\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e52\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e24\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eOR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e51\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e51\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e54\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e24\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNAND\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e51\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e43\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e50\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e24\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNOR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e51\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e51\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e52\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e24\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n 
\u003cp\u003eXOR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e23\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e27\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e43\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e24\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eXNOR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e23\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e27\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e42\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e24\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNOTA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e23\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e23\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e42\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e24\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eSHL\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e23\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e23\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e42\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e24\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eSHR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e23\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e23\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e42\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e24\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eROTL\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e23\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e24\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e42\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e24\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eROTR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e23\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e24\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e42\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e24\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003ePASSA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e23\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e23\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e42\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e24\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003ePASSB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e23\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e23\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e42\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e24\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n 
\u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003e\u003cstrong\u003eTable\u0026nbsp;1.\u0026nbsp;\u003c/strong\u003eClifford+T transpilation cost metrics for the proposed reversible n=4 quantum ALU operations (Qiskit Aer evaluation), showing T-count (#T + #T\u0026dagger;), CX-count, circuit depth, and total qubits at transpiler optimization level 3.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eInterpretation\u003c/strong\u003e. The dominant contributors to T‑count are (i) arithmetic Toffoli layers in ADD/SUB/CMP and (ii) the multi‑controlled X used to compute the Z flag. This is consistent with the general observation that Toffoli‑rich logic dominates non‑Clifford cost and motivates low‑T Toffoli/addition research (Jones; Gidney).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eHardware execution results with suppression and mitigation (n=2)\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eBackend and execution mode\u003c/strong\u003e. The hardware run selected \u003cstrong\u003eibm_torino\u003c/strong\u003e and executed the full circuit suite in manageable batches with a fixed shot budget per circuit. Backend selection and batch execution are consistent with IBM\u0026rsquo;s documented runtime workflow.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAggregate correctness.\u003c/strong\u003e For n=2, there are 2\u003csup\u003e2n\u003c/sup\u003e = 16 input pairs per operation and 16 operations, hence 256 circuits total. 
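\u003c/p\u003e
\u003cp\u003eThe test-volume arithmetic can be checked with a few lines of Python. The function names are hypothetical; the default of 16 operations is taken from the text.\u003c/p\u003e

```python
def num_input_pairs(n: int) -> int:
    """Ordered operand pairs for two n-bit registers: 2^(2n)."""
    return 1 << (2 * n)


def num_circuits(n: int, num_ops: int = 16) -> int:
    """One circuit per (operation, input pair) combination."""
    return num_ops * num_input_pairs(n)


def fraction_correct(mismatches: int, total: int) -> float:
    """Share of circuits whose outputs matched the reference."""
    return (total - mismatches) / total
```

\u003cp\u003eFor example, num_circuits(2) gives the 256 hardware circuits and num_circuits(4) gives the 4096 Aer tests; fraction_correct converts a mismatch count into the corresponding correctness figure.\u003c/p\u003e
\u003cp\u003e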
The run reported:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003eRaw mismatches: 22 / 256 \u0026rarr; P(correct) = 234/256 \u0026asymp; 0.9141\u003c/li\u003e\n \u003cli\u003eMitigated mismatches (mthree): 23 / 256 \u0026rarr; P(correct) = 233/256 \u0026asymp; 0.9102\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThe two figures differ by a single circuit out of 256. A slight degradation after mitigation can occur when calibration overhead, time‑varying readout, or statistical effects outweigh the benefits, especially at small n where logical success is already high and mitigation matrices may amplify variance. This behavior is documented as a practical caveat in the measurement mitigation literature, which emphasizes calibration accuracy/scalability tradeoffs.\u003c/p\u003e\n\u003cp\u003eRepresentative per‑input success probability heatmaps for ADD on hardware:\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFigure\u0026nbsp;5.\u003c/strong\u003e IBM hardware (n=2) ADD raw P(correct) heatmap\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFigure\u0026nbsp;6.\u003c/strong\u003e IBM hardware (n=2) ADD mitigated P(correct) heatmap (mthree)\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eNoise suppression configuration rationale\u003c/strong\u003e. Dynamical decoupling is motivated by the theory of suppressing decoherence under pulse sequences, while gate twirling/randomized compiling is motivated by the goal of tailoring coherent noise into effectively stochastic channels. 
Both approaches are supported as configurable runtime options.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAnalytical scaling and T‑count plot\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFor the scaling discussion, the CDKM ripple‑carry Toffoli count is combined with the fact that Toffoli\u0026rarr;Clifford+T decompositions have multiple cost regimes:\u0026nbsp;\u0026ldquo;conventional\u0026rdquo;\u0026nbsp;7‑T constructions (often used as a baseline) and 4‑T low‑overhead constructions (Jones) that require additional ingredients (teleportation/measurement).\u003c/p\u003e\n\u003cp\u003eThe figure below plots an ADD T‑count scaling envelope (analytical baseline assuming 7T per Toffoli plus a linear-Toffoli MCX-based Z-flag) and a hypothetical 4T per Toffoli envelope, alongside the measured n=4 point from Qiskit transpilation.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFigure\u0026nbsp;7.\u003c/strong\u003e T-count scaling plot for ADD\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTakeaway\u003c/strong\u003e. Even within a ripple‑carry architecture, replacing Toffoli/MCX subcircuits with lower‑T equivalents (Jones\u0026rsquo;s 4T Toffoli; Gidney\u0026rsquo;s temporary logical‑AND; relative‑phase Toffoli optimizations) can significantly reduce T‑count at scale, suggesting clear upgrade paths for future QALU iterations.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eComparison against prior QALU / reversible ALU designs\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe table below contrasts the proposed QALU against\u0026nbsp;eight prior designs. 
Many prior ALU works report\u0026nbsp;\u0026ldquo;quantum cost,\u0026rdquo;\u0026nbsp;garbage outputs, or reversible gate counts rather than Clifford+T T‑count/CX‑count; where a metric is not explicitly available, it is marked NR (not reported), and such entries can be compared only qualitatively.\u003c/p\u003e\n\u003cdiv\u003e\n \u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"100%\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eDesign\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eQubits / ancillas\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eT‑count focus\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eFunction set\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eFlags + compare\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e\u0026ldquo;Garbage\u0026rdquo; handling\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eExhaustive verification\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNotes\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eThis work (Qiskit QALU)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e\u0026nbsp;\u0026nbsp;data + flags/compare + MCX ancillas (e.g., 24 total at n=4)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eYes (measured)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e16 ops incl. 
shifts/rotates + CMP\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNZCV + EQ/LT\u003csub\u003eu\u003c/sub\u003e/ LT\u003csub\u003es\u003c/sub\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eWork ancillas cleaned; outputs explicit\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eYes on Aer (n=4); HW (n=2)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eQiskit‑native, FT‑metric‑aware.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eThomsen et al. (2010)\u003c/p\u003e\n \u003cp\u003e[26]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNo ancillae,\u0026nbsp;\u0026nbsp;-bit ALU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e\u0026ldquo;Five basics\u0026rdquo; ops; ALU integrated\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNo explicit NZCV/CMP\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eGarbage‑free, no ancillae\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eExtremely low reversible gate count (6n) but smaller function set.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eZhou et al. 
(2011)\u003c/p\u003e\n \u003cp\u003e[27]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003en‑bit ALU with multiple control lines\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e\u0026ge;8 ops (ADD/SUB/XOR/NOT/OR/AND etc.)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNo explicit NZCV/CMP\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eDiscusses reversibility of OR/AND via controls\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eControl‑line ALU; focuses on structural design more than FT cost.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eMoallem et al. (2014)\u003c/p\u003e\n \u003cp\u003e[28]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e1‑bit ALUs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e6 / 8 / 16 ops variants\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eMinimize garbage/const inputs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eEvaluated via \u0026ldquo;quantum cost\u0026rdquo; and reversible metrics.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eHaghparast \u0026amp; Bolhassani (2016)[29]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e1‑digit ALUs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
valign=\"top\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eMulti designs; comparative tables\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNo explicit NZCV/CMP\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eEmphasizes garbage / const inputs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eHighlights Thomsen 1‑bit cost metrics and broader design space.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eSlimani \u0026amp; Achour (2017)\u003c/p\u003e\n \u003cp\u003e[30]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e4‑bit ALU (reversible)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eClaimed up to 28 functions\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNo explicit NZCV/CMP\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eFocus on garbage/quantum cost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eHigh functionality in reversible‑logic metrics; not FT‑T‑count reported.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eBiswal et al. 
(2020/2021)\u003c/p\u003e\n \u003cp\u003e[32]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e1‑bit\u0026rarr;general module\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eYes (Clifford+T framing)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eSeveral logical ops tested\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNot emphasized\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eFT + optimization rules\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eFocuses on FT Clifford+T design and T parallelism (preview).\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eKeshavarz et al. (2024)\u003c/p\u003e\n \u003cp\u003e[33]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eMultiple ALU designs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eYes (T‑count/T‑depth)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eMultiple ops; improved functionality count\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNot emphasized\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eFT lemma \u0026amp; multiplexer integration\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eReports % improvements in preview; full numeric tables not accessible here.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e\u0026Ccedil;akmak et al. 
(2023)\u003c/p\u003e\n \u003cp\u003e[31]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eQFT qALU variants\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eMixed\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eADD + NAND\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eQFT‑domain operations\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003ePartial\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eDemonstrates on IBM hardware but limited op set; rotation‑heavy.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003e\u003cstrong\u003eTable\u0026nbsp;2.\u0026nbsp;\u003c/strong\u003eComparison Table\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eComparative conclusions\u003c/strong\u003e. The proposed QALU is not \u0026ldquo;best\u0026rdquo; on every axis: ancilla‑free designs (Thomsen) can use dramatically fewer qubits and reversible gates for a smaller operation set. Instead, this work is positioned as \u0026ldquo;better\u0026rdquo; in the specific sense of (a) broader ALU‑like functionality with explicit NZCV and compare outputs, (b) direct Clifford+T‑metric visibility (T‑count), and (c) exhaustive correctness validation with hardware demonstration using contemporary mitigation/suppression tooling.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003eNovelty and design tradeoffs\u003c/h2\u003e \u003cp\u003eThe main novelty is system‑level integration under fault‑tolerant metrics: building a practical, verifiable QALU with flags and compare within Qiskit and quantifying its non‑Clifford cost after realistic compilation. 
Many prior ALU works optimize for reversible \u0026ldquo;quantum cost\u0026rdquo; metrics or propose ALU architectures without presenting Clifford\u0026thinsp;+\u0026thinsp;T T‑count data and exhaustive formal verification.\u003c/p\u003e \u003cp\u003eA key tradeoff is \u003cb\u003equbits vs T‑count vs control flexibility\u003c/b\u003e. Designs that integrate many operations into a single \u0026ldquo;opcode‑controlled\u0026rdquo; circuit may require large reversible multiplexers/control structures, increasing depth and ancilla use. By contrast, this work\u0026rsquo;s compile‑time operation selection produces separate circuits per operation, which is a natural fit for quantum algorithms that know the operation sequence at compile time. This mirrors how many quantum arithmetic blocks (e.g., adders) are used as subroutines rather than as dynamically opcode‑decoded units.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003eScalability and fault‑tolerant implications\u003c/h2\u003e \u003cp\u003eFrom the CDKM ripple‑carry perspective, arithmetic depth scales linearly with n, and Toffoli count scales linearly as well; thus T‑count scales linearly when Toffoli is decomposed into Clifford\u0026thinsp;+\u0026thinsp;T. 
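\u003c/p\u003e
\u003cp\u003eThe linear scaling argument can be illustrated with a rough T‑count envelope. This sketch is an assumption-laden toy model, not the model behind Figure\u0026nbsp;7: it assumes 2n Toffolis for the ripple‑carry core (one per MAJ and one per UMA block), a placeholder 2n\u0026minus;3 Toffolis for a linear ancilla-assisted MCX computing the Z flag, and a fixed T cost per Toffoli (7 for the conventional decomposition, 4 for the low‑overhead one). All function names are hypothetical.\u003c/p\u003e

```python
def adder_toffolis(n: int) -> int:
    """Assumed Toffoli count of an n-bit MAJ/UMA ripple-carry core."""
    return 2 * n


def zflag_toffolis(n: int) -> int:
    """Placeholder Toffoli count for an n-control MCX built as a linear ladder."""
    return max(2 * n - 3, 0)


def t_count_envelope(n: int, t_per_toffoli: int) -> int:
    """Rough ADD T-count: (adder + Z-flag Toffolis) times T cost per Toffoli."""
    return t_per_toffoli * (adder_toffolis(n) + zflag_toffolis(n))


def envelope_table(sizes, t_costs=(7, 4)) -> dict:
    """Envelope values for several register widths and Toffoli cost regimes."""
    return {n: {t: t_count_envelope(n, t) for t in t_costs} for n in sizes}
```

\u003cp\u003eUnder these assumptions each additional bit adds a constant number of T gates, and moving from 7T to 4T per Toffoli rescales the whole envelope by 4/7. The absolute values are not expected to match transpiler output, which depends on the decompositions Qiskit actually selects.\u003c/p\u003e
\u003cp\u003e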
However, the literature provides strong evidence that significant constant‑factor improvements are possible:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eAdder improvements: Gidney\u0026rsquo;s temporary logical‑AND yields 4n\u0026thinsp;+\u0026thinsp;O(1) T‑count adders, suggesting that the arithmetic core in this QALU could be swapped out to reduce T‑count asymptotically by ~\u0026thinsp;2\u0026times;.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eToffoli improvements: Jones shows four‑T Toffoli constructions and discusses adding controls at 4n T rather than 8n, suggesting potential reductions for both Toffoli and multi‑control flag logic.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eMCX optimizations: relative‑phase Toffoli identities can reduce T/CX counts for multicontrolled gates, relevant to Z‑flag computation.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThus, the current QALU should be seen as a platform architecture whose arithmetic and MCX submodules can be upgraded as synthesis improves.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003eHardware results and mitigation limitations\u003c/h2\u003e \u003cp\u003eThe IBM hardware campaign demonstrates that even for small n, circuit families that include arithmetic and multi‑controlled structures can achieve\u0026thinsp;\u0026gt;\u0026thinsp;90% correctness across exhaustive inputs. 
However, the lack of improvement under mthree mitigation in this particular run highlights a key limitation: mitigation efficacy depends on stable calibration, appropriate error models, and sufficient statistics; in some regimes mitigation can increase variance or amplify errors.\u003c/p\u003e \u003cp\u003eNoise suppression choices (DD and twirling) are theoretically well‑motivated (Viola\u0026ndash;Lloyd; Wallman\u0026ndash;Emerson) and exposed in IBM runtime options, but their interaction with circuit scheduling and backend specifics can produce variable outcomes.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003eLimitations\u003c/h2\u003e \u003cp\u003eThis paper has three notable limitations:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eHardware scale is limited to n\u0026thinsp;=\u0026thinsp;2 due to qubit overhead and noise sensitivity; this is consistent with NISQ constraints and not a fundamental restriction of the design.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eT‑count is compiler‑dependent: reported T/CX/depth depend on Qiskit\u0026rsquo;s chosen decompositions and optimization passes; this is why the basis and transpilation method are explicitly defined.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e\u0026ldquo;Better than prior designs\u0026rdquo; is metric‑conditional: ancilla‑free designs may dominate qubit count; QFT designs may dominate in qubit savings for addition; and state‑of‑the‑art low‑T adders can dominate arithmetic T‑count. 
The principal claim here is improved end‑to‑end reproducibility, verification, and FT‑metric reporting for a broad ALU feature set.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"Conclusion","content":"\u003cp\u003eA reversible n‑bit QALU implemented in Qiskit that supports a broad classical ALU function set together with NZCV flags and explicit compare outputs is presented and evaluated. The design is verified exhaustively on Aer at n\u0026thinsp;=\u0026thinsp;4 with zero mismatches and tested on an IBM backend at n\u0026thinsp;=\u0026thinsp;2 under dynamical decoupling, gate twirling, and mthree measurement mitigation, achieving\u0026thinsp;~\u0026thinsp;91% end‑to‑end correctness. Beyond correctness, the work provides a fault‑tolerant‑metric‑aware evaluation by measuring T‑count, CX‑count, and depth after Clifford\u0026thinsp;+\u0026thinsp;T transpilation, and it situates the design within a research landscape spanning reversible computing foundations, ripple‑carry and QFT adders, and low‑T arithmetic constructions.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003ch2\u003eAuthor contributions statement\u003c/h2\u003e \u003cp\u003eA.B. conceived the experiment(s), conducted the experiment(s), and analysed the results.\u003c/p\u003e \u003c/p\u003e\u003cp\u003e \u003ch2\u003eCompeting Interests\u003c/h2\u003e \u003cp\u003eThe author declares no competing interests.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e \u003cp\u003eDeclaration\u003c/p\u003e \u003cp\u003eThis research received no external funding.\u003c/p\u003e
\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe datasets generated and/or analysed during the current study are available in the Zenodo repository, [https://doi.org/10.5281/zenodo.18866275](https://github.com/AgniswarBanerjee05/Quantum-ALU) .\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eLandauer, R. \u003cem\u003eIrreversibility and Heat Generation in the Computing Process\u003c/em\u003e (IBM J. Res. Dev. [58], 1961).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBennett, C. H. \u003cem\u003eLogical Reversibility of Computation\u003c/em\u003e (IBM J. Res. Dev. [59], 1973).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eToffoli, T. \u003cem\u003eReversible Computing\u003c/em\u003e (MIT/LCS/TM\u0026ndash;151, 1980). [60].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFredkin, E. \u0026amp; Toffoli, T. Conservative Logic. \u003cem\u003eInt. J. Theor. Phys.\u003c/em\u003e (1982). [61].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBarenco, A. et al. Elementary gates for quantum computation. Phys. Rev. A. [48] (1995).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVedral, V., Barenco, A. \u0026amp; Ekert, A. (1995/1996). Quantum networks for elementary arithmetic operations. arXiv / Phys. Rev. A. [62].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDraper, T. G. Addition on a Quantum Computer. arXiv:quant\u0026ndash;ph/0008033. [63] (2000).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCuccaro, S. A., Draper, T. G., Kutin, S. A. \u0026amp; Moulton, D. P. A new quantum ripple\u0026ndash;carry addition circuit. [9] (2004).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRuiz\u0026ndash;Perez, L. \u0026amp; Garcia\u0026ndash;Escartin, J. C. 
\u003cem\u003eQuantum arithmetic with the quantum Fourier transform\u003c/em\u003e (Quantum Inf. Process. [11], 2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSelinger, P. Quantum circuits of T\u0026ndash;depth one. Phys. Rev. A. [15] (2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJones, C. Low\u0026ndash;overhead constructions for the fault\u0026ndash;tolerant Toffoli gate. Phys. Rev. A. [14] (2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGidney, C. \u003cem\u003eHalving the cost of quantum addition\u003c/em\u003e (Quantum. [13], 2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRoss, N. J. \u0026amp; Selinger, P. Optimal ancilla\u0026ndash;free Clifford\u0026thinsp;+\u0026thinsp;T approximation of z\u0026ndash;rotations. (2014). arXiv:1403.2975. [64].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKliuchnikov, V., Maslov, D. \u0026amp; Mosca, M. Practical/asymptotically optimal approximation of single\u0026ndash;qubit unitaries by Clifford\u0026thinsp;+\u0026thinsp;T. arXiv / PRL. [65] (2012/2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWallman, J. J. \u0026amp; Emerson, J. Noise tailoring for scalable quantum computation via randomized compiling. \u003cem\u003ePhys. Rev. A\u003c/em\u003e (2016). arXiv:1512.01098. [66].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eViola, L. \u0026amp; Lloyd, S. \u003cem\u003eDynamical suppression of decoherence in two\u0026ndash;state quantum systems\u003c/em\u003e (arXiv / Phys. Rev. A. [67], 1998).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRudinger, K. et al. Scalable mitigation of measurement errors on quantum computers (mthree\u0026ndash;related). (2021). arXiv:2108.12518. [68].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNation, P. D. et al. Efficient measurement error mitigation for sparse outcomes. (2022). arXiv:2201.11046. 
[69].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQiskit Documentation. CDKMRippleCarryAdder API reference. [70].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQiskit Documentation. AerSimulator / matrix_product_state method references. [35].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQiskit Documentation. Transpile / basis_gates / optimization levels. [42].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQiskit Documentation. generate_preset_pass_manager / transpiler stages. [71].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIBM Quantum Documentation. least_busy backend selection and QPU info. [72].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIBM Quantum Documentation. Batch execution mode. [73].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIBM Quantum Documentation. Runtime options for dynamical decoupling and twirling. [74].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThomsen, M. K., Gl\u0026uuml;ck, R. \u0026amp; Axelsen, H. B. Reversible arithmetic logic unit for quantum arithmetic. \u003cem\u003eJ. Phys. A\u003c/em\u003e (2010). [16].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhou, R., Shi, Y. \u0026amp; Zhang, M. Reversible arithmetic logic unit. (2011). arXiv:1107.3924. [17].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMoallem, P., Ehsanpour, M., Bolhasani, A. \u0026amp; Montazeri, M. Optimized reversible arithmetic logic units. \u003cem\u003eJ. Electron. (China)\u003c/em\u003e (2014). [48].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHaghparast, M. \u0026amp; Bolhassani, A. Optimization Approaches for Designing Quantum Reversible Arithmetic Logic Unit. \u003cem\u003eInt. J. Theor. Phys.\u003c/em\u003e (2016). [49].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSlimani, A. \u0026amp; Benslama, A. 
Optimized 4\u0026ndash;bit Quantum Reversible Arithmetic Logic Unit. \u003cem\u003eInt. J. Theor. Phys.\u003c/em\u003e (2017). [19].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e\u0026Ccedil;akmak, Z. et al. QFT based quantum arithmetic logic unit on IBM quantum computer. (2023). arXiv:2306.09560. [20].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBiswal, L., Bandyopadhyay, C., Ghosh, S. \u0026amp; Rahaman, H. Fault\u0026ndash;Tolerant Implementation of QALU Using Clifford\u0026thinsp;+\u0026thinsp;T\u0026ndash;Group. Springer proceedings chapter (preview). [21] (2020)/2021.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKeshavarz, S., Reshadinezhad, M. R. \u0026amp; Moghimi, S. T\u0026ndash;count and T\u0026ndash;depth efficient fault\u0026ndash;tolerant quantum arithmetic and logic unit. Quantum Inf. Process. (preview). (2024).\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Quantum Arithmetic Logic Unit (QALU), reversible computing, Clifford+T synthesis, T-count scaling, Toffoli decomposition, CDKM ripple-carry adder, MCX zero-flag logic, Qiskit transpilation, fault-tolerant quantum computing","lastPublishedDoi":"10.21203/rs.3.rs-8959757/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8959757/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThis paper presents a reversible, n‑bit Quantum Arithmetic Logic Unit (QALU) implemented in Qiskit that supports a classical ALU‑like instruction set: ADD, SUB, CMP, AND/OR/XOR and their negations, unary NOT, shifts and rotates, and operand passthrough. The QALU outputs a result register and status flags N, Z, C, V (negative, zero, carry/no‑borrow, signed overflow) and emits explicit comparison outputs EQ, LT\u003csub\u003eu\u003c/sub\u003e, LT\u003csub\u003es\u003c/sub\u003e. The design is modular and operation‑selectable at compile time, enabling clean verification and resource accounting. To evaluate fault‑tolerant efficiency, the circuits are decomposed into a Clifford\u0026thinsp;+\u0026thinsp;T basis, and T‑count is measured alongside CX‑count and depth. Exhaustive verification on Aer (matrix‑product‑state simulation) confirms correctness for all input pairs for n\u0026thinsp;=\u0026thinsp;4 across 16 operations. 
Hardware experiments on IBM Quantum (limited to n\u0026thinsp;=\u0026thinsp;2) demonstrate end‑to‑end execution with dynamical decoupling and gate twirling, and readout mitigation via mthree. An analytical scaling discussion grounded in ripple‑carry adder theory and known Toffoli/T‑gate constructions is further provided. These results show that a practical, verifiable QALU with flags and compare can be implemented within Qiskit while exposing meaningful fault‑tolerant metrics and hardware‑realistic performance characterization. Key findings from the provided implementation artifacts: (1) Aer exhaustive verification (n\u0026thinsp;=\u0026thinsp;4): the QALU was exhaustively tested over all 2\u003csup\u003e2n\u003c/sup\u003e=256 input pairs for each of 16 operations (4096 total tests), with 0 mismatches against a classical reference model. (2) Resource metrics (n\u0026thinsp;=\u0026thinsp;4): after transpilation to a Clifford\u0026thinsp;+\u0026thinsp;T‑style basis {cx, h, s, sdg, x, z, t, tdg} at optimization level 3, the highest‑cost operations (ADD, SUB, CMP) report T‑count\u0026thinsp;=\u0026thinsp;86, CX‑count\u0026thinsp;\u0026asymp;\u0026thinsp;102\u0026ndash;106, and depth\u0026thinsp;\u0026asymp;\u0026thinsp;138 on a 24‑qubit circuit instance (including flags and MCX ancillae). 
(3) IBM hardware test (n\u0026thinsp;=\u0026thinsp;2): on an automatically selected IBM backend (ibm_torino) using dynamical decoupling and gate twirling plus mthree readout mitigation, an exhaustive run over all 256 circuits yielded raw correctness 91.41% and mitigated correctness 91.02% (slightly worse post‑mitigation, consistent with practical tradeoffs where mitigation noise/calibration drift can dominate at small sizes).\u003c/p\u003e","manuscriptTitle":"Design of a Fault‑Tolerant‑Metric‑Aware, Reversible n‑bit Quantum Arithmetic Logic Unit using IBM Qiskit","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-03-09 18:40:42","doi":"10.21203/rs.3.rs-8959757/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2026-04-06T06:59:24+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-04-02T23:25:59+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-04-02T09:02:56+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"145240914556240167473427470444169930590","date":"2026-03-25T14:24:30+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"129795205099693507459060556941191515148","date":"2026-03-25T08:42:23+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-03-25T08:31:38+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-03-25T03:48:39+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2026-03-09T11:49:42+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-03-04T23:36:45+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific Reports","date":"2026-03-04T17:53:47+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"a251cdad-23b5-40f5-b8b5-279dcdf5c61e","owner":[],"postedDate":"March 9th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":64046141,"name":"Physical sciences/Engineering"},{"id":64046142,"name":"Physical sciences/Mathematics and computing"},{"id":64046143,"name":"Physical sciences/Physics"}],"tags":[],"updatedAt":"2026-05-14T09:41:36+00:00","versionOfRecord":[],"versionCreatedAt":"2026-03-09 18:40:42","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8959757","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8959757","identity":"rs-8959757","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}