Implementation and performance analysis of sorting algorithms on PYNQ Z2 FPGA board

doi:10.21203/rs.3.rs-9347937/v1

2026 · doi:10.21203/rs.3.rs-9347937/v1

Annapoorna A, Dhanush H A, Radha R C, Lavanya R Gejji, Chethana Mohan

Research Article. This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-9347937/v1. This work is licensed under a CC BY 4.0 License.

Abstract

Sorting plays a crucial role in many real-world applications, especially with the rapid growth of data in modern industries. As data volumes increase, traditional CPU-based sorting methods may not always meet the required speed and efficiency. To address this challenge, hardware-based sorting accelerators on FPGA platforms can significantly improve performance through parallel processing and dedicated logic.
This work focuses on the design and analysis of hardware implementations of several sorting algorithms: Bubble Sort, Insertion Sort, Selection Sort, Quick Sort, and Radix Sort. The proposed designs are implemented and demonstrated on an FPGA to evaluate their real-time performance. Each algorithm is analyzed in terms of hardware architecture, time and space complexity, and clock cycles in the best and worst cases. The study highlights the advantages of FPGA-based sorting over CPU-based approaches for high-speed applications.

Keywords: FPGA, PYNQ-Z2, AXI-Lite, Bubble Sort, Insertion Sort, Selection Sort, Quick Sort, Radix Sort, LUT Utilization

1. Introduction

Sorting plays a crucial role in fields such as signal processing, database systems, and embedded applications, where efficient data organization is essential for performance. As the volume of data continues to grow, achieving fast and reliable sorting on general-purpose processors becomes increasingly challenging [1]. This is mainly due to the sequential execution model of CPUs and their dependence on the operating system for resource allocation [4]. In addition, other applications running concurrently can further degrade performance, making CPU-based sorting less suitable for time-critical tasks.

To address these limitations, hardware-based solutions such as Field Programmable Gate Arrays (FPGAs) can be used. FPGAs offer flexibility by allowing the hardware to be configured for specific application requirements. They also support parallel processing, which enables faster execution than traditional CPU-based approaches. In this work, multiple sorting algorithms are implemented on an FPGA platform to evaluate their performance in a hardware environment.
The study compares the architectures in terms of resource utilization, best-case and worst-case execution cycles, and overall efficiency. This analysis helps identify suitable sorting techniques for high-speed and real-time applications according to user requirements.

2. Literature survey

Existing research on FPGA-based sorting spans lightweight architectures for edge devices, high-throughput heterogeneous systems, and parallel comparator-based designs tailored for real-time workloads. Norollah, Kazemi, and Beitollahi [27] present ULP Sorter, an ultra-low-power architecture based on the Multi-Dimensional Sorting Algorithm. Their approach significantly reduces hardware cost, achieving up to 70% fewer LUTs, 35.7% fewer registers, and nearly 48.7% lower power consumption, which makes it well suited to area- and energy-constrained edge computing platforms. A related study by the same authors extends this idea with a modular comparator scheme that further reduces switching activity and memory overhead while maintaining stable throughput, reinforcing the suitability of lightweight FPGA sorters for IoT and edge AI workloads.

High-throughput sorting of large-scale data is explored by Zhang [14], Chen [30], and Prasanna [13], who propose a heterogeneous CPU-FPGA architecture in which the FPGA sorts sub-blocks with a pipelined Merge Sort Accelerator and the CPU merges the partial results. Their design achieves up to 3.9× the throughput of FPGA-only systems, demonstrating the advantage of hybrid platforms in resolving memory bottlenecks while improving scalability and resource efficiency. A complementary hardware-software co-design by Petrut, Amaricai, and Boncalo [22] follows a similar philosophy.
Their parameterized merge sorter is configurable across dataset sizes and outperforms software-only solutions, highlighting the importance of combining FPGA acceleration with flexible software control.

Parallelism-driven FPGA sorting also remains a major research direction. Lipu et al. [25] propose a Bubble Sort architecture built around parallel comparator-swap units, achieving higher speed and lower latency than software implementations. Abdelrasoul, Shaban, and Abdel-Kader [26] also demonstrate that pipelined comparator networks mapped onto FPGA fabric achieve significant gains in throughput and real-time performance compared with CPU-based sorting. These works consistently show that structured parallel hardware delivers deterministic timing and superior efficiency.

Several studies investigate domain-specific sorting needs. Zhao et al. [28] design a streaming sorting network to accelerate the Burrows-Wheeler Transform (BWT) in lossless compression. By processing data in a pipelined streaming fashion, their architecture reduces latency and execution time, improving compression performance over conventional CPU methods. Packet-level sorting is addressed by Jiang, Cao, and Wu [24], who use on-chip RAM in place of external memory to relieve bandwidth constraints. Their design delivers higher throughput and lower latency, making it suitable for networking, particle detection, and other high-speed acquisition systems. Finally, Reddy, Prashanth, and Bachu [21] propose an FPGA architecture for max/min selection that does not fully sort the dataset. Using comparator trees and inherent hardware parallelism, their design provides low-latency, resource-efficient extraction of extremum values, which is useful in real-time embedded and IoT applications where complete sorting is unnecessary.
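Parallel comparator-swap Bubble Sort designs of the kind surveyed above are commonly organised as odd-even transposition sort: in each phase, disjoint adjacent pairs are compared and swapped, so every compare-exchange in a phase can run in the same clock cycle on its own comparator. The Python model below is a minimal sketch of that general technique for illustration only; it is not claimed to reproduce the architecture of [25], and the function name `odd_even_transposition_sort` is hypothetical.

```python
def odd_even_transposition_sort(values):
    """Software model of odd-even transposition sort (parallel Bubble Sort).

    Each phase compares disjoint adjacent pairs, so in hardware every
    compare-exchange of a phase could be performed in the same clock cycle.
    Returns the sorted list and the number of phases executed.
    """
    data = list(values)
    n = len(data)
    phases = 0
    for phase in range(n):          # n phases always suffice for n elements
        start = phase % 2           # even phase: (0,1),(2,3)...; odd phase: (1,2),(3,4)...
        for i in range(start, n - 1, 2):
            # One comparator-swap unit ordering the pair (data[i], data[i+1]).
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
        phases += 1
    return data, phases


if __name__ == "__main__":
    result, phases = odd_even_transposition_sort([7, 3, 9, 1, 5, 8, 2, 6])
    print(result, phases)  # [1, 2, 3, 5, 6, 7, 8, 9] 8
```

Each outer-loop iteration stands for one hardware phase, so n elements need at most n phases with at most n/2 comparators working in parallel, instead of the roughly n(n-1)/2 sequential comparisons of a single-comparator Bubble Sort.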
Collectively, these studies highlight the versatility of FPGA-based sorting architectures, from ultra-low-power edge devices and hybrid CPU-FPGA platforms to domain-specific accelerators and massively parallel comparator-driven designs. They consistently show that FPGAs offer deterministic timing, high throughput, and superior energy efficiency compared with traditional CPU-based sorting, which motivates the hardware-accelerated approach followed in this project.

3. Proposed Hardware Based Sorting Framework

The proposed work develops a complete simulation-driven hardware framework to study the behavior, efficiency, and implementation characteristics of multiple sorting algorithms on FPGA-oriented architectures. At the front end of the system, a pseudorandom number (PRN) generator produces a continuous stream of unbiased input values [12]. These values are temporarily stored in an asynchronous FIFO, which acts as a buffering layer that decouples data generation from data processing [10]. The FIFO ensures smooth, reliable transfer of data into the sorting modules while maintaining proper timing synchronization, even when the generating and consuming modules operate at different clock frequencies [14].

Within this framework, five sorting techniques (Bubble Sort, Selection Sort, Insertion Sort, Radix Sort, and Quick Sort) are implemented as standalone Verilog modules. Each module retrieves data from the FIFO and performs sorting according to its internal hardware structure and algorithmic logic [16]. Because the modules are independent, it is possible to examine in a controlled manner how each algorithm translates into hardware, how it responds to random input patterns, and how architectural differences influence speed, cycle count, and overall hardware efficiency. This modular design also simplifies comparative analysis by ensuring that all algorithms operate under identical input conditions.
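Asynchronous FIFOs of the kind described above are usually built on Gray-coded read and write pointers: because only one bit changes per increment, a pointer sampled mid-transition in the other clock domain is either the old or the new value, never a corrupted one. The Python model below is a sketch of that widely used scheme under stated assumptions (an 8-entry FIFO, pointers one bit wider than the address); it models only the pointer and flag arithmetic, not the synchroniser flip-flops or the two clocks, and the names `GrayPointerFifo` and `bin_to_gray` are hypothetical rather than taken from the authors' Verilog.

```python
ADDR_BITS = 3                               # assumption: 8-entry FIFO
DEPTH = 1 << ADDR_BITS
PTR_MASK = (1 << (ADDR_BITS + 1)) - 1       # pointers carry one extra wrap bit


def bin_to_gray(b):
    """Binary to reflected Gray code: consecutive values differ in one bit."""
    return b ^ (b >> 1)


class GrayPointerFifo:
    """Hypothetical model of async-FIFO pointer and flag logic (no real clocks)."""

    def __init__(self):
        self.mem = [0] * DEPTH
        self.wptr = 0   # binary write pointer, ADDR_BITS + 1 bits wide
        self.rptr = 0   # binary read pointer, ADDR_BITS + 1 bits wide

    def empty(self):
        # Empty when both Gray pointers are identical. In hardware the read side
        # compares against a synchronised copy of the write pointer.
        return bin_to_gray(self.wptr) == bin_to_gray(self.rptr)

    def full(self):
        # Full when the Gray write pointer equals the Gray read pointer with its
        # two most significant bits inverted and all other bits equal.
        msb2 = 0b11 << (ADDR_BITS - 1)
        return bin_to_gray(self.wptr) == bin_to_gray(self.rptr) ^ msb2

    def write(self, value):
        if self.full():
            raise OverflowError("write to full FIFO")
        self.mem[self.wptr & (DEPTH - 1)] = value   # low bits address the RAM
        self.wptr = (self.wptr + 1) & PTR_MASK

    def read(self):
        if self.empty():
            raise IndexError("read from empty FIFO")
        value = self.mem[self.rptr & (DEPTH - 1)]
        self.rptr = (self.rptr + 1) & PTR_MASK
        return value
```

The extra wrap bit is what lets the same address distinguish "full" (write pointer one lap ahead) from "empty" (both pointers equal).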
Functional simulations validate the correct operation of each subsystem. The PRN generator is tested for continuity, randomness, and stable timing; the FIFO is evaluated for correct read/write behavior, pointer management, and boundary-condition handling; and each sorting module is verified by observing the transition from an unsorted sequence to a correctly ordered output [11]. After simulation, post-synthesis timing analysis is performed to determine the maximum operating frequency and to estimate the best-case and worst-case execution times of each algorithm at that frequency. Resource utilization reports covering LUT and flip-flop counts provide insight into the hardware cost of each design. Together, these analyses form a comprehensive comparative study of Bubble Sort, Selection Sort, Insertion Sort, Radix Sort, and Quick Sort in a hardware-centric environment [18] [19]. The framework enables early evaluation of each algorithm's suitability for FPGA implementation, showing which techniques offer the best balance of resource efficiency and structural simplicity before the system is deployed to an actual hardware platform.

4. Methodology for Hardware Based Sorting Evaluation

The methodology adopted in this work follows a systematic RTL design, verification, and analysis workflow that ensures each component of the system is thoroughly tested. The first stage is the development of a pseudorandom number (PRN) generator in Verilog, shown in Fig. 1. This module supplies the sorting algorithms with a continuous stream of varied input values, generated from a triggered input seed value. Functional simulations are carried out to verify the correctness of the generated sequences, confirm the absence of repeating patterns within short intervals, and ensure that timing and output stability are maintained across simulation cycles [26].
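The paper does not specify the internal structure of its seeded PRN generator. A common lightweight choice for this role is a linear-feedback shift register (LFSR); the sketch below assumes a 16-bit Fibonacci LFSR with the maximal-length feedback polynomial x^16 + x^14 + x^13 + x^11 + 1, an illustrative choice rather than the authors' documented design. The helper names `lfsr16_step`, `lfsr16_stream`, and `lfsr16_period` are hypothetical.

```python
def lfsr16_step(state):
    """One shift of a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11.

    Feedback polynomial x^16 + x^14 + x^13 + x^11 + 1 is maximal length,
    so any non-zero seed cycles through all 65535 non-zero states.
    """
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr16_stream(seed, count):
    """Yield `count` pseudorandom 16-bit values starting from a non-zero seed."""
    if not 0 < seed < (1 << 16):
        raise ValueError("seed must be a non-zero 16-bit value")
    state = seed
    for _ in range(count):
        state = lfsr16_step(state)
        yield state


def lfsr16_period(seed):
    """Count steps until the register returns to its seed."""
    state, steps = lfsr16_step(seed), 1
    while state != seed:
        state, steps = lfsr16_step(state), steps + 1
    return steps


if __name__ == "__main__":
    print([hex(v) for v in lfsr16_stream(0xACE1, 4)])
    print(lfsr16_period(0xACE1))   # 65535 for a maximal-length LFSR
```

A period check of this kind is one way to confirm the "no repeating patterns within short intervals" property described above: a maximal-length 16-bit LFSR repeats only after 65,535 outputs. An LFSR is not suitable for cryptographic use, but it is adequate for generating test stimulus.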
The second stage focuses on the design and verification of a dual-port asynchronous FIFO, shown in Fig. 2, which acts as a buffer between the PRN generator and the sorting modules [15]. The FIFO cleanly separates data generation from data consumption, ensuring reliable operation even when read and write activity occurs at different rates [27]. The FIFO's control logic, including read and write pointer management, empty and full flag generation, and synchronization circuitry, is carefully implemented to prevent overflow, underflow, and metastability [29]. Simulation-based testing is used to validate boundary conditions and confirm that the FIFO behaves correctly under varying data transfer scenarios [30].

In the third stage, the five sorting algorithms (Bubble Sort, Selection Sort, Insertion Sort, Radix Sort, and Quick Sort) are implemented as synthesizable RTL modules. Each module has a clean, modular interface to support independent development and functional testing [16] [17]. A testbench is created for each sorting algorithm to simulate its behavior under controlled random inputs. These functional simulations verify that each module produces correctly sorted outputs and provide insight into the algorithmic flow and hardware mapping of each design [18].

Once functional verification is complete, the fourth stage performs synthesis and timing analysis for each sorting module. Synthesis reports are examined to determine resource utilization in terms of LUTs, flip-flops, memory elements, and combinational logic. Timing analysis identifies critical paths and determines the maximum achievable operating frequency, allowing each design's latency and performance characteristics to be evaluated [19]. Together, these metrics provide a detailed comparison of hardware efficiency and computational behavior.
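Turning a cycle count into wall-clock latency only needs the clock period from timing analysis: t = N_cycles × T_clk = N_cycles / f_clk. The paper does not report the achieved f_max, so the sketch below assumes a 100 MHz clock purely for illustration; the cycle counts are the ones reported in Table 1, and the names `TABLE1_CYCLES`, `ASSUMED_FCLK_MHZ`, and `latency_ns` are hypothetical.

```python
# Best/worst-case cycle counts as reported in Table 1 of this paper.
TABLE1_CYCLES = {
    "Bubble Sort": (10, 38),
    "Insertion Sort": (24, 52),
    "Quick Sort": (48, 70),
    "Selection Sort": (59, 59),
    "Radix Sort": (34, 34),
}

ASSUMED_FCLK_MHZ = 100.0  # assumption for illustration only; use each module's real f_max


def latency_ns(cycles, fclk_mhz):
    """Latency in nanoseconds: cycles times the clock period (1000 / f_MHz ns)."""
    period_ns = 1000.0 / fclk_mhz
    return cycles * period_ns


if __name__ == "__main__":
    for name, (best, worst) in TABLE1_CYCLES.items():
        print(f"{name:15s} best {latency_ns(best, ASSUMED_FCLK_MHZ):6.1f} ns  "
              f"worst {latency_ns(worst, ASSUMED_FCLK_MHZ):6.1f} ns")
```

At the assumed 100 MHz, Bubble Sort's worst case of 38 cycles corresponds to 380 ns. Because the critical path, and hence f_max, may differ between modules, a design with fewer cycles is not automatically the faster one; the comparison should use each module's own post-synthesis frequency.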
In the final stage, all modules (the PRN generator, the asynchronous FIFO, and the sorting units) are integrated into a unified system-level test environment. The complete data flow, from random data generation through buffering to final sorting, is simulated to ensure correct operation across the entire system. System-level validation confirms module interoperability and consistent behavior under continuous random data input. Through this structured, layered methodology, the project achieves a thorough hardware-oriented evaluation of multiple sorting algorithms.

5. Sorting Algorithms and Performance Characteristics

This section presents the five sorting algorithms evaluated in the proposed hardware framework. Each algorithm is described in terms of its operating principle and hardware behaviour, followed by its best-case and worst-case clock-cycle counts measured in simulation, which are summarized in Table 1.

Table 1. Best-case and worst-case clock cycles

| Algorithm | Clock cycles (best case) | Clock cycles (worst case) | Remarks |
|---|---|---|---|
| Bubble Sort | 10 | 38 | Simple but inefficient for large datasets; stable and in place. |
| Insertion Sort | 24 | 52 | Efficient for small or nearly sorted data; stable and in place. |
| Quick Sort | 48 | 70 | Very fast on average; unstable; recursive and highly parallelizable. |
| Selection Sort | 59 | 59 | Predictable comparisons; simple; minimal data movement. |
| Radix Sort | 34 | 34 | Fast and stable, but needs extra memory and slows down with many digits. |

Bubble Sort

Bubble Sort repeatedly compares adjacent elements and swaps them when they are in the wrong order. With each pass, the largest remaining element moves toward the end of the list, giving the algorithm its characteristic “bubbling” behaviour [1]. Although it is simple to design in hardware, Bubble Sort performs many redundant comparisons, which makes it inefficient for larger datasets. In simulation, Bubble Sort requires 10 clock cycles in the best case and 38 clock cycles in the worst case.
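One common way for a Bubble Sort's best and worst cases to diverge is an early-exit rule that stops after a pass with no swaps. The paper does not say whether its RTL uses such a rule, so the model below treats it as an assumption; it counts passes and comparisons to contrast an already sorted input with a reverse-ordered one. The function name `bubble_sort_early_exit` is hypothetical.

```python
def bubble_sort_early_exit(values):
    """Bubble Sort that stops after a pass with no swaps.

    Returns (sorted_list, passes, comparisons). The early-exit rule is an
    assumption used to illustrate best vs worst case; it is not confirmed
    to match the RTL described in the paper.
    """
    data = list(values)
    n = len(data)
    passes = comparisons = 0
    for p in range(n - 1):
        swapped = False
        passes += 1
        # After p passes, the last p elements are already in their final positions.
        for i in range(n - 1 - p):
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
        if not swapped:
            break
    return data, passes, comparisons


if __name__ == "__main__":
    print(bubble_sort_early_exit([1, 2, 3, 4, 5, 6, 7, 8]))  # best case
    print(bubble_sort_early_exit([8, 7, 6, 5, 4, 3, 2, 1]))  # worst case
```

For an eight-element input (an illustrative size), this gives 7 comparisons in one pass in the best case versus 28 comparisons over 7 passes in the worst case. How such counts map to the 10 and 38 clock cycles in Table 1 depends on how many comparisons the RTL performs per cycle and on its control-state overhead, which the paper does not detail.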
Bubble Sort is stable and in place, but its high comparison count limits its suitability for high-performance hardware sorting [1, 2]. Table 2 shows that the Bubble Sort design uses only a very small portion of the FPGA resources, with LUTs, registers, and multiplexers all well under one percent of the device's capacity.

Table 2. Bubble Sort resource utilization report

| Resource Type | Used | Available | Utilization (%) |
|---|---|---|---|
| Slice LUTs | 156 | 53200 | 0.29 |
| LUT as Logic | 156 | 53200 | 0.29 |
| LUT as Memory | 0 | 17400 | 0.00 |
| Slice Registers | 142 | 106400 | 0.13 |
| Registers as Flip-Flops | 142 | 106400 | 0.13 |
| Registers as Latches | 0 | 106400 | 0.00 |
| F7 Muxes | 16 | 26600 | 0.06 |
| F8 Muxes | 0 | 13300 | 0.00 |

Insertion Sort

Insertion Sort builds the sorted list incrementally: each new element from the unsorted portion is inserted into its correct position within the sorted region. Because it avoids unnecessary comparisons when the input is nearly ordered, Insertion Sort performs efficiently on small or partially sorted datasets [2, 3]. The hardware implementation requires 24 clock cycles in the best case and 52 clock cycles in the worst case. Its stable, in-place nature and predictable behaviour make it suitable for lightweight hardware deployments. Table 3 shows that the Insertion Sort design also occupies a very small share of the FPGA's resources, with minimal LUT and register usage and no memory or multiplexer utilization.

Table 3. Insertion Sort resource utilization report

| Resource Type | Used | Available | Utilization (%) |
|---|---|---|---|
| Slice LUTs | 16 | 53200 | 0.03 |
| LUT as Logic | 16 | 53200 | 0.03 |
| LUT as Memory | 0 | 17400 | 0.00 |
| Slice Registers | 144 | 106400 | 0.14 |
| Registers as Flip-Flops | 144 | 106400 | 0.14 |
| Registers as Latches | 0 | 106400 | 0.00 |
| F7 Muxes | 0 | 26600 | 0.00 |
| F8 Muxes | 0 | 13300 | 0.00 |

Quick Sort

Quick Sort follows a divide-and-conquer strategy: it selects a pivot, partitions the dataset into two groups around it, and recursively sorts each partition. The algorithm generally performs well, but its efficiency depends on pivot selection [2–4].
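Hardware cannot recurse in the software sense, so Quick Sort is typically restructured around an explicit stack of (low, high) sub-ranges that an RTL design could hold in registers or block RAM. The sketch below shows that restructuring with the Lomuto partition scheme and the last element of each range as pivot; both are assumptions for illustration, since the paper does not describe its partitioning or pivot rule, and `quicksort_iterative` is a hypothetical name.

```python
def quicksort_iterative(values):
    """Quick Sort with an explicit stack instead of recursion.

    Uses the Lomuto partition with the last element of each range as pivot.
    Returns the sorted list and the peak stack depth, a rough indicator of
    how many range entries a hardware stack would need for this input.
    """
    data = list(values)
    stack = [(0, len(data) - 1)] if data else []
    peak = len(stack)
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:                       # zero- or one-element range: nothing to do
            continue
        pivot = data[hi]
        i = lo
        for j in range(lo, hi):            # move elements <= pivot to the front
            if data[j] <= pivot:
                data[i], data[j] = data[j], data[i]
                i += 1
        data[i], data[hi] = data[hi], data[i]   # pivot lands at its final index i
        left, right = (lo, i - 1), (i + 1, hi)
        # Push the larger sub-range first so the smaller one is processed next;
        # this keeps the stack depth logarithmic in the input size.
        if (left[1] - left[0]) > (right[1] - right[0]):
            stack.extend([left, right])
        else:
            stack.extend([right, left])
        peak = max(peak, len(stack))
    return data, peak
```

Bounding the stack this way matters in hardware because its depth must be fixed at synthesis time; a naive recursive formulation on already sorted input would need a depth proportional to n.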
In hardware simulation, Quick Sort completes in 48 clock cycles in the best case and 70 clock cycles in the worst case. Although it is not stable, its parallelizable structure makes it attractive for high-performance FPGA implementations. Table 4 shows that Quick Sort uses noticeably more FPGA resources than Bubble Sort and Insertion Sort, with higher LUT and register usage, while remaining well within the device's capacity.

Table 4. Quick Sort resource utilization report

| Resource Type | Used | Available | Utilization (%) |
|---|---|---|---|
| Slice LUTs | 1448 | 53200 | 2.72 |
| LUT as Logic | 1448 | 53200 | 2.72 |
| LUT as Memory | 0 | 17400 | 0.00 |
| Slice Registers | 880 | 106400 | 0.83 |
| Registers as Flip-Flops | 880 | 106400 | 0.83 |
| Registers as Latches | 0 | 106400 | 0.00 |

Radix Sort

Radix Sort sorts digit by digit, beginning with the least significant digit. By grouping data according to digit values instead of performing comparisons, the algorithm achieves predictable, high throughput on integer datasets [3, 4]. Because the number of digit passes is fixed, the best-case and worst-case execution times are identical at 34 clock cycles. Radix Sort is fast and stable, but it requires additional memory and may slow down when processing data with many digit positions [1]. Table 5 shows that Radix Sort requires more resources than Bubble Sort and Insertion Sort but remains lighter than Quick Sort, using a modest number of LUTs and registers while staying well within the FPGA's limits.

Table 5. Radix Sort resource utilization report

| Resource Type | Used | Available | Utilization (%) |
|---|---|---|---|
| Slice LUTs | 869 | 53200 | 1.63 |
| LUT as Logic | 869 | 53200 | 1.63 |
| LUT as Memory | 0 | 17400 | 0.00 |
| Slice Registers | 263 | 106400 | 0.25 |
| Registers as Flip-Flops | 263 | 106400 | 0.25 |
| Registers as Latches | 0 | 106400 | 0.00 |
| F7 Muxes | 6 | 26600 | 0.02 |
| F8 Muxes | 0 | 13300 | 0.00 |

Selection Sort

Selection Sort repeatedly identifies the smallest element in the unsorted region and swaps it into its correct position.
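A short software model makes the input-independence explicit: for n elements, the minimum search always performs n(n-1)/2 comparisons, while the number of swaps (at most n - 1) is the only quantity that varies with the input. The function `selection_sort_counted` below is an illustrative, hypothetical sketch, not the authors' RTL.

```python
def selection_sort_counted(values):
    """Selection Sort returning (sorted_list, comparisons, swaps)."""
    data = list(values)
    n = len(data)
    comparisons = swaps = 0
    for i in range(n - 1):
        min_idx = i
        # Scan the unsorted region for its minimum: always n - 1 - i comparisons.
        for j in range(i + 1, n):
            comparisons += 1
            if data[j] < data[min_idx]:
                min_idx = j
        if min_idx != i:                   # swap only when the minimum is elsewhere
            data[i], data[min_idx] = data[min_idx], data[i]
            swaps += 1
    return data, comparisons, swaps


if __name__ == "__main__":
    print(selection_sort_counted([1, 2, 3, 4, 5, 6, 7, 8]))  # 28 comparisons, 0 swaps
    print(selection_sort_counted([8, 7, 6, 5, 4, 3, 2, 1]))  # 28 comparisons, 4 swaps
```

Both calls perform 28 comparisons for eight elements, which is consistent with Selection Sort having identical best-case and worst-case cycle counts in Table 1.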
Because the same number of comparisons is performed regardless of input order, Selection Sort's behaviour is highly predictable [8]: both its best-case and worst-case latencies are 59 clock cycles. Despite its simplicity and minimal data movement, Selection Sort is not efficient for large datasets because of its quadratic time complexity [9]. Table 6 shows that Selection Sort uses a moderate number of LUTs and registers compared with the simpler algorithms, but its overall utilization remains very low relative to the FPGA's available resources.

Table 6. Selection Sort resource utilization report

| Resource Type | Used | Available | Utilization (%) |
|---|---|---|---|
| Slice LUTs | 355 | 53200 | 0.67 |
| LUT as Logic | 355 | 53200 | 0.67 |
| LUT as Memory | 0 | 17400 | 0.00 |
| Slice Registers | 228 | 106400 | 0.21 |
| Registers as Flip-Flops | 228 | 106400 | 0.21 |
| Registers as Latches | 0 | 106400 | 0.00 |
| F7 Muxes | 0 | 26600 | 0.00 |
| F8 Muxes | 0 | 13300 | 0.00 |

Comparison of FPGA Resource Usage with Related Literature

To evaluate the efficiency of the proposed sorting architectures, they were compared against results reported in relevant reference papers. The comparison focuses on Look-Up Table (LUT) and flip-flop (FF) usage, as these metrics directly reflect the hardware cost, complexity, and scalability of each design on FPGA platforms. LUTs implement the combinational logic required for datapath and control operations, while flip-flops account for sequential elements such as registers, pipeline stages, and finite state machines [23]. The sorting architectures developed in this work show competitive and often lower resource utilization than the designs documented in the literature. This improvement is attributed to streamlined control logic, efficient dataflow organization, and minimized shifting and swapping operations in the hardware implementation.
Table 7 compares the LUT and FF usage of the proposed designs with the reference work [26], showing how the implemented modules align with or exceed the efficiency of previously published architectures.

Table 7. Resource utilization of the proposed sorting architectures compared with [26]

| Sorting Algorithm | LUTs + FFs (this work) | LUTs + FFs [26] |
|---|---|---|
| Bubble Sort | 298 | Serial: 723; Parallel: 672 |
| Selection Sort | 583 | 730 |
| Insertion Sort | 435 | 802 |
| Quick Sort | 2328 | – |
| Radix Sort | 1132 | – |

6. Experimental Results

Table 8 shows that the PRNG design uses only a tiny portion of the FPGA's resources. Its logic and register usage is very low, and it uses no memory or multiplexers. Overall, the design is lightweight and leaves ample room for additional modules. Figure 3 shows the output waveform of the PRNG module.

Table 8. PRNG resource utilization

| Resource Type | Used | Available | Utilization (%) |
|---|---|---|---|
| Slice LUTs | 16 | 53200 | 0.03 |
| LUT as Logic | 16 | 53200 | 0.03 |
| LUT as Memory | 0 | 17400 | 0.00 |
| Slice Registers | 144 | 106400 | 0.14 |
| Registers as Flip-Flops | 144 | 106400 | 0.14 |
| Registers as Latches | 0 | 106400 | 0.00 |
| F7 Muxes | 0 | 26600 | 0.00 |
| F8 Muxes | 0 | 13300 | 0.00 |

7. Implementation of a sorting module alone on PYNQ-Z2

The sorting module from the main RTL design was packaged as a memory-mapped AXI-Lite custom IP. As shown in Fig. 4, the sorting logic is connected directly to the AXI-Lite interface, allowing the processor to trigger the sort operation, provide input data, and read the final output. This approach exposes only the required sorting functionality to the processing system (PS), keeping the design simple and efficient. The IP contains six 32-bit registers mapped within the address range 0x43C0_0000–0x43C0_FFFF, as shown in Fig. 5: slv_reg0 triggers sorting, slv_reg1 and slv_reg2 hold the 64-bit input data, slv_reg3 reports completion through the sdone bit, and slv_reg4 and slv_reg5 hold the sorted result.
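The register map above, together with the Python workflow described next, suggests a host-side driver of roughly the following shape. This is a sketch under several assumptions not stated in the paper, each marked in the code: the registers sit at byte offsets 0x00 to 0x14 (the usual 4-byte spacing of Vivado's AXI4-Lite peripheral template), start and sdone are bit 0 of slv_reg0 and slv_reg3, slv_reg1 and slv_reg4 hold the lower 32 bits, and the 64-bit word carries eight 8-bit elements. `sort_on_fpga`, `pack_bytes`, `unpack_bytes`, and `FakeSorterMMIO` are hypothetical names; only `pynq.MMIO` and its `read(offset)` and `write(offset, value)` methods come from the PYNQ library.

```python
import time

# Assumed byte offsets (slv_regN at 4 * N); verify against the generated IP.
REG_CTRL, REG_IN_LO, REG_IN_HI, REG_STATUS, REG_OUT_LO, REG_OUT_HI = (
    0x00, 0x04, 0x08, 0x0C, 0x10, 0x14)
START_BIT = 0x1   # assumed: bit 0 of slv_reg0 starts the sort
SDONE_BIT = 0x1   # assumed: sdone is bit 0 of slv_reg3
BASE_ADDR, ADDR_RANGE = 0x43C00000, 0x10000   # address map given in the text


def pack_bytes(values):
    """Pack eight 8-bit values (assumed element width) into one 64-bit word."""
    if len(values) != 8 or any(not 0 <= v <= 0xFF for v in values):
        raise ValueError("expected eight values in 0..255")
    word = 0
    for k, v in enumerate(values):
        word |= v << (8 * k)          # element k occupies bits 8k..8k+7
    return word


def unpack_bytes(word):
    """Split a 64-bit word back into eight 8-bit values, element 0 first."""
    return [(word >> (8 * k)) & 0xFF for k in range(8)]


def sort_on_fpga(mmio, values, timeout_s=1.0):
    """Drive the sorting IP through any object with read(offset)/write(offset, value)."""
    word = pack_bytes(values)
    mmio.write(REG_IN_LO, word & 0xFFFFFFFF)       # lower 32 bits -> slv_reg1
    mmio.write(REG_IN_HI, word >> 32)              # upper 32 bits -> slv_reg2
    mmio.write(REG_CTRL, START_BIT)                # start pulse: set the bit ...
    mmio.write(REG_CTRL, 0)                        # ... then clear it
    deadline = time.monotonic() + timeout_s
    while not (mmio.read(REG_STATUS) & SDONE_BIT):  # poll sdone in slv_reg3
        if time.monotonic() > deadline:
            raise TimeoutError("sdone not asserted")
    lo, hi = mmio.read(REG_OUT_LO), mmio.read(REG_OUT_HI)
    return unpack_bytes((hi << 32) | lo)


class FakeSorterMMIO:
    """Software stand-in for the IP so the driver can be exercised off-board."""

    def __init__(self):
        self.regs = {off: 0 for off in (REG_CTRL, REG_IN_LO, REG_IN_HI,
                                        REG_STATUS, REG_OUT_LO, REG_OUT_HI)}

    def write(self, offset, value):
        self.regs[offset] = value & 0xFFFFFFFF
        if offset == REG_CTRL and value & START_BIT:
            word = (self.regs[REG_IN_HI] << 32) | self.regs[REG_IN_LO]
            out = pack_bytes(sorted(unpack_bytes(word)))
            self.regs[REG_OUT_LO] = out & 0xFFFFFFFF
            self.regs[REG_OUT_HI] = out >> 32
            self.regs[REG_STATUS] = SDONE_BIT

    def read(self, offset):
        return self.regs[offset]


if __name__ == "__main__":
    try:
        from pynq import MMIO          # present on the PYNQ-Z2 image
        mmio = MMIO(BASE_ADDR, ADDR_RANGE)
    except ImportError:                # off-board: fall back to the fake
        mmio = FakeSorterMMIO()
    print(sort_on_fpga(mmio, [9, 3, 200, 0, 17, 3, 255, 42]))
```

On the board, the overlay containing the IP has to be loaded first (for example with `pynq.Overlay`). The paper does not describe how sdone is cleared between runs, so a real driver may also need to wait for sdone to drop after the start pulse before polling for completion.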
This mapping gives software a clean, accessible interface for controlling the sort. After synthesis, the full design was programmed onto the PYNQ-Z2 board as a custom overlay. In the PYNQ framework, an overlay allows the ARM processor to interact with the FPGA logic through Python. Using the pynq.MMIO class, the processor accessed the sorting IP at base address 0x43C0_0000 with an address range of 0x10000. The Python script accepted user input, packed the 64-bit data into slv_reg1 and slv_reg2, generated a pulse on slv_reg0 to start sorting, and continuously polled the sdone bit in slv_reg3 to detect completion. Once sorting finished, the script read the lower and upper 32-bit outputs from slv_reg4 and slv_reg5 and unpacked them to display the final sorted sequence, as shown in Fig. 6. This workflow provided a seamless bridge between the Python software and the hardware accelerator on the FPGA.

8. Conclusion

This work examined the behaviour and FPGA suitability of several well-known sorting algorithms (bubble sort, insertion sort, selection sort, quick sort, and radix sort), primarily through RTL simulation and synthesis rather than real-time execution. The analysis showed that simple comparison-based methods such as bubble sort and selection sort are easy to map onto hardware but scale poorly, while insertion sort performs better on nearly sorted inputs. Quick sort, despite its efficiency in software, is difficult to implement on an FPGA because of recursion and irregular memory-access patterns. Radix sort stood out as the most hardware-friendly approach, as its digit-wise, non-comparative nature aligns well with the parallel processing and deterministic pipeline structures of FPGAs. The implementation of the custom sorting IP on the PYNQ-Z2 platform demonstrated an effective hardware-software co-design flow. Using an AXI-Lite interface and the PYNQ overlay, the ARM processor communicated seamlessly with the FPGA logic through Python and MMIO access.
Deploying the sorting module as a standalone IP validated its functionality on actual hardware while providing a flexible foundation for further optimization and for the development of more advanced data-processing accelerators.

Declarations

Author Contribution

A: Annapoorna A; D: Dhanush H A; R: Radha R C; L: Lavanya R Gejji; C: Chethana Mohan. A.D. and D.R. designed the hardware architecture and contributed to writing the results and observations sections of the manuscript. L.C. performed FPGA integration and contributed to writing the results and observations. R.L. reviewed and edited the manuscript for technical accuracy and clarity. All authors contributed to the final evaluation of the work and approved the manuscript for submission.

References

[1] Vijay, R., Jha, L., & Gupta, G. (2019). Performance and analysis of sorting algorithms for random data input. 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), Dhaka, Bangladesh, pp. 1–6. https://doi.org/10.1109/ICASERT.2019.8934789
[2] Faujdar, N., & Ghrera, S. P. (2015). Analysis and Testing of Sorting Algorithms on a Standard Dataset. 2015 Fifth International Conference on Communication Systems and Network Technologies, Gwalior, India, pp. 962–967. https://doi.org/10.1109/CSNT.2015.98
[3] Fenyi, A., Fosu, M., & Appiah, B. (2020). Comparative Analysis of Comparison and Non Comparison based Sorting Algorithms. International Journal of Computer Applications, 175, 22–25. https://doi.org/10.5120/ijca2020920813
[4] Buradagunta, S., Bodapati, J. D., Mundukur, N. B., & Salma, S. (2020). Performance comparison of sorting algorithms with random numbers as inputs. Ingénierie des Systèmes d'Information, 25(1), 113–117. https://doi.org/10.18280/isi.250115
[5] Alkharabsheh, K., Alturani, I., Alturani, A., & Zanoon, D. N. (2013). Review on Sorting Algorithms: A Comparative Study. International Journal of Computer Science and Security (IJCSS).
[6] Sundaramoorthy, S., & Karunanidhi, G. (2025). A systematic analysis on performance and computational complexity of sorting algorithms. Discover Computing, 28. https://doi.org/10.1007/s10791-025-09724-w
[7] Zhu, Z. G. (2020). Analysis and Research of Sorting Algorithm in Data Structure Based on C Language. Journal of Physics: Conference Series, 1544, 012002. https://doi.org/10.1088/1742-6596/1544/1/012002
[8] Mohammadagha, M. (2025). Hybridization and Optimization Modeling, Analysis, and Comparative Study of Sorting Algorithms: Adaptive Techniques, Parallelization, for Mergesort, Heapsort, Quicksort, Insertion Sort, Selection Sort, and Bubble Sort. https://doi.org/10.31224/4537
[9] Li, X., Zhou, L., & Zhu, Y. (2025). A Scalable Sorting Network Based on Hybrid Algorithms for Accelerating Data Sorting. Electronics, 14, 579. https://doi.org/10.3390/electronics14030579
[10] Bonala Purushotham, Karee Manish, Vankala Bhanu Prakash, & Sudhir Dakey. FPGA Based 64-Bit True Random Number Generator.
[11] Poojari, A., & Nagesh, H. R. (2021). FPGA implementation of random number generator using LFSR and scrambling algorithm for lightweight cryptography. International Journal of Applied Science and Engineering, 18, 1–9. https://doi.org/10.6703/IJASE.202112_18(6).001
[12] Karataş, O., & Ergün, S. (2022). A Digital Random Number Generator Based on Four Regional Examination of Double Scroll Chaos. 2022 IEEE 13th Latin America Symposium on Circuits and System (LASCAS), Puerto Varas, Chile, pp. 1–4. https://doi.org/10.1109/LASCAS53948.2022.9789090
[13] Vijayaraghavan, P., & Amutha, R. (2025). ASIC vs FPGA Realization of a Cryptographically Secure Random Number Generator Using Chaotic Map. 2025 IEEE 5th International Conference on VLSI Systems, Architecture, Technology and Applications (VLSI SATA), Bangalore, India, pp. 1–6. https://doi.org/10.1109/VLSISATA65374.2025.11070201
[14] Zhang, Z. (2023). Optimization of Asynchronous FIFO Design Difficulties Using Verilog HDL. Highlights in Science, Engineering and Technology, 38, 956–964. https://doi.org/10.54097/hset.v38i.5982
[15] Xu, Y. (2023). Asynchronous FIFO Design Based on Verilog. Highlights in Science, Engineering and Technology, 38, 965–970. https://doi.org/10.54097/hset.v38i.5983
[16] Sklyarov, V., Skliarova, I., & Sudnitson, A. (2014). FPGA-based Accelerators for Parallel Data Sort. Applied Computer Systems, 16. https://doi.org/10.1515/acss-2014-0013
[17] Marcelino, R., Neto, H., & Cardoso, J. (2008). Sorting units for FPGA-Based embedded systems. IFIP International Federation for Information Processing, 271, 11–22. https://doi.org/10.1007/978-0-387-09661-2_2
[18] Ben Jmaa, Y., Atitallah, R., Duvivier, D., & Jemaa, M. (2019). A Comparative Study of Sorting Algorithms with FPGA Acceleration by High Level Synthesis. Computación y Sistemas, 23. https://doi.org/10.13053/cys-23-1-2999
[19] Kobayashi, R., Miura, K., Fujita, N., Boku, T., & Amagasa, T. (2022). An Open-source FPGA Library for Data Sorting. Journal of Information Processing, 30, 766–777. https://doi.org/10.2197/ipsjjip.30.766
[20] Norollah, A., Kazemi, Z., & Beitollahi, H. (2019). An Efficient Sorting Architecture for Area and Energy Constrained Edge Computing Devices. 2019 International Conference on High Performance Computing & Simulation (HPCS), Dublin, Ireland, pp. 455–462. https://doi.org/10.1109/HPCS48598.2019.9188237
[21] Reddy, P. S., Prashanth, A., & Bachu, S. (2024). An FPGA based Scheme for Real-Time Max/Min-Set-Selection Sorters. 2024 1st International Conference on Cognitive, Green and Ubiquitous Computing (IC-CGU), Bhubaneswar, India, pp. 1–5. https://doi.org/10.1109/IC-CGU58078.2024.10530671
[22] Petrut, P. C., Amaricai, A., & Boncalo, O. (2016). Configurable FPGA architecture for hardware-software merge sorting. 2016 MIXDES – 23rd International Conference Mixed Design of Integrated Circuits and Systems, Lodz, Poland, pp. 179–182. https://doi.org/10.1109/MIXDES.2016.7529727
[23] Preethi, P., Ulla, M., Sapna, R., Devadas, R., Pavithra, N., & Manasa, C. M. (2025). Designing Low-Power Hardware Merge Sorters Using Clock Gating for IoT Applications, pp. 234–238. https://doi.org/10.1109/INCIP64058.2025.11019960
[24] Jiang, W., Cao, P., & Wu, Y. (2024). Efficient Data Packet Sorting Method Based on Field-Programmable Gate Arrays On-Chip Random Access Memory. 2024 4th Asia-Pacific Conference on Communications Technology and Computer Science (ACCTCS), Shenyang, China, pp. 247–251. https://doi.org/10.1109/ACCTCS61748.2024.00051
[25] Lipu, A. R. (2016). Exploiting parallelism for faster implementation of Bubble sort algorithm using FPGA. 2nd International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE), pp. 1–4.
[26] Abdelrasoul, M., Shaban, A. S., & Abdel-Kader, H. (2021). FPGA Based Hardware Accelerator for Sorting Data. 2021 9th International Japan-Africa Conference on Electronics, Communications, and Computations (JAC-ECC), Alexandria, Egypt, pp. 57–60. https://doi.org/10.1109/JAC-ECC54461.2021.9691432
[27] Norollah, A., Kazemi, Z., Beitollahi, H., & Hély, D. (2022). Hardware Support for Efficient and Low-Power Data Sorting in Massive Data Application: The 3-D Sorting Method. IEEE Consumer Electronics Magazine, 11(3), 87–94. https://doi.org/10.1109/MCE.2021.3076979
[28] Zhao, B., Li, Y., Wang, Y., & Yang, H. (2017). Streaming sorting network based BWT acceleration on FPGA for lossless compression. 2017 International Conference on Field Programmable Technology (ICFPT), Melbourne, VIC, Australia, pp. 247–250. https://doi.org/10.1109/FPT.2017.8280152
[29] Long, Z., & Zhang, Z. (2017). FPGA-Based Collaborative Hardware Sorting Unit for Embedded Data Processing System. 2017 10th International Conference on Intelligent Computation Technology and Automation (ICICTA), Changsha, China, pp. 65–69. https://doi.org/10.1109/ICICTA.2017.65
[30] Wang, Y., Han, Y., Chen, J., Wang, Z., & Zhong, Y. (2023). An FPGA-Based Hardware Low-Cost, Low-Consumption Target-Recognition and Sorting System. World Electric Vehicle Journal, 14, 245. https://doi.org/10.3390/wevj14090245

Additional Declarations

No competing interests reported.
© Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-9347937","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":619116791,"identity":"3e81a7ff-cf2c-498b-8177-38c885da3ab4","order_by":0,"name":"Annapoorna A","email":"","orcid":"","institution":"B.M.S. College of Engineering","correspondingAuthor":false,"prefix":"","firstName":"Annapoorna","middleName":"","lastName":"A","suffix":""},{"id":619116792,"identity":"ea056056-e84a-4368-8805-d6480b8cdbe5","order_by":1,"name":"Dhanush H A","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABAUlEQVRIiWNgGAWjYDCCG2DyABAzH3zwsQEuLkGMFrZkw5kkauExk+ZtwK0QDvhuNz97zPPnDgM/+wEDadsd2+z52xsYP/xgsMjDpUXyzjFzY962ZwySPQkJxrlnbifOOHOAWbKHQaIYlxaDGwkg9xwGMhgOJOe23U4wkEhgkAb6JRGXIw1upH+T5vlzmMH+BmPDYcu22/YG8g+Yf+PXkmMmzcMGtEWCmbGZse024wYJBja8tkjeyCmTnNt2mEfiTBozYy/YL4ltlj0GuLXw3UjfJvHmz2E5/vbz33/83HEbGGKHD9/4UVGHUwsM8CCxGYGKDQioHwWjYBSMglGAFwAA4HpaZBAszT8AAAAASUVORK5CYII=","orcid":"","institution":"B.M.S.
College of Engineering","correspondingAuthor":true,"prefix":"","firstName":"Dhanush","middleName":"H","lastName":"A","suffix":""},{"id":619116793,"identity":"fddcd531-9197-493d-83d8-15a1b975cc01","order_by":2,"name":"Radha R C","email":"","orcid":"","institution":"B.M.S. College of Engineering","correspondingAuthor":false,"prefix":"","firstName":"Radha","middleName":"R","lastName":"C","suffix":""},{"id":619116794,"identity":"48c203f7-425a-494f-8417-b947a52439a4","order_by":3,"name":"Lavanya R Gejji","email":"","orcid":"","institution":"B.M.S. College of Engineering","correspondingAuthor":false,"prefix":"","firstName":"Lavanya","middleName":"R","lastName":"Gejji","suffix":""},{"id":619116795,"identity":"b60538ac-cc17-4622-8e8c-1d5a404dba7b","order_by":4,"name":"Chethana Mohan","email":"","orcid":"","institution":"B.M.S. College of Engineering","correspondingAuthor":false,"prefix":"","firstName":"Chethana","middleName":"","lastName":"Mohan","suffix":""}],"badges":[],"createdAt":"2026-04-07 16:54:33","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-9347937/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-9347937/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":106401061,"identity":"08162764-062c-436f-9a58-66fbbec0f198","added_by":"auto","created_at":"2026-04-08 08:44:02","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":68167,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003ePseudo Random Number Generator module\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-9347937/v1/2674e03b6a39ef092162c045.png"},{"id":106401002,"identity":"8a1de904-67a3-4c9d-867c-151377692874","added_by":"auto","created_at":"2026-04-08 08:43:42","extension":"png","order_by":2,"title":"Figure 
2","display":"","copyAsset":false,"role":"figure","size":51494,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eAsynchronous FIFO module\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-9347937/v1/7cf4374aef2bf5e13e588c4e.png"},{"id":106401064,"identity":"a9beb90d-374b-4332-8266-cf51dcb466bf","added_by":"auto","created_at":"2026-04-08 08:44:02","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":384845,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eSimulated waveform results for PRNG\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-9347937/v1/c9c558ddbe14808aff24d055.png"},{"id":106401058,"identity":"13e2090c-f898-4a26-acd2-af36f63c823f","added_by":"auto","created_at":"2026-04-08 08:43:59","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":250000,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eAXI-Lite Custom IP (sort_ip_0)\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-9347937/v1/8895385f7e765457c5bb83ab.png"},{"id":106401060,"identity":"4efc4160-199e-4f61-9299-64a7bd59ed0d","added_by":"auto","created_at":"2026-04-08 08:44:00","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":47684,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eAddress Map\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-9347937/v1/dd202d489468fbfc5900bf7b.png"},{"id":106400999,"identity":"0729cb87-654c-4c4a-9345-59eb14134eb0","added_by":"auto","created_at":"2026-04-08 08:43:41","extension":"png","order_by":6,"title":"Figure 
6","display":"","copyAsset":false,"role":"figure","size":349621,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003ePython Interface Output for Insertion Sort on Pynq-Z2\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-9347937/v1/a1939dc26006a343f86f9d62.png"},{"id":106401217,"identity":"33d374aa-9780-45d5-91b9-d267009dfe11","added_by":"auto","created_at":"2026-04-08 08:44:42","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1828275,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9347937/v1/ab2ff47d-5f34-4b83-a485-b3e0adcb8567.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Implementation and performance analysis of sorting algorithms on PYNQ Z2 FPGA board","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eSorting plays a crucial role in fields such as signal processing, database systems, and embedded applications, where efficient data organization is essential for performance. As the volume of data continues to grow, achieving fast and reliable sorting on general-purpose processors becomes increasingly challenging [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. This is mainly due to the sequential execution model of CPUs and their dependence on the operating system for resource allocation [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. Additionally, other running applications compete for the same processor resources, making CPU-based sorting less suitable for time-critical tasks.\u003c/p\u003e \u003cp\u003eTo address these limitations, hardware-based solutions such as Field Programmable Gate Arrays (FPGAs) can be used.
FPGAs offer flexibility by allowing the hardware to be programmed according to specific application requirements. They also support parallel processing, which enables faster execution than traditional CPU-based approaches.\u003c/p\u003e \u003cp\u003eIn this work, multiple sorting algorithms are implemented on an FPGA platform to evaluate their performance in a hardware environment. The study compares the resulting architectures in terms of resource utilization, best-case and worst-case execution cycles, and overall efficiency. This analysis helps identify sorting techniques suited to high-speed and real-time applications according to users' requirements.\u003c/p\u003e"},{"header":"2. Literature survey","content":"\u003cp\u003eExisting research on FPGA-based sorting spans lightweight architectures for edge devices, high-throughput heterogeneous systems, and parallel comparator-based designs tailored for real-time workloads. Norollah, Kazemi, and Beitollahi [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e] present \u003cem\u003eULP Sorter\u003c/em\u003e, an ultra-low-power architecture based on the Multi-Dimensional Sorting Algorithm. Their approach significantly reduces hardware cost, achieving up to 70% fewer LUTs, 35.7% fewer registers, and nearly 48.7% lower power consumption, making it well suited to area- and energy-constrained edge computing platforms.
A related study by the same authors extends this idea using a modular comparator scheme to further reduce switching activity and memory overhead while maintaining stable throughput, reinforcing the suitability of lightweight FPGA sorters for IoT and edge AI workloads.\u003c/p\u003e \u003cp\u003eHigh-throughput sorting for large-scale data is explored by Zhang [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e], Chen [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e], and Prasanna [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e], who propose a heterogeneous CPU-FPGA architecture in which the FPGA sorts sub-blocks through a pipelined Merge Sort Accelerator and the CPU merges the partial results. Their design achieves up to 3.9\u0026times; the throughput of FPGA-only systems, demonstrating the advantage of hybrid platforms in resolving memory bottlenecks while improving scalability and resource efficiency. A complementary hardware-software co-design by Petrut, Amaricai, and Boncalo [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e] follows a similar philosophy. Their parameterized merge sorter supports configurability across dataset sizes and offers improved performance compared to software-only solutions, highlighting the importance of combining FPGA acceleration with flexible software control. Parallelism-driven FPGA sorting also remains a major research direction. Lipu et al. [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e] propose a Bubble Sort architecture built around parallel compare-and-swap units, enabling higher speed and lower latency than software implementations.
Abdelrasoul, Shaban, and Abdel-Kader [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e] also demonstrate that pipelined comparator networks mapped onto FPGA fabric achieve significant gains in throughput and real-time performance compared to CPU-based sorting. These works consistently show that structured parallel hardware delivers deterministic timing and superior efficiency.\u003c/p\u003e \u003cp\u003eSeveral studies investigate domain-specific sorting needs. Zhao et al. [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e] design a streaming sorting network to accelerate the Burrows-Wheeler Transform (BWT) in lossless compression. By processing data in a pipelined streaming fashion, their architecture reduces latency and execution time, offering improved compression performance over conventional CPU methods. Packet-level sorting is addressed by Jiang, Cao, and Wu [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e], who use on-chip RAM in place of external memory to relieve bandwidth constraints. Their design delivers higher throughput and lower latency, making it suitable for networking, particle detection, and other high-speed acquisition systems. Finally, Reddy, Prashanth, and Bachu [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e] propose an FPGA architecture for max/min selection without fully sorting the dataset. Using comparator trees and inherent hardware parallelism, their design provides low-latency, resource-efficient extraction of extremum values, which is useful in real-time embedded and IoT applications where complete sorting is unnecessary.\u003c/p\u003e \u003cp\u003eCollectively, these studies highlight the versatility of FPGA-based sorting architectures, from ultra-low-power edge devices and hybrid CPU-FPGA platforms to domain-specific accelerators and massively parallel comparator-driven designs.
They consistently demonstrate that FPGAs offer deterministic timing, high throughput, and superior energy efficiency compared to traditional CPU-based sorting, motivating the hardware-accelerated approach followed in this project.\u003c/p\u003e"},{"header":"3. Proposed Hardware Based Sorting Framework","content":"\u003cp\u003eThe proposed work develops a complete simulation-driven hardware framework to study the behavior, efficiency, and implementation characteristics of multiple sorting algorithms on FPGA-oriented architectures. At the front end of the system, a pseudorandom number (PRN) generator produces a continuous stream of unbiased input values [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. These values are temporarily stored in an asynchronous FIFO, which acts as a buffering layer to decouple data generation from data processing [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. The FIFO ensures smooth, reliable transfer of data into the sorting modules while maintaining proper timing synchronization, even when the generating and consuming modules operate at different clock frequencies [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eWithin this framework, five distinct sorting techniques (Bubble Sort, Selection Sort, Insertion Sort, Radix Sort, and Quick Sort) are implemented as standalone Verilog modules. Each module retrieves data from the FIFO and executes sorting operations according to its internal hardware structure and algorithmic logic [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. Because the modules are independent, it is possible to examine, in a controlled manner, how each algorithm translates into hardware, how it responds to random input patterns, and how architectural differences influence speed, cycle count, and overall hardware efficiency.
This modular design also simplifies comparative analysis by ensuring that all algorithms operate under identical input conditions.\u003c/p\u003e \u003cp\u003eFunctional simulations validate the correct operation of each subsystem. The PRN generator is tested for continuity, randomness, and stable timing; the FIFO is evaluated for correct read/write behavior, pointer management, and boundary condition handling; and each sorting module is verified by observing the transition from an unsorted sequence to a correctly ordered output [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. After simulation, post-synthesis timing analysis is performed to determine the maximum operating frequency of each algorithm and, from it, to estimate best-case and worst-case execution times. Resource utilization reports covering LUT and flip-flop counts provide insight into the hardware cost of each design. Together, these analyses form a comprehensive comparative study of Bubble Sort, Selection Sort, Insertion Sort, Radix Sort, and Quick Sort in a hardware-centric environment [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. The framework enables early evaluation of algorithm suitability for FPGA implementation, providing clear evidence of which techniques offer the best balance of resource efficiency and structural simplicity before deploying the system to an actual hardware platform.\u003c/p\u003e"},{"header":"4. 
Methodology for Hardware Based Sorting Evaluation","content":"\u003cp\u003eThe methodology adopted in this work follows a systematic RTL design, verification, and analysis workflow that ensures each component of the system is thoroughly tested.\u003c/p\u003e \u003cp\u003eThe first stage of the methodology involves the development of a pseudorandom number (PRN) generator in Verilog, shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. This module supplies the sorting algorithms with a continuous stream of varied input values, whose sequence is determined by the input seed value applied when the generator is triggered. Functional simulations are carried out to verify the correctness of the generated sequences, confirm the absence of repeating patterns within short intervals, and ensure that timing and output stability are maintained across different simulation cycles [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe second stage focuses on the design and verification of a dual-port asynchronous FIFO, shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, which acts as a buffer between the PRN generator and the sorting modules [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. The FIFO cleanly separates data generation from data consumption, ensuring reliable operation even when read and write activity occurs at different rates [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. The FIFO\u0026rsquo;s control logic, including read and write pointer management, empty and full flag generation, and synchronization circuitry, is carefully implemented to prevent overflow and underflow and to mitigate metastability [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e].
Simulation-based testing is used to validate boundary conditions and confirm that the FIFO behaves correctly under varying data transfer scenarios [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eIn the third stage, five sorting algorithms (Bubble Sort, Selection Sort, Insertion Sort, Radix Sort, and Quick Sort) are implemented as synthesizable RTL modules. Each module is designed with a clean, modular interface to support independent development and functional testing [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. Testbenches are created for each sorting algorithm to simulate behavior under controlled random inputs. These functional simulations verify that each module produces correctly sorted outputs and provide insight into the algorithmic flow and hardware mapping of each design [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]. Once functional verification is complete, the fourth stage involves synthesis and timing analysis for each sorting module. Synthesis reports are examined to identify resource utilization in terms of LUT count, flip-flops, memory elements, and combinational logic. Timing analysis highlights critical paths and determines the maximum achievable operating frequency, allowing evaluation of each design\u0026rsquo;s latency and performance characteristics [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. Together, these metrics provide a detailed comparison of hardware efficiency and computational behavior.\u003c/p\u003e \u003cp\u003eIn the final stage, all modules (the PRN generator, asynchronous FIFO, and sorting units) are integrated into a unified system-level test environment.
The complete data flow, from random data generation through buffering to final sorting, is simulated to ensure correct operation across the entire system. System-level validation confirms module interoperability and consistent behavior under continuous random data input. Through this structured, layered methodology, the project achieves a thorough hardware-oriented evaluation of multiple sorting algorithms.\u003c/p\u003e"},{"header":"5. Sorting Algorithms and Performance Characteristics","content":"\u003cp\u003eThis section presents the five sorting algorithms evaluated in the proposed hardware framework. Each algorithm is described in terms of its operating principle and hardware behaviour, followed by its best-case and worst-case clock cycle counts measured in simulation; these counts are summarized in Table 1.\u0026nbsp;\u003c/p\u003e\n\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 1\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eBest-Case and Worst-Case Clock Cycle Counts\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eAlgorithm\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eClock Cycles (Best Case)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003eClock Cycles (Worst Case)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003eRemarks\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eBubble Sort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e38\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd
align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003eSimple but inefficient for large datasets; stable and in place.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eInsertion Sort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e24\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e52\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003eEfficient for small or nearly sorted data; stable and in place.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eQuick Sort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e48\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e70\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003eVery fast on average; unstable; recursive and highly parallelizable.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eSelection Sort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e59\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e59\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003eSimple; predictable comparison count; minimal data movement.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eRadix Sort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e34\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"
char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e34\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003eFast and stable, but needs extra memory and slows as the number of digits grows.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eBubble Sort\u003c/em\u003e\u003c/strong\u003e\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eBubble Sort repeatedly compares adjacent elements and swaps them when they are in the wrong order. With each pass, the largest remaining element moves to the end of the unsorted region, giving the algorithm its characteristic “bubbling” behaviour [1]. Although simple to design in hardware, Bubble Sort performs many redundant comparisons, which makes it inefficient for larger datasets. In simulation, Bubble Sort requires 10 clock cycles in the best case and 38 clock cycles in the worst case. While it is stable and in place, its high comparison count limits its suitability for high-performance hardware sorting [1, 2].
Table 2 shows that the Bubble Sort design uses only a very small portion of the FPGA resources, with LUTs, registers, and multiplexers all well under one percent of the device’s capacity.\u0026nbsp;\u003c/p\u003e\n\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 2\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eBubble Sort Resource Utilization Report\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eResource Type\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eUsed\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003eAvailable\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003eUtilization %\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eSlice LUTs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e156\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e53200\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.29\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eLUT as Logic\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e156\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e53200\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.29\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n
\u003cp\u003eLUT as Memory\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e17400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.00\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eSlice Registers\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e142\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e106400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.13\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eRegisters as Flip-Flops\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e142\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e106400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.13\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eRegisters as Latches\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e106400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.00\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eF7 Muxes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\"
colname=\"c2\"\u003e\n \u003cp\u003e16\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e26600\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.06\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eF8 Muxes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e13300\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.00\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cstrong\u003eInsertion Sort\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eInsertion Sort constructs the sorted list incrementally. Each new element from the unsorted portion is inserted into its correct position within the sorted region. Because each insertion stops as soon as a smaller or equal element is found, few comparisons are needed when the input is nearly ordered, so Insertion Sort performs efficiently on small or partially sorted datasets [2, 3]. The hardware implementation recorded 24 clock cycles in the best case and 52 clock cycles in the worst case. Its stable, in-place nature and predictable behaviour make it suitable for lightweight hardware deployments.
Table 3 shows that the Insertion Sort design also occupies a very small share of the FPGA’s resources, with minimal LUT and register usage and no memory or multiplexer utilization.\u0026nbsp;\u003c/p\u003e\n\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 3\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eInsertion Sort Resource Utilization Report\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eResource Type\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eUsed\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003eAvailable\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003eUtilization %\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eSlice LUTs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e16\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e53200\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.03\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eLUT as Logic\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e16\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e53200\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.03\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n
\u003cp\u003eLUT as Memory\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e17400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.00\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eSlice Registers\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e144\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e106400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.14\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eRegisters as Flipflop\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e144\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e106400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.14\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eRegisters as Latches\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e106400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.00\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eF7 Muxes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" 
colname=\"c2\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e26600\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.00\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eF8 Muxes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e13300\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.00\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eQuick Sort\u003c/em\u003e\u003c/strong\u003e\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eQuick Sort follows a divide and conquer strategy by selecting a pivot, partitioning the dataset into two groups, and recursively sorting each partition. The algorithm generally delivers strong performance, but its efficiency depends on pivot selection [2–4]. In hardware simulations, Quick Sort completed in 48 clock cycles in the best case and 70 clock cycles in the worst case. Although it is not stable, its parallelizable structure makes it attractive for high performance FPGA implementations. 
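\u003c/p\u003e
\u003cp\u003eFPGA logic has no call stack. A common way to map Quick Sort onto hardware is therefore to replace recursion with an explicit stack of (low, high) index pairs held in registers or on-chip memory. The Python sketch below shows that formulation. The Lomuto partition scheme, the last-element pivot, and the function name are assumptions chosen for brevity and need not match the design evaluated in this paper.\u003c/p\u003e

```python
def quick_sort_iterative(values):
    """Return a sorted copy of `values` using Quick Sort with an explicit stack instead of recursion."""
    a = list(values)
    stack = [(0, len(a) - 1)]  # pending partitions as inclusive (low, high) bounds
    while stack:
        low, high = stack.pop()
        if low >= high:
            continue  # zero or one element: nothing to do
        pivot = a[high]  # Lomuto scheme: last element of the range is the pivot
        i = low  # a[low..i-1] holds elements <= pivot
        for j in range(low, high):
            if a[j] <= pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[high] = a[high], a[i]  # move the pivot to its final position
        # Both sides still need sorting; the pivot at index i is already in place.
        stack.append((low, i - 1))
        stack.append((i + 1, high))
    return a


if __name__ == "__main__":
    print(quick_sort_iterative([5, 3, 8, 1, 9, 2, 7, 4]))  # [1, 2, 3, 4, 5, 7, 8, 9]
```

\u003cp\u003eIn a hardware version, the maximum depth this stack reaches determines how much storage must be reserved for pending ranges. The long-distance swaps inside the partition loop are what make the algorithm unstable.\u003c/p\u003e
\u003cp\u003e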
Table 4 shows that Quick Sort uses noticeably more FPGA resources than Bubble Sort and Insertion Sort (1448 LUTs and 880 flip-flops), while still occupying less than 3% of the device’s LUTs.\u0026nbsp;\u003c/p\u003e\n\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 4\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eQuick sort Resource Utilization report\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eResource Type\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eUsed\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003eAvailable\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003eUtilization %\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eSlice LUTs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e1448\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e53200\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e2.72\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eLUT as logic\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e1448\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e53200\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e2.72\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" 
colname=\"c1\"\u003e\n \u003cp\u003eLUT as Memory\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e17400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.00\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eSlice Registers\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e880\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e106400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.83\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eRegisters as Flip-flops\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e880\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e106400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.83\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eRegisters as Latches\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e106400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.00\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eRadix 
Sort\u003c/em\u003e\u003c/strong\u003e\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eRadix Sort processes keys digit by digit, beginning with the least significant digit. Because it groups data by digit value instead of comparing elements, it achieves high and predictable throughput on integer datasets [3, 4]. Since the number of digit passes is fixed by the key width, the best-case and worst-case latencies are identical at 34 clock cycles. Radix Sort is fast and stable, but it needs auxiliary storage for redistributing elements by digit and slows down as the number of digit positions grows [1]. Table 5 shows that Radix Sort requires more resources than Bubble Sort and Insertion Sort but fewer than Quick Sort, using 869 LUTs and 263 flip-flops, well within the FPGA’s limits.\u0026nbsp;\u003c/p\u003e\n\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 5\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eRadix sort Resource Utilization report\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eResource Type\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eUsed\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003eAvailable\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003eUtilization %\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eSlice LUTs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e869\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e53200\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" 
char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e1.63\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eLUT as logic\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e869\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e53200\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e1.63\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eLUT as Memory\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e17400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.00\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eSlice Registers\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e263\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e106400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.25\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eRegisters as Flip-flops\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e263\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e106400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.25\u003c/p\u003e\n \u003c/td\u003e\n 
\u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eRegisters as Latches\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e106400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.00\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eF7 Muxes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e26600\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.02\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eF8 Muxes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e13300\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.00\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cstrong\u003eSelection Sort\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSelection Sort repeatedly identifies the smallest element in the unsorted region and swaps it to the front of that region. Because it performs the same number of comparisons, n(n-1)/2 for n elements, regardless of input order, its behaviour is highly predictable [8]. Both the best-case and worst-case latencies are 59 clock cycles. 
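\u003c/p\u003e
\u003cp\u003eThe input-independent comparison count can be checked with a short Python model. For n elements the nested scan always performs n(n-1)/2 comparisons, while only the number of swaps depends on the data. The function and its counters are illustrative assumptions. The step counts are not the clock-cycle figures measured in hardware.\u003c/p\u003e

```python
def selection_sort(values):
    """Return (sorted_list, comparisons, swaps) for selection sort applied to a copy of `values`."""
    a = list(values)
    n = len(a)
    comparisons = swaps = 0
    for i in range(n - 1):
        min_idx = i
        # Scan the unsorted region a[i+1:] for its smallest element.
        for j in range(i + 1, n):
            comparisons += 1
            if a[j] < a[min_idx]:
                min_idx = j
        if min_idx != i:  # at most one swap per pass
            a[i], a[min_idx] = a[min_idx], a[i]
            swaps += 1
    return a, comparisons, swaps


if __name__ == "__main__":
    print(selection_sort([1, 2, 3, 4, 5, 6, 7, 8]))  # ([1, 2, 3, 4, 5, 6, 7, 8], 28, 0)
    print(selection_sort([8, 7, 6, 5, 4, 3, 2, 1]))  # ([1, 2, 3, 4, 5, 6, 7, 8], 28, 4)
```

\u003cp\u003eBoth inputs need 28 comparisons, consistent with the identical best-case and worst-case latencies reported for the hardware design, while the swap count changes from 0 to 4.\u003c/p\u003e
\u003cp\u003e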
Despite its simplicity and minimal data movement (at most n-1 swaps), Selection Sort is inefficient for large datasets because of its quadratic time complexity [9]. Table 6 shows that Selection Sort uses a moderate number of LUTs and registers (355 LUTs and 228 flip-flops) compared with the simpler algorithms, but its overall utilization remains below 1% of the FPGA’s available resources.\u0026nbsp;\u003c/p\u003e\n\u003ctable float=\"Yes\" id=\"Tab6\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 6\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eSelection sort Resource Utilization report\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eResource Type\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eUsed\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003eAvailable\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003eUtilization %\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eSlice LUTs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e355\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e53200\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.67\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eLUT as logic\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e355\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e53200\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" 
char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.67\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eLUT as Memory\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e17400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.00\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eSlice Registers\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e228\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e106400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.21\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eRegisters as Flip-flops\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e228\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e106400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.21\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eRegisters as Latches\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e106400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.00\u003c/p\u003e\n \u003c/td\u003e\n 
\u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eF7 Muxes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e26600\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.00\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eF8 Muxes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e13300\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.00\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cstrong\u003eComparison of FPGA Resource Usage with Related Literature\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo evaluate the efficiency of the proposed sorting architectures, their resource usage was compared with results reported in the literature. The comparison focuses on Look-Up Tables (LUTs) and flip-flops (FFs), because these counts directly reflect the hardware cost, complexity, and scalability of each design on an FPGA. LUTs implement the combinational logic required for datapath and control operations, while flip-flops implement sequential elements such as registers, pipeline stages, and finite state machines [23]. For the three algorithms for which the reference work [26] reports results (Bubble Sort, Selection Sort, and Insertion Sort), the architectures developed in this work require fewer combined LUTs and FFs. 
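\u003c/p\u003e
\u003cp\u003eThe relative savings implied by Table 7 can be computed directly from its entries, as in the short Python sketch below. The percentage is the reduction in the combined LUT + FF count relative to the figure reported in [26]. Only rows for which [26] gives a value are included, and the dictionary and function names are illustrative.\u003c/p\u003e

```python
# Combined LUT + FF counts from Table 7 as (this work, reference [26]).
TABLE7 = {
    "Bubble sort vs. serial design": (298, 723),
    "Bubble sort vs. parallel design": (298, 672),
    "Selection sort": (583, 730),
    "Insertion sort": (435, 802),
}


def reduction_pct(ours, reference):
    """Percentage by which `ours` is smaller than `reference`."""
    return 100.0 * (reference - ours) / reference


if __name__ == "__main__":
    for name, (ours, ref) in TABLE7.items():
        print(f"{name}: {reduction_pct(ours, ref):.1f}% fewer LUTs + FFs")
```

\u003cp\u003eThis gives reductions of about 58.8% and 55.7% for Bubble Sort (against the serial and parallel designs of [26]), 20.1% for Selection Sort, and 45.8% for Insertion Sort.\u003c/p\u003e
\u003cp\u003e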
This improvement is attributed to streamlined control logic, efficient dataflow organization, and fewer shifting and swapping operations in the hardware implementation. Table 7 lists the combined LUT and FF counts of the proposed designs alongside those reported in the reference work [26], which does not report figures for Quick Sort or Radix Sort. \u0026nbsp;\u003c/p\u003e\n\u003ctable float=\"Yes\" id=\"Tab7\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 7\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eResource Utilization of Proposed Sorting Architectures Compared with [26]\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eSorting Algorithm\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eLUTs + FFs (this work)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003eLUTs + FFs [26]\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eBubble sort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e298\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003eSerial: 723; Parallel: 672\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eSelection sort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e583\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003e730\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eInsertion 
sort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e435\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003e802\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eQuick sort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e2328\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003eNot reported\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eRadix sort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e1132\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003eNot reported\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e"},{"header":"6. Experimental Results","content":"\u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab8\" class=\"InternalRef\"\u003e8\u003c/span\u003e shows that the PRNG design uses only a tiny portion of the FPGA\u0026rsquo;s resources: 16 LUTs (0.03%) and 144 flip-flops (0.14%), with no LUT memory or multiplexers. The design is therefore lightweight and leaves ample room for additional modules. 
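\u003c/p\u003e
\u003cp\u003eEach Utilization % entry in the resource tables is the used count divided by the count available on the device, multiplied by 100 and rounded to two decimals. The small Python helper below, whose name is illustrative, reproduces the PRNG figures of Table 8.\u003c/p\u003e

```python
def utilization_pct(used, available):
    """Share of a device resource in use, in percent, rounded to two decimals as in the tables."""
    if available <= 0:
        raise ValueError("available must be positive")
    return round(100.0 * used / available, 2)


# PRNG entries from Table 8: resource -> (used, available)
PRNG = {
    "Slice LUTs": (16, 53200),
    "Slice Registers": (144, 106400),
    "F7 Muxes": (0, 26600),
}

if __name__ == "__main__":
    for resource, (used, available) in PRNG.items():
        print(f"{resource}: {utilization_pct(used, available):.2f}%")
```

\u003cp\u003eThe same helper returns 2.72 for the 1448 LUTs of Quick Sort in Table 4 and 0.25 for the 263 flip-flops of Radix Sort in Table 5.\u003c/p\u003e
\u003cp\u003e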
Figure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e shows the output waveform of the PRNG module.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab8\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 8\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003ePRNG Resource Utilization\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eResource Type\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUsed\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAvailable\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eUtilization %\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSlice LUTs\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e53200\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.03\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLUT as logic\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e53200\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.03\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLUT as Memory\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e17400\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSlice Registers\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e144\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e106400\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.14\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRegisters as Flip-flops\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e144\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e106400\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.14\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRegisters as Latches\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e106400\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eF7 Muxes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e26600\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eF8 Muxes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e13300\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e"},{"header":"7. Implementation of a Standalone Sorting Module on PYNQ-Z2","content":"\u003cp\u003eThe sorting module from the main RTL design was packaged as a custom memory-mapped AXI-Lite IP. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e, the sorting logic is connected directly to the AXI-Lite interface, which allows the processor to trigger the sort operation, provide input data, and read the final output. This approach exposes only the required sorting functionality to the processing system (PS), keeping the design simple.\u003c/p\u003e \u003cp\u003eThe IP contains six 32-bit registers mapped within the address range 0x43C0_0000\u0026ndash;0x43C0_FFFF, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e. 
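\u003c/p\u003e
\u003cp\u003eThe handshake described in this section can be sketched as a small Python driver. In that handshake, the operand is written to slv_reg1 and slv_reg2, a pulse on slv_reg0 starts the sort, the sdone bit in slv_reg3 is polled, and the result is read from slv_reg4 and slv_reg5. This is a hedged sketch rather than the script used by the authors. The following are all assumptions: the 4-byte register offsets, the position of sdone at bit 0, the packing of eight 8-bit keys with element 0 in the least significant byte, and the names hw_sort and SimulatedSortIP. The driver only needs an object with read(offset) and write(offset, value) methods, which pynq.MMIO provides; the simulated IP lets the logic run without a board.\u003c/p\u003e

```python
import time

# Byte offsets of slv_reg0..slv_reg5, assuming the usual 4-byte AXI-Lite register spacing.
START, DATA_LO, DATA_HI, STATUS, RESULT_LO, RESULT_HI = 0x00, 0x04, 0x08, 0x0C, 0x10, 0x14
SDONE_MASK = 0x1           # assumption: sdone is bit 0 of slv_reg3
N_ELEMS, ELEM_BITS = 8, 8  # assumption: eight 8-bit keys in the 64-bit word


def pack(values):
    """Pack the keys into one 64-bit integer, element 0 in the least significant bits."""
    if len(values) != N_ELEMS or any(not 0 <= v < (1 << ELEM_BITS) for v in values):
        raise ValueError("expected eight integers in the range 0..255")
    word = 0
    for i, v in enumerate(values):
        word |= v << (ELEM_BITS * i)
    return word


def unpack(word):
    """Split a 64-bit integer back into keys (inverse of pack)."""
    mask = (1 << ELEM_BITS) - 1
    return [(word >> (ELEM_BITS * i)) & mask for i in range(N_ELEMS)]


def hw_sort(mmio, values, timeout_s=1.0):
    """Sort `values` on the IP via any object with read(offset) and write(offset, value)."""
    word = pack(values)
    mmio.write(DATA_LO, word & 0xFFFFFFFF)          # lower 32 bits
    mmio.write(DATA_HI, (word >> 32) & 0xFFFFFFFF)  # upper 32 bits
    mmio.write(START, 1)  # start pulse: set ...
    mmio.write(START, 0)  # ... then clear
    deadline = time.monotonic() + timeout_s
    while not (mmio.read(STATUS) & SDONE_MASK):  # poll sdone
        if time.monotonic() > deadline:
            raise TimeoutError("sdone was not asserted in time")
    lo = mmio.read(RESULT_LO)
    hi = mmio.read(RESULT_HI)
    return unpack((hi << 32) | lo)


class SimulatedSortIP:
    """Software stand-in for the IP so the driver can be exercised without a board."""

    def __init__(self):
        self.regs = dict.fromkeys((START, DATA_LO, DATA_HI, STATUS, RESULT_LO, RESULT_HI), 0)

    def write(self, offset, value):
        self.regs[offset] = value
        if offset == START and value == 1:  # emulate one ascending sort per start pulse
            word = (self.regs[DATA_HI] << 32) | self.regs[DATA_LO]
            result = pack(sorted(unpack(word)))
            self.regs[RESULT_LO] = result & 0xFFFFFFFF
            self.regs[RESULT_HI] = result >> 32
            self.regs[STATUS] = SDONE_MASK

    def read(self, offset):
        return self.regs[offset]


if __name__ == "__main__":
    print(hw_sort(SimulatedSortIP(), [7, 3, 9, 1, 4, 8, 2, 6]))  # [1, 2, 3, 4, 6, 7, 8, 9]
    # On the board (untested assumption about this overlay):
    # from pynq import MMIO
    # print(hw_sort(MMIO(0x43C00000, 0x10000), [7, 3, 9, 1, 4, 8, 2, 6]))
```

\u003cp\u003eKeeping register access behind a plain read/write interface lets the same driver run against the simulated IP during development. On the PYNQ-Z2 it would instead run against pynq.MMIO, using the 0x43C0_0000 base address given above and the 0x10000-byte range that covers 0x43C0_0000 to 0x43C0_FFFF.\u003c/p\u003e
\u003cp\u003e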
slv_reg0 triggers sorting, slv_reg1 and slv_reg2 store the 64-bit input data, slv_reg3 reports completion via the \u003cem\u003esdone\u003c/em\u003e bit, and slv_reg4 and slv_reg5 hold the sorted result. This mapping provides a clean and accessible structure for software-controlled sorting.\u003c/p\u003e \u003cp\u003eAfter synthesis, the full design was programmed onto the PYNQ-Z2 board as a custom overlay. In the PYNQ framework, the overlay allows the ARM processor to interact with the FPGA logic through Python. Using the \u003cspan fontcategory=\"NonProportional\" class=\"\" name=\"Emphasis\"\u003epynq.MMIO\u003c/span\u003e class, the processor accessed the sorting IP at base address 0x43C0_0000 with a memory range of 0x10000. The Python script accepted user input, packed the 64-bit data into slv_reg1 and slv_reg2, generated a pulse on slv_reg0 to start the sorting, and continuously polled the \u003cem\u003esdone\u003c/em\u003e bit in slv_reg3 to detect completion. Once the sorting finished, the script read the lower and upper 32-bit outputs from slv_reg4 and slv_reg5 and unpacked them to display the final sorted sequence as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e. This workflow provided a seamless bridge between Python software and the hardware accelerator running on the FPGA.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"8. Conclusion","content":"\u003cp\u003eThis work examined the behaviour and FPGA suitability of several well-known sorting algorithms\u0026mdash;including bubble sort, insertion sort, selection sort, quick sort, and radix sort\u0026mdash;without running them in real time. The analysis showed that simple comparison-based methods like bubble and selection sort are easy to map onto hardware but scale poorly, while insertion sort performs better for nearly sorted inputs. 
Quick sort, despite its software efficiency, is difficult to implement on FPGA due to recursion and irregular memory access patterns. Radix sort stood out as the most hardware-friendly approach, as its digit-wise, non-comparative nature aligns well with parallel processing and deterministic pipeline structures in FPGAs.\u003c/p\u003e \u003cp\u003eThe implementation of the custom sorting IP on the PYNQ-Z2 platform demonstrated an effective hardware\u0026ndash;software co-design flow. Using an AXI-Lite interface and the PYNQ overlay, the ARM processor communicated seamlessly with the FPGA logic through Python and MMIO access. Deploying the sorting module as a standalone IP validated the functionality on actual hardware while providing a flexible foundation for further optimization and future development of more advanced data-processing accelerators.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eA- Annapoorna AD- Dhanush H AR- Radha R CL- Lavanya R GejjiC- Chethana MohanA.D. and D.R. designed the hardware architecture and contributed to writing the results and observations sections of the manuscript. L.C. performed FPGA integration and contributed to writing the results and observations. R.L. reviewed and edited the manuscript for technical accuracy and clarity. All authors contributed to the final evaluation of the work and approved the manuscript for submission.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eVijay, R., Jha, L., \u0026amp; Gupta, G. Performance and analysis of sorting algorithms for random data input, 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), Dhaka, Bangladesh, 2019, pp. 1\u0026ndash;6. 
   https://doi.org/10.1109/ICASERT.2019.8934789
2. Faujdar, N., & Ghrera, S. P. (2015). Analysis and Testing of Sorting Algorithms on a Standard Dataset. 2015 Fifth International Conference on Communication Systems and Network Technologies, Gwalior, India, pp. 962–967.
   https://doi.org/10.1109/CSNT.2015.98
3. Fenyi, A., Fosu, M., & Appiah, B. (2020). Comparative Analysis of Comparison and Non Comparison based Sorting Algorithms. International Journal of Computer Applications, 175, 22–25.
   https://doi.org/10.5120/ijca2020920813
4. Buradagunta, S., Bodapati, J. D., Mundukur, N. B., & Salma, S. (2020). Performance comparison of sorting algorithms with random numbers as inputs. Ingénierie des Systèmes d’Information, 25(1), 113–117.
   https://doi.org/10.18280/isi.250115
5. Alkharabsheh, K., Alturani, I., Alturani, A., & Zanoon, D. N. (2013). Review on Sorting Algorithms: A Comparative Study. International Journal of Computer Science and Security (IJCSS).
6. Sundaramoorthy, S., & Karunanidhi, G. (2025). A systematic analysis on performance and computational complexity of sorting algorithms. Discover Computing, 28.
   https://doi.org/10.1007/s10791-025-09724-w
7. Zhu, Z. G. (2020). Analysis and Research of Sorting Algorithm in Data Structure Based on C Language. Journal of Physics: Conference Series, 1544, 012002.
   https://doi.org/10.1088/1742-6596/1544/1/012002
8. Mohammadagha, M. (2025). Hybridization and Optimization Modeling, Analysis, and Comparative Study of Sorting Algorithms: Adaptive Techniques, Parallelization, for Mergesort, Heapsort, Quicksort, Insertion Sort, Selection Sort, and Bubble Sort.
   https://doi.org/10.31224/4537
9. Li, X., Zhou, L., & Zhu, Y. A. (2025). Scalable Sorting Network Based on Hybrid Algorithms for Accelerating Data Sorting. Electronics, 14, 579.
   https://doi.org/10.3390/electronics14030579
10. Bonala Purushotham Karee Manish, Vankala Bhanu Prakash, & Sudhir Dakey. FPGA Based 64-Bit True Random Number Generator.
11. Poojari, A., & Nagesh, H. R. (2021). FPGA implementation of random number generator using LFSR and scrambling algorithm for lightweight cryptography. International Journal of Applied Science and Engineering, 18, 1–9.
   https://doi.org/10.6703/IJASE.202112_18(6).001
12. Karataş, O., & Ergün, S. (2022). A Digital Random Number Generator Based on Four Regional Examination of Double Scroll Chaos. 2022 IEEE 13th Latin America Symposium on Circuits and System (LASCAS), Puerto Varas, Chile, pp. 1–4.
   https://doi.org/10.1109/LASCAS53948.2022.9789090
13. Vijayaraghavan, P., & Amutha, R. (2025). ASIC vs FPGA Realization of a Cryptographically Secure Random Number Generator Using Chaotic Map. 2025 IEEE 5th International Conference on VLSI Systems, Architecture, Technology and Applications (VLSI SATA), Bangalore, India, pp. 1–6.
   https://doi.org/10.1109/VLSISATA65374.2025.11070201
14. Zhang, Z. (2023). Optimization of Asynchronous FIFO Design Difficulties Using Verilog HDL. Highlights in Science, Engineering and Technology, 38, 956–964.
   https://doi.org/10.54097/hset.v38i.5982
15. Xu, Y. (2023). Asynchronous FIFO Design Based on Verilog. Highlights in Science, Engineering and Technology, 38, 965–970.
   https://doi.org/10.54097/hset.v38i.5983
16. Sklyarov, V., Skliarova, I., & Sudnitson, A. (2014). FPGA-based Accelerators for Parallel Data Sort. Applied Computer Systems, 16.
   https://doi.org/10.1515/acss-2014-0013
17. Marcelino, R., Neto, H., & Cardoso, J. (2008). Sorting units for FPGA-Based embedded systems. IFIP International Federation for Information Processing, 271, 11–22.
   https://doi.org/10.1007/978-0-387-09661-2_2
18. Ben Jmaa, Y., Atitallah, R., Duvivier, D., & Jemaa, M. (2019). A Comparative Study of Sorting Algorithms with FPGA Acceleration by High Level Synthesis. Computación y Sistemas, 23.
   https://doi.org/10.13053/cys-23-1-2999
19. Kobayashi, R., Miura, K., Fujita, N., Boku, T., & Amagasa, T. (2022). An Open-source FPGA Library for Data Sorting. Journal of Information Processing, 30, 766–777.
   https://doi.org/10.2197/ipsjjip.30.766
20. Norollah, A., Kazemi, Z., & Beitollahi, H. (2019). An Efficient Sorting Architecture for Area and Energy Constrained Edge Computing Devices. 2019 International Conference on High Performance Computing & Simulation (HPCS), Dublin, Ireland, pp. 455–462.
   https://doi.org/10.1109/HPCS48598.2019.9188237
21. Reddy, P. S., Prashanth, A., & Bachu, S. (2024). An FPGA based Scheme for Real-Time Max/Min-Set-Selection Sorters. 2024 1st International Conference on Cognitive, Green and Ubiquitous Computing (IC-CGU), Bhubaneswar, India, pp. 1–5.
   https://doi.org/10.1109/IC-CGU58078.2024.10530671
22. Petrut, P. C., Amaricai, A., & Boncalo, O. (2016). Configurable FPGA architecture for hardware-software merge sorting. 2016 MIXDES – 23rd International Conference Mixed Design of Integrated Circuits and Systems, Lodz, Poland, pp. 179–182.
   https://doi.org/10.1109/MIXDES.2016.7529727
23. Preethi, P., Ulla, M., Sapna, R., Devadas, R., Pavithra, N., & Manasa, C. M. (2025). Designing Low-Power Hardware Merge Sorters Using Clock Gating for IoT Applications, pp. 234–238.
   https://doi.org/10.1109/INCIP64058.2025.11019960
24. Jiang, W., Cao, P., & Wu, Y. (2024). Efficient Data Packet Sorting Method Based on Field-Programmable Gate Arrays On-Chip Random Access Memory. 2024 4th Asia-Pacific Conference on Communications Technology and Computer Science (ACCTCS), Shenyang, China, pp. 247–251.
   https://doi.org/10.1109/ACCTCS61748.2024.00051
25. Lipu, A. R. (2016). Exploiting parallelism for faster implementation of Bubble sort algorithm using FPGA. 2nd International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE), pp. 1–4.
26. Abdelrasoul, M., Shaban, A. S., & Abdel-Kader, H. (2021). FPGA Based Hardware Accelerator for Sorting Data. 2021 9th International Japan-Africa Conference on Electronics, Communications, and Computations (JAC-ECC), Alexandria, Egypt, pp. 57–60.
   https://doi.org/10.1109/JAC-ECC54461.2021.9691432
27. Norollah, A., Kazemi, Z., Beitollahi, H., & Hély, D. (2022). Hardware Support for Efficient and Low-Power Data Sorting in Massive Data Application: The 3-D Sorting Method. IEEE Consumer Electronics Magazine, 11(3), 87–94.
   https://doi.org/10.1109/MCE.2021.3076979
28. Zhao, B., Li, Y., Wang, Y., & Yang, H. (2017). Streaming sorting network based BWT acceleration on FPGA for lossless compression. 2017 International Conference on Field Programmable Technology (ICFPT), Melbourne, VIC, Australia, pp. 247–250.
   https://doi.org/10.1109/FPT.2017.8280152
29. Long, Z., & Zhang, Z. (2017). FPGA-Based Collaborative Hardware Sorting Unit for Embedded Data Processing System. 2017 10th International Conference on Intelligent Computation Technology and Automation (ICICTA), Changsha, China, pp. 65–69.
   https://doi.org/10.1109/ICICTA.2017.65
30. Wang, Y., Han, Y., Chen, J., Wang, Z., & Zhong, Y. (2023). An FPGA-Based Hardware Low-Cost, Low-Consumption Target-Recognition and Sorting System. World Electric Vehicle Journal, 14, 245.
   https://doi.org/10.3390/wevj14090245

Keywords: FPGA, PYNQ-Z2, AXI-Lite, Bubble Sort, Insertion Sort, Selection Sort, Quick Sort, Radix Sort, LUT Utilization

Posted: April 8th, 2026 (version 1)
