A Q-Learning Framework for Disaster Resilient Data Transfer in Underwater Wireless Sensor Networks

Research Article

Ritu Bhardwaj, Ashwani Kush

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8877266/v1
This work is licensed under a CC BY 4.0 License
Status: Posted. Version 1 posted.

Abstract

Underwater Wireless Sensor Networks (UWSNs) play a critical role in real-time environmental monitoring during disasters such as earthquakes, tsunamis, and underwater landslides. However, highly dynamic and hazardous underwater conditions significantly reduce the effectiveness of conventional routing protocols. This study addresses data transmission challenges in disaster-prone UWSNs by proposing a Q-learning-based Adaptive Data Transfer Algorithm (QL-ADTA).
The proposed approach employs a multi-objective reward function that dynamically adapts to contextual parameters including data priority, link quality, node survivability, debris density, and water current strength. The algorithm was evaluated using disaster-simulated synthetic datasets and compared with conventional routing protocols such as DSR and AODV. Experimental results demonstrate that QL-ADTA achieves an 18% improvement in packet delivery ratio, a 23% reduction in latency, and a 20% enhancement in energy efficiency. These findings confirm the model's ability to adapt routing decisions in real time based on disaster severity, thereby improving communication reliability and extending network lifetime. The context-aware reward design and emergency-adaptive routing strategy distinguish the proposed method from conventional reinforcement learning approaches in UWSNs, offering a robust framework for disaster-resilient underwater communication.

Keywords: Underwater Wireless Sensor Networks, Reinforcement Learning, Q-Learning, Disaster Management, Energy Efficiency, Adaptive Routing, Data Transfer Optimization

1. INTRODUCTION

Submarine earthquakes, tsunamis, and underwater landslides are becoming more frequent and more severe, so reliable systems are needed to detect them in real time. These events often cause rapid environmental changes that disrupt marine ecosystems and damage offshore infrastructure. In a crisis, environmental data must be transmitted quickly and accurately so that disasters can be predicted, warnings issued, and emergency actions taken. Underwater Wireless Sensor Networks (UWSNs) are becoming an important technology in this field because they can monitor underwater environments autonomously and continuously [1].
UWSNs usually consist of many sensor nodes distributed over a large area below the water surface; the nodes collect routine environmental data and send it to a base station or sink node. These nodes face challenges that do not arise in terrestrial networks, such as severe signal attenuation, limited bandwidth, long propagation delays, changing topologies, and limited battery life [2][3]. When a disaster occurs, the situation becomes far more complicated, because network conditions can change quickly as debris moves, water currents shift, and nodes fail. These changes can cause heavy packet loss, energy depletion, communication delays, and, ultimately, failure to deliver critical data [4][5].

AODV, DSR, and DBR are examples of traditional routing protocols that were designed for static or moderately dynamic environments. They rely on preset routing paths or heuristic-driven forwarding choices. Some schemes include energy-aware or delay-tolerant features, but they often cannot adapt on the fly to the large, unexpected environmental changes that occur during disasters [6][7]. In addition, these protocols do not rank data by importance, which is essential in emergency response systems.

Recent progress in Artificial Intelligence (AI), especially in Reinforcement Learning (RL), has produced several promising approaches to real-time decision making and adaptation to changing circumstances. Q-learning is a model-free RL method: it does not require a model of the environment and learns the best actions through trial-and-error interaction with it [8]. Researchers have used Q-learning to design routing strategies for UWSNs that are both energy-efficient and latency-aware [9][10].
Still, most current Q-learning-based methods focus on improving network performance under normal operating conditions and do not account for the constraints and critical requirements that arise during underwater emergencies. They ignore real-time disaster indicators such as debris concentration, water current turbulence, and node survivability, and they do not prioritise data according to its urgency [11][12]. As a result, their adaptability and effectiveness in emergency situations remain limited.

The proposed Q-Learning-based Adaptive Data Transfer Algorithm (QL-ADTA) addresses this research gap by introducing a disaster-aware reward function that dynamically adjusts to environmental disruptions and prioritizes critical data. Through this context-sensitive learning framework, QL-ADTA enhances real-time routing decisions, supporting reliable, energy-efficient, and responsive communication in disaster-affected underwater networks. This contribution advances the current state of research by providing a solution tailored specifically to high-risk, unpredictable underwater environments.

OBJECTIVES OF THE STUDY

To provide a disaster-aware reward function that accounts for key factors such as packet delivery ratio, energy efficiency, latency, network dependability, debris density, and data priority.
To create a Q-learning-based routing agent that continually improves its ability to send data along the best paths based on real-time feedback from the environment and the network.
To simulate emergencies using synthetic UWSN datasets, assessing the algorithm's robustness and comparing its effectiveness with conventional and RL-based routing protocols.
To test the adaptability of the algorithm by adjusting the weights of the reward function in real time according to the severity of the disaster and the surrounding conditions.

This study contributes a context-sensitive reinforcement learning framework built specifically for underwater communication systems that are prone to failure. The proposed QL-ADTA is intended to improve on traditional routing methods.

2. LITERATURE REVIEW

Routing strategies in UWSNs have been widely investigated, with primary emphasis on improving energy efficiency, minimizing communication delay, and enhancing packet delivery reliability. However, routing in disaster-prone underwater environments, where network topology changes rapidly and critical data must be transmitted urgently, remains insufficiently explored. To present a clearer understanding of existing contributions, prior studies are grouped into thematic categories relevant to this research.

2.1 Energy-Efficient Routing Approaches

Energy conservation is a fundamental concern in UWSNs due to the difficulty of battery replacement and recharging in underwater environments. Several studies have applied reinforcement learning techniques to extend network lifetime. Ouidir et al. [2] proposed QENDIP, a Q-learning-based routing scheme that improves energy-efficient forwarding and enhances throughput. Zhou et al. [3] introduced the Q-Learning-Based Localization-Free Routing (QLFR) protocol, which selects routes based on residual energy and node depth to improve stability and energy utilization.
Similarly, Xiao and Huang [7] applied ant colony optimization to determine cluster heads and multi-hop routing paths using energy metrics and inter-node distance. Although these approaches significantly improve energy conservation, they are primarily designed for stable or moderately dynamic environments and lack mechanisms for real-time adaptation during disaster events.

2.2 Latency-Aware and Delay-Sensitive Routing

In underwater monitoring and early warning systems, minimizing communication delay is critical. Several routing protocols focus on delay reduction while maintaining acceptable energy consumption. The QTAR protocol [5] employs topology-aware Q-learning to reduce end-to-end delay while conserving energy. Wang et al. [8] developed EP-ADTA, an edge-prediction-assisted reinforcement learning model that enhances reliable data transfer under challenging underwater conditions. While these techniques effectively reduce latency, they do not incorporate data prioritization or adapt to sudden environmental disruptions such as underwater earthquakes, landslides, or strong current turbulence, all of which are essential factors in disaster response scenarios.

2.3 Conventional Routing Protocols and Their Limitations

Traditional routing protocols such as AODV, DSR, and DBR [6] were originally designed for static or mildly dynamic networks. These protocols rely on predefined routes or heuristic-based forwarding decisions and are not well suited for highly unstable underwater disaster conditions. Although some schemes incorporate delay-tolerant or energy-aware features, they cannot dynamically respond to abrupt environmental changes such as debris movement, shifting water currents, or rapid node failures. Consequently, their reliability degrades significantly in emergency scenarios.

2.4 Reinforcement Learning-Based Routing Enhancements

Reinforcement learning has recently gained attention for adaptive routing in UWSNs.
Protocols such as QLFR [3] and QTAR [5] improve routing decisions by optimizing performance metrics like energy consumption and delay. More advanced techniques including Double Q-learning [9] and Deep Q-Networks (DQN) [10] address issues such as value overestimation and scalability. Multi-agent reinforcement learning frameworks [13] and adaptive MAC-layer protocols [6] further enhance overall network efficiency. Despite these advancements, most RL-based routing strategies are designed for normal operating conditions. Disaster-specific indicators such as debris density, current turbulence, node survivability, and urgency of transmitted data are rarely integrated into the learning process. Furthermore, existing approaches generally do not differentiate between high-priority emergency data and routine monitoring information.

From the above review, it is evident that significant progress has been made in improving energy efficiency and reducing delay in UWSNs. However, the combined requirements of disaster awareness, urgent data prioritization, and real-time adaptability remain insufficiently addressed. Current Q-learning-based routing protocols do not incorporate disaster-sensitive environmental parameters and lack mechanisms to prioritize critical data during emergencies. To bridge this gap, the proposed QL-ADTA framework introduces a context-aware, disaster-sensitive reward function that dynamically adapts routing decisions based on debris density, water current turbulence, node survivability, and data priority. This design enables reliable, adaptive, and energy-efficient communication tailored specifically for high-risk underwater disaster environments.

3. APPLICATION AND BROADER IMPACT OF THE PROPOSED WORK

The proposed QL-ADTA has substantial practical consequences in fields that depend on reliable real-time underwater communication during critical incidents.
Disaster Early Warning Systems: QL-ADTA can be deployed in marine areas susceptible to undersea earthquakes, tsunamis, or landslides to guarantee continued communication of essential data. This allows authorities to issue timely alerts, saving lives and reducing damage.
Marine Research and Environmental Monitoring: Oceanographers and marine biologists can use the proposed routing system in challenging or dynamic underwater conditions to ensure stable, energy-efficient transmission of information throughout long observation missions.
Military and Surveillance Applications: Naval operations, underwater surveillance, and border security can benefit from QL-ADTA's ability to prioritize urgent or high-importance transmissions during dynamic underwater conditions.
Oil and Gas Industry: Offshore platforms can use QL-ADTA to reliably transmit sensor readings during structural disturbances or environmental events, aiding operational continuity and hazard detection.
Smart Ocean Infrastructure: The algorithm can be integrated into intelligent ocean systems and digital twins of marine infrastructure, helping cities and coastal industries build disaster resilience through real-time sensing and response mechanisms.

From a research standpoint, QL-ADTA opens pathways for:
Cross-domain application of reinforcement learning in other sensor networks, such as underground, space-based, or vehicular sensor networks.
Hybridization with edge computing or swarm intelligence, potentially enabling decentralized intelligence in the marine Internet of Things (IoT).
Policy-making and simulation frameworks for governments and disaster management authorities seeking resilient underwater data communication strategies.

This work, therefore, not only addresses a highly specialized engineering challenge in UWSNs but also contributes to global efforts toward climate resilience, smart environmental sensing, and emergency response preparedness.

4. METHODOLOGY

UWSNs encounter severe communication instability when subjected to natural disasters such as tsunamis, underwater earthquakes, and landslides. These scenarios result in unpredictable changes in node connectivity, increased signal attenuation, and compromised energy availability [2][4][6]. To overcome such dynamic and harsh conditions, we propose QL-ADTA, which is designed to optimize routing decisions based on environmental conditions and network status.

4.1 SYNTHETIC DATASET GENERATION

To evaluate the proposed algorithm in a controlled yet realistic environment, a synthetic dataset was developed that models diverse underwater disaster scenarios [3][5][8]. The dataset was generated in Python. It comprises 50,000 samples, each representing a snapshot of the underwater network state during varying intensities of environmental disruption. Each record includes the following parameters:

Packet Delivery Ratio (PDR): values between 0.2 and 1.0, indicating successful data delivery.
Latency (ms): simulated between 100 and 500 ms based on node distance and congestion.
Remaining Energy (J): from 0.2 J to 1.0 J, representing battery levels.
Node Survivability (%): operational probability under disaster conditions (30%–100%).
Debris Density (kg/m³): modeled between 0 and 15, indicating underwater obstruction.
Water Current Strength (m/s): simulated values from 0.5 to 3.0 m/s, reflecting oceanic turbulence.
Link Quality Index (LQI): from 10 to 30 dB, indicating channel conditions.
Priority Weight (1–5): the urgency level of the data (e.g., seismic > temperature).

These parameters were derived from empirical models used in underwater disaster simulations [5][8][14]. Higher disaster intensity was modeled by increasing debris and current strength and decreasing node survivability.
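The generation procedure above can be illustrated with a minimal sketch. The parameter ranges are those listed in Section 4.1; everything else is an assumption made for illustration, since the preprint does not publish its generator: the hidden `severity` variable, the jitter of ±0.15, the uniform sampling of the remaining metrics, and the names `RANGES`, `scale`, `make_sample`, and `make_dataset` are all hypothetical.

```python
import random

# Parameter ranges quoted in Section 4.1 (survivability as a fraction).
RANGES = {
    "pdr": (0.2, 1.0),
    "latency_ms": (100.0, 500.0),
    "remaining_energy_j": (0.2, 1.0),
    "node_survivability": (0.30, 1.00),
    "debris_density": (0.0, 15.0),      # kg/m^3
    "current_strength": (0.5, 3.0),     # m/s
    "lqi_db": (10.0, 30.0),
}


def scale(name, t):
    """Map t in [0, 1] linearly onto the range of parameter `name`."""
    lo, hi = RANGES[name]
    return lo + (hi - lo) * t


def make_sample(rng):
    """Draw one network-state snapshot.

    ASSUMPTION: a hidden `severity` in [0, 1] drives the hazard fields;
    the paper does not specify how the parameters were coupled.
    """
    severity = rng.random()

    def noisy(t):
        # Small jitter so the hazard fields are correlated, not identical.
        return min(1.0, max(0.0, t + rng.uniform(-0.15, 0.15)))

    sample = {
        "severity": severity,
        # Hazards grow with severity, survivability shrinks.
        "debris_density": scale("debris_density", noisy(severity)),
        "current_strength": scale("current_strength", noisy(severity)),
        "node_survivability": scale("node_survivability", noisy(1.0 - severity)),
        "priority": rng.randint(1, 5),
    }
    # ASSUMPTION: the remaining metrics are drawn uniformly from their ranges.
    for name in ("pdr", "latency_ms", "remaining_energy_j", "lqi_db"):
        sample[name] = rng.uniform(*RANGES[name])
    return sample


def make_dataset(n_samples=50_000, seed=0):
    """Generate a reproducible list of snapshots."""
    rng = random.Random(seed)
    return [make_sample(rng) for _ in range(n_samples)]
```

Seeding the generator keeps runs reproducible, which matters when the same records are later replayed to compare protocols.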
Disaster scenarios were emulated by dynamically adjusting debris density, water current, and link degradation; at high severity, node survivability was lowered while debris and current intensity were raised together.

4.2 LEARNING-BASED ROUTING MODEL

The proposed QL-ADTA model follows the reinforcement learning paradigm [9][10], in which the routing agent learns the optimal forwarding strategy by interacting with its environment and receiving feedback.

State space (S): each state is defined by a combination of network and environmental factors extracted from the dataset, including PDR, latency, remaining energy, node survivability, debris level, current strength, and data priority.

Action space (A): at every decision point, a node chooses from a discrete set of three actions: (1) forward the packet to a selected neighbor, (2) delay the transmission to wait for better conditions, or (3) choose an alternative path that bypasses affected regions.

Reward function: the reward guides the agent by quantifying the usefulness of each action. It is formulated as:

$$R_{\text{total}} = w_1\,(\text{PDR} \times \text{Priority}) - w_2\,(\text{Latency} \times \text{Priority}) + w_3\,\frac{\text{Remaining Energy}}{\text{Initial Energy}} + w_4\,(\text{Link Quality} \times \text{Node Survivability}) - w_5\,(\text{Debris} + \text{Current}) + w_6\,\text{Priority} \tag{i}$$

Each component reflects a critical performance or risk metric. The weights w1 to w6 are initially assigned balanced values but are dynamically updated based on real-time conditions, as described in the next section. The standard Q-learning update is employed:

$$Q(s,a) \leftarrow Q(s,a) + \alpha\left[R_{\text{total}} + \gamma \max_{a'} Q(s',a') - Q(s,a)\right] \tag{ii}$$

where α is the learning rate, γ is the discount factor, and s′ is the new state after performing action a.

4.3 WEIGHT ADAPTATION MECHANISM

A key feature of the proposed approach is the dynamic tuning of reward function weights in response to detected disaster conditions. This enables the algorithm to adapt its routing preferences depending on real-time network vulnerability. Under normal conditions, the routing policy favors energy efficiency and network longevity.
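The reward of equation (i) and the standard Q-learning update of Section 4.2 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the feature names, the action labels in `ACTIONS`, and the premise that all inputs are already scaled to comparable ranges (for example, latency normalized rather than in raw milliseconds) are hypothetical. The defaults α = 0.5 and γ = 0.9 are the values given in Section 4.4.

```python
from collections import defaultdict

# Hypothetical labels for the three actions described in Section 4.2.
ACTIONS = ("forward", "delay", "reroute")


def total_reward(s, w, initial_energy=1.0):
    """Equation (i). `s` maps feature names to values, `w` holds w1..w6.

    ASSUMPTION: features are pre-scaled to comparable ranges; the paper
    does not state a normalization.
    """
    w1, w2, w3, w4, w5, w6 = w
    return (
        w1 * s["pdr"] * s["priority"]
        - w2 * s["latency"] * s["priority"]
        + w3 * s["remaining_energy"] / initial_energy
        + w4 * s["link_quality"] * s["node_survivability"]
        - w5 * (s["debris"] + s["current"])
        + w6 * s["priority"]
    )


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update; returns the new Q(state, action)."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


# Q-table with every state-action value implicitly initialized to zero.
q_table = defaultdict(float)
```

Latency and hazard enter with a negative sign, so raising w2 or w5 can only lower the reward of slow or risky links, never raise it.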
In this normal regime, the weights are initialized as w1 = 0.2, w2 = 0.2, w3 = 0.2, w4 = 0.2, w5 = 0.1, w6 = 0.1. Under disaster conditions, identified by spikes in current strength or debris density or by drops in node survivability, the weights are adjusted to prioritize data delivery and resilience:

Increase w5 (disaster risk penalty) to discourage routing through unstable regions.
Increase w6 (data priority reward) to accelerate critical data transmission.
Decrease w3 (energy consideration), since conserving energy becomes secondary to ensuring survivability.

This real-time reweighting strategy allows QL-ADTA to remain effective across varying network conditions without requiring retraining or manual intervention.

Table 1 Summary of Weight Adaptation in QL-ADTA

Weight | Parameter | Role in Reward Function | Adaptation Effect | Impact on Routing Decision
W1 | Packet Delivery Ratio (PDR) | Encourages reliable data transmission | Increased when higher delivery is required | Promotes stable high-PDR routes
W2 | Latency | Penalizes transmission delay (negative term) | Raised when delay-sensitive traffic dominates | Discourages paths with high latency
W3 | Residual Energy | Rewards energy-efficient nodes | Increased when battery conservation is critical | Extends network lifetime by favoring efficient nodes
W4 | Link Quality | Rewards strong and stable connections | Increased under unstable topologies | Improves robustness against link failures
W5 | Hazard Factor (Debris + Current) | Applies penalty for hazardous conditions (negative term) | Increased under disaster conditions; reduced in calm waters | Ensures safety by avoiding risky links
W6 | Priority | Provides bonus for urgent data | Increased for emergency or real-time packets | Ensures timely delivery of critical information

Table 1 summarizes the role of each weight in the reward function. The negative signs for W2 and W5 ensure that harmful factors (latency and hazards) always reduce the total reward.
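The reweighting rule can be sketched as below. The baseline weights and the direction of each adjustment follow Section 4.3; the detection thresholds, the step size of 0.1, and the renormalization so that the weights still sum to 1 are assumptions introduced here for illustration, and `is_disaster` and `adapt_weights` are hypothetical names.

```python
# Baseline weights from Section 4.3 (normal conditions).
BASELINE = {"w1": 0.2, "w2": 0.2, "w3": 0.2, "w4": 0.2, "w5": 0.1, "w6": 0.1}


def is_disaster(sample, current_limit=2.0, debris_limit=10.0,
                survivability_floor=0.5):
    """Flag a disaster from the three indicators named in the text.

    ASSUMPTION: the threshold values are illustrative; the paper does
    not report the ones it used.
    """
    return (
        sample["current_strength"] > current_limit
        or sample["debris_density"] > debris_limit
        or sample["node_survivability"] < survivability_floor
    )


def adapt_weights(sample, step=0.1):
    """Return reward weights adapted to the current network snapshot."""
    weights = dict(BASELINE)
    if is_disaster(sample):
        weights["w5"] += step                          # stronger hazard penalty
        weights["w6"] += step                          # stronger priority bonus
        weights["w3"] = max(0.0, weights["w3"] - step)  # energy matters less
        # ASSUMPTION: renormalize so the weights keep summing to 1.
        total = sum(weights.values())
        weights = {name: value / total for name, value in weights.items()}
    return weights
```

Renormalizing keeps reward magnitudes comparable between calm and disaster regimes, so Q-values learned in one regime are not rescaled wholesale when the weights shift.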
The dynamic weight adjustment allows QL-ADTA to balance safety, responsiveness, and energy preservation depending on the network state.

4.4 TRAINING AND DEPLOYMENT

A Q-table is initialized with all state-action values set to zero. The learning parameters are set to α = 0.5, γ = 0.9, and ε = 0.1 (for exploration) [9]. The agent then reads a state from the dataset and selects an action using the ε-greedy strategy. The environment transitions to a new state, and the reward is computed. The Q-value is then updated using the rule defined in equation (ii). This process repeats until convergence. After sufficient episodes, the optimal routing policy π(s) = arg max_a Q(s,a) is derived and applied to real-time routing decisions in new disaster instances.

4.5 RESULTS AND DISCUSSION

The proposed QL-ADTA protocol was evaluated against the conventional routing protocols AODV and DSR and the reinforcement learning-based schemes QLFR, QTAR, and EP-ADTA. The evaluation considered PDR, end-to-end latency, and energy efficiency under disaster-aware UWSN conditions. Additional experiments were conducted to analyze performance under varying hazard intensity (debris density and water currents) and to assess the relationship between PDR and reward.

4.6 PACKET DELIVERY RATIO (PDR)

Figure 1 illustrates the comparative performance of the protocols in terms of PDR. QL-ADTA consistently outperformed the other approaches, achieving values above 0.84, while AODV and DSR remained at or below 0.70. The improvement was particularly significant in unstable environments where debris density and water currents degrade communication. Figure 2 further confirms this trend, showing that QL-ADTA sustains higher PDR even when debris density reaches 12–14 kg/m³ and current strength exceeds 2 m/s. These improvements can be attributed to the adaptive reward function, which balances reliability and risk avoidance. Similar findings regarding adaptive routing under uncertainty were reported in [1][2].
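Returning to the training procedure of Section 4.4, the loop (zero-initialized Q-table, ε-greedy selection, update, greedy policy extraction) can be sketched as follows. The hyperparameters α = 0.5, γ = 0.9, ε = 0.1 come from the text. The rest is assumed for illustration: states are treated as hashable labels, consecutive dataset records are treated as transitions, the reward is supplied by a caller-provided `reward_fn`, and the names `ACTIONS`, `epsilon_greedy`, `train`, and `greedy_policy` are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical labels for the three routing actions.
ACTIONS = ("forward", "delay", "reroute")


def epsilon_greedy(q, state, rng, epsilon=0.1):
    """Explore with probability epsilon, otherwise pick the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def train(states, reward_fn, episodes=200, alpha=0.5, gamma=0.9,
          epsilon=0.1, seed=0):
    """Tabular Q-learning over a replayed sequence of states.

    ASSUMPTION: each record's successor in `states` is taken as the next
    state; the paper does not describe its transition model.
    """
    rng = random.Random(seed)
    q = defaultdict(float)  # all state-action values start at zero
    for _ in range(episodes):
        for state, next_state in zip(states, states[1:]):
            action = epsilon_greedy(q, state, rng, epsilon)
            reward = reward_fn(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
    return q


def greedy_policy(q, states):
    """Derive pi(s) = argmax_a Q(s, a) for every distinct state seen."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in set(states)}
```

In practice the continuous features of Section 4.1 would need to be discretized into a finite set of state labels before a Q-table like this can be used; that binning step is left out of the sketch.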
4.7 LATENCY ANALYSIS

Figure 3 compares the average end-to-end latency across protocols. QL-ADTA achieved the lowest latency, approximately 320 ms, compared to 420 ms for AODV and 400 ms for DSR. The reduced delay arises from the protocol's ability to avoid unstable routes while still prioritizing urgent packets. This result is consistent with the principle that reinforcement learning models can adjust decisions to minimize delays under dynamic topologies [3].

4.8 ENERGY EFFICIENCY

Figure 4 presents the average energy efficiency of the protocols. QL-ADTA demonstrated a clear advantage, conserving over 80% residual energy compared to approximately 60% for AODV and DSR. The time-series plot in Fig. 5 further emphasizes this finding: QL-ADTA sustains energy for a longer period, whereas DSR exhibits faster depletion over the same simulation duration. These results highlight the importance of adaptive reward weighting in penalizing energy-draining routes, consistent with prior work on energy-aware routing [4][5].

4.9 STATISTICAL SIGNIFICANCE

To ensure that the observed improvements were not due to random variations, paired t-tests were conducted. As shown in Table 2, QL-ADTA significantly outperformed both AODV and DSR across all three metrics. For PDR, QL-ADTA achieved a mean value of 0.85, compared to 0.68 (AODV) and 0.70 (DSR), with highly significant differences (t = 42.03, p < 0.001 and t = 42.81, p < 0.001, respectively). Similarly, latency was reduced from 420 ms (AODV) and 400 ms (DSR) to 320 ms under QL-ADTA (t = -38.44, p < 0.001; t = -74.21, p < 0.001). Energy efficiency improved from 60% (AODV) and 62% (DSR) to 80% with QL-ADTA (t = 37.56, p < 0.001; t = 46.09, p < 0.001). These results confirm that the performance gains are statistically reliable.

Table 2 Statistical Significance of QL-ADTA vs. Baseline Protocols

Metric | Protocols Compared | Mean (Baseline) | Mean (QL-ADTA) | t-statistic | p-value | Significance
Packet Delivery Ratio | AODV vs. QL-ADTA | 0.68 | 0.85 | 42.03 | 1.2 × 10⁻¹¹ | Significant
Packet Delivery Ratio | DSR vs. QL-ADTA | 0.70 | 0.85 | 42.81 | 1.0 × 10⁻¹¹ | Significant
Latency (ms) | AODV vs. QL-ADTA | 420 | 320 | -38.44 | 2.7 × 10⁻¹¹ | Significant
Latency (ms) | DSR vs. QL-ADTA | 400 | 320 | -74.21 | 7.4 × 10⁻¹⁴ | Significant
Energy Efficiency (%) | AODV vs. QL-ADTA | 60 | 80 | 37.56 | 3.3 × 10⁻¹¹ | Significant
Energy Efficiency (%) | DSR vs. QL-ADTA | 62 | 80 | 46.09 | 5.3 × 10⁻¹² | Significant

These results demonstrate that QL-ADTA achieves consistent and statistically validated improvements over traditional routing protocols. The adaptive reward formulation allows the protocol to balance data delivery, delay minimization, and energy conservation under disaster-prone underwater conditions. This confirms the robustness of QL-ADTA as a disaster-resilient routing mechanism.

4.10 REWARD BEHAVIOR AND DISASTER AWARENESS

The relationship between reward and PDR is shown in Fig. 6. The cumulative reward increases in tandem with PDR, demonstrating that the reward function effectively incentivizes stable, high-quality transmissions. The design ensures that improvements in PDR translate directly into higher long-term rewards, validating the effectiveness of the Q-learning framework for disaster-aware routing [6].

Overall, QL-ADTA achieved 18–20% higher PDR than AODV and DSR, 20–23% lower latency, approximately 20% higher energy efficiency, and more stable performance under disaster conditions. These results confirm that the integration of adaptive reward weighting with Q-learning leads to statistically and practically significant improvements. The findings are consistent with established results on reinforcement learning in dynamic networks [7] while offering a novel disaster-aware formulation tailored for UWSNs.
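The paired t-test used in Section 4.9 can be sketched with the standard library alone. The function below computes the t-statistic of the per-run differences (proposed minus baseline), which matches the sign convention of Table 2, where latency reductions give negative t. The function name `paired_t_statistic` is hypothetical, and the preprint does not state how many paired runs were used.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(baseline, proposed):
    """Return (t, degrees_of_freedom) for a paired t-test.

    t = mean(d) / (stdev(d) / sqrt(n)), with d = proposed - baseline
    measured on the same scenarios.
    """
    if len(baseline) != len(proposed) or len(baseline) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [p - b for b, p in zip(baseline, proposed)]
    spread = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if spread == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean(diffs) / (spread / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

The p-value follows from the Student t distribution with the returned degrees of freedom; the standard library does not provide that distribution, so a statistics package would be needed for that last step.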
Table 3 Summary of Comparative Insights

Protocol | Disaster Adaptability | PDR | Latency | Energy Efficiency | Data Prioritization
AODV / DSR | No | Low | High | Low | No
QLFR [3] | Partial | Medium | Medium | Medium | No
QTAR [5] | Partial | High | Medium | High | No
EP-ADTA [8] | Yes | High | Low | Medium | No
QL-ADTA | Yes | Very High | Very Low | High | Yes

Table 3 compares the conventional protocols, the reinforcement learning-based protocols, and the proposed QL-ADTA. Unlike earlier schemes, QL-ADTA adapts fully to disaster conditions, achieves a very high packet delivery ratio, very low latency, and high energy efficiency, and explicitly supports data prioritization for urgent communication.

5. CONCLUSION

This study introduced QL-ADTA, a disaster-aware Q-learning-based Adaptive Data Transfer Algorithm for UWSNs. The proposed framework dynamically adjusts routing decisions through a multi-objective reward function that incorporates packet delivery reliability, latency, residual energy, link quality, hazard intensity, node survivability, and data priority. By integrating environmental disruption indicators directly into the learning process, QL-ADTA supports reliable and energy-efficient communication under highly unstable underwater conditions. Experimental evaluation using large-scale synthetic disaster scenarios confirmed statistically significant improvements in packet delivery ratio, latency reduction, and energy efficiency compared to conventional routing protocols.

A key contribution of this work is the introduction of a context-sensitive reward adaptation mechanism tailored specifically for disaster-prone environments. Unlike existing reinforcement learning-based routing schemes that primarily optimize general performance metrics, QL-ADTA prioritizes urgent data transmission while simultaneously mitigating risks associated with debris density, turbulent currents, and node instability.
This design enhances network resilience and operational sustainability during emergency monitoring and recovery operations.

QL-ADTA provides a disaster-aware and adaptive routing framework for UWSNs, significantly improving reliability, latency, and energy efficiency under dynamic underwater conditions. Future research can extend this work through hybrid reinforcement learning models such as CNN-LSTM-based deep RL, multi-agent cooperative learning, and federated learning to enhance scalability, predictive capability, and real-time adaptability in large-scale disaster scenarios.

Declarations

Competing Interests
The authors declare that there is no conflict of interest regarding the publication of this paper.

Ethical Approval
This article does not contain any studies involving human participants or animals performed by the authors.

Informed Consent
Informed consent was not required for this study as it did not involve human participants or animals.

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author Contribution
Ritu Bhardwaj and Ashwani Kush contributed to the conception and design of the study, development of the proposed model, simulation and experimentation, data analysis, and manuscript preparation.

Data Availability
The datasets used in this study were synthetically generated through simulations to emulate realistic underwater wireless sensor network conditions. The data supporting the findings of this study are available from the corresponding author upon reasonable request.

References

[1] Felemban E, Shaikh FK, Qureshi UM, Sheikh AA, Qaisar SB (2015) Underwater sensor network applications: A comprehensive survey. Int J Distrib Sens Netw 11(11):896832
[2] Ouidir H, Berqia A, Aouad S (2024) Improving UWSN performance using reinforcement learning algorithm QENDIP. In: 2024 11th International Conference on Wireless Networks and Mobile Communications (WINCOM), pp 1–6. IEEE
[3] Zhou Y, Cao T, Xiang W (2020) Anypath routing protocol design via Q-learning for underwater sensor networks. IEEE Internet Things J 8(10):8173–8190
[4] Prabhu D, Alageswaran R, Miruna Joe Amali S (2023) Multiple agent based reinforcement learning for energy efficient routing in WSN. Wireless Netw 29(4):1787–1797
[5] Nandyala CS, Kim H-W, Cho H-S (2023) QTAR: A Q-learning-based topology-aware routing protocol for underwater wireless sensor networks. Comput Netw 222:109562
[6] Petrioli C, Petroccia R, Stojanovic M (2008) A comparative performance evaluation of MAC protocols for underwater sensor networks. In: OCEANS 2008, pp 1–10. IEEE
[7] Xiao X, Huang H (2020) A clustering routing algorithm based on improved ant colony optimization algorithms for underwater wireless sensor networks. Algorithms 13(10):250
[8] Wang B, Ben K, Lin H, Zuo M, Zhang F (2022) EP-ADTA: Edge prediction-based adaptive data transfer algorithm for underwater wireless sensor networks (UWSNs). Sensors 22(15):5490
[9] van Hasselt H (2010) Double Q-learning. Adv Neural Inf Process Syst 23
[10] van Hasselt H (2010) Double Q-learning. In: Lafferty J, Williams C, Shawe-Taylor J, Zemel R, Culotta A (eds) Advances in Neural Information Processing Systems, vol 23. Curran Associates, Inc.
[11] Wang C, Shen X, Wang H, Zhang H, Mei H (2023) Reinforcement learning-based opportunistic routing protocol using depth information for energy-efficient underwater wireless sensor networks. IEEE Sens J 23(15):17771–17783
[12] Lu Y, He R, Chen X, Lin B, Yu C (2020) Energy-efficient depth-based opportunistic routing with Q-learning for underwater wireless sensor networks. Sensors 20(4):1025
[13] Liu S, Wang J, Shi W, Han G, Yan S, Li J (2024) CLORP: Cross-layer opportunistic routing protocol for underwater sensor networks based on multi-agent reinforcement learning. IEEE Sens J
[14] Ragavi B, Baranidharan V, Ramash Kumar K (2023) A novel hybridized cluster-based geographical opportunistic routing protocol for effective data routing in underwater wireless sensor networks. J Electr Comput Eng 2023(1):5567483
[15] Singh K, Gupta R (2021) Performance evaluation of a MANET based secure and energy optimized communication protocol (E2 S-AODV) for underwater disaster response network. Int J Comput Networks Appl (IJCNA) 8(1):11–27
[16] Tanimoto Y, Fukumizu K (2024) State-separated SARSA: A practical sequential decision-making algorithm with recovering rewards
[17] Lowe R, Ziemke T (2013) Exploring the relationship of reward and punishment in reinforcement learning. In: 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pp 140–147
[18] Zhou Y, Cao T, Xiang W (2019) QLFR: A Q-learning-based localization-free routing protocol for underwater sensor networks. In: 2019 IEEE Global Communications Conference (GLOBECOM), pp 1–6. IEEE
[19] Rodoshi RT, Song Y, Choi W (2021) Reinforcement learning-based routing protocol for underwater wireless sensor networks: a comparative survey. IEEE Access 9:154578–154599
© Research Square 2026 | ISSN 2693-5015 (online)
T0xi61/gPHjy4+M6sc/gxEIy7chFHXw1AxBqNRjJ/KH1UbKpmLr+3R6TOXOsuvzYxoLRTMAYUen/AxTW8tJFrATJofBB5uBkbmCKl5jCDjGFh/vIhd7/yEFPkzDUs8viX20TtNacZVZ4Fii4iXc7jnDEkH7U8Pz+ePcowDNWhbm3UN/eUAzIuHtlnHJXTtLt8vz7rn7LL+crn/mKa8hBgx3JaX87h6pnnjuHZZozZykJ2y+XB0AuzvrSR4z1Lno3cBwa4NMYywwWhdc0INa7GkuElzXxn0Eln1DFyGGKuIwQCgUAgEAh8/ggQYex625lVu3fO/L7ngEPk33xxeId83AKJHX7jXfc29JkA5wos5MgROdr2sjexvSFPSaO2NRqNhOhQnBBk6lq+t+5o25hSKtSl5zh1IWCIloW7HYtxLrt8tP1vW9YWMoWJEoUIs0zkQ0bEUejqCIw8OdjipWAZQIOV46tHuLF2qN/qRBKrefp7bbtbHwTnrZQHa312pNwZi+7ut3XPZcAYIYbd5ROvHC9QIqRIX1mxlN4sUC5tjWQi1Cz/6JDOsKEum8c+5UW1N08QYu2Tzu/YEeH1XInvbeDOAPtBXQp9o/HZZ+t6e3/OxwBEPs1JCj4VO6f199hoNBID0G6LbS7PEsLq9wIhzuX7bWFM+TawuZLjq0c7LwLjpJpWvVY+dwhGGWNDuufBc5ENDnERxlwEouWBQCAwZiBg7Sa+ZRfD3rTaTjTRxPpE/OvNPTlPvwlwLsgCbtH2V7a83EMpoubldG4SthctTDmu7kgFotLUpZXjWAmUSC+kleNHxbmtWuoQK0N9/uhDWaa3XU/95A/aU/3u9emznvJIsygjQtTDvqigyugpUOspbLCmCiNJPeXvLo0fOLJqLLrLQ+FDZuTrLk+Op8QheuZKjmvlCCvKdDPjqZUyP4+8sOHnTpnmt2qbh6Kb62Zkco+xQ8BAzPG9OVJJqdG9ydtTHkYMFZZvObLaU97+pNlh4PNFXed/7Qsk5fLsDFDZezIEGK2eM24f5Xvrzrn+MM79NuV0z7m5zTDRHgpyTotjIBAIBAKBwKhBgOhF/LKG97STm2snVhC9/E5b/3N8b49tI8C5QgsuP13qC3U0xztyFRCc9zfY5iyThO7La18K4ojM8U/k5lAumRtBNa6c3sq5hduCnP2cW7m3N3ltGyBSXtry0pot4d7cV5fHvcaiLq3VOMSKj2ar95XzM8AQynJcX8/511I7uQVwteFOoCz+7F4OpIq6bkfgUkFd9ZIWl4CquwK3DAYUX9Z21NdqGQwLRJox2Oq9reb3Q8Z9gY91dUvLzolno127IhQH2Jfb2M7no1xunAcCgUAgEAg0RwBP5BbZLKf1AA/sq0jRdgLcrMGRHgiMCQjYUuEDzE/Wi4DcerilaLstc9vzlFrXEQKB0RaBaFggEAgEAoFALQJBgGthichOR4B7DZ9UX62gcvvSALcR7im2yPnVUio7HafofyAQCAQCgUAgMDoi0KxNQYCbIRTpHYkAossdwVcauEFkP2d+7b4YwFHfFz86EpzodCAQCAQCgUAgMIYjEAR4DB/AaP6oQcBXTXwb1mdZvCzIx9v3grk98AXnJzpqao5S24dAlBQIBAKBQCAQCNQjEAS4HpeI7XAEOOFTfX2RwlcJfH/Qp/B8O9YXQUbVC4odDnt0PxAIBAKBQKAdCEQZTREIAtwUosjQiQj4a2S+R+u7xP6imz9I4vN33jpFfhuNvn9TtxPxjD4HAoFAIBAIBAKjEwJBgEen0Yi2BALtQyBKCgQCgUAgEAgEAoFuEAgC3A0wER0IBAKBQCAQCAQCYyIC0eZAoDkCQYCbYxQ5AoFAIBAIBAKBQCAQCAS+QAj8HwAAAP//HQAgWgAAAAZJREFUAwB573XGIqT50wAAAABJRU5ErkJggg==\" width=\"704\" 
height=\"39\"\u003e\u003c/p\u003e\u003cp\u003eWhere α is the learning rate, γ is the discount factor, and s\u0026prime; is the new state after performing action a.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e4.3 WEIGHT ADAPTATION MECHANISM\u003c/h2\u003e \u003cp\u003eA central element of the proposed approach is the dynamic tuning of reward function weights in response to detected disaster conditions. This enables the algorithm to adapt its routing preferences to real-time network vulnerability. Under normal conditions, the routing policy favors energy efficiency and network longevity, so the weights are initialized as w\u003csub\u003e1\u003c/sub\u003e\u0026thinsp;=\u0026thinsp;0.2, w\u003csub\u003e2\u003c/sub\u003e\u0026thinsp;=\u0026thinsp;0.2, w\u003csub\u003e3\u003c/sub\u003e\u0026thinsp;=\u0026thinsp;0.2, w\u003csub\u003e4\u003c/sub\u003e\u0026thinsp;=\u0026thinsp;0.2, w\u003csub\u003e5\u003c/sub\u003e\u0026thinsp;=\u0026thinsp;0.1, w\u003csub\u003e6\u003c/sub\u003e\u0026thinsp;=\u0026thinsp;0.1. Under disaster conditions, identified by spikes in current strength or debris density or by drops in node survivability, the weights are adjusted to prioritize data delivery and resilience: w\u003csub\u003e5\u003c/sub\u003e (disaster risk penalty) is increased to discourage routing through unstable regions, w\u003csub\u003e6\u003c/sub\u003e (data priority reward) is increased to accelerate critical data transmission, and w\u003csub\u003e3\u003c/sub\u003e (energy consideration) is decreased, since conserving energy becomes secondary to ensuring survivability. 
This real-time reweighting strategy allows the QL-ADTA to remain effective across varying network conditions without requiring retraining or manual intervention.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSummary of Weight Adaptation in QL-ADTA\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWeight\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eParameter\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRole in Reward Function\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAdaptation Effect\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eImpact on Routing Decision\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eW1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePacket Delivery Ratio (PDR)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eEncourages reliable data transmission\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c4\"\u003e \u003cp\u003eIncreased when higher delivery is required\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003ePromotes stable high-PDR routes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eW2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLatency\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePenalizes transmission delay (negative term)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRaised when delay-sensitive traffic dominates\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eDiscourages paths with high latency\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eW3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eResidual Energy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRewards energy-efficient nodes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eIncreased when battery conservation is critical\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eExtends network lifetime by favoring efficient nodes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eW4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLink Quality\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRewards strong and stable connections\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eIncreased under unstable topologies\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e 
\u003cp\u003eImproves robustness against link failures\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eW5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHazard Factor (Debris\u0026thinsp;+\u0026thinsp;Current)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eApplies penalty for hazardous conditions (negative term)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eIncreased under disaster conditions; reduced in calm waters\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eEnsures safety by avoiding risky links\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eW6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePriority\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eProvides bonus for urgent data\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eIncreased for emergency or real-time packets\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eEnsures timely delivery of critical information\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e summarizes the role of each weight in the reward function. The negative signs for W\u003csub\u003e2\u003c/sub\u003e and W\u003csub\u003e5\u003c/sub\u003e ensure that harmful factors (latency and hazards) always reduce the total reward. 
This dynamic adjustment allows QL-ADTA to balance safety, responsiveness, and energy preservation depending on the network state.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e4.4 TRAINING AND DEPLOYMENT\u003c/h2\u003e \u003cp\u003eA Q-table is initialized with all state-action values set to zero. The learning parameters are set to α\u0026thinsp;=\u0026thinsp;0.5, γ\u0026thinsp;=\u0026thinsp;0.9, and ε\u0026thinsp;=\u0026thinsp;0.1 (for exploration) [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. In each step, the agent reads a state from the dataset and selects an action using the ε-greedy strategy. The environment then transitions to a new state, and the reward is computed. The Q-value is then updated using the rule in Equation (ii). This process repeats until convergence. After sufficient episodes, the optimal routing policy π(s)\u0026thinsp;=\u0026thinsp;arg max\u003csub\u003ea\u003c/sub\u003eQ(s,a) is derived and applied to real-time routing decisions in new disaster instances.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e4.5 RESULTS AND DISCUSSION\u003c/h2\u003e \u003cp\u003eThe proposed QL-ADTA protocol was evaluated against conventional routing protocols (AODV and DSR) and reinforcement learning-based schemes (QLFR, QTAR, and EP-ADTA). The evaluation considered PDR, end-to-end latency, and energy efficiency under disaster-aware UWSN conditions. Additional experiments were conducted to analyze performance under varying hazard intensity (debris density and water currents) and to assess the relationship between PDR and reward.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e4.6 PACKET DELIVERY RATIO (PDR)\u003c/h2\u003e \u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e illustrates the comparative performance of the protocols in terms of PDR. 
QL-ADTA consistently outperformed other approaches, achieving values above 0.84, while AODV and DSR remained below 0.70.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe improvement was particularly significant in unstable environments where debris density and water currents degrade communication. Figure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e further confirms this trend, showing that QL-ADTA sustains a higher PDR even when debris density reaches 12\u0026ndash;14 kg/m\u0026sup3; and current strength exceeds 2 m/s. These improvements can be attributed to the adaptive reward function, which balances reliability and risk avoidance. Similar findings regarding adaptive routing under uncertainty were reported in [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e][\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e4.7 LATENCY ANALYSIS\u003c/h2\u003e \u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e compares the average end-to-end latency across protocols. QL-ADTA achieved the lowest latency (approximately 320 ms), compared with 420 ms for AODV and 400 ms for DSR. The reduced delay arises from the protocol\u0026rsquo;s ability to avoid unstable routes while still prioritizing urgent packets. 
This result is consistent with the principle that reinforcement learning models can dynamically adjust decisions to minimize delays under dynamic topologies [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e4.8 ENERGY EFFICIENCY\u003c/h2\u003e \u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e presents the average energy efficiency of the protocols. QL-ADTA demonstrated a clear advantage, conserving over 80% residual energy compared with approximately 60% for AODV and DSR. The time-series plot in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e further emphasizes this finding: QL-ADTA sustains energy for a longer period, whereas DSR exhibits faster depletion over the same simulation duration. These results highlight the importance of adaptive reward weighting in penalizing energy-draining routes, consistent with prior work on energy-aware routing [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e][\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003e4.9 STATISTICAL SIGNIFICANCE\u003c/h2\u003e \u003cp\u003eTo verify that the observed improvements were not due to random variation, paired t-tests were conducted. As shown in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, QL-ADTA significantly outperformed both AODV and DSR across all three metrics. 
For PDR, QL-ADTA achieved a mean value of 0.85, compared to 0.68 (AODV) and 0.70 (DSR), with highly significant differences (t\u0026thinsp;=\u0026thinsp;42.03, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001 and t\u0026thinsp;=\u0026thinsp;42.81, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001, respectively). Similarly, latency was reduced from 420 ms (AODV) and 400 ms (DSR) to 320 ms under QL-ADTA (t = -38.44, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001; t = -74.21, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001). Energy efficiency improved from 60% (AODV) and 62% (DSR) to 80% with QL-ADTA (t\u0026thinsp;=\u0026thinsp;37.56, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001; t\u0026thinsp;=\u0026thinsp;46.09, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001). These results confirm that the performance gains are statistically reliable.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eStatistical Significance of QL-ADTA vs. 
Baseline Protocols\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\"\u0026times;\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMetric\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eProtocols Compared\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMean (Baseline)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMean (QL-ADTA)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003et-statistic\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003ep-value\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eSignificance\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePacket Delivery Ratio\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAODV vs. 
QL-ADTA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.68\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.85\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e42.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026times;\" colname=\"c6\"\u003e \u003cp\u003e1.2 \u0026times; 10⁻\u0026sup1;\u0026sup1;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eSignificant\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDSR vs. QL-ADTA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.70\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.85\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e42.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026times;\" colname=\"c6\"\u003e \u003cp\u003e1.0 \u0026times; 10⁻\u0026sup1;\u0026sup1;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eSignificant\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLatency (ms)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAODV vs. 
QL-ADTA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e420\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e320\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e-38.44\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026times;\" colname=\"c6\"\u003e \u003cp\u003e2.7 \u0026times; 10⁻\u0026sup1;\u0026sup1;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eSignificant\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDSR vs. QL-ADTA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e400\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e320\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e-74.21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026times;\" colname=\"c6\"\u003e \u003cp\u003e7.4 \u0026times; 10⁻\u0026sup1;⁴\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eSignificant\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEnergy Efficiency (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAODV vs. 
QL-ADTA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e60\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e80\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e37.56\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026times;\" colname=\"c6\"\u003e \u003cp\u003e3.3 \u0026times; 10⁻\u0026sup1;\u0026sup1;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eSignificant\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDSR vs. QL-ADTA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e62\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e80\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e46.09\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026times;\" colname=\"c6\"\u003e \u003cp\u003e5.3 \u0026times; 10⁻\u0026sup1;\u0026sup2;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eSignificant\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThese results demonstrate that QL-ADTA achieves consistent and statistically validated improvements over traditional routing protocols. The adaptive reward formulation allows the protocol to balance data delivery, delay minimization, and energy conservation under disaster-prone underwater conditions. 
This confirms the robustness of QL-ADTA as a disaster-resilient routing mechanism.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003e4.10 REWARD BEHAVIOR AND DISASTER AWARENESS\u003c/h2\u003e \u003cp\u003eThe relationship between reward and PDR is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e. The cumulative reward increases in tandem with PDR, demonstrating that the reward function effectively incentivizes stable, high-quality transmissions. The design ensures that improvements in PDR translate directly into higher long-term rewards, validating the effectiveness of the Q-learning framework for disaster-aware routing [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eOverall, QL-ADTA achieved 18\u0026ndash;20% higher PDR than AODV and DSR, 20\u0026ndash;23% lower latency, approximately 20% higher energy efficiency, and more stable performance under disaster conditions. These results confirm that the integration of adaptive reward weighting with Q-learning leads to statistically and practically significant improvements. 
The findings align with established results on reinforcement learning in dynamic networks [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e] while offering a novel disaster-aware formulation tailored for UWSNs.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSummary of Comparative Insights\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eProtocol\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDisaster Adaptability\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePDR\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLatency\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eEnergy Efficiency\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eData Prioritization\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAODV / DSR\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eLow\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eHigh\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eLow\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eQLFR [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePartial\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMedium\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMedium\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMedium\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eQTAR [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePartial\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHigh\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMedium\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eHigh\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEP-ADTA [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHigh\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLow\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMedium\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eQL-ADTA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eVery High\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eVery Low\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eHigh\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e compares conventional protocols, reinforcement learning-based schemes, and the proposed QL-ADTA. Unlike earlier schemes, QL-ADTA adapts fully to disaster conditions, achieves a very high packet delivery ratio, very low latency, and high energy efficiency, and explicitly supports data prioritization for urgent communication.\u003c/p\u003e \u003c/div\u003e"},{"header":"5. CONCLUSION","content":"\u003cp\u003eThis study introduced QL-ADTA, a disaster-aware Q-learning-based Adaptive Data Transfer Algorithm for UWSNs. 
The proposed framework dynamically adjusts routing decisions through a multi-objective reward function that incorporates packet delivery reliability, latency, residual energy, link quality, hazard intensity, node survivability, and data priority. By integrating environmental disruption indicators directly into the learning process, QL-ADTA ensures reliable and energy-efficient communication under highly unstable underwater conditions. Experimental evaluation using large-scale synthetic disaster scenarios confirmed statistically significant improvements in packet delivery ratio, latency, and energy efficiency compared with conventional routing protocols. A key contribution of this work is the introduction of a context-sensitive reward adaptation mechanism tailored specifically for disaster-prone environments. Unlike existing reinforcement learning-based routing schemes that primarily optimize general performance metrics, QL-ADTA prioritizes urgent data transmission while simultaneously mitigating risks associated with debris density, turbulent currents, and node instability. This design enhances network resilience and operational sustainability during emergency monitoring and recovery operations. In summary, QL-ADTA provides a disaster-aware and adaptive routing framework for UWSNs, significantly improving reliability, latency, and energy efficiency under dynamic underwater conditions. Several hybrid reinforcement learning directions can further strengthen the adaptability and scalability of disaster-resilient UWSNs. 
Future research can extend this work through hybrid reinforcement learning models such as CNN-LSTM-based deep RL, multi-agent cooperative learning, and federated learning to enhance scalability, predictive capability, and real-time adaptability in large-scale disaster scenarios.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003ch2\u003eCompeting Interests\u003c/h2\u003e \u003cp\u003eThe authors declare that there is no conflict of interest regarding the publication of this paper.\u003c/p\u003e \u003c/p\u003e\u003cp\u003e \u003ch2\u003eEthical Approval\u003c/h2\u003e \u003cp\u003eThis article does not contain any studies involving human participants or animals performed by the author.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eInformed Consent\u003c/strong\u003e \u003cp\u003eInformed consent was not required for this study as it did not involve human participants or animals.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e \u003cp\u003eThis research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eRitu Bhardwaj \u0026amp; Ashwani Kush contributed to the conception and design of the study, development of the proposed model, simulation and experimentation, data analysis, and manuscript preparation.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe datasets used in this study were synthetically generated through simulations to emulate realistic underwater wireless sensor network conditions. The data supporting the findings of this study are available from the corresponding author upon reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eEmad Felemban FK, Shaikh UM, Qureshi (2015) Adil ASheikh, and Saad Bin Qaisar. Underwater sensor network applications: A comprehensive survey. 
Int J Distrib Sens Netw 11(11):896832\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOuidir H, Berqia A, Aouad S (2024) Improving uwsn performance using reinforcement learning algorithm qendip. In 2024 11th International Conference on Wireless Networks and Mobile Communications (WINCOM), pages 1\u0026ndash;6. IEEE\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhou Y, Cao T, Xiang W (2020) Anypath routing protocol design via q-learning for underwater sensor networks. IEEE Internet Things J 8(10):8173\u0026ndash;8190\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePrabhu D, Alageswaran R, Miruna Joe Amali S (2023) Multiple agent based reinforcement learning for energy efficient routing in wsn. Wireless Netw 29(4):1787\u0026ndash;1797\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNandyala CS, Kim H-W, Cho H-S (2023) Qtar: A q-learning-based topology-aware routing protocol for underwater wireless sensor networks. Comput Netw 222:109562\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePetrioli C, Petroccia R, Stojanovic M (2008) A comparative performance evaluation of mac protocols for underwater sensor networks. In OCEANS 2008, pages 1\u0026ndash;10. IEEE\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXiao X, Huang H (2020) A clustering routing algorithm based on improved ant colony optimization algorithms for underwater wireless sensor networks. Algorithms 13(10):250\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang B, Ben K, Lin H, Zuo M, Zhang F (2022) Ep-adta: edge prediction-based adaptive data transfer algorithm for underwater wireless sensor networks (uwsns). Sensors 22(15):5490\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHasselt H (2010) Double q-learning. Adv Neural Inf Process Syst 23\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHasselt H (2010) Double q-learning. 
In: Lafferty J, Williams C, Shawe-Taylor J, Zemel R, Culotta A (eds) Advances in Neural Information Processing Systems, vol 23. Curran Associates, Inc.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang C, Shen X, Wang H, Zhang H, Mei H (2023) Reinforcement learning-based opportunistic routing protocol using depth information for energy-efficient underwater wireless sensor networks. IEEE Sens J 23(15):17771\u0026ndash;17783\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLu Y, He R, Chen X, Lin B, Yu C (2020) Energy-efficient depth-based opportunistic routing with q-learning for underwater wireless sensor networks. Sensors 20(4):1025\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShuai Liu J, Wang W, Shi G, Han SY, Li Jia-heng (2024) Clorp: Cross-layer opportunistic routing protocol for underwater sensor networks based on multi-agent reinforcement learning. IEEE Sensors Journal\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRagavi B, Baranidharan V, Ramash Kumar K (2023) A novel hybridized cluster-based geographical opportunistic routing protocol for effective data routing in underwater wireless sensor networks. J Electr Comput Eng 2023(1):5567483\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSingh K, Gupta R (2021) Performance evaluation of a manet based secure and energy optimized communication protocol (e2 s-aodv) for underwater disaster response network. Int J Comput Networks Appl (IJCNA) 8(1):11\u0026ndash;27\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTanimoto Y, Fukumizu K (2024) State-separated sarsa: A practical sequential decision-making algorithm with recovering rewards\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLowe R, Ziemke T (2013) Exploring the relationship of reward and punishment in reinforcement learning. 
In 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pages 140\u0026ndash;147\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhou Y, Cao T, Xiang W (2019) Qlfr: A q-learning-based localization-free routing protocol for underwater sensor networks. In 2019 IEEE Global Communications Conference (GLOBECOM), pages 1\u0026ndash;6. IEEE\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRodoshi RT, Song Y, Choi W (2021) Reinforcement learning-based routing protocol for underwater wireless sensor networks: a comparative survey. IEEE Access 9:154578\u0026ndash;154599\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Underwater Wireless Sensor Networks, Reinforcement Learning, Q-Learning, Disaster Management, Energy Efficiency, Adaptive Routing, Data Transfer Optimization","lastPublishedDoi":"10.21203/rs.3.rs-8877266/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8877266/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eUnderwater Wireless Sensor Networks (UWSNs) play a critical role in real-time environmental monitoring during disasters such as earthquakes, tsunamis, and underwater landslides. However, highly dynamic and hazardous underwater conditions significantly reduce the effectiveness of conventional routing protocols. This study addresses data transmission challenges in disaster-prone UWSNs by proposing a Q-learning-based Adaptive Data Transfer Algorithm (QL-ADTA). The proposed approach employs a multi-objective reward function that dynamically adapts to contextual parameters including data priority, link quality, node survivability, debris density, and water current strength.The algorithm was evaluated using disaster-simulated synthetic datasets and compared with conventional routing protocols such as DSR and AODV. Experimental results demonstrate that QL-ADTA achieves an 18% improvement in packet delivery ratio, 23% reduction in latency, and 20% enhancement in energy efficiency. These findings confirm the model\u0026rsquo;s ability to adapt routing decisions in real time based on disaster severity, thereby improving communication reliability and extending network lifetime. 
The context-aware reward design and emergency-adaptive routing strategy distinguish the proposed method from conventional reinforcement learning approaches in UWSNs, offering a robust framework for disaster-resilient underwater communication.\u003c/p\u003e","manuscriptTitle":"A Q-Learning Framework for Disaster Resilient Data Transfer in Underwater Wİreless Sensor Networks","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-02-17 08:56:29","doi":"10.21203/rs.3.rs-8877266/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"c41c8d7e-6cfd-4e56-968b-3d04ffd13411","owner":[],"postedDate":"February 17th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2026-02-23T13:11:56+00:00","versionOfRecord":[],"versionCreatedAt":"2026-02-17 08:56:29","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8877266","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8877266","identity":"rs-8877266","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}