Transformer-Enhanced Deep Q-Learning for Adaptive Robot Path Planning in Dynamic Environments

doi:10.21203/rs.3.rs-6272245/v1


2025 · doi:10.21203/rs.3.rs-6272245/v1

preprint OA: closed


Research Article (Research Square preprint)

Authors: Harish Sharma, Ritu Tiwari, Shubham Shukla, Sushant Kumar

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6272245/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review. Version 1 posted April 28th, 2025.

Abstract

Efficient navigation in dynamic environments remains a critical challenge for autonomous robots in industrial manufacturing, search-and-rescue operations, and automated warehousing. Traditional path-planning algorithms struggle to adapt to real-time obstacle movements, while conventional reinforcement learning (RL) approaches lack the capacity to model long-range spatial dependencies.
This paper presents Transformer-Enhanced Deep Q-Learning (Transformer-DQN), a novel framework that integrates transformer architectures with Deep Q-Networks (DQN) to address these limitations. By leveraging multi-head self-attention mechanisms and Cartesian positional encoding, the model dynamically captures obstacle interactions and optimizes navigation in cluttered environments. Hyperparameter tuning via Optuna ensures a balance between exploration and exploitation, while prioritized experience replay enhances training stability. Experimental results demonstrate significant advancements over baseline methods:

20×20 Grids: Reduces average pathfinding time by 33.65% (209.5 s → 139 s) and collisions by 37.5% compared to vanilla DQN.
30×30 Grids: Achieves a 94.6% time reduction (9125 s → 492.5 s) and 33.3% fewer collisions, showcasing superior scalability.
Adaptive Performance: Outperforms PPO (70% vs. 85% success rate) and classical planners (RRT*/D*) in dynamic settings, approaching optimal path lengths (28 vs. 25 steps).

The Transformer-DQN's ability to generalize across grid sizes and dynamically re-plan in real time positions it as a robust solution for time-sensitive applications. Theoretical analysis confirms convergence guarantees, while empirical validation highlights its energy efficiency and reduced computational overhead. This work bridges the gap between simulated and real-world robotic systems, offering a scalable framework for autonomous navigation in unstructured, dynamic environments.

Keywords: Reinforcement Learning, Deep-Q-Network, Adaptive Robot path-planning, Transformer, Optuna, Prioritized Experience Replay, Autonomous Navigation

Additional Declarations: No competing interests reported.
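The time reductions quoted in the abstract can be checked from the before/after times it reports. The sketch below is only an arithmetic check, not the authors' evaluation code; the helper name `percent_reduction` and the dictionary layout are illustrative assumptions, and the 30×30 baseline is assumed to be vanilla DQN as in the 20×20 case.

```python
def percent_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline * 100.0


# Average pathfinding times in seconds as quoted in the abstract:
# (baseline, Transformer-DQN). The 30x30 baseline is assumed to be
# vanilla DQN; the abstract names it explicitly only for 20x20.
reported_times = {
    "20x20": (209.5, 139.0),
    "30x30": (9125.0, 492.5),
}

# Percentage of time saved and the equivalent speed-up factor per grid size.
reductions = {g: percent_reduction(b, t) for g, (b, t) in reported_times.items()}
speedups = {g: b / t for g, (b, t) in reported_times.items()}

for grid in reported_times:
    print(f"{grid}: {reductions[grid]:.2f}% less time, {speedups[grid]:.2f}x faster")
```

Both quoted percentages are consistent with the quoted times. Expressed as a ratio, the 30×30 result corresponds to roughly an 18.5-fold speed-up, against about 1.5-fold on the 20×20 grid.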
Editorial history

Editorial decision: Revision requested, 16 Sep, 2025
Reviews received at journal: 21 Jun, 2025
Reviewers agreed at journal: 12 Jun, 2025
Reviews received at journal: 02 May, 2025
Reviewers agreed at journal: 24 Apr, 2025
Reviewers invited by journal: 24 Apr, 2025
Editor assigned by journal: 22 Mar, 2025
Submission checks completed at journal: 22 Mar, 2025
First submitted to journal: 20 Mar, 2025
Author affiliations

Harish Sharma (corresponding author): Indian Institute of Information Technology, Pune
Ritu Tiwari: Sardar Vallabhbhai National Institute of Technology Surat
Shubham Shukla: Indian Institute of Information Technology, Pune
Sushant Kumar: Indian Institute of Information Technology, Pune

Submitted to: Cluster Computing (https://www.springer.com/journal/10586)

© Research Square 2026 | ISSN 2693-5015 (online)


Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00