Cooperate to Generalize: Deep Reinforcement Learning for Real-time Ad Hoc Team Routing
Gangyan Xu, Pengfu WAN, Jiawei Chen, Zhengxiong Zhu
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7896001/v1
This work is licensed under a CC BY 4.0 License
Status: Posted (Version 1)
Abstract
Deep reinforcement learning (DRL) has demonstrated remarkable performance in efficiently solving various routing problems. However, its wide deployment in real-world applications remains challenging due to the stringent requirement for consistency between its training and application scenarios. To address this issue, improving the generalization capabilities of DRL has attracted much attention in the literature, primarily focusing on node properties while overlooking the generalizability of team resources.
In practice, many decision-making systems operate under ad hoc conditions, where the available team resources are inherently uncertain and heterogeneous. This poses significant challenges for existing learning-based approaches. To address this issue, we propose a general decision-making framework tailored for real-time ad hoc team routing and introduce a novel, generalizable DRL-based method termed Generalizable Ad Hoc Team Routing (GATR). Inspired by cooperative behaviors observed in human teams, we introduce a cooperative decision-making mechanism that aggregates the knowledge of diverse team members and coordinates their actions, enabling GATR to generalize seamlessly to teams with varying or previously unseen configurations. In addition, we develop an adaptive information-sharing module and leverage the inherent property of team symmetry to further enhance the effectiveness of cooperative decision-making. In a range of real-world applications, including disaster response and city logistics, GATR exhibits superior solution capabilities under varying and unseen team configurations, maintaining robust performance even under extreme conditions. These results highlight the potential of GATR for broader cross-domain applications and complex decision-making systems.
Subject areas:
Physical sciences/Mathematics and computing/Computational science
Physical sciences/Mathematics and computing/Computer science
Scientific community and society/Social sciences/Decision making
Earth and environmental sciences/Natural hazards
Physical sciences/Mathematics and computing/Information technology
Additional Declarations: There is NO Competing Interest.
Supplementary Files: Supplementary.pdf (Supplementary Information)
Posted: October 24th, 2025
Corresponding author: Gangyan Xu, The Hong Kong Polytechnic University (ORCID: https://orcid.org/0000-0001-9537-9006)
Affiliations: Jiawei Chen and Zhengxiong Zhu, The Hong Kong Polytechnic University
© Research Square 2026 | ISSN 2693-5015 (online)