Behavioural Transparency in Multi-Agent RL: Strategic Decoupling and the Economic Optimisation of Tactical Shocks

doi:10.21203/rs.3.rs-9183247/v1

2026 · doi:10.21203/rs.3.rs-9183247/v1

preprint OA: closed CC-BY-4.0

Kenny Ching
University of Auckland (corresponding author) · ORCID: https://orcid.org/0000-0002-8049-9133

Research Square preprint · Version 1, posted March 24th, 2026
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9183247/v1
This work is licensed under a CC BY 4.0 License (https://creativecommons.org/licenses/by/4.0/).

Subject areas: Social science/Business and management; Social science/Economics; Social science/Psychology/Human behaviour
Keywords: Machine behaviour, Multi-agent reinforcement learning, Behavioural transparency, Strategic decoupling, Human-AI benchmarking, Econometrics, Dota 2
Competing interests: There is NO Competing Interest.

Abstract

A primary challenge in deploying multi-agent reinforcement learning (RL) systems is the opacity of their emergent strategies, necessitating frameworks that render complex agentic behaviour interpretable. When RL agents interact with high-dimensional environments, they often execute decisions that appear erratic or "alien" to human observers. Behavioural economic theory suggests this perception stems from tactical myopia, the tendency of bounded biological agents to treat localised shocks (such as a tactical loss or victory) as terminal states, thereby degrading subsequent macroeconomic efficiency. Utilising high-fidelity telemetry from OpenAI Five in the imperfect-information environment of Dota 2, we provide empirical transparency into the processing of these tactical shocks. We demonstrate that average human cohorts exhibit intense tactical myopia: localised failures trigger extended economic contraction, while localised successes induce satisficing and stagnant resource acquisition. Conversely, econometric state-space matching reveals that the RL agent executes perfect strategic decoupling, treating shocks as neutral state transitions to instantly optimise subsequent macroeconomic trajectories. Crucially, when the RL agent is compared exclusively against apex human experts in identical economic states, the statistical divergence between biological and synthetic agents largely vanishes at all economically meaningful horizons. This convergence provides a transparent econometric explanation for opaque AI behaviour: RL networks do not invent incomprehensible strategies; rather, they mathematically purge the tactical myopia inherent in average play, converging on the precise global optimisation phenotype utilised by apex biological experts.
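The abstract names "econometric state-space matching" but this record does not include the full text, so the actual procedure is not known here. As a rough illustration of the general idea only, the sketch below pairs each RL-agent shock with the nearest human shock in a standardised (game minute, net worth) space and averages the difference in post-shock resource gain. Everything in it is an assumption made for illustration: the `ShockEvent` fields, the `matched_gap` function, the choice of nearest-neighbour matching, the `caliper` threshold, and the toy numbers are hypothetical and are not taken from the preprint.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, pstdev


@dataclass(frozen=True)
class ShockEvent:
    """One tactical shock (hypothetical schema, not the preprint's)."""
    game_minute: float      # when the shock happened
    net_worth: float        # economic state at the moment of the shock
    gold_gain_after: float  # resources acquired in a fixed window afterwards


def matched_gap(agent_events, human_events, caliper=0.5):
    """Return (mean agent-minus-human post-shock gain, number of matches).

    Each agent event is paired with its nearest human event in a
    standardised state space; pairs farther apart than `caliper`
    (in standard-deviation units) are discarded as poor matches.
    """
    pooled = list(agent_events) + list(human_events)
    # Scale each dimension by its pooled standard deviation so minutes
    # and gold contribute comparably; fall back to 1.0 if constant.
    minute_sd = pstdev([e.game_minute for e in pooled]) or 1.0
    worth_sd = pstdev([e.net_worth for e in pooled]) or 1.0

    gaps = []
    for a in agent_events:
        def distance(h, a=a):
            return sqrt(((a.game_minute - h.game_minute) / minute_sd) ** 2
                        + ((a.net_worth - h.net_worth) / worth_sd) ** 2)

        nearest = min(human_events, key=distance)
        if distance(nearest) <= caliper:
            gaps.append(a.gold_gain_after - nearest.gold_gain_after)
    return (mean(gaps) if gaps else None), len(gaps)


# Toy, invented numbers purely to exercise the function.
agent_demo = [ShockEvent(10, 4000, 900.0), ShockEvent(20, 9000, 1200.0)]
human_demo = [ShockEvent(10, 4000, 600.0), ShockEvent(20, 9000, 1100.0),
              ShockEvent(35, 20000, 500.0)]
gap, n_matched = matched_gap(agent_demo, human_demo)
print(gap, n_matched)
```

Under this reading, a gap near zero when the human pool is restricted to expert players would correspond to the convergence the abstract reports, while a large positive gap against average players would correspond to tactical myopia. A real analysis would also need uncertainty estimates and a richer state description than two variables.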

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00
unpaywall: last seen: 2026-05-23T02:00:01.238055+00:00

License: CC-BY-4.0