Toxicity-Aware Reinforcement Learning for Liquidity Provisioning on Uniswap v3: A Systematic Ablation

Régis Likassi

Research Square, Research Article
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9724880/v1
This work is licensed under a CC BY 4.0 License
Status: Posted (Version 1)

Abstract

We present an empirical study of toxicity-aware deep reinforcement learning for liquidity provisioning on Uniswap v3, with a primary finding that toxicity signals act as tail-risk regulators rather than mean-return enhancers. The swap-size signature configuration (R5_volsize) achieves a 100% episode-level win rate across five rolling windows, is the only configuration with a strictly positive CVaR10% (+$4.00 versus $11.84 for the toxicity-blind baseline), and outperforms the baseline by +$36.65 in stressed market regimes specifically.
Concentrated-liquidity automated market makers (AMMs) such as Uniswap v3 expose liquidity providers (LPs) to adverse selection by informed arbitrageurs, formalised as Loss-Versus-Rebalancing (LVR) by Milionis et al. [2022]. While prior work has applied deep reinforcement learning (DRL) to active LP rebalancing [Xu and Brini, 2025, Zhang et al., 2023], no existing approach explicitly enriches the agent’s observation with toxicity signals derived from on-chain microstructure. We design four complementary toxicity scores: an analytical LVR proxy, a price-deviation spread, a volume-weighted realised toxicity, and a swap-size signature, each empirically validated to scale with market regime by factors of 2.79 to 8.46 between calm and stressed hours on the WETH/USDC 0.05% pool. We then conduct a factorial ablation over seven observation configurations, five rolling windows, and three random seeds (105 PPO runs of 80,000 timesteps each), using 24,019 hourly observations spanning May 2021 to January 2024 reconstructed from Dune Analytics ($428B cumulative volume, $214M cumulative fees). All seven PPO configurations beat a passive width-20-tick baseline in mean excess return, but effect sizes on mean return are small relative to outcome variance, so no configuration achieves statistically significant superiority over the toxicity-blind baseline (Welch t-test, all p > 0.20; Fisher exact test, all p > 0.10). This is consistent with structural noise dominance in AMM rewards: the standard deviation of capital outcomes (σ $23–$57) exceeds cross-configuration mean differences (∆ $6–$16). The distributional improvements, by contrast, are consistent across all five windows and across three market regimes.
R5_volsize also exhibits a Pearson correlation of r = +0.41 between its toxicity observation and the defensive range-widening action it selects (computed on held-out test episodes), a behavioural pattern consistent with the agent acting as a CVaR-constrained policy [Chow and Ghavamzadeh, 2014] despite being trained on a standard mean-return objective. All code, data and trained models are publicly released.

Subject area: Artificial Intelligence and Machine Learning
Keywords: uniswap, reinforcement learning, LPs, Automated market makers, machine learning

Additional Declarations
The authors declare no competing interests.
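The two statistics the abstract leans on, CVaR10% and the Welch t-test, can be illustrated with a minimal sketch. It is not the authors' released code. It assumes CVaR10% means the average of the worst 10% of per-episode dollar outcomes (so higher is better), and the helper names `cvar_lower_tail` and `welch_t` as well as the toy numbers are hypothetical.

```python
import math
from statistics import mean, variance


def cvar_lower_tail(outcomes, alpha=0.10):
    """Average of the worst alpha-fraction of outcomes (assumed CVaR convention)."""
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    ordered = sorted(outcomes)  # worst (lowest) outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))  # size of the lower tail
    return mean(ordered[:k])


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, the two samples may have unequal variances.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy per-episode outcomes in dollars (invented for illustration only).
toy_outcomes = [-30, -10, 0, 5, 10, 15, 20, 25, 30, 40]
tail_mean = cvar_lower_tail(toy_outcomes, alpha=0.10)

# Two toy samples with different spreads.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

A small mean difference combined with a large outcome variance yields a small |t|, which is the situation the abstract describes when it reports no significant mean-return differences while the lower-tail statistic still separates configurations.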
Author affiliation: aivancity School for Technology Business and Society (Régis Likassi, corresponding author). Posted May 18th, 2026. © Research Square 2026 | ISSN 2693-5015 (online)