The effect of reward expectancy on different types of exploration in human reinforcement learning

Kanji Shimomura, Kenji Morita

Research Article
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4627464/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 03 Oct, 2024, in Computational Brain & Behavior.
Version 1 is the latest preprint version.

Abstract

How humans resolve the explore-exploit dilemma in complex environments is an important open question. Previous studies suggest that the level of reward expectancy affects the degree of exploration. However, it is still unclear (1) whether the effect differs depending on the type of exploration (i.e., random or directed exploration) and (2) whether the effect can really be attributed to reward expectancy.
In this preregistered study, we aimed to tackle these two challenges by extending a recently developed multi-armed bandit task that can dissociate the uncertainty and the novelty of stimuli. To isolate the effect of reward expectancy, we manipulated reward by its magnitude, not by its probability, across blocks, because reward probability affects the controllability of outcomes. Participants (n = 198) made more optimal choices when relative reward expectancy was high. Behavioral analysis with computational modeling revealed that higher reward expectancy reduced the degree of random exploration, while it had little effect on the degree of uncertainty- and novelty-based exploration. These results suggest that humans modulate the degree of random exploration depending on the relative level of reward expectancy in the environment. Combined with findings from previous studies, they also indicate the possibility that controllability influences the exploration-exploitation balance in human reinforcement learning.

Keywords: Reinforcement learning, Reward expectancy, Random exploration, Directed exploration, Explore-exploit tradeoff

Additional Declarations: No competing interests reported.
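The distinction the abstract draws between random exploration and uncertainty- or novelty-based (directed) exploration can be illustrated with a generic softmax choice rule. The sketch below is not the authors' model: the function name `choice_probabilities`, the parameters `beta`, `w_unc`, and `w_nov`, and all numeric values are assumptions chosen only for illustration. In this kind of rule, a lower inverse temperature `beta` makes choices more random, while positive bonus weights bias choices toward uncertain or novel options.

```python
import math


def choice_probabilities(values, uncertainty, novelty, beta, w_unc, w_nov):
    """Softmax choice rule with additive exploration bonuses (illustrative only).

    values, uncertainty, novelty: per-option lists of equal length.
    beta: inverse temperature (hypothetical); lower means more random choice.
    w_unc, w_nov: hypothetical weights of the uncertainty and novelty bonuses.
    """
    # Directed exploration: bonuses are added to each option's value estimate.
    utilities = [
        q + w_unc * u + w_nov * n
        for q, u, n in zip(values, uncertainty, novelty)
    ]
    # Random exploration: beta scales how deterministic the softmax is.
    scaled = [beta * x for x in utilities]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical three-option block: option 0 has the highest value estimate,
# option 2 is the most uncertain and the only novel one.
values = [1.0, 0.5, 0.0]
uncertainty = [0.1, 0.3, 0.9]
novelty = [0.0, 0.0, 1.0]

p_low_beta = choice_probabilities(values, uncertainty, novelty, 1.0, 0.0, 0.0)
p_high_beta = choice_probabilities(values, uncertainty, novelty, 5.0, 0.0, 0.0)
p_novelty = choice_probabilities(values, uncertainty, novelty, 1.0, 0.0, 1.0)
```

Under these assumptions, raising `beta` concentrates choice on the highest-valued option (less random exploration), whereas raising `w_nov` shifts choice toward the novel option without changing how random the choices are overall. A reported reduction in random exploration under high reward expectancy would correspond, in a rule like this one, to a larger `beta` with the bonus weights left largely unchanged.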
Supplementary Files: Supplementaryinformation.docx

Editorial history at the journal
Editorial decision: Revision requested, 30 Jul, 2024
Reviews received at journal: 30 Jul, 2024
Reviews received at journal: 29 Jul, 2024
Reviews received at journal: 27 Jul, 2024
Reviews received at journal: 16 Jul, 2024
Reviewers agreed at journal: 27 Jun, 2024
Reviewers agreed at journal: 24 Jun, 2024
Reviewers agreed at journal: 24 Jun, 2024
Reviewers agreed at journal: 24 Jun, 2024
Reviewers invited by journal: 24 Jun, 2024
Editor assigned by journal: 24 Jun, 2024
Submission checks completed at journal: 24 Jun, 2024
First submitted to journal: 24 Jun, 2024