A Process-Centric Survey of AI for Scientific Discovery Through the EXHYTE Framework

Systematic Review · Research Square preprint

Md Musaddaqul Hasib (1), Sumin Jo (1), Harsh Sinha (1), Jifeng Song (1), Arun Das (1), Zhentao Liu (1), Hugh Galloway (1), Huey Huang (2), Kexun Zhang (3), Shou-Jiang Gao (1), Yu-Chiao Chiu (1), Lei Li (3), Yufei Huang (1, corresponding author)
(1) University of Pittsburgh; (2) University of Texas at Austin; (3) Carnegie Mellon University

This is a preprint; it has not been peer reviewed by a journal.
DOI: https://doi.org/10.21203/rs.3.rs-8370059/v1
License: CC BY 4.0
Status: Posted. Version 1 (latest version) posted December 17th, 2025.

Abstract
Large language models (LLMs) and agent systems are increasingly transforming scientific discovery, driving progress across chemistry, biology, materials science, and physics. Yet most existing work and surveys remain fragmented, focusing on isolated tasks such as idea generation or experiment design without addressing how these components fit within the broader discovery process.
To bridge this gap, we introduce the EXHYTE cycle, an iterative framework that formalizes scientific discovery as a sequence of Exploration, Hypothesis generation, and Testing. We assembled a corpus of recent studies, distilled recurring strategies that characterize how AI methods contribute to each EXHYTE substage, and organized the literature according to these representative strategies and to domain-specific advances. This process-centric perspective unifies diverse methodologies under a single structured workflow, identifies which substages are mature and which are underexplored, and reveals complementarities that enable closed-loop discovery systems. It also clarifies the evolving division of labor between human researchers and AI systems, offering a roadmap for developing adaptive, autonomous frameworks for AI-driven scientific discovery. An accompanying website with paper summaries and an LLM-powered interactive survey based on EXHYTE is available at https://webapps.crc.pitt.edu/exhyte/

Subject area: Artificial Intelligence and Machine Learning
Keywords: AI for Scientific discovery, the EXHYTE cycle, Large language models, Hypothesis generation, Idea generation

Additional Declarations
The authors declare no competing interests.

Supplementary Files
supplementaryfile.xlsx: Summary of surveyed papers
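The abstract describes EXHYTE as an iterative cycle of Exploration, Hypothesis generation, and Testing, where test outcomes can inform the next round (closed-loop discovery). The following is a minimal Python sketch of that control flow only, assuming just the three top-level stages named in the abstract. The `exhyte_loop` function, the `Record` type, and the number-guessing stand-ins (`toy_explore`, `toy_hypothesize`, `toy_check`) are hypothetical illustrations, not the survey's own substages or any system it reviews.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Record:
    """One pass through the cycle: the hypothesis proposed and its test outcome."""
    hypothesis: Any
    result: Any


def exhyte_loop(
    explore: Callable[[List[Record]], Any],
    hypothesize: Callable[[Any], Any],
    test: Callable[[Any], Any],
    max_rounds: int,
    stop: Optional[Callable[[Any], bool]] = None,
) -> List[Record]:
    """Run Exploration -> Hypothesis generation -> Testing for up to max_rounds.

    Every test outcome is appended to `history`, which the next Exploration
    step receives, so results feed back into the following round.
    """
    history: List[Record] = []
    for _ in range(max_rounds):
        context = explore(history)         # Exploration: summarize what is known
        hypothesis = hypothesize(context)  # Hypothesis generation
        result = test(hypothesis)          # Testing
        history.append(Record(hypothesis, result))
        if stop is not None and stop(result):
            break
    return history


# Hypothetical toy stand-ins: locate a hidden integer in [0, 100].
HIDDEN = 37


def toy_explore(history: List[Record]) -> Tuple[int, int]:
    # Narrow the search interval using every earlier test outcome.
    lo, hi = 0, 100
    for rec in history:
        if rec.result == "too low":
            lo = rec.hypothesis + 1
        elif rec.result == "too high":
            hi = rec.hypothesis - 1
    return lo, hi


def toy_hypothesize(bounds: Tuple[int, int]) -> int:
    lo, hi = bounds
    return (lo + hi) // 2  # propose the midpoint of the remaining interval


def toy_check(h: int) -> str:
    if h < HIDDEN:
        return "too low"
    if h > HIDDEN:
        return "too high"
    return "confirmed"


if __name__ == "__main__":
    trace = exhyte_loop(toy_explore, toy_hypothesize, toy_check,
                        max_rounds=10, stop=lambda r: r == "confirmed")
    for rec in trace:
        print(rec.hypothesis, rec.result)
```

In this toy run the loop proposes 50, 24, and then 37, stopping once the check returns "confirmed". The point is only the shape of the loop: each stage is a swappable callable, and the history of tested hypotheses is what closes the cycle.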