Multi-Modal Large Language Model Enables All-Purpose Prediction of Drug Mechanisms and Properties

doi:10.21203/rs.3.rs-5222656/v1

2024 · doi:10.21203/rs.3.rs-5222656/v1

preprint OA: closed

Article · Research Square

Authors: Pengtao Xie (corresponding author), Youwei Liang, Ruiyi Zhang, Yongce Li, Mingjia Huo, Zinnia Ma, Digvijay Singh, Chengzhan Gao, Hamidreza Rahmani, Satvik Bandi, Li Zhang, Robert Weinreb, Atul Malhotra, Danielle Grotjahn, Linda Awdishu, Trey Ideker, Michael Gilson

Affiliations: University of California San Diego (all authors except as noted); The Scripps Research Institute (Hamidreza Rahmani, Danielle Grotjahn); Hamilton Glaucoma Center, Shiley Eye Institute, and Department of Ophthalmology, University of California, San Diego, La Jolla, CA (Robert Weinreb); University of California San Diego School of Medicine (Atul Malhotra)

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-5222656/v1

This work is licensed under a CC BY 4.0 License (https://creativecommons.org/licenses/by/4.0/).

Status: Under Review · Journal: Nature Machine Intelligence (Nature Portfolio) · Version 1 posted October 11th, 2024

Abstract

Accurately predicting the mechanisms and properties of potential drug molecules is essential for advancing drug discovery. However, traditional methods often require the development of specialized models for each specific prediction task, resulting in inefficiencies in both model training and integration into workflows. Moreover, these approaches are typically limited to predicting pharmaceutical attributes represented as discrete categories, and struggle with predicting complex attributes that are best described in free-form texts. To address these challenges, we introduce DrugChat, a multi-modal large language model (LLM) designed to provide comprehensive predictions of molecule mechanisms and properties within a unified framework. DrugChat analyzes the structure of an input molecule along with users' queries to generate comprehensive, free-form predictions on drug indications, pharmacodynamics, and mechanisms of action. Moreover, DrugChat supports multi-turn dialogues with users, facilitating interactive and in-depth exploration of the same molecule. Our extensive evaluation, including assessments by human experts, demonstrates that DrugChat significantly outperforms GPT-4 and other leading LLMs in generating accurate free-form predictions, and exceeds state-of-the-art specialized prediction models.

Subject areas: Biological sciences/Computational biology and bioinformatics/Machine learning; Biological sciences/Computational biology and bioinformatics/Virtual drug screening; Biological sciences/Drug discovery

Keywords: Drug mechanism prediction, drug property prediction, multimodal large language model, graph neural network

Additional Declarations

Yes, there is a potential competing interest. T.I. is a consultant for and has an equity interest in IDEAYA Biosciences. The terms of these arrangements for T.I. have been reviewed and approved by the University of California, San Diego, in accordance with its conflict of interest policies. The remaining authors declare no competing interests.

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00