Harnessing Pre-trained Models for Accurate Prediction of Protein-Ligand Binding Affinity

Research Article
Jiashan Li, Xinqi Gong

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-5382178/v1
This work is licensed under a CC BY 4.0 License.

Status: Published
Journal Publication: published 17 Feb, 2025 in BMC Bioinformatics
You are reading the latest preprint version (Version 1).

Abstract

Background: The binding between proteins and ligands plays a crucial role in the field of drug discovery. However, this area currently faces numerous challenges. On one hand, existing methods are constrained by the limited availability of labeled data, often performing inadequately when addressing complex protein-ligand interactions. On the other hand, many models struggle to effectively capture the flexible variations and relative spatial relationships between proteins and ligands.
These issues not only hinder the advancement of protein-ligand binding research but also reduce the accuracy and efficiency of drug discovery. In response to these challenges, our study aims to enhance predictive capabilities through innovative approaches, providing more reliable support for drug discovery efforts.

Methods: This study leverages a pre-trained model with spatial awareness to enhance the prediction of protein-ligand binding affinity. By perturbing protein structures in a manner consistent with physical constraints and employing self-supervised tasks, we improve the representation of small molecule structures, allowing for better adaptation to affinity predictions. Our approach also enables the identification of potential binding sites on proteins.

Results: Our model achieves a significantly higher correlation coefficient in binding affinity predictions, with a classification ROC exceeding 95% for binding site identification.

Conclusion: This research presents a novel approach that not only enhances the accuracy of binding affinity predictions but also facilitates the identification of binding sites, showcasing the potential of pre-trained models in computational drug design. Data and code are available at https://anonymous.4open.science/r/SableBind-1B53.

Keywords: Binding affinity, Binding site prediction, Molecular representation, Molecular pre-training

Additional Declarations: No competing interests reported.

Editorial timeline at journal:
Editorial decision: Revision requested (05 Dec, 2024)
Reviews received at journal (04 Dec, 2024)
Reviews received at journal (21 Nov, 2024)
Reviewers agreed at journal (13 Nov, 2024)
Reviewers agreed at journal (10 Nov, 2024)
Reviewers agreed at journal (10 Nov, 2024)
Reviewers invited by journal (07 Nov, 2024)
Editor invited by journal (07 Nov, 2024)
Editor assigned by journal (05 Nov, 2024)
Submission checks completed at journal (05 Nov, 2024)
First submitted to journal (03 Nov, 2024)

Author affiliations: Jiashan Li and Xinqi Gong (corresponding author), Renmin University of China
Version 1 posted: 18 Nov, 2024
Published version: BMC Bioinformatics, https://doi.org/10.1186/s12859-025-06064-w