Integrating protein sequence embeddings with structure via graph-based deep learning for single-residue property prediction

Kevin Michalewicz (corresponding author), Mauricio Barahona, Barbara Bravi
Imperial College London

This is a preprint; it has not been peer reviewed by a journal.
Research Square, Version 1, posted November 7th, 2025
DOI: https://doi.org/10.21203/rs.3.rs-8043216/v1
This work is licensed under a CC BY 4.0 License.

Abstract
Understanding the intertwined contributions of amino acid sequence and spatial structure is essential to explain protein behaviour. Here, we introduce INFUSSE (Integrated Network Framework Unifying Structure and Sequence Embeddings), a deep learning framework for the prediction of single-residue properties that combines fine-tuning of sequence embeddings derived from a Large Language Model with the inclusion of graph-based representations of protein structures via a diffusive Graph Convolutional Network.
To illustrate the benefits of jointly leveraging sequence and structure, we apply INFUSSE to the prediction of B-factors in antibodies, a residue property that reflects the local flexibility shaped by biochemical and structural constraints in these highly variable and dynamic proteins. Using a dataset of 1510 antibody and antibody-antigen complexes from the database SAbDab, we show that INFUSSE improves performance over current machine learning (ML) methods based on sequence or structure alone, and allows for the systematic disentanglement of sequence and structure contributions to the performance. Our results show that adding structural information via geometric graphs enhances predictions especially for intrinsically disordered regions, protein-protein interaction sites, and highly variable amino acid positions: all key structural features for antibody function that are not well captured by purely sequence-based ML descriptions.

Keywords: Antibody, Deep Learning, Graph-based Learning, Interpretability, Large Language Model, Protein Structure
Subject areas: Biological sciences/Structural biology; Biological sciences/Computational biology and bioinformatics/Machine learning
Competing interests: The authors declare no competing interests.
Supplementary files: SupplementaryInformation.pdf (Supplementary Information)
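The abstract describes the structural branch of INFUSSE only at a high level: per-residue sequence embeddings are propagated over a graph built from the protein structure by a diffusive Graph Convolutional Network, and a scalar property (here, the B-factor) is predicted for each residue. The NumPy sketch below illustrates that general idea under stated assumptions and is not the authors' implementation. The 8 Å C-alpha contact cutoff, the heat-kernel form exp(-tL) of the diffusion over the symmetric normalized Laplacian, the single layer with a linear readout, and the function names `contact_graph`, `heat_kernel` and `diffusive_gcn_forward` are all hypothetical choices. Random vectors stand in for language-model embeddings, and training and fine-tuning are omitted.

```python
import numpy as np


def contact_graph(coords: np.ndarray, cutoff: float = 8.0) -> np.ndarray:
    """Hypothetical geometric graph: link residues whose C-alpha atoms lie within
    `cutoff` angstroms. The zero self-distance adds self-loops, so every degree >= 1."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist <= cutoff).astype(float)


def heat_kernel(adj: np.ndarray, t: float) -> np.ndarray:
    """Assumed diffusion operator exp(-t L), with L = I - D^-1/2 A D^-1/2."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    evals, evecs = np.linalg.eigh(lap)  # L is symmetric, so eigh applies
    # V diag(exp(-t * lambda)) V^T, computed by scaling the eigenvector columns
    return (evecs * np.exp(-t * evals)) @ evecs.T


def diffusive_gcn_forward(embeddings, adj, weights, readout, t=1.0):
    """One diffusive graph-convolution layer (ReLU) plus a linear per-residue readout."""
    kernel = heat_kernel(adj, t)
    hidden = np.maximum(kernel @ embeddings @ weights, 0.0)
    return hidden @ readout  # one scalar per residue, e.g. a normalized B-factor


# Toy usage: random-walk coordinates and random stand-in embeddings
rng = np.random.default_rng(0)
n_res, emb_dim, hid_dim = 50, 32, 16
coords = np.cumsum(rng.normal(scale=2.0, size=(n_res, 3)), axis=0)
embeddings = rng.normal(size=(n_res, emb_dim))
weights = rng.normal(scale=0.1, size=(emb_dim, hid_dim))
readout = rng.normal(scale=0.1, size=hid_dim)
pred = diffusive_gcn_forward(embeddings, contact_graph(coords), weights, readout)
print(pred.shape)  # (50,)
```

The diffusion time `t` controls how far structural information spreads. At `t = 0` the kernel is the identity and the model falls back to a purely sequence-based per-residue predictor. Larger `t` mixes embeddings across spatially neighbouring residues, which is one way to make the sequence-versus-structure contribution tunable.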
