Leveraging Sequence Purification for Accurate Prediction of Multiple Conformational States with AlphaFold2 | Research Square

Article

Xiaolin Cheng, Enming Xing, Junjie Zhang, Shen Wang

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6087969/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1 posted. You are reading this latest preprint version.

Abstract

AlphaFold2 (AF2) has transformed protein structure prediction by harnessing co-evolutionary constraints embedded in multiple sequence alignments (MSAs). MSAs not only encode static structural information, but also hold critical details about protein dynamics, which underpin biological functions. However, these subtle co-evolutionary signatures, which dictate conformational state preferences, are often obscured by noise within MSA data and thus remain challenging to decipher.
Here, we introduce AF-ClaSeq, a systematic framework that isolates these co-evolutionary signals through sequence purification and iterative enrichment. By extracting sequence subsets that preferentially encode distinct structural states, AF-ClaSeq enables high-confidence predictions of alternative conformations. Our findings reveal that the successful sampling of alternative states depends not on MSA depth but on sequence purity. Intriguingly, purified sequences encoding specific structural states are distributed across phylogenetic clades and superfamilies, rather than confined to specific lineages. Expanding upon AF2's transformative capabilities, AF-ClaSeq provides a powerful approach for uncovering hidden structural plasticity, advancing allosteric protein and drug design, and facilitating dynamics-based protein function annotation.

Subject areas: Biological sciences/Computational biology and bioinformatics/Protein structure predictions; Biological sciences/Computational biology and bioinformatics/Protein folding

Keywords: protein dynamics, evolutionary constraints, multiple sequence alignment, conformational states, AlphaFold2

Additional Declarations

There is NO Competing Interest.
Author affiliation: The Ohio State University (all authors). Corresponding author: Xiaolin Cheng.
Shen Wang ORCID: https://orcid.org/0000-0002-8466-6808
Posted: March 4th, 2025
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
© Research Square 2026 | ISSN 2693-5015 (online)