A Constraint-Weighted Transfer Model Predicts Cross-System Generalization in Protein Fitness Landscapes

2026 · doi:10.21203/rs.3.rs-9623030/v1

preprint OA: closed

Research Square · Research Article

Jeffery Scott Allbright

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9623030/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Transfer learning is widely used in protein fitness prediction but remains inconsistent: a model trained on one system may generalize well to some targets yet fail on others. The determinants of successful transfer remain poorly understood. Here, we introduce a constraint-weighted transfer model, a simple predictive framework that enables zero-shot transfer across protein systems without target-specific training. The model uses site-level sensitivity structure learned from a source system to define a weighting function over sequence variation in a target system.
We systematically evaluate the model across 12 source–target transfer pairs spanning fluorescent proteins and β-lactamase systems using publicly available deep mutational scanning datasets. The model produces measurable predictive improvements over an ungated baseline in compatible systems and degrades in incompatible systems, revealing three reproducible regimes of transfer behavior. We show that transfer performance is strongly predicted by alignment of site-level sensitivity structure between source and target (Spearman r = 0.947), with additional contribution from overlap of the most sensitive sites (r = 0.752). This relationship holds across all evaluated systems, including independent validation in β-lactamase datasets. These results establish constraint alignment as a predictive criterion for cross-system generalization and position constraint-weighted transfer as a simple, interpretable baseline for zero-shot protein fitness prediction.

Keywords: Bioinformatics, transfer learning, protein fitness prediction, deep mutational scanning, cross-system generalization, constraint-based representation, sensitivity alignment, GFP, β-lactamase

Introduction

Predicting the functional consequences of protein sequence variation is a central challenge in molecular biology and protein engineering. Machine learning approaches increasingly rely on transfer learning, in which a model trained on one protein system is applied to a related but distinct target. While transfer can be effective, performance is highly variable: some source–target pairs generalize well, while others fail or even degrade relative to simple baselines. Current approaches typically assume that sequence similarity, structural similarity, or embedding similarity should determine transfer success. However, empirical observations frequently contradict this assumption, suggesting that generalization depends on properties not captured by conventional similarity measures.
In this study, we investigate cross-system transfer through the lens of site-level sensitivity structure: the contribution of each position to functional variation within a protein. This structure reflects underlying functional constraints: which positions matter and how variation propagates through sequence space. We introduce a constraint-weighted transfer model that uses these sensitivities to define a transferable representation across systems. Using deep mutational scanning datasets from multiple fluorescent proteins and two β-lactamase systems, we show that transfer performance is not stochastic but organizes into reproducible regimes that are predictable from alignment of functional constraint structure.

Results

We evaluated transfer by applying site-level sensitivities learned on a source protein to a target protein without retraining on target labels. Performance was measured as Spearman correlation (ρ) between predicted and observed phenotype differences across randomly sampled variant pairs. Across 12 source–target pairs spanning GFP-family proteins and two β-lactamase systems, transfer outcomes separated into three regimes: compatible (substantial improvement), boundary (no meaningful benefit), and incompatible (performance degradation). These regimes were largely consistent across source systems: Q6WV12 and Q8WTC7 were compatible and cgreGFP incompatible for both sources, while avGFP was boundary from amacGFP and marginally incompatible from ppluGFP2 (Δρ = −0.012 in both cases). This indicates that compatibility is largely a property of the target.
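The evaluation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's code: all function names are hypothetical, the predicted difference for a pair is taken to be the sum of source-derived weights over positions where the two variants differ, the observed difference is taken as the absolute phenotype gap, and the bootstrap is assumed to resample variant pairs. The toy data are synthetic.

```python
import numpy as np


def rankdata(a):
    """Average 1-based ranks; tied values share the mean of their ranks."""
    a = np.asarray(a, dtype=float).ravel()
    _, inverse, counts = np.unique(a, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)
    return (ends - (counts - 1) / 2.0)[inverse]


def spearman(x, y):
    """Spearman rho computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(rankdata(x), rankdata(y))[0, 1])


def pair_features(seqs, y, n_pairs=2500, seed=0):
    """Sample variant pairs; return per-pair mismatch masks and |phenotype gap|."""
    rng = np.random.default_rng(seed)
    S = np.array([list(s) for s in seqs])
    i = rng.integers(0, len(seqs), size=n_pairs)
    j = rng.integers(0, len(seqs), size=n_pairs)
    keep = i != j  # drop self-pairs
    i, j = i[keep], j[keep]
    return S[i] != S[j], np.abs(y[i] - y[j])


def delta_rho(mismatch, observed, source_weights):
    """rho_transfer - rho_ungated for one set of sampled pairs."""
    rho_transfer = spearman(mismatch @ source_weights, observed)
    rho_ungated = spearman(mismatch.sum(axis=1), observed)  # uniform weights
    return rho_transfer - rho_ungated


def bootstrap_ci(mismatch, observed, source_weights, n_boot=1000, seed=1):
    """Percentile 95% interval for delta-rho (assumption: pairs are resampled)."""
    rng = np.random.default_rng(seed)
    n = len(observed)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws.append(delta_rho(mismatch[idx], observed[idx], source_weights))
    return np.percentile(draws, [2.5, 97.5])


# Synthetic target: only position 0 affects the phenotype.
rng = np.random.default_rng(42)
seqs = ["".join(rng.choice(list("AC"), size=6)) for _ in range(200)]
true_w = np.array([3.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = np.array([sum(w for w, aa in zip(true_w, s) if aa == "C") for s in seqs])

mismatch, observed = pair_features(seqs, y)
d = delta_rho(mismatch, observed, true_w)
lo, hi = bootstrap_ci(mismatch, observed, true_w, n_boot=200)
print(f"delta rho = {d:.3f}, 95% interval = [{lo:.3f}, {hi:.3f}]")
```

In this toy case the weights match the target's true constraint structure, so weighting should beat the uniform baseline. Supplying weights concentrated on irrelevant positions would push Δρ toward or below zero, which is the qualitative behaviour the three regimes describe.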
Table 1. Cross-system transfer performance and compatibility metrics

| Source → Target | Δρ | 95% CI | Bootstrap p(Δρ ≤ 0) | Sensitivity alignment | Top 20% overlap | Regime |
|---|---|---|---|---|---|---|
| amacGFP → Q6WV12 | 0.073 | [0.056, 0.098] | 0.00 | 0.370 | 0.292 | Compatible |
| ppluGFP2 → Q6WV12 | 0.162 | [0.140, 0.190] | 0.00 | 0.679 | 0.571 | Compatible |
| amacGFP → Q8WTC7 | 0.163 | [0.143, 0.193] | 0.00 | 0.791 | 0.709 | Compatible |
| ppluGFP2 → Q8WTC7 | 0.055 | [0.044, 0.069] | 0.00 | 0.351 | 0.313 | Compatible |
| ppluGFP2 → amacGFP | 0.116 | [0.099, 0.137] | 0.00 | 0.403 | 0.323 | Compatible |
| amacGFP → ppluGFP2 | 0.116 | [0.099, 0.137] | 0.00 | 0.403 | 0.323 | Compatible |
| amacGFP → avGFP | -0.012 | [-0.034, 0.004] | 0.94 | 0.344 | 0.362 | Boundary |
| amacGFP → cgreGFP | -0.017 | [-0.029, 0.000] | 0.96 | -0.351 | 0.011 | Incompatible |
| ppluGFP2 → avGFP | -0.012 | [-0.024, -0.001] | 0.98 | 0.341 | 0.265 | Incompatible |
| ppluGFP2 → cgreGFP | -0.014 | [-0.031, -0.000] | 0.96 | -0.306 | 0.036 | Incompatible |
| Firnberg → Jacquier | -0.116 | [-0.164, -0.066] | 1.00 | -0.190 | 0.000 | Incompatible |
| Jacquier → Firnberg | -0.122 | [-0.189, -0.071] | 1.00 | -0.164 | 0.000 | Incompatible |

Δρ = ρ_transfer − ρ_ungated. Sensitivity alignment is the Spearman correlation of fully fit site sensitivities across aligned positions. Top 20% overlap is the Jaccard overlap of the top 20% most sensitive aligned sites. Regime classification: compatible if Δρ > 0 and the 95% CI excludes 0; incompatible if Δρ < 0 and the 95% CI excludes 0; boundary if the CI overlaps 0.

Transfer improvement increased with alignment of site-level sensitivity structure (sensitivity alignment vs Δρ: Spearman r = 0.947; top 20% overlap vs Δρ: Spearman r = 0.752).

Figure 1. Alignment of site-level sensitivity structure predicts cross-system transfer performance. Transfer improvement (Δρ = ρ_transfer − ρ_ungated) is shown as a function of top 20% sensitive-site overlap (left) and sensitivity alignment (Spearman correlation of fully fit site sensitivities across aligned positions, right) between source and target systems. Each point represents one source→target transfer across GFP-family and β-lactamase proteins.
Points are colored by empirical regime classification: compatible (green), boundary (orange), and incompatible (red). Transfer improvement increases with both measures (Spearman r = 0.752 for overlap; r = 0.947 for sensitivity alignment). Compatible systems cluster at high alignment and exhibit substantial positive transfer, including independent validation on Q8WTC7 and Q6WV12. The boundary case (amacGFP → avGFP) shows a Δρ value indistinguishable from zero, while incompatible cases (for example, transfers into cgreGFP and the β-lactamase pairs) exhibit negative Δρ. Dashed lines indicate least-squares linear fits. The horizontal dotted line denotes Δρ = 0 (no improvement over the ungated baseline).

Cross-domain validation

To test whether the observed regimes generalize beyond the GFP family, we evaluated transfer between two independent β-lactamase datasets (Firnberg et al., 2014; Jacquier et al., 2013). Both directions showed strongly negative Δρ (Firnberg → Jacquier: −0.116; Jacquier → Firnberg: −0.122), with confidence intervals lying entirely below zero. These systems also exhibited negative sensitivity alignment and zero overlap of the most sensitive sites, confirming that misalignment of functional constraint structure predicts transfer failure.

Methods

Datasets

All analyses use publicly available deep mutational scanning datasets: avGFP (Sarkisyan et al., 2016); amacGFP, ppluGFP2, cgreGFP, Q8WTC7 and Q6WV12 (Somermeyer et al., 2022); and β-lactamase datasets (Firnberg et al., 2014; Jacquier et al., 2013). No new experimental data were generated.

Sequence Alignment

Proteins were aligned using MAFFT. Positions were mapped into a shared aligned index. Positions without valid correspondence between a given source–target pair were excluded.

Constraint-Weighted Transfer Model

Each protein sequence is represented as a one-hot encoded vector across aligned positions. A linear regression model is trained to predict phenotype values.
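The encoding and regression step can be sketched as follows, with the final per-site summary following the sensitivity definition used in this section (mean absolute regression coefficient across substitutions at a position). This is a hedged illustration rather than the author's implementation. The function names, the 20-letter alphabet, the minimum-norm least-squares fit, and the choice to measure each substitution's coefficient relative to the wild-type residue (over substitutions actually observed in the data) are all assumptions that the text does not specify.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
N_AA = len(AMINO_ACIDS)


def one_hot(seqs):
    """Encode equal-length aligned sequences as an (n, L * 20) 0/1 matrix."""
    X = np.zeros((len(seqs), len(seqs[0]) * N_AA))
    for row, seq in enumerate(seqs):
        for pos, aa in enumerate(seq):
            if aa in AA_INDEX:  # gaps and unknown residues stay all-zero
                X[row, pos * N_AA + AA_INDEX[aa]] = 1.0
    return X


def fit_linear(X, y):
    """Minimum-norm least-squares fit on centered data (intercept absorbed)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    coef, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    return coef


def site_sensitivity(coef, X, wild_type):
    """Mean |effect| over observed substitutions, relative to the wild type."""
    W = coef.reshape(-1, N_AA)
    seen = X.sum(axis=0).reshape(-1, N_AA) > 0  # residues present in the data
    sens = np.zeros(W.shape[0])
    for pos, wt in enumerate(wild_type):
        effects = W[pos] - W[pos, AA_INDEX[wt]]
        mask = seen[pos].copy()
        mask[AA_INDEX[wt]] = False  # exclude the wild-type residue itself
        if mask.any():
            sens[pos] = np.abs(effects[mask]).mean()
    return sens


# Toy data (synthetic): mutations at position 0 cost 2.0, at position 2 cost 0.5,
# and mutations at positions 1 and 3 are neutral.
wild_type = "AAAA"
seqs = ["AAAA", "CAAA", "DAAA", "ACAA", "AACA", "AADA", "AAAC", "CACA"]
y = np.array([0.0, -2.0, -2.0, 0.0, -0.5, -0.5, 0.0, -2.5])

X = one_hot(seqs)
sens = site_sensitivity(fit_linear(X, y), X, wild_type)
print(np.round(sens, 3))
```

Measuring effects relative to the wild type sidesteps the fact that one-hot columns at a position sum to one, which leaves raw coefficients defined only up to a per-position shift. A regularized fit would be another reasonable way to resolve that ambiguity.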
For each aligned position, site-level sensitivity is defined as the mean absolute regression coefficient across amino acid substitutions at that position. For transfer, source sensitivities are mapped to the target via the alignment table. Predicted phenotype differences between variant pairs are computed as the weighted sum of differences across mutated positions, where weights are the source-derived sensitivities.

Baselines

Ungated baseline: uniform weighting across positions.
Fully fit model: model trained directly on the target system.

Evaluation

Performance is evaluated using Spearman rank correlation (ρ) between predicted and observed phenotype differences across ~2,500 randomly sampled variant pairs. Transfer improvement is Δρ = ρ_transfer − ρ_ungated. Confidence intervals and statistical tests use 1,000 bootstrap resamples.

Discussion

A simple model reveals structured transfer regimes

The constraint-weighted transfer model reveals that cross-system transfer is not stochastic but follows reproducible regimes determined by properties of the target system.

Constraint alignment predicts generalization

Alignment of site-level sensitivity structure provides a strong predictor of transfer success, enabling estimation of transferability prior to model application.

Implications for protein representation learning

These results suggest that effective representations for biological systems must preserve functional constraint structure rather than relying solely on similarity measures.

Limitations

This study evaluates a limited number of protein systems. Extending these results to broader classes and more complex models is an important direction for future work.

Conclusion

We introduce a constraint-weighted transfer model for zero-shot protein fitness prediction and show that its performance is governed by alignment of functional constraint structure.
These findings provide a predictive framework for cross-system generalization and establish a simple, interpretable baseline for transfer learning in protein systems.

Declarations

Availability of Data and Materials

All datasets used in this study are publicly available. Processed datasets and analysis code will be made available on GitHub prior to formal publication. The repository link will be added in a revised version of this preprint.

Author Contributions

Jeffery Scott Allbright conceived the study, designed the analysis, performed experiments, and wrote the manuscript.

Competing Interests

Jeffery Scott Allbright reports intellectual property (issued and/or pending) related to the theoretical and computational frameworks described in this work and is a co-founder of Forma Substrate, Inc. and AIIA Technologies, Inc., which may commercialize these ideas. The author declares no other competing interests.

References

Sarkisyan KS et al (2016) Local fitness landscape of the green fluorescent protein. Nature 533:397–401
Somermeyer LG et al (2022) High-throughput mapping of protein sequence–function relationships across orthologs. Nat Commun 13:1–12
Firnberg E et al (2014) A comprehensive, high-resolution map of a gene's fitness landscape. Mol Biol Evol 31:1581–1592
Jacquier H et al (2013) Capturing the mutational landscape of the beta-lactamase TEM-1. Proc Natl Acad Sci USA 110:13067–13072
Pédelacq J-D, Cabantous S, Tran T, Terwilliger TC, Waldo GS (2006) Engineering and characterization of a superfolder green fluorescent protein. Nat Biotechnol 24:79–88
Shaner NC et al (2004) Improved monomeric red, orange and yellow fluorescent proteins derived from Discosoma sp. red fluorescent protein. Nat Biotechnol 22:1567–1572
Heim R, Prasher DC, Tsien RY (1994) Wavelength mutations and posttranslational autoxidation of green fluorescent protein. Proc Natl Acad Sci USA 91:12501–12504
Fowler DM, Fields S (2014) Deep mutational scanning: a new style of protein science. Nat Methods 11:801–807
Starr TN, Thornton JW (2016) Epistasis in protein evolution. Protein Sci 25:1204–1218
Rives A et al (2021) Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc Natl Acad Sci USA 118:e2016239118 (ESM-2 foundation)
Meier J et al (2021) Language models enable zero-shot prediction of the effects of mutations on protein function. Adv Neural Inf Process Syst 34:29287–29303
Bronstein MM, Bruna J, Cohen T, Veličković P (2021) Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478
Cohen T, Welling M (2016) Group equivariant convolutional networks. In: Proceedings of the International Conference on Machine Learning, 2990–2999
Lambert TJ (2019) FPbase: a community-editable fluorescent protein database. Nat Methods 16:277–278
White M (2007) The G-Ball, a New Icon for Codon Symmetry and the Genetic Code. arXiv q-bio/0702056. https://arxiv.org/abs/q-bio/0702056
We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-9623030","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":635071870,"identity":"9ec9f6dc-01df-46d8-9958-ecc1b8506ecc","order_by":0,"name":"Jeffery Scott Allbright","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABAElEQVRIiWNgGAWjYBADGT4JBoYDjA0McgwMPARVMzYcACpjg2gxMCZNC5BpkNhASIt8++Hjjz8w2PGwSfc+PPh1x5/0DcfPHnwAFJHTbcCuxeBMWiLQlmQeNpnjBodlzxjkbjiTl2w4gyHZ2OwADi0SPIZALcxAh6UxHJZsA2o5kGMmzcNwIHEbDi3yM8Ba6uFa0g3Ov8GvheEGWMthsJaDH9sMEgxuELAF5JcZZwyOA/1yjOEwY5ux4cwbb4wNZxjg9gswxA58qKioluOXbmP++LNNTp7vfI7hgw8VdnK4tEDtglDMoBhROIAkQhAw/gDZ20Ck6lEwCkbBKBgxAADcc1y/SJ1OfQAAAABJRU5ErkJggg==","orcid":"","institution":"AIIA Technologies, Inc","correspondingAuthor":true,"prefix":"","firstName":"Jeffery","middleName":"Scott","lastName":"Allbright","suffix":""}],"badges":[],"createdAt":"2026-05-05 
23:12:39","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":false,"conflictsOfInterestStatement":true,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-9623030/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-9623030/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":108666818,"identity":"5a9bd8a1-2480-4b44-a599-3c782664830b","added_by":"auto","created_at":"2026-05-07 06:45:09","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":245079,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAlignment of site-level sensitivity structure predicts cross-system transfer performance.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTransfer improvement (Δρ = ρ_transfer − ρ_ungated) is shown as a function of top 20% sensitive-site overlap (left) and sensitivity alignment (Spearman correlation of fully fit site sensitivities across aligned positions, right) between source and target systems. Each point represents one source→target transfer across GFP-family and β-lactamase proteins. Points are colored by empirical regime classification: compatible (green), boundary (orange), and incompatible (red). Transfer improvement increases monotonically with both measures (Spearman r = 0.752 for overlap; r = 0.947 for sensitivity alignment). Compatible systems cluster at high alignment and exhibit substantial positive transfer, including independent validation on Q8WTC7 and Q6WV12. Boundary systems (avGFP) show Δρ values indistinguishable from zero, while incompatible systems (cgreGFP) exhibit negative Δρ. Dashed lines indicate least-squares linear fits. 
The horizontal dotted line denotes Δρ = 0 (no improvement over the ungated baseline).\u003c/p\u003e","description":"","filename":"floatimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-9623030/v1/00cf282e11ead8b4706e4dd4.jpeg"},{"id":108805778,"identity":"d40649ba-69e3-49de-a08b-92bf0fd21fa9","added_by":"auto","created_at":"2026-05-08 15:26:52","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":436445,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9623030/v1/765254c7-b151-4192-ba8c-825a8c7b4a0b.pdf"}],"financialInterests":"The authors declare potential competing interests as follows: Jeffery Scott Allbright reports intellectual property (issued and/or pending) related to the theoretical and computational frameworks described in this work and is a co-founder of Forma Substrate, Inc. and AIIA Technologies, Inc., which may commercialize these ideas. The author declares no other competing interests.","formattedTitle":"\u003cp\u003e\u003cstrong\u003eA Constraint-Weighted Transfer Model Predicts Cross-System Generalization in Protein Fitness Landscapes\u003c/strong\u003e\u003c/p\u003e","fulltext":[{"header":"Introduction","content":"\u003cp\u003ePredicting the functional consequences of protein sequence variation is a central challenge in molecular biology and protein engineering. Machine learning approaches increasingly rely on transfer learning, in which a model trained on one protein system is applied to a related but distinct target. While transfer can be effective, performance is highly variable: some source\u0026ndash;target pairs generalize well, while others fail or even degrade relative to simple baselines. Current approaches typically assume that sequence similarity, structural similarity, or embedding similarity should determine transfer success. 
However, empirical observations frequently contradict this assumption, suggesting that generalization depends on properties not captured by conventional similarity measures. In this study, we investigate cross-system transfer through the lens of site-level sensitivity structure \u0026mdash; the contribution of each position to functional variation within a protein. This structure reflects underlying functional constraints: which positions matter and how variation propagates through sequence space. We introduce a constraint-weighted transfer model that uses these sensitivities to define a transferable representation across systems. Using deep mutational scanning datasets from multiple fluorescent proteins and two β-lactamase systems, we show that transfer performance is not stochastic but organizes into reproducible regimes that are predictable from alignment of functional constraint structure.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003eWe evaluated transfer by applying site-level sensitivities learned on a source protein to a target protein without retraining on target labels. Performance was measured as Spearman correlation (ρ) between predicted and observed phenotype differences across randomly sampled variant pairs. Across 12 source\u0026ndash;target pairs spanning GFP-family proteins and two β-lactamase systems, transfer outcomes separated into three clear regimes: compatible (substantial improvement), boundary (no meaningful benefit), and incompatible (performance degradation). 
These regimes were consistent regardless of source system, indicating that compatibility is largely a property of the target.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eCross-system transfer performance and compatibility metrics\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSource \u0026rarr; Target\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eΔρ\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003e95% CI\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eBootstrap p(Δρ\u0026thinsp;\u0026le;\u0026thinsp;0)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSensitivity Alignment\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eTop 20% Overlap\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e 
\u003cp\u003eRegime\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eamacGFP \u0026rarr; Q6WV12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.073\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e[0.056, 0.098]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.370\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.292\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eCompatible\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eppluGFP2 \u0026rarr; Q6WV12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.162\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e[0.140, 0.190]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.679\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.571\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eCompatible\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eamacGFP \u0026rarr; Q8WTC7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.163\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e[0.143, 0.193]\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.791\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.709\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eCompatible\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eppluGFP2 \u0026rarr; Q8WTC7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.055\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e[0.044, 0.069]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.351\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.313\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eCompatible\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eppluGFP2 \u0026rarr; amacGFP\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.116\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e[0.099, 0.137]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.403\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.323\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eCompatible\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eamacGFP \u0026rarr; ppluGFP2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.116\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e[0.099, 0.137]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.403\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.323\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eCompatible\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eamacGFP \u0026rarr; avGFP\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e-0.012\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e[-0.034, 0.004]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.344\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.362\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eBoundary\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eamacGFP \u0026rarr; cgreGFP\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e-0.017\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e[-0.029, 0.000]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e 
\u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e-0.351\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.011\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eIncompatible\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eppluGFP2 \u0026rarr; avGFP\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e-0.012\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e[-0.024, -0.001]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.341\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.265\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eIncompatible\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eppluGFP2 \u0026rarr; cgreGFP\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e-0.014\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e[-0.031, -0.000]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e-0.306\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.036\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eIncompatible\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003eFirnberg \u0026rarr; Jacquier\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e-0.116\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e[-0.164, -0.066]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e-0.190\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eIncompatible\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eJacquier \u0026rarr; Firnberg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e-0.122\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e[-0.189, -0.071]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e-0.164\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eIncompatible\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eΔρ\u0026thinsp;=\u0026thinsp;ρ_transfer\u0026thinsp;\u0026minus;\u0026thinsp;ρ_ungated. Sensitivity alignment is the Spearman correlation of fully fit site sensitivities across aligned positions. Top 20% Overlap is the Jaccard overlap of the top 20% most sensitive aligned sites. 
Regime classification: compatible if Δρ\u0026thinsp;\u0026gt;\u0026thinsp;0 and 95% CI excludes 0; incompatible if Δρ\u0026thinsp;\u0026lt;\u0026thinsp;0 and 95% CI excludes 0; boundary if CI overlaps 0.\u003c/p\u003e \u003cp\u003eTransfer improvement increased monotonically with alignment of site-level sensitivity structure (sensitivity alignment vs Δρ: Spearman r\u0026thinsp;=\u0026thinsp;0.947; top 20% overlap vs Δρ: Spearman r\u0026thinsp;=\u0026thinsp;0.752).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eTransfer improvement (Δρ\u0026thinsp;=\u0026thinsp;ρ_transfer\u0026thinsp;\u0026minus;\u0026thinsp;ρ_ungated) is shown as a function of top 20% sensitive-site overlap (left) and sensitivity alignment (Spearman correlation of fully fit site sensitivities across aligned positions, right) between source and target systems. Each point represents one source\u0026rarr;target transfer across GFP-family and β-lactamase proteins. Points are colored by empirical regime classification: compatible (green), boundary (orange), and incompatible (red). Transfer improvement increases monotonically with both measures (Spearman r\u0026thinsp;=\u0026thinsp;0.752 for overlap; r\u0026thinsp;=\u0026thinsp;0.947 for sensitivity alignment). Compatible systems cluster at high alignment and exhibit substantial positive transfer, including independent validation on Q8WTC7 and Q6WV12. Boundary systems (avGFP) show Δρ values indistinguishable from zero, while incompatible systems (cgreGFP) exhibit negative Δρ. Dashed lines indicate least-squares linear fits. 
The horizontal dotted line denotes Δρ\u0026thinsp;=\u0026thinsp;0 (no improvement over the ungated baseline).\u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eCross-domain validation\u003c/h2\u003e \u003cp\u003eTo test whether the observed regimes generalize beyond the GFP family, we evaluated transfer between two independent β-lactamase datasets (Firnberg et al., \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; Jacquier et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2013\u003c/span\u003e). Both directions showed strongly negative Δρ (Firnberg \u0026rarr; Jacquier: \u0026minus;0.116; Jacquier \u0026rarr; Firnberg: \u0026minus;0.122), with confidence intervals lying entirely below zero. These systems also exhibited negative sensitivity alignment and near-zero overlap of the most sensitive sites, confirming that misalignment of functional constraint structure predicts transfer failure.\u003c/p\u003e \u003c/div\u003e"},{"header":"Methods","content":"\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003eDatasets\u003c/h2\u003e \u003cp\u003eAll analyses use publicly available deep mutational scanning datasets: avGFP (Sarkisyan et al., \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2016\u003c/span\u003e), amacGFP, ppluGFP2, cgreGFP, Q8WTC7 and Q6WV12 (Somermeyer et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), and β-lactamase datasets (Firnberg et al., \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; Jacquier et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2013\u003c/span\u003e). No new experimental data were generated.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eSequence Alignment\u003c/h3\u003e\n\u003cp\u003eProteins were aligned using MAFFT. Positions were mapped into a shared aligned index. 
Positions without valid correspondence between a given source\u0026ndash;target pair were excluded.\u003c/p\u003e\n\u003ch3\u003eConstraint-Weighted Transfer Model\u003c/h3\u003e\n\u003cp\u003eEach protein sequence is represented as a one-hot encoded vector across aligned positions. A linear regression model is trained to predict phenotype values. For each aligned position, site-level sensitivity is defined as the mean absolute regression coefficient across amino acid substitutions at that position. For transfer, source sensitivities are mapped to the target via the alignment table. Predicted phenotype differences between variant pairs are computed as the weighted sum of differences across mutated positions, where weights are the source-derived sensitivities.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eBaselines\u003c/h2\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eUngated baseline: uniform weighting across positions\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eFully fit model: model trained directly on the target system\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eEvaluation\u003c/h3\u003e\n\u003cp\u003ePerformance is evaluated using Spearman rank correlation (ρ) between predicted and observed phenotype differences across ~\u0026thinsp;2,500 randomly sampled variant pairs. Transfer improvement is Δρ\u0026thinsp;=\u0026thinsp;ρ_transfer\u0026thinsp;\u0026minus;\u0026thinsp;ρ_ungated. 
Confidence intervals and statistical tests use 1,000 bootstrap resamples.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eA simple model reveals structured transfer regimes\u003c/h2\u003e \u003cp\u003eThe constraint-weighted transfer model reveals that cross-system transfer is not stochastic but follows reproducible regimes determined by properties of the target system.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eConstraint alignment predicts generalization\u003c/h2\u003e \u003cp\u003eAlignment of site-level sensitivity structure provides a strong predictor of transfer success, enabling estimation of transferability prior to model application.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eImplications for protein representation learning\u003c/h2\u003e \u003cp\u003eThese results suggest that effective representations for biological systems must preserve functional constraint structure rather than relying solely on similarity measures.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eLimitations\u003c/h2\u003e \u003cp\u003eThis study evaluates a limited number of protein systems. Extending these results to broader classes and more complex models is an important direction for future work.\u003c/p\u003e \u003c/div\u003e"},{"header":"Conclusion","content":"\u003cp\u003eWe introduce a constraint-weighted transfer model for zero-shot protein fitness prediction and show that its performance is governed by alignment of functional constraint structure. 
These findings provide a predictive framework for cross-system generalization and establish a simple, interpretable baseline for transfer learning in protein systems.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAvailability of Data and Materials\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll datasets used in this study are publicly available. Processed datasets and analysis code will be made available on GitHub prior to formal publication. The repository link will be added in a revised version of this preprint.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eJeffery Scott Allbright conceived the study, designed the analysis, performed experiments, and wrote the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting Interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eJeffery Scott Allbright reports intellectual property (issued and/or pending) related to the theoretical and computational frameworks described in this work and is a co-founder of Forma Substrate, Inc. and AIIA Technologies, Inc., which may commercialize these ideas. The author declares no other competing interests.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eSarkisyan KS et al (2016) Local fitness landscape of the green fluorescent protein. Nature 533:397\u0026ndash;401\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSomermeyer LG et al (2022) High-throughput mapping of protein sequence\u0026ndash;function relationships across orthologs. Nat Commun 13:1\u0026ndash;12\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFirnberg E et al (2014) A comprehensive, high-resolution map of a gene\u0026rsquo;s fitness landscape. Mol Biol Evol 31:1581\u0026ndash;1592\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJacquier H et al (2013) Capturing the mutational landscape of the beta-lactamase TEM-1. 
\u003cem\u003eProceedings of the National Academy of Sciences\u003c/em\u003e 110, 13067\u0026ndash;13072\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eP\u0026eacute;delacq J-D, Cabantous S, Tran T, Terwilliger TC, Waldo GS (2006) Engineering and characterization of a superfolder green fluorescent protein. Nat Biotechnol 24:79\u0026ndash;88\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShaner NC et al (2004) Improved monomeric red, orange and yellow fluorescent proteins derived from \u003cem\u003eDiscosoma\u003c/em\u003e sp. red fluorescent protein. Nat Biotechnol 22:1567\u0026ndash;1572\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHeim R, Prasher DC, Tsien RY (1994) Wavelength mutations and posttranslational autoxidation of green fluorescent protein. \u003cem\u003eProceedings of the National Academy of Sciences\u003c/em\u003e 91, 12501\u0026ndash;12504\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFowler DM, Fields S (2014) Deep mutational scanning: a new style of protein science. Nat Methods 11:801\u0026ndash;807\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStarr TN, Thornton JW (2016) Epistasis in protein evolution. Protein Sci 25:1204\u0026ndash;1218\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRives A et al (2021) Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. \u003cem\u003eProceedings of the National Academy of Sciences\u003c/em\u003e 118, e2016239118. \u003cem\u003e(ESM-2 foundation)\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMeier J et al (2021) Language models enable zero-shot prediction of the effects of mutations on protein function. 
Adv Neural Inf Process Syst 34:29287\u0026ndash;29303\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBronstein MM, Bruna J, Cohen T, Veličković P (2021) Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCohen T, Welling M (2016) Group equivariant convolutional networks. In \u003cem\u003eProceedings of the International Conference on Machine Learning\u003c/em\u003e, 2990\u0026ndash;2999\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLambert TJ (2019) FPbase: a community-editable fluorescent protein database. Nat Methods 16:277\u0026ndash;278\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWhite M (2007) The G-Ball, a New Icon for Codon Symmetry and the Genetic Code. arXiv q-bio/0702056. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://arxiv.org/abs/q-bio/0702056\u003c/span\u003e\u003cspan address=\"https://arxiv.org/abs/q-bio/0702056\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research 
Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"transfer learning, protein fitness prediction, deep mutational scanning, cross-system generalization, constraint-based representation, sensitivity alignment, GFP, β-lactamase","lastPublishedDoi":"10.21203/rs.3.rs-9623030/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9623030/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eTransfer learning is widely used in protein fitness prediction but remains inconsistent: a model trained on one system may generalize well to some targets yet fail on others. The determinants of successful transfer remain poorly understood.\u003c/p\u003e \u003cp\u003eHere, we introduce a constraint-weighted transfer model, a simple predictive framework that enables zero-shot transfer across protein systems without target-specific training. The model uses site-level sensitivity structure learned from a source system to define a weighting function over sequence variation in a target system.\u003c/p\u003e \u003cp\u003eWe systematically evaluate the model across 12 source\u0026ndash;target transfer pairs spanning fluorescent proteins and β-lactamase systems using publicly available deep mutational scanning datasets. The model produces measurable predictive improvements over an ungated baseline in compatible systems and degrades in incompatible systems, revealing three reproducible regimes of transfer behavior.\u003c/p\u003e \u003cp\u003eWe show that transfer performance is strongly predicted by alignment of site-level sensitivity structure between source and target (Spearman r\u0026thinsp;=\u0026thinsp;0.947), with additional contribution from overlap of the most sensitive sites (r\u0026thinsp;=\u0026thinsp;0.752). 
This relationship holds across all evaluated systems, including independent validation in β-lactamase datasets.\u003c/p\u003e \u003cp\u003eThese results establish constraint alignment as a predictive criterion for cross-system generalization and position constraint-weighted transfer as a simple, interpretable baseline for zero-shot protein fitness prediction.\u003c/p\u003e","manuscriptTitle":"A Constraint-Weighted Transfer Model Predicts Cross-System Generalization in Protein Fitness Landscapes","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-05-07 06:44:59","doi":"10.21203/rs.3.rs-9623030/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"694b43d5-37c3-4213-b0e8-3ea8c29eee43","owner":[],"postedDate":"May 7th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":67589870,"name":"Bioinformatics"}],"tags":[],"updatedAt":"2026-05-07T06:44:59+00:00","versionOfRecord":[],"versionCreatedAt":"2026-05-07 06:44:59","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9623030","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9623030","identity":"rs-9623030","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
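The Methods describe scoring a variant pair by a sensitivity-weighted sum over mutated positions and reporting Δρ = ρ_transfer − ρ_ungated over roughly 2,500 sampled pairs. The sketch below is one possible reading of that description, not the author's code. It assumes each variant is a set of mutated aligned positions, that a larger weighted burden means a lower phenotype, and that the ungated baseline uses weight 1 at every aligned position. All function names and the toy weights are hypothetical.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def burden(mutated_positions, weights):
    """Weighted sum over mutated positions; unmapped positions contribute 0."""
    return sum(weights.get(p, 0.0) for p in mutated_positions)


def predicted_difference(var_a, var_b, weights):
    # Assumption: higher burden means lower phenotype, so the predicted
    # phenotype difference (a minus b) is burden(b) minus burden(a).
    return burden(var_b, weights) - burden(var_a, weights)


def delta_rho(variants, phenotypes, source_weights, n_pairs=2500, seed=0):
    """Delta rho = rho_transfer - rho_ungated over randomly sampled pairs."""
    rng = random.Random(seed)
    uniform = {p: 1.0 for p in source_weights}  # assumed ungated baseline
    pairs = [tuple(rng.sample(range(len(variants)), 2)) for _ in range(n_pairs)]
    observed = [phenotypes[i] - phenotypes[j] for i, j in pairs]
    transfer = [predicted_difference(variants[i], variants[j], source_weights)
                for i, j in pairs]
    ungated = [predicted_difference(variants[i], variants[j], uniform)
               for i, j in pairs]
    return spearman(transfer, observed) - spearman(ungated, observed)


# Toy target system (made-up numbers): positions 0 and 3 are the sensitive ones.
true_weights = {0: 5.0, 1: 0.1, 2: 0.1, 3: 3.0}
variants = [set(c) for r in range(5) for c in combinations(range(4), r)]
phenotypes = [-burden(v, true_weights) for v in variants]

# A source whose sensitivities match the target, and one that is misaligned.
delta_aligned = delta_rho(variants, phenotypes, {0: 5.0, 1: 0.1, 2: 0.1, 3: 3.0})
delta_misaligned = delta_rho(variants, phenotypes, {0: 0.1, 1: 5.0, 2: 5.0, 3: 0.1})
print(delta_aligned, delta_misaligned)
```

In this toy setting the aligned source should give a positive Δρ and the misaligned source a negative one, which mirrors the compatible and incompatible regimes in the table. The paper's bootstrap confidence intervals (1,000 resamples) are not reproduced here.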

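The table footnote states the regime rule (compatible if Δρ > 0 and the 95% CI excludes 0, incompatible if Δρ < 0 and the CI excludes 0, boundary otherwise) and defines Top 20% Overlap as the Jaccard overlap of the most sensitive aligned sites. A minimal sketch of both follows. The function names are hypothetical, the tie-breaking and the rounding of the 20% cutoff are assumptions, and the sensitivity values in the overlap example are invented for illustration.

```python
def classify_regime(delta_rho, ci_low, ci_high):
    """Apply the footnote rule to one source -> target transfer."""
    if delta_rho > 0 and ci_low > 0:
        return "compatible"
    if delta_rho < 0 and ci_high < 0:
        return "incompatible"
    return "boundary"  # the CI overlaps 0


def top_sites(sensitivity, fraction=0.2):
    """Positions in the top `fraction` by sensitivity (at least one site)."""
    k = max(1, int(len(sensitivity) * fraction))  # assumed: cutoff rounds down
    ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
    return set(ranked[:k])


def top_overlap(source, target, fraction=0.2):
    """Jaccard overlap of the most sensitive sites over shared aligned positions."""
    shared = set(source) & set(target)
    a = top_sites({p: source[p] for p in shared}, fraction)
    b = top_sites({p: target[p] for p in shared}, fraction)
    return len(a & b) / len(a | b)


# Rows taken from the table: (delta rho, CI low, CI high).
regime_a = classify_regime(0.116, 0.099, 0.137)     # amacGFP -> ppluGFP2
regime_b = classify_regime(-0.012, -0.034, 0.004)   # amacGFP -> avGFP
regime_c = classify_regime(-0.116, -0.164, -0.066)  # Firnberg -> Jacquier

# Invented sensitivities over ten aligned positions.
src = dict(enumerate([9, 8, 1, 2, 3, 1.5, 2.5, 0.5, 0.2, 0.1]))
tgt = dict(enumerate([1, 9, 2, 3, 0.5, 8, 1.5, 2.5, 0.2, 0.1]))
overlap = top_overlap(src, tgt)  # top sites {0, 1} vs {1, 5}
print(regime_a, regime_b, regime_c, overlap)
```

Because the table reports intervals rounded to three decimals, a row whose bound is printed as 0.000 or -0.000 can be classified differently by this strict rule than by the unrounded values the author would have used.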