Supervised learning of protein melting temperatures: cross-species vs species-specific prediction

doi:10.1101/2024.10.12.617972

2024 · doi:10.1101/2024.10.12.617972

preprint OA: closed

Abstract

Protein melting temperatures are important proxies for stability, and frequently probed in protein engineering campaigns, for instance for enzyme discovery and protein optimization. With the emergence of large datasets of melting temperatures for diverse natural proteins, it has become possible to train models to predict this quantity, and the literature has reported impressive performance values in terms of Spearman rho. The high correlation scores suggest that it should be possible to accurately predict melting temperature changes in engineered variants, and to reliably identify naturally thermostable proteins. However, in practice, results in these settings are often disappointing. In this paper, we explore this apparent discrepancy. We show that Spearman rho over cross-species data gives an overly optimistic impression of prediction performance, and that this metric reflects the ability to distinguish global differences in amino acid composition between species, rather than the specific effects of genetic variation. We proceed by investigating whether cross-species training on melting temperature is beneficial at all, compared to training specific models for each species. We address this question using four different transfer-learning approaches and a fine-tuning procedure. Surprisingly, we consistently find no benefit of cross-species training. We conclude that 1) current models for supervised prediction of melting temperature perform substantially worse than the literature suggests, and 2) that reliable transfer across species is still a challenging problem.

An implementation of this work is available at https://github.com/deltadedirac/thermocontrast_tm

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

Sebastian.Garcia.Lopezs{at}di.ku.dk, JRSX{at}novonesis.com, wb{at}di.ku.dk

This updated version provides a refined analysis of protein thermostability prediction, focusing on melting temperature. It builds on previous work by comparing two modeling approaches: transfer learning and Low-Rank Adaptation (LoRA) fine-tuning. These approaches were assessed in both cross-species (global) and species-specific settings, allowing for a more thorough evaluation of their ability to generalize across organisms and specialize within individual species.

To strengthen the reliability of the results, statistical significance tests were included to support the observed differences between modeling strategies. Additional experiments were also conducted to confirm and expand upon the initial findings. Together, these improvements offer a more robust and detailed understanding of the trade-offs involved in different approaches to modeling protein thermostability.
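The abstract's central claim, that Spearman rho pooled over cross-species data can look strong even when a model carries no within-species signal, can be illustrated with a small toy calculation. The sketch below is not the authors' code and uses no data from the paper: the species labels, melting temperatures, and predictions are invented for illustration, and the helper names (`average_ranks`, `spearman_rho`, `pooled_and_per_species`) are hypothetical. The toy predictor recovers each species' overall temperature level but scrambles the ordering inside each species.

```python
from collections import defaultdict
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def pooled_and_per_species(records):
    """records: iterable of (species, true_tm, predicted_tm) tuples."""
    records = list(records)
    pooled = spearman_rho([r[1] for r in records], [r[2] for r in records])
    groups = defaultdict(list)
    for species, true_tm, pred_tm in records:
        groups[species].append((true_tm, pred_tm))
    per_species = {
        species: spearman_rho([t for t, _ in pairs], [p for _, p in pairs])
        for species, pairs in groups.items()
    }
    return pooled, per_species


# Invented toy data (assumption): three hypothetical species whose proteins
# sit at different temperature levels. Predictions track the species level
# but not the ordering of proteins within a species.
toy_records = [
    ("species_A", 40, 43), ("species_A", 42, 41),
    ("species_A", 44, 44), ("species_A", 46, 42),
    ("species_B", 55, 58), ("species_B", 57, 56),
    ("species_B", 59, 59), ("species_B", 61, 57),
    ("species_C", 70, 73), ("species_C", 72, 71),
    ("species_C", 74, 74), ("species_C", 76, 72),
]

pooled, per_species = pooled_and_per_species(toy_records)
print("pooled rho:", round(pooled, 3))
for name, rho in per_species.items():
    print(name, "rho:", round(rho, 3))
```

On this toy data the pooled rho is about 0.90 while the rho within each species is 0, so the pooled score here is driven entirely by between-species differences. Reporting the per-species values alongside the pooled one is one simple way to expose that gap; the paper's own evaluation protocol may differ.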

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00