Much ado about nothing: Modelling amino acid replacement with predicted protein structures

Preprint (open access: closed)
Abstract

Substitution matrices like BLOSUM62 model the likelihood of replacement of amino acids in evolution and are used in protein sequence alignment tasks. Since the introduction of BLOSUM62 over three decades ago, many matrices have been released. Yet, to date, none has been derived from large numbers of 3D structures predicted by AlphaFold. Here, we define AFSM, the AlphaFold Substitution Matrix, derived from over 20,000 predicted 3D structures following the BLOSUM methodology. We benchmark AFSM against BLOSUM62 and 16 other matrices on five tasks in multiple sequence alignment (MSA) and protein homology search. Surprisingly, our analysis reveals that all matrices perform similarly. Only when an MSA contains few sequences do BLOSUM62 and AFSM perform better than using no matrix. This suggests that substitution matrices were most beneficial when there was little sequence data. We corroborate this argument by showing that embeddings, which are computed from billions of sequences, perform better than substitution matrices when sequence data is sparse. Taken together, these results suggest that structural data does not improve on BLOSUM62, but that increased sequence data makes extrapolation with substitution matrices obsolete. Nonetheless, BLOSUM62 continues to capture chemists’ intuition about amino acids by providing numerical values that implicitly reflect physicochemical properties, and it remains indispensable for the direct comparison of two sequences.

Competing Interest Statement: The authors have declared no competing interest.

Data Availability Statement: Data is part of the manuscript.

Contact: lukas.buschmann{at}mailbox.tu-dresden.de, sarah_naomi.bolz{at}tu-dresden.de, ferras.el-hendi{at}tu-dresden.de, negin.malekian{at}tu-dresden.de, michael.schroeder{at}biotec.tu-dresden.de
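The abstract states that AFSM follows the BLOSUM methodology but does not spell that methodology out. The following is a minimal sketch of the general BLOSUM-style log-odds construction, not the authors' AFSM pipeline. It counts amino-acid pairs within ungapped alignment columns, estimates observed pair frequencies q_ab and background frequencies p_a, and scores each pair as a scaled log2 odds ratio (scale 2 gives half-bit units, as in BLOSUM62). The function names `blosum_log_odds` and `ungapped_score`, the toy input blocks, the zero default for unseen pairs, and the omission of BLOSUM's clustering of sequences above a percent-identity threshold (62% for BLOSUM62) are all assumptions made for illustration. How AFSM obtains aligned residue pairs from predicted structures is not described here.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_log_odds(blocks, scale=2.0):
    """Hypothetical sketch: BLOSUM-style log-odds scores from ungapped blocks.

    Each block is a list of equal-length aligned sequences. Sequence
    clustering/weighting used by real BLOSUM matrices is deliberately omitted.
    """
    pair_counts = Counter()
    for block in blocks:
        for col in range(len(block[0])):
            column = [seq[col] for seq in block]
            # Count every unordered residue pair within the column.
            for a, b in combinations(column, 2):
                pair_counts[tuple(sorted((a, b)))] += 1

    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}  # observed q_ab

    # Background frequency: p_a = q_aa + sum over b != a of q_ab / 2.
    p = Counter()
    for (a, b), qab in q.items():
        if a == b:
            p[a] += qab
        else:
            p[a] += qab / 2
            p[b] += qab / 2

    scores = {}
    for (a, b), qab in q.items():
        # Expected frequency under independence; mixed pairs occur two ways.
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(scale * math.log2(qab / expected))
    return scores


def ungapped_score(s1, s2, scores):
    """Sum pair scores over an ungapped pairwise alignment.

    Assumption: pairs never observed in the blocks score 0.
    """
    return sum(scores.get(tuple(sorted((a, b))), 0) for a, b in zip(s1, s2))


if __name__ == "__main__":
    toy = blosum_log_odds([["AA", "AC", "CC"]])
    print(toy)
    print(ungapped_score("AC", "CA", toy))
```

In this toy block, A/C exchanges are observed more often than chance predicts, so the mixed pair receives a positive score and the identical pairs receive negative ones. Real matrices are built from many thousands of blocks, which is why small hand-made examples like this one only illustrate the arithmetic.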
