Abstract
Structural alphabets have transformed protein phylogenetics by enabling sequence-style alignment and maximum-likelihood inference to be applied directly to structural data. However, a coordinate-explicit alphabet, in which character states are derived from three-dimensional atomic positions, encodes not only evolutionary signal but also the conformational variability inherent to protein structure. This source of noise has not previously been quantified in a phylogenetic context, and no framework exists for comparing alphabets with respect to their conformational sensitivity. Here, we introduce the Normalised Noise Index (NNI), a Shannon entropy-based metric for quantifying conformational sensitivity in structural alphabet encodings, and apply it alongside ensemble-wide Robinson–Foulds (RF) variance as a framework for characterising the impact of conformational noise on phylogenetic inference. Across 3,749 single-chain NMR ensembles from the Protein Data Bank, we show that 3Di character variability is a pervasive feature of experimentally observed conformational spread, with NNI negatively correlated with within-ensemble structural stability. A 100 ns molecular dynamics simulation of myoglobin confirmed that thermal fluctuations alone are sufficient to generate comparable 3Di character variation and, in 2.9% of cases, to redirect maximum-likelihood tree search away from the expected topology in a 4-taxon globin benchmark with independently established relationships. Exhaustive enumeration of 4,800 conformational replicates across three NMR ensembles revealed that topological variance under 3Di encoding is approximately 1.7-fold greater than under structural distance, based on 11,517,600 pairwise RF comparisons, a source of uncertainty invisible to standard bootstrap analysis. 
By contrast, TEA, a sequence-derived structure-aware alphabet inferred from ESM-2 embeddings rather than directly from atomic coordinates, is insulated from conformational sampling by construction and yields zero topological variance across all conformational replicates, serving here as a noise-insulated reference rather than a proposed replacement for 3Di. Together, these results demonstrate that alphabet choice is a methodological variable in structural phylogenetics, and that the NNI metric and RF variance framework introduced here provide a practical basis for principled noise characterisation as new structural alphabets continue to emerge.
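The abstract does not state the NNI formula, so the following is only an illustrative sketch of what a Shannon entropy-based conformational noise score could look like, not the authors' definition. It assumes that each model of an ensemble has already been encoded as an equal-length string over a 20-state alphabet such as 3Di, computes the Shannon entropy of the states observed at each position, and normalises the mean by the maximum entropy log2(20). The function names `column_entropy` and `normalised_noise` are hypothetical.

```python
import math
from collections import Counter

# Assumption: a 20-state structural alphabet such as 3Di.
ALPHABET_SIZE = 20


def column_entropy(states):
    """Shannon entropy (in bits) of the states observed at one position."""
    n = len(states)
    counts = Counter(states)
    # p * log2(1/p) summed over observed states; 0.0 for an invariant column.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def normalised_noise(encodings, alphabet_size=ALPHABET_SIZE):
    """Mean per-position entropy across an ensemble, scaled to [0, 1].

    `encodings` holds one encoded string per conformer of the same chain.
    This is an assumed form of an entropy-based score, not the published NNI.
    """
    if not encodings:
        raise ValueError("at least one encoded conformer is required")
    length = len(encodings[0])
    if length == 0 or any(len(e) != length for e in encodings):
        raise ValueError("encodings must be non-empty and of equal length")
    # zip(*encodings) yields one tuple of states per sequence position.
    entropies = [column_entropy(column) for column in zip(*encodings)]
    return sum(entropies) / (length * math.log2(alphabet_size))


if __name__ == "__main__":
    # Two toy conformers that differ at the second position only.
    print(normalised_noise(["AV", "AL"]))
    # Number of unordered pairs among 4,800 replicates.
    print(math.comb(4800, 2))
```

Under this assumed form, an ensemble whose conformers all encode identically scores 0, and a position where every alphabet state occurs equally often contributes the maximum of 1. The final line also shows that the 11,517,600 pairwise RF comparisons reported above equal the number of unordered pairs among the 4,800 conformational replicates.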
Competing Interest Statement
The authors have declared no competing interest.