Integrating evolutionary signals and protein structure reveals localized adaptive divergence in Cereus lineages

Research Article | Research Square

Danilo T Amaral, João Alfredo Teodoro, Maria Izadora Oliveira Cardoso, and 3 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9346910/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Understanding how positive selection translates into functional changes in proteins remains challenging, particularly in non-model species. Here, we integrate phylogenetic evidence of lineage-specific positive selection with comparative structural and biophysical analyses to investigate adaptive divergence in the Cereus fernambucensis-C. insularis clade, which occupies contrasting island and continental environments. Based on a previously published branch-site scan, we selected five orthogroups showing robust statistical support, biological relevance, and full-length sequences for downstream analyses. Orthologous protein structures were predicted and evaluated, followed by quantitative comparisons. Geometrically defined cavities were inferred by pocket detection, and hotspot enrichment near these regions was assessed. Finally, we estimated the stability effects of lineage-specific substitutions using force-field-based predictions restricted to high-confidence structural regions. Across orthogroups, global folds were generally conserved, while structural divergence was predominantly localized to flexible loops and surface-exposed regions. Variability hotspots were non-randomly enriched near predicted cavities across several orthogroups, consistent with preferential diversification of putative interaction-associated surfaces. Stability scans indicated largely mild or near-neutral energetic effects (median ΔΔG ~ 0–1 kcal·mol⁻¹), punctuated by a few strongly destabilizing substitutions at specific sites. Within the resolution of FoldX-based estimates, these patterns are consistent with the preservation of overall fold integrity. Our results indicate that adaptive evolution in Cereus lineages is primarily associated with localized modifications in surface-accessible regions within thermodynamically permissive regimes, particularly in structurally well-resolved proteins, rather than with large-scale structural innovation. More broadly, these findings highlight both the potential and current limitations of structure-informed evolutionary inference in non-model systems.

Keywords: Adaptation, Positive Selection, Stability, Variability

1. Introduction

Understanding how organisms adapt to diverse and changing environments is a central goal of evolutionary biology (Jensen et al. 2007).
At the molecular level, adaptive genetic changes can shape protein structure and function, influencing organismal fitness by modifying enzymatic activity, regulatory interactions, and responses to environmental stress (López-Maury et al. 2008 ; Vogt 2022 ). Advances in genomics and comparative phylogenetic methods have enabled the identification of genes evolving under positive selection (PS) across lineages, providing valuable insights into the genetic basis of adaptation (Liberles et al. 2012 ; Edwards et al. 2022 ; Amaral et al. 2024 ; Jayaraman et al. 2022 ). However, translating statistical signatures of selection into mechanistic interpretations of how adaptive substitutions affect protein structure, function, and activity remains a major challenge. Branch-site models and related likelihood-based approaches based on the nonsynonymous/synonymous rate ratio (d N /d S ; Yang 2007 ; Sun and Kozai 2024 ) have become widely used to detect positive selection acting on specific evolutionary lineages. These methods have been applied to diverse taxa and ecological contexts, revealing genes potentially involved in environmental adaptation, stress tolerance, and niche specialization (Amaral et al. 2024 ; Ma et al. 2025 ; Xu et al. 2025 ). However, although d N /d S is a relevant indicator of selective pressure at the protein level, signatures of PS at the DNA level do not necessarily translate into functional divergence. Many amino acid substitutions may be selectively neutral or buffered by structural constraints, meaning that statistical signals alone provide limited information about their biological relevance (Ng and Henikoff 2006 ). Recent advances in protein structure prediction, particularly with the development of AlphaFold (Jumper et al. 
2021 ), have opened new opportunities to investigate the structural consequences of evolutionary change at an unprecedented scale (Varadi and Velankar 2023 ), helping to bridge the gap between statistical signatures of selection and their functional consequences. High-confidence structural models can now be generated for thousands of proteins, enabling comparative analyses that were previously restricted to a small number of experimentally solved structures. These developments allow researchers to integrate evolutionary, structural, and functional perspectives in a unified analytical framework (Varadi and Velankar 2023 ; Zhao et al. 2024 ). Despite these advances, relatively few studies have combined genome-wide PS scans with comparative structural analyses across multiple lineages (e.g., Orsi et al. 2008 ; Batarseh et al. 2023 ; Ren et al. 2025 ). Most investigations either focus on DNA-level patterns without structural interpretation or analyze individual proteins in isolation. This creates an important methodological gap, as adaptive signals detected at the sequence level often cannot be directly linked to their potential structural or functional consequences. Moreover, many existing studies rely on codon-aware alignments and residue-level mapping of positively selected sites, approaches that require complete coding sequences and therefore cannot be readily applied to transcriptome-based datasets that frequently contain fragmented or partial transcripts. As a result, integrating evolutionary signals with structural inference at a large scale remains technically challenging, particularly for non-model organisms (Schell et al. 2025 ). Cereus fernambucensis Lem. and C. insularis Hemsl. (Cereeae, Cactaceae) compose a monophyletic group occurring in xeric coastal habitats within the Brazilian Atlantic Forest phytogeographic domain. Cereus fernambucensis is distributed in the restinga vegetation along the Brazilian Atlantic coast, whereas C. 
insularis is a narrow endemic restricted to coastal rocky outcrops of the Fernando de Noronha Archipelago. The phylogenetic relationships among populations of both species suggest that Cereus insularis originated from northernmost C. fernambucensis populations under a progenitor–derivative speciation model (Franco et al. 2024). In a previous comparative transcriptomic and phylogenomic study, Amaral et al. ( 2024 ) used branch-site models to identify multiple orthogroups showing signatures of lineage-specific positive selection in the C. fernambucensis - C. insularis clade. These findings suggested that adaptive evolution has contributed to shaping molecular variation across island and continental populations. However, the structural and functional consequences of these evolutionary patterns remain unknown. This leaves several questions unresolved. For example, even in the absence of codon-level site mapping, do genes exhibiting PS signatures show spatially localized and structurally interpretable divergence among lineages? In particular, it remains unclear whether these adaptive signals are associated with changes in overall protein fold, localized remodeling of surface-accessible regions consistent with altered molecular interactions, or subtle modifications that preserve overall structural stability. To address this question, we used orthogroups with signatures of positive selection identified for the C. fernambucensis - C. insularis clade (Amaral et al. 2024 ) to investigate structural divergence among lineages by integrating comparative structural modeling, sequence variability analysis, pocket detection, and stability inference. Rather than inferring function directly, we tested whether evolutionary heterogeneity was concentrated in structurally permissive regions and near putative cavities, under explicit model-confidence constraints. 
Using structure predictions and quantitative structural comparison methods, we examined how evolutionary divergence is reflected in three-dimensional protein structures across multiple lineages. Specifically, we aimed to: (i) assess the consistency of predicted structures among orthologous proteins; (ii) quantify structural divergence using objective metrics; (iii) identify regions of elevated sequence variability and evaluate their spatial distribution within protein structures; (iv) test whether variable regions form spatial clusters near predicted functional cavities; and (v) estimate the potential impact of lineage-specific substitutions on protein stability in high-confidence structural regions.

2. Methods

2.1. Dataset and selection of positively selected orthogroups

The dataset analyzed in this study was derived from a previously published transcriptomic analysis of adaptive evolution in the C. fernambucensis-C. insularis clade and related lineages (Amaral et al. 2024). That study identified candidate genes evolving under lineage-specific positive selection using branch-site models implemented in PAML v.4.9 (Yang 2007). Orthogroups with statistically significant likelihood ratio tests after false discovery rate correction, and supported by Bayes Empirical Bayes analyses, were considered candidates for adaptive evolution. From the original set of positively selected genes, five orthogroups were prioritized for downstream structural and functional analyses (OG0028003, OG0028976, OG0029756, OG0030099, and OG0031271) based on the following criteria: (i) robust statistical support in branch-site tests, (ii) functional annotation suggesting biological relevance, (iii) presence of full-length or near-full-length protein sequences, and (iv) suitability for structural modeling. These orthogroups encode proteins associated with core cellular metabolism, stress response pathways, and regulatory processes (Amaral et al. 2024), which are frequently reported as targets of adaptive evolution in insular and environmentally heterogeneous systems (Tsatsoulis et al. 2013; Wek et al. 2023).

2.2. Retrieval and curation of orthologous protein sequences

Orthologous amino acid sequences corresponding to the selected orthogroups were retrieved from the output of OrthoFinder v.2.5.5 (Emms and Kelly 2019) and from curated protein predictions derived from transcriptomes. Protein sequences were obtained from datasets associated with each lineage. For each orthogroup, all available orthologs were extracted and manually inspected to remove truncated, low-complexity, or poorly aligned sequences. Redundant sequences were filtered using CD-HIT v.4.8.1 (Li and Godzik 2006) with an identity threshold of 100% to ensure non-redundant datasets. Sequence identifiers were standardized to reflect orthogroup membership and lineage origin. The final curated datasets were used as input for subsequent analyses.

2.3. Multiple sequence alignment and variability analysis

Multiple sequence alignments were generated at the amino acid level using MAFFT v7.505 (Katoh et al. 2005) with the L-INS-i strategy, which is optimized for accuracy in datasets with low to moderate sequence divergence. Default gap opening and extension penalties were applied. Alignment quality was assessed visually and by summary statistics, including gap frequency and alignment length. To quantify site-specific variability, Shannon entropy (Shannon 1948) was calculated for each alignment position using custom Python scripts built with BioPython and SciPy. Positions with gap frequencies greater than 20% were excluded from entropy analyses. Sequence variability hotspots were defined as positions in the top 5% of entropy values among all alignment positions retained after gap filtering.

2.4. Structural modeling

Three-dimensional protein structures were predicted using AlphaFold (Jumper et al.
2021 ; available at https://alphafoldserver.com/ ), an optimized high-throughput modeling tool. For each orthogroup and lineage, the corresponding amino acid sequence was submitted to AlphaFold using default parameters. For each protein, the model with the highest average predicted Local Distance Difference Test (pLDDT) score was selected for downstream analyses. Model confidence was evaluated using residue-level pLDDT scores. Regions with mean pLDDT values below 70 were considered low-confidence and were interpreted with caution. 2.5. Quantitative structural comparison and similarity assessment Structural similarity among orthologous models was assessed using TM-align (Zhang and Skolnick 2005 ) and Foldseek v.10 (van Kempen et al. 2022 ) to increase robustness. Pairwise structural alignments were performed for all lineage pairs within each orthogroup. For each alignment, TM-score and root-mean-square deviation (RMSD) values were recorded. TM-score values were interpreted according to standard guidelines, with scores above 0.5 indicating shared global folds and values above 0.7 reflecting high structural similarity. RMSD values were calculated over aligned Cα atoms and used as a complementary measure of structural divergence. Results were compiled into pairwise similarity matrices and visualized as heatmaps using Python-based plotting libraries. 2.6. Identification of structural cavities and pocket–hotspot association Putative ligand-binding pockets were predicted using FPocket v4.0 (Le Guilloux et al. 2009 ) with default parameters. For each structural model, FPocket was used to identify cavities based on alpha sphere detection and Voronoi tessellation. Detected pockets were ranked by fpocket score, volume, and hydrophobicity. These predictions represent geometrically defined cavities and should not be interpreted as direct evidence of functional binding sites. 
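The model-selection rule described in Section 2.4 (keep the model with the highest mean pLDDT and treat residues below 70 as low-confidence) can be illustrated with a minimal sketch. It assumes that each model is available as PDB-format text and that per-residue pLDDT is stored in the B-factor column of the Cα ATOM records; the function names are hypothetical and are not taken from the authors' scripts.

```python
from statistics import mean

def ca_plddt(pdb_text):
    """Read per-residue pLDDT from the B-factor column of CA atoms.

    Assumption: fixed-column PDB ATOM records, with pLDDT written
    in the temperature-factor field (columns 61-66).
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residue_number = int(line[22:26])
            scores[residue_number] = float(line[60:66])
    return scores

def pick_best_model(models):
    """models: {model_name: pdb_text}. Return the name with the highest mean pLDDT."""
    return max(models, key=lambda name: mean(ca_plddt(models[name]).values()))

def low_confidence_residues(scores, cutoff=70.0):
    """Return residue numbers whose pLDDT falls below the cutoff."""
    return sorted(res for res, value in scores.items() if value < cutoff)
```

The same residue-level scores can later serve as a mask, so that downstream steps only consider positions above the chosen cutoff.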
To assess the spatial relationship between predicted pockets and sequence-variability hotspots, hotspot residues were mapped onto corresponding structural coordinates. Hotspot residues were classified as pocket-associated when at least one heavy atom was located within 5 Å of a pocket-defining alpha sphere. Statistical enrichment of hotspots in pocket-associated regions was evaluated using Fisher’s exact test by comparing observed and expected frequencies. 2.7. In silico stability analysis Protein stability was evaluated using FoldX v4 (Schymkowitz et al. 2005 ), which estimates changes in folding free energy (ΔΔG) associated with amino acid substitutions based on empirical force-field approximations and predicted structural models. Before the mutational analyses, all structural models were subjected to the RepairPDB function to optimize side-chain conformations and remove steric clashes. For each orthogroup, lineage-specific amino acid substitutions identified from multiple sequence alignments were introduced using the BuildModel command. For each mutation, five independent FoldX runs were performed, and mean ΔΔG values were calculated to minimize stochastic variation. Analyses were restricted to residues located in high-confidence structural regions (pLDDT > 70). Mutations in low-confidence regions were excluded. ΔΔG values were interpreted according to standard conventions, with positive values indicating destabilizing effects and negative values indicating stabilizing effects, while acknowledging the intrinsic uncertainty associated with force-field-based estimates and predicted backbones. 2.8. Integration of structural, sequence, and functional analyses Structural similarity metrics, entropy-based variability measures, pocket predictions, and stability estimates were integrated using custom Python scripts. 
For each orthogroup, the combined datasets were used to identify regions exhibiting convergent evidence of sequence divergence, structural remodeling, and potential functional relevance. The overall analytical workflow, which integrates evolutionary inference, structural modeling, functional site prediction, and stability analysis, is summarized in Figure S1 . 3. Results and Discussion 3.1. Selection of positively selected orthogroups and dataset structure The five selected orthogroups showed consistently high structural quality across lineages, as indicated by high median pLDDT values and stable TM-scores (Fig. 1 ). These results support strong overall consistency among the predicted structures of orthologous proteins (Objective i). These results indicate that the predicted models were structurally robust across lineages, providing a reliable basis for downstream comparisons of sequence variation, structural divergence, and stability. This approach is consistent with recent recommendations to incorporate AlphaFold confidence metrics into structure-informed evolutionary studies (Akdel et al. 2022 ). The selected orthogroups analyzed here include genes functionally associated with metabolism and stress response, as defined by the annotation and selection framework reported by Amaral et al. ( 2024 ). This functional profile is consistent with adaptive scenarios involving environmental heterogeneity, resource limitation, and fluctuating selective pressures. Island systems, in particular, often impose strong constraints on energetic efficiency and physiological resilience, thereby favoring the optimization of metabolic and regulatory networks (Whittaker and Fernández-Palacios 2007 ; Warren et al. 2015 ; Losos and Ricklefs 2021). Moreover, comparative studies in vertebrates, insects, and plants have repeatedly shown that stress-related proteins and metabolic enzymes are recurrent targets of positive selection in insular and fragmented habitats (Rolland et al. 
2014; Jeffries et al. 2018; Tigano and Friesen 2016). Our results are consistent with this broader pattern and suggest that similar selective mechanisms may operate in our studied system.

The selected orthogroups exhibited contrasting structural and evolutionary profiles (Table 1). OG0028003 and OG0030099 showed the highest model confidence (mean pLDDT = 81.7 and 69.2, respectively) and exhibited strong structural conservation (mean TM-score = 0.901 and 0.933). These features make them the most suitable orthogroups for detailed structure-based functional interpretation. In contrast, OG0029756 and OG0031271 exhibited substantially lower model confidence (mean pLDDT = 48.8 and 61.0, respectively) and high predicted disorder (0.81 and 1.00, respectively), suggesting that these proteins may contain extensive intrinsically disordered regions. This pattern indicates that these latter orthogroups likely correspond to conformationally flexible or partially unstructured proteins. As a result, fine-scale structural inference is less reliable in these cases, which marks an important methodological limit of structure-based analyses and must be taken into account in the subsequent assessments defined in Objectives (ii–v) (van Der Lee et al. 2014; Wright and Dyson 2015; Chen et al. 2018).

3.2. Structural Model Quality and Lineage-Specific Structural Divergence

Structural models were generated for all selected orthogroups, and residue-level profiles were used to assess local variation in model confidence (Jumper et al. 2021; Tunyasuvunakool et al. 2021). While overall structural robustness is described above (Objective i), here we focus on within-protein confidence patterns. OG0028003 and OG0030099 consistently displayed extensive high-confidence segments, with most residues exceeding a pLDDT of 75 and relatively few low-confidence regions.
These profiles are consistent with well-defined secondary and tertiary structural elements, enabling more reliable residue-level structural interpretation for these proteins (Akdel et al. 2022 ). In contrast, the remaining orthogroups exhibited lower average pLDDT values and broader predicted intrinsically disordered regions (IDRs), thereby restricting the reliability of fine-scale structural and energetic analyses. Although lower-confidence regions should be interpreted cautiously, they may reflect biologically meaningful flexibility rather than prediction artifacts. Intrinsically disordered and flexible regions are common in regulatory proteins and stress-response factors and often associated with adaptive diversification (Wright and Dyson 2015 ; Babu 2016 ). Thus, localized low-confidence segments do not necessarily compromise the overall structural interpretation, but may instead indicate functional plasticity. The distribution of pLDDT scores across orthogroups and lineages (Fig. 1 ) indicates that structural confidence was broadly consistent among populations. This pattern suggests that the structural differences inferred among lineages are unlikely to reflect systematic modeling bias. Integrating structural confidence metrics into evolutionary analyses has been increasingly advocated to reduce false functional inferences based solely on sequence-level signals (Akdel et al. 2022 ). To quantitatively assess structural divergence among orthologous proteins, pairwise structural alignments were performed (Objective ii.). Across all orthogroups, most pairwise comparisons exhibited moderate-to-high TM-scores, indicating substantial conservation of global folds. TM-scores above 0.5 typically reflect shared structural topology, whereas values above 0.7 indicate strong fold-level conservation (Zhang and Skolnick 2005 ). 
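For readers less familiar with the metric, the TM-score thresholds quoted above can be made concrete with a short sketch. It evaluates the standard TM-score expression, TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24·(L − 15)^(1/3) − 1.8, for one fixed superposition. TM-align itself searches over superpositions to maximize this value, which the sketch does not attempt. The function names and category labels are illustrative assumptions.

```python
def tm_d0(length):
    """Length-dependent distance scale d0 (in Å) of the TM-score."""
    if length < 20:
        # The formula yields very small or non-positive d0 for short chains,
        # so this sketch simply refuses them.
        raise ValueError("sketch only supports chains of at least 20 residues")
    return 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8

def tm_score(distances, ref_length):
    """TM-score for a fixed superposition.

    distances: Cα-Cα distances (Å) of the aligned residue pairs.
    ref_length: length of the protein used for normalization.
    """
    d0 = tm_d0(ref_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / ref_length

def classify_tm(score):
    """Apply the interpretation thresholds used in the text (0.5 and 0.7)."""
    if score > 0.7:
        return "high structural similarity"
    if score > 0.5:
        return "shared global fold"
    return "no confident fold-level match"
```

Because unaligned residues contribute nothing to the sum while still counting in the normalizing length, a partial alignment lowers the score even when the aligned part superimposes perfectly.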
In this context, OG0028003 and OG0030099 showed consistently high TM-scores (> 0.7) and low RMSD values, revealing remarkable structural conservation despite underlying sequence divergence. Quantitatively, OG0028003 and OG0030099 exhibited low structural dispersion (mean RMSD = 2.15 Å and 2.21 Å, respectively), whereas OG0029756 and OG0031271 showed markedly higher divergence (mean RMSD = 5.35 Å and 4.42 Å), reflecting substantial conformational heterogeneity (Table 1). OG0028976 displayed intermediate behavior (mean TM-score = 0.683; RMSD = 2.28 Å), consistent with partial fold conservation combined with enhanced surface flexibility. This pattern suggests strong functional constraints acting on core structural elements, consistent with the maintenance of essential functional architectures. Similar decoupling between sequence divergence and structural conservation has been widely reported and reflects the intrinsic robustness of protein folds (Chothia and Lesk 1986 ; Illergård et al. 2009 ; Liberles et al. 2012 ). By contrast, orthogroups with lower TM-scores and higher RMSD values showed greater structural variability among lineages. Importantly, this divergence was primarily localized to loop regions and peripheral domains, whereas core secondary structure elements remained largely conserved (Bloom et al. 2006 ; Echave et al. 2016 ; Arenas et al. 2013 ). Such spatially restricted divergence is particularly consistent with branch-site positive selection models, which often detect adaptive changes affecting limited subsets of residues rather than entire domains (Venkat et al. 2018 ). The concordance between sequence-level selection signals and localized structural divergence strengthens the biological plausibility of the inferred adaptive events. Heatmaps of pairwise RMSD and TM-score values (Fig. 
2 ) further illustrate lineage-specific structural differentiation, with clearer divergence patterns observed in orthogroups previously identified as under positive selection. Together, these results show that integrating structural alignment metrics with evolutionary modeling provides a more nuanced view of adaptive divergence, bridging genotype–structure–phenotype relationships (Liberles et al. 2012 ). Overall, the structural analyses indicate that adaptive evolution in the studied system predominantly operates through localized conformational adjustments rather than through large-scale fold innovation, reinforcing the view that protein evolution is typically constrained by fold stability while permitting functional modulation at peripheral regions (Ferreiro et al. 2025 ; Huang et al. 2023 ). 3.3. Sequence variability hotspots and their functional-structural context Shannon entropy analyses were implemented to identify regions of elevated sequence variability and evaluate their distribution along protein sequences (Objective iii.), without assuming direct evidence of positive selection at individual sites. Entropy values were calculated for each alignment position, and highly variable regions were defined as positions with top-5% entropy values and exhibiting low gap frequencies. Importantly, entropy-based hotspots reflect patterns of sequence variability and are not equivalent to codon-based signatures of positive selection. Across orthogroups, sequence variability hotspots were unevenly distributed along protein sequences and tended to cluster in flexible loops, terminal regions, and surface-exposed segments. In contrast, structurally conserved cores displayed consistently low entropy values, reflecting strong purifying constraints associated with fold stability and functional integrity (Chothia and Lesk 1986 ; Echave et al. 2016 ; Liberles et al. 2012 ). 
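The hotspot definition used here (per-column Shannon entropy, exclusion of columns with more than 20% gaps, and selection of the top 5% of the remaining positions) can be sketched as follows. This is a minimal standard-library illustration, not the authors' BioPython/SciPy implementation. The logarithm base (2), the treatment of gaps as missing data, and the rounding of the 5% cut to a whole number of positions are assumptions, since the text does not specify them.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [c for c in column if c != "-"]
    total = len(residues)
    counts = Counter(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

def entropy_hotspots(alignment, max_gap_fraction=0.20, top_fraction=0.05):
    """alignment: list of equal-length aligned sequences.

    Returns (entropies, hotspots): entropies maps each retained 0-based column
    index to its entropy; hotspots lists the highest-entropy retained columns.
    """
    n_seqs = len(alignment)
    entropies = {}
    for pos in range(len(alignment[0])):
        column = [seq[pos] for seq in alignment]
        if column.count("-") / n_seqs > max_gap_fraction:
            continue  # column excluded from the entropy analysis
        entropies[pos] = column_entropy(column)
    # Rank by entropy; ties keep alignment order because sorted() is stable.
    ranked = sorted(entropies, key=entropies.get, reverse=True)
    n_hot = max(1, math.ceil(top_fraction * len(ranked)))
    return entropies, sorted(ranked[:n_hot])
```

With only a handful of lineages per orthogroup, the number of distinct entropy values per column is small, so ties at the cutoff are likely and the tie-breaking rule can change which positions are reported.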
This spatial segregation between conserved cores and more variable peripheral regions is a well-established hallmark of protein evolution. Core residues are typically constrained by thermodynamic and folding requirements, whereas surface-exposed regions tolerate higher mutational loads and may facilitate lineage-specific functional modulation, particularly when combined with independent evidence of positive selection (Bloom et al. 2006; Goldstein 2011; Nichol et al. 2019).

Orthogroups with higher structural conservation exhibited fewer, more localized hotspots, whereas structurally divergent proteins displayed broader regions of elevated variability. This relationship indicates a tight coupling between fold stability and evolutionary flexibility, consistent with theoretical and empirical models of structure–sequence coevolution (Illergård et al. 2009; Arenas et al. 2013).

The entropy profiles and hotspot distributions (Figs. 3 and 4) revealed that sequence diversification is concentrated in structurally permissive regions rather than within the conserved structural core or uniformly across the protein architecture. This pattern is consistent with the general expectation that purifying selection preserves residues essential for structural stability, whereas diversification accumulates in flexible or solvent-exposed regions. Several substitutions also involve shifts in physicochemical properties, including polarity, charge, and hydrophobicity, between continental and island lineages, potentially altering interaction surfaces or local binding environments.

To evaluate the functional relevance of sequence variability, structural cavities and their spatial relationships with entropy-defined hotspots were systematically assessed (Objective iv).
For most orthogroups, major pockets were consistently detected across lineages, indicating that overall binding-site architecture is evolutionarily conserved at the geometric level. Such conservation is expected for functionally essential interaction surfaces and has been observed across diverse protein families (Cukuroglu et al. 2014 ; Konc and Janežič 2014 ; Gao et al. 2022 ). These conserved pockets were frequently located near regions compatible with ligand-binding or regulatory interfaces, suggesting potential functional relevance that remains to be experimentally validated. Statistical enrichment analyses further revealed that sequence variability hotspots were significantly overrepresented near predicted pockets across several orthogroups (Table 2). Several orthogroups exhibited odds ratios > 1.5, indicating preferential localization of variable residues near interaction cavities. This pattern remained qualitatively consistent after accounting for background residue distributions, supporting a non-random association between sequence variability and functional surfaces. Together, these results indicate that evolutionary diversification preferentially targets surface-exposed regions associated with putative molecular interaction surfaces identified by geometric cavity detection, rather than structurally critical cores. This pattern is consistent with, but does not by itself demonstrate, adaptive functional divergence. Similar hotspot–pocket coupling has been reported in studies of enzyme specificity, protein–protein interactions, and signaling proteins, where adaptive substitutions fine-tune binding properties without compromising fold stability (Aftab et al. 2024 ). From an evolutionary perspective, this enrichment supports a model in which positive selection acts primarily on residues mediating intermolecular recognition, substrate affinity, or regulatory modulation. 
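The enrichment test behind these odds ratios can be illustrated with a small sketch of Fisher's exact test on a 2×2 table (hotspot versus non-hotspot residues, pocket-associated versus not). It computes the one-sided "enrichment" p-value directly from the hypergeometric distribution using only the standard library. Whether the original analysis used a one-sided or two-sided test is not stated, so the one-sided choice here is an assumption, as are the function and argument names.

```python
from math import comb

def fisher_enrichment(hot_near, hot_far, other_near, other_far):
    """One-sided Fisher's exact test for hotspot enrichment near pockets.

    hot_near / hot_far: hotspot residues within / outside the pocket cutoff.
    other_near / other_far: the same counts for non-hotspot residues.
    Returns (sample odds ratio, P[X >= hot_near] under the hypergeometric null).
    """
    n_hot = hot_near + hot_far
    n_near = hot_near + other_near
    total = n_hot + other_near + other_far
    upper = min(n_hot, n_near)
    # Sum the probabilities of all tables at least as enriched as the observed one.
    favorable = sum(
        comb(n_near, k) * comb(total - n_near, n_hot - k)
        for k in range(hot_near, upper + 1)
    )
    p_value = favorable / comb(total, n_hot)
    cross = hot_far * other_near
    odds_ratio = (hot_near * other_far) / cross if cross else float("inf")
    return odds_ratio, p_value
```

Since each protein contributes only a few hotspot residues under a top-5% definition, the counts in the hotspot row are small, and an odds ratio above 1.5 may still come with a wide uncertainty; reporting the p-value alongside the odds ratio helps to make this visible.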
Such regions are expected to experience fluctuating selective pressures in heterogeneous environments, favoring rapid functional optimization (Arenas et al. 2013 ; Ding et al. 2022 ; Jimenez-Rosales and Flores-Merino 2018 ). In contrast, proteins displaying lower structural confidence showed weaker or inconsistent pocket–hotspot associations. This limitation likely reflects uncertainty in structural reconstruction rather than the genuine biological absence of functional coupling. These results emphasize the importance of integrating confidence metrics when interpreting structure–function relationships derived from predicted models (Akdel et al. 2022 ). The combined analysis of entropy profiles and pocket proximity (Fig. 4 ; Table 2) thus reveals that adaptive divergence in the studied orthogroups is spatially concentrated around interaction interfaces. This convergence between sequence variability, structural context, and predicted cavity architecture supports the structural plausibility of the inferred adaptive signals. 3.4. Structural stability and integrated patterns of adaptive divergence Representative structural models were subjected to systematic mutational scanning across orthogroups, and changes in folding free energy were calculated for lineage-specific variants, providing comparative rather than absolute estimates of energetic effects (Objective v.). Analyses were restricted to regions with high structural confidence (pLDDT > 70) to minimize prediction artifacts associated with uncertain backbone conformations (Jumper et al. 2021 ; Akdel et al. 2022 ). Across orthogroups, most substitutions exhibited mildly destabilizing or near-neutral effects (ΔΔG ≈ 0–1.5 kcal·mol⁻¹), although a small subset showed pronounced destabilization (> 4 kcal·mol⁻¹). 
Median ΔΔG values ranged from approximately 0.0 to 1.0 kcal·mol⁻¹ (Table 3), with OG0028003 and OG0028976 displaying slightly higher central tendencies (median ΔΔG ≈ 1.00 and 0.76 kcal·mol⁻¹, respectively) than OG0029756 and OG0030099 (median ΔΔG < 0.25 kcal·mol⁻¹). These median values fall within ranges commonly interpreted as mildly destabilizing or near-neutral, indicating limited impact on global folding stability within the resolution of FoldX-based predictions. This pattern is consistent with theoretical and empirical evidence indicating that most naturally occurring substitutions in functional proteins are constrained to remain close to the stability threshold required for proper folding and activity (Bloom et al. 2006; Goldstein 2011; Tokuriki and Tawfik 2009). However, a subset of lineage-specific variants displayed moderate to strong destabilizing effects, exemplified by OG0028003 I223A (ΔΔG ≈ 3.28 kcal·mol⁻¹) and OG0030099 D55V (ΔΔG ≈ 4.47 kcal·mol⁻¹), revealing localized hotspots of structural sensitivity. Peripheral, surface-exposed regions are known to tolerate greater energetic perturbations because of their lower contribution to global fold stability and their frequent involvement in regulatory and interaction processes (Echave et al. 2016; Nichol et al. 2019; Pillai et al. 2022). Destabilizing substitutions were rarely observed within core secondary structure elements or densely packed hydrophobic cores. This spatial bias supports the view that adaptive divergence preferentially targets peripheral regions while preserving essential structural scaffolds. Similar patterns have been reported across diverse protein families and are considered a hallmark of structurally constrained adaptive evolution (Liberles et al. 2012; Arenas et al. 2013).
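The way the stability results are summarized above (a confidence filter, a per-orthogroup median, and a coarse binning of individual substitutions) can be sketched as follows. This is an illustrative sketch only: the pLDDT > 70 cutoff and the 1.5 and 4 kcal·mol⁻¹ boundaries come from the text, whereas the intermediate "moderate" label, the function names, the substitution labels M1 to M5, and all pLDDT values are assumptions. The values 3.28 and 4.47 merely echo the two variants quoted above; the remaining ΔΔG values are invented.

```python
from statistics import median


def classify_ddg(ddg: float) -> str:
    """Bin a predicted ΔΔG in kcal/mol (positive = destabilizing).

    Stabilizing (negative) values are not separated out in this sketch.
    """
    if ddg > 4.0:
        return "pronounced"
    if ddg > 1.5:
        return "moderate"  # assumed label for the 1.5-4 kcal/mol range
    return "near-neutral/mild"


def summarize(records, plddt_cutoff: float = 70.0):
    """Summarize (substitution, ddg, plddt) tuples for one orthogroup.

    Substitutions at or below the pLDDT cutoff are dropped before the
    median and the per-substitution labels are computed.
    """
    kept = [(name, ddg) for name, ddg, plddt in records if plddt > plddt_cutoff]
    if not kept:
        return None, {}
    med = median(ddg for _, ddg in kept)
    labels = {name: classify_ddg(ddg) for name, ddg in kept}
    return med, labels


# Hypothetical records for a single orthogroup (illustration only).
records = [
    ("M1", 0.10, 92.0),
    ("M2", 0.80, 85.0),
    ("M3", 3.28, 90.0),
    ("M4", 4.47, 88.0),
    ("M5", 2.00, 55.0),  # low-confidence region, excluded by the filter
]
med, labels = summarize(records)
print(med, labels)
```

Reporting the median alongside the labels keeps the distinction the text draws between the central tendency of an orthogroup and a few high-impact sites.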
Overall, the FoldX results (Table 3) suggest that adaptive substitutions in this system generally operate within a thermodynamically permissive regime, as inferred from relative stability trends rather than absolute energetic values, with only a few high-impact substitutions. Although OG0031271 exhibited elevated predicted disorder and comparatively lower structural confidence, representative sites were nevertheless subjected to exploratory stability analyses, which should be interpreted cautiously given the limited reliability of energy calculations in highly flexible systems. In this orthogroup, predicted ΔΔG values were consistently near-neutral (|ΔΔG| < 0.5 kcal·mol⁻¹), reinforcing its role as a methodological reference. When examined individually, the analyzed orthogroups revealed complementary adaptive trajectories. OG0028003 represented a highly conserved structural scaffold with localized surface diversification and moderate energetic tolerance, suggesting functional fine-tuning without architectural disruption. OG0030099 exhibited similar fold stability but greater interface-associated variability, consistent with adaptive modulation of interaction surfaces. OG0028976 displayed intermediate structural conservation combined with elevated disorder, indicating enhanced regulatory or signaling flexibility. In contrast, OG0029756 showed low structural confidence and extensive disorder, restricting detailed functional inference, whereas OG0031271 primarily served as a methodological reference, illustrating the limits of structure-based evolutionary interpretation in highly flexible systems, as reflected by its consistently near-neutral energetic responses despite elevated disorder. Beyond the specific biological patterns observed in the studied orthogroups, our analyses also illustrate a general framework for interpreting signals of positive selection using structural information.
In this approach, statistical evidence of lineage-specific positive selection identifies candidate genes, while subsequent analyses place amino acid substitutions within their structural and biophysical context. Specifically, sequence variability is mapped onto predicted protein structures to evaluate whether divergence concentrates in structurally permissive regions or near interaction-associated surfaces, and stability analyses assess whether substitutions occur within thermodynamically tolerable regimes. This framework links positive selection signals, amino acid substitutions, structural context, and potential functional consequences, thereby bridging sequence-level evolutionary inference with protein structure and biophysical constraints (Liberles et al. 2012; Echave et al. 2016; Jayaraman et al. 2022). By integrating structural similarity metrics (Objective ii), sequence variability profiles (Objective iii), pocket enrichment analyses (Objective iv), and stability estimates (Objective v) within the framework of structurally consistent models (Objective i), we identified consistent patterns linking evolutionary divergence to protein architecture. Orthogroups exhibiting strong evidence of positive selection in previous analyses tended to display localized structural divergence near surface-accessible and functionally relevant regions. These regions were characterized by elevated sequence entropy, enrichment near predicted interaction pockets, and moderate effects on stability, with occasional pronounced destabilization at specific hotspots, collectively suggesting adaptive remodeling of molecular interaction surfaces. Such coordinated patterns are consistent with contemporary models of protein evolution in which adaptive change is concentrated at functional interfaces rather than distributed uniformly across structures (Nichol et al. 2019; Pillai et al. 2022).
This mode of evolution enables fine-tuning of binding specificity, regulatory responsiveness, or interaction networks while maintaining global fold stability. In contrast, highly conserved orthogroups exhibited low sequence variability, strong structural conservation, and minimal stability perturbations, indicating stronger functional and biophysical constraints. These proteins likely occupy central positions in essential cellular pathways, where even minor perturbations may incur substantial fitness costs (Liberles et al. 2012; Echave et al. 2016). The observed coupling between evolutionary signals, structural context, and energetic effects supports the biological relevance of the inferred adaptive events. Rather than reflecting stochastic sequence divergence, the detected patterns are consistent with selective optimization of protein surfaces under lineage-specific ecological and physiological pressures. Thus, the integrative analyses indicate that adaptive evolution in the analyzed lineages is predominantly driven by subtle yet functionally meaningful remodeling of protein surfaces, mediated by coordinated changes in sequence variability, structural configuration, and thermodynamic stability. This evolutionary strategy minimizes deleterious pleiotropic effects while enabling continuous functional innovation, reinforcing the view that most adaptive protein evolution proceeds through incremental modifications of existing molecular frameworks rather than through radical architectural shifts (Arenas et al. 2013; Liberles et al. 2012; Jayaraman et al. 2022).

3.5. Surface-centered adaptive divergence across lineages

When considered jointly, the structural, evolutionary, and biophysical analyses reveal a consistent pattern linking lineage-specific divergence to protein spatial organization.
Across orthogroups, variability hotspots and lineage-specific substitutions are predominantly concentrated in surface-exposed regions and flexible loops rather than within structurally constrained cores (Figs. 3 and 4). The majority of high-entropy sites occur in loop regions or solvent-exposed surfaces, reinforcing the observation that diversification is largely restricted to structurally permissive regions. This spatial segregation between conserved cores and variable peripheral regions is consistent with established models of protein evolution in which purifying selection maintains residues essential for fold stability whereas diversification accumulates in solvent-accessible segments that can tolerate higher mutational loads (Chothia and Lesk 1986; Bloom et al. 2006; Echave et al. 2016; Liberles et al. 2012). Variability hotspots were frequently located near predicted structural cavities (Fig. 5), and enrichment analyses indicate that these sites were overrepresented near pocket-associated residues in several orthogroups (Table 2). Such coupling between sequence variability and predicted interaction surfaces suggests that adaptive divergence preferentially targets molecular interfaces rather than structurally critical cores. Stability analyses further support this interpretation, as most lineage-specific substitutions produced mild or near-neutral energetic effects in high-confidence structural regions (Table 3), consistent with adaptive change occurring within thermodynamically permissive regimes (Tokuriki and Tawfik 2009; Bloom et al. 2006; Goldstein 2011). Together, these results indicate that adaptive evolution in the C. fernambucensis–C. insularis clade primarily operates through localized biochemical remodeling of protein surfaces while preserving global fold architecture.
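The high-entropy sites referred to above are derived from per-column Shannon entropy (in bits) of each multiple sequence alignment, with hotspots defined in the Figure 3 legend as the top 5% most variable sites after excluding gap-rich positions. The sketch below illustrates that calculation under stated assumptions: the function names, the 50% gap cutoff, the choice to ignore gaps when computing entropy, and the toy alignment are hypothetical and are not taken from the authors' scripts.

```python
from collections import Counter
from math import ceil, log2


def column_entropy(column) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    return sum(-(c / n) * log2(c / n) for c in Counter(residues).values())


def hotspots(alignment, top_fraction: float = 0.05, max_gap_fraction: float = 0.5):
    """Return 0-based indices of the most variable columns.

    Columns whose gap fraction exceeds `max_gap_fraction` (an assumed cutoff)
    are excluded before the top `top_fraction` is selected by entropy.
    """
    n_seqs = len(alignment)
    scored = []
    for i, col in enumerate(zip(*alignment)):
        if col.count("-") / n_seqs > max_gap_fraction:
            continue  # gap-rich position, skipped
        scored.append((column_entropy(col), i))
    if not scored:
        return []
    n_top = max(1, ceil(top_fraction * len(scored)))
    # Highest entropy first; ties broken by column index for reproducibility.
    ranked = sorted(scored, key=lambda t: (-t[0], t[1]))
    return sorted(i for _, i in ranked[:n_top])


# Toy alignment of four equal-length sequences (illustration only).
alignment = ["MKTAY", "MKSAY", "MRTAY", "MKGA-"]
print(hotspots(alignment))
```

Column indices returned this way still have to be mapped from alignment coordinates to residue numbers in each predicted structure before they can be compared with pocket or loop annotations.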
Such surface-centered diversification may allow lineage-specific tuning of molecular interactions under heterogeneous ecological conditions while minimizing destabilizing effects on protein structure (Liberles et al. 2012; Echave et al. 2016; Jayaraman et al. 2022).

4. Conclusions

By integrating evidence of positive selection with structural modeling, sequence-variability analyses, pocket detection, and stability inference, this study provides a mechanistic framework for interpreting adaptive protein evolution within the C. fernambucensis–C. insularis clade. Our results show that lineage-specific adaptation predominantly operates through localized remodeling of surface-exposed and interaction-associated regions, particularly in orthogroups exhibiting high structural confidence, while highly flexible proteins highlight the current limits of structure-informed evolutionary inference. This pattern indicates that functional diversification is primarily achieved through subtle modifications to molecular interfaces rather than through large-scale structural rearrangements. Methodologically, our integrative approach demonstrates that combining AlphaFold confidence metrics with structural, energetic, and evolutionary analyses improves the biological interpretability of positive selection signals. By integrating sequence variation, three-dimensional structure, and stability constraints, this framework offers a robust strategy for investigating adaptive molecular evolution in heterogeneous and insular systems and can be extended to other taxa and ecological contexts.

Declarations

Funding
This work was supported by the São Paulo Research Foundation (FAPESP 2023/05589-4 to DTA, 2025/17270-8 to JAT, 2024/19266-5 to MIOC, and 2020/15161-3 to FFF) and CAPES 001 (MIOC).

Competing Interests
The authors declare no competing interests.

Authors' contributions
DTA conceived the idea for this study. DTA, MIOC, and JAT performed data collection and analyses.
All authors contributed to the conception, writing, and intellectual development of the paper, made multiple revisions, and approved the final draft.

Acknowledgments
We thank the funding agencies and the authors' institutions for their support. We also thank Ms. Liamar Benevides for inspiring this work.

Data Availability
Custom scripts and detailed file descriptions are available on GitHub (https://github.com/BBMDO/CeEVoS).

References
Aftab A, Sil S, Nath S, Basu A, Basu S (2024) Intrinsic disorder and other malleable arsenals of evolved protein multifunctionality. J Mol Evol 92:669–684
Akdel M, Pires DEV, Pardo EP, Jänes J, Zalevsky AO, Mészáros B et al (2022) A structural biology community assessment of AlphaFold2 applications. Nat Struct Mol Biol 29:1056–1067
Amaral DT, Bonatelli IAS, Romeiro-Brito M, Telhe MC, Moraes EM, Zappi DC et al (2024) Comparative transcriptome analysis reveals lineage- and environment-specific adaptations in cacti from the Brazilian Atlantic Forest. Planta 260:4
Anisimova M, Yang Z (2007) Multiple hypothesis testing to detect lineages under positive selection that affects only a few sites. Mol Biol Evol 24:1219–1228
Arenas M, Dos Santos HG, Posada D, Bastolla U (2013) Protein evolution along phylogenetic histories under structurally constrained substitution models. Bioinformatics 29:3020–3028
Babu MM (2016) The contribution of intrinsically disordered regions to protein function. Cell Mol Life Sci 73:3095–3108
Batarseh TN, Batarseh SN, Morales-Cruz A, Gaut BS (2023) Comparative genomics of the Liberibacter genus reveals widespread diversity in genomic content and positive selection history. Front Microbiol 14:1206094
Bloom JD, Labthavikul ST, Otey CR, Arnold FH (2006) Protein stability promotes evolvability. Proc Natl Acad Sci USA 103:5869–5874
Chen J, Kriwacki RW (2018) Intrinsically disordered proteins: structure, function and therapeutics. J Mol Biol 430:2275
Chothia C, Lesk AM (1986) The relation between the divergence of sequence and structure in proteins. EMBO J 5:823–826
Cukuroglu E, Engin HB, Gursoy A, Keskin O (2014) Hot spots in protein–protein interfaces: Towards drug discovery. Prog Biophys Mol Biol 116:165–173
Delport W, Poon AF, Frost SD, Kosakovsky Pond SL (2010) Datamonkey 2010: a suite of phylogenetic analysis tools for evolutionary biology. Bioinformatics 26:2455–2457
Ding Y, Perez-Ortiz G, Peate J, Barry SM (2022) Redesigning enzymes for biocatalysis: exploiting structural understanding for improved selectivity. Front Mol Biosci 9:908285
Echave J, Spielman SJ, Wilke CO (2016) Causes of evolutionary rate variation among protein sites. Nat Rev Genet 17:109–121
Edwards SV, Robin VV, Ferrand N, Moritz C (2022) The evolution of comparative phylogeography: putting the geography (and more) into comparative population genomics. Genome Biol Evol 14:evab176
Emms DM, Kelly S (2019) OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 20:238
Ferreiro D, Pazos E, Arenas M (2025) Trends in substitution models of protein evolution for phylogenetic inference. Mol Phylogenet Evol 108473
Gao M, Nakajima An D, Parks JM, Skolnick J (2022) AF2Complex predicts direct physical interactions in multimeric proteins with deep learning. Nat Commun 13:1744
Goldstein RA (2011) The evolution and evolutionary consequences of marginal thermostability in proteins. Proteins 79:1396–1407
Huang Y, Ma Q, Sun J, Zhou LN, Lai CJ, Li P et al (2023) Comparative analysis of Diospyros plastomes: insights into genomic features, mutational hotspots, and adaptive evolution. Ecol Evol 13:e10301
Illergård K, Ardell DH, Elofsson A (2009) Structure is three to ten times more conserved than sequence. Proteins 77:499–508
James JE, Lascoux M (2025) Amino acid properties, substitution rates, and the nearly neutral theory. Genome Biol Evol 17:evaf025
Jayaraman V, Toledo-Patiño S, Noda-García L, Laurino P (2022) Mechanisms of protein evolution. Protein Sci 31:e4362
Jeffries KM, Connon RE, Verhille CE, Dabruzzi TF, Britton MT, Durbin-Johnson BP et al (2019) Divergent transcriptomic signatures in response to salinity exposure. Evol Appl 12:1212–1226
Jensen JD, Wong A, Aquadro CF (2007) Approaches for identifying targets of positive selection. Trends Genet 23:568–577
Jimenez-Rosales A, Flores-Merino MV (2018) Tailoring proteins to re-evolve Nature: A short review. Mol Biotechnol 60:946–974
Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O et al (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596:583–589
Katoh K, Kuma KI, Toh H, Miyata T (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res 33:511–518
Konc J, Janežič D (2014) Binding site comparison for function prediction and pharmaceutical discovery. Curr Opin Struct Biol 25:34–39
Le Guilloux V, Schmidtke P, Tuffery P (2009) Fpocket: an open source platform for ligand pocket detection. BMC Bioinformatics 10:168
Li W, Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of sequences. Bioinformatics 22:1658–1659
Liberles DA, Teichmann SA, Bahar I, Bastolla U, Bloom J, Bornberg-Bauer E et al (2012) The interface of protein structure and molecular evolution. Protein Sci 21:769–785
López-Maury L, Marguerat S, Bähler J (2008) Tuning gene expression to changing environments. Nat Rev Genet 9:583–593
Ma G, Shi M, Li Y, Wang S, Zeng X, Jia Y (2025) Diverse adaptation strategies to stress in coastal sediments. Environ Res 271:121073
Ng PC, Henikoff S (2006) Predicting the effects of amino acid substitutions. Annu Rev Genomics Hum Genet 7:61–80
Nichol D, Robertson-Tessi M, Anderson AR, Jeavons P (2019) Model genotype–phenotype mappings. J R Soc Interface 16
Orsi RH, Sun Q, Wiedmann M (2008) Genome-wide analyses of Listeria monocytogenes. BMC Evol Biol 8:233
Pillai AS, Hochberg GK, Thornton JW (2022) Simple mechanisms for the evolution of protein complexity. Protein Sci 31:e4449
Ren C, Comes HP, Zhu S, Zhang X, Jiang W, Fu C et al (2025) Genome-wide patterns of local adaptation. New Phytol 247:1503–1519
Rolland J, Condamine FL, Jiguet F, Morlon H (2014) Faster speciation in the tropics. Science 343:746–749
Schell T, Greve C, Podsiadlowski L (2025) Genome sequencing for non-model organisms. Front Zool 22:7
Schymkowitz J, Borg J, Stricher F, Nys R, Rousseau F, Serrano L (2005) The FoldX web server. Nucleic Acids Res 33:W382–W388
Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423
Sun X, Kozai T (2024) Different selection levels of mitogenomes. Diversity 16:715
Tigano A, Friesen VL (2016) Genomics of local adaptation with gene flow. Mol Ecol 25:2144–2164
Tokuriki N, Tawfik DS (2009) Stability effects of mutations and protein evolvability. Curr Opin Struct Biol 19:596–604
Tsatsoulis A, Mantzaris MD, Bellou S, Andrikoula M (2013) Insulin resistance: an evolutionary perspective. Metabolism 62:622–633
Tunyasuvunakool K, Adler J, Wu Z, Green T, Zielinski M, Žídek A et al (2021) Protein structure prediction for the human proteome. Nature 596:590–596
van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK et al (2014) Classification of intrinsically disordered proteins. Chem Rev 114:6589–6631
van Kempen M, Kim SS, Tumescheit C, Mirdita M, Gilchrist CL, Söding J et al (2022) Foldseek: fast and accurate protein structure search. bioRxiv
Varadi M, Velankar S (2023) Impact of AlphaFold Protein Structure Database. Proteomics 23:2200128
Venkat A, Hahn MW, Thornton JW (2018) Multinucleotide mutations cause false inferences of positive selection. Science 361:64–69
Vogt G (2022) Environmental adaptation and epigenetics. Epigenomes 7:1
Warren BH, Simberloff D, Ricklefs RE, Aguilée R, Condamine FL, Gravel D et al (2015) Islands as model systems in ecology and evolution. Ecol Lett 18:200–217
Wek RC, Anthony TG, Staschke KA (2023) Translational control and stress response. Antioxid Redox Signal 39:351–373
Whittaker RJ, Fernández-Palacios JM (2007) Island biogeography: ecology, evolution, and conservation. Oxford University Press
Wright PE, Dyson HJ (2015) Intrinsically disordered proteins in cellular signalling. Nat Rev Mol Cell Biol 16:18–29
Xu W, Luo D, Peterson K, Zhao Y, Yu Y, Ye Z et al (2025) Ecological niche models for forest adaptation. Biol Rev 100:1754–1781
Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24:1586–1591
Zhang Y, Skolnick J (2005) TM-align: a protein structure alignment algorithm. Nucleic Acids Res 33:2302–2309
Zhao N, Wu T, Wang W, Zhang L, Gong X (2024) Methods for predicting protein complex structure. Interdiscip Sci Comput Life Sci 16:261–288

Tables
Tables 1 to 3 are available in the Supplementary Files section.

Additional Declarations
No competing interests reported.

Supplementary Files
Tables.xlsx
Table 1. Summary of analyzed orthogroups, including the number of sequences, number of predicted models, average pLDDT scores, mean pairwise TM-scores, and related structural quality indicators.
Table 2.
Summary of hotspot fpocket-derived pocket statistics and enrichment results for positively selected residues, including odds ratios, confidence intervals, and statistical significance values.
Table 3. Integrated evolutionary and structural characterization of lineage-specific substitutions. Shannon entropy (bits) and consensus frequencies were calculated from multiple sequence alignments, while stability effects (ΔΔG) were estimated using FoldX across independent replicates. Positive ΔΔG values indicate destabilization.
SMprotCereus.docx
Figures
Figure 1. Boxplots show the distribution of mean per-model pLDDT scores for each analyzed orthogroup.
Figure 2. Heatmaps showing pairwise RMSD (Å) and TM-score values among predicted protein structures from selected Cereus orthogroups. Panels A–B: OG0028003; C–D: OG0028976; E–F: OG0029756; G–H: OG0030099; I–J: OG0031271. RMSD indicates structural divergence, while TM-score reflects global fold similarity. Comparisons were performed among samples S104, S76, S85, S80, S107, S106, and S115.
Figure 3. Shannon entropy profiles across multiple sequence alignments for selected orthogroups: (A) OG0028003, (B) OG0028976, (C) OG0029756, (D) OG0030099, and (E) OG0031271. Dashed lines indicate the top 5% most variable sites after excluding gap-rich positions. Peaks represent sequence variability hotspots.
Figure 4. Bar plot showing hotspot enrichment odds ratios for residues under positive selection located within the top three predicted pockets identified by fpocket.
Figure 5. Predicted structures of two representative orthogroups: (A) OG0028003 and (B) OG0030099. Protein backbones are shown in grey, lineage-specific substitutions in blue, and hotspot residues in red. Substitutions are primarily located in surface-exposed regions rather than in structurally constrained cores.
These findings suggested that adaptive evolution has contributed to shaping molecular variation across island and continental populations. However, the structural and functional consequences of these evolutionary patterns remain unknown. This leaves several questions unresolved. For example, even in the absence of codon-level site mapping, do genes exhibiting PS signatures show spatially localized and structurally interpretable divergence among lineages? In particular, it remains unclear whether these adaptive signals are associated with changes in overall protein fold, localized remodeling of surface-accessible regions consistent with altered molecular interactions, or subtle modifications that preserve overall structural stability. To address this question, we used orthogroups with signatures of positive selection identified for the \u003cem\u003eC. fernambucensis\u003c/em\u003e-\u003cem\u003eC. insularis\u003c/em\u003e clade (Amaral et al. \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2024\u003c/span\u003e) to investigate structural divergence among lineages by integrating comparative structural modeling, sequence variability analysis, pocket detection, and stability inference. Rather than inferring function directly, we tested whether evolutionary heterogeneity was concentrated in structurally permissive regions and near putative cavities, under explicit model-confidence constraints. Using structure predictions and quantitative structural comparison methods, we examine how evolutionary divergence was reflected in three-dimensional protein structures across multiple lineages. 
Specifically, we aim to: (i) assess the consistency of predicted structures among orthologous proteins; (ii) quantify structural divergence using objective metrics; (iii) identify regions of elevated sequence variability and evaluate their spatial distribution within protein structures; (iv) test whether variable regions form spatial clusters near predicted functional cavities; and (v) estimate the potential impact of lineage-specific substitutions on protein stability in high-confidence structural regions.\u003c/p\u003e"},{"header":"2. Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1. Dataset and selection of positively selected orthogroups\u003c/h2\u003e \u003cp\u003eThe dataset analyzed in this study was derived from a previously published transcriptomic analysis of adaptive evolution in the \u003cem\u003eC. fernambucensis-C. insularis\u003c/em\u003e clade and related lineages (Amaral et al. \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). That study identified candidate genes evolving under lineage-specific positive selection using branch-site models. Orthogroups exhibiting statistically significant likelihood ratio tests after false discovery rate correction and supported by Bayes Empirical analyses were considered candidates for adaptive evolution using PAML v.4.9 (Yang \u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e2007\u003c/span\u003e). From the original set of positively selected genes, five orthogroups were prioritized for downstream structural and functional analyses (OG0028003, OG0028976, OG0029756, OG0030099, and OG0031271) based on the following criteria: (i) robust statistical support in branch-site tests, (ii) functional annotation suggesting biological relevance, (iii) presence of full-length or near-full-length protein sequences, and (iv) suitability for structural modeling. 
These orthogroups encode proteins associated with core cellular metabolism, stress response pathways, and regulatory processes (Amaral et al. \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2024\u003c/span\u003e), which are frequently reported as targets of adaptive evolution in insular and environmentally heterogeneous systems (Tsatsoulis et al. \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e2013\u003c/span\u003e; Wek et al. \u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e2023\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2. Retrieval and curation of orthologous protein sequences\u003c/h2\u003e \u003cp\u003eOrthologous amino acid sequences corresponding to the selected orthogroups were retrieved from the output of OrthoFinder v.2.5.5 (Emms and Kelly \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2019\u003c/span\u003e) and from curated protein predictions derived from transcriptomes. Protein sequences were obtained from datasets associated with each lineage. For each orthogroup, all available orthologs were extracted and manually inspected to remove truncated, low-complexity, or poorly aligned sequences. Redundant sequences were filtered using CD-HIT v.4.8.1 (Li and Godzik \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2006\u003c/span\u003e) with an identity threshold of 100% to ensure non-redundant datasets. Sequence identifiers were standardized to reflect orthogroup membership and lineage origin. The final curated datasets were used as input for subsequent analyses.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.3. Multiple sequence alignment and variability analysis\u003c/h2\u003e \u003cp\u003eMultiple sequence alignments were generated at the amino acid level using MAFFT v7.505 (Katoh et al. 
\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2005\u003c/span\u003e) with the L-INS-i strategy, which is optimized for accuracy in datasets with low to moderate sequence divergence. Default gap opening and extension penalties were applied. Alignment quality was assessed visually and by summary statistics, including gap frequency and alignment length. To quantify site-specific variability, Shannon entropy (Shannon \u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e1948\u003c/span\u003e) was calculated for each alignment position using custom Python scripts built with Biopython and SciPy. Positions with gap frequencies greater than 20% were excluded from entropy analyses. Sequence variability hotspots were defined as positions within the top 5% of entropy values among all alignment positions retained after gap filtering.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e2.4. Structural modeling\u003c/h2\u003e \u003cp\u003eThree-dimensional protein structures were predicted using AlphaFold (Jumper et al. \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; available at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://alphafoldserver.com/\u003c/span\u003e\u003cspan address=\"https://alphafoldserver.com/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e), an optimized high-throughput modeling tool. For each orthogroup and lineage, the corresponding amino acid sequence was submitted to AlphaFold using default parameters. For each protein, the model with the highest average predicted Local Distance Difference Test (pLDDT) score was selected for downstream analyses. Model confidence was evaluated using residue-level pLDDT scores. 
Regions with mean pLDDT values below 70 were considered low-confidence and were interpreted with caution.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e2.5. Quantitative structural comparison and similarity assessment\u003c/h2\u003e \u003cp\u003eStructural similarity among orthologous models was assessed using TM-align (Zhang and Skolnick \u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e2005\u003c/span\u003e) and Foldseek v.10 (van Kempen et al. \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e2022\u003c/span\u003e) to increase robustness. Pairwise structural alignments were performed for all lineage pairs within each orthogroup. For each alignment, TM-score and root-mean-square deviation (RMSD) values were recorded. TM-score values were interpreted according to standard guidelines, with scores above 0.5 indicating shared global folds and values above 0.7 reflecting high structural similarity. RMSD values were calculated over aligned Cα atoms and used as a complementary measure of structural divergence. Results were compiled into pairwise similarity matrices and visualized as heatmaps using Python-based plotting libraries.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e2.6. Identification of structural cavities and pocket\u0026ndash;hotspot association\u003c/h2\u003e \u003cp\u003ePutative ligand-binding pockets were predicted using FPocket v4.0 (Le Guilloux et al. \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2009\u003c/span\u003e) with default parameters. For each structural model, FPocket was used to identify cavities based on alpha sphere detection and Voronoi tessellation. Detected pockets were ranked by fpocket score, volume, and hydrophobicity. These predictions represent geometrically defined cavities and should not be interpreted as direct evidence of functional binding sites. 
To assess the spatial relationship between predicted pockets and sequence-variability hotspots, hotspot residues were mapped onto corresponding structural coordinates. Hotspot residues were classified as pocket-associated when at least one heavy atom was located within 5 \u0026Aring; of a pocket-defining alpha sphere. Statistical enrichment of hotspots in pocket-associated regions was evaluated using Fisher\u0026rsquo;s exact test by comparing observed and expected frequencies.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e2.7. In silico stability analysis\u003c/h2\u003e \u003cp\u003eProtein stability was evaluated using FoldX v4 (Schymkowitz et al. \u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e2005\u003c/span\u003e), which estimates changes in folding free energy (ΔΔG) associated with amino acid substitutions based on empirical force-field approximations and predicted structural models. Before the mutational analyses, all structural models were subjected to the RepairPDB function to optimize side-chain conformations and remove steric clashes. For each orthogroup, lineage-specific amino acid substitutions identified from multiple sequence alignments were introduced using the BuildModel command. For each mutation, five independent FoldX runs were performed, and mean ΔΔG values were calculated to minimize stochastic variation. Analyses were restricted to residues located in high-confidence structural regions (pLDDT\u0026thinsp;\u0026gt;\u0026thinsp;70). Mutations in low-confidence regions were excluded. ΔΔG values were interpreted according to standard conventions, with positive values indicating destabilizing effects and negative values indicating stabilizing effects, while acknowledging the intrinsic uncertainty associated with force-field-based estimates and predicted backbones.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e2.8. 
Integration of structural, sequence, and functional analyses\u003c/h2\u003e \u003cp\u003eStructural similarity metrics, entropy-based variability measures, pocket predictions, and stability estimates were integrated using custom Python scripts. For each orthogroup, the combined datasets were used to identify regions exhibiting converging evidence of sequence divergence, structural remodeling, and potential functional relevance. The overall analytical workflow, which integrates evolutionary inference, structural modeling, functional site prediction, and stability analysis, is summarized in Figure \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e.\u003c/p\u003e \u003c/div\u003e"},{"header":"3. Results and Discussion","content":"\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e3.1. Selection of positively selected orthogroups and dataset structure\u003c/h2\u003e \u003cp\u003eThe five selected orthogroups showed consistently high structural quality across lineages, as indicated by high median pLDDT values and stable TM-scores (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). These results support strong overall consistency among the predicted structures of orthologous proteins (Objective i) and indicate that the predicted models were structurally robust across lineages, providing a reliable basis for downstream comparisons of sequence variation, structural divergence, and stability. This approach is consistent with recent recommendations to incorporate AlphaFold confidence metrics into structure-informed evolutionary studies (Akdel et al. \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe orthogroups analyzed here include genes functionally associated with metabolism and stress response, as defined by the annotation and selection framework reported by Amaral et al. 
(\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). This functional profile is consistent with adaptive scenarios involving environmental heterogeneity, resource limitation, and fluctuating selective pressures. Island systems, in particular, often impose strong constraints on energetic efficiency and physiological resilience, thereby favoring the optimization of metabolic and regulatory networks (Whittaker and Fern\u0026aacute;ndez-Palacios \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2007\u003c/span\u003e; Warren et al. \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Losos and Ricklefs 2021).\u003c/p\u003e \u003cp\u003eMoreover, comparative studies in vertebrates, insects, and plants have repeatedly shown that stress-related proteins and metabolic enzymes are recurrent targets of positive selection in insular and fragmented habitats (Rolland et al. \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; Jeffries et al. 2018; Tigano and Friesen \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e2016\u003c/span\u003e). Our results are consistent with this broader pattern and suggest that similar selective mechanisms may operate in our studied system. The selected orthogroups exhibited contrasting structural and evolutionary profiles (Table\u0026nbsp;1). OG0028003 and OG0030099 showed the highest model confidence (mean pLDDT\u0026thinsp;=\u0026thinsp;81.7 and 69.2, respectively) and exhibited strong structural conservation (mean TM-score\u0026thinsp;=\u0026thinsp;0.901 and 0.933). These features make them the most suitable orthogroups for detailed structure-based functional interpretation. 
In contrast, OG0029756 and OG0031271 exhibited substantially lower model confidence (mean pLDDT\u0026thinsp;=\u0026thinsp;48.8 and 61.0, respectively) and high predicted disorder (0.81 and 1.00, respectively), suggesting that these proteins may contain extensive intrinsically disordered regions. This pattern indicates that these latter orthogroups likely correspond to conformationally flexible or partially unstructured proteins. As a result, fine-scale structural inference is less reliable in these cases, highlighting an important methodological limit of structure-based analyses that must be taken into account in the subsequent assessments defined in Objectives (ii\u0026ndash;v) (van der Lee et al. \u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; Wright and Dyson \u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Chen et al. 2018).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e3.2. Structural model quality and lineage-specific structural divergence\u003c/h2\u003e \u003cp\u003eStructural models were generated for all selected orthogroups, and residue-level profiles were used to assess local variation in model confidence (Jumper et al. \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Tunyasuvunakool et al. \u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). While overall structural robustness is described above (Objective i), here we focus on within-protein confidence patterns. OG0028003 and OG0030099 consistently displayed extensive high-confidence segments, with most residues having pLDDT\u0026thinsp;\u0026gt;\u0026thinsp;75 and relatively few low-confidence regions. These profiles are consistent with well-defined secondary and tertiary structural elements, enabling more reliable residue-level structural interpretation for these proteins (Akdel et al. 
\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eIn contrast, the remaining orthogroups exhibited lower average pLDDT values and broader predicted intrinsically disordered regions (IDRs), thereby restricting the reliability of fine-scale structural and energetic analyses. Although lower-confidence regions should be interpreted cautiously, they may reflect biologically meaningful flexibility rather than prediction artifacts. Intrinsically disordered and flexible regions are common in regulatory proteins and stress-response factors and are often associated with adaptive diversification (Wright and Dyson \u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Babu \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2016\u003c/span\u003e). Thus, localized low-confidence segments do not necessarily compromise the overall structural interpretation, but may instead indicate functional plasticity.\u003c/p\u003e \u003cp\u003eThe distribution of pLDDT scores across orthogroups and lineages (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e) indicates that structural confidence was broadly consistent among populations. This pattern suggests that the structural differences inferred among lineages are unlikely to reflect systematic modeling bias. Integrating structural confidence metrics into evolutionary analyses has been increasingly advocated to reduce false functional inferences based solely on sequence-level signals (Akdel et al. \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eTo quantitatively assess structural divergence among orthologous proteins, pairwise structural alignments were performed (Objective ii). Across all orthogroups, most pairwise comparisons exhibited moderate-to-high TM-scores, indicating substantial conservation of global folds. 
TM-scores above 0.5 typically reflect shared structural topology, whereas values above 0.7 indicate strong fold-level conservation (Zhang and Skolnick \u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e2005\u003c/span\u003e). In this context, OG0028003 and OG0030099 showed consistently high TM-scores (\u0026gt;\u0026thinsp;0.7) and low RMSD values, revealing remarkable structural conservation despite underlying sequence divergence. Quantitatively, OG0028003 and OG0030099 exhibited low structural dispersion (mean RMSD\u0026thinsp;=\u0026thinsp;2.15 \u0026Aring; and 2.21 \u0026Aring;, respectively), whereas OG0029756 and OG0031271 showed markedly higher divergence (mean RMSD\u0026thinsp;=\u0026thinsp;5.35 \u0026Aring; and 4.42 \u0026Aring;), reflecting substantial conformational heterogeneity (Table\u0026nbsp;1). OG0028976 displayed intermediate behavior (mean TM-score\u0026thinsp;=\u0026thinsp;0.683; RMSD\u0026thinsp;=\u0026thinsp;2.28 \u0026Aring;), consistent with partial fold conservation combined with enhanced surface flexibility.\u003c/p\u003e \u003cp\u003eThis pattern suggests strong functional constraints acting on core structural elements, consistent with the maintenance of essential functional architectures. Similar decoupling between sequence divergence and structural conservation has been widely reported and reflects the intrinsic robustness of protein folds (Chothia and Lesk \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e1986\u003c/span\u003e; Illerg\u0026aring;rd et al. \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2009\u003c/span\u003e; Liberles et al. \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2012\u003c/span\u003e). By contrast, orthogroups with lower TM-scores and higher RMSD values showed greater structural variability among lineages. 
Importantly, this divergence was primarily localized to loop regions and peripheral domains, whereas core secondary structure elements remained largely conserved (Bloom et al. \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2006\u003c/span\u003e; Echave et al. \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Arenas et al. \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2013\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eSuch spatially restricted divergence is particularly consistent with branch-site positive selection models, which often detect adaptive changes affecting limited subsets of residues rather than entire domains (Venkat et al. \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). The concordance between sequence-level selection signals and localized structural divergence strengthens the biological plausibility of the inferred adaptive events. Heatmaps of pairwise RMSD and TM-score values (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e) further illustrate lineage-specific structural differentiation, with clearer divergence patterns observed in orthogroups previously identified as under positive selection. Together, these results show that integrating structural alignment metrics with evolutionary modeling provides a more nuanced view of adaptive divergence, bridging genotype\u0026ndash;structure\u0026ndash;phenotype relationships (Liberles et al. \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2012\u003c/span\u003e). Overall, the structural analyses indicate that adaptive evolution in the studied system predominantly operates through localized conformational adjustments rather than through large-scale fold innovation, reinforcing the view that protein evolution is typically constrained by fold stability while permitting functional modulation at peripheral regions (Ferreiro et al. 
\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2025\u003c/span\u003e; Huang et al. \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2023\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e3.3. Sequence variability hotspots and their functional-structural context\u003c/h2\u003e \u003cp\u003eShannon entropy analyses were used to identify regions of elevated sequence variability and evaluate their distribution along protein sequences (Objective iii), without assuming direct evidence of positive selection at individual sites. Entropy values were calculated for each alignment position, and highly variable regions were defined as positions in the top 5% of entropy values with gap frequencies of at most 20%. Importantly, entropy-based hotspots reflect patterns of sequence variability and are not equivalent to codon-based signatures of positive selection. Across orthogroups, sequence variability hotspots were unevenly distributed along protein sequences and tended to cluster in flexible loops, terminal regions, and surface-exposed segments. In contrast, structurally conserved cores displayed consistently low entropy values, reflecting strong purifying constraints associated with fold stability and functional integrity (Chothia and Lesk \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e1986\u003c/span\u003e; Echave et al. \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Liberles et al. \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2012\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThis spatial segregation between conserved cores and more variable peripheral regions is a well-established hallmark of protein evolution. 
Core residues are typically constrained by thermodynamic and folding requirements, whereas surface-exposed regions tolerate higher mutational loads and may facilitate lineage-specific functional modulation, particularly when combined with independent evidence of positive selection (Bloom et al. \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2006\u003c/span\u003e; Goldstein \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2011\u003c/span\u003e; Nichol et al. \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). Orthogroups with higher structural conservation exhibited fewer, more localized hotspots, whereas structurally divergent proteins displayed broader regions of elevated variability. This relationship indicates a tight coupling between fold stability and evolutionary flexibility, consistent with theoretical and empirical models of structure\u0026ndash;sequence coevolution (Illerg\u0026aring;rd et al. \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2009\u003c/span\u003e; Arenas et al. \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2013\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThe entropy profiles and hotspot distributions (Figs.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e and \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e) revealed that sequence diversification is concentrated in structurally permissive regions rather than within the conserved structural core. 
This pattern is consistent with the general expectation that purifying selection preserves residues essential for structural stability, whereas diversification accumulates in flexible or solvent-exposed regions.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eSeveral substitutions also involve shifts in physicochemical properties, including polarity, charge, and hydrophobicity, between continental and island lineages, potentially altering interaction surfaces or local binding environments. Together, these results indicate that sequence variability is spatially concentrated in structurally permissive regions rather than uniformly distributed across the protein architecture.\u003c/p\u003e \u003cp\u003eTo evaluate the functional relevance of sequence variability, structural cavities and their spatial relationships with entropy-defined hotspots were systematically assessed (Objective iv). For most orthogroups, major pockets were consistently detected across lineages, indicating that overall pocket architecture is evolutionarily conserved at the geometric level. Such conservation is expected for functionally essential interaction surfaces and has been observed across diverse protein families (Cukuroglu et al. \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; Konc and Janežič \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; Gao et al. \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). These conserved pockets were frequently located near regions compatible with ligand-binding or regulatory interfaces, suggesting potential functional relevance that remains to be experimentally validated. Statistical enrichment analyses further revealed that sequence variability hotspots were significantly overrepresented near predicted pockets across several orthogroups (Table\u0026nbsp;2). 
Several orthogroups exhibited odds ratios\u0026thinsp;\u0026gt;\u0026thinsp;1.5, indicating preferential localization of variable residues near interaction cavities. This pattern remained qualitatively consistent after accounting for background residue distributions, supporting a non-random association between sequence variability and functional surfaces. Together, these results indicate that evolutionary diversification preferentially targets surface-exposed regions associated with putative molecular interaction surfaces identified by geometric cavity detection, rather than structurally critical cores. This pattern is consistent with, but does not by itself demonstrate, adaptive functional divergence. Similar hotspot\u0026ndash;pocket coupling has been reported in studies of enzyme specificity, protein\u0026ndash;protein interactions, and signaling proteins, where adaptive substitutions fine-tune binding properties without compromising fold stability (Aftab et al. \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eFrom an evolutionary perspective, this enrichment supports a model in which positive selection acts primarily on residues mediating intermolecular recognition, substrate affinity, or regulatory modulation. Such regions are expected to experience fluctuating selective pressures in heterogeneous environments, favoring rapid functional optimization (Arenas et al. \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2013\u003c/span\u003e; Ding et al. \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Jimenez-Rosales and Flores-Merino \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). In contrast, proteins displaying lower structural confidence showed weaker or inconsistent pocket\u0026ndash;hotspot associations. 
This limitation likely reflects uncertainty in structural reconstruction rather than the genuine biological absence of functional coupling. These results emphasize the importance of integrating confidence metrics when interpreting structure\u0026ndash;function relationships derived from predicted models (Akdel et al. \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). The combined analysis of entropy profiles and pocket proximity (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e; Table\u0026nbsp;2) thus reveals that adaptive divergence in the studied orthogroups is spatially concentrated around interaction interfaces. This convergence between sequence variability, structural context, and predicted cavity architecture supports the structural plausibility of the inferred adaptive signals.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e3.4. Structural stability and integrated patterns of adaptive divergence\u003c/h2\u003e \u003cp\u003eRepresentative structural models were subjected to systematic mutational scanning across orthogroups, and changes in folding free energy were calculated for lineage-specific variants, providing comparative rather than absolute estimates of energetic effects (Objective v.). Analyses were restricted to regions with high structural confidence (pLDDT\u0026thinsp;\u0026gt;\u0026thinsp;70) to minimize prediction artifacts associated with uncertain backbone conformations (Jumper et al. \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Akdel et al. \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). 
Across orthogroups, most substitutions exhibited mildly destabilizing or near-neutral effects (ΔΔG\u0026thinsp;\u0026asymp;\u0026thinsp;0\u0026ndash;1.5 kcal\u0026middot;mol⁻\u0026sup1;), although a small subset showed pronounced destabilization (\u0026gt;\u0026thinsp;4 kcal\u0026middot;mol⁻\u0026sup1;). Median ΔΔG values ranged from approximately 0.0 to 1.0 kcal\u0026middot;mol⁻\u0026sup1; (Table\u0026nbsp;3), with OG0028003 and OG0028976 displaying slightly higher central tendencies (median ΔΔG\u0026thinsp;\u0026asymp;\u0026thinsp;1.00 and 0.76, respectively) than OG0029756 and OG0030099 (median ΔΔG\u0026thinsp;\u0026lt;\u0026thinsp;0.25). However, several individual sites exhibited substantially higher energetic impacts, including OG0028003 I223A (ΔΔG\u0026thinsp;\u0026asymp;\u0026thinsp;3.28 kcal\u0026middot;mol⁻\u0026sup1;) and OG0030099 D55V (ΔΔG\u0026thinsp;\u0026asymp;\u0026thinsp;4.47 kcal\u0026middot;mol⁻\u0026sup1;), revealing localized hotspots of structural sensitivity. Apart from these outliers, the median values fall within ranges commonly interpreted as mildly destabilizing or near-neutral, indicating limited impact of most substitutions on global folding stability within the resolution of FoldX-based predictions. This pattern is consistent with theoretical and empirical evidence indicating that most naturally occurring substitutions in functional proteins are constrained to remain close to the stability threshold required for proper folding and activity (Bloom et al. 
\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2006\u003c/span\u003e; Goldstein \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2011\u003c/span\u003e; Tokuriki and Tawfik \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e2009\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eA subset of lineage-specific variants displayed moderate to strong destabilizing effects, exemplified by OG0030099 D55V (ΔΔG\u0026thinsp;\u0026asymp;\u0026thinsp;4.47 kcal\u0026middot;mol⁻\u0026sup1;) and OG0028003 I223A (ΔΔG\u0026thinsp;\u0026asymp;\u0026thinsp;3.28 kcal\u0026middot;mol⁻\u0026sup1;). Peripheral and surface-exposed regions are known to tolerate greater energetic perturbations due to their lower contribution to global fold stability and their frequent involvement in regulatory and interaction processes (Echave et al. \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Nichol et al. \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Pillai et al. \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Destabilizing substitutions were rarely observed within core secondary structure elements or densely packed hydrophobic cores. This spatial bias supports the view that adaptive divergence preferentially targets peripheral regions while preserving essential structural scaffolds. Similar patterns have been reported across diverse protein families and are considered a hallmark of structurally constrained adaptive evolution (Liberles et al. \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2012\u003c/span\u003e; Arenas et al. \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2013\u003c/span\u003e). 
Overall, the FoldX results (Table\u0026nbsp;3) suggest that adaptive substitutions in this system generally operate within a thermodynamically permissive regime, as inferred from relative stability trends rather than absolute energetic values, with only a few high-impact substitutions. Although OG0031271 exhibited elevated predicted disorder and comparatively lower structural confidence, representative sites were nevertheless subjected to exploratory stability analyses, which should be interpreted cautiously given the limited reliability of energy calculations in highly flexible systems. In this orthogroup, predicted ΔΔG values were consistently near-neutral (|ΔΔG| \u0026lt; 0.5 kcal\u0026middot;mol⁻\u0026sup1;), reinforcing its role as a methodological reference and highlighting the limits of energy-based inference in highly flexible systems.\u003c/p\u003e \u003cp\u003eWhen examined individually, the analyzed orthogroups revealed complementary adaptive trajectories. OG0028003 represented a highly conserved structural scaffold with localized surface diversification and moderate energetic tolerance, suggesting functional fine-tuning without architectural disruption. OG0030099 exhibited similar fold stability but greater interface-associated variability, consistent with adaptive modulation of interaction surfaces. OG0028976 displayed intermediate structural conservation combined with elevated disorder, indicating enhanced regulatory or signaling flexibility. 
In contrast, OG0029756 showed low structural confidence and extensive disorder, restricting detailed functional inference, whereas OG0031271 primarily served as a methodological reference, illustrating the limits of structure-based evolutionary interpretation in highly flexible systems, as reflected by its consistently near-neutral energetic responses despite elevated disorder.\u003c/p\u003e \u003cp\u003eBeyond the specific biological patterns observed in the studied orthogroups, our analyses also illustrate a general framework for interpreting signals of positive selection using structural information. In this approach, statistical evidence of lineage-specific positive selection identifies candidate genes, while subsequent analyses place amino acid substitutions within their structural and biophysical context. Specifically, sequence variability is mapped onto predicted protein structures to evaluate whether divergence concentrates in structurally permissive regions or near interaction-associated surfaces, and stability analyses assess whether substitutions occur within thermodynamically tolerable regimes. This framework links positive selection signals, amino acid substitutions, structural context, and potential functional consequences, thereby bridging sequence-level evolutionary inference with protein structure and biophysical constraints (Liberles et al. \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2012\u003c/span\u003e; Echave et al. \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Jayaraman et al. 
\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eBy integrating structural similarity metrics (Objective ii), sequence variability profiles (Objective iii), pocket enrichment analyses (Objective iv), and stability estimates (Objective v), within the framework of structurally consistent models (Objective i), we identified consistent patterns linking evolutionary divergence to protein architecture. Orthogroups exhibiting strong evidence of positive selection in previous analyses tended to display localized structural divergence near surface-accessible and functionally relevant regions. These regions were characterized by elevated sequence entropy, enrichment near predicted interaction pockets, and moderate effects on stability, with occasional pronounced destabilization at specific hotspots, collectively suggesting adaptive remodeling of molecular interaction surfaces. Such coordinated patterns are consistent with contemporary models of protein evolution in which adaptive change is concentrated at functional interfaces rather than distributed uniformly across structures (Nichol et al. \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Pillai et al. \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). This mode of evolution enables fine-tuning of binding specificity, regulatory responsiveness, or interaction networks while maintaining global fold stability. In contrast, highly conserved orthogroups exhibited low sequence variability, strong structural conservation, and minimal stability perturbations, indicating stronger functional and biophysical constraints. These proteins likely occupy central positions in essential cellular pathways, where even minor perturbations may incur substantial fitness costs (Liberles et al. \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2012\u003c/span\u003e; Echave et al. 
\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2016\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThe observed coupling between evolutionary signals, structural context, and energetic effects supports the biological relevance of the inferred adaptive events. Rather than reflecting stochastic sequence divergence, the detected patterns are consistent with selective optimization of protein surfaces under lineage-specific ecological and physiological pressures. Thus, the integrative analyses indicate that adaptive evolution in the analyzed lineages is predominantly driven by subtle yet functionally meaningful remodeling of protein surfaces, mediated by coordinated changes in sequence variability, structural configuration, and thermodynamic stability. This evolutionary strategy minimizes deleterious pleiotropic effects while enabling continuous functional innovation, reinforcing the view that most adaptive protein evolution proceeds through incremental modifications of existing molecular frameworks rather than through radical architectural shifts (Arenas et al. \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2013\u003c/span\u003e; Liberles et al. \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2012\u003c/span\u003e; Jayaraman et al. \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e3.5. Surface-centered adaptive divergence across lineages\u003c/h2\u003e \u003cp\u003eWhen considered jointly, the structural, evolutionary, and biophysical analyses reveal a consistent pattern linking lineage-specific divergence to protein spatial organization. 
Across orthogroups, variability hotspots and lineage-specific substitutions are predominantly concentrated in surface-exposed regions and flexible loops rather than within structurally constrained cores (Figs.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e and \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). The majority of high-entropy sites occur in loop regions or solvent-exposed surfaces, reinforcing the observation that diversification is largely restricted to structurally permissive regions. This spatial segregation between conserved cores and variable peripheral regions is consistent with established models of protein evolution in which purifying selection maintains residues essential for fold stability whereas diversification accumulates in solvent-accessible segments that can tolerate higher mutational loads (Chothia and Lesk \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e1986\u003c/span\u003e; Bloom et al. \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2006\u003c/span\u003e; Echave et al. \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Liberles et al. \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2012\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eVariability hotspots were frequently located near predicted structural cavities (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e), and enrichment analyses indicate that these sites were overrepresented near pocket-associated residues in several orthogroups (Table\u0026nbsp;2). Such coupling between sequence variability and predicted interaction surfaces suggests that adaptive divergence preferentially targets molecular interfaces rather than structurally critical cores. 
Stability analyses further support this interpretation, as most lineage-specific substitutions produced mild or near-neutral energetic effects in high-confidence structural regions (Table\u0026nbsp;3), consistent with adaptive change occurring within thermodynamically permissive regimes (Tokuriki and Tawfik \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e2009\u003c/span\u003e; Bloom et al. \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2006\u003c/span\u003e; Goldstein \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2011\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eTogether, these results indicate that adaptive evolution in the \u003cem\u003eC. fernambucensis \u0026ndash; C. insularis\u003c/em\u003e clade primarily operates through localized biochemical remodeling of protein surfaces while preserving global fold architecture. Such surface-centered diversification may allow lineage-specific tuning of molecular interactions under heterogeneous ecological conditions while minimizing destabilizing effects on protein structure (Liberles et al. \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2012\u003c/span\u003e; Echave et al. \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Jayaraman et al. \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e"},{"header":"4. Conclusions","content":"\u003cp\u003eBy integrating evidence of positive selection with structural modeling, sequence-variability analyses, pocket detection, and stability inference, this study provides a mechanistic framework for interpreting adaptive protein evolution within the \u003cem\u003eC. fernambucensis\u003c/em\u003e-\u003cem\u003eC. insularis\u003c/em\u003e clade. 
Our results show that lineage-specific adaptation predominantly operates through localized remodeling of surface-exposed and interaction-associated regions, particularly in orthogroups exhibiting high structural confidence, while highly flexible proteins highlight the current limits of structure-informed evolutionary inference. This pattern indicates that functional diversification is primarily achieved through subtle modifications to molecular interfaces rather than through large-scale structural rearrangements. Methodologically, our integrative approach demonstrates that combining AlphaFold confidence metrics with structural, energetic, and evolutionary analyses improves the biological interpretability of positive selection signals. By integrating sequence variation, three-dimensional structure, and stability constraints, this framework offers a robust strategy for investigating adaptive molecular evolution in heterogeneous and insular systems and can be extended to other taxa and ecological contexts.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003cstrong\u003eFunding\u003c/strong\u003e \u003cp\u003eThis work was supported by S\u0026atilde;o Paulo Research Foundation (FAPESP 2023/05589-4 to DTA, 2025/17270-8 to JAT, 2024/19266-5 to MIOC, and 2020/15161-3 to FFF) and CAPES 001 (MIOC).\u003c/p\u003e \u003c/p\u003e\u003cp\u003e \u003ch2\u003eCompeting Interests\u003c/h2\u003e \u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e \u003c/p\u003e\u003cp\u003e \u003ch2\u003eAuthors contributions\u003c/h2\u003e \u003cp\u003eDTA conceived the idea for this study. DTA, MIOC, and JAT performed data collection and analyses. 
All authors contributed with numerous conceptions, writing, and the intellectual development of the paper, made multiple revisions, and approved the final draft.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eAcknowledgments\u003c/h2\u003e \u003cp\u003eWe thank the funding agencies and the authors' institutions for their support. We also thank Ms. Liamar Benevides for inspiring this work.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eCustom scripts and detailed file descriptions are available in GitHub (https://github.com/BBMDO/CeEVoS).\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAftab A, Sil S, Nath S, Basu A, Basu S (2024) Intrinsic disorder and other malleable arsenals of evolved protein multifunctionality. J Mol Evol 92:669\u0026ndash;684\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAkdel M, Pires DEV, Pardo EP, J\u0026auml;nes J, Zalevsky AO, M\u0026eacute;sz\u0026aacute;ros B et al (2022) A structural biology community assessment of AlphaFold2 applications. Nat Struct Mol Biol 29:1056\u0026ndash;1067\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAmaral DT, Bonatelli IAS, Romeiro-Brito M, Telhe MC, Moraes EM, Zappi DC et al (2024) Comparative transcriptome analysis reveals lineage- and environment-specific adaptations in cacti from the Brazilian Atlantic Forest. 
Planta 260:4\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAnisimova M, Yang Z (2007) Multiple hypothesis testing to detect lineages under positive selection that affects only a few sites. Mol Biol Evol 24:1219\u0026ndash;1228\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eArenas M, Dos Santos HG, Posada D, Bastolla U (2013) Protein evolution along phylogenetic histories under structurally constrained substitution models. Bioinformatics 29:3020\u0026ndash;3028\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBabu MM (2016) The contribution of intrinsically disordered regions to protein function. Cell Mol Life Sci 73:3095\u0026ndash;3108\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBatarseh TN, Batarseh SN, Morales-Cruz A, Gaut BS (2023) Comparative genomics of the Liberibacter genus reveals widespread diversity in genomic content and positive selection history. Front Microbiol 14:1206094\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBloom JD, Labthavikul ST, Otey CR, Arnold FH (2006) Protein stability promotes evolvability. Proc Natl Acad Sci USA 103:5869\u0026ndash;5874\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen J, Kriwacki RW (2018) Intrinsically disordered proteins: structure, function and therapeutics. J Mol Biol 430:2275\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChothia C, Lesk AM (1986) The relation between the divergence of sequence and structure in proteins. EMBO J 5:823\u0026ndash;826\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCukuroglu E, Engin HB, Gursoy A, Keskin O (2014) Hot spots in protein\u0026ndash;protein interfaces: Towards drug discovery. Prog Biophys Mol Biol 116:165\u0026ndash;173\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDelport W, Poon AF, Frost SD, Kosakovsky Pond SL (2010) Datamonkey 2010: a suite of phylogenetic analysis tools for evolutionary biology. 
Bioinformatics 26:2455\u0026ndash;2457\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDing Y, Perez-Ortiz G, Peate J, Barry SM (2022) Redesigning enzymes for biocatalysis: exploiting structural understanding for improved selectivity. Front Mol Biosci 9:908285\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEchave J, Spielman SJ, Wilke CO (2016) Causes of evolutionary rate variation among protein sites. Nat Rev Genet 17:109\u0026ndash;121\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEdwards SV, Robin VV, Ferrand N, Moritz C (2022) The evolution of comparative phylogeography: putting the geography (and more) into comparative population genomics. Genome Biol Evol 14:evab176\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEmms DM, Kelly S (2019) OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 20:238\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFerreiro D, Pazos E, Arenas M (2025) Trends in substitution models of protein evolution for phylogenetic inference. Mol Phylogenet Evol 108473\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGao M, Nakajima An D, Parks JM, Skolnick J (2022) AF2Complex predicts direct physical interactions in multimeric proteins with deep learning. Nat Commun 13:1744\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGoldstein RA (2011) The evolution and evolutionary consequences of marginal thermostability in proteins. Proteins 79:1396\u0026ndash;1407\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuang Y, Ma Q, Sun J, Zhou LN, Lai CJ, Li P et al (2023) Comparative analysis of Diospyros plastomes: insights into genomic features, mutational hotspots, and adaptive evolution. Ecol Evol 13:e10301\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIllerg\u0026aring;rd K, Ardell DH, Elofsson A (2009) Structure is three to ten times more conserved than sequence. 
Proteins 77:499\u0026ndash;508\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJames JE, Lascoux M (2025) Amino acid properties, substitution rates, and the nearly neutral theory. Genome Biol Evol 17:evaf025\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJayaraman V, Toledo-Pati\u0026ntilde;o S, Noda-Garc\u0026iacute;a L, Laurino P (2022) Mechanisms of protein evolution. Protein Sci 31:e4362\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJeffries KM, Connon RE, Verhille CE, Dabruzzi TF, Britton MT, Durbin-Johnson BP et al (2019) Divergent transcriptomic signatures in response to salinity exposure. Evol Appl 12:1212\u0026ndash;1226\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJensen JD, Wong A, Aquadro CF (2007) Approaches for identifying targets of positive selection. Trends Genet 23:568\u0026ndash;577\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJimenez-Rosales A, Flores-Merino MV (2018) Tailoring proteins to re-evolve Nature: A short review. Mol Biotechnol 60:946\u0026ndash;974\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O et al (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596:583\u0026ndash;589\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKatoh K, Kuma KI, Toh H, Miyata T (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res 33:511\u0026ndash;518\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKonc J, Janežič D (2014) Binding site comparison for function prediction and pharmaceutical discovery. Curr Opin Struct Biol 25:34\u0026ndash;39\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLe Guilloux V, Schmidtke P, Tuffery P (2009) Fpocket: an open source platform for ligand pocket detection. 
BMC Bioinformatics 10:168\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi W, Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of sequences. Bioinformatics 22:1658\u0026ndash;1659\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiberles DA, Teichmann SA, Bahar I, Bastolla U, Bloom J, Bornberg-Bauer E et al (2012) The interface of protein structure and molecular evolution. Protein Sci 21:769\u0026ndash;785\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eL\u0026oacute;pez-Maury L, Marguerat S, B\u0026auml;hler J (2008) Tuning gene expression to changing environments. Nat Rev Genet 9:583\u0026ndash;593\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMa G, Shi M, Li Y, Wang S, Zeng X, Jia Y (2025) Diverse adaptation strategies to stress in coastal sediments. Environ Res 271:121073\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNg PC, Henikoff S (2006) Predicting the effects of amino acid substitutions. Annu Rev Genomics Hum Genet 7:61\u0026ndash;80\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNichol D, Robertson-Tessi M, Anderson AR, Jeavons P (2019) Model genotype\u0026ndash;phenotype mappings. J R Soc Interface 16\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOrsi RH, Sun Q, Wiedmann M (2008) Genome-wide analyses of Listeria monocytogenes. BMC Evol Biol 8:233\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePillai AS, Hochberg GK, Thornton JW (2022) Simple mechanisms for the evolution of protein complexity. Protein Sci 31:e4449\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRen C, Comes HP, Zhu S, Zhang X, Jiang W, Fu C et al (2025) Genome-wide patterns of local adaptation. New Phytol 247:1503\u0026ndash;1519\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRolland J, Condamine FL, Jiguet F, Morlon H (2014) Faster speciation in the tropics. 
Science 343:746\u0026ndash;749\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSchell T, Greve C, Podsiadlowski L (2025) Genome sequencing for non-model organisms. Front Zool 22:7\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSchymkowitz J, Borg J, Stricher F, Nys R, Rousseau F, Serrano L (2005) The FoldX web server. Nucleic Acids Res 33:W382\u0026ndash;W388\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379\u0026ndash;423\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSun X, Kozai T (2024) Different selection levels of mitogenomes. Diversity 16:715\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTigano A, Friesen VL (2016) Genomics of local adaptation with gene flow. Mol Ecol 25:2144\u0026ndash;2164\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTokuriki N, Tawfik DS (2009) Stability effects of mutations and protein evolvability. Curr Opin Struct Biol 19:596\u0026ndash;604\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTsatsoulis A, Mantzaris MD, Bellou S, Andrikoula M (2013) Insulin resistance: an evolutionary perspective. Metabolism 62:622\u0026ndash;633\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTunyasuvunakool K, Adler J, Wu Z, Green T, Zielinski M, Ž\u0026iacute;dek A et al (2021) Protein structure prediction for the human proteome. Nature 596:590\u0026ndash;596\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003evan der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK et al (2014) Classification of intrinsically disordered proteins. Chem Rev 114:6589\u0026ndash;6631\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003evan Kempen M, Kim SS, Tumescheit C, Mirdita M, Gilchrist CL, S\u0026ouml;ding J et al (2022) Foldseek: fast and accurate protein structure search. 
bioRxiv\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVaradi M, Velankar S (2023) Impact of AlphaFold Protein Structure Database. Proteomics 23:2200128\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVenkat A, Hahn MW, Thornton JW (2018) Multinucleotide mutations cause false inferences of positive selection. Science 361:64\u0026ndash;69\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVogt G (2022) Environmental adaptation and epigenetics. Epigenomes 7:1\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWarren BH, Simberloff D, Ricklefs RE, Aguil\u0026eacute;e R, Condamine FL, Gravel D et al (2015) Islands as model systems in ecology and evolution. Ecol Lett 18:200\u0026ndash;217\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWek RC, Anthony TG, Staschke KA (2023) Translational control and stress response. Antioxid Redox Signal 39:351\u0026ndash;373\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWhittaker RJ, Fern\u0026aacute;ndez-Palacios JM (2007) Island biogeography: ecology, evolution, and conservation. Oxford University Press\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWright PE, Dyson HJ (2015) Intrinsically disordered proteins in cellular signalling. Nat Rev Mol Cell Biol 16:18\u0026ndash;29\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXu W, Luo D, Peterson K, Zhao Y, Yu Y, Ye Z et al (2025) Ecological niche models for forest adaptation. Biol Rev 100:1754\u0026ndash;1781\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24:1586\u0026ndash;1591\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang Y, Skolnick J (2005) TM-align: a protein structure alignment algorithm. 
Nucleic Acids Res 33:2302–2309
Zhao N, Wu T, Wang W, Zhang L, Gong X (2024) Methods for predicting protein complex structure. Interdiscip Sci Comput Life Sci 16:261–288

Tables

Tables 1 to 3 are available in the Supplementary Files section.

Keywords: Adaptation, Positive Selection, Stability, Variability

Posted: April 19th, 2026

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00