Fitness-driven emergence and lineage replacement underpin the global resurgence of GII.17 noroviruses

doi:10.21203/rs.3.rs-8923591/v1

2026 · doi:10.21203/rs.3.rs-8923591/v1

preprint OA: closed CC-BY-4.0

Damien Tully, Sunando Roy, Helena Tutill, Rachel Williams, Cristina Celma, and 3 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-8923591/v1 This work is licensed under a CC BY 4.0 License. Status: Under Review. Version 1.

Abstract

Norovirus genotype GII.4 has dominated global gastroenteritis outbreaks for decades, limiting sustained emergence of alternative genotypes. Recent surveillance, however, shows rapid expansion of GII.17, which has overtaken GII.4 in multiple regions. Here, we reconstruct five decades of GII.17 evolution using a global genomic dataset spanning 1976–2025.
We identify a single ancestral recombination event underlying the epidemic C, D and E clades, providing the genomic foundation for subsequent epidemic diversification. Contrary to expectations of pre-adaptive change, we find no evidence of intensified stem-lineage selection preceding emergence. Instead, variant E accumulated lineage-defining substitutions under relaxed selective constraints during early transmission, accompanied by shifts in mutational processes and a substantial transmission fitness advantage over prior variants. Together, these findings demonstrate that incremental post-emergence fitness gains, rather than major antigenic shifts or recurrent recombination, can enable non-dominant genotypes to overcome entrenched epidemiological barriers and drive global lineage replacement.

Subject areas: Biological sciences/Microbiology/Virology/Viral evolution; Biological sciences/Evolution/Evolutionary genetics

Keywords: Norovirus, viral evolution, viral emergence, phylodynamics, fitness

Introduction

Norovirus is a leading cause of acute gastroenteritis (AGE), responsible for an estimated 200,000 deaths annually [1] and imposing a substantial economic burden costing billions worldwide [2]. Noroviruses are genetically diverse single-stranded positive-sense RNA viruses in the Caliciviridae family. Based on phylogenetic relationships, they are classified into at least ten genogroups (GI to GX), of which GI and GII account for most human infections. Further subdivision is based on diversity in the major capsid protein VP1 and the RNA-dependent RNA polymerase (RdRp), yielding more than 48 capsid genotypes and over 60 polymerase (P-)types [3]. Despite this extensive diversity, global norovirus epidemiology over the past three decades has been dominated by genotype GII.4.
Successive GII.4 variants, most notably GII.4 Sydney 2012, have repeatedly replaced their predecessors [4] through antigenic drift and immune escape [5,6], resulting in prolonged global predominance and limited opportunities for sustained displacement by other genotypes. Against this backdrop, the emergence of GII.17 represented a notable exception. During the winter of 2014–15, a novel GII.17 variant (GII.17 Kawasaki 2014) rapidly rose to prominence in several Asian countries, temporarily outcompeting GII.4 Sydney and causing large gastroenteritis outbreaks [7–9]. Despite this transient success, GII.17 declined and was detected only sporadically over the subsequent decade, even though the genotype has circulated in humans for nearly 50 years [10]. This reinforced the view that non-GII.4 genotypes lack the evolutionary capacity for sustained global dominance. Recent surveillance data have challenged this assumption. Since 2023, GII.17 outbreaks have increased markedly in Europe and the United States, overtaking GII.4 in several regions by 2025 [11–13]. Phylogenetic analyses identify multiple established GII.17 clusters (A–D) [14] and, more recently, a genetically distinct lineage first detected in Romania in 2021 [15] and subsequently reported across multiple continents [11,16–18]. Although prior studies have characterised antigenic properties and receptor-binding adaptations of this emergent lineage [18], its broader evolutionary context remains unclear. It is unknown whether its expansion reflects pre-adaptive innovation, recombination or fitness gains acquired during early transmission. Here, we present a global, five-decade evolutionary analysis of GII.17 based on more than 1,200 sequences sampled between 1976 and 2025. Integrating phylogenetic inference, molecular clock modelling, selection analyses, mutational spectrum characterisation, and quantitative fitness modelling, we reconstruct the long-term phylodynamic history of this genotype.
We show that the dominant GII.17 lineage circulated cryptically prior to global expansion and that its emergence was not driven by recent recombination or intensified stem-lineage selection. Instead, expansion was underpinned by measurable transmission fitness gains acquired during early spread. We further identify lineage-specific shifts in mutational processes, evidence for a shared historical recombinant origin of epidemic variants, and the persistence of divergent lineages branching from ancestral variant A. Together, these findings place the recent resurgence of GII.17 within a global evolutionary framework and demonstrate how incremental fitness gains can enable non-GII.4 genotypes to displace long-standing epidemic lineages.

Results

Sequential replacement of GII.17 variants

We curated over 5,000 publicly available GII.17 sequences from GenBank, retaining high-quality ORF1 (n=429), ORF2 (n=1,206) and ORF3 (n=474) sequences for downstream analyses. Temporal analysis revealed clear sequential variant turnover (Fig. 1). Variant A predominated in the 1970s–1980s, followed by variants B and C from the mid-1990s to early 2000s. Variant D emerged around 2014 and was subsequently replaced by rapid expansion of variant E from 2021 onwards. Notably, variants A and B persisted at low frequency across decades, consistent with long-term maintenance of ancestral lineages.

Phylogenetic characterisation of GII.17 variants

Maximum likelihood analysis of 1,207 full-length VP1 sequences (1978–2025) identified five major GII.17 lineages, including a recently emerged lineage first reported from Romania in 2021 [15] (Fig. 2a). This lineage, comprising 151 VP1 sequences sampled between 2021 and 2025, shows ~94% nucleotide identity to variant C and 93% to variant D (Fig. 2b). This pattern was consistent across ORF1 (94% and 93% identity with variants C and D, respectively) and ORF3 (95% and 93% identity with variants C and D, respectively) (Supplementary Fig. 1).
Comparison of intra-lineage diversity in VP1 indicated that this lineage is more diverse, with a mean within-variant pairwise distance of 2.32% compared to 1.08% for variant C and 0.79% for variant D. A small subset of sequences sampled between 2014 and 2016 from geographically diverse locations (Canada, France, the Netherlands, and the UK) occupy basal positions along the stem leading to this lineage, suggesting they are closely related to its unsampled common ancestor. By late 2025, this clade had been detected across multiple continents, with new data from the UK showing that it was first sampled on 20 July 2021 from a 65-year-old male in North West England, predating the earliest available Romanian cases. Based on expanded phylogenetic evidence, we designate this lineage as variant E, or GII.17 North West England (2021), subject to review by the ad hoc international norovirus nomenclature working group.

Cryptic pre-pandemic emergence and accelerated evolution of variant E

Relaxed molecular clock analyses demonstrated sufficient temporal signal for robust dating (Fig. 3a). Using a Bayesian Markov chain Monte Carlo framework with a non-parametric GMRF skygrid coalescent model and a relaxed molecular clock, we estimated the time to the most recent common ancestor (tMRCA) of variant E to be November 2018 (95% Bayesian credible interval: September 2017 to December 2019) (Fig. 3b). This predates its first detection in the UK in July 2021 by approximately 3 years, consistent with cryptic pre-pandemic circulation. Variant E exhibits markedly elevated evolutionary rates (Fig. 3c), with estimated rates of 5.90 × 10⁻³ substitutions per site per year (95% BCI: 4.36 × 10⁻³ – 7.61 × 10⁻³) for VP1 compared to variant C (3.08 × 10⁻³) and variant D (2.18 × 10⁻³).
Similarly, for RdRp, variant E had elevated evolutionary rates of 6.47 × 10⁻³ substitutions per site per year (95% BCI: 4.42 × 10⁻³ – 8.69 × 10⁻³) compared to variant D (2.34 × 10⁻³), while robust estimates could not be obtained for variant C.

Shared recombinant origin of epidemic GII.17 variants

Recombination analysis of 429 complete GII.17 genomes provides strong evidence for a single historical recombination event that shaped the evolutionary origin of the major epidemic GII.17 variants (C, D and E). Using multiple complementary approaches, we consistently identified a breakpoint at the ORF1/ORF2 junction, indicating distinct evolutionary histories for non-structural and structural genomic regions. GARD detected a single significant breakpoint at position 5,191 bp (Δc-AIC versus the null model = 12,263), precisely coinciding with the ORF1/ORF2 boundary. This result was independently corroborated by 3SEQ, which identified 429 statistically significant recombination events (p < 10⁻²³), with 99.3% of inferred breakpoints localising to the same junction. The dominant recombination signal indicates that variants C, D, and E share a common recombinant ancestry, rather than arising through multiple independent recombination events. Analysis of inferred parental lineages revealed that over 90% of recombinant events involved B-like and D-like parental combinations, producing variants C, D, or E as recombinant descendants. Notably, a small set of recently sampled B-like sequences from the United States (2023–2024) occupy a key phylogenetic position. These sequences are substantially more similar to variants C–E across ORF1 (90–93% nucleotide identity) than to basal historical variant B sequences (~84%), while remaining highly divergent in the capsid region (Fig. 4a). In 3SEQ analyses, these B-like sequences predominantly function as parental donors rather than recombinant products, consistent with their role as close descendants of the ancestral ORF1 donor lineage (Fig. 4b).
Together, these findings support a parsimonious evolutionary model in which a single recombination event occurred prior to 2013, combining a B-like ORF1 with a D-like ORF2/3 to generate the common ancestor of the epidemic C-D-E clade (Supplementary Fig. 2). Subsequent diversification of this recombinant lineage gave rise to variants C, D and ultimately the currently dominant variant E. The persistence and recent detection of divergent B-like sequences suggest long-term circulation of the ancestral genomic architecture, potentially maintained in cryptic transmission chains or under-sampled reservoirs.

Post-emergence relaxation and focal adaptation shape variant E evolution

To investigate whether adaptive evolution preceded or followed the emergence of variant E, we examined patterns of selection across the GII.17 genome using a suite of phylogenetic methods. Across all genes, mean ω values were low (ω range: 0.0011–0.346) (Table 2), indicating pervasive purifying selection consistent with strong functional constraint. We first tested whether selection intensity differed on the stem branch leading to variant E, which represents viral evolution prior to its epidemiological emergence. If key adaptive changes enabling efficient human transmission occurred before widespread circulation, we would expect a detectable shift in selection pressure along this branch. However, RELAX analyses showed no evidence for either intensification or relaxation of selection on the stem branch (K = 0.68; p = 0.243), indicating that variant E did not arise through strong pre-adaptive evolution prior to its emergence. In contrast, analyses of internal branches within the variant E clade revealed evidence for altered selection regimes after establishment in the human population.
RELAX identified significant relaxation of selection in several genes, including p48, protease, the entire ORF1 and ORF3 (VP2) (Table 1), suggesting a genome-wide reduction in evolutionary constraints that may have facilitated viral adaptation during early epidemic spread. Despite this overall relaxation, a small number of codon sites showed evidence of intensified or episodic diversifying selection. Across the genome, 17 sites displayed signals of lineage-specific selection. In p48, position 10 shows a dramatically higher ω in variant E viruses compared to other GII.17 sequences, while sites 100, 293 and 437 in the NTPase, RdRp and VP1 also show intensified selection. Fourteen other sites showed evidence of episodic diversifying selection. Consistent with these patterns, we identified 12 lineage-specific amino acid substitutions that define variant E (Table 2). These substitutions are highly prevalent within variant E (>90% frequency) and are rare or absent in variant C and other GII.17 lineages (<10%). Eight of the substitutions occur in structural proteins, with the majority located in VP1 and clustering within residues 361–447. The remaining four substitutions occur in non-structural proteins (NS3, NS4 and NS7), indicating coordinated changes affecting both capsid structure and replication-associated functions. These results indicate post-emergence adaptive refinement rather than pre-adaptive innovation.

Lineage-specific remodelling of mutational processes

To assess whether shifts in underlying mutational processes accompanied successive GII.17 variant replacements, we compared context-dependent mutational spectra across variants C, D, and E (Supplementary Fig. 3). Although all variants exhibited a strong transition bias, their trinucleotide-resolved mutational profiles differed significantly, indicating lineage-specific remodelling of mutational pressures. Variant C displayed a comparatively balanced spectrum with a higher proportion of G>A transitions (Supplementary Fig. 3a), consistent with a more heterogeneous mutational process during its circulation. In contrast, variant E exhibited a pronounced enrichment of C>T substitutions (Supplementary Fig. 3c), the majority of which occurred outside canonical APOBEC-associated contexts, suggesting a shift toward spontaneous deamination–mediated mutagenesis rather than host-driven RNA editing. Variant D showed an intermediate profile characterised by a clear APOBEC-like cytidine deamination signature (Supplementary Fig. 3b). Despite overall similarity between variants D and E, these differences indicate that distinct mutational mechanisms may underlie the evolution of successive GII.17 variants.

Fitness-based lineage replacement of GII.17 variants

To estimate the relative fitness of each GII.17 variant, we implemented Phylowave [19], a recently developed fitness-inference framework for rapidly evolving pathogens. This approach automatically inferred seven distinct lineages across the GII.17 phylogeny (Fig. 5; Supplementary Fig. 5). Agreement between these automatically inferred clades and previously defined GII.17 variants was quantified using the adjusted Rand index (ARI), which accounts for agreement expected by chance [20]. This analysis revealed near-perfect concordance between classifications (ARI = 0.997), with only a small number of phylogenetically intermediate sequences contributing to residual disagreement. We next estimated the relative fitness of each inferred lineage using a multinomial logistic fitness model (Fig. 5b). This analysis revealed marked differences in lineage dynamics through time, with clear shifts in relative prevalence corresponding to successive variant replacements. Fitness estimates indicated that group 1, corresponding to variant E, exhibited substantially higher relative fitness than all other GII.17 lineages (Fig. 5c), consistent with its rapid expansion and epidemiological dominance.
In addition, the replacement of variant C by variant D was associated with a measurable increase in estimated fitness, suggesting a stepwise improvement in lineage success prior to the emergence of variant E. Notably, group 6, which comprises a divergent subset of variant A sequences (including ancestral viruses such as T055/Tunisia/1977 and Hu/GII.17/C142/1978/GUF), also displayed elevated fitness relative to other variants, although it remained lower than that observed for variant E.

Divergent lineage emerging from ancestral variant A

Phylogenetic analysis identified a small, deeply divergent GII.17 lineage branching from ancestral variant A (Fig. 6a) and detected during a localised hospital outbreak in young children in North West England in 2022. Despite substantial divergence across both structural and non-structural regions (Fig. 6b), 76% of mutations relative to its closest ancestor, T055/Tunisia/1977, are confined to the P2 subdomain (Fig. 6c). This lineage has not been detected beyond this setting and shows no evidence of sustained transmission. These findings indicate that GII.17 is capable of repeatedly generating divergent lineages, although only a subset, such as variant E, acquire the fitness advantage required for widespread dissemination (Fig. 6).

Discussion

In this study, we present a global, five-decade evolutionary reconstruction of GII.17 noroviruses and demonstrate that its recent resurgence reflects fitness-driven lineage replacement rather than dramatic pre-adaptive emergence or repeated recombination. By integrating phylogenetic, recombination, selection, mutational landscape and quantitative fitness-based analyses, we define a coherent evolutionary pathway that is best explained by post-emergence fine-tuning during early spread, rather than by major adaptive shifts or repeated recombination. A central finding is that epidemic GII.17 variants (C, D and E) share a single historical recombinant origin at the ORF1/ORF2 junction.
Recombination at this boundary is well documented in noroviruses and has been implicated in the emergence of multiple epidemic genotypes, including GII.4 and GII.2 [21,22]. However, unlike the frequent recombination observed in GII.4, our results indicate that modern GII.17 diversity largely reflects diversification of a single recombinant lineage rather than repeated independent recombination events. This suggests that recombination provided an early permissive genomic scaffold but not the immediate trigger for epidemic expansion, which was instead shaped by lineage-specific fitness advantages. The identification of contemporary divergent variant B sequences (2023–2024) as the closest known descendants of the ancestral ORF1 donor lineage suggests that this proto-ORF1 genomic architecture was either historically under-sampled or persisted within a cryptic reservoir (e.g. animal hosts or immunocompromised individuals) that has only recently been detected. Selection analyses further clarify the mode of emergence. We detect no evidence for intensified or directional selection along the stem branch leading to variant E, arguing against strong pre-adaptive evolution prior to epidemiological expansion. Instead, altered selective regimes and modest lineage-defining substitutions accumulated after establishment in the human population. This contrasts with the well-described epochal evolution of GII.4, in which repeated waves of adaptive change in VP1 drive immune escape and lineage replacement [23,24]. In GII.17, by contrast, emergence appears to reflect post-establishment adaptive refinement under relaxed purifying selection rather than the sudden acquisition of a large-effect immune-escape mutation. Bayesian molecular clock analyses reveal elevated substitution rates in variant E relative to earlier variants. Such increases likely reflect short-term rate inflation during early epidemic growth, a phenomenon observed in other rapidly expanding RNA viruses [25].
Nevertheless, the combination of accelerated accumulation of substitutions and relaxed constraint likely facilitated the fixation of a limited number of lineage-defining changes that collectively conferred a measurable transmission advantage. Our quantitative fitness inferences demonstrate that variant E has a marked fitness advantage relative to all previously circulating GII.17 lineages, mirroring observed patterns of lineage replacement. This concordance between phylogenetic inference and frequency dynamics supports the interpretation that transmission fitness differences contributed substantially to its global expansion, beyond stochastic epidemiological effects alone. Notably, the second-highest relative fitness was observed in a subset of variant A sequences, including our recently described divergent UK genomes. Although limited sampling precludes firm conclusions regarding their competitive potential, these findings highlight substantial cryptic diversity within the GII.17 phylogeny that may be underrepresented in routine outbreak-focused surveillance. We further identify lineage-specific remodelling of mutational processes across GII.17 variants. Variant D displays an APOBEC-like cytidine deamination signature, whereas variant E shows enrichment of C>T transitions outside canonical editing motifs, consistent with altered mutational pressures during transmission. These findings suggest that shifts in the cellular or immunological environments encountered during spread may influence not only which mutations are selected, but also which mutations are generated. Thus, viral evolution may be shaped jointly by selection and by changes in the mutational landscape itself. The estimated three-year period of cryptic circulation prior to detection of variant E contrasts with the rapid emergence of the Kawasaki (variant D) lineage in 2014–2015 [26].
This delayed detection of variant E likely reflects widespread suppression of norovirus transmission during COVID-19-associated non-pharmaceutical interventions, including lockdowns, social distancing, and enhanced hygiene measures [27–29], followed by a rebound in population susceptibility after relaxation [30]. Such a transient transmission bottleneck may have inadvertently provided an opportunity for variant E to rise in relative frequency before encountering competition from established GII.4 lineages. Earlier estimates by Epifanova et al. [31] placing the origin in 2017 were based on limited sampling (<20 sequences) and likely lacked sufficient temporal resolution. Our expanded dataset and GMRF Bayesian Skyline model provided improved flexibility in capturing demographic change over time [32]. These evolutionary patterns are consistent with prior structural and antigenic investigations of GII.17. Earlier studies of GII.17 have largely focused on regional outbreaks [33–35] or short time windows surrounding the emergence of the Kawasaki 2014 strain [9] (variant D), which temporarily displaced GII.4 lineages. These studies highlighted antigenic changes in VP1 and alterations in HBGA binding as potential drivers of epidemic success, with variants C and D exhibiting enhanced HBGA-binding affinity [36–38]. Variant E retains conserved HBGA binding sites relative to variants C and D, suggesting preservation of strong receptor engagement capacity. Notably, recent work has identified adaptive mutations in VP1, including the K361R substitution, that enhance HBGA binding and may contribute to increased fitness [18]. As this substitution lies outside canonical HBGA binding interfaces, its mechanistic effect remains unclear but may reflect indirect structural adaptations influencing receptor engagement [39]. Although this study provides a comprehensive analysis of GII.17 evolution to date, several limitations warrant consideration.
First, despite substantial expansion over previous studies, our dataset remains subject to the geographic and temporal sampling biases inherent to genomic surveillance, with disproportionate representation from high-income regions. Undersampling in parts of Africa, South America and Asia may obscure additional cryptic diversity and transmission patterns. Second, fitness inference assumes that lineage frequency dynamics primarily reflect transmission fitness. While the concordance between inferred fitness and observed replacement patterns supports this interpretation, stochastic epidemiological processes and local transmission heterogeneity could influence estimates, particularly for poorly sampled lineages. Finally, selection analyses identify statistical signals of adaptation but do not directly measure functional effects. Although consistent with published phenotypic data, comprehensive experimental validation of putatively adaptive mutations remains an important direction for future work. Taken together, our findings suggest that viral emergence need not be driven by dramatic pre-adaptive shifts or repeated recombination but can arise through incremental fitness gains acquired during early transmission under altered selective constraints. The evolutionary history of GII.17 is shaped by historical recombination, lineage-specific shifts in inferred mutational processes, and repeated exploration of divergent evolutionary trajectories. This integrative framework may extend beyond noroviruses to other RNA viruses exhibiting episodic genotype replacement and highlights the value of combining phylodynamics, mutational analysis and quantitative fitness inference to understand pathogen emergence.

Methods

GII.17 genome sequences

Publicly available GII.17 genomic sequences were downloaded from NCBI GenBank (retrieved on August 26, 2025) using the search terms “norovirus” and “GII.17” with a length criterion of greater than 1,000 bp.
A total of 1,422 sequences were obtained, and genotyping was performed using the Norovirus Typing Tool v2.0 [40] to confirm the genotype of each norovirus sequence. Sequences were split by ORF and only those covering at least 66% of the ORF were included. Sample metadata (collection date and country) were extracted for each sequence from the source feature of the NCBI GenBank flat file, with records fetched and parsed in batches from NCBI’s nucleotide database. After filtering, 429 ORF1, 1,206 ORF2 and 474 ORF3 sequences were retained. Accession numbers are provided as Supplementary Table 1. All sequences were aligned using MAFFT (version 7.505) [41] and manually inspected for errors. Similarity plots were generated using SimPlot++ version 3.5.1 with a window size and step size of 200 and 20, respectively [42]. Newly generated genomes from the United Kingdom were obtained from samples using SureSelectXT target enrichment followed by Illumina sequencing as previously described [5]. Written informed consent was obtained from all subjects involved in the study. Newly generated genomes have been deposited into GenBank under the following accession numbers: PV920367-PV920384. To mitigate ORF2 variant D over-representation (938 sequences) and enable Bayesian inference, we used the Nextstrain pipeline to downsample variant D to 460 sequences (maximum of three per country–month–year). The resulting downsampled set (728 sequences) was used only for BEAST; all other analyses used the full dataset.

Evolutionary analyses

Maximum likelihood phylogenetic trees rooted to non-GII.17 genotypes (GII.13) were inferred using IQ-TREE v. 2.1.2 [43] with the best-fit nucleotide substitution model as determined by ModelFinder [44]. A root-to-tip regression approach in TempEst version 1.5.3 [45] was used to evaluate the temporal signal in the phylogeny, with outliers removed using the interquartile range method.
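The temporal-signal check described above can be sketched in a few lines of Python. This is an illustration only, not the TempEst workflow itself: it assumes the interquartile range rule is applied to the regression residuals with the conventional 1.5× multiplier (the text does not specify either), and the function names and toy data are hypothetical.

```python
from statistics import mean, quantiles


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def root_to_tip_clock(dates, distances, k=1.5):
    """Regress root-to-tip distance on sampling date, drop tips whose
    residual falls outside [Q1 - k*IQR, Q3 + k*IQR], then refit.

    Returns (rate, inferred root date, indices of retained tips).
    The 1.5x multiplier and residual-based filtering are assumptions.
    """
    slope, intercept = fit_line(dates, distances)
    residuals = [d - (slope * t + intercept) for t, d in zip(dates, distances)]
    q1, _, q3 = quantiles(residuals, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    keep = [i for i, r in enumerate(residuals) if low <= r <= high]
    slope, intercept = fit_line([dates[i] for i in keep],
                                [distances[i] for i in keep])
    # The x-intercept of the fitted line estimates the date of the root.
    return slope, -intercept / slope, keep


# Toy data (hypothetical): a strict clock of 3e-3 substitutions/site/year
# from a root in 2000, with the tip at index 4 made artificially divergent.
dates = [2014.0 + i for i in range(10)]
distances = [0.003 * (t - 2000.0) for t in dates]
distances[4] += 0.05
rate, root_date, kept = root_to_tip_clock(dates, distances)
```

In this toy case the divergent tip is discarded and the refit recovers the simulated rate and root date; with real data the slope gives only a rough rate estimate that the Bayesian analysis then refines.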
Bayesian analyses were carried out using BEAST v1.10.4 [46] with the SRD06 nucleotide substitution model [47], implemented with separate partitions for codon positions 1 plus 2 versus position 3, each using the Hasegawa-Kishino-Yano (HKY) substitution model with a four-category gamma distribution of site-specific rate variation, together with an uncorrelated lognormal relaxed molecular clock [48] and a coalescent GMRF Bayesian skyride tree prior [49]. Model parameter estimates were evaluated using Tracer v1.7.2 [50] to ensure an effective sample size (ESS) ≥ 200, indicating sufficient mixing and convergence. For each genomic region, at least three independent Markov chain Monte Carlo (MCMC) runs were performed and combined using the LogCombiner tool in the BEAST package. Each chain consisted of 500,000,000 steps and was sampled every 50,000 steps, and the first 10% of samples were discarded as burn-in. The sampling time (day/month/year) associated with each sequence was used in all cases.

Recombination analyses

We used two different approaches to identify recombination. First, we used 3SEQ [51] as a statistical test for recombination on our curated dataset, as it has been found to be statistically one of the most powerful methods for identifying mosaic regions [52]. 3SEQ detects recombination by identifying mosaic sequence patterns that can be explained as a combination of two parental sequences, using an exact, non-parametric framework based on sequence triplets [52]. For each potential recombinant, 3SEQ evaluates whether the observed pattern of sitewise similarity to two candidate parent sequences deviates significantly from expectations under a strictly clonal evolutionary model. Statistical significance is assessed using an exact test that does not rely on phylogenetic tree reconstruction or breakpoint pre-specification, making the approach robust to alignment noise and substitution-rate heterogeneity.
We tested all pairs of sequences from our curated datasets and report p values corrected with a Dunn-Šidák correction for the large number of triplets tested. Second, we used the genetic algorithm for recombination detection (GARD) to independently identify recombination breakpoints on genomes [53]. GARD was performed for each dataset using HyPhy v2.5.63 [54].

Selection analyses

Selection analyses employed a suite of phylogenetic methods as implemented in HyPhy. Internal branches were annotated with the variant E or other tags using the LabelTree.bf script in HyPhy v2.5.63 [54]. Internal nodes were labelled only if all their descendants had the same label; otherwise they remained unlabelled. The MG94xREV model was used to estimate the mean omega for specific variant clades. We used four methods in HyPhy v2.5.63 to screen for evidence of natural selection on the focal lineage. The Branch-Site Unrestricted Statistical Test for Episodic Diversification (BUSTED[S]) method [55] was applied to internal branches labelled variant E to seek gene-wide evidence of episodic diversifying selection in the variant E clade. The Fixed Effects Likelihood (FEL) method [56] was applied to the same branches to identify individual codon sites evolving non-neutrally. The Mixed Effects Model of Evolution (MEME) method [57] was used to detect sites undergoing episodic diversifying selection on internal branches labelled variant E. The Contrast-FEL method [58] was applied to identify sites evolving under different selective pressures between focal (variant E) and reference clades. Finally, we compared the intensity of selective forces acting on the variant E clade using the RELAX method [59]. This method infers whether selection strength (ω ratios) differs between the focal and reference clades by estimating a scaling parameter (K). A value of K < 1 indicates relaxed selection, whereas a value of K > 1 suggests intensified selection relative to the reference branches.
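To make the role of the scaling parameter concrete, the sketch below applies the transform used by RELAX, in which each ω class on the test branches is the corresponding reference ω raised to the power K. The reference ω values are hypothetical, the helper names are illustrative rather than part of HyPhy, and K = 0.68 is borrowed from the stem-branch estimate reported in the Results purely as an example value.

```python
import math


def scale_omegas(reference_omegas, k):
    """Apply the RELAX-style transform omega_test = omega_reference ** k."""
    return [w ** k for w in reference_omegas]


def distance_from_neutrality(omegas):
    """Sum of |log(omega)|; zero when every class is neutral (omega = 1)."""
    return sum(abs(math.log(w)) for w in omegas)


# Hypothetical reference classes: purifying, weakly purifying, diversifying.
reference = [0.05, 0.5, 4.0]

relaxed = scale_omegas(reference, 0.68)     # K < 1 pulls every omega toward 1
intensified = scale_omegas(reference, 2.0)  # K > 1 pushes every omega away from 1
```

Because the exponent acts on all classes at once, K < 1 weakens both purifying and diversifying selection together, which is why relaxation is distinct from a simple rise in mean ω.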
A likelihood ratio test (LRT) is used to compare a null model with equal selection intensity (K = 1) across all branches to an alternative model allowing K ≠ 1 on the test branches, with statistical significance assessed using a chi-square distribution. In all cases, to guard against spurious polymorphic variants, we included only internal branches, which are more likely to represent changes that have become fixed in the viral population than transient intra-host variants or sequencing artifacts. Branches that were not part of the test or background sets were treated as a nuisance set.

Estimation of relative fitness for each lineage

The relative fitness dynamics of GII.17 variants were estimated using the model developed by Lefrancq and colleagues 19. The VP1 maximum likelihood tree reconstructed above was time-calibrated and rooted on the branch that minimized the squared deviation of the root-to-tip regression using the augur pipeline. The time-scaled tree was used as input for the automatic lineage detection and relative fitness estimation models with genome_length = 1634, mutation_rate = 2.1 × 10⁻³ (as calculated from augur), timescale = 1 and wind = 180 days. Automatic lineage detection partitioned the tree into seven defined lineages. The multinomial logistic fitness model was fitted with min_year = 2000 and window 180/365 to quantify the fitness of each lineage. We quantified the agreement between previously defined GII.17 variants and the clades automatically inferred with phylowave using the adjusted Rand index (ARI), which accounts for agreement expected by chance 20. An ARI of 1 indicates perfect concordance, whereas values close to 0 indicate random assignment.

Reconstruction of variant mutational spectrum

Mutational spectra were reconstructed using MutTui v2.0.2 60. This method performs ancestral reconstruction onto the phylogenetic tree using TreeTime v0.8.1 61, which enables identification of the direction of each mutation.
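Returning briefly to the adjusted Rand index used above to compare previously defined variants with the automatically inferred clades, the sketch below computes the index from two label vectors using the Hubert and Arabie formulation. It is a standard-library illustration rather than the authors' code, and the variant and clade labels in the usage example are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b) -> float:
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare; treat as trivially concordant

    # Pairs co-clustered in both partitions, and in each partition separately.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = in_a * in_b / comb(n, 2)  # agreement expected by chance
    maximum = (in_a + in_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (both - expected) / (maximum - expected)


if __name__ == "__main__":
    # Hypothetical labels: known variants versus inferred clade numbers.
    variants = ["E", "E", "D", "D", "C", "C"]
    clades = [1, 1, 2, 2, 3, 3]
    print(adjusted_rand_index(variants, clades))
```

Because the index depends only on which items are grouped together, relabelling clades (for example, calling variant E "group 1") does not change the score.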
The mutational spectrum is then calculated by counting the number of each contextual mutation across the clade. Single nucleotide mutations are classified under the single base substitution (SBS) spectrum. Mutations occurring at two adjacent genome positions on the same phylogenetic branch are categorized as double base substitutions (DBS), while clusters of three or more adjacent mutations are excluded from analysis. The mutational spectra were rescaled to account for genome composition. The mutational spectrum for variant C contained fewer than 300 mutations (between 300 and 600 mutations have been suggested as necessary for a mutational spectrum to be accurately estimated 60), so we did not attempt to examine the detailed contextual patterns in these mutations.

Declarations

Conflict of Interest: The authors declare there are no competing interests.

Data Availability: Accession numbers are provided in Supplementary Table 1. Newly sequenced genomes have been deposited in GenBank under the following accession numbers: PV920367-PV920384.

Acknowledgments: The authors wish to acknowledge UCL genomes at University College London for sequencing of the NoroPatrol samples (RRID:SCR_027010).

References

1. Pires, S. M. et al. Aetiology-Specific Estimates of the Global and Regional Incidence and Mortality of Diarrhoeal Diseases Commonly Transmitted through Food. PLoS One 10, e0142927 (2015).
2. Bartsch, S. M., Lopman, B. A., Ozawa, S., Hall, A. J. & Lee, B. Y. Global Economic Burden of Norovirus Gastroenteritis. PLoS One 11, e0151219 (2016).
3. Chhabra, P. et al. Updated classification of norovirus genogroups and genotypes. Journal of General Virology 100, 1393–1406 (2019).
4. Parra, G. I. Emergence of norovirus strains: A tale of two genes. Virus Evol. 5 (2019).
5. Lindesmith, L. C. et al. Immune Imprinting Drives Human Norovirus Potential for Global Spread. mBio 13 (2022).
6. Tohma, K., Lepore, C. J., Gao, Y., Ford-Siltz, L. A. & Parra, G. I. Population genomics of GII.4 noroviruses reveal complex diversification and new antigenic sites involved in the emergence of pandemic strains. mBio 10 (2019).
7. Matsushima, Y. et al. Genetic analyses of GII.17 norovirus strains in diarrheal disease outbreaks from December 2014 to March 2015 in Japan reveal a novel polymerase sequence and amino acid substitutions in the capsid region. Eurosurveillance 20, 1–6 (2015).
8. Lu, J. et al. Gastroenteritis Outbreaks Caused by Norovirus GII.17, Guangdong Province, China, 2014–2015. Emerg. Infect. Dis. 21, 1240–1242 (2015).
9. Chan, M. C. W. et al. Rapid emergence and predominance of a broadly recognizing and fast-evolving norovirus GII.17 variant in late 2014. Nature Communications 6, 1–9 (2015).
10. Rackoff, L. A., Bok, K., Green, K. Y. & Kapikian, A. Z. Epidemiology and Evolution of Rotaviruses and Noroviruses from an Archival WHO Global Study in Children (1976–79) with Implications for Vaccine Design. PLoS One 8, e59394 (2013).
11. Chhabra, P. et al. Increased circulation of GII.17 noroviruses, six European countries and the United States, 2023 to 2024. Euro Surveill. 29, 2400625 (2024).
12. National norovirus and rotavirus report, week 19 report: data to week 17 (data up to 27 April 2025). GOV.UK. https://www.gov.uk/government/statistics/national-norovirus-and-rotavirus-surveillance-reports-2024-to-2025-season/national-norovirus-and-rotavirus-report-week-19-report-data-to-week-17-data-up-to-27-april-2025.
13. Barclay, L. & Vinjé, J. Increasing Predominance of Norovirus GII.17 over GII.4, United States, 2022–2025. Emerg. Infect. Dis. 31 (2025) doi:10.3201/EID3107.250524.
14. Parra, G. I. et al. Static and Evolving Norovirus Genotypes: Implications for Epidemiology and Immunity. PLoS Pathog. 13, e1006136 (2017).
15. Dinu, S., Oprea, M., Iordache, R. I., Rusu, L. C. & Usein, C. R. Genome characterisation of norovirus GII.P17-GII.17 detected during a large gastroenteritis outbreak in Romania in 2021. Arch. Virol. 168 (2023).
16. Gomes, K. A. et al. Multi-Province Outbreak of Acute Gastroenteritis Linked to Potential Novel Lineage of GII.17 Norovirus in Argentina in 2024. Viruses 17 (2025).
17. Yang, J., Qi, Z., Chen, S. & Xiong, C. Epidemiological and molecular investigation of a norovirus GII.17 outbreak in a kindergarten in Shanghai, China. Diagn. Microbiol. Infect. Dis. 114, 117089 (2026).
18. Tohma, K. et al. GII.17 norovirus re-emerged in the 2020s as a result of dynamic and adaptive evolutionary processes. Nature Communications 16, 11596 (2025).
19. Lefrancq, N. et al. Learning the fitness dynamics of pathogens from phylogenies. Nature 637, 683–690 (2025).
20. Hubert, L. & Arabie, P. Comparing partitions. J. Classif. 2, 193–218 (1985).
21. Tohma, K., Lepore, C. J., Ford-Siltz, L. A. & Parra, G. I. Phylogenetic Analyses Suggest that Factors Other Than the Capsid Protein Play a Role in the Epidemic Potential of GII.2 Norovirus. mSphere 2 (2017).
22. Eden, J.-S., Tanaka, M. M., Boni, M. F., Rawlinson, W. D. & White, P. A. Recombination within the Pandemic Norovirus GII.4 Lineage. J. Virol. 87, 6270 (2013).
23. Siebenga, J. J. et al. Epochal Evolution of GGII.4 Norovirus Capsid Proteins from 1995 to 2006. J. Virol. 81, 9932–9941 (2007).
24. Lindesmith, L. C. et al. Mechanisms of GII.4 Norovirus Persistence in Human Populations. PLoS Med. 5, e31 (2008).
25. Duchêne, S., Holmes, E. C. & Ho, S. Y. W. Analyses of evolutionary dynamics in viruses are hindered by a time-dependent bias in rate estimates. Proceedings of the Royal Society B: Biological Sciences 281 (2014).
26. Lu, J. et al. The Evolution and Transmission of Epidemic GII.17 Noroviruses. J. Infect. Dis. 214, 556 (2016).
27. Bruggink, L. D., Garcia-Clapes, A., Tran, T., Druce, J. D. & Thorley, B. R. Decreased incidence of enterovirus and norovirus infections during the COVID-19 pandemic, Victoria, Australia, 2020. Commun. Dis. Intell. 45 (2021).
28. Lennon, R. P. et al. Norovirus Infections Drop 49% in the United States with Strict COVID-19 Public Health Interventions. Acta Med. Acad. 49, 278–280 (2020).
29. Douglas, A. et al. Impact of COVID-19 on national surveillance of norovirus in England and potential risk of increased disease activity in 2021. Journal of Hospital Infection 112, 124–126 (2021).
30. O’Reilly, K. M. et al. Predicted norovirus resurgence in 2021–2022 due to the relaxation of nonpharmaceutical interventions associated with COVID-19 restrictions in England: a mathematical modeling study. BMC Med. 19, 1–10 (2021).
31. Epifanova, N. V. et al. Appearance and spread of norovirus genotype GII.17 subcluster C2 (Romania-2021 like) in Nizhny Novgorod, Russia, 2021–2023. Arch. Virol. 170, 1–9 (2025).
32. Minin, V. N., Bloomquist, E. W. & Suchard, M. A. Smooth Skyride through a Rough Skyline: Bayesian Coalescent-Based Inference of Population Dynamics. Mol. Biol. Evol. 25, 1459–1471 (2008).
33. Gomes, K. A. et al. Multi-Province Outbreak of Acute Gastroenteritis Linked to Potential Novel Lineage of GII.17 Norovirus in Argentina in 2024. Viruses 17 (2025).
34. Chen, C. et al. Molecular evolution of GII.P17-GII.17 norovirus associated with sporadic acute gastroenteritis cases during 2013–2018 in Zhoushan Islands, China. Virus Genes 56, 279–287 (2020).
35. Das Neves Costa, L. C. P. et al. Molecular and evolutionary characterization of norovirus GII.17 in the northern region of Brazil. BMC Infectious Diseases 19, 1021 (2019).
36. Jin, M. et al. Characterization of the new GII.17 norovirus variant that emerged recently as the predominant strain in China. Journal of General Virology 97, 2620–2632 (2016).
37. Zhang, X. F. et al. An outbreak caused by GII.17 norovirus with a wide spectrum of HBGA-associated susceptibility. Sci. Rep. 5, 1–10 (2015).
38. Estienney, M. et al. Epidemiological Impact of GII.17 Human Noroviruses Associated With Attachment to Enterocytes. Front. Microbiol. 13, 858245 (2022).
39. Qian, Y. et al. Structural Adaptations of Norovirus GII.17/13/21 Lineage through Two Distinct Evolutionary Paths. J. Virol. 93, e01655-18 (2018).
40. Kroneman, A. et al. An automated genotyping tool for enteroviruses and noroviruses. Journal of Clinical Virology 51, 121–125 (2011).
41. Katoh, K. & Standley, D. M. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol. Biol. Evol. 30, 772–780 (2013).
42. Samson, S., Lord, É. & Makarenkov, V. SimPlot++: a Python application for representing sequence similarity and detecting recombination. Bioinformatics 38, 3118–3120 (2022).
43. Minh, B. Q. et al. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol. Biol. Evol. 37, 1530–1534 (2020).
44. Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).
45. Rambaut, A., Lam, T. T., Carvalho, L. M. & Pybus, O. G. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus Evol. 2 (2016).
46. Suchard, M. A. et al. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4 (2018).
47. Shapiro, B., Rambaut, A. & Drummond, A. J. Choosing Appropriate Substitution Models for the Phylogenetic Analysis of Protein-Coding Sequences. Mol. Biol. Evol. 23, 7–9 (2006).
48. Drummond, A. J., Ho, S. Y. W., Phillips, M. J. & Rambaut, A. Relaxed Phylogenetics and Dating with Confidence. PLoS Biol. 4, e88 (2006).
49. Minin, V. N., Bloomquist, E. W. & Suchard, M. A. Smooth Skyride through a Rough Skyline: Bayesian Coalescent-Based Inference of Population Dynamics. Mol. Biol. Evol. 25, 1459–1471 (2008).
50. Rambaut, A., Drummond, A. J., Xie, D., Baele, G. & Suchard, M. A. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst. Biol. 67, 901–904 (2018).
51. Lam, H. M., Ratmann, O. & Boni, M. F. Improved Algorithmic Complexity for the 3SEQ Recombination Detection Algorithm. Mol. Biol. Evol. 35, 247–251 (2018).
52. Boni, M. F., Posada, D. & Feldman, M. W. An Exact Nonparametric Method for Inferring Mosaic Structure in Sequence Triplets. Genetics 176, 1035–1047 (2007).
53. Pond, S. L. K., Posada, D., Gravenor, M. B., Woelk, C. H. & Frost, S. D. W. Automated Phylogenetic Detection of Recombination Using a Genetic Algorithm. Mol. Biol. Evol. 23, 1891–1901 (2006).
54. Kosakovsky Pond, S. L. et al. HyPhy 2.5—A Customizable Platform for Evolutionary Hypothesis Testing Using Phylogenies. Mol. Biol. Evol. 37, 295–299 (2020).
55. Murrell, B. et al. Gene-Wide Identification of Episodic Selection. Mol. Biol. Evol. 32, 1365 (2015).
56. Kosakovsky Pond, S. L. & Frost, S. D. W. Not So Different After All: A Comparison of Methods for Detecting Amino Acid Sites Under Selection. Mol. Biol. Evol. 22, 1208–1222 (2005).
57. Murrell, B. et al. Detecting Individual Sites Subject to Episodic Diversifying Selection. PLoS Genet. 8, e1002764 (2012).
58. Kosakovsky Pond, S. L., Wisotsky, S. R., Escalante, A., Magalis, B. R. & Weaver, S. Contrast-FEL—A Test for Differences in Selective Pressures at Individual Sites among Clades and Sets of Branches. Mol. Biol. Evol. 38, 1184–1198 (2021).
59. Wertheim, J. O., Murrell, B., Smith, M. D., Pond, S. L. K. & Scheffler, K. RELAX: Detecting Relaxed Selection in a Phylogenetic Framework. Mol. Biol. Evol. 32, 820–832 (2015).
60. Ruis, C., Tonkin-Hill, G., Floto, R. A. & Parkhill, J. Calculating and applying pathogen mutational spectra using MutTui. bioRxiv 2023.06.15.545111 (2023) doi:10.1101/2023.06.15.545111.
61. Sagulenko, P., Puller, V. & Neher, R. A. TreeTime: Maximum-likelihood phylodynamic analysis. Virus Evol. 4 (2018).

Tables

Table 1. Overview of selection among GII.17 variant E internal branches.
S: the number of codon sites in the alignment. ω: mean estimate on variant E clade internal branches (MG94xREV model). Sites in variant E evolving under selection (P < 0.05) were inferred using MEME for positive selection, FEL for negative selection and Contrast-FEL for intensified selection (ω variant E > ω other). RELAX reports the intensification/relaxation parameter K and the p-value for overall selective pressure on the variant E branches relative to the reference clade branches. N.S: not significant.

| ORF | Gene | S | ω | Positive sites | Negative sites | Intensified sites | RELAX, K (p) |
|---|---|---|---|---|---|---|---|
| ORF1 | p48 | 350 | 0.175 | 5 | 12 | 1 | 0.68 (0.020) |
| | NTPase | 366 | 0.093 | 2 | 10 | 1 | 0.42 (N.S) |
| | p22 | 179 | 0.169 | 0 | 3 | 0 | 0.34 (N.S) |
| | VPg | 133 | 0.032 | 1 | 9 | 0 | 0.79 (N.S) |
| | Pro | 181 | 0.011 | 0 | 7 | 0 | 0.36 (0.020) |
| | RdRp | 510 | 0.084 | 2 | 36 | 1 | 0.52 (N.S) |
| | ORF1 | 1715 | 0.1006 | 11 | 91 | 1 | 0.53 (0.011) |
| ORF2 | VP1 | 544 | 0.079 | 3 | 44 | 1 | 0.63 (0.088) |
| ORF3 | VP2 | 266 | 0.346 | 4 | 3 | 0 | 0.35 (0.0004) |

Table 2. Genome-wide lineage-specific mutations found within GII.17 variant E.

| Protein | Position | Variant C amino acid | Variant E amino acid | Frequency in variant E (1) | Frequency in variant C (1) | Maximum frequency in other lineages (2) |
|---|---|---|---|---|---|---|
| p22 | 120 | T | S | 99.3 | 0 | 0 |
| p22 | 131 | I | V | 99.3 | 0 | 0 |
| VPg | 76 | S | N | 98.5 | 4.3 | 4.3 |
| polymerase | 102 | Q | E | 98.5 | 0 | 0 |
| VP1 | 144 | L | I | 99.3 | 0 | 0 |
| VP1 | 361 | Q | R | 98.7 | 0 | 0 |
| VP1 | 372 | R | K | 97.4 | 2.3 | 2.3 |
| VP1 | 384 | K | R | 100 | 0 | 0 |
| VP1 | 409 | L | V | 99.3 | 4.5 | 4.5 |
| VP1 | 435 | F | S | 99.3 | 0 | 0.9 |
| VP1 | 447 | V | I | 98 | 0 | 0.5 |
| VP2 | 144 | A | G | 99.3 | 0 | 0 |

(1) Frequency is given as the percentage of variant sequences containing a particular amino acid.
(2) Maximum frequency is the percentage of sequences other than variant C that contain that amino acid.

Additional Declarations

There is no competing interest.

Supplementary Files

SFigure1.pdf
Supplementary Fig. 1. Similarity plot analysis reveals divergence across ORF1 and ORF3. SimPlot similarity analyses comparing the focal GII.17 lineage against reference variants A–D across (a) ORF1 and (b) ORF3. The x-axis indicates nucleotide position along each open reading frame and the y-axis shows pairwise sequence similarity.
Across both ORF1 and ORF3 the focal lineage displays consistently reduced similarity relative to established variants.

SFigure2.pdf
Supplementary Fig. 2. Proposed evolutionary model for the origin of epidemic GII.17 variants. Schematic representation of the inferred evolutionary history underlying the emergence of epidemic GII.17 noroviruses. A historical recombination event at the ORF1/ORF2 junction combined an ancestral B-like lineage contributing the non-structural ORF1 region (left) with an ancestral D-like lineage contributing the structural ORF2/ORF3 region (right), generating a proto-D recombinant ancestor. This recombinant lineage subsequently diversified to give rise to variants C, D, and E, which have driven successive epidemic waves since 2013. Dashed lines indicate inferred evolutionary relationships rather than direct ancestral sequences, and dates denote the approximate period of first detection for each variant. The model summarizes evidence from phylogenetic, recombination, and similarity analyses and supports a single shared recombinant origin for variants C–E rather than multiple independent recombination events.

SFigure3.pdf
Supplementary Fig. 3. Distinct mutational spectra across GII.17 variants C, D and E. Each panel, (a) variant C, (b) variant D and (c) variant E, shows the proportional contribution of the trinucleotide substitution contexts collapsed into multiple substitution classes. Mutational spectra are corrected for genome composition. Distinct colour schemes represent different substitution types.

SFigure4.pdf
Supplementary Fig. 4. Model fits of the index dynamics using the inferred set of lineages. (a) Temporal trajectory of the computed index for each GII.17 sequence in each automatically inferred clade (groups 1-7). (b) Observed versus predicted index values, indicating that the inferred clade structure and associated dynamics are well captured by the model.
Colours represent the different lineages identified by the different index dynamics.

SupplementaryTable1.xlsx
Supplementary Table 1

{"props":{"pageProps":{"initialData":{"identity":"rs-8923591","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Analysis","associatedPublications":[],"authors":[{"id":601207118,"identity":"21c60bc2-7f0c-4b6a-a0b4-942c913e3ace","order_by":0,"name":"Damien
Tully","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABAElEQVRIiWNgGAWjYBACeyA+AMT1DezMDQwfChgSYDIGuLQYNkC08DAwMzYwzjBgSOAhpMXgAIQGa2HmIUrL8dOJBz7uYOAxOMzYJm1jUJdnL3aA8cMPhsPGOLWcyd1wcOYZqJYcg8PFPNIJzJI9DIfNcDssd8Nh3ja4lgOJPdIJDNIMDIdtcGo5/3bD4b8wLRYGdSAtzL/xarkBtIURpoXBgBmkhQ1kC06HGc54u+Fgb5sEj+RhxmbLHoPDiT23E9uAjHSc3rfnz9384WebDQ/f8eaDN35U1CW2z04+DGRYg2IMH5AAESwSEA5jA55YQQXMH4hTNwpGwSgYBSMNAADAV1dv2N4b0QAAAABJRU5ErkJggg==","orcid":"https://orcid.org/0000-0002-7620-9340","institution":"LSHTM","correspondingAuthor":true,"prefix":"","firstName":"Damien","middleName":"","lastName":"Tully","suffix":""},{"id":601207119,"identity":"884b485f-b948-4917-aabf-2e75740d9600","order_by":1,"name":"Sunando Roy","email":"","orcid":"","institution":"UCL Institute of Child Health","correspondingAuthor":false,"prefix":"","firstName":"Sunando","middleName":"","lastName":"Roy","suffix":""},{"id":601207120,"identity":"afffb963-68e8-434f-84b5-82ee2fa92896","order_by":2,"name":"Helena Tutill","email":"","orcid":"https://orcid.org/0000-0002-9977-9012","institution":"Genetics and Genomic Medicine Department, GOS Institute of Child Health, University College London, London, UK","correspondingAuthor":false,"prefix":"","firstName":"Helena","middleName":"","lastName":"Tutill","suffix":""},{"id":601207121,"identity":"c5c4d3c2-f128-4262-affd-58849d99be7f","order_by":3,"name":"Rachel Williams","email":"","orcid":"https://orcid.org/0000-0001-8057-5063","institution":"UCL Institute of Child Health","correspondingAuthor":false,"prefix":"","firstName":"Rachel","middleName":"","lastName":"Williams","suffix":""},{"id":601207122,"identity":"f083246c-c7e8-4fa1-a124-adbfc0ce08bf","order_by":4,"name":"Cristina Celma","email":"","orcid":"","institution":"Enteric Virus Unit, The Virus Reference Department, UK Health Security 
Agency","correspondingAuthor":false,"prefix":"","firstName":"Cristina","middleName":"","lastName":"Celma","suffix":""},{"id":601207123,"identity":"199ae860-0527-4dec-8543-f8e2542f6470","order_by":5,"name":"Lisa Lindesmith","email":"","orcid":"https://orcid.org/0000-0001-9567-0522","institution":"Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, USA","correspondingAuthor":false,"prefix":"","firstName":"Lisa","middleName":"","lastName":"Lindesmith","suffix":""},{"id":601207124,"identity":"300991cd-a505-4bc2-8f80-f0d8b7f4dd47","order_by":6,"name":"Ralph Baric","email":"","orcid":"https://orcid.org/0000-0001-6827-8701","institution":"University of North Carolina at Chapel Hill","correspondingAuthor":false,"prefix":"","firstName":"Ralph","middleName":"","lastName":"Baric","suffix":""},{"id":601207125,"identity":"08adbf11-79e3-417f-ab78-25078c16520d","order_by":7,"name":"Judith Breuer","email":"","orcid":"https://orcid.org/0000-0001-8246-0534","institution":"University College London","correspondingAuthor":false,"prefix":"","firstName":"Judith","middleName":"","lastName":"Breuer","suffix":""}],"badges":[],"createdAt":"2026-02-20 08:02:04","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-8923591/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8923591/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":105810373,"identity":"c7f2d964-31c3-4ddc-99ab-bc7b8a26ab4f","added_by":"auto","created_at":"2026-03-31 11:12:49","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":249345,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eTemporal distribution of GII.17 variants. \u0026nbsp;\u003c/strong\u003eBubble plot showing the number of GII.17 norovirus sequences detected over time for each variant (A – E). The x-axis indicates the sampling year and the y-axis denotes variant assignment. 
Bubble area is proportional to the number of sequences collected in that year.\u003c/p\u003e","description":"","filename":"fig1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8923591/v1/416d7e415b1423a77e9fcaa9.jpg"},{"id":105904497,"identity":"bd945256-2734-463c-add4-7567d4e70d8e","added_by":"auto","created_at":"2026-04-01 10:09:03","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":698383,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eGenomic characterisation and phylogenetic profile of GII.17 noroviruses. (a) \u003c/strong\u003eMaximum-likelihood phylogenetic tree of 1206 VP1 nucleotide sequences rooted by the best-fit option from TempEst v1.5.3. Variants are labelled according to the lineage in which they cluster. Insert panel highlights the variant E lineage and its ancestors. Scale bar indicates the number of substitutions per site. \u003cstrong\u003e(b) \u003c/strong\u003eSimplot analysis of 1207 VP1 nucleotide sequences. Variant E sequences are used as the reference and compared against other GII.17 variants.\u003c/p\u003e","description":"","filename":"fig2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8923591/v1/d227967b501447cd5f227945.jpg"},{"id":105904112,"identity":"582f0790-1f6a-469d-9950-36ac4820ec15","added_by":"auto","created_at":"2026-04-01 10:04:26","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":763613,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eTime-resolved phylogeny of 728 GII.17 VP1 sequences. (a) \u003c/strong\u003eRegression of root-to-tip genetic distances against sampling dates \u003cstrong\u003e(b) \u003c/strong\u003eHighest independent posterior subtree reconstruction\u003cstrong\u003e (\u003c/strong\u003eHIPSTAR) time-scaled phylogenetic tree from BEAST showing the emergence of the five major GII.17 lineages with absolute time shown on the x axis. 
Probability density distribution n shows the estimated time of the most recent ancestor (tMRCA) of the variant E lineage to be November 2018. Variants are colour coded while those circles denoted in white represent intermediates and unassigned variants. \u003cstrong\u003e(c)\u003c/strong\u003e Substitution rates of variants C, D and E.\u003c/p\u003e","description":"","filename":"fig3.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8923591/v1/cfcc3c7678726edb8c2e1ca6.jpg"},{"id":105904375,"identity":"2871c70a-0f52-40b0-8228-c5478f6cc7e5","added_by":"auto","created_at":"2026-04-01 10:07:50","extension":"jpg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":555606,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eEvidence for historical recombination underlying modern day GII.17 diversity (a) \u003c/strong\u003eSimilarity plot representing GII.17 variants across the virus genome compared to the 4 divergent B-like sequences from 2023-2024. The x-axis shows nucleotide position along the genome, and the y-axis indicates sequence similarity to variant sequences. Colours denote similarity to different historical GII.17 variants (B–E). Dotted line indicates the inferred recombination breakpoint with occurs at the ORF1/ORF2 junction \u003cstrong\u003e(b) \u003c/strong\u003eExample 3SEQ results for three sequences. Similarity profile for inferred recombinant WT-NORO-2857 (variant E) showing the same breakpoint pattern with one divergent B-like sequence more similar in ORF1 while ORF2/3 are more like a variant D sequence (WT-NORO-2472). 
Dotted line illustrates the breakpoints identified by 3SEQ.\u003c/p\u003e","description":"","filename":"fig4.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8923591/v1/92719c5974f995fc340260ea.jpg"},{"id":105904327,"identity":"749efd4c-7120-4991-ac86-721e962b0fb5","added_by":"auto","created_at":"2026-04-01 10:07:30","extension":"jpg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":363111,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFitness-driven emergence and replacement of GII.17 variants. (a) \u003c/strong\u003eAutomatic inference of discrete lineages using phylowave.\u003cstrong\u003e \u003c/strong\u003ePhylowave was applied to the time-scaled GII.17 phylogeny to automatically infer discrete clades using a quantitative index akin to the local branching index. This method partitioned the phylogeny into seven clades each representing a lineage with a distinct inferred growth and resistance profile. Clades are coloured consistently across the phylogeny to illustrate their temporal distribution and genealogical structure, highlighting the emergence, expansion and turnover of lineages through time. Known GII.17 variants are labelled from A to E with unlabelled sequences representing phylogenetic intermediate sequences. \u003cstrong\u003e\u0026nbsp;(b) \u003c/strong\u003eTemporal changes in the proportional representation of each automatically inferred lineage estimated using a multinomial logistic fitness model. Coloured data points show observed proportions through time with bars denoting 95% confidence intervals and lines indicate model estimated trajectories. \u003cstrong\u003e(c) \u003c/strong\u003eEstimated relative fitness for each inferred lineage with 95 % credible intervals (bars) indicating uncertainty in model estimates. The highest-fitness lineage (group1) corresponds to variant E. 
Group 2 corresponds primarily to variant D, groups 3 and 7 to variant C, group 5 to variant B and groups 4 and 6 to variant A, with group 6 representing a divergent subset of variant A viruses including ancestral sequences. Group 7 also contains several older intermediate sequences including Arg13099 and Hu/GII/27-3/Tokyo/1976.\u003c/p\u003e","description":"","filename":"fig5.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8923591/v1/aa1f7f45acd43ec6062725db.jpg"},{"id":105904499,"identity":"3339c6a8-1bdc-42df-ad63-3f9202163c7c","added_by":"auto","created_at":"2026-04-01 10:09:03","extension":"jpg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":935240,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eDivergent GII.17 lineage emerging from ancestral variant A. \u003c/strong\u003e(a) Maximum-likelihood phylogeny highlighting a distinct cluster of three GII.17 sequences sampled during a localized outbreak in North West England in 2022, branching from the historical variant A lineage (b) Genome-wide similarity plot comparing the divergent lineage to representative A and all publicly available variant A genomes, demonstrating substantial divergence across both structural and non-structural regions (c) Distribution of amino-acid substitutions across VP1 relative to the historical T055/Tunisia/1977 strain, with the majority of changes concentrated in the P2 subdomain.\u003c/p\u003e","description":"","filename":"fig6.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8923591/v1/df35e82f4d6541d91e1c0c8a.jpg"},{"id":105908714,"identity":"357a6826-80a0-4c4b-9bd6-558d23e0da95","added_by":"auto","created_at":"2026-04-01 
10:39:22","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":4989738,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8923591/v1/04ab67c0-df97-4ac1-8c62-eab0b5c397d0.pdf"},{"id":105904206,"identity":"9826238f-979d-4127-aa74-c142911c44b1","added_by":"auto","created_at":"2026-04-01 10:06:16","extension":"pdf","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":479890,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSupplementary Fig. 1. \u0026nbsp;Similarity plot analysis reveals divergence across ORF1 and ORF3. \u003c/strong\u003eSimplot similarity analyses compared to the focal GII.17 lineage against reference variants A-D across (\u003cstrong\u003ea\u003c/strong\u003e) ORF1 and (\u003cstrong\u003eb\u003c/strong\u003e) ORF3. The x-axis indicates nucleotide position along each open reading frame and the y-axis shows pairwise sequence similarity. Across both ORF1 and ORF3 the focal lineage displays consistently reduced similarity relative to established variants.\u003c/p\u003e","description":"","filename":"SFigure1.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8923591/v1/08405d27dccaed1d8a7d171b.pdf"},{"id":105810377,"identity":"3a249538-70a5-4764-9349-9e0ecb7274be","added_by":"auto","created_at":"2026-03-31 11:12:49","extension":"pdf","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":450403,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSupplementary Fig. 2. Proposed evolutionary model for the origin of epidemic GII.17 variants. \u003c/strong\u003eSchematic representation of the inferred evolutionary history underlying the emergence of epidemic GII.17 noroviruses. 
A historical recombination event at the ORF1/ORF2 junction combined an ancestral B-like lineage contributing the non-structural ORF1 region (left) with an ancestral D-like lineage contributing the structural ORF2/ORF3 region (right), generating a proto-D recombinant ancestor. This recombinant lineage subsequently diversified to give rise to variants C, D, and E, which have driven successive epidemic waves since 2013. Dashed lines indicate inferred evolutionary relationships rather than direct ancestral sequences, and dates denote the approximate period of first detection for each variant. The model summarizes evidence from phylogenetic, recombination, and similarity analyses and supports a single shared recombinant origin for variants C–E rather than multiple independent recombination events.\u003c/p\u003e","description":"","filename":"SFigure2.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8923591/v1/a593403a69bd4471ee734024.pdf"},{"id":105810378,"identity":"8c9d8d98-25e2-48a0-b7c7-efcf5b0d55e0","added_by":"auto","created_at":"2026-03-31 11:12:49","extension":"pdf","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":159176,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSupplementary Fig. 3. Distinct mutational spectra across GII.17 variants C, D and E. \u003c/strong\u003eEach panel, (a) variant C, (b) variant D and (c) variant E, shows the proportional contribution of the trinucleotide substitution contexts collapsed into multiple substitution classes. Mutational spectra are corrected for genome composition. 
Distinct colour schemes represent different substitution types.\u003c/p\u003e","description":"","filename":"SFigure3.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8923591/v1/8ac0bbde7dc02fe821a1d3e9.pdf"},{"id":105810380,"identity":"1494bd80-0604-4cb2-a15f-dcc2f7594399","added_by":"auto","created_at":"2026-03-31 11:12:49","extension":"pdf","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":232783,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSupplementary Fig. 4. Model fits of the index dynamics using the inferred set of lineages. \u003c/strong\u003e(a) Temporal trajectory of the computed index through time for each GII.17 sequence for each automatically inferred clade (groups 1-7). (b) Observed versus predicted index values, indicating that the inferred clade structure and associated dynamics are well captured by the model. Colours represent the different lineages identified by the different index dynamics.\u003c/p\u003e","description":"","filename":"SFigure4.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8923591/v1/47267b58b75b9466f548fd5b.pdf"},{"id":105810382,"identity":"03d700b1-e764-4911-9760-bcb80cc25049","added_by":"auto","created_at":"2026-03-31 11:12:49","extension":"xlsx","order_by":5,"title":"","display":"","copyAsset":false,"role":"supplement","size":24307,"visible":true,"origin":"","legend":"Supplementary Table 1","description":"","filename":"SupplementaryTable1.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-8923591/v1/c9ae36f1b1878782193e7bb4.xlsx"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing Interest.","formattedTitle":"Fitness-driven emergence and lineage replacement underpin the global resurgence of GII.17 noroviruses","fulltext":[{"header":"Introduction ","content":"\u003cp\u003eNorovirus is a leading cause of acute gastroenteritis (AGE), responsible for an 
estimated 200,000 deaths annually\u0026nbsp;\u003csup\u003e1\u003c/sup\u003e and imposing a substantial economic burden costing billions worldwide\u0026nbsp;\u003csup\u003e2\u003c/sup\u003e. Noroviruses are genetically diverse single-stranded, positive-sense RNA viruses in the \u003cem\u003eCaliciviridae\u0026nbsp;\u003c/em\u003efamily. Based on phylogenetic relationships, they are classified into at least ten genogroups (GI to GX), of which GI and GII account for most human infections. Further subdivision is based on diversity in the major capsid protein VP1 and the RNA-dependent RNA polymerase (RdRp), yielding more than 48 capsid genotypes and over 60 polymerase (P-)types\u0026nbsp;\u003csup\u003e3\u003c/sup\u003e. Despite this extensive diversity, global norovirus epidemiology over the past three decades has been dominated by genotype GII.4. Successive GII.4 variants, most notably GII.4 Sydney 2012, have repeatedly replaced their predecessors\u0026nbsp;\u003csup\u003e4\u003c/sup\u003e through antigenic drift and immune escape\u0026nbsp;\u003csup\u003e5,6\u003c/sup\u003e, resulting in prolonged global predominance and limited opportunities for sustained displacement by other genotypes. Against this backdrop, the emergence of GII.17 represented a notable exception. During the winter of 2014-15, a novel GII.17 variant (GII.17 Kawasaki 2014) rapidly rose to prominence in several Asian countries, temporarily outcompeting GII.4 Sydney and causing large gastroenteritis outbreaks\u0026nbsp;\u003csup\u003e7–9\u003c/sup\u003e. Despite this transient success, GII.17 declined and was detected only sporadically over the subsequent decade, even though the genotype has circulated in humans for nearly 50 years\u0026nbsp;\u003csup\u003e10\u003c/sup\u003e. This reinforced the view that non-GII.4 genotypes lack the evolutionary capacity for sustained global dominance.\u003c/p\u003e\n\u003cp\u003eRecent surveillance data has challenged this assumption. 
Since 2023, GII.17 outbreaks have increased markedly in Europe and the United States, overtaking GII.4 in several regions by 2025\u0026nbsp;\u003csup\u003e11–13\u003c/sup\u003e. Phylogenetic analyses identify multiple established GII.17 clusters (A–D)\u0026nbsp;\u003csup\u003e14\u003c/sup\u003e and more recently a genetically distinct lineage first detected in Romania in 2021\u0026nbsp;\u003csup\u003e15\u003c/sup\u003e\u0026nbsp; and subsequently reported across multiple continents\u0026nbsp;\u003csup\u003e11,16–18\u003c/sup\u003e. \u0026nbsp;Although prior studies have characterised antigenic properties and receptor-binding adaptations of this emergent lineage\u0026nbsp;\u003csup\u003e18\u003c/sup\u003e, its broader evolutionary context remains unclear. It is unknown whether its expansion reflects pre-adaptive innovation, recombination or fitness gains acquired during early transmission.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eHere, we present a global, five-decade evolutionary analysis of GII.17 based on more than 1,200 sequences sampled between 1976 and 2025. Integrating phylogenetic inference, molecular clock modelling, selection analyses, mutational spectrum characterisation, and quantitative fitness modelling, we reconstruct the long-term phylodynamic history of this genotype. We show that the dominant GII.17 lineage circulated cryptically prior to global expansion and that its emergence was not driven by recent recombination or intensified stem-lineage selection. Instead, expansion was underpinned by measurable transmission fitness gains acquired during early spread. We further identify lineage-specific shifts in mutational processes, evidence for a shared historical recombinant origin of epidemic variants, and the persistence of divergent lineages branching from ancestral variant A. 
Together, these findings place the recent resurgence of GII.17 within a global evolutionary framework and demonstrate how incremental fitness gains can enable non-GII.4 genotypes to displace long-standing epidemic lineages.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e\u003cstrong\u003e\u003cem\u003eSequential replacement of GII.17 variants\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe curated over 5,000 publicly available GII.17 sequences from GenBank, retaining high-quality ORF1 (n=429), ORF2 (n=1,206) and ORF3 (n=474) sequences for downstream analyses. Temporal analysis revealed clear sequential variant turnover (\u003cstrong\u003eFig. 1\u003c/strong\u003e). Variant A predominated in the 1970s\u0026ndash;1980s, followed by variants B and C from the mid-1990s to early 2000s. Variant D emerged around 2014 and was subsequently replaced by rapid expansion of variant E from 2021 onwards. Notably, variants A and B persisted at low frequency across decades, consistent with long-term maintenance of ancestral lineages.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003ePhylogenetic characterisation of GII.17 variants\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eMaximum likelihood analysis of 1,207 full-length VP1 sequences (1978-2025) identified five major GII.17 lineages, including a recently emerged lineage first reported from Romania in 2021 \u003csup\u003e15\u003c/sup\u003e (\u003cstrong\u003eFig. 2a\u003c/strong\u003e). This lineage, comprising 151 VP1 sequences sampled between 2021 and 2025, shows ~94% nucleotide identity to variant C and 93% to variant D (\u003cstrong\u003eFig. 2b\u003c/strong\u003e). This pattern was consistent across ORF1 (94% and 93% identity with variants C and D, respectively) and ORF3 (95% and 93% identity with variants C and D, respectively) (\u003cstrong\u003eSupplementary Fig. 1\u003c/strong\u003e). 
Comparison of intra-lineage diversity in VP1 indicated that this lineage is more diverse, with a mean within-variant pairwise distance of 2.32% compared to 1.08% for variant C and 0.79% for variant D. A small subset of sequences sampled between 2014 and 2016 from geographically diverse locations (Canada, France, the Netherlands, and the UK) occupy basal positions along the stem leading to this lineage, suggesting they are closely related to its unsampled common ancestor. By late 2025, this clade had been detected across multiple continents, with new data from the UK showing that it was first sampled on 20 July 2021 from a 65-year-old male in North West England, predating the earliest available Romanian cases. Based on expanded phylogenetic evidence, we designate this lineage as variant E or GII.17 North West England (2021), subject to review by the ad hoc international norovirus nomenclature working group.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eCryptic pre-pandemic emergence and accelerated evolution of variant E\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eRelaxed molecular clock analyses demonstrated sufficient temporal signal for robust dating (\u003cstrong\u003eFig. 3a\u003c/strong\u003e). Using a Bayesian Markov chain Monte Carlo framework with a non-parametric GMRF skygrid coalescent model and a relaxed molecular clock, we estimated the time to the most recent common ancestor (tMRCA) of variant E to be November 2018 (95% Bayesian credible interval: September 2017 to December 2019) (\u003cstrong\u003eFig. 3b\u003c/strong\u003e). This predates its first detection in the UK in July 2021 by approximately 3 years, consistent with cryptic pre-pandemic circulation. Variant E exhibits markedly elevated evolutionary rates (\u003cstrong\u003eFig. 
3c\u003c/strong\u003e), with estimated rates of 5.90 \u0026times; 10⁻\u0026sup3; substitutions per site per year (95% BCI: 4.36 \u0026times; 10⁻\u0026sup3; \u0026ndash; 7.61 \u0026times; 10⁻\u0026sup3;) for VP1 compared to variant C (3.08 \u0026times; 10⁻\u0026sup3;) and variant D (2.18 \u0026times; 10⁻\u0026sup3;). Similarly, for RdRP variant E had elevated evolutionary rates of 6.47 \u0026times; 10⁻\u0026sup3; substitutions per site per year (95% BCI: 4.42 \u0026times; 10⁻\u0026sup3; \u0026ndash; 8.69 \u0026times; 10⁻\u0026sup3;) compared to variant D (2.34 \u0026times; 10⁻\u0026sup3;) while robust estimates could not be obtained for variant C.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eShared recombinant origin of epidemic GII.17 variants\u0026nbsp;\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eRecombination analysis of 429 complete GII.17 genomes provides strong evidence for a single historical recombination event that shaped the evolutionary origin of the major epidemic GII.17 variants (C, D and E). Using multiple complementary approaches, we consistently identified a breakpoint at the ORF1/ORF2 junction, indicating distinct evolutionary histories for non-structural and structural genomic regions.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eGARD detected a single significant breakpoint at position 5,191 bp (\u0026Delta;c-AIC\u003cem\u003enull model\u003c/em\u003e = 12,263), precisely coinciding with the ORF1/ORF2 boundary. \u0026nbsp;This result was independently corroborated by 3SEQ which identified 429 statistically significant recombination events (\u003cem\u003ep\u0026nbsp;\u003c/em\u003e\u0026lt; 10\u003csup\u003e-23\u003c/sup\u003e) with 99.3% of inferred breakpoints localising to the same junction. The dominant recombination signal indicates that variants C, D, and E share a common recombinant ancestry, rather than arising through multiple independent recombination events. 
Analysis of inferred parental lineages revealed that over 90% of recombinant events involved B-like and D-like parental combinations, producing variants C, D, or E as recombinant descendants. Notably, a small set of recently sampled B-like sequences from the United States (2023\u0026ndash;2024) occupy a key phylogenetic position. These sequences are substantially more similar to variants C\u0026ndash;E across ORF1 (90\u0026ndash;93% nucleotide identity) than to basal historical variant B sequences (~84%), while remaining highly divergent in the capsid region (\u003cstrong\u003eFig. 4a\u003c/strong\u003e). In 3SEQ analyses, these B-like sequences predominantly function as parental donors rather than recombinant products, consistent with their role as close descendants of the ancestral ORF1 donor lineage (\u003cstrong\u003eFig. 4b\u003c/strong\u003e). \u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTogether, these findings support a parsimonious evolutionary model in which a single recombination event occurred prior to 2013, combining a B-like ORF1 with a D-like ORF2/3 to generate the common ancestor of the epidemic C-D-E clade (\u003cstrong\u003eSupplementary Fig. 2\u003c/strong\u003e). Subsequent diversification of this recombinant lineage gave rise to variants C, D and ultimately the currently dominant variant E. The persistence and recent detection of divergent B-like sequences suggest long-term circulation of the ancestral genomic architecture potentially maintained in cryptic transmission chains or under-sampled reservoirs.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003ePost-emergence relaxation and focal adaptation shape variant E evolution\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo investigate whether adaptive evolution preceded or followed the emergence of variant E, we examined patterns of selection across the GII.17 genome using a suite of phylogenetic methods. 
Across all genes, mean \u0026omega; values were low (\u0026omega; range: 0.0011\u0026ndash;0.346) (\u003cstrong\u003eTable 2\u003c/strong\u003e), indicating pervasive purifying selection consistent with strong functional constraint. We first tested whether selection intensity differed on the stem branch leading to variant E, which represents viral evolution prior to its epidemiological emergence. If key adaptive changes enabling efficient human transmission occurred before widespread circulation, we would expect a detectable shift in selection pressure along this branch. However, RELAX analyses showed no evidence for either intensification or relaxation of selection on the stem branch (K = 0.68; \u003cem\u003ep\u003c/em\u003e = 0.243), indicating that variant E did not arise through strong pre-adaptive evolution prior to its emergence. In contrast, analyses of internal branches within the variant E clade revealed evidence for altered selection regimes after establishment in the human population. RELAX identified significant relaxation of selection in several genes, including p48, protease, the entire ORF1 and ORF3 (VP2) (\u003cstrong\u003eTable 1\u003c/strong\u003e), suggesting a genome-wide reduction in evolutionary constraints to facilitate viral adaptation during early epidemic spread. Despite this overall relaxation, a small number of codon sites showed evidence of intensified or episodic diversifying selection. Across the genome, 17 sites displayed signals of lineage-specific selection. In p48, position 10 shows a dramatically higher \u0026omega; in variant E viruses compared to other GII.17 sequences, while sites 100, 293 and 437 in the NTPase, RdRp and VP1, respectively, also show intensified selection. 
Fourteen other sites showed evidence of episodic diversifying selection.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eConsistent with these patterns, we identified 12 lineage-specific amino acid substitutions that define variant E (\u003cstrong\u003eTable 2\u003c/strong\u003e). These substitutions are highly prevalent within variant E (\u0026gt;90% frequency) and are rare or absent in variant C and other GII.17 lineages (\u0026lt;10%). Eight of the substitutions occur in structural proteins, with the majority located in VP1 and clustering within residues 361-447. The remaining four substitutions occur in non-structural proteins (NS3, NS4 and NS7), indicating coordinated changes affecting both capsid structure and replication-associated functions. These results indicate post-emergence adaptive refinement rather than pre-adaptive innovation.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eLineage-specific remodelling of mutational processes\u0026nbsp;\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo assess whether shifts in underlying mutational processes accompanied successive GII.17 variant replacements, we compared context-dependent mutational spectra across variants C, D, and E (\u003cstrong\u003eSupplementary Fig. 3\u003c/strong\u003e). Although all variants exhibited a strong transition bias, their trinucleotide-resolved mutational profiles differed significantly, indicating lineage-specific remodelling of mutational pressures. Variant C displayed a comparatively balanced spectrum with a higher proportion of G\u0026gt;A transitions (\u003cstrong\u003eSupplementary Fig. 3a\u003c/strong\u003e)\u003cstrong\u003e,\u003c/strong\u003e consistent with a more heterogeneous mutational process during its circulation. In contrast, variant E exhibited a pronounced enrichment of C\u0026gt;T substitutions (\u003cstrong\u003eSupplementary Fig. 
3c\u003c/strong\u003e), the majority of which occurred outside canonical APOBEC-associated contexts, suggesting a shift toward spontaneous deamination\u0026ndash;mediated mutagenesis rather than host-driven RNA editing. Variant D showed an intermediate profile characterised by a clear APOBEC-like cytidine deamination signature (\u003cstrong\u003eSupplementary Fig. 3b\u003c/strong\u003e). Despite overall similarity between variants D and E, these differences indicate that distinct mutational mechanisms may underlie the evolution of successive GII.17 variants.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eFitness-based lineage replacement of GII.17 variants\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo estimate the relative fitness of each GII.17 variant, we implemented Phylowave \u003csup\u003e19\u003c/sup\u003e, a recently developed fitness-inference framework for rapidly evolving pathogens. This approach automatically inferred seven distinct lineages across the GII.17 phylogeny (\u003cstrong\u003eFig. 5; Supplementary Fig. 5\u003c/strong\u003e). Agreement between these automatically inferred clades and previously defined GII.17 variants was quantified using the adjusted Rand index (ARI), which accounts for agreement expected by chance \u003csup\u003e20\u003c/sup\u003e. This analysis revealed near-perfect concordance between classifications (ARI = 0.997), with only a small number of phylogenetically intermediate sequences contributing to residual disagreement. We next estimated the relative fitness of each inferred lineage using a multinomial logistic fitness model (\u003cstrong\u003eFig. 5b\u003c/strong\u003e). This analysis revealed marked differences in lineage dynamics through time, with clear shifts in relative prevalence corresponding to successive variant replacements. 
Fitness estimates indicated that group 1, corresponding to variant E, exhibited substantially higher relative fitness than all other GII.17 lineages (\u003cstrong\u003eFig. 5c\u003c/strong\u003e), consistent with its rapid expansion and epidemiological dominance. In addition, the replacement of variant C by variant D was associated with a measurable increase in estimated fitness, suggesting a stepwise improvement in lineage success prior to the emergence of variant E. Notably, group 6, which comprises a divergent subset of variant A sequences, including ancestral viruses such as T055/Tunisia/1977 and Hu/GII.17/C142/1978/GUF, also displayed elevated fitness relative to other variants, although it remained lower than that observed for variant E.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eDivergent lineage emerging from ancestral variant A\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003ePhylogenetic analysis identified a small, deeply divergent GII.17 lineage branching from ancestral variant A (\u003cstrong\u003eFig. 6a\u003c/strong\u003e) and detected during a localized hospital outbreak in young children in North West England in 2022. Despite substantial divergence across both structural and non-structural regions (\u003cstrong\u003eFig. 6b\u003c/strong\u003e), 76% of mutations, relative to its closest ancestor T055/Tunisia/1977, are confined to the P2 subdomain (\u003cstrong\u003eFig. 6c\u003c/strong\u003e). This lineage has not been detected beyond this setting and shows no evidence of sustained transmission. These findings indicate that GII.17 is capable of repeatedly generating divergent lineages, although only a subset, such as variant E, acquire the fitness advantage required for widespread dissemination (\u003cstrong\u003eFig. 
6\u003c/strong\u003e).\u0026nbsp;\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eIn this study, we present a global, five-decade evolutionary reconstruction of GII.17 noroviruses and demonstrate that their recent resurgence reflects fitness-driven lineage replacement rather than dramatic pre-adaptive emergence or repeated recombination. By integrating phylogenetic, recombination, selection, mutational landscape and quantitative fitness-based analyses, we define a coherent evolutionary pathway that is best explained by post-emergence fine-tuning during early spread, rather than by major adaptive shifts or repeated recombination.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eA central finding is that epidemic GII.17 variants (C, D and E) share a single historical recombinant origin at the ORF1/ORF2 junction. Recombination at this boundary is well documented in noroviruses and has been implicated in the emergence of multiple epidemic genotypes, including GII.4 and GII.2\u0026nbsp;\u003csup\u003e21,22\u003c/sup\u003e. However, unlike the frequent recombination observed in GII.4, our results indicate that modern GII.17 diversity largely reflects diversification of a single recombinant lineage rather than repeated independent recombination events. This suggests that recombination provided an early permissive genomic scaffold but not the immediate trigger for epidemic expansion, which was instead shaped by lineage-specific fitness advantages. The identification of contemporary divergent variant B sequences (2023–2024) as the closest known descendants of the ancestral ORF1 donor lineage suggests that this proto-ORF1 genomic architecture was either historically under-sampled or persisted within a cryptic reservoir (e.g. animal hosts or immunocompromised individuals) that has only recently been detected.\u003c/p\u003e\n\u003cp\u003eSelection analyses further clarify the mode of emergence. 
We detect no evidence for intensified or directional selection along the stem branch leading to variant E, arguing against strong pre-adaptive evolution prior to epidemiological expansion. Instead, altered selective regimes and modest lineage-defining substitutions accumulated after establishment in the human population. This contrasts with the well-described epochal evolution of GII.4, in which repeated waves of adaptive change in VP1 drive immune escape and lineage replacement\u0026nbsp;\u003csup\u003e23,24\u003c/sup\u003e. In GII.17, by contrast, emergence appears to reflect post-establishment adaptive refinement under relaxed purifying selection rather than the sudden acquisition of a large-effect immune-escape mutation.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eBayesian molecular clock analyses reveal elevated substitution rates in variant E relative to earlier variants. Such increases likely reflect short-term rate inflation during early epidemic growth, a phenomenon observed in other rapidly expanding RNA viruses\u0026nbsp;\u003csup\u003e25\u003c/sup\u003e. Nevertheless, the combination of accelerated accumulation of substitutions and relaxed constraint likely facilitated the fixation of a limited number of lineage-defining changes that collectively conferred a measurable transmission advantage. Our quantitative fitness inferences demonstrate that variant E has a marked fitness advantage relative to all previously circulating GII.17 lineages, mirroring observed patterns of lineage replacement. This concordance between phylogenetic inference and frequency dynamics supports the interpretation that transmission fitness differences contributed substantially to its global expansion, beyond stochastic epidemiological effects alone. Notably, the second-highest relative fitness was observed in a subset of variant A sequences, including our recently described divergent UK genomes. 
Although limited sampling precludes firm conclusions regarding their competitive potential, these findings highlight substantial cryptic diversity within the GII.17 phylogeny that may be underrepresented in routine outbreak-focused surveillance.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eWe further identify lineage-specific remodelling of mutational processes across GII.17 variants. Variant D displays an APOBEC-like cytidine deamination signature, whereas variant E shows enrichment of C\u0026gt;T transitions outside canonical editing motifs, consistent with altered mutational pressures during transmission. These findings suggest that shifts in the cellular or immunological environments encountered during spread may influence not only which mutations are selected, but also which mutations are generated. Thus, viral evolution may be shaped jointly by selection and by changes in the mutational landscape itself.\u003c/p\u003e\n\u003cp\u003eThe estimated three-year period of cryptic circulation prior to detection of variant E contrasts with the rapid emergence of the Kawasaki (variant D) lineage in 2014-2015\u0026nbsp;\u003csup\u003e26\u003c/sup\u003e. 
This delayed detection of variant E likely reflects widespread suppression of norovirus transmission during COVID-19-associated non-pharmaceutical interventions, including lockdowns, social distancing, and enhanced hygiene measures\u0026nbsp;\u003csup\u003e27–29\u003c/sup\u003e, followed by a rebound in population susceptibility after relaxation\u0026nbsp;\u003csup\u003e30\u003c/sup\u003e. Such a transient transmission bottleneck may have inadvertently provided an opportunity for variant E to rise in relative frequency before encountering competition from established GII.4 lineages. Earlier estimates by Epifanova et al.\u0026nbsp;\u003csup\u003e31\u003c/sup\u003e placing the origin in 2017 were based on limited sampling (\u0026lt;20 sequences) and likely lacked sufficient temporal resolution. Our expanded dataset and GMRF Bayesian Skyline model provided improved flexibility in capturing demographic change over time\u0026nbsp;\u003csup\u003e32\u003c/sup\u003e. These evolutionary patterns are consistent with prior structural and antigenic investigations of GII.17. Earlier studies of GII.17 have largely focused on regional outbreaks\u0026nbsp;\u003csup\u003e33–35\u003c/sup\u003e or short time windows surrounding the emergence of the Kawasaki 2014 strain\u0026nbsp;\u003csup\u003e9\u003c/sup\u003e (variant D), which temporarily displaced GII.4 lineages. These studies highlighted antigenic changes in VP1 and alterations in HBGA binding as potential drivers of epidemic success, with variants C and D exhibiting enhanced HBGA-binding affinity\u0026nbsp;\u003csup\u003e36–38\u003c/sup\u003e. Variant E retains conserved HBGA binding sites relative to variants C and D, suggesting preservation of strong receptor engagement capacity. 
Notably, recent work has identified adaptive mutations in VP1, including the K361R substitution, that enhance HBGA binding and may contribute to increased fitness\u0026nbsp;\u003csup\u003e18\u003c/sup\u003e. As this substitution lies outside canonical HBGA binding interfaces, its mechanistic effect remains unclear but may reflect indirect structural adaptations influencing receptor engagement\u0026nbsp;\u003csup\u003e39\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eAlthough this study provides a comprehensive analysis of GII.17 evolution, several limitations warrant consideration. First, despite substantial expansion over previous studies, our dataset remains subject to geographic and temporal sampling biases inherent to genomic surveillance, with disproportionate representation from high-income regions. Undersampling in parts of Africa, South America and Asia may obscure additional cryptic diversity and transmission patterns. Second, fitness inference assumes that lineage frequency dynamics primarily reflect transmission fitness. While the concordance between inferred fitness and observed replacement patterns supports this interpretation, stochastic epidemiological processes and local transmission heterogeneity could influence estimates, particularly for poorly sampled lineages. Finally, selection analyses identify statistical signals of adaptation but do not directly measure functional effects. Although consistent with published phenotypic data, comprehensive experimental validation of putatively adaptive mutations remains an important direction for future work.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTaken together, our findings suggest that viral emergence need not be driven by dramatic pre-adaptive shifts or repeated recombination but can arise through incremental fitness gains acquired during early transmission under altered selective constraints. 
The evolutionary history of GII.17 is shaped by historical recombination, lineage-specific shifts in inferred mutational processes, and repeated exploration of divergent evolutionary trajectories. This integrative framework may extend beyond noroviruses to other RNA viruses exhibiting episodic genotype replacement and highlights the value of combining phylodynamics, mutational analysis and quantitative fitness inference to understand pathogen emergence.\u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003e\u003cstrong\u003e\u003cem\u003eGII.17 genome sequences\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003ePublicly available GII.17 genomic sequences were downloaded from NCBI GenBank (retrieved on August 26, 2025) using the search terms \u0026ldquo;norovirus\u0026rdquo; and \u0026ldquo;GII.17\u0026rdquo; with a length criterion of greater than 1,000 bp. A total of 1,422 sequences were obtained, and genotyping was performed using the Norovirus Typing Tool v2.0 \u003csup\u003e40\u003c/sup\u003e to confirm the genotype of each norovirus sequence. Sequences were split by ORF and only those covering at least 66% of the ORF were included. Sample metadata (collection date and country) were extracted for each sequence from the source feature of the NCBI GenBank flat file, with records fetched and parsed in batches from NCBI\u0026rsquo;s nucleotide database. After filtering, 429 ORF1, 1,206 ORF2 and 474 ORF3 sequences were retained. Accession numbers are provided in \u003cstrong\u003eSupplementary Table 1\u003c/strong\u003e. All sequences were aligned using MAFFT (version 7.505) \u003csup\u003e41\u003c/sup\u003e and manually inspected for errors. 
Similarity plots were generated using SimPlot++ version 3.5.1 with a window size and step size of 200 and 20, respectively \u003csup\u003e42\u003c/sup\u003e.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eNewly generated genomes from the United Kingdom were obtained from samples using SureSelectXT target enrichment followed by Illumina sequencing as previously described \u003csup\u003e5\u003c/sup\u003e. Written informed consent was obtained from all subjects involved in the study. Newly generated genomes have been deposited into GenBank under the following accession numbers: PV920367-PV920384.\u0026nbsp;To mitigate the over-representation of variant D in ORF2 (938 sequences) and enable Bayesian inference, we used the Nextstrain pipeline to downsample these to 460 sequences (a maximum of three per country\u0026ndash;month\u0026ndash;year). The resulting downsampled ORF2 set (728 sequences in total) was used only for BEAST; all other analyses used the full dataset.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eEvolutionary analyses\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eMaximum likelihood phylogenetic trees rooted to non-GII.17 genotypes (GII.13) were inferred using IQ-TREE v. 2.1.2 \u003csup\u003e43\u003c/sup\u003e with the best-fit nucleotide substitution model as determined by ModelFinder \u003csup\u003e44\u003c/sup\u003e. A root-to-tip regression approach in TempEst version 1.5.3 \u003csup\u003e45\u003c/sup\u003e was used to evaluate the temporal signal in the phylogeny, with outliers removed using the interquartile range method. 
Bayesian analyses were carried out using BEAST v1.10.4 \u003csup\u003e46\u003c/sup\u003e with the SRD06 nucleotide substitution model \u003csup\u003e47\u003c/sup\u003e, implemented as a Hasegawa-Kishino-Yano (HKY) substitution model with four-category gamma-distributed site-specific rate variation on each of two partitions (codon positions 1 plus 2 versus position 3), together with an uncorrelated lognormal relaxed molecular clock \u003csup\u003e48\u003c/sup\u003e and a coalescent GMRF Bayesian skyride tree prior \u003csup\u003e49\u003c/sup\u003e. Model parameter estimates were evaluated using Tracer v1.7.2 \u003csup\u003e50\u003c/sup\u003e to ensure an effective sample size (ESS) value \u0026ge; 200, indicating sufficient mixing and convergence. For each genomic region, at least three independent Markov Chain Monte Carlo (MCMC) runs were performed and combined using the LogCombiner tool in the BEAST package. Each chain consisted of 500,000,000 steps and was sampled every 50,000 steps, and the first 10% of samples were discarded as burn-in. In all cases, the sampling date (day/month/year) associated with each sequence was used.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eRecombination analyses\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe used two different approaches to identify recombination. First, we used 3SEQ \u003csup\u003e51\u003c/sup\u003e as a statistical test for recombination on our curated dataset, as it has been found to be one of the most statistically powerful methods for identifying mosaic regions \u003csup\u003e52\u003c/sup\u003e. 3SEQ detects recombination by identifying mosaic sequence patterns that can be explained as a combination of two parental sequences, using an exact, non-parametric framework based on sequence triplets \u003csup\u003e52\u003c/sup\u003e. 
For each potential recombinant, 3SEQ evaluates whether the observed pattern of sitewise similarity to two candidate parent sequences deviates significantly from expectations under a strictly clonal evolutionary model. Statistical significance is assessed using an exact test that does not rely on phylogenetic tree reconstruction or breakpoint pre-specification, making the approach robust to alignment noise and substitution-rate heterogeneity. We tested all pairs of sequences from our curated datasets and report p values that are corrected with a Dunn-Šidák correction for the large number of triplets tested. Second, we used the genetic algorithm for recombination detection (GARD) to independently identify recombination breakpoints on genomes \u003csup\u003e53\u003c/sup\u003e. GARD was performed for each dataset using HyPhy v2.5.63 \u003csup\u003e54\u003c/sup\u003e.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eSelection analyses\u0026nbsp;\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSelection analyses employed a suite of phylogenetic methods as implemented in HyPhy. Internal branches were annotated with the variant E or other tags using the LabelTree.bf script in HyPhy v2.5.63 \u003csup\u003e54\u003c/sup\u003e. Internal nodes were labelled only if all their descendants had the same label; otherwise they remained unlabelled. The MG94xREV model was used to estimate the mean omega for specific variant clades. We used four methods in HyPhy v2.5.63 to screen for evidence of natural selection on the focal lineage. The Branch-Site Unrestricted Statistical Test for Episodic Diversification (BUSTED[S]) method \u003csup\u003e55\u003c/sup\u003e was applied to internal branches labelled variant E to seek gene-wide evidence of episodic diversifying selection in the variant E clade. The Fixed Effects Likelihood (FEL) method \u003csup\u003e56\u003c/sup\u003e was applied to the same branches to identify individual codon sites evolving non-neutrally. 
The Mixed Effects Model of Evolution (MEME) method \u003csup\u003e57\u003c/sup\u003e was used to detect sites undergoing episodic diversifying selection on internal branches labelled variant E. The Contrast-FEL method \u003csup\u003e58\u003c/sup\u003e was applied to identify sites evolving under different selective pressures between focal (variant E) and reference clades. Finally, we compared the intensity of selective forces acting on the variant E clade using the RELAX method \u003csup\u003e59\u003c/sup\u003e. This method infers whether selection strength (\u0026omega; ratios) differs between the focal and reference clades by estimating a scaling parameter (K). A value of K \u0026lt; 1 indicates relaxation of selection (weaker purifying or positive selection), while K \u0026gt; 1 suggests intensified selection relative to the reference branches. A likelihood ratio test (LRT) is used to compare a null model with equal selection intensity (K = 1) across all branches to an alternative model allowing K \u0026ne; 1 on the test branches, with statistical significance assessed using a chi-square distribution. In all cases, we accounted for spurious polymorphic variants by including only internal branches, which are more likely to represent changes that have become fixed in the viral population than transient intra-host variants or sequencing artifacts. Branches that were not part of the test or background sets were treated as a nuisance set.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eEstimation of relative fitness for each lineage\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe relative fitness dynamics of GII.17 variants were estimated using the model developed by Lefrancq and colleagues \u003csup\u003e19\u003c/sup\u003e. The VP1 maximum likelihood tree reconstructed above was time-calibrated and rooted on the branch that minimized the squared deviation of the root-to-tip regression using the augur pipeline. 
The timescaled tree was used as input for automatic lineage detection and relative fitness estimation models with genome_length = 1634, mutation_rate = 2.1 \u0026times; 10\u003csup\u003e-3\u003c/sup\u003e (as calculated from augur), timescale = 1 and wind = 180 days. Automatic lineage detection partitioned the tree into seven defined lineages. The multinomial logistic fitness model was fitted with min_year = 2000 and window 180/365 to quantify the fitness of each lineage. We quantified the agreement between previously defined GII.17 variants and automatically inferred clades detected with phylowave using the adjusted Rand index (ARI), which accounts for agreement expected by chance \u003csup\u003e20\u003c/sup\u003e. An ARI of 1 indicates perfect concordance, whereas values close to 0 indicate random assignment.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eReconstruction of variant mutational spectrum\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eMutational spectra were reconstructed using MutTui v2.0.2 \u003csup\u003e60\u003c/sup\u003e. This method performs ancestral reconstruction onto the phylogenetic tree using treetime v0.8.1 \u003csup\u003e61\u003c/sup\u003e, which enables identification of the direction of each mutation. The mutational spectrum is then calculated by counting the number of each contextual mutation across the clade. Single nucleotide mutations are classified under the single base substitution (SBS) spectrum. Mutations occurring at two adjacent genome positions on the same phylogenetic branch are categorized as double base substitutions (DBS), while clusters of three or more adjacent mutations are excluded from analysis. The mutational spectra were rescaled to account for genome composition. 
The mutational spectrum for variant C contained fewer than 300 mutations. Because between 300 and 600 mutations have been suggested as necessary for a mutational spectrum to be accurately estimated \u003csup\u003e60\u003c/sup\u003e, we did not attempt to examine the detailed contextual patterns in these mutations.\u0026nbsp;\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eConflict of Interest:\u0026nbsp;\u003c/strong\u003eThe authors declare there are no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Availability:\u0026nbsp;\u003c/strong\u003eAccession numbers are provided in Supplementary Table 1. Newly sequenced genomes have been deposited in GenBank under the following accession numbers: PV920367-PV920384.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgments:\u0026nbsp;\u003c/strong\u003eThe authors wish to acknowledge UCL genomes at University College London for sequencing of the NoroPatrol samples (RRID:SCR_027010).\u0026nbsp;\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003ePires, S. M. \u003cem\u003eet al.\u003c/em\u003e Aetiology-Specific Estimates of the Global and Regional Incidence and Mortality of Diarrhoeal Diseases Commonly Transmitted through Food. \u003cem\u003ePLoS One\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, e0142927 (2015).\u003c/li\u003e\n\u003cli\u003eBartsch, S. M., Lopman, B. A., Ozawa, S., Hall, A. J. \u0026amp; Lee, B. Y. Global Economic Burden of Norovirus Gastroenteritis. \u003cem\u003ePLoS One\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, e0151219 (2016).\u003c/li\u003e\n\u003cli\u003eChhabra, P. \u003cem\u003eet al.\u003c/em\u003e Updated classification of norovirus genogroups and genotypes. \u003cem\u003eJournal of General Virology\u003c/em\u003e \u003cstrong\u003e100\u003c/strong\u003e, 1393\u0026ndash;1406 (2019).\u003c/li\u003e\n\u003cli\u003eParra, G. I. Emergence of norovirus strains: A tale of two genes. 
\u003cem\u003eVirus Evol.\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, (2019).\u003c/li\u003e\n\u003cli\u003eLindesmith, L. C. \u003cem\u003eet al.\u003c/em\u003e Immune Imprinting Drives Human Norovirus Potential for Global Spread. \u003cem\u003emBio\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, (2022).\u003c/li\u003e\n\u003cli\u003eTohma, K., Lepore, C. J., Gao, Y., Ford-Siltz, L. A. \u0026amp; Parra, G. I. Population genomics of GII.4 noroviruses reveal complex diversification and new antigenic sites involved in the emergence of pandemic strains. \u003cem\u003emBio\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, (2019).\u003c/li\u003e\n\u003cli\u003eMatsushima, Y. \u003cem\u003eet al.\u003c/em\u003e Genetic analyses of GII.17 norovirus strains in diarrheal disease outbreaks from December 2014 to March 2015 in Japan reveal a novel polymerase sequence and amino acid substitutions in the capsid region. \u003cem\u003eEurosurveillance\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 1\u0026ndash;6 (2015).\u003c/li\u003e\n\u003cli\u003eLu, J. \u003cem\u003eet al.\u003c/em\u003e Gastroenteritis Outbreaks Caused by Norovirus GII.17, Guangdong Province, China, 2014\u0026ndash;2015. \u003cem\u003eEmerg. Infect. Dis.\u003c/em\u003e \u003cstrong\u003e21\u003c/strong\u003e, 1240\u0026ndash;1242 (2015).\u003c/li\u003e\n\u003cli\u003eChan, M. C. W. \u003cem\u003eet al.\u003c/em\u003e Rapid emergence and predominance of a broadly recognizing and fast-evolving norovirus GII.17 variant in late 2014. \u003cem\u003eNature Communications\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, 1\u0026ndash;9 (2015).\u003c/li\u003e\n\u003cli\u003eRackoff, L. A., Bok, K., Green, K. Y. \u0026amp; Kapikian, A. Z. 
Epidemiology and Evolution of Rotaviruses and Noroviruses from an Archival WHO Global Study in Children (1976\u0026ndash;79) with Implications for Vaccine Design. \u003cem\u003ePLoS One\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, e59394 (2013).\u003c/li\u003e\n\u003cli\u003eChhabra, P. \u003cem\u003eet al.\u003c/em\u003e Increased circulation of GII.17 noroviruses, six European countries and the United States, 2023 to 2024. \u003cem\u003eEuro Surveill.\u003c/em\u003e \u003cstrong\u003e29\u003c/strong\u003e, 2400625 (2024).\u003c/li\u003e\n\u003cli\u003eNational norovirus and rotavirus report, week 19 report: data to week 17 (data up to 27 April 2025) - GOV.UK. https://www.gov.uk/government/statistics/national-norovirus-and-rotavirus-surveillance-reports-2024-to-2025-season/national-norovirus-and-rotavirus-report-week-19-report-data-to-week-17-data-up-to-27-april-2025.\u003c/li\u003e\n\u003cli\u003eBarclay, L. \u0026amp; Vinj\u0026eacute;, J. Increasing Predominance of Norovirus GII.17 over GII.4, United States, 2022\u0026ndash;2025. \u003cem\u003eEmerg. Infect. Dis.\u003c/em\u003e \u003cstrong\u003e31\u003c/strong\u003e, (2025) doi:10.3201/EID3107.250524.\u003c/li\u003e\n\u003cli\u003eParra, G. I. \u003cem\u003eet al.\u003c/em\u003e Static and Evolving Norovirus Genotypes: Implications for Epidemiology and Immunity. \u003cem\u003ePLoS Pathog.\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, e1006136 (2017).\u003c/li\u003e\n\u003cli\u003eDinu, S., Oprea, M., Iordache, R. I., Rusu, L. C. \u0026amp; Usein, C. R. Genome characterisation of norovirus GII.P17-GII.17 detected during a large gastroenteritis outbreak in Romania in 2021. \u003cem\u003eArch. Virol.\u003c/em\u003e \u003cstrong\u003e168\u003c/strong\u003e, (2023).\u003c/li\u003e\n\u003cli\u003eGomes, K. A. 
\u003cem\u003eet al.\u003c/em\u003e Multi-Province Outbreak of Acute Gastroenteritis Linked to Potential Novel Lineage of GII.17 Norovirus in Argentina in 2024. \u003cem\u003eViruses\u003c/em\u003e \u003cstrong\u003e17\u003c/strong\u003e, (2025).\u003c/li\u003e\n\u003cli\u003eYang, J., Qi, Z., Chen, S. \u0026amp; Xiong, C. Epidemiological and molecular investigation of a norovirus GII.17 outbreak in a kindergarten in Shanghai, China. \u003cem\u003eDiagn. Microbiol. Infect. Dis.\u003c/em\u003e \u003cstrong\u003e114\u003c/strong\u003e, 117089 (2026).\u003c/li\u003e\n\u003cli\u003eTohma, K. \u003cem\u003eet al.\u003c/em\u003e GII.17 norovirus re-emerged in the 2020s as a result of dynamic and adaptive evolutionary processes. \u003cem\u003eNature Communications\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 11596 (2025).\u003c/li\u003e\n\u003cli\u003eLefrancq, N. \u003cem\u003eet al.\u003c/em\u003e Learning the fitness dynamics of pathogens from phylogenies. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e637\u003c/strong\u003e, 683\u0026ndash;690 (2025).\u003c/li\u003e\n\u003cli\u003eHubert, L. \u0026amp; Arabie, P. Comparing partitions. \u003cem\u003eJ. Classif.\u003c/em\u003e \u003cstrong\u003e2\u003c/strong\u003e, 193\u0026ndash;218 (1985).\u003c/li\u003e\n\u003cli\u003eTohma, K., Lepore, C. J., Ford-Siltz, L. A. \u0026amp; Parra, G. I. Phylogenetic Analyses Suggest that Factors Other Than the Capsid Protein Play a Role in the Epidemic Potential of GII.2 Norovirus. \u003cem\u003emSphere\u003c/em\u003e \u003cstrong\u003e2\u003c/strong\u003e, (2017).\u003c/li\u003e\n\u003cli\u003eEden, J.-S., Tanaka, M. M., Boni, M. F., Rawlinson, W. D. \u0026amp; White, P. A. Recombination within the Pandemic Norovirus GII.4 Lineage. \u003cem\u003eJ. Virol.\u003c/em\u003e \u003cstrong\u003e87\u003c/strong\u003e, 6270 (2013).\u003c/li\u003e\n\u003cli\u003eSiebenga, J. J. 
\u003cem\u003eet al.\u003c/em\u003e Epochal Evolution of GGII.4 Norovirus Capsid Proteins from 1995 to 2006. \u003cem\u003eJ. Virol.\u003c/em\u003e \u003cstrong\u003e81\u003c/strong\u003e, 9932\u0026ndash;9941 (2007).\u003c/li\u003e\n\u003cli\u003eLindesmith, L. C. \u003cem\u003eet al.\u003c/em\u003e Mechanisms of GII.4 Norovirus Persistence in Human Populations. \u003cem\u003ePLoS Med.\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, e31 (2008).\u003c/li\u003e\n\u003cli\u003eDuch\u0026ecirc;ne, S., Holmes, E. C. \u0026amp; Ho, S. Y. W. Analyses of evolutionary dynamics in viruses are hindered by a time-dependent bias in rate estimates. \u003cem\u003eProceedings of the Royal Society B: Biological Sciences\u003c/em\u003e \u003cstrong\u003e281\u003c/strong\u003e, (2014).\u003c/li\u003e\n\u003cli\u003eLu, J. \u003cem\u003eet al.\u003c/em\u003e The Evolution and Transmission of Epidemic GII.17 Noroviruses. \u003cem\u003eJ. Infect. Dis.\u003c/em\u003e \u003cstrong\u003e214\u003c/strong\u003e, 556 (2016).\u003c/li\u003e\n\u003cli\u003eBruggink, L. D., Garcia-Clapes, A., Tran, T., Druce, J. D. \u0026amp; Thorley, B. R. Decreased incidence of enterovirus and norovirus infections during the COVID-19 pandemic, Victoria, Australia, 2020. \u003cem\u003eCommun. Dis. Intell.\u003c/em\u003e \u003cstrong\u003e45\u003c/strong\u003e, (2021).\u003c/li\u003e\n\u003cli\u003eLennon, R. P. \u003cem\u003eet al.\u003c/em\u003e Norovirus Infections Drop 49% in the United States with Strict COVID-19 Public Health Interventions. \u003cem\u003eActa Med. Acad.\u003c/em\u003e \u003cstrong\u003e49\u003c/strong\u003e, 278\u0026ndash;280 (2020).\u003c/li\u003e\n\u003cli\u003eDouglas, A. \u003cem\u003eet al.\u003c/em\u003e Impact of COVID-19 on national surveillance of norovirus in England and potential risk of increased disease activity in 2021. 
\u003cem\u003eJournal of Hospital Infection\u003c/em\u003e \u003cstrong\u003e112\u003c/strong\u003e, 124\u0026ndash;126 (2021).\u003c/li\u003e\n\u003cli\u003eO\u0026rsquo;Reilly, K. M. \u003cem\u003eet al.\u003c/em\u003e Predicted norovirus resurgence in 2021\u0026ndash;2022 due to the relaxation of nonpharmaceutical interventions associated with COVID-19 restrictions in England: a mathematical modeling study. \u003cem\u003eBMC Med.\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, 1\u0026ndash;10 (2021).\u003c/li\u003e\n\u003cli\u003eEpifanova, N. V. \u003cem\u003eet al.\u003c/em\u003e Appearance and spread of norovirus genotype GII.17 subcluster C2 (Romania-2021 like) in Nizhny Novgorod, Russia, 2021\u0026ndash;2023. \u003cem\u003eArch. Virol.\u003c/em\u003e \u003cstrong\u003e170\u003c/strong\u003e, 1\u0026ndash;9 (2025).\u003c/li\u003e\n\u003cli\u003eMinin, V. N., Bloomquist, E. W. \u0026amp; Suchard, M. A. Smooth Skyride through a Rough Skyline: Bayesian Coalescent-Based Inference of Population Dynamics. \u003cem\u003eMol. Biol. Evol.\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e, 1459\u0026ndash;1471 (2008).\u003c/li\u003e\n\u003cli\u003eGomes, K. A. \u003cem\u003eet al.\u003c/em\u003e Multi-Province Outbreak of Acute Gastroenteritis Linked to Potential Novel Lineage of GII.17 Norovirus in Argentina in 2024. \u003cem\u003eViruses\u003c/em\u003e \u003cstrong\u003e17\u003c/strong\u003e, (2025).\u003c/li\u003e\n\u003cli\u003eChen, C. \u003cem\u003eet al.\u003c/em\u003e Molecular evolution of GII.P17-GII.17 norovirus associated with sporadic acute gastroenteritis cases during 2013\u0026ndash;2018 in Zhoushan Islands, China. \u003cem\u003eVirus Genes\u003c/em\u003e \u003cstrong\u003e56\u003c/strong\u003e, 279\u0026ndash;287 (2020).\u003c/li\u003e\n\u003cli\u003eDas Neves Costa, L. C. P. \u003cem\u003eet al.\u003c/em\u003e Molecular and evolutionary characterization of norovirus GII.17 in the northern region of Brazil. 
\u003cem\u003eBMC Infectious Diseases\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, 1021 (2019).\u003c/li\u003e\n\u003cli\u003eJin, M. \u003cem\u003eet al.\u003c/em\u003e Characterization of the new GII.17 norovirus variant that emerged recently as the predominant strain in China. \u003cem\u003eJournal of General Virology\u003c/em\u003e \u003cstrong\u003e97\u003c/strong\u003e, 2620\u0026ndash;2632 (2016).\u003c/li\u003e\n\u003cli\u003eZhang, X. F. \u003cem\u003eet al.\u003c/em\u003e An outbreak caused by GII.17 norovirus with a wide spectrum of HBGA-associated susceptibility. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 1\u0026ndash;10 (2015).\u003c/li\u003e\n\u003cli\u003eEstienney, M. \u003cem\u003eet al.\u003c/em\u003e Epidemiological Impact of GII.17 Human Noroviruses Associated With Attachment to Enterocytes. \u003cem\u003eFront. Microbiol.\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 858245 (2022).\u003c/li\u003e\n\u003cli\u003eQian, Y. \u003cem\u003eet al.\u003c/em\u003e Structural Adaptations of Norovirus GII.17/13/21 Lineage through Two Distinct Evolutionary Paths. \u003cem\u003eJ. Virol.\u003c/em\u003e \u003cstrong\u003e93\u003c/strong\u003e, e01655-18 (2018).\u003c/li\u003e\n\u003cli\u003eKroneman, A. \u003cem\u003eet al.\u003c/em\u003e An automated genotyping tool for enteroviruses and noroviruses. \u003cem\u003eJournal of Clinical Virology\u003c/em\u003e \u003cstrong\u003e51\u003c/strong\u003e, 121\u0026ndash;125 (2011).\u003c/li\u003e\n\u003cli\u003eKatoh, K. \u0026amp; Standley, D. M. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. \u003cem\u003eMol. Biol. Evol.\u003c/em\u003e \u003cstrong\u003e30\u003c/strong\u003e, 772\u0026ndash;780 (2013).\u003c/li\u003e\n\u003cli\u003eSamson, S., Lord, \u0026Eacute;. \u0026amp; Makarenkov, V. SimPlot++: a Python application for representing sequence similarity and detecting recombination. 
\u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e38\u003c/strong\u003e, 3118\u0026ndash;3120 (2022).\u003c/li\u003e\n\u003cli\u003eMinh, B. Q. \u003cem\u003eet al.\u003c/em\u003e IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. \u003cem\u003eMol. Biol. Evol.\u003c/em\u003e \u003cstrong\u003e37\u003c/strong\u003e, 1530\u0026ndash;1534 (2020).\u003c/li\u003e\n\u003cli\u003eKalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. \u0026amp; Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. \u003cem\u003eNat. Methods\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 587\u0026ndash;589 (2017).\u003c/li\u003e\n\u003cli\u003eRambaut, A., Lam, T. T., Carvalho, L. M. \u0026amp; Pybus, O. G. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). \u003cem\u003eVirus Evol.\u003c/em\u003e \u003cstrong\u003e2\u003c/strong\u003e, (2016).\u003c/li\u003e\n\u003cli\u003eSuchard, M. A. \u003cem\u003eet al.\u003c/em\u003e Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. \u003cem\u003eVirus Evol.\u003c/em\u003e \u003cstrong\u003e4\u003c/strong\u003e, (2018).\u003c/li\u003e\n\u003cli\u003eShapiro, B., Rambaut, A. \u0026amp; Drummond, A. J. Choosing Appropriate Substitution Models for the Phylogenetic Analysis of Protein-Coding Sequences. \u003cem\u003eMol. Biol. Evol.\u003c/em\u003e \u003cstrong\u003e23\u003c/strong\u003e, 7\u0026ndash;9 (2006).\u003c/li\u003e\n\u003cli\u003eDrummond, A. J., Ho, S. Y. W., Phillips, M. J. \u0026amp; Rambaut, A. Relaxed Phylogenetics and Dating with Confidence. \u003cem\u003ePLoS Biol.\u003c/em\u003e \u003cstrong\u003e4\u003c/strong\u003e, e88 (2006).\u003c/li\u003e\n\u003cli\u003eMinin, V. N., Bloomquist, E. W. \u0026amp; Suchard, M. A. Smooth Skyride through a Rough Skyline: Bayesian Coalescent-Based Inference of Population Dynamics. \u003cem\u003eMol. Biol. 
Evol.\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e, 1459\u0026ndash;1471 (2008).\u003c/li\u003e\n\u003cli\u003eRambaut, A., Drummond, A. J., Xie, D., Baele, G. \u0026amp; Suchard, M. A. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. \u003cem\u003eSyst. Biol.\u003c/em\u003e \u003cstrong\u003e67\u003c/strong\u003e, 901\u0026ndash;904 (2018).\u003c/li\u003e\n\u003cli\u003eLam, H. M., Ratmann, O. \u0026amp; Boni, M. F. Improved Algorithmic Complexity for the 3SEQ Recombination Detection Algorithm. \u003cem\u003eMol. Biol. Evol.\u003c/em\u003e \u003cstrong\u003e35\u003c/strong\u003e, 247\u0026ndash;251 (2018).\u003c/li\u003e\n\u003cli\u003eBoni, M. F., Posada, D. \u0026amp; Feldman, M. W. An Exact Nonparametric Method for Inferring Mosaic Structure in Sequence Triplets. \u003cem\u003eGenetics\u003c/em\u003e \u003cstrong\u003e176\u003c/strong\u003e, 1035\u0026ndash;1047 (2007).\u003c/li\u003e\n\u003cli\u003ePond, S. L. K., Posada, D., Gravenor, M. B., Woelk, C. H. \u0026amp; Frost, S. D. W. Automated Phylogenetic Detection of Recombination Using a Genetic Algorithm. \u003cem\u003eMol. Biol. Evol.\u003c/em\u003e \u003cstrong\u003e23\u003c/strong\u003e, 1891\u0026ndash;1901 (2006).\u003c/li\u003e\n\u003cli\u003eKosakovsky Pond, S. L. \u003cem\u003eet al.\u003c/em\u003e HyPhy 2.5\u0026mdash;A Customizable Platform for Evolutionary Hypothesis Testing Using Phylogenies. \u003cem\u003eMol. Biol. Evol.\u003c/em\u003e \u003cstrong\u003e37\u003c/strong\u003e, 295\u0026ndash;299 (2020).\u003c/li\u003e\n\u003cli\u003eMurrell, B. \u003cem\u003eet al.\u003c/em\u003e Gene-Wide Identification of Episodic Selection. \u003cem\u003eMol. Biol. Evol.\u003c/em\u003e \u003cstrong\u003e32\u003c/strong\u003e, 1365 (2015).\u003c/li\u003e\n\u003cli\u003eKosakovsky Pond, S. L. \u0026amp; Frost, S. D. W. Not So Different After All: A Comparison of Methods for Detecting Amino Acid Sites Under Selection. \u003cem\u003eMol. Biol. 
Evol.\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 1208\u0026ndash;1222 (2005).\u003c/li\u003e\n\u003cli\u003eMurrell, B. \u003cem\u003eet al.\u003c/em\u003e Detecting Individual Sites Subject to Episodic Diversifying Selection. \u003cem\u003ePLoS Genet.\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, e1002764 (2012).\u003c/li\u003e\n\u003cli\u003eKosakovsky Pond, S. L., Wisotsky, S. R., Escalante, A., Magalis, B. R. \u0026amp; Weaver, S. Contrast-FEL\u0026mdash;A Test for Differences in Selective Pressures at Individual Sites among Clades and Sets of Branches. \u003cem\u003eMol. Biol. Evol.\u003c/em\u003e \u003cstrong\u003e38\u003c/strong\u003e, 1184\u0026ndash;1198 (2021).\u003c/li\u003e\n\u003cli\u003eWertheim, J. O., Murrell, B., Smith, M. D., Pond, S. L. K. \u0026amp; Scheffler, K. RELAX: Detecting Relaxed Selection in a Phylogenetic Framework. \u003cem\u003eMol. Biol. Evol.\u003c/em\u003e \u003cstrong\u003e32\u003c/strong\u003e, 820\u0026ndash;832 (2015).\u003c/li\u003e\n\u003cli\u003eRuis, C., Tonkin-Hill, G., Floto, R. A. \u0026amp; Parkhill, J. Calculating and applying pathogen mutational spectra using MutTui. \u003cem\u003ebioRxiv\u003c/em\u003e 2023.06.15.545111 (2023) doi:10.1101/2023.06.15.545111.\u003c/li\u003e\n\u003cli\u003eSagulenko, P., Puller, V. \u0026amp; Neher, R. A. TreeTime: Maximum-likelihood phylodynamic analysis. \u003cem\u003eVirus Evol.\u003c/em\u003e \u003cstrong\u003e4\u003c/strong\u003e, (2018).\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Tables","content":"\u003cp\u003e\u003cstrong\u003eTable 1. Overview of selection among GII.17 variant E internal branches.\u0026nbsp;\u003c/strong\u003eS: the number of codon sites in the alignment.\u0026nbsp;⍵: mean estimate on variant E clade internal branches (MG94XREV model). Sites under positive selection have been inferred using MEME, negative selection - FEL, intensified:⍵\u0026nbsp;variant E \u0026gt;⍵\u0026nbsp;other - Contrast-FEL. 
Relax reports the p-value and the intensification/relaxation parameter for overall selective pressure on the variant E branches relative to the reference clade branches. N.S: not significant.\u0026nbsp;\u003c/p\u003e\n\u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"566\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003e\u003cstrong\u003eORF\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003e\u003cstrong\u003eGene\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003e\u003cstrong\u003eS\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003e\u003cstrong\u003e⍵\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"3\"\u003e\n \u003cp\u003e\u003cstrong\u003eSites in variant E evolving under\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e(P\u0026lt; 0.05)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003e\u003cstrong\u003eRELAX, K (p)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e\u003cstrong\u003epositive\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e\u003cstrong\u003enegative\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e\u003cstrong\u003eintensified\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"6\"\u003e\n \u003cp\u003eORF1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003ep48\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e350\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0.175\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
valign=\"bottom\"\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.68 (0.020)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003eNTPase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e366\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0.093\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0.42 (N.S)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003ep22\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e179\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0.169\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0.34 (N.S)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003eVPg\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e133\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0.032\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n 
\u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0.79 (N.S)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003ePro\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e181\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0.011\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.36 (0.020)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003eRdRp\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e510\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0.084\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e36\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0.52 (N.S)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"2\" valign=\"bottom\"\u003e\n \u003cp\u003eORF1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e1715\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0.1006\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n 
\u003cp\u003e11\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e91\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0.53 \u003cstrong\u003e(0.011)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003eORF2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003eVP1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e544\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0.079\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e44\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0.63 (0.088)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003eORF3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003eVP2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e266\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0.346\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.35 (0.0004)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cstrong\u003eTable 2. 
Genome wide lineage specific mutations found within GII.17 variant E.\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"639\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eProtein\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003ePosition\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eVariant C amino acid\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eVariant E amino acid\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eFrequency in variant E\u003csup\u003e1\u003c/sup\u003e\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eFrequency in variant C\u003csup\u003e1\u003c/sup\u003e\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eMaximum frequency in other lineages\u003csup\u003e2\u003c/sup\u003e\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003ep22\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e120\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eT\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e99.3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003ep22\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e131\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eI\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eV\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd\u003e\n \u003cp\u003e99.3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eVPg\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e76\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e98.5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e4.3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e4.3\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003epolymerase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e102\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eQ\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e98.5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eVP1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e144\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eL\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eI\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e99.3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eVP1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e361\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eQ\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n 
\u003cp\u003eR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e98.7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eVP1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e372\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eK\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e97.4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e2.3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e2.3\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eVP1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e384\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eK\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e100\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eVP1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e409\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eL\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eV\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e99.3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e4.5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e4.5\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eVP1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e435\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eF\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e99.3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.9\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eVP1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e447\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eV\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eI\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e98\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.5\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eVP2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e144\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eG\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e99.3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003csup\u003e1\u0026nbsp;\u003c/sup\u003eFrequency is denoted as percentage of variant sequences containing a particular amino acid\u003c/p\u003e\n\u003cp\u003e\u003csup\u003e2\u003c/sup\u003e Maximum frequency is the percentage of sequences other than variant C that contain that amino 
acid\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"nature-portfolio","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Nature Portfolio","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"ejp","reportingPortfolio":"","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"Norovirus, viral evolution, viral emergence, phylodynamics, fitness","lastPublishedDoi":"10.21203/rs.3.rs-8923591/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8923591/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"Norovirus genotype GII.4 has dominated global gastroenteritis outbreaks for decades, limiting sustained emergence of alternative genotypes. Recent surveillance, however, shows rapid expansion of GII.17, which has overtaken GII.4 in multiple regions. Here, we reconstruct five decades of GII.17 evolution using a global genomic dataset spanning 1976–2025. We identify a single ancestral recombination event underlying the epidemic C, D and E clades, providing the genomic foundation for subsequent epidemic diversification. Contrary to expectations of pre-adaptive change, we find no evidence of intensified stem-lineage selection preceding emergence. 
Instead, variant E accumulated lineage-defining substitutions under relaxed selective constraints during early transmission, accompanied by shifts in mutational processes and a substantial transmission fitness advantage over prior variants. Together, these findings demonstrate that incremental post-emergence fitness gains, rather than major antigenic shifts or recurrent recombination, can enable non-dominant genotypes to overcome entrenched epidemiological barriers and drive global lineage replacement.","manuscriptTitle":"Fitness-driven emergence and lineage replacement underpin the global resurgence of GII.17 noroviruses","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-03-31 11:12:35","doi":"10.21203/rs.3.rs-8923591/v1","editorialEvents":[],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"nature-communications","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"NCOMMS","sideBox":"Learn more about [Nature Communications](http://www.nature.com/ncomms/)","snPcode":"","submissionUrl":"https://mts-ncomms.nature.com/","title":"Nature Communications","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Nature Communications","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"607c1f74-c554-4b18-be78-c150bab98e0a","owner":[],"postedDate":"March 31st, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":63983696,"name":"Biological sciences/Microbiology/Virology/Viral evolution"},{"id":63983697,"name":"Biological sciences/Evolution/Evolutionary genetics"}],"tags":[],"updatedAt":"2026-03-31T11:12:35+00:00","versionOfRecord":[],"versionCreatedAt":"2026-03-31 
11:12:35","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8923591","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8923591","identity":"rs-8923591","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00
unpaywall: last seen: 2026-05-24T02:00:01.246996+00:00

License: CC-BY-4.0