SARS-CoV-2 Intra-host Variation Shows Evidence of Transmission and Convergent Evolution in a University Surveillance Cohort

preprint OA: closed CC-BY-4.0
ABSTRACT Monitoring and understanding the transmission and evolution of SARS-CoV-2 remains a significant public health priority. Within-host genetic variation provides insight into viral evolution during infection and may help infer transmission events. In this study, we analyzed intrahost variation in SARS-CoV-2 genome sequences from Boston University’s testing mandate. Focusing on intrahost single nucleotide variants (iSNVs), we inferred transmission events and assessed the selective forces shaping within-host viral evolution. To minimize false-positive iSNVs resulting from systematic biases, we implemented stringent data filtering and developed a heuristic to exclude contamination-derived artifacts arising from batched sequencing. We find that intrahost variation is limited and infrequently transmitted during acute infections, suggesting that shared iSNVs serve as highly specific but insensitive markers of transmission. We also observed incomplete purifying selection shaping within-host diversity, with the loci most affected changing among variants of concern. Finally, we identified a highly recurrent iSNV (G11083T) which may represent a site of positive selection. Our results highlight that within-host variation provides insight into within-host pathogen evolution, despite its limited use for genomic epidemiology.

IMPACT STATEMENT SARS-CoV-2 is the most extensively sequenced pathogen to date, yet much of its genomic data remains underutilized. Intrahost variation, in particular, is less studied than consensus-level variation, partly because most datasets lack technical sequencing replicates to control false-positive signals. Using genomic data from a university testing mandate and applying rigorous filtering to systematically minimize false-positive iSNVs in a data-driven manner, we obtain insights into SARS-CoV-2 evolution and transmission from intrahost variation. Our work underscores the potential to use existing, large-scale datasets to better understand pathogen evolution in situ.

DATA SUMMARY All sequence data have been deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) under the project accession number PRJNA892225. All code is open-access and available on GitHub at: https://github.com/Leacavalli/Sars-cov-2-Intrahost-Variation. Any additional supporting data have been provided within the article.

Competing Interest Statement W.P.H. declares receipt of compensation for service on advisory boards for Shionogi Inc., Pfizer Vaccines, and Merck Vaccines. He also declares speaker fees received from Shionogi Inc. and is a consultant for Biobot Analytics.

Footnotes Sequence data accession numbers: PRJNA892225
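
The abstract's idea of using shared iSNVs as markers of transmission can be illustrated with a minimal Python sketch. This is not the authors' pipeline (their code is in the GitHub repository named in the data summary). The sample names, the variants other than G11083T, all allele frequencies, the 3% frequency cutoff, and the choice to exclude recurrent sites before pairing samples are assumptions made only for illustration.

```python
from itertools import combinations

# Hypothetical per-sample iSNV calls: sample -> {variant: minor-allele frequency}.
# Everything here is toy data; only the name G11083T comes from the abstract.
isnvs = {
    "sample_A": {"A1000G": 0.12, "G11083T": 0.20},
    "sample_B": {"A1000G": 0.08, "G11083T": 0.15},
    "sample_C": {"C2000T": 0.45, "A1000G": 0.01},
}

MIN_FREQ = 0.03  # assumed lower frequency cutoff, not taken from the paper
MAX_FREQ = 0.97  # assumed upper cutoff (above this the variant is near-consensus)


def filter_isnvs(calls, lo=MIN_FREQ, hi=MAX_FREQ):
    """Keep variants whose frequency lies within [lo, hi]."""
    return {variant for variant, freq in calls.items() if lo <= freq <= hi}


def shared_isnv_pairs(samples, exclude=frozenset()):
    """Return {(sample_a, sample_b): shared variants} for pairs sharing >= 1 iSNV.

    `exclude` lists sites to ignore, for example recurrent sites that could be
    shared between unrelated infections (an assumption of this sketch).
    """
    filtered = {name: filter_isnvs(calls) - set(exclude)
                for name, calls in samples.items()}
    pairs = {}
    for a, b in combinations(sorted(filtered), 2):
        shared = filtered[a] & filtered[b]
        if shared:
            pairs[(a, b)] = shared
    return pairs


pairs = shared_isnv_pairs(isnvs, exclude={"G11083T"})
print(pairs)
```

In this toy input, the low-frequency A1000G call in sample_C falls below the assumed cutoff, so only sample_A and sample_B are paired. Dropping a recurrent site such as G11083T before pairing reflects the intuition that a variant arising independently in many hosts is weak evidence of a direct link.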


License: CC-BY-4.0