The Genomic Landscape of Head and Neck Cancer-associated Streptococci
Research Square preprint, Version 1. https://doi.org/10.21203/rs.3.rs-8400357/v1
Linh Mai, George Bouras, Kenny Yeo, John-Charles Hodge, Emma Barry, and 6 more
This is a preprint; it has not been peer reviewed by a journal. This work is licensed under a CC BY 4.0 License.
Abstract
Disruption of the oral microbiome is increasingly implicated in head and neck cancer (HNC), yet the genomic adaptations that commensal bacteria acquire in tumour-associated environments remain unclear. We performed genome-resolved analyses of 101 complete Streptococcus genomes from the tumours and oral cavities of 31 HNC patients. Phylogenomic analysis identified 35 species, including ten novel species belonging to the Mitis group.
The Streptococcus genus shared only 29 core genes, and analysis of the genomes (1.7–2.5 Mbp) revealed an accessory genome shaped by extensive horizontal gene transfer (HGT), supported by 245 integrative and conjugative element (ICE) clusters, 82 prophages and 4 plasmid groups. Comparison with 391 published genomes from the oral cavities of healthy individuals showed that tumour-associated isolates exhibited niche-specific expansions of carbohydrate-active enzymes and enrichment of genes involved in sugar transport, thiamine biosynthesis and antimicrobial resistance. Together, these findings reveal distinct HGT-driven genomic remodelling in tumour-associated Streptococcus and provide the first comprehensive genomic resource for examining microbiome adaptation in HNC.
Highlights
In-depth analyses of 101 complete Streptococcus genomes from Head and Neck Cancer patients.
Genomic analysis reveals extensive diversity within the accessory genome.
Evidence of intra- and inter-species horizontal gene transfer.
Unique genomic features identified in HNC-associated isolates compared to those from healthy donors.
Introduction
Head and neck cancer (HNC) causes 450,000 deaths worldwide each year and comprises tumours of the oral cavity, pharynx and larynx 1-3. Tobacco and alcohol consumption and human papillomavirus (HPV) infection are well-established risk factors; however, recent studies highlight the contribution of the local microbiota to tumour behaviour and treatment response 4-6. The mucosal surfaces of the oral cavity support more than 700 bacterial species 7-9, with Streptococci predominating and accounting for roughly 20–50% of the healthy oral microbiota 8,10-12.
These bacteria not only colonise the oral cavity but also play a crucial role in human health. Commensal Streptococcus species, such as S. mitis, S. sanguinis, S. gordonii and S. salivarius, occupy ecological niches within the oral cavity and produce bacteriocins that suppress pathogen outgrowth 10,13-15. However, certain Streptococcus species can display pathogenic behaviour under conditions of microbial dysbiosis or immune compromise, such as within the tumour microenvironment (TME) 16,17. For instance, S. anginosus has been associated with oral and oesophageal cancers, while S. mitis and S. parasanguinis have been shown to modulate immune signalling within the TME 16,18-25. Collectively, these findings suggest that the pathogenic potential of commensal Streptococcus species can be unmasked by changes in host or environmental conditions. Despite their abundance and intimate association with oral epithelial surfaces, the mechanisms by which commensal Streptococcus species contribute to the development of HNC remain poorly understood. Although certain Streptococcus species have been implicated in carcinogenesis, it is unclear whether tumour-associated isolates acquire genetic adaptations that facilitate survival within the TME and influence tumour progression. Microbial whole-genome sequencing (WGS) has enabled the investigation of genomic adaptations that underlie lineage-specific traits and functional capacities affecting health and disease outcomes 26. While previous studies have used WGS to characterise the genomic composition of Streptococcus isolates from healthy oral microbiomes 26, to the best of our knowledge no study has yet comprehensively analysed the genomes of HNC-associated Streptococcus strains. In this study, we performed a genome-resolved analysis of Streptococcus isolates obtained from the tumours and oral cavities of HNC patients.
By integrating species-level classification with pangenome profiling and functional annotation, we identified lineage-specific traits and accessory gene functions that may underlie ecological adaptation and tumour colonisation.
Materials and Methods
Study Population and Sample Collection
Microbial isolates were collected from 31 HNC patients at three hospitals in Adelaide, Australia, between 2021 and 2022 (Supplementary Data S1). Before surgery, all participants provided written informed consent in accordance with the ethical guidelines of the Central Adelaide Local Health Network ethics committee (CALHN Ref No. 14116). All samples were coded and anonymised before use. Samples included tumour tissue harvested from resected cancer lesions and oral cavity swabs collected from the buccal mucosa and adjacent tissues. All samples were promptly transported to the laboratory in sterile containers on ice and processed on the day of collection to preserve microbial viability. Tumour tissue was stored in complete DMEM medium supplemented with 10% foetal bovine serum (FBS) for later isolation.
Bacterial Isolation from Tumour Tissues
Tissues were rinsed three times with DMEM (10% FBS) to remove blood and then minced with a blade to increase surface area. Minced tissues were digested using a standard collagenase protocol to degrade the extracellular matrix and release microbial communities 27. Briefly, tissue was treated with collagenase IV digestion solution at a final working concentration of 1000 U/ml (3 ml of sterile phosphate-buffered saline (PBS) containing collagenase IV, 30 μl of 100 mM CaCl2 and 90 μl of 1% BSA) and incubated at 37 °C for 60 min with continuous shaking at 100 rpm. After incubation, the tumour digest was filtered through a 40 µm cell strainer to remove large tissue fragments. The microbe-enriched filtrate was centrifuged at 3000 × g for 10 min to pellet bacterial cells.
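The final concentrations of the CaCl2 and BSA additives in the digestion mix follow from C1V1 = C2V2. The worked example below assumes, as a simplification not stated in the protocol, that dissolving collagenase IV in the 3 ml of PBS does not change its volume, so the total volume is 3 + 0.03 + 0.09 = 3.12 ml; the collagenase total simply scales the stated 1000 U/ml working concentration by that volume.

```latex
\begin{aligned}
V_{\text{total}} &= 3\,\text{ml} + 0.03\,\text{ml} + 0.09\,\text{ml} = 3.12\,\text{ml}\\
[\mathrm{CaCl_2}]_{\text{final}} &= 100\,\text{mM}\times\frac{0.03\,\text{ml}}{3.12\,\text{ml}} \approx 0.96\,\text{mM}\\
[\mathrm{BSA}]_{\text{final}} &= 1\,\%\times\frac{0.09\,\text{ml}}{3.12\,\text{ml}} \approx 0.029\,\%\\
\text{Collagenase IV needed} &= 1000\,\text{U/ml}\times 3.12\,\text{ml} = 3120\,\text{U}
\end{aligned}
```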
Finally, the bacterial pellet was resuspended in PBS and streaked onto Remel Wilkins-Chalgren Agar (WCA) plates 28. Plates were incubated in an anaerobic chamber at 37 °C for 24 to 48 hours. Single colonies were selected based on morphology and subcultured until pure cultures were obtained. Isolated bacterial strains were cryostored at –80 °C in 20% (v/v) glycerol.
Aerobic Bacterial Isolation from Oral Swabs
Within 2 hours of collection, oral cavity swabs from the 31 HNC patients were directly inoculated onto a range of selective and non-selective agar media, including tryptic soy agar (TSA), Luria-Bertani agar (LBA), sheep blood agar (SBA), nutrient agar (NA) and brain heart infusion (BHI) agar. The plates were incubated aerobically at 37 °C for 16 to 24 hours. After incubation, single colonies were subcultured onto fresh plates based on their morphological characteristics. Once isolate purity was confirmed, bacteria were stored at –80 °C in 20% (v/v) glycerol for further analysis.
DNA Extraction and Sequencing
Bacterial isolates were cultured overnight, and cell pellets were collected by centrifugation at 3000 × g for 10 minutes. Genomic DNA was extracted using the DNeasy Blood & Tissue Kit (Qiagen, Hilden, Germany) according to the manufacturer's protocol. DNA concentration and purity were assessed using a NanoDrop spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA), and samples with an A260/A280 ratio between 1.8 and 2.0 were considered suitable for downstream applications.
Whole Genome Sequencing - Long Read Sequencing (LRS)
Extracted DNA from all isolates was prepared for WGS on the MinION platform with R9.4.1 flow cells (Oxford Nanopore Technologies, Oxford, UK). Library preparation was performed using the Rapid Barcoding Kit 96 (SQK-RBK110.96; Oxford Nanopore Technologies Ltd.) according to the manufacturer's instructions.
DNA concentrations were measured using the Qubit 4 Fluorometer with the dsDNA High Sensitivity Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA). Base calling was performed with Guppy (v6.2.11) in super-accuracy mode using the dna_r9.4.1_450bps_sup.cfg configuration file.
Whole Genome Sequencing - Short Read Sequencing (SRS)
Following taxonomic identification from the ONT long reads, isolates belonging to the genus Streptococcus were selected for short-read WGS at the Australian Genome Research Facility (AGRF). Genomic DNA was enzymatically fragmented, and sequencing libraries were prepared using the Nextera XT DNA Library Prep Kit with unique dual indices added via low-cycle PCR. Libraries were purified, quantified and manually normalised before sequencing. Paired-end sequencing (2 × 150 bp) was performed on the Illumina NovaSeq 6000 platform using S4 flow cell chemistry. Image analysis and base calling were performed in real time using NovaSeq Control Software (NCS v1.2.0.28691) and Real-Time Analysis (RTA v4.6.7). Raw base call (BCL) files were converted to FASTQ format using the Illumina DRAGEN BCL Convert pipeline (v4.0.3). All sequencing data met AGRF quality control standards. Final read quality was evaluated using FastQC (v0.11.9).
Assembly, Annotation and Taxonomic Classification
Genome assemblies were generated using Hybracter v0.7.3, with `hybracter long` when only long reads were available and `hybracter hybrid` when both long and short reads were available 29. Specifically, Flye v2.9.3 was used to assemble the chromosomes, and Dnaapler v0.7.0 was used to reorient them consistently so that every genome began with the dnaA gene 30,31. Plassembler v1.5.1 was used to assemble all plasmids 32. Genome completeness was assessed with CheckM2 v1.0.2 33. Taxonomic assignment was performed with GTDB-Tk v2.4.0 using the `gtdbtk classify_wf` command 34. Genome annotation was then performed with Bakta v1.9.4 using the full Bakta database 35.
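The reorientation step amounts to rotating each circular chromosome so that it starts at dnaA. The function below is a simplified, hypothetical sketch of that rotation only: it assumes the dnaA coordinates and strand are already known (for example from an annotation table), whereas Dnaapler itself locates dnaA by sequence search, so this is not Dnaapler's implementation. For a minus-strand gene, the chromosome is reverse-complemented first so that the gene reads forward from position 1.

```python
"""Hypothetical sketch: rotate a circular chromosome so it starts at a gene such as dnaA."""

# Complement table for A/C/G/T/N (upper and lower case); other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reorient_to_gene(seq: str, start: int, end: int, strand: str) -> str:
    """Rotate a circular chromosome so the gene's first base becomes position 1.

    start/end are 1-based, inclusive forward-strand coordinates with start <= end
    (genes spanning the origin are not handled). On the minus strand the gene's
    first base is the forward-strand position `end`.
    """
    n = len(seq)
    if not 1 <= start <= end <= n:
        raise ValueError("coordinates must satisfy 1 <= start <= end <= len(seq)")
    if strand == "+":
        cut = start - 1                     # 0-based index of the gene's first base
        return seq[cut:] + seq[:cut]
    if strand == "-":
        rc = reverse_complement(seq)
        cut = n - end                       # forward position `end` maps to rc index n - end
        return rc[cut:] + rc[:cut]
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    # Toy chromosome with a 3-bp "gene" (ATG) at forward positions 4-6.
    print(reorient_to_gene("AAAATGCCC", 4, 6, "+"))  # ATGCCCAAA
    # The same gene on the minus strand (forward positions 4-6 read CAT).
    print(reorient_to_gene("GGGCATTTT", 4, 6, "-"))  # ATGCCCAAA
```

Rotation is lossless for a circular molecule, which is why a fixed start gene gives consistent coordinates across genomes without changing their content.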
Core genome phylogenetic reconstruction
Phylogenetic relationships among the 101 dereplicated Streptococcus isolates were inferred from a concatenated alignment of core genes identified by Panaroo v1.3.0 in strict mode (--clean-mode strict; --core_threshold 0.99) 36. Coding sequences for each core gene cluster were aligned using MAFFT v7.505, as implemented in the panaroo-msa utility, and concatenated into a single core gene alignment 37. The resulting alignment was analysed with IQ-TREE v3.0.1 under the best-fit substitution model selected by ModelFinder (-m MFP) 38. Branch support was assessed using 1,000 ultrafast bootstrap replicates (-bb 1000) and 1,000 SH-aLRT replicates (-alrt 1000). The final maximum likelihood tree was visualised and annotated with the iTOL v6 online platform (https://itol.embl.de), with tip labels and colour coding based on species-level taxonomy (GTDB assignments) and anatomical source (oral cavity or tumour) 39.
Gene content-based clustering and UMAP visualisation
To assess genomic relationships based on accessory gene content, a binary presence/absence matrix of protein-coding genes was generated from the Panaroo pangenome analysis. Uniform Manifold Approximation and Projection (UMAP) was applied for dimensionality reduction and visualisation, and k-means clustering was then used to define isolate clusters. The optimal number of clusters (k) was determined using the elbow method, by plotting the within-cluster sum of squares (WSS) for k = 1 to 15. A sharp inflection point at k = 7 indicated the optimal number of clusters, consistent with standard heuristics for WSS minimisation 40,41.
Carbohydrate-active enzyme (CAZyme) annotation
Carbohydrate-active enzymes (CAZymes) were identified to characterise the glycan degradation, synthesis and binding capacities of Streptococcus isolates 42.
Protein sequences from each genome were queried against the CAZy database using DIAMOND (v2.0.13.151) in blastp mode with default parameters to assign CAZyme families 43. CAZyme gene family counts were quantified per isolate, and species-level profiles were generated by summing counts across isolates. These data were used to compare glycan-utilisation potential between species and to investigate associations with tumour-associated ecological adaptation.
Niche-specific and species-resolved gene enrichment analysis
We compared gene presence/absence profiles between Streptococcus isolates from tumour and oral cavity niches across the HNC cohort to identify niche-enriched genes. Gene presence frequencies were calculated for each niche, and Fisher's exact tests were performed on a 2×2 contingency table for each gene. For the whole-cohort analysis, genes were retained if they showed an absolute frequency difference > 0.3 between niches, a raw p < 0.05 and a prevalence of at least 10% in at least one niche. Gene functional annotations were retrieved from the genome annotation outputs and curated using UniProtKB for known protein names and functional domains. For the species-resolved recurrent enrichment analysis, enrichment tests were performed separately for each species with ≥ 2 tumour and ≥ 2 oral isolates. Within a species, genes were classified as tumour-enriched or oral-enriched if they had a Benjamini–Hochberg FDR-adjusted p < 0.1 and were significantly depleted in the opposite niche. The number of species in which each gene was enriched was counted, and genes enriched in ≥ 2 species were considered “recurrent” in that niche. Functional categories for recurrent genes were assigned based on annotation databases and literature review. Visualisations were generated in R using the ggplot2 package.
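The whole-cohort filters and the species-resolved recurrence rule above can be expressed compactly. The sketch below is a pure-Python illustration under stated assumptions, not the authors' code: all function names are hypothetical, the two-sided Fisher p-value uses the common definition (sum over tables with the same margins that are no more probable than the observed one, as in R's fisher.test), and the "significantly depleted in the opposite niche" condition is not modelled.

```python
"""Hypothetical sketch of the niche-enrichment statistics (pure Python, no SciPy)."""
from collections import Counter
from math import comb
from typing import Dict, Iterable, List, Tuple


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are niches (tumour, oral); columns are gene present / absent.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Relative tolerance guards against floating-point ties with the observed table.
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7)))


def whole_cohort_hits(counts: Dict[str, Tuple[int, int]], n_tumour: int, n_oral: int,
                      min_diff: float = 0.3, alpha: float = 0.05,
                      min_prev: float = 0.10) -> Dict[str, Tuple[str, float, float]]:
    """Keep genes with |frequency difference| > 0.3, raw p < 0.05 and >= 10% prevalence in one niche.

    counts maps gene -> (tumour isolates carrying it, oral isolates carrying it).
    Returns gene -> (enriched niche, tumour minus oral frequency, p-value).
    """
    hits = {}
    for gene, (t, o) in counts.items():
        f_t, f_o = t / n_tumour, o / n_oral
        if abs(f_t - f_o) <= min_diff or max(f_t, f_o) < min_prev:
            continue
        p = fisher_two_sided(t, n_tumour - t, o, n_oral - o)
        if p < alpha:
            hits[gene] = ("tumour" if f_t > f_o else "oral", f_t - f_o, p)
    return hits


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (same values as R's p.adjust(method = "BH"))."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def recurrent_genes(per_species: Dict[str, Iterable[str]], min_species: int = 2) -> List[str]:
    """Genes enriched (in one niche) in at least `min_species` species."""
    tally = Counter(g for genes in per_species.values() for g in set(genes))
    return sorted(g for g, k in tally.items() if k >= min_species)


if __name__ == "__main__":
    print(fisher_two_sided(3, 1, 1, 3))                    # about 0.486
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
    print(recurrent_genes({"S. mitis": ["ssb", "yozG"], "S. oralis": ["ssb"]}))  # ['ssb']
```

In the species-resolved analysis, `benjamini_hochberg` would be applied to the per-gene p-values of one species at a time before the FDR < 0.1 cut, and `recurrent_genes` to the resulting per-species gene sets.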
Identification of mobile genetic elements and horizontal transfer
Identifying potential horizontal gene transfer (HGT) events between Streptococcus isolates from the same patient required a multi-step approach that integrated mobile genetic element (MGE) annotation with sequence clustering and functional analysis. First, genome assemblies were annotated for MGEs using dedicated tools: MobileElementFinder v1.0.3 44 to detect integrative and conjugative elements (ICEs), PlasmidFinder v2.1/MOB-suite v3.0.3 45 to identify plasmids, and Pharokka v1.3.0 46 with CheckV v1.0.1 47 to characterise prophages. Predicted MGE sequences from all isolates were then clustered with CD-HIT 48 at ≥90% identity and ≥80% coverage to define homologous MGE clusters shared across isolates. Next, cluster assignments were mapped to patient identifiers, and HGT events were defined as cluster IDs shared by two or more co-colonising isolates from the same patient. This enabled quantification of within-patient sharing by MGE type (ICE, plasmid, prophage) and detection of cross-patient dissemination. Open reading frames within each MGE cluster were annotated with Prokka 49, and predicted functions were grouped into broad categories, including recombination, conjugation, antimicrobial resistance (AMR) and phage-related functions. Functional distributions were visualised in R using ggplot2, and networks of shared MGEs were constructed with igraph/ggraph, with nodes representing isolates and edges weighted by the number of shared clusters.
Statistical analysis
Data were analysed using R (v4.2.2) and Python (v3.9). For microbial isolate prevalence, differences in Streptococcus abundance between HNC patients and controls were assessed using Fisher's exact test, with a significance threshold of p < 0.05. Isolates were considered clonally redundant if they shared >99.9% ANI, <1,000 SNPs and <75 InDels 50. Phylogenetic trees were constructed using mastree via the iTOL website.
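The clonal-redundancy rule (>99.9% ANI, <1,000 SNPs and <75 InDels) can be applied greedily once pairwise statistics are available. The sketch below is hypothetical: it assumes ANI, SNP and InDel counts have already been computed for each pair (for example from FastANI and Nucmer output, which is not parsed here), that isolates are supplied in priority order (e.g. tumour-derived first, mirroring the prioritisation described in the Results), and that pairs with no statistics are never merged.

```python
"""Hypothetical greedy dereplication using the stated redundancy thresholds."""
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class PairStats:
    ani: float    # average nucleotide identity, in percent
    snps: int     # single-nucleotide polymorphisms between the two genomes
    indels: int   # insertion-deletions between the two genomes


def is_redundant(s: PairStats, min_ani: float = 99.9,
                 max_snps: int = 1000, max_indels: int = 75) -> bool:
    """True only if all three thresholds are met: >99.9% ANI, <1,000 SNPs, <75 InDels."""
    return s.ani > min_ani and s.snps < max_snps and s.indels < max_indels


def dereplicate(isolates: List[str], pairs: Dict[FrozenSet[str], PairStats]) -> List[str]:
    """Keep each isolate unless it is redundant with an isolate already kept.

    `isolates` must already be in priority order; pairs missing from `pairs`
    (e.g. different species or patients) are treated as non-redundant.
    """
    kept: List[str] = []
    for iso in isolates:
        redundant = False
        for rep in kept:
            stats = pairs.get(frozenset((iso, rep)))
            if stats is not None and is_redundant(stats):
                redundant = True
                break
        if not redundant:
            kept.append(iso)
    return kept


if __name__ == "__main__":
    pairs = {
        frozenset(("T1", "O1")): PairStats(ani=99.95, snps=200, indels=10),   # clonal pair
        frozenset(("T1", "O2")): PairStats(ani=99.50, snps=5000, indels=120),
        frozenset(("O1", "O2")): PairStats(ani=99.50, snps=4800, indels=110),
    }
    print(dereplicate(["T1", "O1", "O2"], pairs))  # ['T1', 'O2']
```

Because the procedure is greedy, the retained representative of each clonal group depends on the input order, which is why a priority ordering is passed in explicitly.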
Differences in genome size and GC content across species were evaluated using one-way ANOVA followed by Tukey's post-hoc test for pairwise comparisons. Functional trait distributions (including CAZyme counts and phage proteins) were compared between tumour- and oral-derived isolates within species using Wilcoxon rank-sum tests. Pathway and functional category enrichment was assessed against the Gene Ontology (GO) and KEGG databases using hypergeometric tests with Benjamini–Hochberg correction (FDR < 0.05). Gene presence was compared between conditions and niches using two-tailed Fisher's exact tests, with significance set at p < 0.05. All graphical outputs (volcano plots, network visualisations, UMAP projections and functional summaries) were generated in R using the ggplot2, igraph and ggraph packages.
Results
Isolation, identification and dereplication of HNC-associated Streptococci
We analysed bacterial isolates collected from 31 HNC patients with a mean age of 69 years (range: 44–91). The cohort consisted of 26 males and 5 females (Supplementary Data S1). Patients presented with a broad spectrum of tumour sites, including the oropharynx, larynx, oral tongue, floor of mouth, buccal mucosa, gingiva/alveolar ridge, sinonasal cavity and salivary glands, spanning Stage I to Stage IVb disease where staging was documented. Samples were collected from two sites: (i) tumour tissue, obtained by enzymatic digestion of resected tumours, and (ii) oral swabs collected from non-tumour oral mucosa. To investigate the genomic landscape of HNC-associated Streptococcus, we performed long-read WGS on 388 bacterial isolates using the Oxford Nanopore MinION platform. Following quality control, 318 high-quality assemblies were taxonomically classified using the Genome Taxonomy Database (GTDB), of which Streptococcus was the dominant genus, accounting for 143 genomes (50.3%).
This aligns with previous studies identifying Streptococcus as a predominant member of the oral microbiome in both healthy individuals and HNC patients 51,52. Focusing on Streptococcus, we selected a subset of 143 genomes for in-depth WGS-based investigation, prioritising tumour-resident isolates and reducing redundancy across samples. Because multiple isolates were often derived from the same patient and species, we implemented a dereplication step to identify clonal lineages. Pairwise genome comparisons were performed using FastANI and Nucmer 53,54. Isolates were considered clonally redundant if they shared >99.9% ANI, <1,000 single-nucleotide polymorphisms (SNPs) and <75 insertion-deletions (InDels). Although these thresholds are relatively permissive, they were chosen to retain intra-patient diversity while minimising analytical redundancy. Applying these filters yielded a final dataset of 101 non-redundant Streptococcus genomes for downstream genomic and functional analyses (Supplementary Data S2).
Phylogenetic Relationships of HNC-Associated Streptococci
To explore the strain-level diversity and evolutionary relationships of Streptococcus isolates in HNC patients, we analysed a maximum likelihood phylogenetic tree constructed from the 101 dereplicated genomes 55. These genomes encompassed 35 Streptococcus species, including 10 putative novel species not previously classified in GTDB, which are provisionally named S. BHI_1 to S. BHI_10. The resulting tree (Fig. 1A) illustrates the taxonomic structure of the Streptococcus population, with branches colour-coded by species and anatomical source (oral cavity or tumour). Species-level diversity is high, with distinct clades corresponding to well-characterised taxa such as S. anginosus, S. mitis, S. oralis and S. salivarius, consistent with their established abundance in the oral microbiome 22. Notably, S. mitis and S. parasanguinis were among the most abundant species, each forming a prominent clade. Several unclassified or novel isolates clustered within known species groups, suggesting close evolutionary relationships or potential species misassignments. For example, S. BHI_1, S. BHI_2, S. BHI_4, S. BHI_6, S. BHI_8, S. BHI_9, S. BHI_11 and S. sp002238115 were all positioned within the S. mitis clade. Similarly, S. sp029691405 and S. BHI_10 grouped within the S. oralis clade, S. sp013394695 aligned with S. infantis, and S. sp001813105 and S. caecimuris fell within the S. parasanguinis clade. These patterns reflect the well-documented genomic fluidity of the mitis group, which includes S. mitis, S. oralis and S. infantis, species that frequently exceed 95% ANI and have overlapping pangenomes 56. The phylogenetic proximity of several S. BHI strains to S. mitis likely reflects conserved core regions that obscure clear species demarcation. S. mitis is particularly known for its expansive and genetically diverse pangenome, which may encompass emerging or cryptic lineages 57,58. The clustering of GTDB-designated “sp.” taxa (e.g., S. sp013394695) within established species groups further supports the possibility of novel or misclassified lineages that merit re-evaluation with expanded genome datasets and phenotypic characterisation.
Genomic size and GC content of HNC-associated Streptococci
We analysed the genomic characteristics of the 101 unique Streptococcus isolates to explore intra- and inter-species diversity. Genome sizes ranged from approximately 1.7 Mbp to 2.5 Mbp, consistent with the known variation within Streptococcus 59. While most species showed a narrow range of genome sizes, several, such as S. parasanguinis, exhibited notable variability, suggesting the presence of strain-specific genetic elements (Fig. 1B). Among the 10 putative novel species, S. BHI_11, S. BHI_6 and S. BHI_4, which were assigned to the S. mitis group, had genome sizes approaching 2.1 Mbp, comparable to the upper end of typical S. mitis isolates, supporting their classification as genetically distinct yet closely related taxa. GC content across isolates ranged from 36% to 43%, with most species falling between 39% and 41%, consistent with previously reported profiles for oral Streptococci 56,59 (Fig. 1C). While genome size varied, GC content remained relatively stable within species, reflecting conserved evolutionary and functional constraints. For example, S. parasanguinis exhibited a broad genome size range (2.0–2.3 Mbp) while maintaining a stable GC content of approximately 41%. These patterns indicate that genome expansion does not substantially shift GC composition.
Genomic Diversity and Functional Composition of Streptococcus Isolates
To further elucidate the genomic relationships of the 101 unique Streptococcus isolates from HNC patients, we performed gene content-based clustering using the presence or absence of protein-coding genes identified through pangenome analysis. K-means clustering identified seven distinct isolate clusters, with the optimal k determined using the elbow method (k = 7) (Supplementary Fig. S1). UMAP was then used to project these high-dimensional gene content profiles into two dimensions, giving a clear visual separation of the seven clusters (Fig. 2A). These clusters corresponded closely with known Streptococcus phylogenetic groups, including the Mitis, Sanguinis, Anginosus and Salivarius groups, reinforcing the concordance between phylogenetic and functional classifications 60. The clusters showed minimal overlap in the UMAP space, reflecting distinct accessory gene-content profiles that align with their phylogenetic assignments. Within the UMAP space, isolates clustered tightly with their corresponding species, affirming species-level identity (Fig. 2A).
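The elbow criterion used to choose k = 7 can be illustrated on a toy presence/absence matrix. This is a minimal pure-Python sketch, not the authors' pipeline: it runs Lloyd's k-means with a deterministic farthest-first initialisation directly on binary gene vectors (whether the study clustered the raw matrix or the UMAP embedding is an assumption left open here), records the within-cluster sum of squares (WSS) for each k, and picks the elbow as the k with the largest second difference of WSS, one common way of formalising a visual inflection point. Library implementations such as scikit-learn typically use k-means++ seeding with several restarts instead.

```python
"""Toy elbow-method sketch: k-means WSS over a range of k on binary gene vectors."""
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points: List[Vector], k: int) -> List[List[float]]:
    """Deterministic seeding: start at points[0], then repeatedly add the point
    farthest from its nearest existing centre."""
    centres = [list(points[0])]
    while len(centres) < k:
        idx = max(range(len(points)),
                  key=lambda i: min(sq_dist(points[i], c) for c in centres))
        centres.append(list(points[idx]))
    return centres


def kmeans_wss(points: List[Vector], k: int, max_iter: int = 100) -> float:
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centres = farthest_first(points, k)
    for _ in range(max_iter):
        # Assignment step (ties go to the lower-numbered centre).
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        # Update step: move each centre to the mean of its members.
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centres.append(centres[j])  # leave an empty cluster's centre in place
        if new_centres == centres:
            break
        centres = new_centres
    labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
    return sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))


def elbow_k(wss: Dict[int, float]) -> int:
    """Pick k maximising the second difference WSS(k-1) - 2*WSS(k) + WSS(k+1).
    Assumes the keys of `wss` are consecutive integers."""
    ks = sorted(wss)
    return max(ks[1:-1], key=lambda k: wss[k - 1] - 2 * wss[k] + wss[k + 1])


if __name__ == "__main__":
    # Three hypothetical gene-content groups, four identical isolates each.
    groups = [[1] * 4 + [0] * 8, [0] * 4 + [1] * 4 + [0] * 4, [0] * 8 + [1] * 4]
    matrix = [row for row in groups for _ in range(4)]
    wss = {k: kmeans_wss(matrix, k) for k in range(1, 5)}
    print({k: round(v, 6) for k, v in wss.items()})  # {1: 32.0, 2: 16.0, 3: 0.0, 4: 0.0}
    print("elbow at k =", elbow_k(wss))              # elbow at k = 3
```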
The Mitis group, known for its genomic plasticity, was distributed across three clusters (Clusters 1, 3 and 6). Cluster 1 contained S. oralis, S. BHI_10 and S. sp029691405, consistent with their position in the S. oralis clade of the core genome tree. Cluster 3 included S. infantis, S. massiliensis, S. mutans and several novel taxa (e.g., S. BHI_3, S. BHI_7), while Cluster 6 comprised novel mitis-like species (e.g., S. BHI_2, S. BHI_6) that grouped phylogenetically with S. mitis. These findings suggest that Mitis-like species have retained conserved core genes while diversifying in accessory content, as illustrated by cluster-level pangenome comparisons (Supplementary Fig. S2A, C, F). The Sanguinis group (S. sanguinis, S. gordonii, S. cristatus, S. sinensis) formed Cluster 2, which is characterised by a high core gene content (e.g., 73.8% in S. sanguinis, 76.7% in S. gordonii). S. gordonii and S. sanguinis harbour open pangenomes and share generally high sequence homology (Supplementary Fig. S2B). Several of their shared core genes are involved in carbohydrate metabolism (such as PTS components, glycolysis and glycogen metabolism) and oxidative stress response (such as thioredoxin and peroxide detoxification systems), consistent with metabolic fitness and stress tolerance that may support persistence within oral biofilms 61,62 (Supplementary Data S3). Cluster 5 included the Anginosus group (S. anginosus, S. constellatus, S. hominis), opportunistic pathogens frequently associated with abscess formation and deep tissue invasion (Supplementary Fig. S2E) 21,63,64. Next, we constructed a pangenome for the 13 Streptococcus species represented by multiple isolates (Fig. 2B). Only 29 genus-level core genes were shared by all isolates, highlighting the extensive genetic variability within this dataset.
These genes included essential housekeeping functions such as ribosomal proteins (rpl/rps), RNA polymerase subunits (rpo), translation factors (inf), and key components of protein secretion (secY) and polysaccharide synthesis (galU) (Supplementary Data S3). Comparative genomic analysis across the 13 Streptococcus species isolated from our cohort revealed substantial interspecies variation in genome architecture and coding potential (Supplementary Fig. S3; Supplementary Data S3). Average core gene lengths were relatively consistent across species, ranging from 916 bp in S. xiaochunlingii to 973 bp in S. salivarius, suggesting functional conservation of the core genome with only modest lineage-specific variation (Supplementary Fig. S3B). GC content ranged from 37.99% in S. constellatus to 43.00% in S. sanguinis. Although S. sanguinis (43.00%), S. xiaochunlingii (42.00%) and S. parasanguinis (41.73%) had the highest GC content, the overall range was narrow, indicating that GC bias is unlikely to be a major driver of genomic differentiation in these taxa. Average genome sizes spanned from 1.86 Mb in S. infantis to 2.44 Mb in S. sanguinis, with larger genomes also observed in S. gordonii, S. salivarius and S. BHI_8, consistent with an expanded coding repertoire. To account for differences in the number of isolates per species, we performed rarefaction analysis (n = 5 isolates per species, 100 bootstrap replicates) to estimate core genome and pangenome sizes. After rarefaction, core genome sizes ranged from X genes (species A) to Y genes (species B), and pangenome sizes ranged from X to Y genes (Supplementary Fig. S3A). The persistence of these differences after sampling normalisation indicates that the observed variation reflects true biological differences among species rather than differences in isolate count. Species such as S. mitis and S. oralis have comparatively larger pangenomes and smaller core fractions, consistent with greater genomic flexibility and potential for horizontal gene transfer, whereas species such as S. sanguinis and S. vestibularis have more conserved genomic repertoires. Collectively, these results underscore the genomic diversity of Streptococcus in the HNC microbiome, shaped by both ecological adaptation and gene flow.
Gene Enrichment Analysis between Tumour- and Oral-Derived Streptococci
We next analysed functional gene differences in the Streptococcus pangenome between tumour- and oral-derived isolates. Across all 101 Streptococcus genomes, 13 genes showed significant prevalence shifts between tumour and oral isolates (absolute frequency difference > 0.3; raw p < 0.05) (Fig. 3A; Supplementary Data S5). Ten genes were enriched in tumour-derived isolates: sfcA (malolactic enzyme), aIM24 (mitochondrial respiration-associated protein), group_338 (CsbD-like stress response protein), yetF (membrane protein; DUF421), group_1767 (DUF3290 domain protein), group_16421 (YtxH-like protein; Gram-positive signal peptide YSIRK family), group_11486 (hypothetical protein), group_1364 (DUF1269 domain protein), group_2826 (GNAT family N-acetyltransferase) and group_18096 (zntA, Zn(II)-translocating P-type ATPase). Many of these genes are linked to stress tolerance, metal ion transport and membrane-associated functions, and may contribute to adaptation to nutrient-limited or inflammatory tumour microenvironments 65-69. Three genes were more prevalent in oral isolates: group_1388 (two-pore domain potassium channel protein), group_2725 (DUF1304 domain-containing epimerase) and marR–mgrA (oxidative stress response regulator). These functions are consistent with osmotic balance, carbohydrate metabolism and colonisation persistence in the oral cavity 70-73.
To assess whether these niche associations persisted across species, we performed a species-resolved enrichment analysis in species with matched tumour and oral representation. Across all species, 197 genes were significantly enriched in at least one niche (FDR < 0.1), comprising 146 oral-enriched and 51 tumour-enriched genes. However, only a small subset was recurrent across multiple species (≥ 2 species) (Fig. 3B; Supplementary Data S5). Only five tumour-enriched genes were recurrent: conjugative transfer components (traE–virB4, traG–virD4), a capsule biosynthesis gene (wcwK–cpsJ), a stress-associated gene (yozG) and a DNA-binding and repair gene (ssb) 74-78. In contrast, oral-enriched recurrent genes were more numerous and predominantly associated with carbohydrate metabolism and membrane transport (treC, putP, phnC, fucO–gldA, wecB), capsule biosynthesis (wchO, wchP, cpsO–epsJ) and CRISPR-Cas systems (cas1, cas2, csn2) 75,79-83. These findings indicate that, although niche-specific functional adaptations are common and largely taxon-restricted, recurrent signals converge on carbohydrate metabolism, membrane transport and capsule biosynthesis, highlighting carbon utilisation and cell-surface glycan biology as major axes of oral–tumour niche specialisation in Streptococcus.
CAZyme repertoires of HNC-associated Streptococci
Given the prominence of carbohydrate transport, metabolism and capsule-related functions among niche-associated genes, we next examined whether these differences extend to dedicated glycan-processing machinery. In oral Streptococci, carbohydrate-active enzymes (CAZymes) are central to the utilisation of host and dietary glycans, biofilm formation, mucosal adhesion and immune modulation, and have been implicated in adaptation to nutrient-limited or inflamed tumour microenvironments 84-87.
We therefore characterised the CAZyme repertoires of HNC-associated Streptococcus to define their glycan-utilisation potential and to provide a framework for subsequent comparisons with oral isolates from healthy individuals. Across the 101 HNC-associated Streptococcus genomes, we identified 99 distinct CAZyme families spanning six major functional classes: glycoside hydrolases (GH; n = 55), glycosyltransferases (GT; n = 21), carbohydrate-binding modules (CBM; n = 14), carbohydrate esterases (CE; n = 6), polysaccharide lyases (PL; n = 2) and auxiliary activities (AA; n = 1) (Fig. 4A; Supplementary Data S4). GHs and GTs dominated the repertoire, accounting for up to 63.8% and 30.4% of the CAZyme families per genome, respectively, while CBMs, CEs, PLs and AAs occurred at lower frequencies. This class-level distribution was broadly conserved across species, although family-level counts varied substantially (Fig. 4B). For example, S. salivarius, S. mitis and S. parasanguinis had the highest GT counts, while CE abundance peaked in S. mutans and S. parasanguinis. PLs were rare and restricted to S. constellatus, S. anginosus, S. oralis and S. sp029691405. CBMs were more evenly distributed (~4–5 families per genome), suggesting conserved substrate-recognition functions. Although no CAZyme family was present in all genomes, several were highly prevalent across species and likely represent a conserved “functional core”. These included GH1 (β-glucosidase) 88, GH13 subfamilies (e.g., GH13_9, GH13_14, GH13_31) 89, GH23 and GH25 (peptidoglycan hydrolases) 90, GT2 and GT4 (polysaccharide synthesis) 91, and CBM48 and CBM50 92-94, which target glycogen and chitin/peptidoglycan, respectively. These core families likely support essential metabolic functions in mucosal colonisation and host glycan processing 95-99. Conversely, several CAZyme families were rare or species-restricted. GH26 (mannanase) and CBM23 (mannan-binding) occurred only in S.
parasanguinis, GH98 (blood group antigen hydrolase) was detected exclusively in Strep. BHI_11, and GH170 and GH43_12 (arabinofuranosidase) were enriched in S. gordonii, S. constellatus and S. parasanguinis. PL8_1 and PL12_1 were confined to small subsets of S. anginosus and S. constellatus. The AA10 family (lytic polysaccharide monooxygenase) was broadly distributed, detected in 74 of 101 isolates, suggesting a potential role in redox adaptation within mucosal environments or in host immune interactions 100. Fisher's exact tests (FDR-corrected) further highlighted lineage-specific associations (Supplementary Data S4). GH170 was significantly overrepresented in S. gordonii and S. anginosus (adjusted p < 0.01), GH78 and GH43_12 were confined to S. parasanguinis (adjusted p < 0.05), and GT14 (galactosyltransferase activity) occurred only in Strep. BHI_6. CBM41 and CBM48 showed strong species-specific depletion or enrichment, particularly in S. massiliensis, S. cristatus and S. pseudopneumoniae (adjusted p < 1 × 10⁻⁵⁶), underscoring functional divergence in glycan-interaction strategies across taxa. CAZyme family richness, defined as the number of distinct CAZy families per genome, ranged from 44 in S. sp901875575 to 61 in S. anginosus (Supplementary Data S4). S. gordonii and S. anginosus had among the richest repertoires, whereas S. massiliensis and S. sp901875575 had the lowest richness (Fig. 4C). Shannon and Inverse Simpson indices revealed significant interspecies differences, and PERMANOVA confirmed strong species-level structuring of CAZyme profiles (R² = 0.895, p < 0.001).
Integrative and conjugative elements are the predominant drivers of horizontal gene transfer in HNC Streptococcus.
Taking advantage of the contiguity of closed genomes generated by long-read sequencing, we then analysed 132 oral streptococcal isolates for complete mobile genetic elements (MGEs).
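Richness, class shares and the two diversity indices used in this section can all be derived from a per-genome table of CAZy family counts. The minimal sketch below assumes CAZy-style family labels (class prefix followed by a number, e.g. GH13_9) and assumes that the Shannon and Inverse Simpson indices were computed from gene counts per family; the study does not specify this input, and the function name is hypothetical.

```python
import math
import re
from collections import Counter


def cazyme_summary(family_counts: dict[str, int]) -> dict:
    """Summarise one genome's CAZyme repertoire.

    family_counts maps a CAZy family label (e.g. "GH13_9", "CBM50") to the
    number of genes assigned to it in that genome.
    """
    present = {fam: n for fam, n in family_counts.items() if n > 0}
    richness = len(present)  # number of distinct families
    if richness == 0:
        return {"richness": 0, "class_share": {}, "shannon": 0.0, "inverse_simpson": 0.0}

    # Class = leading capital letters of the family label (GH, GT, CBM, CE, PL, AA)
    classes = Counter(re.match(r"[A-Z]+", fam).group() for fam in present)
    class_share = {cls: k / richness for cls, k in classes.items()}

    total = sum(present.values())
    props = [n / total for n in present.values()]
    shannon = -sum(p * math.log(p) for p in props)  # natural-log Shannon H'
    inverse_simpson = 1.0 / sum(p * p for p in props)
    return {"richness": richness, "class_share": class_share,
            "shannon": shannon, "inverse_simpson": inverse_simpson}


example = {"GH1": 2, "GH13_9": 2, "GT2": 2, "CBM50": 2, "PL8_1": 0}
print(cazyme_summary(example))
```

Class shares here are fractions of distinct families, matching how the GH and GT percentages per genome are phrased above; weighting by gene counts instead would give different numbers.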
After identification, we examined the distribution and prevalence of specific elements by clustering them with CD-HIT (≥ 90% sequence identity and ≥ 80% coverage) to detect shared elements. In total, we identified 245 ICE clusters from 122 isolates, 82 prophage clusters from 77 isolates and 4 plasmid clusters from 35 isolates. ICEs were more widely shared across the dataset, forming non-singleton clusters found in 43 different isolates, whereas most prophages were found in only one isolate, with a maximum occurrence of four isolates. Although plasmids were less numerous overall, they were spread widely, being present in 26 strains from 13 different patients (Fig. 5A; Supplementary Data S6). This pattern aligns with prior observations of widespread plasmid exchange and host-range diversity among streptococcal populations. The global MGE sharing network suggested that particular MGEs are distributed throughout the broader population rather than spread by person-to-person transmission (Supplementary Fig. S4A). This interpretation was supported by cluster size distributions (Supplementary Fig. S4B), in which ICEs and plasmids were often found in multiple patients, in contrast to the restricted distribution of prophages. MGE distribution followed taxonomic lines (Fig. 5B; Supplementary Fig. S5A). S. mitis, S. oralis and S. salivarius displayed the largest mobilomes, containing genes for conjugation and recombination, whereas phage-related genes were the main mobilome components in S. constellatus and S. parasanguinis. S. salivarius carried multiple antibiotic resistance determinants, whereas S. pseudopneumoniae and S. massiliensis had smaller mobilomes containing recombination genes.
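The sharing analysis rests on two steps: greedy clustering of elements at fixed identity and coverage thresholds, then counting how many isolates and patients each cluster spans. The sketch below is a simplified, hypothetical illustration of CD-HIT's greedy incremental idea (longest sequence first; each sequence joins the first representative that meets both thresholds). Real CD-HIT uses word filtering and alignment, and its coverage options (`-aS`, `-aL`) are not specified here, so the `toy_similarity` stand-in (ungapped comparison, no alignment) and the coverage definition are assumptions.

```python
from typing import Callable, Dict, List, Tuple

Similarity = Callable[[str, str], Tuple[float, float]]


def toy_similarity(rep: str, query: str) -> Tuple[float, float]:
    """Placeholder for a real aligner: ungapped, position-by-position comparison.

    Returns (identity over the shorter sequence, shorter length / longer length).
    """
    n = min(len(rep), len(query))
    matches = sum(x == y for x, y in zip(rep, query))
    return matches / n, n / max(len(rep), len(query))


def greedy_cluster(seqs: Dict[str, str], similarity: Similarity,
                   min_identity: float = 0.90, min_coverage: float = 0.80) -> List[List[str]]:
    """CD-HIT-style greedy incremental clustering (simplified illustration).

    Sequences are visited longest first; each joins the first representative
    that meets both thresholds, otherwise it founds a new cluster.
    """
    clusters: List[List[str]] = []
    for name in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        for members in clusters:
            ident, cov = similarity(seqs[members[0]], seqs[name])
            if ident >= min_identity and cov >= min_coverage:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def cluster_spread(clusters: List[List[str]], element_to_isolate: Dict[str, str],
                   isolate_to_patient: Dict[str, str]) -> List[dict]:
    """Count the distinct isolates and patients that carry each element cluster."""
    summary = []
    for members in clusters:
        isolates = {element_to_isolate[m] for m in members}
        patients = {isolate_to_patient[i] for i in isolates}
        summary.append({"representative": members[0],
                        "n_isolates": len(isolates), "n_patients": len(patients)})
    return summary


seqs = {"e1": "ACGTACGTAC", "e2": "ACGTACGTAA", "e3": "TTTTTTTTTT", "e4": "ACGTACGTA"}
clusters = greedy_cluster(seqs, toy_similarity)
spread = cluster_spread(clusters, {"e1": "iso1", "e2": "iso2", "e3": "iso3", "e4": "iso2"},
                        {"iso1": "P1", "iso2": "P2", "iso3": "P1"})
print(clusters)
print(spread)
```

A cluster with `n_isolates == 1` corresponds to the singleton pattern typical of the prophages, while clusters with `n_patients > 1` correspond to the cross-patient distribution described for ICEs and plasmids.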
Within-patient analyses showed that ICE collections were characterised by abundant insertion sequences and metabolic/transport genes, whereas prophage collections were dominated by hypothetical or uncharacterised proteins, followed by phage-related and replication-associated functions (Fig. 5C–D; Supplementary Fig. S5B). This distribution highlights the high level of functional ambiguity in prophage-associated regions, suggesting extensive genetic mosaicism and unannotated phage cargo in the tumour-associated Streptococcus mobilome. The most common HGT region products were IS-family transposases, recombinases, conjugation proteins and clinically important antibiotic resistance determinants such as Msr(D) and BcrA (Fig. 5E).
Comparison of Streptococcus Genomes between HNC and Healthy Donors.
To investigate potential genomic differences between Streptococcus isolates associated with HNC and those from healthy individuals, we compared 76 oral isolates derived from HNC patients with 391 publicly available oral Streptococcus isolates from healthy individuals, obtained from the China National GeneBank COGR collection (CNGB Sequence Archive, accession CNP0003047) 26 (Supplementary Data S7). Analyses were restricted to 12 Streptococcus species that were represented by multiple strains in both cohorts. Comparison of genome size and GC content revealed species-specific genomic divergence between cancer-associated and healthy isolates (Fig. 6A–B). GC content was largely conserved across species, but cancer-derived S. mitis and S. salivarius isolates exhibited significantly lower GC content (−0.5% and −0.4%, respectively; p < 0.05), potentially indicating altered compositional bias or increased mobile element load. Other species, such as S. anginosus, S. parasanguinis and S. gordonii, showed no significant differences, reinforcing the heterogeneity of genomic adaptation across taxa.
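The GC comparison reduces to computing each genome's GC percentage and contrasting the two cohorts within a species. The test behind p < 0.05 is not named, so the sketch below uses a simple two-sided permutation test on the difference in means as a stand-in; the function names and the per-genome GC values in the usage line are made up for illustration.

```python
import random
from statistics import mean


def gc_content(seq: str) -> float:
    """GC percentage over unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else float("nan")


def gc_shift(cancer_gc: list[float], healthy_gc: list[float]) -> float:
    """Difference in mean GC% (cancer minus healthy), in percentage points."""
    return mean(cancer_gc) - mean(healthy_gc)


def permutation_p(x: list[float], y: list[float], n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:len(x)]) - mean(pooled[len(x):])) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids reporting p = 0


# Hypothetical per-genome GC% values for one species
cancer_gc, healthy_gc = [39.6, 39.8, 39.5, 39.7], [40.1, 40.3, 40.2, 40.0]
print(gc_shift(cancer_gc, healthy_gc), permutation_p(cancer_gc, healthy_gc))
```

A permutation test makes no normality assumption, which suits small per-species groups; the study's actual test may differ.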
To investigate genomic differences between cancer- and healthy-associated Streptococcus isolates, we performed gene presence/absence analysis across all genomes. Principal Coordinate Analysis (PCoA) of gene presence/absence based on Jaccard distance revealed partial but distinct clustering of cancer and healthy isolates (Supplementary Fig. S6A). To explore whether these differences were consistent within individual species, we repeated the PCoA stratified by species, which revealed variable degrees of separation between cancer and healthy isolates across taxa (Supplementary Fig. S6B; Supplementary Data S7). Several species exhibited separation between cancer and healthy isolates, suggesting that tumour-associated strains acquire distinct accessory gene repertoires. Species-stratified PERMANOVA confirmed significant effects of health status on gene content in S. anginosus (R² = 0.1177, p = 0.001), S. constellatus (R² = 0.1741, p = 0.001), S. gordonii (R² = 0.1258, p = 0.002), S. infantis (R² = 0.1653, p = 0.034), S. mitis (R² = 0.1675, p = 0.001), S. oralis (R² = 0.0395, p = 0.022) and S. parasanguinis (R² = 0.0964, p = 0.003) (Supplementary Data S7), with S. constellatus showing the largest disease-associated effect size. These findings suggest that, while species-level structure is conserved, disease status may drive niche-specific genomic diversification in a subset of lineages. To determine whether cancer-associated Streptococcus exhibit distinct glycan-processing capacities relative to healthy isolates, we conducted matched comparisons of CAZymes across 17 shared species. At the class level, CAZyme composition was broadly conserved between cohorts, with GHs and GTs the dominant categories in both cancer and healthy isolates (GH: 53.6–55.4%; GT: 23.4–24.8%) (Supplementary Fig. S7A).
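The species-stratified PERMANOVA on Jaccard distances can be sketched from first principles: pairwise Jaccard distances between gene sets, Anderson's partition of squared distances into between- and within-group sums of squares, R² as their ratio, and a label-permutation p-value. This is a simplified one-factor, stdlib-only stand-in for dedicated tools such as vegan's `adonis2` (the implementation used in the study is not stated here); the toy gene sets are hypothetical.

```python
import random
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B| for two gene presence sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def pseudo_f(dist: list[list[float]], labels: list[str]) -> tuple[float, float]:
    """Anderson's one-factor PERMANOVA pseudo-F and R² from a distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_between = ss_total - ss_within
    df_between, df_within = len(groups) - 1, n - len(groups)
    f = float("inf") if ss_within == 0 else (ss_between / df_between) / (ss_within / df_within)
    r2 = ss_between / ss_total if ss_total else 0.0
    return f, r2


def permanova(gene_sets: list[set], labels: list[str], n_perm: int = 999, seed: int = 1) -> dict:
    """Jaccard-based PERMANOVA with a label-permutation p-value."""
    n = len(gene_sets)
    dist = [[jaccard_distance(gene_sets[i], gene_sets[j]) for j in range(n)] for i in range(n)]
    f_obs, r2 = pseudo_f(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled)[0] >= f_obs - 1e-12:
            hits += 1
    return {"F": f_obs, "R2": r2, "p": (hits + 1) / (n_perm + 1)}


# Toy gene sets for three "cancer" and three "healthy" isolates of one species
gene_sets = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "c", "d"},
             {"x", "y", "z"}, {"x", "y"}, {"x", "y", "w"}]
labels = ["cancer"] * 3 + ["healthy"] * 3
print(permanova(gene_sets, labels))
```

With only six toy isolates, the smallest achievable p-value is about 0.1 (2 of 20 labellings), which illustrates why per-species PERMANOVA needs adequate sample sizes.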
Nevertheless, PERMANOVA revealed a modest but statistically significant effect of cancer status on CAZyme composition (R² = 0.018, p = 0.001), indicating that disease state partially structures glycan-degradation potential. Diversity analyses further supported subtle cohort-level differences. Directional enrichment analysis using Fisher's exact test identified several species-specific functional shifts (Supplementary Fig. S7C; Supplementary Data S8). Cancer-derived isolates of S. mitis, S. oralis and S. pseudopneumoniae exhibited higher proportions of CE, GH and GT classes, potentially reflecting adaptations in nutrient acquisition or host interaction. Conversely, healthy-associated isolates of S. salivarius, S. parasanguinis and S. gordonii were enriched in GH, GT or CBM classes, indicating retention of canonical commensal glycan-processing profiles. To explore finer-scale differences, we applied DESeq2 to CAZyme gene counts across all 467 isolates. Four CAZy families (GH70, GH73, GT8 and GT2) were significantly enriched in healthy isolates (adjusted p < 0.05). These families, associated with extracellular glucan synthesis and peptidoglycan turnover, may reflect enhanced structural maintenance and biosynthetic activity in the healthy oral microbiome 101. No CAZyme families were significantly enriched in the cancer cohort at the global level, although species-level expansions (e.g., GT8 in S. pseudopneumoniae) were observed. Finally, we compared the gene presence/absence profiles of oral cavity-derived Streptococcus genomes from 76 HNC-associated isolates and 391 healthy-associated isolates to identify genomic and functional differences associated with cancer 26. We performed two complementary gene enrichment analyses: a pooled analysis that compared all isolates irrespective of species, and a species-balanced analysis focused on within-species comparisons across 17 matched species.
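Before fitting its negative binomial model, DESeq2 normalises counts with median-of-ratios size factors. The sketch below reproduces only that normalisation step for a CAZyme count table (sample and family names are hypothetical); dispersion estimation and the Wald tests that produce the adjusted p-values are left to the DESeq2 R package itself.

```python
import math
from statistics import median


def size_factors(counts: dict[str, dict[str, int]]) -> dict[str, float]:
    """Median-of-ratios size factors, as used for DESeq2-style normalisation.

    counts maps sample -> {CAZy family -> gene count}. Families with a zero
    count in any sample are excluded, because their geometric mean is zero.
    """
    samples = list(counts)
    families = set().union(*(counts[s].keys() for s in samples))
    usable = [f for f in sorted(families) if all(counts[s].get(f, 0) > 0 for s in samples)]
    if not usable:
        raise ValueError("no family is non-zero in every sample")
    # Log geometric mean of each family across samples
    log_geo = {f: sum(math.log(counts[s][f]) for s in samples) / len(samples) for f in usable}
    # Size factor = exp(median log ratio of a sample's counts to the geometric means)
    return {s: math.exp(median(math.log(counts[s][f]) - log_geo[f] for f in usable))
            for s in samples}


def normalise(counts: dict[str, dict[str, int]], factors: dict[str, float]) -> dict:
    """Divide each sample's counts by its size factor."""
    return {s: {f: n / factors[s] for f, n in fams.items()} for s, fams in counts.items()}


toy = {"iso1": {"GH70": 10, "GT2": 20, "GT8": 0},
       "iso2": {"GH70": 20, "GT2": 40, "GT8": 3}}
sf = size_factors(toy)
print(sf, normalise(toy, sf))
```

With presence/absence-like CAZyme counts, many families are zero in some genome, so the set of families usable for size factors can be small; this is one reason results from count-based models on such data are best read alongside the presence/absence tests.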
In the pooled analysis, 158 genes exhibited significant differential presence (FDR < 0.05) between cancer and healthy cohorts (Fig. 7A; Supplementary Data S9). HNC-enriched genes were involved in sugar transport and metabolism (lacE, lacF, lacT, malX, glf), thiamine biosynthesis (thiM, thiE) and drug resistance (ermB, mefA, yheS–msrD), and also included mobile genetic element genes (tnp, traG, pezT). Notably, several cancer-enriched loci, such as group_15226 and group_13653, were annotated as hypothetical proteins or unnamed gene clusters, suggesting that uncharacterised genes could play a role in cancer adaptation. In contrast, healthy-enriched genes showed higher frequencies of CRISPR-Cas loci (cas3–cas7, casA, casB), histidine and nucleotide biosynthesis genes (hisA–hisH, bioH, pdxS) and metal ion transporters (acm, crcB, glpG). To dissect the genomic adaptations of Streptococcus isolates in the cancer microenvironment, we performed species-resolved comparisons of gene presence/absence across 17 species with matched cancer and healthy representatives. Fisher's exact tests followed by FDR correction identified 622 significantly differentially enriched genes (adjusted p < 0.1), the majority of which (91.3%) were species-specific, underscoring individualised adaptation patterns (Supplementary Data S9). Five species (S. anginosus, S. gordonii, S. mitis, S. parasanguinis and S. salivarius) showed pronounced gene-level divergence, with multiple genes significantly enriched in either cancer- or health-associated isolates (Supplementary Fig. S8). For example, S. anginosus cancer isolates were enriched for genes of unknown function (group_4022, group_5958 and group_6215), while S. gordonii healthy isolates were enriched in metabolic and regulatory genes such as lacF and bglG. S.
mitis displayed the most extensive divergence, with over 100 genes differentially enriched, including the mobile element tnp_tran5 and stress response regulators such as yoeG, suggesting broad genomic remodelling in the cancer-associated niche. Although most gene differences were species-restricted, a subset of functions showed recurrent enrichment across multiple species. For instance, sprT, a stress-related protease, and narK, a nitrate transporter, were enriched in cancer isolates from different species, suggesting convergent functional adaptation to the tumour microenvironment. In contrast, genes involved in sugar metabolism and quorum regulation were preferentially retained in healthy-associated strains. These results highlight both conserved and species-specific strategies by which Streptococcus adapts to cancer-associated ecological pressures. We performed GO and KEGG pathway enrichment analyses on gene presence/absence data using Fisher's exact test with Benjamini–Hochberg FDR correction (adjusted p < 0.05). When all genes were assessed, no significant pathway-level differences were detected between HNC-associated and healthy-derived Streptococcus isolates in either analysis, suggesting conserved core functionality (Supplementary Fig. S9A–D). However, enrichment analysis restricted to significantly different genes revealed pronounced functional separation. In the cancer-enriched gene set, KEGG pathways were dominated by carbohydrate and energy metabolism, including galactose metabolism, pyruvate metabolism, fructose and mannose metabolism, glycolysis/gluconeogenesis, amino sugar and nucleotide sugar metabolism, the pentose phosphate pathway and carbon metabolism. Additional pathways included lipoic acid metabolism, the phosphotransferase system (PTS) and base excision repair (Fig. 7B).
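For over-representation analysis, a one-sided Fisher's exact test is equivalent to the upper tail of the hypergeometric distribution: given N background genes, K of them in a pathway and n significant genes, it asks how surprising an overlap of k or more is. The sketch assumes that one-sided form, a caller-supplied background gene universe and hypothetical gene-to-pathway mappings; the study's annotation source and test sidedness are not given here.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K in pathway, n significant genes).

    Equivalent to a one-sided (over-representation) Fisher's exact test.
    """
    upper = min(K, n)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / comb(N, n)


def pathway_enrichment(significant: set, background: set, pathways: dict,
                       alpha: float = 0.05) -> list:
    """Return [(pathway, overlap, p, BH-adjusted p)] for pathways passing alpha."""
    sig = significant & background
    N, n = len(background), len(sig)
    rows = []
    for name, genes in pathways.items():
        members = genes & background
        k = len(members & sig)
        rows.append((name, k, hypergeom_upper_tail(k, N, len(members), n)))
    rows.sort(key=lambda r: r[2])
    m, running, adjusted = len(rows), 1.0, [0.0] * len(rows)
    for i in range(m - 1, -1, -1):  # Benjamini-Hochberg step-up from the largest p
        running = min(running, rows[i][2] * m / (i + 1))
        adjusted[i] = running
    return [(name, k, p, q) for (name, k, p), q in zip(rows, adjusted) if q < alpha]


# Hypothetical universe of 20 genes, 5 of them significant
background = {f"g{i}" for i in range(20)}
significant = {f"g{i}" for i in range(5)}
pathways = {"lac": {"g0", "g1", "g2", "g3"}, "other": {"g10", "g11", "g12", "g13"}}
print(pathway_enrichment(significant, background, pathways))
```

The choice of background matters: restricting it to genes observed in the pangenome, rather than all annotated genes, avoids inflating enrichment for pathways that are simply common in Streptococcus.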
GO enrichment similarly highlighted carbohydrate transport and uptake processes, such as “carbohydrate transmembrane transport”, “monosaccharide transport”, “ABC-type carbohydrate transporter activity” and “maltose/oligosaccharide transporter activity” (Fig. 7C). These were complemented by broad response terms such as “cellular response to external stimulus”. In contrast, genes enriched in healthy-associated isolates mapped to biosynthetic and regulatory pathways, including amino acid biosynthesis (e.g., cysteine and methionine, alanine, arginine, glutamate, phenylalanine, histidine), starch and sucrose metabolism, quorum sensing and protein export. GO terms supported this metabolic breadth, with enrichment of “L-histidine biosynthetic process”, “glutamine family amino acid metabolism”, “oxidoreductase activity” and multiple vitamin- and cofactor-related processes (e.g., riboflavin and vitamin B6 transport, CoA biosynthesis). Healthy isolates were also enriched for DNA repair and cell wall-related pathways, including mismatch repair, homologous recombination, and peptidoglycan and teichoic acid biosynthesis.
Discussion
Here, we present a comprehensive dataset of fully resolved genomes of Streptococcus isolates associated with HNC. Our analyses reveal substantial microbial diversity along with specific adaptations in the oral microbiomes of HNC patients. Notably, we identified 101 unique fully resolved genomes across 35 species, including ten putative novel species-level taxa (Strep. BHI_1–10) clustering within the Mitis group (S. mitis, S. oralis and S. infantis), reflecting the genomic fluidity of this group, with high ANI (> 95%) and overlapping pangenomes 102,103. These taxa show features of previously undetected lineages and require further phenotypic validation 103. Our phylogenomic and pangenome analyses demonstrate both taxonomic and functional divergence among HNC-associated Streptococcus. Species such as S. mitis and S.
oralis exhibited extensive accessory genome variation, with some novel strains (e.g., Strep. BHI_6, BHI_11) carrying expanded gene repertoires not observed in reference genomes. This genomic diversity likely reflects microenvironment-specific selective pressures, including differential nutrient availability, host immune surveillance and biofilm spatial organisation, which influence microbial evolution in mucosal niches 104-106. This underscores the need for expanded genomic and phenotypic studies to refine taxonomic classifications and elucidate functional roles in HNC. The genomic analyses also revealed distinct patterns of possible adaptation within the Streptococcus genus. The variable genome sizes (1.7–2.5 Mbp), together with stable GC content (36–43%), suggest that adaptation to the HNC TME occurs through HGT events that expand the accessory genome while maintaining a stable core composition 107,108. Across all 101 isolates spanning 35 species, we identified only 29 genus-level core genes, including galU, which is critical for biofilm formation 109. This emphasises the extensive genetic variability within Streptococcus, with accessory genes likely conferring niche-specific advantages such as enhanced nutrient acquisition or immune evasion 73,109. UMAP clustering and pangenome analyses further showed that species such as S. mitis and S. oralis exhibit substantial genomic flexibility, with low core gene percentages (32–36%), whereas S. sanguinis and S. gordonii maintain more conserved genomes, reflecting ecological stability in oral communities 59,102,110. Within the HNC cohort, distinct carbohydrate-active enzyme (CAZyme) repertoires accumulated differentially across taxa and between tumour- and oral-derived isolates. Tumour-associated isolates of S. mitis and S.
oralis showed significant expansions in glycosyltransferase (GT) and carbohydrate esterase (CE) families, whereas oral-derived isolates retained broader CAZyme diversity, including rare glycoside hydrolases (GHs). This shift in glycan-acquisition strategies may stem from tumour-related modifications of host mucin glycosylation and extracellular matrix remodelling 111,112. The observed CAZyme divergence aligns with previous research linking glycan remodelling to microbiome restructuring in oncogenic contexts 113,114. Our pangenome comparisons between tumour- and oral-derived Streptococcus isolates from HNC patients revealed significant enrichment in tumour-derived strains of genes functionally linked to stress response, metal homeostasis and membrane/capsule biosynthesis. Specifically, the spore coat-associated gene yetF 115, the arsenate reductase arsC 116, the zinc/cadmium efflux pump zntA 117 and membrane- or capsule-associated functions (group_16421, wcwK–cpsJ) 118,119 were over-represented in tumour isolates. Although these genes were originally characterised in non-cancer contexts, their known roles in oxidative stress resistance, metal detoxification and surface polysaccharide modification are highly relevant to known selective pressures in the HNC tumour microenvironment, including hypoxia-induced ROS accumulation, metal ion dysregulation and immune-mediated antimicrobial attack 120-124. These findings suggest that acquisition or retention of such genes may confer a fitness advantage on Streptococci within the tumour niche. In contrast, oral-derived isolates showed increased presence of carbohydrate metabolic genes (treC and putP) 125,126 and oxidative stress resistance genes (marR–mgrA) 127, consistent with the dynamic sugar and oxygen gradients of the oral mucosa 128,129. Using our panel of long-read sequenced and closed genomes, we identified extensive HGT among co-resident Streptococcus strains within individual HNC patients.
ICEs were the predominant vectors, forming 245 distinct clusters across 122 isolates, while prophages (82 clusters from 77 isolates) were generally strain-specific and plasmids were rare (4 clusters from 35 isolates) but distributed across multiple patients. These MGEs collectively carried genes for conjugation, recombination and antibiotic resistance (such as Msr(D) and BcrA), indicating that ICEs play a dominant role in mobilising adaptive traits in HNC-associated Streptococcus 130-132. The relative scarcity of plasmids suggests limited plasmid-mediated exchange, whereas ICEs and certain prophages facilitate broader chromosomal gene movement. The abundance of hypothetical proteins in prophage regions highlights the high proportion of uncharacterised cargo within the HNC Streptococcus mobilome. Comparison of our HNC-associated isolates with 391 publicly available healthy oral Streptococcus genomes revealed cancer-specific genomic patterns. Core genome structure was conserved, whereas accessory gene content differed significantly. Cancer-associated Streptococcus isolates exhibited higher frequencies of genes for sugar transport (e.g., lacE, glf) 133,134, thiamine biosynthesis (thiM, thiE) 133,134 and antimicrobial resistance (e.g., ermB, mefA) 135,136, suggesting metabolic restructuring and optimisation of stress responses. Healthy isolates carried more CRISPR-Cas loci and metal ion homeostasis genes, which may indicate stronger phage defence and environmental sensing 137. These findings suggest that cancer-associated oral dysbiosis is shaped by the selective retention of horizontally acquired genes that enhance stress tolerance, immune modulation and nutrient utilisation within cancer-affected oral environments 113,138. Cancer-enriched genes were predominantly associated with MGEs, particularly ICEs.
These HGT-associated genes, including transporters (e.g., lacF, celB) 139,140 and carbohydrate uptake and fermentation genes (e.g., lacC, bglG) 141,142, likely enhance metabolic versatility under the nutrient-limited or stress-associated conditions common in cancer-affected oral environments 19. Because these genes were observed across multiple isolates rather than concentrated in a single species, they likely reflect functional convergence rather than clonal expansion 143. The prevalence of ICEs further underscores their role in chromosome-based adaptation, although the abundance of unclassified MGEs warrants further characterisation to elucidate their contributions 144,145. This study has several limitations. The analyses were restricted to culturable bacteria, excluding unculturable species. Sequence clustering with CD-HIT at 90% identity and 80% coverage may fail to detect distant homologues and may incorrectly merge mosaic elements. Moreover, this approach cannot infer the directionality of gene transfer. Although long-read sequencing improved assembly contiguity, the detection and complete annotation of plasmids and prophages remain challenging. Potential biases in the healthy cohort comparison arising from differences in batch conditions, geographical origin and sequencing platforms were addressed through bioinformatic methods but could not be completely eliminated. Finally, although differences in gene content were observed between isolates from cancer patients and healthy donors, further validation is required to confirm whether cancer-specific enrichments confer true survival advantages. Nevertheless, this study provides a comprehensive and novel dataset that advances understanding of Streptococcus species in HNC.
Conclusion
This is the first study to demonstrate that Streptococcus isolates from HNC patients exhibit distinct genetic and functional profiles compared with those from healthy individuals.
These differences arise primarily from accessory genome variation, ICE-mediated horizontal gene transfer and species-specific adaptations in carbohydrate metabolism. Together, these findings reveal species-dependent evolutionary strategies that may enable bacterial survival within TMEs, and identify new lineages and MGE-linked functions that may inform microbiome-based diagnostics and therapeutics for HNC.
Declarations
Data Availability Statement
All sequencing data in this paper will be provided upon request.
Author Contributions
L.M.: Conceptualization, Methodology, Investigation, Data Analysis, Writing – Original draft, Writing – review and editing. G.B.: Investigation, Data Analysis, Writing – Original draft, Writing – review and editing. E.B., K.Y.: Investigation, Writing – review and editing. J.H., S.K.: Resources, Writing – review and editing. P.W., R.V., A.P., S.V.: Resources, Writing – review and editing, Supervision, Funding Acquisition. K.F.: Conceptualization, Investigation, Writing – Original draft, Writing – review and editing, Supervision, Project Administration, Funding Acquisition.
Acknowledgments
We would like to thank the medical staff from The Royal Adelaide Hospital and The Memorial Hospital for their assistance in sample collection. The Graphical Abstract was generated using BioRender.
Funding
This work was supported by an HSCGB Ray and Shirl Norman Cancer Research Grant (A.P., K.F., S.V. and R.V.), an NHMRC Investigator Grant APP1196832 (P.W.), a Passe and Williams Senior Fellowship (S.V.), a Cancer Council SA Research Fellowship (K.F.) and a University of Adelaide Research Training Program Scholarship (L.M.).
Conflict of Interests
The authors declare that there are no conflicts of interest.
Ethics statement
Ethics approval (HREC MYIP14116) for the collection and storage of patient samples was granted by the Central Adelaide Local Health Network Human Research Ethics Committee (Adelaide, South Australia) in accordance with the Declaration of Helsinki, and all patients provided signed written informed consent.
References
Barsouk, A., Aluru, J. S., Rawla, P., Saginala, K. & Barsouk, A. Epidemiology, Risk Factors, and Prevention of Head and Neck Squamous Cell Carcinoma. Med Sci (Basel) 11 (2023). https://doi.org/10.3390/medsci11020042
Sung, H. et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 71, 209-249 (2021). https://doi.org/10.3322/caac.21660
Chow, L. Q. M. Head and Neck Cancer. New England Journal of Medicine 382, 60-72 (2020). https://doi.org/10.1056/NEJMra1715715
Irfan, M., Delgado, R. Z. R. & Frias-Lopez, J. The Oral Microbiome and Cancer. Frontiers in Immunology 11 (2020). https://doi.org/10.3389/fimmu.2020.591088
Burcher, K. M. et al. A Review of the Role of Oral Microbiome in the Development, Detection, and Management of Head and Neck Squamous Cell Cancers. Cancers (Basel) 14 (2022). https://doi.org/10.3390/cancers14174116
Dorobisz, K., Dorobisz, T. & Zatoński, T. The Microbiome's Influence on Head and Neck Cancers. Curr Oncol Rep 25, 163-171 (2023). https://doi.org/10.1007/s11912-022-01352-7
Dewhirst, F. E. et al. The human oral microbiome. J Bacteriol 192, 5002-5017 (2010). https://doi.org/10.1128/jb.00542-10
Kilian, M. et al. The oral microbiome – an update for oral healthcare professionals. British Dental Journal 221, 657-666 (2016). https://doi.org/10.1038/sj.bdj.2016.865
Gilbert, J. A. et al. Current understanding of the human microbiome. Nature Medicine 24, 392-400 (2018). https://doi.org/10.1038/nm.4517
Baty, J. J., Stoner, S. N. & Scoffield, J. A. Oral Commensal Streptococci: Gatekeepers of the Oral Cavity. J Bacteriol 204, e0025722 (2022). https://doi.org/10.1128/jb.00257-22
Velsko, I. M. & Warinner, C. Streptococcus abundance and oral site tropism in humans and non-human primates reflects host and lifestyle differences. npj Biofilms and Microbiomes 11, 19 (2025). https://doi.org/10.1038/s41522-024-00642-1
Zhang, Y. et al. Human oral microbiota and its modulation for oral health. Biomedicine & Pharmacotherapy 99, 883-893 (2018). https://doi.org/10.1016/j.biopha.2018.01.146
Bloch, S., Hager-Mair, F. F., Andrukhov, O. & Schäffer, C. Oral streptococci: modulators of health and disease. Frontiers in Cellular and Infection Microbiology 14 (2024). https://doi.org/10.3389/fcimb.2024.1357631
Kreth, J., Giacaman, R. A., Raghavan, R. & Merritt, J. The road less traveled - defining molecular commensalism with Streptococcus sanguinis. Mol Oral Microbiol 32, 181-196 (2017). https://doi.org/10.1111/omi.12170
Ye, D. et al. Competitive dynamics and balance between Streptococcus mutans and commensal streptococci in oral microecology. Crit Rev Microbiol 51, 532-543 (2025). https://doi.org/10.1080/1040841x.2024.2389386
Senthil Kumar, S. et al. Oral streptococci S. anginosus and S. mitis induce distinct morphological, inflammatory, and metabolic signatures in macrophages. Infection and Immunity 92, e00536-23 (2024). https://doi.org/10.1128/iai.00536-23
Tomic, U. et al. Streptococcus mitis and Prevotella melaninogenica Influence Gene Expression Changes in Oral Mucosal Lesions in Periodontitis Patients. Pathogens 12, 1194 (2023).
Narikiyo, M. et al. Frequent and preferential infection of Treponema denticola, Streptococcus mitis, and Streptococcus anginosus in esophageal cancers. Cancer Sci 95, 569-574 (2004). https://doi.org/10.1111/j.1349-7006.2004.tb02488.x
Fu, K. et al. Streptococcus anginosus promotes gastric inflammation, atrophy, and tumorigenesis in mice. Cell 187, 882-896.e17 (2024). https://doi.org/10.1016/j.cell.2024.01.004
Sasaki, H. et al. Presence of Streptococcus anginosus DNA in esophageal cancer, dysplasia of esophagus, and gastric cancer. Cancer Res 58, 2991-2995 (1998).
Sasaki, M. et al. Streptococcus anginosus infection in oral cancer and its infection route. Oral Diseases 11, 151-156 (2005). https://doi.org/10.1111/j.1601-0825.2005.01051.x
Tateda, M. et al. Streptococcus anginosus in head and neck squamous cell carcinoma: implication in carcinogenesis. Int J Mol Med 6, 699-703 (2000). https://doi.org/10.3892/ijmm.6.6.699
Rai, A. K. et al. Dysbiosis of salivary microbiome and cytokines influence oral squamous cell carcinoma through inflammation. Arch Microbiol 203, 137-152 (2021). https://doi.org/10.1007/s00203-020-02011-w
Zhou, L., Fan, S., Zhang, W., Wang, D. & Tang, D. Microbes in the tumor microenvironment: New additions to break the tumor immunotherapy dilemma. Microbiological Research 285, 127777 (2024). https://doi.org/10.1016/j.micres.2024.127777
Li, S. et al. Gut Microbiota and Immune Modulatory Properties of Human Breast Milk Streptococcus salivarius and S. parasanguinis Strains. Front Nutr 9, 798403 (2022). https://doi.org/10.3389/fnut.2022.798403
Li, W. et al. A catalog of bacterial reference genomes from cultivated human oral bacteria. npj Biofilms and Microbiomes 9, 45 (2023). https://doi.org/10.1038/s41522-023-00414-3
Burja, B. et al. An Optimized Tissue Dissociation Protocol for Single-Cell RNA Sequencing Analysis of Fresh and Cultured Human Skin Biopsies. Front Cell Dev Biol 10, 872688 (2022). https://doi.org/10.3389/fcell.2022.872688
Boyanova, L. & Medeiros, J. A. d. S. in ClinMicroNow 1-7.
Bouras, G. et al. Hybracter: enabling scalable, automated, complete and accurate bacterial genome assemblies. Microbial Genomics 10 (2024). https://doi.org/10.1099/mgen.0.001244
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nature Biotechnology 37, 540-546 (2019). https://doi.org/10.1038/s41587-019-0072-8
Bouras, G., Grigson, S., Papudeshi, B., Mallawaarachchi, V. & Roach, M. Dnaapler: A tool to reorient circular microbial genomes. Journal of Open Source Software 9, 5968 (2024). https://doi.org/10.21105/joss.05968
Bouras, G., Sheppard, A. E., Mallawaarachchi, V. & Vreugde, S. Plassembler: an automated bacterial plasmid assembly tool. Bioinformatics 39 (2023). https://doi.org/10.1093/bioinformatics/btad409
Chklovski, A., Parks, D. H., Woodcroft, B. J. & Tyson, G. W. CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nature Methods 20, 1203-1212 (2023). https://doi.org/10.1038/s41592-023-01940-w
Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36, 1925-1927 (2019). https://doi.org/10.1093/bioinformatics/btz848
Schwengers, O. et al. Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification. Microbial Genomics 7 (2021). https://doi.org/10.1099/mgen.0.000685
Tonkin-Hill, G. et al. Producing polished prokaryotic pangenomes with the Panaroo pipeline. Genome Biology 21, 180 (2020). https://doi.org/10.1186/s13059-020-02090-4
Katoh, K., Misawa, K., Kuma, K.-i. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30, 3059-3066 (2002). https://doi.org/10.1093/nar/gkf436
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nature Methods 14, 587-589 (2017). https://doi.org/10.1038/nmeth.4285
Letunic, I. & Bork, P. Interactive Tree of Life (iTOL) v6: recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Research 52, W78-W82 (2024). https://doi.org/10.1093/nar/gkae268
McInnes, L., Healy, J. & Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 (2018).
Kodinariya, T. & Makwana, P. Review on Determining of Cluster in K-means Clustering. International Journal of Advance Research in Computer Science and Management Studies 1, 90-95 (2013).
Drula, E. et al. The carbohydrate-active enzyme database: functions and literature. Nucleic Acids Research 50, D571-D577 (2021). https://doi.org/10.1093/nar/gkab1045
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nature Methods 12, 59-60 (2015). https://doi.org/10.1038/nmeth.3176
Johansson, M. H. K. et al. Detection of mobile genetic elements associated with antibiotic resistance in Salmonella enterica using a newly developed web tool: MobileElementFinder. J Antimicrob Chemother 76, 101-109 (2021). https://doi.org/10.1093/jac/dkaa390
Carattoli, A. et al. In silico detection and typing of plasmids using PlasmidFinder and plasmid multilocus sequence typing. Antimicrob Agents Chemother 58, 3895-3903 (2014). https://doi.org/10.1128/aac.02412-14
Bouras, G. et al. Pharokka: a fast scalable bacteriophage annotation tool. Bioinformatics 39 (2022). https://doi.org/10.1093/bioinformatics/btac776
Nayfach, S. et al. CheckV assesses the quality and completeness of metagenome-assembled viral genomes. Nature Biotechnology 39, 578-585 (2021). https://doi.org/10.1038/s41587-020-00774-7
Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658-1659 (2006). https://doi.org/10.1093/bioinformatics/btl158
Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068-2069 (2014). https://doi.org/10.1093/bioinformatics/btu153
Zhang, Z. J. et al. Comprehensive analyses of a large human gut Bacteroidales culture collection reveal species and strain level diversity and evolution. bioRxiv (2024).
https://doi.org/10.1101/2024.03.08.584156 Sasaki, M. et al. Streptococcus anginosus infection in oral cancer and its infection route. Oral diseases 11 , 151-156 (2005). Yang, C.-Y. et al. Oral Microbiota Community Dynamics Associated With Oral Squamous Cell Carcinoma Staging. Frontiers in Microbiology 9 (2018). https://doi.org/10.3389/fmicb.2018.00862 Marçais, G. et al. MUMmer4: A fast and versatile genome alignment system. PLOS Computational Biology 14 , e1005944 (2018). https://doi.org/10.1371/journal.pcbi.1005944 Jain, C., Rodriguez-R, L. M., Phillippy, A. M., Konstantinidis, K. T. & Aluru, S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nature Communications 9 , 5114 (2018). https://doi.org/10.1038/s41467-018-07641-9 Zou, Y. et al. Common Methods for Phylogenetic Tree Construction and Their Implementation in R. Bioengineering (Basel) 11 (2024). https://doi.org/10.3390/bioengineering11050480 Kilian, M. et al. The oral microbiome–an update for oral healthcare professionals. British dental journal 221 , 657-666 (2016). Belman, S., Chaguza, C., Kumar, N., Lo, S. & Bentley, S. D. A new perspective on ancient Mitis group streptococcal genetics. Microb Genom 8 (2022). https://doi.org/10.1099/mgen.0.000753 Do, T. et al. Population structure of Streptococcus oralis. Microbiology (Reading) 155 , 2593-2602 (2009). https://doi.org/10.1099/mic.0.027284-0 Gao, X.-Y., Zhi, X.-Y., Li, H.-W., Klenk, H.-P. & Li, W.-J. Comparative Genomics of the Bacterial Genus Streptococcus Illuminates Evolutionary Implications of Species Groups. PLOS ONE 9 , e101229 (2014). https://doi.org/10.1371/journal.pone.0101229 Abranches, J. et al. Biology of Oral Streptococci. Microbiol Spectr 6 (2018). https://doi.org/10.1128/microbiolspec.GPP3-0042-2018 Taylor, Z. A., Pham, D. N. & Zeng, L. Systematic analysis of the glucose-PTS in Streptococcus sanguinis highlighted its importance in central metabolism and bacterial fitness. 
Appl Environ Microbiol 91 , e0193524 (2025). https://doi.org/10.1128/aem.01935-24 Zheng, W. et al. Distinct Biological Potential of Streptococcus gordonii and Streptococcus sanguinis Revealed by Comparative Genome Analysis. Scientific Reports 7 , 2949 (2017). https://doi.org/10.1038/s41598-017-02399-4 Gray, T. Streptococcus anginosus group: Clinical significance of an important group of pathogens. Clinical Microbiology Newsletter 27 , 155-159 (2005). https://doi.org/https://doi.org/10.1016/j.clinmicnews.2005.09.006 Sunwoo, B. Y. & Miller, W. T., Jr. Streptococcus anginosus infections: crossing tissue planes. Chest 146 , e121-e125 (2014). https://doi.org/10.1378/chest.13-2791 Zhang, C. et al. Glutamine enhances pneumococcal growth under methionine semi-starvation by elevating intracellular pH. Frontiers in Microbiology Volume 15 - 2024 (2024). https://doi.org/10.3389/fmicb.2024.1430038 Pal, C. et al. in Advances in Microbial Physiology Vol. 70 (ed Robert K. Poole) 261-313 (Academic Press, 2017). Nobbs, A. H., Lamont, R. J. & Jenkinson, H. F. Streptococcus adherence and colonization. Microbiol Mol Biol Rev 73 , 407-450, Table of Contents (2009). https://doi.org/10.1128/mmbr.00014-09 Shelburne, S. A., Davenport, M. T., Keith, D. B. & Musser, J. M. The role of complex carbohydrate catabolism in the pathogenesis of invasive streptococci. Trends Microbiol 16 , 318-325 (2008). https://doi.org/10.1016/j.tim.2008.04.002 Qiao, Y. et al. Lactate metabolism and lactylation in breast cancer: mechanisms and implications. Cancer Metastasis Rev 44 , 48 (2025). https://doi.org/10.1007/s10555-025-10264-4 Jin, P., Wang, L., Chen, D. & Chen, Y. Unveiling the complexity of early childhood caries: Candida albicans and Streptococcus mutans cooperative strategies in carbohydrate metabolism and virulence. J Oral Microbiol 16 , 2339161 (2024). https://doi.org/10.1080/20002297.2024.2339161 Rajasekaran, J. J. et al. Oral Microbiome: A Review of Its Impact on Oral and Systemic Health. 
Microorganisms 12 (2024). https://doi.org/10.3390/microorganisms12091797 Nazaret, F., Alloing, G., Mandon, K. & Frendo, P. MarR Family Transcriptional Regulators and Their Roles in Plant-Interacting Bacteria. Microorganisms 11 (2023). https://doi.org/10.3390/microorganisms11081936 Lemos, J. A. et al. The Biology of Streptococcus mutans. Microbiol Spectr 7 (2019). https://doi.org/10.1128/microbiolspec.GPP3-0051-2018 Durand, E., Oomen, C. & Waksman, G. Biochemical dissection of the ATPase TraB, the VirB4 homologue of the Escherichia coli pKM101 conjugation machinery. J Bacteriol 192 , 2315-2323 (2010). https://doi.org/10.1128/jb.01384-09 Zeng, Y. et al. cpsJ gene of Streptococcus iniae is involved in capsular polysaccharide synthesis and virulence. Antonie van Leeuwenhoek 109 , 1483-1492 (2016). https://doi.org/10.1007/s10482-016-0750-1 Marceau, A. H. Functions of single-strand DNA-binding proteins in DNA replication, recombination, and repair. Methods Mol Biol 922 , 1-21 (2012). https://doi.org/10.1007/978-1-62703-032-8_1 Wang, J. et al. The conserved domain database in 2023. Nucleic Acids Res 51 , D384-d388 (2023). https://doi.org/10.1093/nar/gkac1096 Schröder, G. et al. TraG-like proteins of DNA transfer systems and of the Helicobacter pylori type IV secretion system: inner membrane gate for exported substrates? J Bacteriol 184 , 2767-2779 (2002). https://doi.org/10.1128/jb.184.10.2767-2779.2002 Steen, J. A., Bohlke, N., Vickers, C. E. & Nielsen, L. K. The Trehalose Phosphotransferase System (PTS) in E. coli W Can Transport Low Levels of Sucrose that Are Sufficient to Facilitate Induction of the csc Sucrose Catabolism Operon. PLOS ONE 9 , e88688 (2014). https://doi.org/10.1371/journal.pone.0088688 Lehman, M. K. et al. Proline transporters ProT and PutP are required for Staphylococcus aureus infection. PLOS Pathogens 19 , e1011098 (2023). https://doi.org/10.1371/journal.ppat.1011098 Stasi, R., Neves, H. I. & Spira, B. 
Phosphate uptake by the phosphonate transport system PhnCDE. BMC Microbiology 19 , 79 (2019). https://doi.org/10.1186/s12866-019-1445-3 Zhang, J. et al. Structure of glycerol dehydrogenase (GldA) from Escherichia coli. Acta Crystallogr F Struct Biol Commun 75 , 176-183 (2019). https://doi.org/10.1107/s2053230x19000037 Wilkinson, M. et al. Structure of the DNA-Bound Spacer Capture Complex of a Type II CRISPR-Cas System. Molecular Cell 75 , 90-101.e105 (2019). https://doi.org/10.1016/j.molcel.2019.04.020 Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P. M. & Henrissat, B. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res 42 , D490-495 (2014). https://doi.org/10.1093/nar/gkt1178 Muramatsu, M. K. & Winter, S. E. Nutrient acquisition strategies by gut microbes. Cell Host Microbe 32 , 863-874 (2024). https://doi.org/10.1016/j.chom.2024.05.011 Andreassen, P. R. et al. Host-glycan metabolism is regulated by a species-conserved two-component system in Streptococcus pneumoniae. PLoS Pathog 16 , e1008332 (2020). https://doi.org/10.1371/journal.ppat.1008332 Zhao, S., Peralta, R. M., Avina-Ochoa, N., Delgoffe, G. M. & Kaech, S. M. Metabolic regulation of T cells in the tumor microenvironment by nutrient availability and diet. Semin Immunol 52 , 101485 (2021). https://doi.org/10.1016/j.smim.2021.101485 Michalska, K. et al. GH1-family 6-P-β-glucosidases from human microbiome lactic acid bacteria. Acta Crystallogr D Biol Crystallogr 69 , 451-463 (2013). https://doi.org/10.1107/s0907444912049608 Stam, M. R., Danchin, E. G., Rancurel, C., Coutinho, P. M. & Henrissat, B. Dividing the large glycoside hydrolase family 13 into subfamilies: towards improved functional annotations of alpha-amylase-related proteins. Protein Eng Des Sel 19 , 555-562 (2006). https://doi.org/10.1093/protein/gzl044 Wohlkönig, A., Huet, J., Looze, Y. & Wintjens, R. 
Structural relationships in the lysozyme superfamily: significant evidence for glycoside hydrolase signature motifs. PLoS One 5 , e15388 (2010). https://doi.org/10.1371/journal.pone.0015388 Alshareef, S. A. Metabolic analysis of the CAZy class glycosyltransferases in rhizospheric soil fungiome of the plant species Moringa oleifera. Saudi J Biol Sci 31 , 103956 (2024). https://doi.org/10.1016/j.sjbs.2024.103956 Møller, M. S., Henriksen, A. & Svensson, B. Structure and function of α-glucan debranching enzymes. Cell Mol Life Sci 73 , 2619-2641 (2016). https://doi.org/10.1007/s00018-016-2241-y Akcapinar, G. B., Kappel, L., Sezerman, O. U. & Seidl-Seiboth, V. Molecular diversity of LysM carbohydrate-binding motifs in fungi. Curr Genet 61 , 103-113 (2015). https://doi.org/10.1007/s00294-014-0471-9 van Wyk, N., Drancourt, M., Henrissat, B. & Kremer, L. Current perspectives on the families of glycoside hydrolases of Mycobacterium tuberculosis: their importance and prospects for assigning function to unknowns. Glycobiology 27 , 112-122 (2017). https://doi.org/10.1093/glycob/cww099 Abbott, D. W. & van Bueren, A. L. Using structure to inform carbohydrate binding module function. Curr Opin Struct Biol 28 , 32-40 (2014). https://doi.org/10.1016/j.sbi.2014.07.004 Kato, K. & Ishiwa, A. The role of carbohydrates in infection strategies of enteric pathogens. Trop Med Health 43 , 41-52 (2015). https://doi.org/10.2149/tmh.2014-25 Tailford, L. E., Crost, E. H., Kavanaugh, D. & Juge, N. Mucin glycan foraging in the human gut microbiome. Front Genet 6 , 81 (2015). https://doi.org/10.3389/fgene.2015.00081 Homann, N., Jousimies-Somer, H., Jokelainen, K., Heine, R. & Salaspuro, M. High acetaldehyde levels in saliva after ethanol consumption: methodological aspects and pathogenetic implications. Carcinogenesis 18 , 1739-1743 (1997). https://doi.org/10.1093/carcin/18.9.1739 Zhang, Q., Ma, Q., Wang, Y., Wu, H. & Zou, J. 
Molecular mechanisms of inhibiting glucosyltransferases for biofilm formation in Streptococcus mutans. Int J Oral Sci 13 , 30 (2021). https://doi.org/10.1038/s41368-021-00137-1 Qin, X., Chen, X. & Li, Q. The evolution and profile of AA10 lytic polysaccharide monooxygenase coupled with cellulose decomposition in different composting microenvironments . (2023). Onyango, S. O., Juma, J., De Paepe, K. & Van de Wiele, T. Oral and Gut Microbial Carbohydrate-Active Enzymes Landscape in Health and Disease. Frontiers in Microbiology Volume 12 - 2021 (2021). https://doi.org/10.3389/fmicb.2021.653448 Jensen, A., Scholz, C. F. P. & Kilian, M. Re-evaluation of the taxonomy of the Mitis group of the genus Streptococcus based on whole genome phylogenetic analyses, and proposed reclassification of Streptococcus dentisani as Streptococcus oralis subsp. dentisani comb. nov., Streptococcus tigurinus as Streptococcus oralis subsp. tigurinus comb. nov., and Streptococcus oligofermentans as a later synonym of Streptococcus cristatus. International Journal of Systematic and Evolutionary Microbiology 66 , 4803-4820 (2016). https://doi.org/https://doi.org/10.1099/ijsem.0.001433 Parks, D. H. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology 36 , 996-1004 (2018). https://doi.org/10.1038/nbt.4229 Huang, Y., Zhao, X., Cui, L. & Huang, S. Metagenomic and Metatranscriptomic Insight Into Oral Biofilms in Periodontitis and Related Systemic Diseases. Frontiers in Microbiology Volume 12 - 2021 (2021). https://doi.org/10.3389/fmicb.2021.728585 Mark Welch, J. L., Rossetti, B. J., Rieken, C. W., Dewhirst, F. E. & Borisy, G. G. Biogeography of a human oral microbiome at the micron scale. Proceedings of the National Academy of Sciences 113 , E791-E800 (2016). https://doi.org/doi:10.1073/pnas.1522149113 Montanari, E. et al. 
Biofilm formation by the host microbiota: a protective shield against immunity and its implication in cancer. Mol Cancer 24 , 148 (2025). https://doi.org/10.1186/s12943-025-02348-0 Crestani, C. et al. Genomic and functional determinants of host spectrum in Group B Streptococcus. PLoS Pathog 20 , e1012400 (2024). https://doi.org/10.1371/journal.ppat.1012400 Lassalle, F. et al. GC-Content evolution in bacterial genomes: the biased gene conversion hypothesis expands. PLoS Genet 11 , e1004941 (2015). https://doi.org/10.1371/journal.pgen.1004941 Guo, L., Dai, H., Feng, S. & Zhao, Y. Contribution of GalU to biofilm formation, motility, antibiotic and serum resistance, and pathogenicity of Salmonella Typhimurium. Front Cell Infect Microbiol 13 , 1149541 (2023). https://doi.org/10.3389/fcimb.2023.1149541 Olson, A. B. et al. Phylogenetic relationship and virulence inference of Streptococcus Anginosus Group: curated annotation and whole-genome comparative analysis support distinct species designation. BMC Genomics 14 , 895 (2013). https://doi.org/10.1186/1471-2164-14-895 Pinho, S. S. & Reis, C. A. Glycosylation in cancer: mechanisms and clinical implications. Nature Reviews Cancer 15 , 540-555 (2015). https://doi.org/10.1038/nrc3982 Crouch, L. I. et al. The role of glycans in health and disease: Regulators of the interaction between gut microbiota and host immune system. Semin Immunol 73 , 101891 (2024). https://doi.org/10.1016/j.smim.2024.101891 Flynn, K. J., Baxter, N. T. & Schloss, P. D. Metabolic and Community Synergy of Oral Bacteria in Colorectal Cancer. mSphere 1 (2016). https://doi.org/10.1128/mSphere.00102-16 Garrett, W. S. Cancer and the microbiota. Science 348 , 80-86 (2015). https://doi.org/10.1126/science.aaa4972 Yu, B. et al. Identification and characterization of new proteins crucial for bacterial spore resistance and germination. Frontiers in Microbiology Volume 14 - 2023 (2023). https://doi.org/10.3389/fmicb.2023.1161604 Jackson, C. R. & Dugas, S. L. 
Phylogenetic analysis of bacterial and archaeal arsC gene sequences suggests an ancient, common origin for arsenate reductase. BMC Evolutionary Biology 3 , 18 (2003). https://doi.org/10.1186/1471-2148-3-18 Bui, H. B. & Inaba, K. Structures, Mechanisms, and Physiological Functions of Zinc Transporters in Different Biological Kingdoms. Int J Mol Sci 25 (2024). https://doi.org/10.3390/ijms25053045 Howell, K. J. et al. Gene Content and Diversity of the Loci Encoding Biosynthesis of Capsular Polysaccharides of the 15 Serovar Reference Strains of Haemophilus parasuis. Journal of Bacteriology 195 , 4264-4273 (2013). https://doi.org/doi:10.1128/jb.00471-13 Chua, W.-Z. et al. High-Throughput Mutagenesis and Cross-Complementation Experiments Reveal Substrate Preference and Critical Residues of the Capsule Transporters in Streptococcus pneumoniae. mBio 12 (2021). https://doi.org/10.1128/mBio.02615-21 Kedlaya Herga, S. et al. Streptococcus spp. in oral cancer: host-microbe interactions, mechanistic insights, and diagnostic implications. Frontiers in Cellular and Infection Microbiology Volume 15 - 2025 (2025). https://doi.org/10.3389/fcimb.2025.1688701 Hong, Q., Ding, S., Xing, C. & Mu, Z. Advances in tumor immune microenvironment of head and neck squamous cell carcinoma: A review of literature. Medicine (Baltimore) 103 , e37387 (2024). https://doi.org/10.1097/md.0000000000037387 Ahmad, S. et al. Oral Microbiome as a Biomarker and Therapeutic Target in Head and Neck Cancer: Current Insights and Future Directions. Cancers (Basel) 17 (2025). https://doi.org/10.3390/cancers17162667 Chen, G., Wu, K., Li, H., Xia, D. & He, T. Role of hypoxia in the tumor microenvironment and targeted therapy. Frontiers in Oncology Volume 12 - 2022 (2022). https://doi.org/10.3389/fonc.2022.961637 Aboelella, N. S., Brandle, C., Kim, T., Ding, Z. C. & Zhou, G. Oxidative Stress in the Tumor Microenvironment and Its Relevance to Cancer Immunotherapy. Cancers (Basel) 13 (2021). 
https://doi.org/10.3390/cancers13050986 Park, M., Mitchell, W. J. & Rafii, F. Effect of Trehalose and Trehalose Transport on the Tolerance of Clostridium perfringens to Environmental Stress in a Wild Type Strain and Its Fluoroquinolone-Resistant Mutant. International Journal of Microbiology 2016 , 4829716 (2016). https://doi.org/https://doi.org/10.1155/2016/4829716 Zhou, Y., Zhu, W., Bellur, P. S., Rewinkel, D. & Becker, D. F. Direct linking of metabolism and gene expression in the proline utilization A protein from Escherichia coli. Amino Acids 35 , 711-718 (2008). https://doi.org/10.1007/s00726-008-0053-6 Grove, A. MarR family transcription factors. Current Biology 23 , R142-R143 (2013). https://doi.org/10.1016/j.cub.2013.01.013 Chen, T. et al. The Human Oral Microbiome Database: a web accessible resource for investigating oral microbe taxonomic and genomic information. Database (Oxford) 2010 , baq013 (2010). https://doi.org/10.1093/database/baq013 Lamont, R. J., Koo, H. & Hajishengallis, G. The oral microbiota: dynamic communities and host interactions. Nature Reviews Microbiology 16 , 745-759 (2018). https://doi.org/10.1038/s41579-018-0089-x Lee, E. et al. Genomic analysis of conjugative and chromosomally integrated mobile genetic elements in oral streptococci. Applied and Environmental Microbiology 90 , e01360-01324 (2024). https://doi.org/doi:10.1128/aem.01360-24 Gauntlett, J. C. et al. Molecular Analysis of BcrR, a Membrane-bound Bacitracin Sensor and DNA-binding Protein from Enterococcus faecalis*. Journal of Biological Chemistry 283 , 8591-8600 (2008). https://doi.org/https://doi.org/10.1074/jbc.M709503200 Zhang, Y. et al. Predominant role of msr(D) over mef(A) in macrolide resistance in Streptococcus pyogenes. Microbiology (Reading) 162 , 46-52 (2016). https://doi.org/10.1099/mic.0.000206 Alpert, C. A. & Siebers, U. The lac operon of Lactobacillus casei contains lacT, a gene coding for a protein of the Bg1G family of transcriptional antiterminators. 
J Bacteriol 179 , 1555-1562 (1997). https://doi.org/10.1128/jb.179.5.1555-1562.1997 Yoshida, A. & Kuramitsu, H. K. Multiple Streptococcus mutans Genes Are Involved in Biofilm Formation. Appl Environ Microbiol 68 , 6283-6291 (2002). https://doi.org/10.1128/aem.68.12.6283-6291.2002 Du, Q., Wang, H. & Xie, J. Thiamin (vitamin B1) biosynthesis and regulation: a rich source of antimicrobial drug targets? Int J Biol Sci 7 , 41-52 (2011). https://doi.org/10.7150/ijbs.7.41 Jurgenson, C. T., Begley, T. P. & Ealick, S. E. The structural and biochemical foundations of thiamin biosynthesis. Annu Rev Biochem 78 , 569-603 (2009). https://doi.org/10.1146/annurev.biochem.78.072407.102340 Jiang, F. & Doudna, J. A. The structural biology of CRISPR-Cas systems. Curr Opin Struct Biol 30 , 100-111 (2015). https://doi.org/10.1016/j.sbi.2015.02.002 Jobin, C. Precision medicine using microbiota. Science 359 , 32-34 (2018). https://doi.org/10.1126/science.aar2946 de Vos, W. M., Boerrigter, I., van Rooyen, R. J., Reiche, B. & Hengstenberg, W. Characterization of the lactose-specific enzymes of the phosphotransferase system in Lactococcus lactis. J Biol Chem 265 , 22554-22560 (1990). Zhao, J., Liang, Y., Zhang, S. & Xu, Z. Effect of sugar transporter on galactose utilization in Streptococcus thermophilus. Frontiers in Microbiology Volume 14 - 2023 (2023). https://doi.org/10.3389/fmicb.2023.1267237 Afzal, M., Shafeeq, S. & Kuipers, O. P. LacR is a repressor of lacABCD and LacT is an activator of lacTFEG, constituting the lac gene cluster in Streptococcus pneumoniae. Appl Environ Microbiol 80 , 5349-5358 (2014). https://doi.org/10.1128/aem.01370-14 Marasco, R., Muscariello, L., Varcamonti, M., De Felice, M. & Sacco, M. Expression of the bglH gene of Lactobacillus plantarum is controlled by carbon catabolite repression. J Bacteriol 180 , 3400-3404 (1998). https://doi.org/10.1128/jb.180.13.3400-3404.1998 Croucher, N. J. et al. 
Horizontal DNA Transfer Mechanisms of Bacteria as Weapons of Intragenomic Conflict. PLOS Biology 14 , e1002394 (2016). https://doi.org/10.1371/journal.pbio.1002394 Vale, F. F., Lehours, P. & Yamaoka, Y. Editorial: The Role of Mobile Genetic Elements in Bacterial Evolution and Their Adaptability. Front Microbiol 13 , 849667 (2022). https://doi.org/10.3389/fmicb.2022.849667 Yang, Q. et al. Integrative and conjugative elements in streptococci can act as vectors for plasmids and translocatable units integrated via IS1216E. Int J Antimicrob Agents 61 , 106793 (2023). https://doi.org/10.1016/j.ijantimicag.2023.106793 Additional Declarations No competing interests reported. Supplementary Files SupDataS1Patientclinicalmetadata.xlsx SupDataS2Isolatemetadatasequencingqualitymetrics.xlsx SupDataS3Genomicandpangenomesummary.xlsx SupDataS4CAZymefunctionalanalysisofHNCcancerisolates.xlsx SupDataS5Geneenrichmentanddifferentialfrequencyanalysis.xlsx SupDataS6HGTanalysisandgenomicannotations.xls SupDataS7GenomesizeandGCcontentanalysiscancervshealthyisolates.xlsx SupDataS8CAZymefunctionalanddifferentialanalysiscancervshealthy.xlsx SupDataS9Functionalenrichmentanddifferentialgeneanalysiscancervshealthy.xlsx SupplementaryDatalist.docx MaiStrepGenomicsSuppFigures.docx GraphicalAbstract.png Cite Share Download PDF Status: Posted Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. 
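Supplementary Data S5 reports the gene enrichment and differential frequency analysis. The Figure 3 legend states the rule used to call niche-differential genes: a prevalence difference greater than 0.3 between tumour- and oral-derived isolates, together with a raw (uncorrected) p < 0.05 from Fisher's exact test. The Python sketch below shows one way to apply such a filter to per-gene presence counts, for example counts tallied from a Panaroo gene presence–absence matrix. It is not the authors' pipeline. The function names, the input layout and the use of an absolute prevalence difference are assumptions made for illustration. The two-sided p-value follows the standard definition: it sums the hypergeometric probabilities of every 2×2 table with the same margins that is no more probable than the observed table.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are niches (tumour, oral); columns are gene present / absent.
    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The p-value sums the probabilities of every table that
    is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance lets floating-point ties count as "as extreme".
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def niche_differential_genes(counts, n_tumour, n_oral, min_diff=0.3, alpha=0.05):
    """Return (gene, prevalence difference, raw p) for genes passing both cut-offs.

    counts: hypothetical mapping gene -> (isolates carrying the gene among
    tumour isolates, isolates carrying the gene among oral isolates).
    No multiple-testing correction is applied, matching "raw p" in the legend.
    """
    hits = []
    for gene, (t, o) in counts.items():
        diff = t / n_tumour - o / n_oral
        p = fisher_exact_two_sided(t, n_tumour - t, o, n_oral - o)
        if abs(diff) > min_diff and p < alpha:
            hits.append((gene, diff, p))
    return hits


if __name__ == "__main__":
    # Hypothetical counts for 10 tumour and 10 oral isolates.
    example = {"geneA": (9, 2), "geneB": (5, 5), "geneC": (6, 2)}
    for gene, diff, p in niche_differential_genes(example, 10, 10):
        print(f"{gene}\tdiff={diff:+.2f}\tp={p:.4f}")
```

With these hypothetical counts, only geneA passes. It is present in 9 of 10 tumour isolates and 2 of 10 oral isolates (difference 0.7, p ≈ 0.0055). geneC clears the 0.3 prevalence cut-off but fails the significance threshold (p ≈ 0.17). This shows why the legend applies both criteria together.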
{"props":{"pageProps":{"initialData":{"identity":"rs-8400357","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":568984570,"identity":"c2844c0c-1e25-40d4-870c-f0471a8e855f","order_by":0,"name":"Linh Mai","email":"","orcid":"","institution":"University of Adelaide","correspondingAuthor":false,"prefix":"","firstName":"Linh","middleName":"","lastName":"Mai","suffix":""},{"id":568984571,"identity":"4a597601-9a3a-472c-8ea1-d52f08932769","order_by":1,"name":"George Bouras","email":"","orcid":"","institution":"University of Adelaide","correspondingAuthor":false,"prefix":"","firstName":"George","middleName":"","lastName":"Bouras","suffix":""},{"id":568984572,"identity":"4be4905a-3fdf-4db2-bde0-e2ab7ab87f7e","order_by":2,"name":"Kenny Yeo","email":"","orcid":"","institution":"University of Adelaide","correspondingAuthor":false,"prefix":"","firstName":"Kenny","middleName":"","lastName":"Yeo","suffix":""},{"id":568984573,"identity":"89177031-a0a0-4208-9e86-93546bd64975","order_by":3,"name":"John-Charles Hodge","email":"","orcid":"","institution":"Royal Adelaide Hospital","correspondingAuthor":false,"prefix":"","firstName":"John-Charles","middleName":"","lastName":"Hodge","suffix":""},{"id":568984574,"identity":"6695892b-7042-43a0-b3e4-c7f759220c3c","order_by":4,"name":"Emma Barry","email":"","orcid":"","institution":"University of Adelaide","correspondingAuthor":false,"prefix":"","firstName":"Emma","middleName":"","lastName":"Barry","suffix":""},{"id":568984575,"identity":"07a64df0-4804-4fc4-8d38-c39e44a05987","order_by":5,"name":"Suren
Krishnan","email":"","orcid":"","institution":"Royal Adelaide Hospital","correspondingAuthor":false,"prefix":"","firstName":"Suren","middleName":"","lastName":"Krishnan","suffix":""},{"id":568984576,"identity":"832e8ffa-be19-4124-928f-c2a834d3592f","order_by":6,"name":"Peter-John Wormald","email":"","orcid":"","institution":"University of Adelaide","correspondingAuthor":false,"prefix":"","firstName":"Peter-John","middleName":"","lastName":"Wormald","suffix":""},{"id":568984577,"identity":"6f8040b7-497b-4ffb-89ff-bd4abc6eeecb","order_by":7,"name":"Rowan Valentine","email":"","orcid":"","institution":"Queen Elizabeth Hospital","correspondingAuthor":false,"prefix":"","firstName":"Rowan","middleName":"","lastName":"Valentine","suffix":""},{"id":568984578,"identity":"dbd39cb1-91b0-4ac3-b20c-310d820c22f2","order_by":8,"name":"Alkis Psaltis","email":"","orcid":"","institution":"University of Adelaide","correspondingAuthor":false,"prefix":"","firstName":"Alkis","middleName":"","lastName":"Psaltis","suffix":""},{"id":568984579,"identity":"304949bb-ecab-4c56-8702-d31c617c8a75","order_by":9,"name":"Sarah Vreugde","email":"","orcid":"","institution":"University of Adelaide","correspondingAuthor":false,"prefix":"","firstName":"Sarah","middleName":"","lastName":"Vreugde","suffix":""},{"id":568984580,"identity":"b3525cf9-6c6f-4292-96e6-ca908dca7bdb","order_by":10,"name":"Kevin Fenix","email":"","orcid":"","institution":"University of
Adelaide","correspondingAuthor":true,"prefix":"","firstName":"Kevin","middleName":"","lastName":"Fenix","suffix":""}],"badges":[],"createdAt":"2025-12-19 03:53:10","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-8400357/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8400357/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":99858937,"identity":"9a8278b6-0007-4c1f-a189-ba5a15a1cfd3","added_by":"auto","created_at":"2026-01-09 06:36:38","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":326273,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePhylogenetic Relationships and Genomic Characteristics of 101 HNC-Associated \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eStreptococcus\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e Isolates\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A) Phylogenetic distance tree of 101 unique \u003cem\u003eStreptococcus\u003c/em\u003e isolates derived from head and neck cancer (HNC) patients, constructed using concatenated single-copy core gene sequences.\u003cbr\u003e\n(B) Genome sizes of \u003cem\u003eStreptococcus\u003c/em\u003e isolates, ordered from largest to smallest.\u003cbr\u003e\n(C) GC content (%) of the same isolates, aligned by genome size as in (B).\u003c/p\u003e\n\u003cp\u003eSeveral unclassified or novel isolates clustered within known species groups, suggesting close evolutionary relationships or potential misassignments of species. For example, \u003cem\u003eS. BHI_1\u003c/em\u003e, \u003cem\u003eS. BHI_2\u003c/em\u003e, \u003cem\u003eS. BHI_4\u003c/em\u003e, \u003cem\u003eS. BHI_6\u003c/em\u003e, \u003cem\u003eS. BHI_8\u003c/em\u003e, \u003cem\u003eS. BHI_9\u003c/em\u003e, \u003cem\u003eS. BHI_11\u003c/em\u003e, and \u003cem\u003eS. sp002238115\u003c/em\u003e were all positioned within the \u003cem\u003eS. mitis\u003c/em\u003e clade. Similarly, \u003cem\u003eS. 
sp029691405\u003c/em\u003e and \u003cem\u003eS. BHI_10\u003c/em\u003e grouped within the \u003cem\u003eS. oralis\u003c/em\u003e clade, while \u003cem\u003eS. sp013394695\u003c/em\u003e aligned with \u003cem\u003eS. infantis\u003c/em\u003e, and \u003cem\u003eS. sp001813105\u003c/em\u003e and \u003cem\u003eS. caecimuris\u003c/em\u003e fell within the \u003cem\u003eS. parasanguinis\u003c/em\u003e clade. These patterns reflect the well-documented genomic fluidity among members of the mitis group, which includes \u003cem\u003eS. mitis\u003c/em\u003e, \u003cem\u003eS. oralis\u003c/em\u003e, and \u003cem\u003eS. infantis\u003c/em\u003e; these species frequently exceed 95% ANI and exhibit overlapping pangenomes\u003csup\u003e56\u003c/sup\u003e. The phylogenetic proximity of several \u003cem\u003eS. BHI\u003c/em\u003e strains to \u003cem\u003eS. mitis\u003c/em\u003e likely reflects the presence of conserved core regions that obscure clear species demarcation. \u003cem\u003eS. mitis\u003c/em\u003e is particularly known for its expansive and genetically diverse pangenome, which may encompass emerging or cryptic lineages\u003csup\u003e57,58\u003c/sup\u003e. The clustering of GTDB-designated “sp.” taxa (e.g., \u003cem\u003eS.
sp013394695\u003c/em\u003e) within established species groups further supports the possibility of novel or misclassified lineages that merit re-evaluation with expanded genome datasets and phenotypic characterisation.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-8400357/v1/16acfd9116ab32b9865b7f7b.png"},{"id":99858935,"identity":"f0b57822-5db0-4dc5-b54e-de0272540227","added_by":"auto","created_at":"2026-01-09 06:36:38","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":233172,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eGenomic Clustering and Pangenome Structure of HNC-Associated \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eStreptococcus\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e Isolates\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A) UMAP projection of 101 \u003cem\u003eStreptococcus\u003c/em\u003e genomes from HNC patients based on the presence or absence of protein-coding genes. Clustering was performed using Panaroo-derived gene presence–absence matrices, revealing seven major clusters that correspond to known \u003cem\u003eStreptococcus\u003c/em\u003e clades. Each point represents one isolate, coloured by species and shaped by cluster membership.\u003c/p\u003e\n\u003cp\u003e(B) Stacked bar plot of gene counts per isolate, categorised into genus-level core genes (orange; shared by all isolates), species-level core genes (green; shared within species), and accessory genes (blue; variable or strain-specific).
Species are shown on the x-axis; total gene counts are plotted on the y-axis.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-8400357/v1/85bbd96e943d89bff786648d.png"},{"id":100358180,"identity":"476586e5-f91d-472c-be90-c0ec94a8c61e","added_by":"auto","created_at":"2026-01-16 07:20:42","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":122219,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eMicroenvironment-Driven Gene Enrichment in Tumour- and Oral-Derived \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eStreptococcus\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A) Heatmap showing the presence frequencies of 13 genes that differed in prevalence between tumour- and oral-derived \u003cem\u003eStreptococcus\u003c/em\u003e isolates. Genes were selected based on a frequency difference \u0026gt; 0.3 and raw \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.05 (Fisher’s exact test). Warmer colours indicate higher prevalence within a niche.\u003c/p\u003e\n\u003cp\u003e(B) Top recurrent niche-enriched genes identified across species. 
Bars represent the number of species in which each gene was significantly enriched in either tumour (orange) or oral (blue) isolates.\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-8400357/v1/95af9216d14bff618cf86fd3.png"},{"id":100358159,"identity":"8b1f04c3-843c-465d-9cb6-70f06c6347b1","added_by":"auto","created_at":"2026-01-16 07:20:40","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":307046,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eDiversity and distribution of CAZyme families and PULs in cancer-associated \u003cem\u003eStreptococcus\u003c/em\u003e isolates.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A) Total number of each CAZyme class per isolate across 101 \u003cem\u003eStreptococcus\u003c/em\u003e genomes. Bars are ordered by decreasing glycoside hydrolase (GH) counts, the most abundant CAZy class.\u003c/p\u003e\n\u003cp\u003e(B) CAZy class composition per species, shown as proportional bar plots of six major classes: GH, glycosyltransferases (GT), carbohydrate-binding modules (CBM), carbohydrate esterases (CE), polysaccharide lyases (PL), and auxiliary activities (AA).\u003c/p\u003e\n\u003cp\u003e(C) Mean number of unique CAZyme families per species (± SD), indicating variation in repertoire size across lineages.\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-8400357/v1/7fb623308b407e3ef2914775.png"},{"id":100357898,"identity":"b26e4aef-2b80-4cbf-867c-9d6fbcec1784","added_by":"auto","created_at":"2026-01-16 07:20:28","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":199928,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eHorizontal Gene Transfer (HGT) Dynamics and Functional Implications in HNC-Associated
\u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eStreptococcus\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A) Intra-patient sharing of ICEs, prophages and plasmids across 132 oral streptococcal isolates. Each bar represents a patient, showing the number of shared clusters among co-colonising isolates.\u003c/p\u003e\n\u003cp\u003e(B) Functional composition of horizontally transferred genes stratified by species. Top species (by number of HGT genes) are shown, with bars coloured by functional category.\u003c/p\u003e\n\u003cp\u003e(C) Overall functional categories of MGE-associated genes across all isolates. Counts are shown for non-hypothetical genes only.\u003c/p\u003e\n\u003cp\u003e(D) Functional composition by MGE type. Bars represent the within-type proportion of genes annotated to recombination, transposition, conjugation, AMR, replication, phage, or metabolic/transport functions.\u003c/p\u003e\n\u003cp\u003e(E) Top 20 gene products most frequently observed in HGT segments, stratified by MGE type.\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-8400357/v1/cf6ee9d4bbbcf0d50bb8310d.png"},{"id":99858940,"identity":"d06a2352-2d30-4089-ad39-daf4d11230f6","added_by":"auto","created_at":"2026-01-09 06:36:39","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":125817,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eGenomic features and gene repertoire differences between cancer-associated and healthy oral \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eStreptococcus\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e isolates.\u003c/strong\u003e Genome size (left) and GC content (right) of \u003cem\u003eStreptococcus\u003c/em\u003e isolates from oral samples of HNC patients (n = 76) and healthy individuals (n = 391).
Points represent individual genomes, grouped by source cohort. Species-level differences were assessed using two-sided Wilcoxon rank-sum tests.\u003c/p\u003e","description":"","filename":"6.png","url":"https://assets-eu.researchsquare.com/files/rs-8400357/v1/8165bf194607ff69ac61075a.png"},{"id":99858942,"identity":"2f46f819-e98a-4a11-a375-407da978a728","added_by":"auto","created_at":"2026-01-09 06:36:39","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":183264,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFunctional and Genomic Differences in Streptococcus Isolates Between Cancer and Healthy Groups\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A) Volcano plot illustrating gene presence–absence differences between cancer-associated and healthy \u003cem\u003eStreptococcus\u003c/em\u003e isolates. Significance was assessed using Fisher’s exact test followed by Benjamini–Hochberg FDR correction.\u003c/p\u003e\n\u003cp\u003e(B) KEGG pathway enrichment analysis of differentially enriched genes based on presence/absence data. Enrichment was tested using Fisher’s exact test with FDR adjustment.\u003c/p\u003e\n\u003cp\u003e(C) Gene Ontology (GO) enrichment analysis (biological process and molecular function) of significantly different genes between cohorts. 
Statistical significance was evaluated using Fisher’s exact test with Benjamini–Hochberg correction.\u003c/p\u003e","description":"","filename":"7.png","url":"https://assets-eu.researchsquare.com/files/rs-8400357/v1/760a3fa921f18b2af66ee4b3.png"},{"id":100377116,"identity":"c43710fe-a8db-40a5-a207-45ed151eb70f","added_by":"auto","created_at":"2026-01-16 08:47:08","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":3983920,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8400357/v1/15984d20-d556-45cc-a8b1-166f24559e0b.pdf"},{"id":99858938,"identity":"07fcd0f5-cb62-44ef-b21a-dca2c9de2ba8","added_by":"auto","created_at":"2026-01-09 06:36:38","extension":"xlsx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":13118,"visible":true,"origin":"","legend":"","description":"","filename":"SupDataS1Patientclinicalmetadata.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-8400357/v1/8a40a54c8cb257a8f35909a6.xlsx"},{"id":100358120,"identity":"e80dab30-6cc9-4caa-9986-2f9f1ef0932b","added_by":"auto","created_at":"2026-01-16 07:20:39","extension":"xlsx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":82406,"visible":true,"origin":"","legend":"","description":"","filename":"SupDataS2Isolatemetadatasequencingqualitymetrics.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-8400357/v1/47110064a82b507b45c2cde2.xlsx"},{"id":99858944,"identity":"c8e0fb92-2f78-446d-a751-facdcfd417b4","added_by":"auto","created_at":"2026-01-09 
06:36:39","extension":"xlsx","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":3827576,"visible":true,"origin":"","legend":"","description":"","filename":"SupDataS3Genomicandpangenomesummary.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-8400357/v1/4efa227f86c33a4345909f2d.xlsx"},{"id":100357646,"identity":"8e6c97bb-e482-4e89-ae81-5892168ebbe3","added_by":"auto","created_at":"2026-01-16 07:20:08","extension":"xlsx","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":592434,"visible":true,"origin":"","legend":"","description":"","filename":"SupDataS4CAZymefunctionalanalysisofHNCcancerisolates.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-8400357/v1/9d9a80d6d5f0b24503123ec6.xlsx"},{"id":99858952,"identity":"56c5a163-8634-4c4d-8e33-08c0815fadb3","added_by":"auto","created_at":"2026-01-09 06:36:39","extension":"xlsx","order_by":5,"title":"","display":"","copyAsset":false,"role":"supplement","size":12921937,"visible":true,"origin":"","legend":"","description":"","filename":"SupDataS5Geneenrichmentanddifferentialfrequencyanalysis.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-8400357/v1/bbca652d729f4ec9e2b58e7d.xlsx"},{"id":99858947,"identity":"b9de2c3f-6735-43d4-84a9-fad9bac26c2f","added_by":"auto","created_at":"2026-01-09 06:36:39","extension":"xls","order_by":6,"title":"","display":"","copyAsset":false,"role":"supplement","size":3285504,"visible":true,"origin":"","legend":"","description":"","filename":"SupDataS6HGTanalysisandgenomicannotations.xls","url":"https://assets-eu.researchsquare.com/files/rs-8400357/v1/44401ca3aeb3ea0315a5f4e2.xls"},{"id":100357335,"identity":"b49be80e-68d9-4d0b-b45c-c32003603694","added_by":"auto","created_at":"2026-01-16 
07:19:41","extension":"xlsx","order_by":7,"title":"","display":"","copyAsset":false,"role":"supplement","size":33652,"visible":true,"origin":"","legend":"","description":"","filename":"SupDataS7GenomesizeandGCcontentanalysiscancervshealthyisolates.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-8400357/v1/5548018cd8165f79d3baf70e.xlsx"},{"id":99858951,"identity":"71de47a1-284b-43b3-8dd0-bacb73adc9d0","added_by":"auto","created_at":"2026-01-09 06:36:39","extension":"xlsx","order_by":8,"title":"","display":"","copyAsset":false,"role":"supplement","size":1565968,"visible":true,"origin":"","legend":"","description":"","filename":"SupDataS8CAZymefunctionalanddifferentialanalysiscancervshealthy.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-8400357/v1/cb93f8569669a792874218c4.xlsx"},{"id":100357706,"identity":"798c265c-f48d-41ea-bedb-e27245ef6b2e","added_by":"auto","created_at":"2026-01-16 07:20:14","extension":"xlsx","order_by":9,"title":"","display":"","copyAsset":false,"role":"supplement","size":225036,"visible":true,"origin":"","legend":"","description":"","filename":"SupDataS9Functionalenrichmentanddifferentialgeneanalysiscancervshealthy.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-8400357/v1/d322db9239df7d067b67f505.xlsx"},{"id":99858948,"identity":"b43d9212-a68d-4919-88d3-5e59374046ca","added_by":"auto","created_at":"2026-01-09 06:36:39","extension":"docx","order_by":10,"title":"","display":"","copyAsset":false,"role":"supplement","size":25900,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryDatalist.docx","url":"https://assets-eu.researchsquare.com/files/rs-8400357/v1/8dcf058a3ff70c24e4cb25b9.docx"},{"id":100357737,"identity":"4cf6d4a4-3b62-45fc-b2ff-a67dc95aa230","added_by":"auto","created_at":"2026-01-16 
07:20:16","extension":"docx","order_by":11,"title":"","display":"","copyAsset":false,"role":"supplement","size":4307202,"visible":true,"origin":"","legend":"","description":"","filename":"MaiStrepGenomicsSuppFigures.docx","url":"https://assets-eu.researchsquare.com/files/rs-8400357/v1/515b0ed7b0efa971a8ec4aef.docx"},{"id":100358032,"identity":"055a87e1-4e12-4e46-ad73-fb6afb582074","added_by":"auto","created_at":"2026-01-16 07:20:35","extension":"png","order_by":12,"title":"","display":"","copyAsset":false,"role":"supplement","size":273388,"visible":true,"origin":"","legend":"","description":"","filename":"GraphicalAbstract.png","url":"https://assets-eu.researchsquare.com/files/rs-8400357/v1/c48918537538dcfd876790ea.png"}],"financialInterests":"No competing interests reported.","formattedTitle":"The Genomic Landscape of Head and Neck Cancer-associated Streptococci","fulltext":[{"header":"Highlights","content":"\u003cul\u003e\n \u003cli\u003eIn-depth analyses of 101 complete \u003cem\u003eStreptococcus\u003c/em\u003e genomes from head and neck cancer patients.\u003c/li\u003e\n \u003cli\u003eGenomic analysis reveals extensive diversity within the accessory genome.\u003c/li\u003e\n \u003cli\u003eEvidence of intra- and inter-species horizontal gene transfer.\u003c/li\u003e\n \u003cli\u003eDistinct genomic features in HNC-associated isolates compared with isolates from healthy donors.\u003c/li\u003e\n\u003c/ul\u003e"},{"header":"Introduction","content":"\u003cp\u003eHead and neck cancer (HNC) comprises tumours of the oral cavity, pharynx and larynx and causes 450,000 deaths worldwide each year\u003csup\u003e1-3\u003c/sup\u003e.
Tobacco and alcohol consumption and human papillomavirus (HPV) infection are well-established risk factors for HNC; however, recent studies highlight the contribution of the local microbiota to tumour behaviour and treatment response\u003csup\u003e4-6\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eThe mucosal surfaces of the oral cavity support more than 700 bacterial species\u003csup\u003e7-9\u003c/sup\u003e, with \u003cem\u003eStreptococcus\u003c/em\u003e predominating and accounting for roughly 20–50% of the healthy oral microbiota\u003csup\u003e8,10-12\u003c/sup\u003e. These bacteria not only colonise the oral cavity but also play an important role in human health. Commensal \u003cem\u003eStreptococcus\u003c/em\u003e species, such as \u003cem\u003eS. mitis, S. sanguinis, S. gordonii,\u003c/em\u003e and \u003cem\u003eS. salivarius\u003c/em\u003e, occupy ecological niches within the oral cavity and produce bacteriocins that suppress pathogen outgrowth\u003csup\u003e10,13-15\u003c/sup\u003e. However, certain \u003cem\u003eStreptococcus\u003c/em\u003e species can display pathogenic behaviour under conditions of microbial dysbiosis or immune compromise, such as within the tumour microenvironment (TME)\u003csup\u003e16,17\u003c/sup\u003e. For instance, \u003cem\u003eS. anginosus\u003c/em\u003e has been associated with oral and oesophageal cancers, while \u003cem\u003eS. mitis\u003c/em\u003e and \u003cem\u003eS. parasanguinis\u003c/em\u003e have been shown to modulate immune signalling within the TME\u003csup\u003e16,18-25\u003c/sup\u003e.
Collectively, these findings suggest that the pathogenic potential of commensal \u003cem\u003eStreptococcus\u003c/em\u003e species can be unmasked by alterations in host or environmental conditions.\u003c/p\u003e\n\u003cp\u003eDespite their abundance and intimate association with oral epithelial surfaces, the precise mechanisms by which commensal \u003cem\u003eStreptococcus\u003c/em\u003e species contribute to the development of HNC remain poorly understood. Although certain \u003cem\u003eStreptococcus\u003c/em\u003e species have been implicated in carcinogenesis, it is unclear whether tumour-associated isolates acquire genetic adaptations that facilitate survival within the TME and influence tumour progression. Microbial whole-genome sequencing (WGS) has enabled investigation of the genomic adaptations that give rise to lineage-specific traits and functional capacities affecting health and disease outcomes\u003csup\u003e26\u003c/sup\u003e. While previous studies have used WGS to characterise the genomic composition of \u003cem\u003eStreptococcus\u003c/em\u003e isolates from healthy oral microbiomes\u003csup\u003e26\u003c/sup\u003e, to the best of our knowledge, no study has yet comprehensively analysed the genomes of HNC-associated \u003cem\u003eStreptococcus\u003c/em\u003e strains.\u003c/p\u003e\n\u003cp\u003eIn this study, we performed a genome-resolved analysis of \u003cem\u003eStreptococcus\u003c/em\u003e isolates obtained from the tumours and oral cavities of HNC patients.
By integrating species-level classification with pangenome profiling and functional annotation, we identified lineage-specific traits and accessory gene functions that may underlie ecological adaptation and tumour colonisation.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cp\u003e\u003cstrong\u003eStudy Population and Sample Collection.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eMicrobial isolates were collected from 31 HNC patients at three hospitals in Adelaide, Australia, between 2021 and 2022 (\u003cstrong\u003eSupplementary Data S1\u003c/strong\u003e). Before surgery, all participants provided written informed consent in accordance with the ethical guidelines of the Central Adelaide Local Health Network’s ethics committee (CALHN Ref No. 14116). All samples were coded and anonymised before use.\u003c/p\u003e\n\u003cp\u003eSamples included tumour tissue harvested from resected cancer lesions and oral cavity swabs collected from the buccal mucosa and adjacent tissues. All samples were promptly transported to the laboratory in sterile containers on ice and processed on the day of collection to preserve microbial viability. Tumour tissue was held in complete DMEM medium with 10% foetal bovine serum (FBS) until bacterial isolation.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eBacterial Isolation from Tumour Tissues\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTissues were rinsed three times with DMEM (10% FBS) to remove blood and then minced with a blade to increase surface area. Minced tissues were digested with collagenase following a standard protocol to degrade the extracellular matrix and release microbial communities\u003csup\u003e27\u003c/sup\u003e.
Briefly, tissue was incubated in collagenase IV digestion solution (3\u0026nbsp;ml of sterile phosphate-buffered saline (PBS) containing collagenase IV, 30\u0026nbsp;μl of 100\u0026nbsp;mM CaCl\u003csub\u003e2\u003c/sub\u003e and 90\u0026nbsp;μl of 1% BSA; final collagenase concentration 1000\u0026nbsp;U/ml) at 37°C for 60 min with continuous shaking at 100\u0026nbsp;rpm. After incubation, the tumour digest was filtered through a 40 µm cell strainer to remove large tissue fragments. The microbe-enriched filtrate was centrifuged at 3000 × g for 10 minutes to pellet bacterial cells. The bacterial pellet was then resuspended in PBS and streaked onto Remel Wilkins-Chalgren Agar (WCA) plates\u003csup\u003e28\u003c/sup\u003e. Plates were incubated in an anaerobic chamber at 37°C for 24 to 48 hours. Single colonies were selected based on morphology and subcultured until pure cultures were obtained. Isolated bacterial strains were cryostored at –80°C in 20% (v/v) glycerol.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAerobic Bacterial Isolation from Oral Swabs.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWithin 2 hours of collection, oral cavity swabs from the 31 HNC patients were inoculated directly onto a range of selective and non-selective agar media, including tryptic soy agar (TSA), Luria-Bertani agar (LBA), sheep blood agar (SBA), nutrient agar (NA) and brain heart infusion (BHI) agar. Plates were incubated aerobically at 37°C for 16 to 24 hours. After incubation, single colonies were selected based on morphological characteristics and subcultured onto fresh plates. Once a pure isolate was confirmed, bacteria were stored at –80°C in 20% (v/v) glycerol for further analysis.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDNA Extraction and Sequencing\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eBacterial isolates were cultured overnight, and cells were pelleted by centrifugation at 3000 × g for 10 minutes.
Genomic DNA was extracted using the DNeasy Blood \u0026amp; Tissue Kit (Qiagen, Hilden, Germany) following the manufacturer’s protocol. The concentration and purity of extracted DNA were assessed using a NanoDrop spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA). Samples with an A260/A280 ratio between 1.8 and 2.0 were considered suitable for downstream applications.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eWhole Genome Sequencing - Long Read Sequencing (LRS)\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eExtracted DNA from all isolates was sequenced on the MinION platform using R9.4.1 flow cells (Oxford Nanopore Technologies, Oxford, UK). Library preparation was performed using the Rapid Barcoding Kit 96 (SQK-RBK110.96; Oxford Nanopore Technologies Ltd.) according to the manufacturer’s instructions. DNA concentrations were measured using the Qubit 4 Fluorometer with the dsDNA High Sensitivity Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA). Base calling was conducted with Guppy (v6.2.11) in super accuracy mode using the dna_r9.4.1_450bps_sup.cfg configuration file.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eWhole Genome Sequencing - Short Read Sequencing (SRS)\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFollowing taxonomic identification using ONT LRS, isolates belonging to the \u003cem\u003eStreptococcus\u003c/em\u003e genus were selected for SRS-WGS through the Australian Genome Research Facility (AGRF). Genomic DNA was enzymatically fragmented, and sequencing libraries were prepared using the Nextera XT DNA Library Prep Kit with unique dual indices added via low-cycle PCR. Libraries were purified, quantified, and manually normalised before sequencing. Paired-end sequencing (2 × 150 bp) was performed on the Illumina NovaSeq 6000 platform using S4 flow cell chemistry. Image analysis and base calling were performed in real time using NovaSeq Control Software (NCS v1.2.0.28691) and Real-Time Analysis (RTA v4.6.7).
Raw base call (BCL) files were converted to FASTQ format using the Illumina DRAGEN BCL Convert pipeline (v4.0.3). All sequencing data met AGRF quality control standards. Final read quality was evaluated using FastQC (v0.11.9).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAssembly, Annotation and Taxonomic Classification\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eGenome assemblies were generated using Hybracter v0.7.3, with `hybracter long` used if only long reads were available and `hybracter hybrid` used if both long and short reads were available\u003csup\u003e29\u003c/sup\u003e. Specifically, Flye v2.9.3 was used to assemble the chromosomes, and Dnaapler v0.7.0 was used to reorient them consistently so that all genomes began with the \u003cem\u003ednaA\u003c/em\u003e gene\u003csup\u003e30,31\u003c/sup\u003e. Plassembler v1.5.1 was used to assemble all plasmids\u003csup\u003e32\u003c/sup\u003e. Genome completeness was checked with CheckM2 v1.0.2\u003csup\u003e33\u003c/sup\u003e. Taxonomic assignment was conducted with GTDB-Tk v2.4.0 using the `gtdbtk classify_wf` command\u003csup\u003e34\u003c/sup\u003e. Genomes were then annotated with Bakta v1.9.4 using the full Bakta database\u003csup\u003e35\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCore genome phylogenetic reconstruction\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003ePhylogenetic relationships among the 101 dereplicated \u003cem\u003eStreptococcus\u003c/em\u003e isolates were inferred from the concatenated alignment of core genes identified by Panaroo v1.3.0 in strict mode (--clean-mode strict; --core_threshold 0.99)\u003csup\u003e36\u003c/sup\u003e. Coding sequences for each core gene cluster were aligned using MAFFT v7.505, as implemented in the panaroo-msa utility, and concatenated into a single core gene alignment\u003csup\u003e37\u003c/sup\u003e.
The resulting multiple sequence alignment was analysed with IQ-TREE v3.0.1 under the best-fit substitution model selected by ModelFinder (-m MFP)\u003csup\u003e38\u003c/sup\u003e. Branch support was assessed using 1,000 ultrafast bootstrap replicates (-bb 1000) and 1,000 SH-aLRT replicates (-alrt 1000). The final maximum likelihood tree was visualised and annotated using the iTOL v6 online platform (https://itol.embl.de), with tip labels and colour coding based on species-level taxonomy (GTDB assignments) and anatomical source (oral cavity or tumour)\u003csup\u003e39\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eGene content-based clustering and UMAP visualization\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo assess genomic relationships based on accessory gene content, a binary matrix representing the presence/absence of protein-coding genes was generated from the Panaroo pangenome analysis. Uniform Manifold Approximation and Projection (UMAP) was applied for dimensionality reduction and visualization. K-means clustering was then used to define isolate clusters. The optimal number of clusters (k) was determined with the elbow method by plotting the within-cluster sum of squares (WSS) for k = 1 to 15. A sharp inflection point at k = 7 indicated the optimal number of clusters, consistent with standard elbow-method heuristics\u003csup\u003e40,41\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCarbohydrate-active enzyme (CAZyme) annotation\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eCarbohydrate-active enzymes (CAZymes) were identified to characterise the glycan degradation, synthesis, and binding capacities of \u003cem\u003eStreptococcus\u003c/em\u003e isolates\u003csup\u003e42\u003c/sup\u003e. For functional annotation of CAZyme families, protein sequences from each genome were queried against the CAZy database using DIAMOND (v2.0.13.151) in blastp mode with default parameters\u003csup\u003e43\u003c/sup\u003e.
CAZyme gene family counts were quantified per isolate, and species-level profiles were generated by summing gene counts across isolates. These data were used to compare glycan utilisation potential between species and to investigate associations with tumour-associated ecological adaptation.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eNiche-specific and species-resolved gene enrichment analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe compared gene presence/absence profiles between \u003cem\u003eStreptococcus\u003c/em\u003e isolates from tumour and oral cavity niches across the HNC cohort to identify niche-enriched genes. Gene presence frequencies were calculated for each niche, and Fisher’s exact tests were performed on 2×2 contingency tables for each gene. For the whole-cohort analysis, genes were retained if they showed an absolute frequency difference \u0026gt; 0.3 between niches and a raw p \u0026lt; 0.05, with a minimum prevalence of 10% in at least one niche. Gene functional annotations were retrieved from genome annotation outputs and curated using UniProtKB for known protein names and functional domains.\u003c/p\u003e\n\u003cp\u003eFor the species-resolved recurrent enrichment analysis, enrichment tests were performed separately for each species with ≥ 2 tumour and ≥ 2 oral isolates. Genes were classified as tumour-enriched or oral-enriched within a species if they had FDR-adjusted p \u0026lt; 0.1 (Benjamini–Hochberg correction) and were significantly depleted in the opposite niche. The number of species in which each gene was enriched was counted, and genes enriched in ≥ 2 species were considered “recurrent” in that niche. Functional categories for recurrent genes were assigned based on annotation databases and literature review.
Visualisations were generated in R using the ggplot2 package.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eIdentification of mobile genetic elements and horizontal transfer\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe identification of potential horizontal gene transfer (HGT) events between \u003cem\u003eStreptococcus\u003c/em\u003e isolates from the same patient required a multi-step approach that integrated mobile genetic element (MGE) annotation with sequence clustering and functional analysis.\u0026nbsp;\u0026nbsp;First, genomic assemblies were annotated for MGEs using dedicated tools:\u0026nbsp;\u003cstrong\u003eMobileElementFinder\u003c/strong\u003e\u003cstrong\u003e\u0026nbsp;v1.0.3\u003c/strong\u003e\u003cstrong\u003e\u003csup\u003e44\u003c/sup\u003e\u003c/strong\u003e to detect integrative and conjugative elements (ICEs),\u0026nbsp;\u003cstrong\u003ePlasmidFinder\u003c/strong\u003e\u003cstrong\u003e\u0026nbsp;v2.1\u003c/strong\u003e \u003cstrong\u003e/MOB-suite\u003c/strong\u003e\u003cstrong\u003e\u0026nbsp;v3.0.3\u003c/strong\u003e\u003cstrong\u003e\u003csup\u003e45\u003c/sup\u003e\u003c/strong\u003e to identify plasmids, and\u0026nbsp;\u003cstrong\u003ePharokka\u003c/strong\u003e\u003cstrong\u003e\u0026nbsp;v1.3.0\u003c/strong\u003e\u003cstrong\u003e\u003csup\u003e46\u003c/sup\u003e\u003c/strong\u003e\u003cstrong\u003e\u0026nbsp;with CheckV\u003c/strong\u003e\u003cstrong\u003e\u0026nbsp;v1.0.1\u003c/strong\u003e\u003cstrong\u003e\u003csup\u003e47\u003c/sup\u003e\u003c/strong\u003e to characterize prophages. 
Predicted MGE sequences from all isolates were then clustered with\u0026nbsp;\u003cstrong\u003eCD-HIT\u003c/strong\u003e\u003cstrong\u003e\u003csup\u003e48\u003c/sup\u003e\u003c/strong\u003e at ≥90% identity and ≥80% coverage to define homologous MGE clusters shared across isolates. Next, cluster assignments were mapped to patient identifiers, and an HGT event was defined as the same MGE cluster ID occurring in two or more co-colonising isolates from the same patient. This enabled quantification of within-patient sharing by MGE type (ICE, plasmid, prophage) and detection of cross-patient dissemination. Open reading frames within each MGE cluster were annotated with\u0026nbsp;\u003cstrong\u003eProkka\u003c/strong\u003e\u003cstrong\u003e\u003csup\u003e49\u003c/sup\u003e\u003c/strong\u003e, and predicted functions were grouped into broad categories, including recombination, conjugation, antimicrobial resistance (AMR), and phage-related functions. Functional distributions were visualised in\u0026nbsp;\u003cstrong\u003eR\u003c/strong\u003e using\u0026nbsp;\u003cstrong\u003eggplot2\u003c/strong\u003e, while networks of shared MGEs were constructed with\u0026nbsp;\u003cstrong\u003eigraph/ggraph\u003c/strong\u003e, with nodes representing isolates and edges weighted by the number of shared clusters.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eStatistical analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eData were analysed using R (v4.2.2) and Python (v3.9). For microbial isolate prevalence, differences in \u003cem\u003eStreptococcus\u003c/em\u003e abundance between HNC patients and controls were assessed using Fisher’s exact test, with a significance threshold of p \u0026lt; 0.05. Genomic uniqueness was assessed by pairwise Average Nucleotide Identity (ANI) comparisons with FastANI (v1.33); isolates were considered clonal if they shared ANI \u0026gt; 99.9%, \u0026lt;1,000 SNPs and \u0026lt;75 InDels\u003csup\u003e50\u003c/sup\u003e.
Phylogenetic trees were constructed using mastree and visualised with iTOL. Genome size and GC content differences across species were evaluated using one-way ANOVA, followed by Tukey’s post-hoc test for pairwise comparisons. Functional trait distributions, including CAZyme counts and phage proteins, were compared between tumour- and oral-derived isolates using Wilcoxon rank-sum tests within species. Pathway and functional-category enrichment was assessed against the Gene Ontology (GO) and KEGG databases using hypergeometric tests with Benjamini–Hochberg correction (FDR \u0026lt; 0.05). Gene presence was compared between conditions and niches using two-tailed Fisher’s exact tests (p \u0026lt; 0.05). All graphical outputs (volcano plots, network visualisations, UMAP projections, and functional summaries) were generated in R using the ggplot2, igraph, and ggraph packages.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e\u003cstrong\u003eIsolation, identification and dereplication of HNC-associated \u003cem\u003eStreptococci.\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe analysed bacterial isolates collected from \u003cstrong\u003e31 HNC patients\u003c/strong\u003e, with a mean age of \u003cstrong\u003e69 years\u003c/strong\u003e (range: 44–91). The cohort consisted of \u003cstrong\u003e26 males and 5 females\u003c/strong\u003e (\u003cstrong\u003eSupplementary Data S1\u003c/strong\u003e). Patients presented with a broad spectrum of tumour sites, including the \u003cstrong\u003eoropharynx, larynx, oral tongue, floor of mouth, buccal mucosa, gingiva/alveolar ridge, sinonasal cavity, and salivary glands\u003c/strong\u003e, spanning \u003cstrong\u003eStage I to Stage IVb\u003c/strong\u003e disease where staging was documented.
Samples were collected from two sites: (i) \u003cstrong\u003etumour tissue\u003c/strong\u003e, obtained via enzymatic digestion of resected tumours, and (ii) \u003cstrong\u003eoral swabs\u003c/strong\u003e collected from non-tumour oral mucosa.\u003c/p\u003e\n\u003cp\u003eTo investigate the genomic landscape of HNC-associated \u003cem\u003eStreptococcus\u003c/em\u003e, we performed long-read WGS on 388 bacterial isolates using the Oxford Nanopore MinION platform. Following quality control, 318 high-quality assemblies were taxonomically classified using the Genome Taxonomy Database (GTDB), of which \u003cem\u003eStreptococcus\u003c/em\u003e represented the dominant genus, accounting for 143 genomes (50.3%). This aligns with previous studies identifying \u003cem\u003eStreptococcus\u003c/em\u003e as a predominant member of the oral microbiome in both healthy individuals and HNC patients\u003csup\u003e51,52\u003c/sup\u003e. Focusing our analysis on \u003cem\u003eStreptococcus\u003c/em\u003e, we selected a subset of 143 genomes for in-depth WGS-based investigation, prioritising tumour-resident isolates and reducing redundancy across samples. Because multiple isolates were often derived from the same patient and species, we implemented a dereplication step to identify clonal lineages. Pairwise genome comparisons were performed using FastANI and Nucmer\u003csup\u003e53,54\u003c/sup\u003e. Isolates were considered clonally redundant if they shared \u0026gt;99.9% ANI, with \u0026lt;1,000 single-nucleotide polymorphisms (SNPs) and \u0026lt;75 insertion-deletions (InDels). Although these thresholds are relatively permissive, they were chosen to retain intra-patient diversity while minimising analytical redundancy.
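The clonal-redundancy rule stated here combines three thresholds. A minimal sketch of one way to apply it is shown below. It assumes ANI, SNP and InDel counts have already been computed for each isolate pair (for example from FastANI and Nucmer), and it uses a greedy, tumour-first ordering as a hypothetical stand-in for the prioritisation described in the text. PairComparison, is_clonal, dereplicate and tumour_first are illustrative names, not part of any published tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairComparison:
    a: str
    b: str
    ani: float   # percent identity, e.g. parsed from FastANI output
    snps: int    # e.g. counted from a Nucmer alignment
    indels: int


def is_clonal(p, ani_min=99.9, snp_max=1000, indel_max=75):
    """Apply the three redundancy thresholds stated in the text."""
    return p.ani > ani_min and p.snps < snp_max and p.indels < indel_max


def dereplicate(isolates, comparisons, priority=None):
    """Greedy dereplication: visit isolates in priority order and keep each one
    unless it is clonal with an isolate that has already been kept."""
    clonal_pairs = {frozenset((p.a, p.b)) for p in comparisons if is_clonal(p)}
    order = sorted(isolates, key=priority) if priority is not None else list(isolates)
    kept = []
    for iso in order:
        if not any(frozenset((iso, k)) in clonal_pairs for k in kept):
            kept.append(iso)
    return kept


def tumour_first(iso):
    """Hypothetical priority key: tumour isolates ('T...') before oral ones ('O...')."""
    return 0 if iso.startswith("T") else 1


# Hypothetical isolates of one species from one patient
isolates = ["O1", "T1", "T2"]
comparisons = [
    PairComparison("O1", "T1", ani=99.95, snps=120, indels=10),     # clonal
    PairComparison("T1", "T2", ani=99.95, snps=1500, indels=20),    # too many SNPs
    PairComparison("O1", "T2", ani=98.10, snps=20000, indels=400),  # distinct
]
print(dereplicate(isolates, comparisons, priority=tumour_first))  # ['T1', 'T2']
```

Because clonality defined by pairwise thresholds is not necessarily transitive, a greedy pass like this is order-dependent; the tumour-first key makes that dependence explicit rather than accidental.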
Applying these filters, we defined a final dataset of 101 non-redundant \u003cem\u003eStreptococcus\u003c/em\u003e genomes for downstream genomic and functional analyses (\u003cstrong\u003eSupplementary\u0026nbsp;\u003c/strong\u003e\u003cstrong\u003eData S2\u003c/strong\u003e).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ePhylogenetic Relationships of HNC-Associated \u003cem\u003eStreptococci.\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo explore the strain-level diversity and evolutionary relationships of \u003cem\u003eStreptococcus\u003c/em\u003e isolates in HNC patients, we analysed a maximum likelihood phylogenetic tree constructed from 101 dereplicated genomes\u003csup\u003e55\u003c/sup\u003e. These genomes encompassed 35 \u003cem\u003eStreptococcus\u003c/em\u003e species, including 10 putative novel species that had not been previously classified in the GTDB. These novel species are provisionally named \u003cem\u003eS. BHI_1\u003c/em\u003e to \u003cem\u003eS. BHI_10\u003c/em\u003e. The resulting phylogenetic tree (\u003cstrong\u003eFig. 1A\u003c/strong\u003e) illustrates the taxonomic structure of the \u003cem\u003eStreptococcus\u003c/em\u003e population, with branches colour-coded by species and anatomical source (oral cavity or tumour). A high degree of species-level diversity is evident, with distinct clades corresponding to well-characterised taxa such as \u003cem\u003eS. anginosus\u003c/em\u003e, \u003cem\u003eS. mitis\u003c/em\u003e, \u003cem\u003eS. oralis\u003c/em\u003e, and \u003cem\u003eS. salivarius\u003c/em\u003e, consistent with their established abundance in the oral microbiome\u003csup\u003e22\u003c/sup\u003e. Notably, \u003cem\u003eS. mitis\u003c/em\u003e and \u003cem\u003eS. 
parasanguinis\u003c/em\u003e were among the most abundant species, each forming prominent clades.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eSeveral unclassified or novel isolates clustered within known species groups, suggesting close evolutionary relationships or potential species misassignments. For example, \u003cem\u003eS. BHI_1\u003c/em\u003e, \u003cem\u003eS. BHI_2\u003c/em\u003e, \u003cem\u003eS. BHI_4\u003c/em\u003e, \u003cem\u003eS. BHI_6\u003c/em\u003e, \u003cem\u003eS. BHI_8\u003c/em\u003e, \u003cem\u003eS. BHI_9\u003c/em\u003e, \u003cem\u003eS. BHI_11\u003c/em\u003e, and \u003cem\u003eS. sp002238115\u003c/em\u003e were all positioned within the \u003cem\u003eS. mitis\u003c/em\u003e clade. Similarly, \u003cem\u003eS. sp029691405\u003c/em\u003e and \u003cem\u003eS. BHI_10\u003c/em\u003e grouped within the \u003cem\u003eS. oralis\u003c/em\u003e clade, \u003cem\u003eS. sp013394695\u003c/em\u003e aligned with \u003cem\u003eS. infantis\u003c/em\u003e, and \u003cem\u003eS. sp001813105\u003c/em\u003e and \u003cem\u003eS. caecimuris\u003c/em\u003e fell within the \u003cem\u003eS. parasanguinis\u003c/em\u003e clade. These patterns are consistent with the well-documented genomic fluidity among members of the mitis group, which includes \u003cem\u003eS. mitis\u003c/em\u003e, \u003cem\u003eS. oralis\u003c/em\u003e, and \u003cem\u003eS. infantis\u003c/em\u003e, species whose pairwise ANI frequently exceeds 95% and whose pangenomes overlap\u003csup\u003e56\u003c/sup\u003e. The phylogenetic proximity of several \u003cem\u003eS. BHI\u003c/em\u003e strains to \u003cem\u003eS. mitis\u003c/em\u003e likely reflects conserved core regions that obscure clear species demarcation. \u003cem\u003eS. mitis\u003c/em\u003e is particularly known for its expansive and genetically diverse pangenome, which may encompass emerging or cryptic lineages\u003csup\u003e57,58\u003c/sup\u003e. The clustering of GTDB-designated “sp.” taxa (e.g., \u003cem\u003eS.
sp013394695\u003c/em\u003e) within established species groups further supports the possibility of novel or misclassified lineages that merit re-evaluation with expanded genome datasets and phenotypic characterisation.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eGenome size and GC content of HNC-associated \u003cem\u003eStreptococci.\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe analysed the genomic characteristics of the 101 unique \u003cem\u003eStreptococcus\u003c/em\u003e isolates to explore intra- and inter-species diversity. Genome sizes ranged from approximately 1.7 Mbp to 2.5 Mbp, consistent with the known variation within \u003cem\u003eStreptococcus\u003c/em\u003e\u003csup\u003e59\u003c/sup\u003e. While most species showed a narrow range of genome sizes, several, such as \u003cem\u003eS. parasanguinis\u003c/em\u003e, exhibited notable variability, suggesting the presence of strain-specific genetic elements (\u003cstrong\u003eFig. 1B\u003c/strong\u003e). Among the 10 putative novel species, \u003cem\u003eS. BHI_11\u003c/em\u003e, \u003cem\u003eS. BHI_6\u003c/em\u003e, and \u003cem\u003eS. BHI_4\u003c/em\u003e, which were assigned to the \u003cem\u003eS. mitis\u003c/em\u003e group, had genome sizes approaching 2.1 Mbp, comparable to the upper end of typical \u003cem\u003eS. mitis\u003c/em\u003e isolates and consistent with their placement as distinct but closely related taxa.\u003c/p\u003e\n\u003cp\u003eGC content across isolates ranged from 36% to 43%, with most species falling between 39% and 41%, consistent with previously reported profiles for oral \u003cem\u003eStreptococci\u003c/em\u003e\u003csup\u003e56,59\u003c/sup\u003e (\u003cstrong\u003eFig. 1C\u003c/strong\u003e). Although genome size varied, GC content remained relatively stable within species, likely reflecting conserved evolutionary and functional constraints. For example, \u003cem\u003eS.
parasanguinis\u003c/em\u003e exhibited a broad genome size range (2.0–2.3 Mbp) while maintaining a stable GC content of approximately 41%. These patterns suggest that genome expansion does not substantially shift GC composition.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eGenomic Diversity and Functional Composition of \u003cem\u003eStreptococcus\u003c/em\u003e Isolates.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo further elucidate the genomic relationships of the 101 unique \u003cem\u003eStreptococcus\u003c/em\u003e isolates from HNC patients, we performed gene content-based clustering using the presence or absence of protein-coding genes identified through pangenome analysis. K-means clustering identified seven distinct isolate clusters, with the optimal number of clusters (k = 7) determined using the elbow method (\u003cstrong\u003eSupplementary Fig. S1\u003c/strong\u003e). UMAP projection of these high-dimensional gene content profiles into two dimensions showed clear separation of the seven clusters with minimal overlap, reflecting their distinct accessory gene-content profiles (\u003cstrong\u003eFig. 2A\u003c/strong\u003e). These clusters corresponded closely with known \u003cem\u003eStreptococcus\u003c/em\u003e phylogenetic groups, including the Mitis, Sanguinis, Anginosus, and Salivarius groups, reinforcing the concordance between phylogenetic and functional classifications\u003csup\u003e60\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eWithin the UMAP space, isolates clustered tightly with their corresponding species, affirming species-level identity (\u003cstrong\u003eFig. 2A\u003c/strong\u003e). The Mitis group, known for its genomic plasticity, was distributed across three clusters (Clusters 1, 3, and 6).
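To make the elbow criterion concrete, the sketch below computes the within-cluster sum of squares (WCSS) of k-means for a range of k on binary presence/absence profiles; the elbow is the k beyond which WCSS stops falling steeply. It is a simplified, dependency-free illustration: it uses plain Lloyd iterations with a deterministic farthest-first initialisation instead of the random restarts typical of R's kmeans, and the toy profiles are hypothetical. The UMAP projection is not reproduced.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def farthest_first(points, k):
    """Deterministic initialisation: start from the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans_wcss(points, k, max_iter=100):
    """Lloyd's k-means; returns the within-cluster sum of squares (WCSS)."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute centroids; an empty cluster keeps its previous centroid
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def elbow_curve(points, k_values):
    """WCSS for each candidate k."""
    return {k: kmeans_wcss(points, k) for k in k_values}


# Hypothetical presence/absence profiles (rows = isolates, columns = gene families)
profiles = (
    [(1, 1, 1, 0, 0, 0, 0, 0, 0)] * 3
    + [(0, 0, 0, 1, 1, 1, 0, 0, 0)] * 3
    + [(0, 0, 0, 0, 0, 0, 1, 1, 1)] * 3
)
print(elbow_curve(profiles, range(1, 5)))
```

For these three hypothetical groups, WCSS falls from about 18 (k = 1) to 9 (k = 2) and reaches 0 at k = 3, after which it cannot decrease further, so the elbow sits at k = 3.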
Cluster 1 contained \u003cem\u003eS. oralis\u003c/em\u003e, \u003cem\u003eS. BHI_10\u003c/em\u003e, and \u003cem\u003eS. sp029691405\u003c/em\u003e, consistent with their position in the \u003cem\u003eS. oralis\u003c/em\u003e clade of the core genome tree. Cluster 3 included \u003cem\u003eS. infantis\u003c/em\u003e, \u003cem\u003eS. massiliensis\u003c/em\u003e, \u003cem\u003eS. mutans\u003c/em\u003e, and several novel taxa (e.g., \u003cem\u003eS. BHI_3\u003c/em\u003e, \u003cem\u003eS. BHI_7\u003c/em\u003e), while Cluster 6 featured novel mitis-like species (e.g., \u003cem\u003eS. BHI_2\u003c/em\u003e, \u003cem\u003eS. BHI_6\u003c/em\u003e) that grouped phylogenetically with \u003cem\u003eS. mitis\u003c/em\u003e. These findings suggest that Mitis-like species have retained conserved core genes while diversifying in accessory content, as illustrated by cluster-level pangenome comparisons (\u003cstrong\u003eSupplementary Fig. S2A, C, F\u003c/strong\u003e). The Sanguinis group (\u003cem\u003eS. sanguinis\u003c/em\u003e, \u003cem\u003eS. gordonii\u003c/em\u003e, \u003cem\u003eS. cristatus\u003c/em\u003e, \u003cem\u003eS. sinensis\u003c/em\u003e) formed Cluster 2, which was characterised by high core gene content (e.g., 73.8% in \u003cem\u003eS. sanguinis\u003c/em\u003e, 76.7% in \u003cem\u003eS. gordonii\u003c/em\u003e). \u003cem\u003eS. gordonii\u003c/em\u003e and \u003cem\u003eS. sanguinis\u003c/em\u003e harbour open pangenomes and share high overall sequence similarity (\u003cstrong\u003eSupplementary Fig. S2B\u003c/strong\u003e).
Several of their shared core genes are involved in carbohydrate metabolism (such as PTS components, glycolysis, and glycogen metabolism) and oxidative stress responses (such as thioredoxin and peroxide detoxification systems), consistent with metabolic fitness and stress tolerance that may support persistence within oral biofilms\u003csup\u003e61,62\u003c/sup\u003e (\u003cstrong\u003eSupplementary Data S3\u003c/strong\u003e). Cluster 5 included the Anginosus group (\u003cem\u003eS. anginosus\u003c/em\u003e, \u003cem\u003eS. constellatus\u003c/em\u003e, \u003cem\u003eS. hominis\u003c/em\u003e), which are opportunistic pathogens frequently associated with abscess formation and deep tissue invasion\u003csup\u003e21,63,64\u003c/sup\u003e (\u003cstrong\u003eSupplementary Fig. S2E\u003c/strong\u003e).\u003c/p\u003e\n\u003cp\u003eNext, we constructed a pangenome for the 13\u0026nbsp;\u003cem\u003eStreptococcus\u003c/em\u003e species represented by multiple isolates\u0026nbsp;(\u003cstrong\u003eFig. 2B\u003c/strong\u003e). Only 29 genus-level core genes were shared by all isolates, highlighting the extensive genetic variability within this dataset. These genes encoded essential housekeeping functions such as ribosomal proteins (\u003cem\u003erpl/rps\u003c/em\u003e), RNA polymerase subunits (\u003cem\u003erpo\u003c/em\u003e), translation factors (\u003cem\u003einf\u003c/em\u003e), and key components of protein secretion (\u003cem\u003esecY\u003c/em\u003e) and polysaccharide synthesis (\u003cem\u003egalU\u003c/em\u003e) (\u003cstrong\u003eSupplementary Data S3\u003c/strong\u003e). Comparative genomic analysis across these 13 \u003cem\u003eStreptococcus\u003c/em\u003e species revealed substantial interspecies variation in genome architecture and coding potential (\u003cstrong\u003eSupplementary Fig. S3\u003c/strong\u003e; \u003cstrong\u003eSupplementary Data S3\u003c/strong\u003e).
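The genus-level core reported here is a strict core (genes shared by all isolates). The sketch below shows that partition, alongside accessory and singleton genes, for a gene presence/absence table. The toy genomes and gene names are hypothetical, and the core-fraction definition (core genes divided by all gene families) is only one possible reading of the core-content percentages quoted for Cluster 2. Some pangenome tools instead use a relaxed core threshold, such as presence in at least 99% of genomes.

```python
def partition_pangenome(genomes):
    """Split gene families into strict core, accessory and singleton sets.

    genomes: dict mapping genome id -> set of gene-family ids
    (equivalent to reading a presence/absence matrix column by column).
    """
    n = len(genomes)
    all_genes = set().union(*genomes.values())
    counts = {g: sum(g in fams for fams in genomes.values()) for g in all_genes}
    core = {g for g, c in counts.items() if c == n}
    singletons = {g for g, c in counts.items() if c == 1 and n > 1}
    accessory = all_genes - core - singletons
    return {"core": core, "accessory": accessory, "singleton": singletons}


# Hypothetical toy data: three genomes
toy = {
    "g1": {"rpoB", "secY", "galU", "geneX"},
    "g2": {"rpoB", "secY", "galU", "geneY"},
    "g3": {"rpoB", "secY", "geneY", "geneZ"},
}
parts = partition_pangenome(toy)
core_fraction = len(parts["core"]) / len(set().union(*toy.values()))
print({k: sorted(v) for k, v in parts.items()}, round(core_fraction, 3))
```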
Average core gene lengths were relatively consistent across species, ranging from 916 bp in \u003cem\u003eS. xiaochunlingii\u003c/em\u003e to 973 bp in \u003cem\u003eS. salivarius\u003c/em\u003e, suggesting functional conservation of the core genome with only modest lineage-specific variation (\u003cstrong\u003eSupplementary Fig. S3B\u003c/strong\u003e). GC content ranged from 37.99% in \u003cem\u003eS. constellatus\u003c/em\u003e to 43.00% in \u003cem\u003eS. sanguinis\u003c/em\u003e. Although \u003cem\u003eS. sanguinis\u003c/em\u003e (43.00%), \u003cem\u003eS. xiaochunlingii\u003c/em\u003e (42.00%), and \u003cem\u003eS. parasanguinis\u003c/em\u003e (41.73%) had the highest GC content, the overall range was narrow (about 5 percentage points), suggesting that GC bias is unlikely to be a major driver of genomic differentiation in these taxa. Average genome sizes ranged from 1.86 Mbp in \u003cem\u003eS. infantis\u003c/em\u003e to 2.44 Mbp in \u003cem\u003eS. sanguinis\u003c/em\u003e, with relatively large genomes also observed in \u003cem\u003eS. gordonii\u003c/em\u003e, \u003cem\u003eS. salivarius\u003c/em\u003e, and \u003cem\u003eS. BHI_8\u003c/em\u003e, consistent with an expanded coding repertoire.\u003c/p\u003e\n\u003cp\u003eTo account for differences in the number of isolates per species, we performed rarefaction analysis (n = 5 isolates per species, 100 bootstrap replicates) to estimate core genome and pangenome sizes. After rarefaction, \u003cstrong\u003ecore genome sizes ranged from X genes (species A) to Y genes (species B)\u003c/strong\u003e, and \u003cstrong\u003epangenome sizes ranged from X to Y genes\u003c/strong\u003e (\u003cstrong\u003eSupplementary Fig. S3A\u003c/strong\u003e). The persistence of these differences after normalising for sampling effort suggests that the observed variation reflects \u003cstrong\u003ebiological differences among species rather than differences in isolate count\u003c/strong\u003e.
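The rarefaction step can be sketched as repeated subsampling of five genomes per species, recording the core (intersection) and pangenome (union) sizes each time and averaging over 100 replicates. This is a minimal illustration: it assumes each replicate draws five distinct genomes without replacement, which is one reasonable reading of the "bootstrap replicates" described, and the toy species is hypothetical.

```python
import random
from statistics import mean


def rarefied_core_pan(genomes, n=5, replicates=100, seed=1):
    """Mean core and pangenome sizes over random subsets of n genomes.

    genomes: list of gene-family sets for a single species.
    Each replicate samples n distinct genomes (without replacement).
    """
    if len(genomes) < n:
        raise ValueError(f"need at least {n} genomes, got {len(genomes)}")
    rng = random.Random(seed)
    core_sizes, pan_sizes = [], []
    for _ in range(replicates):
        subset = rng.sample(genomes, n)
        core_sizes.append(len(set.intersection(*subset)))  # shared by all n
        pan_sizes.append(len(set.union(*subset)))          # present in any
    return mean(core_sizes), mean(pan_sizes)


# Hypothetical species with six genomes: one shared family plus one unique family each
species_genomes = [{"shared", f"unique_{i}"} for i in range(6)]
print(rarefied_core_pan(species_genomes))  # (1, 6)
```

Fixing the subsample size at five puts species with many and few isolates on the same footing, which is what allows core and pangenome sizes to be compared across species.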
Species such as \u003cem\u003eS. mitis\u003c/em\u003e and \u003cem\u003eS. oralis\u003c/em\u003e have comparatively larger pangenomes and smaller core fractions, consistent with higher genomic flexibility and potential for horizontal gene transfer, whereas species such as \u003cem\u003eS. sanguinis\u003c/em\u003e and \u003cem\u003eS. vestibularis\u003c/em\u003e have more conserved genomic repertoires. Collectively, these results underscore the genomic diversity of \u003cem\u003eStreptococcus\u003c/em\u003e in the HNC microbiome, likely shaped by both ecological adaptation and gene flow.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eGene Enrichment Analysis between Tumour- and Oral-Derived \u003cem\u003eStreptococci\u003c/em\u003e.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe next analysed functional gene differences in the \u003cem\u003eStreptococcus\u003c/em\u003e pangenome between tumour- and oral-derived isolates. Across all 101 \u003cem\u003eStreptococcus\u003c/em\u003e genomes, 13 genes showed significant prevalence shifts between tumour and oral isolates (absolute frequency difference \u0026gt; 0.3; raw \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.05) (\u003cstrong\u003eFig. 3A\u003c/strong\u003e; \u003cstrong\u003eSupplementary Data S5\u003c/strong\u003e).
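The two filters used for the 13 genes (absolute prevalence difference > 0.3 and raw p < 0.05) can be illustrated with a self-contained two-sided Fisher's exact test. The sketch assumes a 2 × 2 table with niche (tumour, oral) as rows and gene presence/absence as columns. It follows the same two-sided convention as R's fisher.test, summing the probabilities of all tables with the observed margins that are no more likely than the observed one. The gene counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are kept
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def prevalence_shift(tum_present, tum_total, oral_present, oral_total,
                     min_diff=0.3, alpha=0.05):
    """Return (tumour minus oral prevalence, raw Fisher p-value, passes both filters)."""
    diff = tum_present / tum_total - oral_present / oral_total
    p = fisher_exact_two_sided(tum_present, tum_total - tum_present,
                               oral_present, oral_total - oral_present)
    return diff, p, abs(diff) > min_diff and p < alpha


# Hypothetical gene: present in 9/10 tumour and 2/10 oral isolates
print(prevalence_shift(9, 10, 2, 10))
```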
Ten genes were enriched in tumour-derived isolates: \u003cem\u003esfcA\u003c/em\u003e (malolactic enzyme), \u003cem\u003eaIM24\u003c/em\u003e (mitochondrial respiration-associated protein), \u003cem\u003egroup_338\u003c/em\u003e (CsbD-like stress response protein), \u003cem\u003eyetF\u003c/em\u003e (membrane protein; DUF421), \u003cem\u003egroup_1767\u003c/em\u003e (DUF3290 domain protein), \u003cem\u003egroup_16421\u003c/em\u003e (YtxH-like protein; Gram-positive signal peptide YSIRK family), \u003cem\u003egroup_11486\u003c/em\u003e (hypothetical protein), \u003cem\u003egroup_1364\u003c/em\u003e (DUF1269 domain protein), \u003cem\u003egroup_2826\u003c/em\u003e (GNAT family N-acetyltransferase), and \u003cem\u003egroup_18096\u003c/em\u003e (\u003cem\u003ezntA\u003c/em\u003e, Zn(II)-translocating P-type ATPase). Many of these genes have been linked to stress tolerance, metal ion transport, membrane-associated functions, and potential adaptation to nutrient-limited or inflammatory tumour microenvironments\u003csup\u003e65-69\u003c/sup\u003e. Three genes were more prevalent in oral isolates: \u003cem\u003egroup_1388\u003c/em\u003e (two-pore domain potassium channel protein), \u003cem\u003egroup_2725\u003c/em\u003e (DUF1304 domain-containing epimerase), and \u003cem\u003emarR–mgrA\u003c/em\u003e (oxidative stress response regulator). These functions are consistent with osmotic balance, carbohydrate metabolism, and colonisation persistence in the oral cavity\u003csup\u003e70-73\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eTo assess whether these niche associations were consistent across species, we performed a species-resolved enrichment analysis in species with matched tumour and oral representation. Across all species, 197 genes were significantly enriched in at least one niche (FDR \u0026lt; 0.1), comprising 146 oral-enriched and 51 tumour-enriched genes.
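For the species-resolved analysis, per-species p-values are corrected for multiple testing before the FDR < 0.1 cut-off is applied, and significant genes are then counted across species to identify recurrent hits (two or more species). A minimal sketch of both steps follows, using the Benjamini–Hochberg procedure, whose adjusted values should match R's p.adjust with method = "BH". The hit table is hypothetical and assumes one q-value per species, gene and niche.

```python
from collections import defaultdict


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def recurrent_genes(hits, fdr=0.1, min_species=2):
    """Genes significant (q < fdr) for the same niche in at least min_species species.

    hits: iterable of (species, gene, enriched_niche, q_value).
    """
    species_by_gene = defaultdict(set)
    for species, gene, niche, q in hits:
        if q < fdr:
            species_by_gene[(gene, niche)].add(species)
    return {key: sorted(sp) for key, sp in species_by_gene.items()
            if len(sp) >= min_species}


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
toy_hits = [  # hypothetical species-level results
    ("S. mitis", "traE", "tumour", 0.02),
    ("S. oralis", "traE", "tumour", 0.08),
    ("S. mitis", "cas1", "oral", 0.20),
]
print(recurrent_genes(toy_hits))  # {('traE', 'tumour'): ['S. mitis', 'S. oralis']}
```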
However, only a small subset recurred across two or more species (\u003cstrong\u003eFig. 3B\u003c/strong\u003e; \u003cstrong\u003eSupplementary Data S5\u003c/strong\u003e). Only five tumour-enriched genes were recurrent: two conjugative transfer components (\u003cem\u003etraE–virB4\u003c/em\u003e, \u003cem\u003etraG–virD4\u003c/em\u003e), a capsule biosynthesis gene (\u003cem\u003ewcwK–cpsJ\u003c/em\u003e), a stress-associated gene (\u003cem\u003eyozG\u003c/em\u003e), and a DNA-binding and repair gene (\u003cem\u003essb\u003c/em\u003e)\u003csup\u003e74-78\u003c/sup\u003e. In contrast, recurrent oral-enriched genes were more numerous and predominantly associated with carbohydrate metabolism and membrane transport (\u003cem\u003etreC\u003c/em\u003e, \u003cem\u003eputP\u003c/em\u003e, \u003cem\u003ephnC\u003c/em\u003e, \u003cem\u003efucO–gldA\u003c/em\u003e, \u003cem\u003ewecB\u003c/em\u003e), capsule biosynthesis (\u003cem\u003ewchO\u003c/em\u003e, \u003cem\u003ewchP\u003c/em\u003e, \u003cem\u003ecpsO–epsJ\u003c/em\u003e), and CRISPR-Cas systems\u0026nbsp;(\u003cem\u003ecas1\u003c/em\u003e, \u003cem\u003ecas2\u003c/em\u003e, \u003cem\u003ecsn2\u003c/em\u003e)\u003csup\u003e75,79-83\u003c/sup\u003e. These findings suggest that although niche-specific functional adaptations are common and largely taxon-restricted, recurrent signals converge on carbohydrate metabolism, membrane transport and capsule biosynthesis, pointing to carbon utilisation and cell-surface glycan biology as major axes of oral–tumour niche specialisation in \u003cem\u003eStreptococcus\u003c/em\u003e.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCAZyme repertoires of HNC-associated \u003cem\u003eStreptococci\u003c/em\u003e.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eGiven the prominence of carbohydrate transport, metabolism and capsule-related functions among niche-associated genes, we next examined whether these differences extend to dedicated glycan-processing machinery.
In oral \u003cem\u003eStreptococci\u003c/em\u003e, carbohydrate-active enzymes (CAZymes) are central to the utilisation of host and dietary glycans, biofilm formation, mucosal adhesion and immune modulation, and have been implicated in adaptation to nutrient-limited or inflamed tumour microenvironments\u003csup\u003e84-87\u003c/sup\u003e.\u0026nbsp;We therefore characterised the CAZyme repertoires of HNC-associated \u003cem\u003eStreptococcus\u003c/em\u003e to define their glycan-utilisation potential and provide a framework for subsequent comparisons with healthy-derived oral isolates.\u003c/p\u003e\n\u003cp\u003eAcross the 101 HNC-associated \u003cem\u003eStreptococcus\u003c/em\u003e genomes, we identified 99 distinct CAZyme families spanning six major functional classes: glycoside hydrolases (GH; \u003cem\u003en\u003c/em\u003e = 55), glycosyltransferases (GT; \u003cem\u003en\u003c/em\u003e = 21), carbohydrate-binding modules (CBM; \u003cem\u003en\u003c/em\u003e = 14), carbohydrate esterases (CE; \u003cem\u003en\u003c/em\u003e = 6), polysaccharide lyases (PL; \u003cem\u003en\u003c/em\u003e = 2), and auxiliary activities (AA; \u003cem\u003en\u003c/em\u003e = 1) (\u003cstrong\u003eFig. 4A\u003c/strong\u003e; \u003cstrong\u003eSupplementary Data S4\u003c/strong\u003e). GHs and GTs dominated the repertoire, accounting for up to 63.8% and 30.4% of the CAZyme families per genome, respectively, while CBMs, CEs, PLs, and AAs occurred at lower frequencies. This class-level distribution was broadly conserved across species, although family-level counts varied substantially (\u003cstrong\u003eFig. 4B\u003c/strong\u003e). For example, \u003cem\u003eS. salivarius\u003c/em\u003e, \u003cem\u003eS. mitis\u003c/em\u003e, and \u003cem\u003eS. parasanguinis\u003c/em\u003e had the highest GT counts, while CE abundance peaked in \u003cem\u003eS. mutans\u003c/em\u003e and \u003cem\u003eS. parasanguinis\u003c/em\u003e. PLs were rare and restricted to \u003cem\u003eS. 
constellatus\u003c/em\u003e, \u003cem\u003eS. anginosus\u003c/em\u003e, \u003cem\u003eS. oralis\u003c/em\u003e, and \u003cem\u003eS. sp029691405\u003c/em\u003e. CBMs were more evenly distributed (~4–5 families/genome), suggesting conserved substrate recognition functions.\u003c/p\u003e\n\u003cp\u003eAlthough no CAZyme family was present in all genomes, several were highly prevalent across species, likely representing a conserved “functional core”. These included GH1 (β-glucosidase)\u003csup\u003e88\u003c/sup\u003e, GH13 subfamilies (e.g., GH13_9, GH13_14, GH13_31)\u003csup\u003e89\u003c/sup\u003e, GH23 and GH25 (peptidoglycan hydrolases)\u003csup\u003e90\u003c/sup\u003e, and GT2/GT4 (polysaccharide synthesis)\u003csup\u003e91\u003c/sup\u003e, as well as CBM48 and CBM50\u003csup\u003e92-94\u003c/sup\u003e, which target glycogen and chitin/peptidoglycan, respectively. These core families likely support essential functions in mucosal colonisation and host glycan processing\u003csup\u003e95-99\u003c/sup\u003e. Conversely, several CAZyme families were rare or species-restricted. GH26 (mannanase) and CBM23 (mannan-binding) occurred only in \u003cem\u003eS. parasanguinis\u003c/em\u003e, GH98 (blood group antigen hydrolase) was detected exclusively in \u003cem\u003eS. BHI_11\u003c/em\u003e, and GH170 and GH43_12 (arabinofuranosidase) were enriched in \u003cem\u003eS. gordonii\u003c/em\u003e, \u003cem\u003eS. constellatus\u003c/em\u003e, and \u003cem\u003eS. parasanguinis\u003c/em\u003e. PL8_1 and PL12_1 were confined to small subsets of \u003cem\u003eS. anginosus\u003c/em\u003e and \u003cem\u003eS. constellatus\u003c/em\u003e. The AA10 family (lytic polysaccharide monooxygenase) was broadly distributed, occurring in 74 of 101 isolates, suggesting a potential role in redox adaptation within mucosal environments or in host immune interactions\u003csup\u003e100\u003c/sup\u003e.
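Family prevalence across genomes, the "functional core" and the per-class tally of distinct families (e.g. GH n = 55, GT n = 21) are all simple set operations on per-genome family calls. The sketch below assumes family labels follow CAZy naming (a class prefix followed by a number, with an optional subfamily suffix such as GH13_9). The 90% prevalence cut-off for the functional core is a hypothetical choice, since the text does not state one, and the toy genomes are invented.

```python
import re
from collections import Counter

# CAZy class prefix followed by the family number, e.g. "GH13_9" -> "GH"
CAZY_CLASS = re.compile(r"^(GH|GT|PL|CE|AA|CBM)\d")


def cazy_class(family):
    """Map a CAZy family or subfamily label to its class."""
    match = CAZY_CLASS.match(family)
    if match is None:
        raise ValueError(f"unrecognised CAZy family label: {family!r}")
    return match.group(1)


def family_prevalence(genome_families):
    """Fraction of genomes carrying each family (genome id -> set of families)."""
    n = len(genome_families)
    counts = Counter(f for fams in genome_families.values() for f in fams)
    return {fam: c / n for fam, c in counts.items()}


def functional_core(prevalence, min_fraction=0.9):
    """Families at or above a prevalence cut-off (the cut-off is an assumption)."""
    return {fam for fam, frac in prevalence.items() if frac >= min_fraction}


def distinct_families_per_class(genome_families):
    """Number of distinct families observed per CAZy class across all genomes."""
    all_families = set().union(*genome_families.values())
    return Counter(cazy_class(f) for f in all_families)


toy = {  # hypothetical family calls per genome
    "g1": {"GH1", "GH13_9", "GT2", "CBM50"},
    "g2": {"GH1", "GT2", "AA10"},
    "g3": {"GH1", "GT4", "CBM48", "CE1"},
}
prev = family_prevalence(toy)
print(sorted(functional_core(prev)))  # ['GH1']
print(distinct_families_per_class(toy))
```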
Fisher’s exact tests (FDR-corrected) further highlighted lineage-specific associations (\u003cstrong\u003eSupplementary Data S4\u003c/strong\u003e). GH170 was significantly overrepresented in \u003cem\u003eS. gordonii\u003c/em\u003e and \u003cem\u003eS. anginosus\u003c/em\u003e (adjusted \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.01), GH78 and GH43_12 were confined to \u003cem\u003eS. parasanguinis\u003c/em\u003e (adjusted \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.05), and GT14 (galactosyltransferase activity) occurred only in \u003cem\u003eS. BHI_6\u003c/em\u003e. CBM41 and CBM48 showed strong species-specific depletion or enrichment, particularly in \u003cem\u003eS. massiliensis\u003c/em\u003e, \u003cem\u003eS. cristatus\u003c/em\u003e, and \u003cem\u003eS. pseudopneumoniae\u003c/em\u003e (adjusted \u003cem\u003ep\u003c/em\u003e \u0026lt; 1 × 10\u003csup\u003e−56\u003c/sup\u003e), suggesting functional divergence in glycan interaction strategies across taxa.\u003c/p\u003e\n\u003cp\u003eCAZyme family richness, defined as the number of distinct CAZy families per genome, ranged from 44 in \u003cem\u003eS. sp901875575\u003c/em\u003e to 61 in \u003cem\u003eS. anginosus\u003c/em\u003e (\u003cstrong\u003eSupplementary Data S4\u003c/strong\u003e). \u003cem\u003eS. gordonii\u003c/em\u003e and \u003cem\u003eS. anginosus\u003c/em\u003e were among the richest species, whereas \u003cem\u003eS. massiliensis\u003c/em\u003e and \u003cem\u003eS. sp901875575\u003c/em\u003e harboured the lowest richness values (\u003cstrong\u003eFig. 4C\u003c/strong\u003e).
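Richness as defined here, together with the Shannon and Inverse Simpson indices used for the interspecies comparison, can be computed from per-genome counts of genes in each CAZy family. A minimal sketch follows. It assumes gene counts per family as the abundance measure and the natural logarithm for Shannon (the usual default, for example in the R package vegan). The example genome is hypothetical.

```python
from math import log


def richness(counts):
    """Number of CAZy families with at least one gene in the genome."""
    return sum(1 for c in counts.values() if c > 0)


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over family proportions."""
    total = sum(counts.values())
    return -sum((c / total) * log(c / total) for c in counts.values() if c > 0)


def inverse_simpson(counts):
    """Inverse Simpson index 1 / sum(p_i ** 2)."""
    total = sum(counts.values())
    return 1.0 / sum((c / total) ** 2 for c in counts.values() if c > 0)


# Hypothetical per-genome gene counts per CAZy family
genome = {"GH1": 2, "GH13_9": 2, "GT2": 2, "CBM50": 2, "AA10": 0}
print(richness(genome), round(shannon(genome), 4), inverse_simpson(genome))
```

With four equally represented families, Shannon equals ln 4 and Inverse Simpson equals 4; uneven counts lower both indices while leaving richness unchanged, which is why the three measures are reported together.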
Shannon and Inverse Simpson indices revealed significant interspecies differences, and PERMANOVA confirmed strong species-level structuring of CAZyme profiles (R² = 0.895, \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eIntegrative and conjugative elements are the predominant drivers of horizontal gene transfer in HNC \u003cem\u003eStreptococcus\u003c/em\u003e.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eLeveraging the closed genomes obtained by long-read sequencing, we then screened 132 oral streptococcal isolates for complete mobile genetic elements (MGEs). Identified elements were clustered with CD-HIT (≥90% sequence identity and ≥80% coverage) to detect shared elements and assess their distribution and prevalence. In total, we identified 245 ICE clusters from 122 isolates, 82 prophage clusters from 77 isolates and 4 plasmid clusters from 35 isolates. ICEs were more widely shared across the dataset, with non-singleton ICE clusters found in 43 isolates, whereas most prophage clusters occurred in a single isolate and none occurred in more than four isolates. Although plasmids were less numerous overall, they were widely distributed, occurring in 26 strains from 13 patients (\u003cstrong\u003eFig. 5A\u003c/strong\u003e, \u003cstrong\u003eSupplementary Data S6\u003c/strong\u003e). This pattern aligns with prior observations of widespread plasmid exchange and host range diversity among streptococcal populations. The global MGE sharing network suggested that particular MGEs are distributed across the broader population rather than spread by person-to-person transmission (\u003cstrong\u003eSupplementary Fig. S4A\u003c/strong\u003e).
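The cluster-level summaries in this paragraph (singleton versus non-singleton clusters, and clusters spanning several patients) can be tallied from the same kind of parsed CD-HIT records used for HGT calling. The sketch assumes each cluster contains a single MGE type and uses hypothetical identifiers; it illustrates the bookkeeping only.

```python
from collections import defaultdict


def cluster_occupancy(records):
    """Per CD-HIT cluster: (MGE type, number of isolates, number of patients).

    records: iterable of (isolate_id, patient_id, mge_type, cluster_id).
    """
    isolates, patients, mge_type = defaultdict(set), defaultdict(set), {}
    for iso, patient, mtype, cluster in records:
        isolates[cluster].add(iso)
        patients[cluster].add(patient)
        mge_type[cluster] = mtype  # assumes one MGE type per cluster
    return {cl: (mge_type[cl], len(isolates[cl]), len(patients[cl])) for cl in isolates}


def summarise_by_type(occupancy):
    """Count clusters, singleton clusters and multi-patient clusters per MGE type."""
    summary = {}
    for mtype, n_iso, n_pat in occupancy.values():
        s = summary.setdefault(mtype, {"clusters": 0, "singleton": 0, "multi_patient": 0})
        s["clusters"] += 1
        s["singleton"] += int(n_iso == 1)
        s["multi_patient"] += int(n_pat > 1)
    return summary


toy_records = [  # hypothetical
    ("i1", "P1", "ICE", "ICE_1"), ("i2", "P2", "ICE", "ICE_1"),
    ("i3", "P3", "prophage", "PH_1"), ("i6", "P4", "prophage", "PH_2"),
    ("i4", "P3", "plasmid", "PL_1"), ("i5", "P4", "plasmid", "PL_1"),
]
print(summarise_by_type(cluster_occupancy(toy_records)))
```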
This interpretation was supported by cluster size distributions (\u003cstrong\u003eSupplementary Fig. S4B\u003c/strong\u003e), in which ICEs and plasmids were often found in multiple patients, in contrast to the restricted distribution of prophages.\u003c/p\u003e\n\u003cp\u003eMGE content differed between species along taxonomic lines (\u003cstrong\u003eFig. 5B\u003c/strong\u003e, \u003cstrong\u003eSupplementary Fig. S5A\u003c/strong\u003e). \u003cem\u003eS. mitis\u003c/em\u003e, \u003cem\u003eS. oralis\u003c/em\u003e and \u003cem\u003eS. salivarius\u003c/em\u003e had the largest mobilomes, which contained conjugation and recombination genes, whereas phage-related genes were the main mobilome components in \u003cem\u003eS. constellatus\u003c/em\u003e and \u003cem\u003eS. parasanguinis\u003c/em\u003e. \u003cem\u003eS. salivarius\u003c/em\u003e carried multiple antibiotic resistance determinants, whereas \u003cem\u003eS. pseudopneumoniae\u003c/em\u003e and \u003cem\u003eS. massiliensis\u003c/em\u003e had smaller mobilomes containing recombination genes. Within-patient analyses showed that ICE collections were characterised by abundant insertion sequences and metabolic/transport genes, whereas prophage collections were dominated by hypothetical or uncharacterised proteins, followed by phage-related and replication-associated functions (\u003cstrong\u003eFig. 5C–D\u003c/strong\u003e, \u003cstrong\u003eSupplementary Fig. S5B\u003c/strong\u003e). This distribution highlights the high proportion of functionally uncharacterised content in prophage-associated regions, suggesting extensive genetic mosaicism and unannotated phage cargo in the tumour-associated \u003cem\u003eStreptococcus\u003c/em\u003e mobilome.
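Grouping annotated MGE genes into broad categories (recombination, conjugation, AMR, phage-related, hypothetical) can be approximated by keyword matching on Prokka product strings. The rules below are hypothetical and deliberately crude: categories are checked in order and the first match wins, so, for example, a "phage integrase" would be counted as recombination rather than phage-related.

```python
from collections import Counter

# Hypothetical keyword rules, checked in order; the first match wins.
CATEGORY_RULES = [
    ("AMR", ("antibiotic", "resistance", "multidrug")),
    ("conjugation", ("conjugal", "conjugative", "type iv secretion", "relaxase")),
    ("recombination", ("transposase", "integrase", "recombinase", "resolvase")),
    ("phage-related", ("phage", "capsid", "terminase", "portal protein", "holin")),
]


def categorise(product):
    """Assign a Prokka-style product string to a broad functional category."""
    text = product.strip().lower()
    if text in ("", "hypothetical protein"):
        return "hypothetical"
    for category, keywords in CATEGORY_RULES:
        if any(k in text for k in keywords):
            return category
    return "other"


def category_profile(products):
    """Tally categories for all genes annotated within one MGE cluster."""
    return Counter(categorise(p) for p in products)


toy_products = [  # hypothetical annotations
    "IS30 family transposase",
    "Conjugal transfer protein TraG",
    "Phage terminase large subunit",
    "Tetracycline resistance protein TetM",
    "hypothetical protein",
    "Glucose-6-phosphate isomerase",
]
print(category_profile(toy_products))
```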
The most common products encoded in HGT regions included IS-family transposases, recombinases, conjugation proteins and clinically important antibiotic resistance determinants such as \u003cem\u003eMsr(D)\u003c/em\u003e and \u003cem\u003eBcrA\u003c/em\u003e (\u003cstrong\u003eFig. 5E\u003c/strong\u003e).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eComparison of \u003cem\u003eStreptococcus\u003c/em\u003e Genomes between HNC and Healthy Donors.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo investigate potential genomic differences between \u003cem\u003eStreptococcus\u003c/em\u003e isolates associated with HNC and those from healthy individuals, we compared 76 oral isolates derived from HNC patients with 391 publicly available oral \u003cem\u003eStreptococcus\u003c/em\u003e isolates from healthy individuals, obtained from the China National GeneBank COGR collection (CNGB Sequence Archive, accession CNP0003047)\u003csup\u003e26\u003c/sup\u003e (\u003cstrong\u003eSupplementary Data S7\u003c/strong\u003e). Analyses were restricted to 12 \u003cem\u003eStreptococcus\u003c/em\u003e species that were represented by multiple strains in both cohorts. Comparison of genome size and GC content revealed species-specific genomic divergence between cancer-associated and healthy isolates (\u003cstrong\u003eFig. 6A–B\u003c/strong\u003e). GC content was largely conserved across species, but cancer-derived \u003cem\u003eS. mitis\u003c/em\u003e and \u003cem\u003eS. salivarius\u003c/em\u003e isolates had significantly lower GC content (–0.5% and –0.4%, respectively; \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.05), potentially indicating altered compositional bias or increased mobile element load. Other species, such as \u003cem\u003eS. anginosus\u003c/em\u003e, \u003cem\u003eS. parasanguinis\u003c/em\u003e, and \u003cem\u003eS.
gordonii\u003c/em\u003e, showed no significant differences, underscoring the heterogeneity of genomic adaptation across taxa.\u003c/p\u003e\n\u003cp\u003eTo investigate differences in gene content between cancer- and healthy-associated \u003cem\u003eStreptococcus\u003c/em\u003e isolates, we performed a gene presence/absence analysis across all genomes. Principal Coordinate Analysis (PCoA) of Jaccard distances between gene presence/absence profiles revealed partial separation of cancer and healthy isolates (\u003cstrong\u003eSupplementary Fig. S6A\u003c/strong\u003e). To explore whether these differences were consistent within individual species, we repeated the PCoA separately for each species, which revealed variable degrees of separation between cancer and healthy isolates across taxa (\u003cstrong\u003eSupplementary Fig. S6B\u003c/strong\u003e, \u003cstrong\u003eSupplementary Data S7\u003c/strong\u003e). Several species showed clear separation, suggesting that tumour-associated strains carry distinct accessory gene repertoires. Species-stratified PERMANOVA detected significant effects of health status on gene content in \u003cem\u003eS. anginosus\u003c/em\u003e (R² = 0.1177, p = 0.001), \u003cem\u003eS. constellatus\u003c/em\u003e (R² = 0.1741, p = 0.001), \u003cem\u003eS. gordonii\u003c/em\u003e (R² = 0.1258, p = 0.002), \u003cem\u003eS. infantis\u003c/em\u003e (R² = 0.1653, p = 0.034), \u003cem\u003eS. mitis\u003c/em\u003e (R² = 0.1675, p = 0.001), \u003cem\u003eS. oralis\u003c/em\u003e (R² = 0.0395, p = 0.022), and \u003cem\u003eS. parasanguinis\u003c/em\u003e (R² = 0.0964, p = 0.003) (\u003cstrong\u003eSupplementary Data S7\u003c/strong\u003e), with \u003cem\u003eS. constellatus\u003c/em\u003e showing the largest effect size. 
These findings suggest that, while species-level structure is conserved, disease status is associated with niche-specific genomic diversification in a subset of lineages.\u003c/p\u003e\n\u003cp\u003eTo determine whether cancer-associated \u003cem\u003eStreptococcus\u003c/em\u003e isolates have distinct glycan-processing capacities relative to healthy isolates, we conducted matched comparisons of CAZymes across 17 shared species. At the class level, CAZyme composition was broadly conserved between the cancer and healthy cohorts, with GHs and GTs the dominant categories in both (GH: 53.6–55.4%; GT: 23.4–24.8%) (\u003cstrong\u003eSupplementary Fig. S7A\u003c/strong\u003e). Nevertheless, PERMANOVA revealed a modest but statistically significant effect of cancer status on CAZyme composition (R² = 0.018, \u003cem\u003ep\u003c/em\u003e = 0.001). Cancer status therefore explained about 1.8% of the variation in CAZyme composition, indicating that disease state only partially structures glycan-degradation potential. Diversity analyses likewise indicated subtle cohort-level differences.\u003c/p\u003e\n\u003cp\u003eDirectional enrichment analysis using Fisher’s exact test identified several species-specific functional shifts (\u003cstrong\u003eSupplementary Fig. S7C\u003c/strong\u003e; \u003cstrong\u003eSupplementary Data S8\u003c/strong\u003e). Cancer-derived isolates of \u003cem\u003eS. mitis\u003c/em\u003e, \u003cem\u003eS. oralis\u003c/em\u003e, and \u003cem\u003eS. pseudopneumoniae\u003c/em\u003e had higher proportions of the CE, GH and GT classes, potentially reflecting adaptations in nutrient acquisition or host interaction. Conversely, healthy-associated isolates of \u003cem\u003eS. salivarius\u003c/em\u003e, \u003cem\u003eS. parasanguinis\u003c/em\u003e, and \u003cem\u003eS. gordonii\u003c/em\u003e were enriched in GH, GT or CBM classes, consistent with retention of canonical commensal glycan-processing profiles. 
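Both the species-stratified PERMANOVA on gene content and the PERMANOVA on CAZyme composition partition variation in a distance matrix between cohorts. Below is a minimal, dependency-free sketch of a one-way PERMANOVA on Jaccard distances between gene presence/absence sets. It uses the standard sum-of-squares partition, giving a pseudo-F statistic and R² as the between-cohort share of total variation, and computes a label-permutation p-value. The function names, toy gene sets and permutation settings are hypothetical. This is an illustration, not the authors' implementation.

```python
import itertools
import random


def jaccard_distance(a, b):
    # 1 - |A intersect B| / |A union B| for two sets of gene identifiers
    union = a.union(b)
    if not union:
        return 0.0
    return 1.0 - len(a.intersection(b)) / len(union)


def sum_sq(dist, idx):
    # Sum of squared pairwise distances among the indices in idx, divided by group size
    idx = list(idx)
    total = sum(dist[i][j] ** 2 for i, j in itertools.combinations(idx, 2))
    return total / len(idx)


def permanova(profiles, labels, n_perm=999, seed=1):
    # One-way PERMANOVA (Anderson 2001) on Jaccard distances.
    # Returns (pseudo_F, R2, permutation p-value).
    n = len(profiles)
    dist = [[jaccard_distance(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    groups = sorted(set(labels))
    k = len(groups)
    if k == 1:
        raise ValueError('need at least two groups')
    ss_total = sum_sq(dist, range(n))
    if ss_total == 0:
        raise ValueError('all profiles are identical')

    def pseudo_f(lab):
        ss_within = sum(sum_sq(dist, [i for i in range(n) if lab[i] == g]) for g in groups)
        ss_between = ss_total - ss_within
        if ss_within == 0:
            return float('inf'), ss_between / ss_total
        return (ss_between / (k - 1)) / (ss_within / (n - k)), ss_between / ss_total

    f_obs, r2 = pseudo_f(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random relabelling of isolates, group sizes preserved
        if pseudo_f(shuffled)[0] >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: three cancer and three healthy isolates as gene sets
cancer = [{'g1', 'g2', 'g3'}, {'g1', 'g2'}, {'g1', 'g2', 'g3', 'g4'}]
healthy = [{'g5', 'g6'}, {'g5', 'g6', 'g7'}, {'g5', 'g7'}]
f_stat, r2, p_value = permanova(cancer + healthy, ['cancer'] * 3 + ['healthy'] * 3)
print(round(r2, 3), p_value)  # r2 is about 0.784 for this toy example
```

For species-stratified results, the function would be run once per species. In practice, established implementations such as adonis2 in the R package vegan are the usual choice.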
To explore finer-scale differences, we applied DESeq2 to CAZyme gene counts across all 467 isolates. Four CAZy families (GH70, GH73, GT8 and GT2) were significantly enriched in healthy isolates (adjusted \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.05). These families are associated with extracellular glucan synthesis and peptidoglycan turnover, and their enrichment may reflect greater structural maintenance and biosynthetic activity in the healthy oral microbiome\u003csup\u003e101\u003c/sup\u003e. No CAZyme families were significantly enriched in the cancer cohort at the global level, although species-level expansions (e.g., GT8 in \u003cem\u003eS. pseudopneumoniae\u003c/em\u003e) were observed.\u003c/p\u003e\n\u003cp\u003eFinally, we compared the gene presence–absence profiles of oral cavity-derived \u003cem\u003eStreptococcus\u003c/em\u003e genomes from 76 HNC-associated isolates and 391 healthy-associated isolates to identify genomic and functional differences associated with cancer\u003csup\u003e26\u003c/sup\u003e. We performed two complementary gene enrichment analyses: a pooled analysis that compared all isolates irrespective of species, and a species-balanced analysis focused on within-species comparisons across 17 matched species. In the pooled analysis, 158 genes differed significantly in presence between the cancer and healthy cohorts (FDR \u0026lt; 0.05; \u003cstrong\u003eFig. 7A\u003c/strong\u003e; \u003cstrong\u003eSupplementary Data S9\u003c/strong\u003e). 
Genes enriched in HNC isolates were involved in sugar transport and metabolism (\u003cem\u003elacE\u003c/em\u003e, \u003cem\u003elacF\u003c/em\u003e, \u003cem\u003elacT\u003c/em\u003e, \u003cem\u003emalX\u003c/em\u003e, \u003cem\u003eglf\u003c/em\u003e), thiamine biosynthesis (\u003cem\u003ethiM\u003c/em\u003e, \u003cem\u003ethiE\u003c/em\u003e) and drug resistance (\u003cem\u003eermB\u003c/em\u003e, \u003cem\u003emefA\u003c/em\u003e, \u003cem\u003eyheS–msrD\u003c/em\u003e), and also included mobile genetic element genes (\u003cem\u003etnp\u003c/em\u003e, \u003cem\u003etraG\u003c/em\u003e, \u003cem\u003epezT\u003c/em\u003e). Notably, several cancer-enriched loci, such as \u003cem\u003egroup_15226\u003c/em\u003e and \u003cem\u003egroup_13653\u003c/em\u003e, were hypothetical proteins or unnamed gene clusters. This raises the possibility that uncharacterised genes contribute to adaptation to the cancer-associated niche. In contrast, healthy-enriched genes included CRISPR-Cas loci (\u003cem\u003ecas3–cas7\u003c/em\u003e, \u003cem\u003ecasA\u003c/em\u003e, \u003cem\u003ecasB\u003c/em\u003e), histidine, biotin and vitamin B6 biosynthesis genes (\u003cem\u003ehisA–hisH\u003c/em\u003e, \u003cem\u003ebioH\u003c/em\u003e, \u003cem\u003epdxS\u003c/em\u003e), and metal ion transporters (\u003cem\u003eacm\u003c/em\u003e, \u003cem\u003ecrcB\u003c/em\u003e, \u003cem\u003eglpG\u003c/em\u003e).\u003c/p\u003e\n\u003cp\u003eTo dissect the genomic adaptations of \u003cem\u003eStreptococcus\u003c/em\u003e isolates in the cancer microenvironment, we performed species-resolved comparisons of gene presence/absence across 17 species with matched cancer and healthy representatives. Fisher’s exact tests followed by FDR correction identified 622 differentially enriched genes (adjusted \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.05, |Δfreq| \u0026gt; 0.1). Most of these (91.3%) were species-specific, indicating largely lineage-specific adaptation (\u003cstrong\u003eSupplementary Data S9\u003c/strong\u003e). 
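The species-resolved filter described above can be sketched with the Python standard library alone. It consists of a Fisher's exact test per gene, Benjamini–Hochberg FDR correction and a minimum frequency difference. The following assumptions are not taken from the text:

- Each gene is summarised by how many cancer and how many healthy isolates carry it.
- |Δfreq| is read as the absolute difference in carriage frequency between cohorts.
- The function names and carriage counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    # Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]:
    # rows = cohort (cancer, healthy), columns = gene present / absent.
    # Sums hypergeometric probabilities of all tables with the same margins
    # that are no more likely than the observed table.
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # small relative tolerance so ties are not lost to floating-point error
    return min(1.0, sum(p for p in probs if p_obs * (1 + 1e-7) >= p))


def benjamini_hochberg(pvals):
    # Benjamini-Hochberg adjusted p-values (step-up, kept monotone)
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


def differential_genes(carriage, n_cancer, n_healthy, alpha=0.05, min_delta=0.1):
    # carriage: gene -> (cancer isolates carrying it, healthy isolates carrying it)
    # Keeps genes with adjusted p below alpha and |delta frequency| above min_delta.
    genes = sorted(carriage)
    pvals, deltas = [], []
    for gene in genes:
        k_c, k_h = carriage[gene]
        pvals.append(fisher_exact_two_sided(k_c, n_cancer - k_c, k_h, n_healthy - k_h))
        deltas.append(k_c / n_cancer - k_h / n_healthy)
    qvals = benjamini_hochberg(pvals)
    return {
        gene: (q, delta)
        for gene, q, delta in zip(genes, qvals, deltas)
        if alpha > q and abs(delta) > min_delta
    }


# Hypothetical carriage counts; cohort sizes match the pooled comparison (76 vs 391)
carriage = {'geneA': (60, 100), 'geneB': (10, 50), 'geneC': (5, 200)}
hits = differential_genes(carriage, n_cancer=76, n_healthy=391)
print(sorted(hits))  # ['geneA', 'geneC']
```

In a species-resolved analysis, this filter would be applied within each species before the per-species results are combined. The same test and correction are used for the GO and KEGG enrichment analyses in this section.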
Five species (\u003cem\u003eS. anginosus\u003c/em\u003e, \u003cem\u003eS. gordonii\u003c/em\u003e, \u003cem\u003eS. mitis\u003c/em\u003e, \u003cem\u003eS. parasanguinis\u003c/em\u003e, and \u003cem\u003eS. salivarius\u003c/em\u003e) showed pronounced gene-level divergence, with multiple genes significantly enriched in either cancer- or health-associated isolates (\u003cstrong\u003eSupplementary Fig. S8\u003c/strong\u003e). For example, \u003cem\u003eS. anginosus\u003c/em\u003e cancer isolates were enriched for genes of unknown function (group_4022, group_5958 and group_6215), whereas \u003cem\u003eS. gordonii\u003c/em\u003e healthy isolates were enriched in metabolic and regulatory genes such as \u003cem\u003elacF\u003c/em\u003e and \u003cem\u003ebglG\u003c/em\u003e. \u003cem\u003eS. mitis\u003c/em\u003e displayed the most extensive divergence, with more than 100 differentially enriched genes, including the mobile element gene \u003cem\u003etnp_tran5\u003c/em\u003e and stress response regulators such as \u003cem\u003eyoeG\u003c/em\u003e, suggesting broad genomic remodelling in the cancer-associated niche. Although most gene differences were species-restricted, a subset of functions was recurrently enriched across multiple species. For instance, \u003cem\u003esprT\u003c/em\u003e, a stress-related protease, and \u003cem\u003enarK\u003c/em\u003e, a nitrate transporter, were enriched in cancer isolates from different species, consistent with convergent functional adaptation to the tumour microenvironment. In contrast, genes involved in sugar metabolism and quorum regulation were more frequent in healthy-associated strains. 
These results point to both recurrent and species-specific strategies by which \u003cem\u003eStreptococcus\u003c/em\u003e may adapt to cancer-associated ecological pressures.\u003c/p\u003e\n\u003cp\u003eWe performed GO and KEGG pathway enrichment analyses on the gene presence/absence data using Fisher’s exact test with Benjamini–Hochberg FDR correction (adjusted \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.05). When all genes were assessed together, no significant pathway-level differences were detected between HNC-associated and healthy-derived \u003cem\u003eStreptococcus\u003c/em\u003e isolates in either analysis, suggesting conserved core functionality (\u003cstrong\u003eSupplementary Fig. S9A–D\u003c/strong\u003e). However, enrichment analysis restricted to the significantly differentially present genes revealed pronounced functional separation. In the cancer-enriched gene set, KEGG pathways were dominated by carbohydrate and energy metabolism. These included galactose metabolism, pyruvate metabolism, fructose and mannose metabolism, glycolysis/gluconeogenesis, amino sugar and nucleotide sugar metabolism, the pentose phosphate pathway and carbon metabolism. Additional pathways included lipoic acid metabolism, the phosphotransferase system (PTS) and base excision repair (\u003cstrong\u003eFig. 7B\u003c/strong\u003e). GO enrichment similarly highlighted carbohydrate transport and uptake processes, such as “carbohydrate transmembrane transport”, “monosaccharide transport”, “ABC-type carbohydrate transporter activity” and “maltose/oligosaccharide transporter activity” (\u003cstrong\u003eFig. 7C\u003c/strong\u003e), complemented by broader terms such as “cellular response to external stimulus”. 
In contrast, genes enriched in healthy-associated isolates mapped to biosynthetic and regulatory pathways. These included amino acid biosynthesis (e.g., cysteine and methionine, alanine, arginine, glutamate, phenylalanine and histidine), starch and sucrose metabolism, quorum sensing and protein export. GO terms supported this metabolic breadth, with enrichment of “L-histidine biosynthetic process”, “glutamine family amino acid metabolism”, “oxidoreductase activity” and multiple vitamin- and cofactor-related processes (e.g., riboflavin and vitamin B6 transport, CoA biosynthesis). Healthy isolates were also enriched for DNA repair and cell wall-related pathways, including mismatch repair, homologous recombination, and peptidoglycan and teichoic acid biosynthesis.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eHere, we present a comprehensive dataset of fully resolved \u003cem\u003eStreptococcus\u003c/em\u003e isolate genomes associated with HNC. Our analyses reveal substantial microbial diversity, together with putative niche-specific adaptations, in the oral microbiomes of HNC patients. Notably, we identified 101 unique fully resolved genomes across 35 species, including ten putative novel species-level taxa (\u003cem\u003eStrep. BHI_1–10\u003c/em\u003e) clustering within the Mitis group (\u003cem\u003eS. mitis\u003c/em\u003e, \u003cem\u003eS. oralis\u003c/em\u003e and \u003cem\u003eS. infantis\u003c/em\u003e). This placement reflects the genomic fluidity of the group, with high ANI (\u0026gt;95%) and overlapping pangenomes\u003csup\u003e102,103\u003c/sup\u003e. These taxa may represent previously undetected lineages and require further phenotypic validation\u003csup\u003e103\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eOur phylogenomic and pangenome analyses demonstrate both taxonomic and functional divergence among HNC-associated \u003cem\u003eStreptococcus\u003c/em\u003e. Species such as \u003cem\u003eS.
mitis\u003c/em\u003e and \u003cem\u003eS. oralis\u003c/em\u003e exhibited extensive accessory genome variation, with some novel strains (e.g., \u003cem\u003eStrep. BHI_6\u003c/em\u003e, \u003cem\u003eBHI_11\u003c/em\u003e) carrying expanded gene repertoires not observed in reference genomes. This genomic diversity may reflect microenvironment-specific selective pressures that influence microbial evolution in mucosal niches, including differential nutrient availability, host immune surveillance and biofilm spatial organisation\u003csup\u003e104-106\u003c/sup\u003e. It also underscores the need for expanded genomic and phenotypic studies to refine taxonomic classifications and elucidate the functional roles of these lineages in HNC.\u003c/p\u003e\n\u003cp\u003eThe genomic analyses also revealed distinct patterns of possible adaptation within the genus \u003cem\u003eStreptococcus\u003c/em\u003e. The variable genome sizes (1.7–2.5 Mbp), together with stable GC content (36–43%), are consistent with adaptation to the HNC TME occurring through HGT events that expand the accessory genome while core composition remains stable\u003csup\u003e107,108\u003c/sup\u003e. Across all 101 isolates spanning 35 species, we identified only 29 genus-level core genes, including \u003cem\u003egalU\u003c/em\u003e, which is critical for biofilm formation\u003csup\u003e109\u003c/sup\u003e. This small core emphasises the extensive genetic variability within \u003cem\u003eStreptococcus\u003c/em\u003e, with accessory genes likely conferring niche-specific advantages such as enhanced nutrient acquisition or immune evasion\u003csup\u003e73,109\u003c/sup\u003e. UMAP clustering and pangenome analyses further indicated that species such as \u003cem\u003eS. mitis\u003c/em\u003e and \u003cem\u003eS.
oralis\u003c/em\u003e exhibit substantial genomic flexibility, as reflected by their low core gene percentages (32–36%), whereas \u003cem\u003eS. sanguinis\u003c/em\u003e and \u003cem\u003eS. gordonii\u003c/em\u003e maintain more conserved genomes, consistent with ecological stability in oral communities\u003csup\u003e59,102,110\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eWithin the HNC cohort, carbohydrate-active enzyme (CAZyme) repertoires differed between taxa and between tumour- and oral-derived isolates. Tumour-associated isolates of \u003cem\u003eS. mitis\u003c/em\u003e and \u003cem\u003eS. oralis\u003c/em\u003e showed significant expansions in glycosyltransferase (GT) and carbohydrate esterase (CE) families, whereas oral-derived isolates retained broader CAZyme diversity, including rare glycoside hydrolases (GHs). This shift in glycan acquisition strategies may stem from tumour-related changes in host mucin glycosylation and extracellular matrix remodelling\u003csup\u003e111,112\u003c/sup\u003e. It is also consistent with previous work linking glycan remodelling to microbiome restructuring in oncogenic contexts\u003csup\u003e113,114\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eOur pangenome comparisons between healthy- and HNC-associated \u003cem\u003eStreptococcus\u003c/em\u003e isolates revealed that tumour-derived strains were significantly enriched in genes functionally linked to stress response, metal homeostasis and membrane/capsule biosynthesis. 
Specifically, the spore coat-associated gene \u003cem\u003eyetF\u003c/em\u003e\u003csup\u003e115\u003c/sup\u003e, the arsenate reductase gene \u003cem\u003earsC\u003c/em\u003e\u003csup\u003e116\u003c/sup\u003e, the zinc/cadmium efflux pump gene \u003cem\u003ezntA\u003c/em\u003e\u003csup\u003e117\u003c/sup\u003e and membrane- or capsule-associated loci (\u003cem\u003egroup_16421\u003c/em\u003e, \u003cem\u003ewcwK–cpsJ\u003c/em\u003e)\u003csup\u003e118,119\u003c/sup\u003e were over-represented in tumour isolates. These genes were originally characterised in non-cancer contexts. Even so, their roles in oxidative stress resistance, metal detoxification and surface polysaccharide modification are relevant to selective pressures in the HNC tumour microenvironment, including hypoxia-induced ROS accumulation, metal ion dysregulation and immune-mediated antimicrobial attack\u003csup\u003e120-124\u003c/sup\u003e. Acquiring or retaining such genes may therefore confer a fitness advantage on streptococci within the tumour niche. In contrast, oral-derived isolates more often carried \u003cem\u003etreC\u003c/em\u003e (trehalose utilisation), \u003cem\u003eputP\u003c/em\u003e (proline uptake)\u003csup\u003e125,126\u003c/sup\u003e and oxidative stress resistance genes (\u003cem\u003emarR-mgrA\u003c/em\u003e)\u003csup\u003e127\u003c/sup\u003e. This profile matches the dynamic sugar and oxygen gradients of the oral mucosa\u003csup\u003e128,129\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eUsing our panel of long-read sequenced, closed genomes, we identified extensive HGT among co-resident \u003cem\u003eStreptococcus\u003c/em\u003e strains within individual HNC patients. 
ICEs were the predominant vectors, forming 245 distinct clusters across 122 isolates. By comparison, prophages (82 clusters from 77 isolates) were generally strain-specific, and plasmids were rare (4 clusters from 35 isolates) but distributed across multiple patients. These MGEs collectively carried genes for conjugation, recombination and antibiotic resistance (such as \u003cem\u003eMsr(D)\u003c/em\u003e and \u003cem\u003eBcrA\u003c/em\u003e), consistent with ICEs playing a dominant role in mobilising adaptive traits in HNC-associated \u003cem\u003eStreptococcus\u003c/em\u003e\u003csup\u003e130-132\u003c/sup\u003e. The relative scarcity of plasmids suggests limited plasmid-mediated exchange, whereas ICEs and certain prophages may facilitate broader chromosomal gene movement. The abundance of hypothetical proteins in prophage regions highlights the high proportion of uncharacterised cargo within the HNC \u003cem\u003eStreptococcus\u003c/em\u003e mobilome.\u003c/p\u003e\n\u003cp\u003eComparing our HNC-associated isolates with 391 publicly available healthy oral \u003cem\u003eStreptococcus\u003c/em\u003e genomes revealed cancer-associated genomic patterns. Core genomic features were largely conserved, but accessory gene content differed significantly. Cancer-associated \u003cem\u003eStreptococcus\u003c/em\u003e isolates more frequently carried genes for sugar transport (e.g., \u003cem\u003elacE\u003c/em\u003e, \u003cem\u003eglf\u003c/em\u003e)\u003csup\u003e133,134\u003c/sup\u003e, thiamine biosynthesis (\u003cem\u003ethiM\u003c/em\u003e, \u003cem\u003ethiE\u003c/em\u003e)\u003csup\u003e133,134\u003c/sup\u003e and antimicrobial resistance (e.g., \u003cem\u003eermB\u003c/em\u003e, \u003cem\u003emefA\u003c/em\u003e)\u003csup\u003e135,136\u003c/sup\u003e, suggesting metabolic restructuring and optimised stress responses. 
Healthy isolates carried more CRISPR-Cas loci and metal ion homeostasis genes, which may indicate stronger phage defence and environmental sensing capabilities\u003csup\u003e137\u003c/sup\u003e. These findings suggest that cancer-associated oral dysbiosis may be shaped by the selective retention of horizontally acquired genes that enhance stress tolerance, immune modulation and nutrient utilisation within cancer-affected oral environments\u003csup\u003e113,138\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eCancer-enriched genes were predominantly located within MGEs, particularly ICEs. These HGT-associated genes include transporters (e.g., \u003cem\u003elacF\u003c/em\u003e, \u003cem\u003ecelB\u003c/em\u003e)\u003csup\u003e139,140\u003c/sup\u003e and carbohydrate uptake and fermentation genes (e.g., \u003cem\u003elacC\u003c/em\u003e, \u003cem\u003ebglG\u003c/em\u003e)\u003csup\u003e141,142\u003c/sup\u003e. They may enhance metabolic versatility under the nutrient-limited or stress-associated conditions common in cancer-affected oral environments\u003csup\u003e19\u003c/sup\u003e. Because these genes were observed across multiple isolates rather than concentrated in a single species, they more likely reflect functional convergence than clonal expansion\u003csup\u003e143\u003c/sup\u003e. The prevalence of ICEs further underscores their role in chromosome-based adaptation. However, the abundance of unclassified MGEs warrants further characterisation to elucidate their contributions\u003csup\u003e144,145\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eThis study has several limitations. The analyses were restricted to culturable bacteria, excluding unculturable species. Sequence clustering with CD-HIT thresholds of 90% identity and 80% coverage may fail to detect distant homologs and may incorrectly merge mosaic elements. Moreover, this approach cannot infer the directionality of gene transfer. 
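Both clustering limitations can be illustrated with a toy version of greedy incremental clustering, the strategy on which CD-HIT is based. Sequences are processed longest first, and each joins the first cluster representative it matches at both the identity (90%) and coverage (80%) thresholds; otherwise it founds a new cluster. The element names and the identity and coverage values are hypothetical precomputed summaries, not real alignments. The function is an illustrative sketch, not CD-HIT itself.

```python
def greedy_cluster(lengths, similarity, min_identity=0.90, min_coverage=0.80):
    # Greedy incremental clustering in the spirit of CD-HIT (toy version).
    # lengths: sequence id -> length.
    # similarity: frozenset({id1, id2}) -> (identity, coverage), assumed precomputed
    #   from pairwise alignments; missing pairs count as unrelated.
    # Sequences are visited longest first; each joins the first representative it
    # matches at both thresholds, otherwise it becomes a new representative.
    clusters = {}  # representative id -> member ids (dicts keep insertion order)
    for seq in sorted(lengths, key=lambda s: (-lengths[s], s)):
        for rep in clusters:
            identity, coverage = similarity.get(frozenset((seq, rep)), (0.0, 0.0))
            if identity >= min_identity and coverage >= min_coverage:
                clusters[rep].append(seq)
                break
        else:
            clusters[seq] = [seq]
    return clusters


# Hypothetical elements and alignment summaries
lengths = {'iceA': 1000, 'mosaic': 990, 'iceA_variant': 980, 'distant_homolog': 950}
similarity = {
    frozenset(('iceA', 'iceA_variant')): (0.97, 0.98),  # near-identical copy
    frozenset(('iceA', 'mosaic')): (0.95, 0.85),  # matches iceA over 85% of its length
    frozenset(('iceA', 'distant_homolog')): (0.70, 0.95),  # related but diverged
}
print(greedy_cluster(lengths, similarity))
# {'iceA': ['iceA', 'mosaic', 'iceA_variant'], 'distant_homolog': ['distant_homolog']}
```

In this toy case, the diverged homolog (70% identity) founds its own cluster even though it is related to iceA. The mosaic element, which matches iceA over only 85% of its length, is merged into the iceA cluster.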
Although long-read sequencing improved assembly contiguity, the detection and complete annotation of plasmids and prophages remain challenging. The healthy cohort comparison may be biased by differences in batch conditions, geographical origin and sequencing platforms. These biases were addressed bioinformatically but could not be completely eliminated. Finally, while differences in gene content were observed between isolates from cancer patients and healthy donors, further validation is required to confirm whether cancer-specific enrichments confer true survival advantages. Nevertheless, this study provides a comprehensive and novel dataset that advances understanding of \u003cem\u003eStreptococcus\u003c/em\u003e species in HNC.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eTo our knowledge, this is the first study to demonstrate that \u003cem\u003eStreptococcus\u003c/em\u003e isolates from HNC patients exhibit distinct genetic and functional profiles compared with those from healthy individuals. These differences appear to arise primarily from accessory genome variation, ICE-mediated horizontal gene transfer and species-specific adaptations in carbohydrate metabolism. Together, these findings reveal species-dependent evolutionary strategies that may enable bacterial survival within TMEs. They also identify new lineages and MGE-linked functions that may inform microbiome-based diagnostics and therapeutics for HNC.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eData Availability Statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll sequencing data reported in this paper are available upon request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eL.M.: Conceptualization, Methodology, Investigation, Data Analysis, Writing – Original draft, Writing – review and editing. G.B.: Investigation, Data Analysis, Writing – Original draft, Writing – review and editing. 
E.B., K.Y.: Investigation, Writing – review and editing. J.H., S.K.: Resources, Writing – review and editing. P.W., R.V., A.P., S.V.: Resources, Writing – review and editing, Supervision, Funding Acquisition. K.F.: Conceptualization, Investigation, Writing – Original draft, Writing – review and editing, Supervision, Project Administration, Funding Acquisition.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgments\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe would like to thank the medical staff from The Royal Adelaide Hospital and The Memorial Hospital for their assistance in sample collection. The graphical abstract was generated using BioRender.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work was supported by an HSCGB Ray and Shirl Norman Cancer Research Grant (A.P., K.F., S.V., and R.V.), an NHMRC investigator grant APP1196832 (P.W.), a Passe and Williams Senior Fellowship (S.V.), a Cancer Council SA Research Fellowship (K.F.) and a University of Adelaide Research Training Program Scholarship (L.M.).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConflicts of Interest\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that there are no conflicts of interest.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eEthics approval (HREC MYIP14116) for the collection and storage of patient samples was granted by the Central Adelaide Local Health Network Human Research Ethics Committee (Adelaide, South Australia) in accordance with the Declaration of Helsinki, and all patients provided written informed consent.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eBarsouk, A., Aluru, J. S., Rawla, P., Saginala, K. \u0026amp; Barsouk, A. Epidemiology, Risk Factors, and Prevention of Head and Neck Squamous Cell Carcinoma. 
\u003cem\u003eMed Sci (Basel)\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e (2023). https://doi.org/10.3390/medsci11020042\u003c/li\u003e\n\u003cli\u003eSung, H.\u003cem\u003e et al.\u003c/em\u003e Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. \u003cem\u003eCA Cancer J Clin\u003c/em\u003e \u003cstrong\u003e71\u003c/strong\u003e, 209-249 (2021). https://doi.org/10.3322/caac.21660\u003c/li\u003e\n\u003cli\u003eChow, L. Q. M. Head and Neck Cancer. \u003cem\u003eNew England Journal of Medicine\u003c/em\u003e \u003cstrong\u003e382\u003c/strong\u003e, 60-72 (2020). https://doi.org/10.1056/NEJMra1715715\u003c/li\u003e\n\u003cli\u003eIrfan, M., Delgado, R. Z. R. \u0026amp; Frias-Lopez, J. The Oral Microbiome and Cancer. \u003cem\u003eFrontiers in Immunology\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e (2020). https://doi.org/10.3389/fimmu.2020.591088\u003c/li\u003e\n\u003cli\u003eBurcher, K. M.\u003cem\u003e et al.\u003c/em\u003e A Review of the Role of Oral Microbiome in the Development, Detection, and Management of Head and Neck Squamous Cell Cancers. \u003cem\u003eCancers (Basel)\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e (2022). https://doi.org/10.3390/cancers14174116\u003c/li\u003e\n\u003cli\u003eDorobisz, K., Dorobisz, T. \u0026amp; Zatoński, T. The Microbiome's Influence on Head and Neck Cancers. \u003cem\u003eCurr Oncol Rep\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e, 163-171 (2023). https://doi.org/10.1007/s11912-022-01352-7\u003c/li\u003e\n\u003cli\u003eDewhirst, F. E.\u003cem\u003e et al.\u003c/em\u003e The human oral microbiome. \u003cem\u003eJ Bacteriol\u003c/em\u003e \u003cstrong\u003e192\u003c/strong\u003e, 5002-5017 (2010). https://doi.org/10.1128/jb.00542-10\u003c/li\u003e\n\u003cli\u003eKilian, M.\u003cem\u003e et al.\u003c/em\u003e The oral microbiome \u0026ndash; an update for oral healthcare professionals. 
\u003cem\u003eBritish Dental Journal\u003c/em\u003e \u003cstrong\u003e221\u003c/strong\u003e, 657-666 (2016). https://doi.org/10.1038/sj.bdj.2016.865\u003c/li\u003e\n\u003cli\u003eGilbert, J. A.\u003cem\u003e et al.\u003c/em\u003e Current understanding of the human microbiome. \u003cem\u003eNature Medicine\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 392-400 (2018). https://doi.org/10.1038/nm.4517\u003c/li\u003e\n\u003cli\u003eBaty, J. J., Stoner, S. N. \u0026amp; Scoffield, J. A. Oral Commensal Streptococci: Gatekeepers of the Oral Cavity. \u003cem\u003eJ Bacteriol\u003c/em\u003e \u003cstrong\u003e204\u003c/strong\u003e, e0025722 (2022). https://doi.org/10.1128/jb.00257-22\u003c/li\u003e\n\u003cli\u003eVelsko, I. M. \u0026amp; Warinner, C. Streptococcus abundance and oral site tropism in humans and non-human primates reflects host and lifestyle differences. \u003cem\u003enpj Biofilms and Microbiomes\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 19 (2025). https://doi.org/10.1038/s41522-024-00642-1\u003c/li\u003e\n\u003cli\u003eZhang, Y.\u003cem\u003e et al.\u003c/em\u003e Human oral microbiota and its modulation for oral health. \u003cem\u003eBiomedicine \u0026amp; Pharmacotherapy\u003c/em\u003e \u003cstrong\u003e99\u003c/strong\u003e, 883-893 (2018). https://doi.org/10.1016/j.biopha.2018.01.146\u003c/li\u003e\n\u003cli\u003eBloch, S., Hager-Mair, F. F., Andrukhov, O. \u0026amp; Sch\u0026auml;ffer, C. Oral streptococci: modulators of health and disease. \u003cem\u003eFrontiers in Cellular and Infection Microbiology\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e (2024). https://doi.org/10.3389/fcimb.2024.1357631\u003c/li\u003e\n\u003cli\u003eKreth, J., Giacaman, R. A., Raghavan, R. \u0026amp; Merritt, J. The road less traveled - defining molecular commensalism with Streptococcus sanguinis. \u003cem\u003eMol Oral Microbiol\u003c/em\u003e \u003cstrong\u003e32\u003c/strong\u003e, 181-196 (2017). 
https://doi.org/10.1111/omi.12170\u003c/li\u003e\n\u003cli\u003eYe, D.\u003cem\u003e et al.\u003c/em\u003e Competitive dynamics and balance between Streptococcus mutans and commensal streptococci in oral microecology. \u003cem\u003eCrit Rev Microbiol\u003c/em\u003e \u003cstrong\u003e51\u003c/strong\u003e, 532-543 (2025). https://doi.org/10.1080/1040841x.2024.2389386\u003c/li\u003e\n\u003cli\u003eSenthil Kumar, S.\u003cem\u003e et al.\u003c/em\u003e Oral streptococci S. anginosus and S. mitis induce distinct morphological, inflammatory, and metabolic signatures in macrophages. \u003cem\u003eInfection and Immunity\u003c/em\u003e \u003cstrong\u003e92\u003c/strong\u003e, e00536-23 (2024). https://doi.org/10.1128/iai.00536-23\u003c/li\u003e\n\u003cli\u003eTomic, U.\u003cem\u003e et al.\u003c/em\u003e Streptococcus mitis and Prevotella melaninogenica Influence Gene Expression Changes in Oral Mucosal Lesions in Periodontitis Patients. \u003cem\u003ePathogens\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 1194 (2023).\u003c/li\u003e\n\u003cli\u003eNarikiyo, M.\u003cem\u003e et al.\u003c/em\u003e Frequent and preferential infection of Treponema denticola, Streptococcus mitis, and Streptococcus anginosus in esophageal cancers. \u003cem\u003eCancer Sci\u003c/em\u003e \u003cstrong\u003e95\u003c/strong\u003e, 569-574 (2004). https://doi.org/10.1111/j.1349-7006.2004.tb02488.x\u003c/li\u003e\n\u003cli\u003eFu, K.\u003cem\u003e et al.\u003c/em\u003e Streptococcus anginosus promotes gastric inflammation, atrophy, and tumorigenesis in mice. \u003cem\u003eCell\u003c/em\u003e \u003cstrong\u003e187\u003c/strong\u003e, 882-896.e17 (2024). https://doi.org/10.1016/j.cell.2024.01.004\u003c/li\u003e\n\u003cli\u003eSasaki, H.\u003cem\u003e et al.\u003c/em\u003e Presence of Streptococcus anginosus DNA in esophageal cancer, dysplasia of esophagus, and gastric cancer. \u003cem\u003eCancer Res\u003c/em\u003e \u003cstrong\u003e58\u003c/strong\u003e, 2991-2995 (1998). 
\u003c/li\u003e\n\u003cli\u003eSasaki, M.\u003cem\u003e et al.\u003c/em\u003e Streptococcus anginosus infection in oral cancer and its infection route. \u003cem\u003eOral Diseases\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 151-156 (2005). https://doi.org/10.1111/j.1601-0825.2005.01051.x\u003c/li\u003e\n\u003cli\u003eTateda, M.\u003cem\u003e et al.\u003c/em\u003e Streptococcus anginosus in head and neck squamous cell carcinoma: implication in carcinogenesis. \u003cem\u003eInt J Mol Med\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, 699-703 (2000). https://doi.org/10.3892/ijmm.6.6.699\u003c/li\u003e\n\u003cli\u003eRai, A. K.\u003cem\u003e et al.\u003c/em\u003e Dysbiosis of salivary microbiome and cytokines influence oral squamous cell carcinoma through inflammation. \u003cem\u003eArch Microbiol\u003c/em\u003e \u003cstrong\u003e203\u003c/strong\u003e, 137-152 (2021). https://doi.org/10.1007/s00203-020-02011-w\u003c/li\u003e\n\u003cli\u003eZhou, L., Fan, S., Zhang, W., Wang, D. \u0026amp; Tang, D. Microbes in the tumor microenvironment: New additions to break the tumor immunotherapy dilemma. \u003cem\u003eMicrobiological Research\u003c/em\u003e \u003cstrong\u003e285\u003c/strong\u003e, 127777 (2024). https://doi.org/10.1016/j.micres.2024.127777\u003c/li\u003e\n\u003cli\u003eLi, S.\u003cem\u003e et al.\u003c/em\u003e Gut Microbiota and Immune Modulatory Properties of Human Breast Milk Streptococcus salivarius and S. parasanguinis Strains. \u003cem\u003eFront Nutr\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 798403 (2022). https://doi.org/10.3389/fnut.2022.798403\u003c/li\u003e\n\u003cli\u003eLi, W.\u003cem\u003e et al.\u003c/em\u003e A catalog of bacterial reference genomes from cultivated human oral bacteria. \u003cem\u003enpj Biofilms and Microbiomes\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 45 (2023). 
https://doi.org/10.1038/s41522-023-00414-3\u003c/li\u003e\n\u003cli\u003eBurja, B.\u003cem\u003e et al.\u003c/em\u003e An Optimized Tissue Dissociation Protocol for Single-Cell RNA Sequencing Analysis of Fresh and Cultured Human Skin Biopsies. \u003cem\u003eFront Cell Dev Biol\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 872688 (2022). https://doi.org/10.3389/fcell.2022.872688\u003c/li\u003e\n\u003cli\u003eBoyanova, L. \u0026amp; Medeiros, J. A. d. S. in \u003cem\u003eClinMicroNow\u003c/em\u003e 1-7.\u003c/li\u003e\n\u003cli\u003eBouras, G.\u003cem\u003e et al.\u003c/em\u003e Hybracter: enabling scalable, automated, complete and accurate bacterial genome assemblies. \u003cem\u003eMicrobial Genomics\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e (2024). https://doi.org/10.1099/mgen.0.001244\u003c/li\u003e\n\u003cli\u003eKolmogorov, M., Yuan, J., Lin, Y. \u0026amp; Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. \u003cem\u003eNature Biotechnology\u003c/em\u003e \u003cstrong\u003e37\u003c/strong\u003e, 540-546 (2019). https://doi.org/10.1038/s41587-019-0072-8\u003c/li\u003e\n\u003cli\u003eBouras, G., Grigson, S., Papudeshi, B., Mallawaarachchi, V. \u0026amp; Roach, M. Dnaapler: A tool to reorient circular microbial genomes. \u003cem\u003eJournal of Open Source Software\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 5968 (2024). https://doi.org/10.21105/joss.05968\u003c/li\u003e\n\u003cli\u003eBouras, G., Sheppard, A. E., Mallawaarachchi, V. \u0026amp; Vreugde, S. Plassembler: an automated bacterial plasmid assembly tool. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e39\u003c/strong\u003e (2023). https://doi.org/10.1093/bioinformatics/btad409\u003c/li\u003e\n\u003cli\u003eChklovski, A., Parks, D. H., Woodcroft, B. J. \u0026amp; Tyson, G. W. CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. 
\u003cem\u003eNature Methods\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 1203-1212 (2023). https://doi.org/10.1038/s41592-023-01940-w\u003c/li\u003e\n\u003cli\u003eChaumeil, P.-A., Mussig, A. J., Hugenholtz, P. \u0026amp; Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e36\u003c/strong\u003e, 1925-1927 (2019). https://doi.org/10.1093/bioinformatics/btz848\u003c/li\u003e\n\u003cli\u003eSchwengers, O.\u003cem\u003e et al.\u003c/em\u003e Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification. \u003cem\u003eMicrobial Genomics\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e (2021). https://doi.org/10.1099/mgen.0.000685\u003c/li\u003e\n\u003cli\u003eTonkin-Hill, G.\u003cem\u003e et al.\u003c/em\u003e Producing polished prokaryotic pangenomes with the Panaroo pipeline. \u003cem\u003eGenome Biology\u003c/em\u003e \u003cstrong\u003e21\u003c/strong\u003e, 180 (2020). https://doi.org/10.1186/s13059-020-02090-4\u003c/li\u003e\n\u003cli\u003eKatoh, K., Misawa, K., Kuma, K.-i. \u0026amp; Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. \u003cem\u003eNucleic Acids Research\u003c/em\u003e \u003cstrong\u003e30\u003c/strong\u003e, 3059-3066 (2002). https://doi.org/10.1093/nar/gkf436\u003c/li\u003e\n\u003cli\u003eKalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. \u0026amp; Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. \u003cem\u003eNature Methods\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 587-589 (2017). https://doi.org/10.1038/nmeth.4285\u003c/li\u003e\n\u003cli\u003eLetunic, I. \u0026amp; Bork, P. Interactive Tree of Life (iTOL) v6: recent updates to the phylogenetic tree display and annotation tool. 
\u003cem\u003eNucleic Acids Research\u003c/em\u003e \u003cstrong\u003e52\u003c/strong\u003e, W78-W82 (2024). https://doi.org/10.1093/nar/gkae268\u003c/li\u003e\n\u003cli\u003eMcInnes, L., Healy, J. \u0026amp; Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. \u003cem\u003earXiv preprint arXiv:1802.03426\u003c/em\u003e (2018). \u003c/li\u003e\n\u003cli\u003eKodinariya, T. \u0026amp; Makwana, P. Review on Determining of Cluster in K-means Clustering. \u003cem\u003eInternational Journal of Advance Research in Computer Science and Management Studies\u003c/em\u003e \u003cstrong\u003e1\u003c/strong\u003e, 90-95 (2013). \u003c/li\u003e\n\u003cli\u003eDrula, E.\u003cem\u003e et al.\u003c/em\u003e The carbohydrate-active enzyme database: functions and literature. \u003cem\u003eNucleic Acids Research\u003c/em\u003e \u003cstrong\u003e50\u003c/strong\u003e, D571-D577 (2021). https://doi.org/10.1093/nar/gkab1045\u003c/li\u003e\n\u003cli\u003eBuchfink, B., Xie, C. \u0026amp; Huson, D. H. Fast and sensitive protein alignment using DIAMOND. \u003cem\u003eNature Methods\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 59-60 (2015). https://doi.org/10.1038/nmeth.3176\u003c/li\u003e\n\u003cli\u003eJohansson, M. H. K.\u003cem\u003e et al.\u003c/em\u003e Detection of mobile genetic elements associated with antibiotic resistance in Salmonella enterica using a newly developed web tool: MobileElementFinder. \u003cem\u003eJ Antimicrob Chemother\u003c/em\u003e \u003cstrong\u003e76\u003c/strong\u003e, 101-109 (2021). https://doi.org/10.1093/jac/dkaa390\u003c/li\u003e\n\u003cli\u003eCarattoli, A.\u003cem\u003e et al.\u003c/em\u003e In silico detection and typing of plasmids using PlasmidFinder and plasmid multilocus sequence typing. \u003cem\u003eAntimicrob Agents Chemother\u003c/em\u003e \u003cstrong\u003e58\u003c/strong\u003e, 3895-3903 (2014). 
https://doi.org/10.1128/aac.02412-14\u003c/li\u003e\n\u003cli\u003eBouras, G.\u003cem\u003e et al.\u003c/em\u003e Pharokka: a fast scalable bacteriophage annotation tool. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e39\u003c/strong\u003e (2022). https://doi.org/10.1093/bioinformatics/btac776\u003c/li\u003e\n\u003cli\u003eNayfach, S.\u003cem\u003e et al.\u003c/em\u003e CheckV assesses the quality and completeness of metagenome-assembled viral genomes. \u003cem\u003eNature Biotechnology\u003c/em\u003e \u003cstrong\u003e39\u003c/strong\u003e, 578-585 (2021). https://doi.org/10.1038/s41587-020-00774-7\u003c/li\u003e\n\u003cli\u003eLi, W. \u0026amp; Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 1658-1659 (2006). https://doi.org/10.1093/bioinformatics/btl158\u003c/li\u003e\n\u003cli\u003eSeemann, T. Prokka: rapid prokaryotic genome annotation. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e30\u003c/strong\u003e, 2068-2069 (2014). https://doi.org/10.1093/bioinformatics/btu153\u003c/li\u003e\n\u003cli\u003eZhang, Z. J.\u003cem\u003e et al.\u003c/em\u003e Comprehensive analyses of a large human gut Bacteroidales culture collection reveal species and strain level diversity and evolution. \u003cem\u003ebioRxiv\u003c/em\u003e (2024). https://doi.org/10.1101/2024.03.08.584156\u003c/li\u003e\n\u003cli\u003eSasaki, M.\u003cem\u003e et al.\u003c/em\u003e Streptococcus anginosus infection in oral cancer and its infection route. \u003cem\u003eOral Diseases\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 151-156 (2005). https://doi.org/10.1111/j.1601-0825.2005.01051.x\u003c/li\u003e\n\u003cli\u003eYang, C.-Y.\u003cem\u003e et al.\u003c/em\u003e Oral Microbiota Community Dynamics Associated With Oral Squamous Cell Carcinoma Staging. \u003cem\u003eFrontiers in Microbiology\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e (2018). 
https://doi.org/10.3389/fmicb.2018.00862\u003c/li\u003e\n\u003cli\u003eMar\u0026ccedil;ais, G.\u003cem\u003e et al.\u003c/em\u003e MUMmer4: A fast and versatile genome alignment system. \u003cem\u003ePLOS Computational Biology\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, e1005944 (2018). https://doi.org/10.1371/journal.pcbi.1005944\u003c/li\u003e\n\u003cli\u003eJain, C., Rodriguez-R, L. M., Phillippy, A. M., Konstantinidis, K. T. \u0026amp; Aluru, S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. \u003cem\u003eNature Communications\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 5114 (2018). https://doi.org/10.1038/s41467-018-07641-9\u003c/li\u003e\n\u003cli\u003eZou, Y.\u003cem\u003e et al.\u003c/em\u003e Common Methods for Phylogenetic Tree Construction and Their Implementation in R. \u003cem\u003eBioengineering (Basel)\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e (2024). https://doi.org/10.3390/bioengineering11050480\u003c/li\u003e\n\u003cli\u003eKilian, M.\u003cem\u003e et al.\u003c/em\u003e The oral microbiome\u0026ndash;an update for oral healthcare professionals. \u003cem\u003eBritish Dental Journal\u003c/em\u003e \u003cstrong\u003e221\u003c/strong\u003e, 657-666 (2016). \u003c/li\u003e\n\u003cli\u003eBelman, S., Chaguza, C., Kumar, N., Lo, S. \u0026amp; Bentley, S. D. A new perspective on ancient Mitis group streptococcal genetics. \u003cem\u003eMicrob Genom\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e (2022). https://doi.org/10.1099/mgen.0.000753\u003c/li\u003e\n\u003cli\u003eDo, T.\u003cem\u003e et al.\u003c/em\u003e Population structure of Streptococcus oralis. \u003cem\u003eMicrobiology (Reading)\u003c/em\u003e \u003cstrong\u003e155\u003c/strong\u003e, 2593-2602 (2009). https://doi.org/10.1099/mic.0.027284-0\u003c/li\u003e\n\u003cli\u003eGao, X.-Y., Zhi, X.-Y., Li, H.-W., Klenk, H.-P. \u0026amp; Li, W.-J. 
Comparative Genomics of the Bacterial Genus Streptococcus Illuminates Evolutionary Implications of Species Groups. \u003cem\u003ePLOS ONE\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, e101229 (2014). https://doi.org/10.1371/journal.pone.0101229\u003c/li\u003e\n\u003cli\u003eAbranches, J.\u003cem\u003e et al.\u003c/em\u003e Biology of Oral Streptococci. \u003cem\u003eMicrobiol Spectr\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e (2018). https://doi.org/10.1128/microbiolspec.GPP3-0042-2018\u003c/li\u003e\n\u003cli\u003eTaylor, Z. A., Pham, D. N. \u0026amp; Zeng, L. Systematic analysis of the glucose-PTS in Streptococcus sanguinis highlighted its importance in central metabolism and bacterial fitness. \u003cem\u003eAppl Environ Microbiol\u003c/em\u003e \u003cstrong\u003e91\u003c/strong\u003e, e0193524 (2025). https://doi.org/10.1128/aem.01935-24\u003c/li\u003e\n\u003cli\u003eZheng, W.\u003cem\u003e et al.\u003c/em\u003e Distinct Biological Potential of Streptococcus gordonii and Streptococcus sanguinis Revealed by Comparative Genome Analysis. \u003cem\u003eScientific Reports\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e, 2949 (2017). https://doi.org/10.1038/s41598-017-02399-4\u003c/li\u003e\n\u003cli\u003eGray, T. Streptococcus anginosus group: Clinical significance of an important group of pathogens. \u003cem\u003eClinical Microbiology Newsletter\u003c/em\u003e \u003cstrong\u003e27\u003c/strong\u003e, 155-159 (2005). https://doi.org/10.1016/j.clinmicnews.2005.09.006\u003c/li\u003e\n\u003cli\u003eSunwoo, B. Y. \u0026amp; Miller, W. T., Jr. Streptococcus anginosus infections: crossing tissue planes. \u003cem\u003eChest\u003c/em\u003e \u003cstrong\u003e146\u003c/strong\u003e, e121-e125 (2014). https://doi.org/10.1378/chest.13-2791\u003c/li\u003e\n\u003cli\u003eZhang, C.\u003cem\u003e et al.\u003c/em\u003e Glutamine enhances pneumococcal growth under methionine semi-starvation by elevating intracellular pH. 
\u003cem\u003eFrontiers in Microbiology\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e (2024). https://doi.org/10.3389/fmicb.2024.1430038\u003c/li\u003e\n\u003cli\u003ePal, C.\u003cem\u003e et al.\u003c/em\u003e in \u003cem\u003eAdvances in Microbial Physiology\u003c/em\u003e Vol. 70 (ed Robert K. Poole) 261-313 (Academic Press, 2017).\u003c/li\u003e\n\u003cli\u003eNobbs, A. H., Lamont, R. J. \u0026amp; Jenkinson, H. F. Streptococcus adherence and colonization. \u003cem\u003eMicrobiol Mol Biol Rev\u003c/em\u003e \u003cstrong\u003e73\u003c/strong\u003e, 407-450 (2009). https://doi.org/10.1128/mmbr.00014-09\u003c/li\u003e\n\u003cli\u003eShelburne, S. A., Davenport, M. T., Keith, D. B. \u0026amp; Musser, J. M. The role of complex carbohydrate catabolism in the pathogenesis of invasive streptococci. \u003cem\u003eTrends Microbiol\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 318-325 (2008). https://doi.org/10.1016/j.tim.2008.04.002\u003c/li\u003e\n\u003cli\u003eQiao, Y.\u003cem\u003e et al.\u003c/em\u003e Lactate metabolism and lactylation in breast cancer: mechanisms and implications. \u003cem\u003eCancer Metastasis Rev\u003c/em\u003e \u003cstrong\u003e44\u003c/strong\u003e, 48 (2025). https://doi.org/10.1007/s10555-025-10264-4\u003c/li\u003e\n\u003cli\u003eJin, P., Wang, L., Chen, D. \u0026amp; Chen, Y. Unveiling the complexity of early childhood caries: Candida albicans and Streptococcus mutans cooperative strategies in carbohydrate metabolism and virulence. \u003cem\u003eJ Oral Microbiol\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 2339161 (2024). https://doi.org/10.1080/20002297.2024.2339161\u003c/li\u003e\n\u003cli\u003eRajasekaran, J. J.\u003cem\u003e et al.\u003c/em\u003e Oral Microbiome: A Review of Its Impact on Oral and Systemic Health. \u003cem\u003eMicroorganisms\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e (2024). 
https://doi.org/10.3390/microorganisms12091797\u003c/li\u003e\n\u003cli\u003eNazaret, F., Alloing, G., Mandon, K. \u0026amp; Frendo, P. MarR Family Transcriptional Regulators and Their Roles in Plant-Interacting Bacteria. \u003cem\u003eMicroorganisms\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e (2023). https://doi.org/10.3390/microorganisms11081936\u003c/li\u003e\n\u003cli\u003eLemos, J. A.\u003cem\u003e et al.\u003c/em\u003e The Biology of Streptococcus mutans. \u003cem\u003eMicrobiol Spectr\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e (2019). https://doi.org/10.1128/microbiolspec.GPP3-0051-2018\u003c/li\u003e\n\u003cli\u003eDurand, E., Oomen, C. \u0026amp; Waksman, G. Biochemical dissection of the ATPase TraB, the VirB4 homologue of the Escherichia coli pKM101 conjugation machinery. \u003cem\u003eJ Bacteriol\u003c/em\u003e \u003cstrong\u003e192\u003c/strong\u003e, 2315-2323 (2010). https://doi.org/10.1128/jb.01384-09\u003c/li\u003e\n\u003cli\u003eZeng, Y.\u003cem\u003e et al.\u003c/em\u003e cpsJ gene of Streptococcus iniae is involved in capsular polysaccharide synthesis and virulence. \u003cem\u003eAntonie van Leeuwenhoek\u003c/em\u003e \u003cstrong\u003e109\u003c/strong\u003e, 1483-1492 (2016). https://doi.org/10.1007/s10482-016-0750-1\u003c/li\u003e\n\u003cli\u003eMarceau, A. H. Functions of single-strand DNA-binding proteins in DNA replication, recombination, and repair. \u003cem\u003eMethods Mol Biol\u003c/em\u003e \u003cstrong\u003e922\u003c/strong\u003e, 1-21 (2012). https://doi.org/10.1007/978-1-62703-032-8_1\u003c/li\u003e\n\u003cli\u003eWang, J.\u003cem\u003e et al.\u003c/em\u003e The conserved domain database in 2023. \u003cem\u003eNucleic Acids Res\u003c/em\u003e \u003cstrong\u003e51\u003c/strong\u003e, D384-D388 (2023). 
https://doi.org/10.1093/nar/gkac1096\u003c/li\u003e\n\u003cli\u003eSchr\u0026ouml;der, G.\u003cem\u003e et al.\u003c/em\u003e TraG-like proteins of DNA transfer systems and of the Helicobacter pylori type IV secretion system: inner membrane gate for exported substrates? \u003cem\u003eJ Bacteriol\u003c/em\u003e \u003cstrong\u003e184\u003c/strong\u003e, 2767-2779 (2002). https://doi.org/10.1128/jb.184.10.2767-2779.2002\u003c/li\u003e\n\u003cli\u003eSteen, J. A., Bohlke, N., Vickers, C. E. \u0026amp; Nielsen, L. K. The Trehalose Phosphotransferase System (PTS) in E. coli W Can Transport Low Levels of Sucrose that Are Sufficient to Facilitate Induction of the csc Sucrose Catabolism Operon. \u003cem\u003ePLOS ONE\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, e88688 (2014). https://doi.org/10.1371/journal.pone.0088688\u003c/li\u003e\n\u003cli\u003eLehman, M. K.\u003cem\u003e et al.\u003c/em\u003e Proline transporters ProT and PutP are required for Staphylococcus aureus infection. \u003cem\u003ePLOS Pathogens\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, e1011098 (2023). https://doi.org/10.1371/journal.ppat.1011098\u003c/li\u003e\n\u003cli\u003eStasi, R., Neves, H. I. \u0026amp; Spira, B. Phosphate uptake by the phosphonate transport system PhnCDE. \u003cem\u003eBMC Microbiology\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, 79 (2019). https://doi.org/10.1186/s12866-019-1445-3\u003c/li\u003e\n\u003cli\u003eZhang, J.\u003cem\u003e et al.\u003c/em\u003e Structure of glycerol dehydrogenase (GldA) from Escherichia coli. \u003cem\u003eActa Crystallogr F Struct Biol Commun\u003c/em\u003e \u003cstrong\u003e75\u003c/strong\u003e, 176-183 (2019). https://doi.org/10.1107/s2053230x19000037\u003c/li\u003e\n\u003cli\u003eWilkinson, M.\u003cem\u003e et al.\u003c/em\u003e Structure of the DNA-Bound Spacer Capture Complex of a Type II CRISPR-Cas System. \u003cem\u003eMolecular Cell\u003c/em\u003e \u003cstrong\u003e75\u003c/strong\u003e, 90-101.e5 (2019). 
https://doi.org/10.1016/j.molcel.2019.04.020\u003c/li\u003e\n\u003cli\u003eLombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P. M. \u0026amp; Henrissat, B. The carbohydrate-active enzymes database (CAZy) in 2013. \u003cem\u003eNucleic Acids Res\u003c/em\u003e \u003cstrong\u003e42\u003c/strong\u003e, D490-D495 (2014). https://doi.org/10.1093/nar/gkt1178\u003c/li\u003e\n\u003cli\u003eMuramatsu, M. K. \u0026amp; Winter, S. E. Nutrient acquisition strategies by gut microbes. \u003cem\u003eCell Host Microbe\u003c/em\u003e \u003cstrong\u003e32\u003c/strong\u003e, 863-874 (2024). https://doi.org/10.1016/j.chom.2024.05.011\u003c/li\u003e\n\u003cli\u003eAndreassen, P. R.\u003cem\u003e et al.\u003c/em\u003e Host-glycan metabolism is regulated by a species-conserved two-component system in Streptococcus pneumoniae. \u003cem\u003ePLoS Pathog\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, e1008332 (2020). https://doi.org/10.1371/journal.ppat.1008332\u003c/li\u003e\n\u003cli\u003eZhao, S., Peralta, R. M., Avina-Ochoa, N., Delgoffe, G. M. \u0026amp; Kaech, S. M. Metabolic regulation of T cells in the tumor microenvironment by nutrient availability and diet. \u003cem\u003eSemin Immunol\u003c/em\u003e \u003cstrong\u003e52\u003c/strong\u003e, 101485 (2021). https://doi.org/10.1016/j.smim.2021.101485\u003c/li\u003e\n\u003cli\u003eMichalska, K.\u003cem\u003e et al.\u003c/em\u003e GH1-family 6-P-\u0026beta;-glucosidases from human microbiome lactic acid bacteria. \u003cem\u003eActa Crystallogr D Biol Crystallogr\u003c/em\u003e \u003cstrong\u003e69\u003c/strong\u003e, 451-463 (2013). https://doi.org/10.1107/s0907444912049608\u003c/li\u003e\n\u003cli\u003eStam, M. R., Danchin, E. G., Rancurel, C., Coutinho, P. M. \u0026amp; Henrissat, B. Dividing the large glycoside hydrolase family 13 into subfamilies: towards improved functional annotations of alpha-amylase-related proteins. 
\u003cem\u003eProtein Eng Des Sel\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, 555-562 (2006). https://doi.org/10.1093/protein/gzl044\u003c/li\u003e\n\u003cli\u003eWohlk\u0026ouml;nig, A., Huet, J., Looze, Y. \u0026amp; Wintjens, R. Structural relationships in the lysozyme superfamily: significant evidence for glycoside hydrolase signature motifs. \u003cem\u003ePLoS One\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, e15388 (2010). https://doi.org/10.1371/journal.pone.0015388\u003c/li\u003e\n\u003cli\u003eAlshareef, S. A. Metabolic analysis of the CAZy class glycosyltransferases in rhizospheric soil fungiome of the plant species Moringa oleifera. \u003cem\u003eSaudi J Biol Sci\u003c/em\u003e \u003cstrong\u003e31\u003c/strong\u003e, 103956 (2024). https://doi.org/10.1016/j.sjbs.2024.103956\u003c/li\u003e\n\u003cli\u003eM\u0026oslash;ller, M. S., Henriksen, A. \u0026amp; Svensson, B. Structure and function of \u0026alpha;-glucan debranching enzymes. \u003cem\u003eCell Mol Life Sci\u003c/em\u003e \u003cstrong\u003e73\u003c/strong\u003e, 2619-2641 (2016). https://doi.org/10.1007/s00018-016-2241-y\u003c/li\u003e\n\u003cli\u003eAkcapinar, G. B., Kappel, L., Sezerman, O. U. \u0026amp; Seidl-Seiboth, V. Molecular diversity of LysM carbohydrate-binding motifs in fungi. \u003cem\u003eCurr Genet\u003c/em\u003e \u003cstrong\u003e61\u003c/strong\u003e, 103-113 (2015). https://doi.org/10.1007/s00294-014-0471-9\u003c/li\u003e\n\u003cli\u003evan Wyk, N., Drancourt, M., Henrissat, B. \u0026amp; Kremer, L. Current perspectives on the families of glycoside hydrolases of Mycobacterium tuberculosis: their importance and prospects for assigning function to unknowns. \u003cem\u003eGlycobiology\u003c/em\u003e \u003cstrong\u003e27\u003c/strong\u003e, 112-122 (2017). https://doi.org/10.1093/glycob/cww099\u003c/li\u003e\n\u003cli\u003eAbbott, D. W. \u0026amp; van Bueren, A. L. Using structure to inform carbohydrate binding module function. 
\u003cem\u003eCurr Opin Struct Biol\u003c/em\u003e \u003cstrong\u003e28\u003c/strong\u003e, 32-40 (2014). https://doi.org/10.1016/j.sbi.2014.07.004\u003c/li\u003e\n\u003cli\u003eKato, K. \u0026amp; Ishiwa, A. The role of carbohydrates in infection strategies of enteric pathogens. \u003cem\u003eTrop Med Health\u003c/em\u003e \u003cstrong\u003e43\u003c/strong\u003e, 41-52 (2015). https://doi.org/10.2149/tmh.2014-25\u003c/li\u003e\n\u003cli\u003eTailford, L. E., Crost, E. H., Kavanaugh, D. \u0026amp; Juge, N. Mucin glycan foraging in the human gut microbiome. \u003cem\u003eFront Genet\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, 81 (2015). https://doi.org/10.3389/fgene.2015.00081\u003c/li\u003e\n\u003cli\u003eHomann, N., Jousimies-Somer, H., Jokelainen, K., Heine, R. \u0026amp; Salaspuro, M. High acetaldehyde levels in saliva after ethanol consumption: methodological aspects and pathogenetic implications. \u003cem\u003eCarcinogenesis\u003c/em\u003e \u003cstrong\u003e18\u003c/strong\u003e, 1739-1743 (1997). https://doi.org/10.1093/carcin/18.9.1739\u003c/li\u003e\n\u003cli\u003eZhang, Q., Ma, Q., Wang, Y., Wu, H. \u0026amp; Zou, J. Molecular mechanisms of inhibiting glucosyltransferases for biofilm formation in Streptococcus mutans. \u003cem\u003eInt J Oral Sci\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 30 (2021). https://doi.org/10.1038/s41368-021-00137-1\u003c/li\u003e\n\u003cli\u003eQin, X., Chen, X. \u0026amp; Li, Q. \u003cem\u003eThe evolution and profile of AA10 lytic polysaccharide monooxygenase coupled with cellulose decomposition in different composting microenvironments\u003c/em\u003e. (2023).\u003c/li\u003e\n\u003cli\u003eOnyango, S. O., Juma, J., De Paepe, K. \u0026amp; Van de Wiele, T. Oral and Gut Microbial Carbohydrate-Active Enzymes Landscape in Health and Disease. \u003cem\u003eFrontiers in Microbiology\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e (2021). 
https://doi.org/10.3389/fmicb.2021.653448\u003c/li\u003e\n\u003cli\u003eJensen, A., Scholz, C. F. P. \u0026amp; Kilian, M. Re-evaluation of the taxonomy of the Mitis group of the genus Streptococcus based on whole genome phylogenetic analyses, and proposed reclassification of Streptococcus dentisani as Streptococcus oralis subsp. dentisani comb. nov., Streptococcus tigurinus as Streptococcus oralis subsp. tigurinus comb. nov., and Streptococcus oligofermentans as a later synonym of Streptococcus cristatus. \u003cem\u003eInternational Journal of Systematic and Evolutionary Microbiology\u003c/em\u003e \u003cstrong\u003e66\u003c/strong\u003e, 4803-4820 (2016). https://doi.org/10.1099/ijsem.0.001433\u003c/li\u003e\n\u003cli\u003eParks, D. H.\u003cem\u003e et al.\u003c/em\u003e A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. \u003cem\u003eNature Biotechnology\u003c/em\u003e \u003cstrong\u003e36\u003c/strong\u003e, 996-1004 (2018). https://doi.org/10.1038/nbt.4229\u003c/li\u003e\n\u003cli\u003eHuang, Y., Zhao, X., Cui, L. \u0026amp; Huang, S. Metagenomic and Metatranscriptomic Insight Into Oral Biofilms in Periodontitis and Related Systemic Diseases. \u003cem\u003eFrontiers in Microbiology\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e (2021). https://doi.org/10.3389/fmicb.2021.728585\u003c/li\u003e\n\u003cli\u003eMark Welch, J. L., Rossetti, B. J., Rieken, C. W., Dewhirst, F. E. \u0026amp; Borisy, G. G. Biogeography of a human oral microbiome at the micron scale. \u003cem\u003eProceedings of the National Academy of Sciences\u003c/em\u003e \u003cstrong\u003e113\u003c/strong\u003e, E791-E800 (2016). https://doi.org/10.1073/pnas.1522149113\u003c/li\u003e\n\u003cli\u003eMontanari, E.\u003cem\u003e et al.\u003c/em\u003e Biofilm formation by the host microbiota: a protective shield against immunity and its implication in cancer. 
\u003cem\u003eMol Cancer\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 148 (2025). https://doi.org/10.1186/s12943-025-02348-0\u003c/li\u003e\n\u003cli\u003eCrestani, C.\u003cem\u003e et al.\u003c/em\u003e Genomic and functional determinants of host spectrum in Group B Streptococcus. \u003cem\u003ePLoS Pathog\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, e1012400 (2024). https://doi.org/10.1371/journal.ppat.1012400\u003c/li\u003e\n\u003cli\u003eLassalle, F.\u003cem\u003e et al.\u003c/em\u003e GC-Content evolution in bacterial genomes: the biased gene conversion hypothesis expands. \u003cem\u003ePLoS Genet\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, e1004941 (2015). https://doi.org/10.1371/journal.pgen.1004941\u003c/li\u003e\n\u003cli\u003eGuo, L., Dai, H., Feng, S. \u0026amp; Zhao, Y. Contribution of GalU to biofilm formation, motility, antibiotic and serum resistance, and pathogenicity of Salmonella Typhimurium. \u003cem\u003eFront Cell Infect Microbiol\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 1149541 (2023). https://doi.org/10.3389/fcimb.2023.1149541\u003c/li\u003e\n\u003cli\u003eOlson, A. B.\u003cem\u003e et al.\u003c/em\u003e Phylogenetic relationship and virulence inference of Streptococcus Anginosus Group: curated annotation and whole-genome comparative analysis support distinct species designation. \u003cem\u003eBMC Genomics\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 895 (2013). https://doi.org/10.1186/1471-2164-14-895\u003c/li\u003e\n\u003cli\u003ePinho, S. S. \u0026amp; Reis, C. A. Glycosylation in cancer: mechanisms and clinical implications. \u003cem\u003eNature Reviews Cancer\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 540-555 (2015). https://doi.org/10.1038/nrc3982\u003c/li\u003e\n\u003cli\u003eCrouch, L. I.\u003cem\u003e et al.\u003c/em\u003e The role of glycans in health and disease: Regulators of the interaction between gut microbiota and host immune system. 
\u003cem\u003eSemin Immunol\u003c/em\u003e \u003cstrong\u003e73\u003c/strong\u003e, 101891 (2024). https://doi.org/10.1016/j.smim.2024.101891\u003c/li\u003e\n\u003cli\u003eFlynn, K. J., Baxter, N. T. \u0026amp; Schloss, P. D. Metabolic and Community Synergy of Oral Bacteria in Colorectal Cancer. \u003cem\u003emSphere\u003c/em\u003e \u003cstrong\u003e1\u003c/strong\u003e (2016). https://doi.org/10.1128/mSphere.00102-16\u003c/li\u003e\n\u003cli\u003eGarrett, W. S. Cancer and the microbiota. \u003cem\u003eScience\u003c/em\u003e \u003cstrong\u003e348\u003c/strong\u003e, 80-86 (2015). https://doi.org/10.1126/science.aaa4972\u003c/li\u003e\n\u003cli\u003eYu, B.\u003cem\u003e et al.\u003c/em\u003e Identification and characterization of new proteins crucial for bacterial spore resistance and germination. \u003cem\u003eFrontiers in Microbiology\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e (2023). https://doi.org/10.3389/fmicb.2023.1161604\u003c/li\u003e\n\u003cli\u003eJackson, C. R. \u0026amp; Dugas, S. L. Phylogenetic analysis of bacterial and archaeal arsC gene sequences suggests an ancient, common origin for arsenate reductase. \u003cem\u003eBMC Evolutionary Biology\u003c/em\u003e \u003cstrong\u003e3\u003c/strong\u003e, 18 (2003). https://doi.org/10.1186/1471-2148-3-18\u003c/li\u003e\n\u003cli\u003eBui, H. B. \u0026amp; Inaba, K. Structures, Mechanisms, and Physiological Functions of Zinc Transporters in Different Biological Kingdoms. \u003cem\u003eInt J Mol Sci\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e (2024). https://doi.org/10.3390/ijms25053045\u003c/li\u003e\n\u003cli\u003eHowell, K. J.\u003cem\u003e et al.\u003c/em\u003e Gene Content and Diversity of the Loci Encoding Biosynthesis of Capsular Polysaccharides of the 15 Serovar Reference Strains of Haemophilus parasuis. \u003cem\u003eJournal of Bacteriology\u003c/em\u003e \u003cstrong\u003e195\u003c/strong\u003e, 4264-4273 (2013). 
https://doi.org/10.1128/jb.00471-13\u003c/li\u003e\n\u003cli\u003eChua, W.-Z.\u003cem\u003e et al.\u003c/em\u003e High-Throughput Mutagenesis and Cross-Complementation Experiments Reveal Substrate Preference and Critical Residues of the Capsule Transporters in Streptococcus pneumoniae. \u003cem\u003emBio\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e (2021). https://doi.org/10.1128/mBio.02615-21\u003c/li\u003e\n\u003cli\u003eKedlaya Herga, S.\u003cem\u003e et al.\u003c/em\u003e Streptococcus spp. in oral cancer: host-microbe interactions, mechanistic insights, and diagnostic implications. \u003cem\u003eFrontiers in Cellular and Infection Microbiology\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e (2025). https://doi.org/10.3389/fcimb.2025.1688701\u003c/li\u003e\n\u003cli\u003eHong, Q., Ding, S., Xing, C. \u0026amp; Mu, Z. Advances in tumor immune microenvironment of head and neck squamous cell carcinoma: A review of literature. \u003cem\u003eMedicine (Baltimore)\u003c/em\u003e \u003cstrong\u003e103\u003c/strong\u003e, e37387 (2024). https://doi.org/10.1097/md.0000000000037387\u003c/li\u003e\n\u003cli\u003eAhmad, S.\u003cem\u003e et al.\u003c/em\u003e Oral Microbiome as a Biomarker and Therapeutic Target in Head and Neck Cancer: Current Insights and Future Directions. \u003cem\u003eCancers (Basel)\u003c/em\u003e \u003cstrong\u003e17\u003c/strong\u003e (2025). https://doi.org/10.3390/cancers17162667\u003c/li\u003e\n\u003cli\u003eChen, G., Wu, K., Li, H., Xia, D. \u0026amp; He, T. Role of hypoxia in the tumor microenvironment and targeted therapy. \u003cem\u003eFrontiers in Oncology\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e (2022). https://doi.org/10.3389/fonc.2022.961637\u003c/li\u003e\n\u003cli\u003eAboelella, N. S., Brandle, C., Kim, T., Ding, Z. C. \u0026amp; Zhou, G. Oxidative Stress in the Tumor Microenvironment and Its Relevance to Cancer Immunotherapy. 
\u003cem\u003eCancers (Basel)\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e (2021). https://doi.org/10.3390/cancers13050986\u003c/li\u003e\n\u003cli\u003ePark, M., Mitchell, W. J. \u0026amp; Rafii, F. Effect of Trehalose and Trehalose Transport on the Tolerance of Clostridium perfringens to Environmental Stress in a Wild Type Strain and Its Fluoroquinolone-Resistant Mutant. \u003cem\u003eInternational Journal of Microbiology\u003c/em\u003e \u003cstrong\u003e2016\u003c/strong\u003e, 4829716 (2016). https://doi.org/10.1155/2016/4829716\u003c/li\u003e\n\u003cli\u003eZhou, Y., Zhu, W., Bellur, P. S., Rewinkel, D. \u0026amp; Becker, D. F. Direct linking of metabolism and gene expression in the proline utilization A protein from Escherichia coli. \u003cem\u003eAmino Acids\u003c/em\u003e \u003cstrong\u003e35\u003c/strong\u003e, 711-718 (2008). https://doi.org/10.1007/s00726-008-0053-6\u003c/li\u003e\n\u003cli\u003eGrove, A. MarR family transcription factors. \u003cem\u003eCurrent Biology\u003c/em\u003e \u003cstrong\u003e23\u003c/strong\u003e, R142-R143 (2013). https://doi.org/10.1016/j.cub.2013.01.013\u003c/li\u003e\n\u003cli\u003eChen, T.\u003cem\u003e et al.\u003c/em\u003e The Human Oral Microbiome Database: a web accessible resource for investigating oral microbe taxonomic and genomic information. \u003cem\u003eDatabase (Oxford)\u003c/em\u003e \u003cstrong\u003e2010\u003c/strong\u003e, baq013 (2010). https://doi.org/10.1093/database/baq013\u003c/li\u003e\n\u003cli\u003eLamont, R. J., Koo, H. \u0026amp; Hajishengallis, G. The oral microbiota: dynamic communities and host interactions. \u003cem\u003eNature Reviews Microbiology\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 745-759 (2018). https://doi.org/10.1038/s41579-018-0089-x\u003c/li\u003e\n\u003cli\u003eLee, E.\u003cem\u003e et al.\u003c/em\u003e Genomic analysis of conjugative and chromosomally integrated mobile genetic elements in oral streptococci. 
\u003cem\u003eApplied and Environmental Microbiology\u003c/em\u003e \u003cstrong\u003e90\u003c/strong\u003e, e01360-01324 (2024). https://doi.org/doi:10.1128/aem.01360-24\u003c/li\u003e\n\u003cli\u003eGauntlett, J. C.\u003cem\u003e et al.\u003c/em\u003e Molecular Analysis of BcrR, a Membrane-bound Bacitracin Sensor and DNA-binding Protein from Enterococcus faecalis*. \u003cem\u003eJournal of Biological Chemistry\u003c/em\u003e \u003cstrong\u003e283\u003c/strong\u003e, 8591-8600 (2008). https://doi.org/https://doi.org/10.1074/jbc.M709503200\u003c/li\u003e\n\u003cli\u003eZhang, Y.\u003cem\u003e et al.\u003c/em\u003e Predominant role of msr(D) over mef(A) in macrolide resistance in Streptococcus pyogenes. \u003cem\u003eMicrobiology (Reading)\u003c/em\u003e \u003cstrong\u003e162\u003c/strong\u003e, 46-52 (2016). https://doi.org/10.1099/mic.0.000206\u003c/li\u003e\n\u003cli\u003eAlpert, C. A. \u0026amp; Siebers, U. The lac operon of Lactobacillus casei contains lacT, a gene coding for a protein of the Bg1G family of transcriptional antiterminators. \u003cem\u003eJ Bacteriol\u003c/em\u003e \u003cstrong\u003e179\u003c/strong\u003e, 1555-1562 (1997). https://doi.org/10.1128/jb.179.5.1555-1562.1997\u003c/li\u003e\n\u003cli\u003eYoshida, A. \u0026amp; Kuramitsu, H. K. Multiple Streptococcus mutans Genes Are Involved in Biofilm Formation. \u003cem\u003eAppl Environ Microbiol\u003c/em\u003e \u003cstrong\u003e68\u003c/strong\u003e, 6283-6291 (2002). https://doi.org/10.1128/aem.68.12.6283-6291.2002\u003c/li\u003e\n\u003cli\u003eDu, Q., Wang, H. \u0026amp; Xie, J. Thiamin (vitamin B1) biosynthesis and regulation: a rich source of antimicrobial drug targets? \u003cem\u003eInt J Biol Sci\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e, 41-52 (2011). https://doi.org/10.7150/ijbs.7.41\u003c/li\u003e\n\u003cli\u003eJurgenson, C. T., Begley, T. P. \u0026amp; Ealick, S. E. The structural and biochemical foundations of thiamin biosynthesis. 
\u003cem\u003eAnnu Rev Biochem\u003c/em\u003e \u003cstrong\u003e78\u003c/strong\u003e, 569-603 (2009). https://doi.org/10.1146/annurev.biochem.78.072407.102340\u003c/li\u003e\n\u003cli\u003eJiang, F. \u0026amp; Doudna, J. A. The structural biology of CRISPR-Cas systems. \u003cem\u003eCurr Opin Struct Biol\u003c/em\u003e \u003cstrong\u003e30\u003c/strong\u003e, 100-111 (2015). https://doi.org/10.1016/j.sbi.2015.02.002\u003c/li\u003e\n\u003cli\u003eJobin, C. Precision medicine using microbiota. \u003cem\u003eScience\u003c/em\u003e \u003cstrong\u003e359\u003c/strong\u003e, 32-34 (2018). https://doi.org/10.1126/science.aar2946\u003c/li\u003e\n\u003cli\u003ede Vos, W. M., Boerrigter, I., van Rooyen, R. J., Reiche, B. \u0026amp; Hengstenberg, W. Characterization of the lactose-specific enzymes of the phosphotransferase system in Lactococcus lactis. \u003cem\u003eJ Biol Chem\u003c/em\u003e \u003cstrong\u003e265\u003c/strong\u003e, 22554-22560 (1990). \u003c/li\u003e\n\u003cli\u003eZhao, J., Liang, Y., Zhang, S. \u0026amp; Xu, Z. Effect of sugar transporter on galactose utilization in Streptococcus thermophilus. \u003cem\u003eFrontiers in Microbiology\u003c/em\u003e \u003cstrong\u003eVolume 14 - 2023\u003c/strong\u003e (2023). https://doi.org/10.3389/fmicb.2023.1267237\u003c/li\u003e\n\u003cli\u003eAfzal, M., Shafeeq, S. \u0026amp; Kuipers, O. P. LacR is a repressor of lacABCD and LacT is an activator of lacTFEG, constituting the lac gene cluster in Streptococcus pneumoniae. \u003cem\u003eAppl Environ Microbiol\u003c/em\u003e \u003cstrong\u003e80\u003c/strong\u003e, 5349-5358 (2014). https://doi.org/10.1128/aem.01370-14\u003c/li\u003e\n\u003cli\u003eMarasco, R., Muscariello, L., Varcamonti, M., De Felice, M. \u0026amp; Sacco, M. Expression of the bglH gene of Lactobacillus plantarum is controlled by carbon catabolite repression. \u003cem\u003eJ Bacteriol\u003c/em\u003e \u003cstrong\u003e180\u003c/strong\u003e, 3400-3404 (1998). 
https://doi.org/10.1128/jb.180.13.3400-3404.1998\u003c/li\u003e\n\u003cli\u003eCroucher, N. J.\u003cem\u003e et al.\u003c/em\u003e Horizontal DNA Transfer Mechanisms of Bacteria as Weapons of Intragenomic Conflict. \u003cem\u003ePLOS Biology\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, e1002394 (2016). https://doi.org/10.1371/journal.pbio.1002394\u003c/li\u003e\n\u003cli\u003eVale, F. F., Lehours, P. \u0026amp; Yamaoka, Y. Editorial: The Role of Mobile Genetic Elements in Bacterial Evolution and Their Adaptability. \u003cem\u003eFront Microbiol\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 849667 (2022). https://doi.org/10.3389/fmicb.2022.849667\u003c/li\u003e\n\u003cli\u003eYang, Q.\u003cem\u003e et al.\u003c/em\u003e Integrative and conjugative elements in streptococci can act as vectors for plasmids and translocatable units integrated via IS1216E. \u003cem\u003eInt J Antimicrob Agents\u003c/em\u003e \u003cstrong\u003e61\u003c/strong\u003e, 106793 (2023). https://doi.org/10.1016/j.ijantimicag.2023.106793\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-8400357/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8400357/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"Disruption of the oral microbiome is increasingly implicated in head and neck cancer (HNC), yet the genomic adaptations that commensal bacteria acquire in tumour-associated environments remain unclear. We performed genome-resolved analyses of 101 complete Streptococcus genomes from the tumours and oral cavities of 31 HNC patients. Phylogenomic analysis identified 35 species, including ten novel species belonging to the Mitis group. The Streptococcus genus shared 29 core genes, with analysis of accessory genomes (1.7-2.5 Mbp) showing extensive horizontal gene transfer (HGT), supported by 245 ICE clusters, 82 prophages and 4 plasmid groups. Comparison with 391 published genomes from the oral cavities of healthy individuals showed that tumour-associated isolates exhibited niche-specific expansions of carbohydrate-active enzymes and enrichment of genes involved in sugar transport, thiamine biosynthesis and antimicrobial resistance. 
Together, these findings reveal distinct HGT-driven genomic remodelling in tumour-associated Streptococcus and provide the first comprehensive genomic resource for examining microbiome adaptation in HNC.\nGraphical Abstract","manuscriptTitle":"The Genomic Landscape of Head and Neck Cancer-associated Streptococci","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-01-09 06:36:34","doi":"10.21203/rs.3.rs-8400357/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"3b2bb9d0-9024-4e0f-8bbe-8ddc5b0786f6","owner":[],"postedDate":"January 9th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":60541792,"name":"Biological sciences/Cancer"},{"id":60541793,"name":"Biological sciences/Computational biology and bioinformatics"},{"id":60541794,"name":"Biological sciences/Genetics"},{"id":60541795,"name":"Biological sciences/Microbiology"}],"tags":[],"updatedAt":"2026-01-09T06:36:34+00:00","versionOfRecord":[],"versionCreatedAt":"2026-01-09 06:36:34","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8400357","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8400357","identity":"rs-8400357","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}