Host genetic control of the oral microbiome and its links to human metabolism and immunity | Research Square
Biological Sciences - Article
Xiaomin Liu, Yaohui Zhao, Yafeng Wang, Longke Zeng, Gewei Wang, and 27 more
This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-8406553/v1 This work is licensed under a CC BY 4.0 License. Status: Under Review. Version 1 posted.
Abstract Host genetic influence on the oral microbiota, their functions, and associations with host phenotypes remain under investigated. Here, we present a large-scale genome-wide association study of the tongue dorsum microbiome in 13,397 Chinese participants from the CHARLS-4DSZ dataset, identifying ten genome-wide significant and replicated loci associated with 17 microbial taxa, eight pathways, and 1,783 gene families. 
The strongest signal maps to the missense variant FUT2 I140F, which regulates three taxa, with the most significant effect on Haemophilus sputorum (P = 9.71 × 10−51), and this association is independent of ABO blood groups. FUT2 is further associated with microbial D-arabinose degradation pathway and 134 gene families, including α-L-fucosidases and ABC transporters, implicating fucose-mediated host genetic regulation of microbial metabolism. Most identified microbiome-associated loci are functionally interpretable, affecting tissue/single-cell gene expression and linking to host immunometabolic traits: the POLI locus associates with Haemophilus parahaemolyticus and influences white blood cell counts and triglyceride levels, while SLC2A9 (urate transporter) regulates serum uric acid and uric acid-degrading bacteria harboring uric acid-utilizing gene clusters. Additionally, 239 significant associations are observed between 94 microbial features and 43 host phenotypes. Mendelian randomization further confirms eleven causal relationships, including microbial effects on blood gamma-glutamyl transferase, creatine, and uric acid, suggesting microbial roles in the host liver and kidney metabolism. Together, our study provides a comprehensive map of oral microbiome genetics, advancing mechanistic understanding of host-microbe interactions. Biological sciences/Genetics/Genetic association study/Genome-wide association studies Biological sciences/Immunology Introduction The importance of microbiota in human health and diseases has been increasingly highlighted as the sequencing technology developed 1–4 . While the gut microbiome has long been the central focus of related research, the oral cavity, as a highly dynamic microbial environment that harbors diverse microbial communities, is increasingly recognized for its impact on both local and systemic health 5 . 
The oral microbial community is not only closely associated with local oral diseases such as dental caries and periodontitis, but its dysbiosis can also influence the risk of cardiovascular diseases, diabetes, gastrointestinal tumors, and other conditions through pathways such as inducing systemic inflammation, releasing specific metabolites, and modulating host immune responses 6–9 . Despite the significant clinical implications of the oral microbiome, research into its genetic and environmental determinants remains insufficient, particularly in non-European populations. This knowledge gap limits our ability to translate microbiome-host interactions into precision medicine and underscores the urgent need for population-specific studies. The metagenome-genome-wide association study (M-GWAS) has emerged as a powerful tool for deciphering host genetic influences on the microbiome. In several large-scale M-GWAS studies, the genetic effects of LCT and ABO locus variants on gut microbial abundance have been consistently replicated 10–12 . However, these studies primarily focused on fecal microbiomes while overlooking the oral cavity—an evolutionarily conserved site where host-microbe interactions occur more directly. Notably, a recent preprint using a large European cohort identified 11 significant host genetic signals affecting the salivary microbiota. This study mainly investigated populations of European ancestry 13,14 , neglecting the genetic and environmental diversity of non-European groups, which may obscure population-specific host-microbe interaction signals shaped by localized evolutionary pressures 15 , dietary habits 16 , and environmental exposures 17 . 
Our previous research based on a Chinese 4DSZ cohort identifies three and two study-wide significant host genetic determinants of tongue dorsum microbiota and salivary microbiota respectively 18 , and demonstrated that host genetic factors explain more variation in the oral microbiome (tongue dorsum and saliva) than environmental factors, highlighting that host–microbe interactions in other host body niches extend beyond the gut microbiome. Compared to the fluid saliva, which acts as a mixing reservoir, the tongue dorsum constitutes a more stable and nutrient-rich ecological niche with structured biofilms 19–21 . This enhanced temporal stability, as evidenced by the persistence of specific strain mixtures over extended periods 22 , makes the tongue microbiome a superior model for discerning the subtle effects of host genetics from transient environmental influences. Although our study identified several significant host genetic signals, it was limited to 2,984 younger individuals with a mean age of 30 years, resulting in insufficient population representativeness and detection power 18 . Expanding M-GWAS analyses to larger, more geographically and age-diverse natural Chinese populations will be essential for identifying stable and robust associations between host genetics and oral microbiota. Here, we conducted a large-scale tongue dorsum M-GWAS involving 11,380 individuals from the CHARLS cohort through whole genome sequencing and whole metagenomic data integration, and further incorporating our previous 4DSZ cohort (2,017 out of the 2,984 individuals with high-depth whole genome and tongue dorsum metagenome sequencing data) to comprehensively investigate the host genetic determinants of the tongue dorsum microbiome. Additionally, our cohort includes basic questionnaires, blood test parameters, dental conditions, and disease-related phenotypic data. 
Through microbiome and host phenotypic GWAS, observational correlation analysis, and Mendelian randomization, we further explored the interactions and potential causal relationships between the tongue dorsum microbiota and host blood chemistry, dental, and disease phenotypes. This study not only reveals mechanisms of the host-microbe interaction but also provides data references for developing targeted microbial interventions and therapies. Results Host genetic structure and age strongly influenced tongue dorsum microbiome composition in an extensive Chinese multi-omics dataset To systematically characterize the host genetic effects on the oral microbiome, we assembled a large-scale, high-quality multi-omics Chinese cohort (CHARLS-4DSZ) comprising 13,397 individuals, integrating an extensive dataset that included host whole-genome sequencing (WGS), tongue dorsum whole metagenome sequencing (WMS), blood biochemistry, and phenotypic information. After strict sample selection and quality control ( Supplementary Fig. 1 ), the CHARLS-4DSZ cohort encompassed three distinct sub-datasets ( Fig 1a, Supplementary Table 1 ): (1) the primary CHARLS discovery dataset (N = 8,331, mean age of 65 years; average depth of 20× for blood WGS and mean 17.60 ± 3.61Gb for tongue dorsum WMS; Supplementary Fig. 2a, b ); (2) the CHARLS replication dataset (N = 3,049, mean age of 65 years; an average 20.07 ± 5.13 Gb for tongue dorsum WMS of which mean 4.2× host reads achieved by aligning sequencing reads to the human reference genome, Methods ; Supplementary Fig. 2c, d ), and (3) the 4DSZ dataset collected in year of 2018 from our prior work (N = 2,017, mean age of 30 years; average depth of 33× for blood WGS and mean 19.18 ± 7.90 Gb for tongue dorsum WMS; Supplementary Fig. 2e, f ). Prior to integrated analysis, we assessed the overall genetic structural similarity of the individuals across the CHARLS discovery, replication, and 4DSZ cohorts. 
Principal component analysis (PCA) on host genetics revealed no evident stratification of the three Chinese cohorts. The cohorts showed the typical Chinese population structure, with the first principal component (PC1) distinguishing northern from southern Chinese and the second principal component (PC2) distinguishing west-to-east populations, consistent with reported Chinese population studies 23,24 ( Supplementary Fig. 3a ). All three cohorts clustered within the East Asian genetic background. They were clearly separated from other ethnic populations, such as African, American, European, and South Asian, when compared with the 1000 Genomes dataset ( Supplementary Fig. 3b ), thereby minimizing confounding due to population stratification. The principal coordinates analysis (PCoA) of microbial communities indicated a slight deviation of the replication cohort from the other two cohorts along PCoA1 and PCoA2, characterized by higher Streptococcus abundance and lower Prevotella abundance in the replication cohort ( Supplementary Fig. 4 , Supplementary Table 2 ). We next assessed the influence of host confounders, including age, sex, BMI, and the top ten host genetic principal components (PCs), on microbiome diversity using multivariable linear regression models with Bonferroni correction ( Fig. 1b , Supplementary Table 3 ). Sex and host PC2 were significantly associated with the two microbial alpha-diversity indices, namely the Shannon and Simpson indices ( P Bonferroni < 1.00 × 10 −5 ). Age emerged as a significant negative predictor of microbial Simpson diversity (β = −0.076, P Bonferroni = 3.52 × 10 −16 ), consistent with an established trend of declining microbial diversity with aging. 
Notably, host genetic ancestry, captured by PC1 and PC2, exerted the strongest associations with the microbial top two principal components PCoA1 ( P Bonferroni = 1.93 × 10 −11 for PC1 and P Bonferroni = 0.038 for PC2) and PCoA2 ( P Bonferroni = 5.35 × 10 −6 for PC1 and P Bonferroni = 7.81 × 10 −37 for PC2), which collectively explained over 40% of the variance of the total microbial composition, highlighting the role of genetic background in shaping microbial ecology. When estimating the effects of host potential confounders on beta diversity, age, sex, host PC1, and PC2 consistently contributed the most to the microbial composition (each explained variance R 2 > 0.2%; P = 0.001 for 1,000 permutations in the permutational multivariate analysis of variance (PERMANOVA) test; Fig. 1b , Supplementary Table 4 ). These results underscore that host genetic structure, age, and sex significantly shape the oral microbiome in this Chinese cohort. Host genetic loci significantly associated with microbial taxa and pathways With this so far, the largest cohort of the whole genome and whole metagenome data, we first performed M-GWAS analysis on 5.6 million common (minor allele frequency (MAF) > 0.05) genetic variants to test their association with independent 534 taxa and 311 bacterial pathways with prevalence > 10% in participants from the CHARLS discovery dataset ( Supplementary Tables 5 and 6 ). M-GWAS was performed using linear regression for relative abundance traits and logistic regression for presence–absence traits of bacterial taxa or pathways, with adjustment for age, sex, BMI, and the top ten host genetic PCs ( Methods ). Next, we performed the same M-GWAS analysis on the two validation datasets: namely, the CHARLS replication and 4DSZ datasets. 
To ensure robustness of results, genome-wide significant results ( P < 5 × 10 −8 ) from the discovery dataset were defined as replicated when supported by nominal significance ( P < 0.05) and by a fully consistent effect direction for the same allele in at least one validation dataset. Finally, per-dataset results were combined in a meta-analysis. In the discovery stage, we identified 74 independent host genetic loci significantly associated with 66 microbial taxa and 28 functional pathways, meeting genome-wide significance ( P < 5 × 10 −8 ; r 2 < 0.1 in the ± 1 Mb flanking region; Supplementary Table 7 ). Among these, the associations of nine genetic loci with 14 taxa and four pathways were well replicated ( Fig. 2, Table 1 ): FUT2 , POLI - C18orf54 , AMY1C - AMY2B , PRB3 - PRB4 , SLC2A9 , HLA-DRA-DRB5 , MRPS18A - VEGFA , ZFR - SUB1 , and TMCO3 - TFDP1 . After applying a more conservative Bonferroni correction for the number of features tested ( P < 9.36 × 10 −11 for 534 taxa and P < 1.60 × 10 −10 for 311 pathways), we identified seven study-wide significant genomic loci, namely FUT2 , POLI - C18orf54 , AMY1C - AMY2B , SLC2A9 , HLA-DRA-DRB5 and MGST1 , significantly associated with 16 tongue dorsum microbial features. An additional locus in MTMR6 - NUP58 also reached study-wide significance in the meta-analysis, although it did not reach genome-wide significance in the discovery dataset ( Table 1 ). Together, we discovered ten genome-wide significant and well-replicated loci, including eight study-wide significant loci, which were associated with 17 microbial taxa and eight pathways. Four loci showed pleiotropic effects and were related to multiple taxa and pathways: FUT2 , POLI - C18orf54 , SLC2A9 , and MTMR6 . There was no evidence of any excess false positive rate in the GWAS analyses (genomic inflation factors λ GC ranged from 0.981 to 1.045 with a median of 1.01; Supplementary Fig. 5 ). 
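The significance and replication criteria described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' pipeline: the function names and the dictionary layout are hypothetical, and it assumes the study-wide threshold is the genome-wide threshold divided by the number of features tested, which reproduces the reported 9.36 × 10 −11 for 534 taxa and roughly 1.6 × 10 −10 for 311 pathways.

```python
GENOME_WIDE_P = 5e-8  # genome-wide significance threshold used in the text
NOMINAL_P = 0.05      # nominal threshold required in a validation dataset


def study_wide_threshold(n_features, base=GENOME_WIDE_P):
    """Bonferroni-style threshold: base divided by features tested (assumed form)."""
    return base / n_features


def is_replicated(discovery, validations):
    """Return True if a discovery hit meets the replication rule.

    `discovery` and each item of `validations` are hypothetical dicts with
    keys "beta" (effect of the same allele) and "p" (association P value).
    """
    if discovery["p"] >= GENOME_WIDE_P:
        return False
    # Replicated if at least one validation dataset is nominally significant
    # and its effect points in the same direction as the discovery effect.
    return any(
        v["p"] < NOMINAL_P and v["beta"] * discovery["beta"] > 0
        for v in validations
    )


# Made-up numbers, for illustration only.
hit = {"beta": 0.21, "p": 3e-12}
checks = [{"beta": 0.18, "p": 0.004}, {"beta": -0.02, "p": 0.60}]
print(f"{study_wide_threshold(534):.2e}")
print(is_replicated(hit, checks))
```

The direction check uses the sign of the product of the two effect sizes, so it only makes sense when both datasets report the effect for the same allele.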
All the genome-wide significant associations identified in the discovery cohort and their replication, as well as meta-analysis results, were listed in the Supplementary Table 8. The strongest signal was observed for the missense mutation FUT2 rs1047781 (A>T, p.I140F, resulting in an Ile140Phe amino acid substitution), an Eastern Asian-specific common variant (MAF = 0.439) that determines ABO antigen secretor status 25 . This variant showed significant associations with multiple tongue dorsum taxa and pathways ( Fig. 3 ), including the presence/absence status of three tongue dorsum species, namely Haemophilus sputorum ( P meta = 9.71 × 10 − 51 ), Granulicatella SGB8239 ( P meta = 3.80 × 10 − 21 ), and Veillonella SGB6928 ( P meta = 2.64 × 10 − 18 ), as well as the relative abundance of the three microbial species ( Fig. 3a,b ). The A-allele of rs1047781 defines secretor status, with AA/TA genotypes representing secretors and TT genotypes indicating weak or non-secretors. Compared to non-secretor individuals, the secretor individuals exhibited a significantly higher average prevalence ( P = 1.1 × 10 − 23 ~ 7.6 × 10 − 62 , Fig. 3c ) and relative abundances of these three taxa ( P = 5.2 × 10 − 4 ~ 7.9×10 − 12 , Fig. 3d) . Because FUT2 I140F determines the secretor status of ABO blood group antigens, we next examined the potential interactions between FUT2 and ABO on these three microbial taxa. We inferred blood groups according to the genotypes of three genetic variants in the East Asian population (Methods), yielding the following blood type distribution: O (33.5%), B (29.3%), A (28.6%), and AB (8.6%). No significant differences in the prevalence or abundances of these three bacteria were observed among ABO blood groups (O, A, B, and AB; Fig. 3e, f ). Notably, regardless of ABO blood groups, FUT2 -determined secretor individuals consistently showed higher bacterial prevalence and abundance across all three bacterial species than non-secretors ( Fig. 3g, h ). 
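As a minimal sketch of the genotype grouping used in this comparison, the snippet below maps rs1047781 genotypes to secretor status (AA/TA as secretors, TT as weak or non-secretors, as stated above) and tabulates the prevalence of a taxon within each group. The function names and the toy records are hypothetical and are not taken from the study data.

```python
def secretor_status(genotype):
    """Map an rs1047781 genotype string (e.g. "AT") to secretor status."""
    alleles = genotype.upper()
    if len(alleles) != 2 or set(alleles) - {"A", "T"}:
        raise ValueError(f"unexpected genotype: {genotype!r}")
    # At least one A allele -> secretor; TT -> weak or non-secretor.
    return "secretor" if "A" in alleles else "non-secretor"


def prevalence_by_status(samples):
    """Fraction of individuals carrying a taxon within each secretor group.

    `samples` is a list of (genotype, taxon_present) pairs.
    """
    counts = {}
    for genotype, present in samples:
        status = secretor_status(genotype)
        total, carriers = counts.get(status, (0, 0))
        counts[status] = (total + 1, carriers + int(present))
    return {s: carriers / total for s, (total, carriers) in counts.items()}


# Toy records, for illustration only.
toy = [("AA", True), ("AT", True), ("TA", False), ("TT", False), ("TT", True)]
print(prevalence_by_status(toy))
```

The same grouping could be repeated within each inferred ABO blood group to mirror the stratified comparison, although the blood-group inference itself is described only in the Methods.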
In contrast to previous gut microbiome studies reporting an interaction between the ABO and FUT2 genotypes 12,26 , our study indicated that FUT2 exerted a dominant and ABO-independent effect on the tongue dorsum microbiome. In addition to the FUT2 locus, the other identified microbiome-associated loci (MAL) were not randomly distributed but were significantly enriched in genes involved in key biological pathways central to host-microbiome interaction ( Supplementary Fig. 6 ): (i) Immune recognition and antigen presentation. For instance, FUT2 determines the mucosal antigen synthesis and glycosylation patterns that are known to regulate bacterial adhesion and the gut microbiome composition 27,28 ; HLA-DRA - DRB5 , participating in the host adaptive immunity through MHC-II mediated antigen presentation 29 , was associated with an unknown SGB2208 from the phylum Bacteroidetes ( P meta = 2.13 × 10 −51 ); (ii) Nutrient metabolism and digestion. For example, AMY1 , encoding the salivary amylase that promotes dietary starch digestion and directly influences nutrient availability for carbohydrate-metabolizing bacteria 30,31 , was associated with the abundance of genus Stomatobaculum ( P meta = 2.22 × 10 −10 ), and Stomatobaculum was observed to be more enriched in cluster ASV1 (highest sucrose intake) than in cluster ASV2 (lowest sucrose intake) 32 . PRB , encoding the basic salivary proline-rich proteins that modulate oral lubrication and bacterial aggregation 33 , was associated with the abundance of Streptococcus infantis ( P meta = 1.08 × 10 −12 ); SLC2A9 , encoding the urate transporter that affects serum uric acid levels, was associated with multiple microbial taxa, especially Lachnoanaerobaculum ( P meta = 3.46 × 10 −15 ); (iii) Cellular housekeeping, signaling, and transport mechanisms. 
For example, POLI - C18orf54 locus, involved in DNA repair and genomic maintenance, was associated with the presence/absence status of Haemophilus parahaemolyticus ( P meta = 4.47 × 10 −19 ); MRPS18A – VEGFA , linking mitochondrial ribosomal function to angiogenesis and tissue microenvironment regulation 34 , was associated with a Proteobacteria SGB19290 ( P meta = 5.36 × 10 −11 ); both MTMR6 and MTMR12 loci, involved in the phosphoinositide signaling and nucleocytoplasmic transport that maintain the normal cell function and may regulate cell immune response 35,36 , were associated with multiple microbial traits: MTMR12 with Bacteroidetes CFGB570 ( P meta = 2.13 × 10 −51 ) and MTMR6 with microbial pathways (most significant for PWY-2941: L-lysine biosynthesis II; P meta = 2.82 × 10 −11 ), respectively. Together, this robust functional clustering, particularly the enrichment in immune-related pathways and metabolic processes, suggests that these genetic variants exert their influence by modulating the host physiological landscape, including alterations in nutrient availability, immune surveillance, and cellular homeostasis, thereby defining the ecological niche of resident microbiota. In addition to microbial taxa and pathways, we also tested the associations between host genetics and microbial diversity. No genome-wide significant associations were detected for microbial alpha diversity (Shannon and Simpson index). Five independent loci, LINC01739-LINC00466 (PCoA1), BMP2-LINC01428 (PCoA2), HDAC9 (PCoA7), CLDN10 (PCoA9), and FNDC3B (PCoA10), showed significant associations with at least one of the microbial top ten PCoAs ( P < 5 × 10 − 8 , Supplementary Table 9 , Supplementary Fig. 7 ). The associated genes may play core roles in epithelial barrier integrity ( CLDN10 tight junctions, FNDC3B cell adhesion), chromatin remodeling ( HDAC9 ), and skeletal or cartilage formation ( BMP2 ). 
MAL linking to microbial gene families helps understand host–microbiome interactions To further decipher the microbial gene functions involved in host genotype-microbiome interactions, we next performed a gene-family-level M-GWAS by testing associations between the 74 identified genome-wide significant MAL and ~700,000 microbial gene families, along with their contributing bacterial species. This analysis identified three loci involving 8,619 associations with 1,783 gene families, surpassing study-wide significance in the discovery dataset and replicated in at least one replication dataset ( P < 7.0 × 10 − 14 ; Fig. 4a, Supplementary Table 10 ). Notably, 98.3% (8,474/8,619) of the associations were for the FUT2 locus, which was linked to 1,773 unique gene families, with the main contributing bacteria from the genera Haemophilus and Streptococcus. The most significant gene families were involved in H-antigen transport and fucose metabolism, suggesting that the FUT2 locus shapes the functional capacity of oral microbes to utilize host-secreted antigen through the uptake, breakdown, and metabolic conversion of liberated fucose. This functional reshaping was further substantiated by our analysis of species-stratified pathway abundances, which revealed that the FUT2 locus was strongly associated with multiple pathways contributed by specific bacteria ( Supplementary Fig. 8 ). Specifically, H. sputorum -contributed pathways, including fucose degradation and L-isoleucine biosynthesis, were strongly linked to FUT2 , providing pathway-level evidence that bacteria adapt their metabolic repertoire to utilize host-derived fucose. Thus, based on the M-GWAS with taxa, pathways, and gene families, we proposed the following mechanistic pathway modulated by host FUT2 ( Fig. 4b, c ): Host genotype dictates mucosal nutrient landscape . 
At the FUT2 I140F locus (rs1047781, A>T), individuals carrying at least one functional A allele (AA or AT) are secretors and exhibit robust expression of α-1,2-linked fucose (the H-antigen) on mucosal glycans, thereby imposing a powerful selective pressure on the tongue dorsum microbiome. Bacteria adapt to utilize host-derived fucose . The mucosal H-antigen served as a primary nutrient source for the enrichment of specific bacterial gene families essential for harvesting and consuming host-derived fucose ( Supplementary Fig. 9 ): (i) Recognition, binding and import: Bacteria utilize its extracellular solute-binding proteins (A0A0B7M2T5 annotated from the UniRef90 database; P meta = 2.36 × 10 − 76 ) , ABC transporter permeases (e.g., A0A0T7SSD5; P meta = 3.39 × 10 − 77 ), and ABC transporter substrate-binding protein (F9HPT4; P meta = 2.49 × 10 − 76 ) to recognize, bind and import the H-antigen from the mucosal environment; (ii) Cleavage: the alpha-L-fucosidase (F9HM02; P meta = 2.49 × 10 − 72 primarily contributed by Streptococcus mitis ) removes terminal α-L-fucosyl residues from fucosylated glycans (e.g., H antigen); and (iii) Metabolism: downstream metabolic enzymes, including L-fucose-proton symporter (P44776; P meta = 3.77 × 10 − 37 ) and L-fucose isomerase (B8F6Y0; P meta = 1.53 × 10 − 34 ) from Haemophilus sputorum , as well as L-fucose isomerases (Q97N97) from S. mitis ( P meta = 5.54 × 10 − 39 ), S. pneumoniae ( P meta = 4.30 × 10 − 23 ), and S. pseudopneumoniae ( P meta = 2.51 × 10 − 26 ), convert L-fucose into L-fuculose and support its entry into the central metabolism. Thus, the host FUT2 function mutation determines the availability of fucosylated H-antigens in the oral ecosystem and drives microbial adaptation to the host's fucose landscape, highlighting a key mechanism of host-microbe co-metabolism. 
In addition to FUT2 , two other genetic loci associated with microbial gene family abundance that passed the study-wide significance threshold were identified ( Fig. 4a, Supplementary Table 10 ). These two loci were also significantly linked to the species-stratified pathways ( Supplementary Fig. 8 ). On chromosome 18, the POLI - C18orf54 locus was associated with various functional proteins of S. mitis and H. parahaemolyticus , including exo-alpha-sialidase, bacteriophage proteins, anaerobic C4-dicarboxylate transporter DcuB, and site-specific DNA-methyltransferases ( P < 7.3 × 10 − 14 ). It also showed study-wide significant correlations with three pathways contributed by H. parahaemolyticus, including ANAGLYCOLYSIS-PWY: glycolysis III (from glucose), PWY-1042: glycolysis IV, and COA-PWY-1: superpathway of coenzyme A biosynthesis III (mammals). On chromosome 4, the SLC2A9 locus was significantly associated with multiple gene families of Leptotrichia sp. oral taxon 212, including transporters, DNA polymerase III subunit alpha, DUF3290 domain-containing proteins, beta-eliminating lyases, and UDP-galactopyranose mutase ( P < 7.3 × 10 − 14 ). Additionally, the SLC2A9 locus was significantly associated with three pathways contributed by Leptotrichia sp. oral taxon 212, including HSERMETANA-PWY: L-methionine biosynthesis III, PEPTIDOGLYCANSYN-PWY: peptidoglycan biosynthesis I (meso-diaminopimelate containing), and PWY-3841: folate transformations II (plants). Collectively, our findings delineate a multi-layered architectural blueprint whereby human genetic variation orchestrates the oral microbiome. MAL enriched for host metabolism and immunity by eQTL and PheWAS analysis To further explore the potential gene functions of the identified MAL, we performed functional mapping and annotation of genetic associations through mapping expression quantitative trait loci (eQTLs) and phenome-wide association studies (PheWAS). 
Through the colocalization of MAL with eQTL information from the Genotype-Tissue Expression (GTEx) database, spanning 49 tissue types 37 , 39% (29/74) of MAL were associated with tissue-specific gene expression in 136 genes ( Supplementary Table 11 ). For example, the top two MAL showing the strongest links to microbial features were associated with expression of FUT2 and POLI , respectively, across 12 tissues, particularly digestive tract tissues such as esophagus mucosa, pancreas, stomach, colon transverse, small intestine terminal ileum, and minor salivary gland ( Supplementary Fig. 10 ). The MAL linked to Lachnoanaerobaculum mainly regulated the expression of SLC2A9 in 18 tissues. The MAL linked to Stomatobaculum mainly regulated the expression of AMY2B in the brain putamen and basal ganglia. The MAL linked to Streptococcus infantis mainly regulated PRH1 expression across 18 tissues. The MAL linked to Bacteroidetes SGB2208 mainly regulated the expression of HLA-DQB1 , - DRB5 , and - DQA1 in over 30 tissues. The MAL linked to Capnocytophaga sp. oral taxon 863 was correlated with TMCO3 expression in muscle and skeletal tissues. Further, we colocalized these MAL with single-cell-specific eQTLs and chromatin accessibility quantitative trait loci (caQTLs) from the Chinese Immune Multi-Omics Atlas (CIMA) study 38 . We observed that 22% (16/74) of MAL were associated with single cell-specific eQTLs or caQTLs ( Supplementary Table 12 ). The MAL linked to H. parahaemolyticus showed concordant associations with both POLI expression and chromatin accessibility in all immune cell types, with the strongest for CD4 cells ( Supplementary Fig. 11 ). The MAL linked to Lachnoanaerobaculum primarily regulated SLC2A9 expression in CD4 cells. The MAL linked to Solobacterium SGB6829 were associated with AGAP1 expression in CD4 cells. 
Collectively, these findings indicate that MAL may exert their effects on the tongue dorsum microbiome by regulating gene activation and the differentiation of blood immune cells. Next, PheWAS analysis was performed by examining 74 genome-wide significant loci in the summary statistics of traits from the GWAS catalog 39 , Biobank Japan (BBJ) 40 , and the current study, including 24 blood metabolic traits, 16 diseases, and four dental conditions ( Fig. 5a ). Six MAL, including five of the ten replicated MAL, were linked to one or more metabolic/immune traits at P < 5 × 10 −8 ( Supplementary Fig. 12, Supplementary Table 13 ): AMY1C linked to alpha-amylase 1 and amylase measurements; SLC2A9 linked to multiple serum metabolites, including urate measurement and uric acid; HLA loci linked to many immune-related traits/diseases such as rheumatoid arthritis and autoimmune disease; PDE2A loci linked to blood total protein, non-albumin protein and insomnia; POLI loci linked to cortical thickness, neuroticism measurement, and white blood cell counts; FUT2 loci linked to multiple blood metabolic/immune indices, such as cancer biomarker measurement, alkaline phosphatase, serum carcinoembryonic antigen, vitamin B12, and serum alanine aminotransferase. The locus showing the strongest association in the PheWAS analysis was SLC2A9, which was significantly associated with urate measurement ( P = 9.0 × 10 − 3353 ) and blood uric acid ( P = 4.0 × 10 − 496 ) in a meta-analysis of GWAS studies, as well as in the BBJ and our study. This is consistent with the known function of SLC2A9 as a uric acid transporter gene. The minor allele C of the index SNP SLC2A9 rs3796835 was associated with a lower serum uric acid level in both the CHARLS cohort ( P = 4.1 × 10 − 19 ; Fig. 5b ) and the 4DSZ cohort ( P = 6.7 × 10 − 6 ), regardless of age effects. 
M-GWAS analysis showed that this SNP was most strongly associated with the relative abundance of the genus Lachnoanaerobaculum in both the CHARLS cohort ( P = 1.3 × 10 − 13 ; Fig. 5c ) and the 4DSZ cohort ( P = 4.8 × 10 − 5 ), regardless of age effects. Similarly, these SLC2A9 -associated microbial taxa, such as Lachnoanaerobaculum , exhibited positive correlation with serum uric acid in both the CHARLS cohort (Spearman r = 0.05, P = 6.9 × 10 − 7 ; Fig. 5d ) and the 4DSZ cohort (r = 0.07, P = 2.4 × 10 − 4 ). These findings suggested a blood uric acid-mediated mechanism by which SLC2A9 acts on specific oral bacteria ( Fig. 5e ): the SLC2A9 rs3796835-T allele is linked to an increase in serum uric acid levels, and the elevated serum uric acid environment selectively promotes the proliferation and colonization of bacteria with uric acid degradation capabilities, such as Lachnoanaerobaculum sp. ICM7. The molecular basis of this adaptive change in the bacterial community lies in the fact that these bacteria carry complete uric acid degradation functional gene clusters (including key genes such as ygeX , ygeY , ygeW , ssnA , ygfK , etc.), giving them the ability to utilize uric acid efficiently as a nutrient substrate, and thus a competitive advantage in hosts with high uric acid 41 . In addition to the “ SLC2A9- serum uric acid -Lachnoanaerobaculum ” interactive axis, MAL were also involved in some other immunometabolic traits. For example, another SNP, rs12954177 near POLI , regulated the expression of POLI in 15 tissues, such as esophagus mucosa ( Supplementary Table 11, P = 2.50 × 10⁻ 69 ) and whole blood ( P = 2.20 × 10⁻ 43 ) from the GTEx, as well as the POLI expression in eQTLs and caQTLs of CD4 and CD8 cells ( Supplementary Table 12 ). This SNP was associated with white blood cell count in the BBJ cohort ( P = 6.20 × 10⁻ 7 ). 
It exhibited the strongest association with the presence of Haemophilus parahaemolyticus in M-GWAS analysis of both the CHARLS discovery and replication cohorts ( Fig. 6a, b ). Furthermore, this SNP was significantly associated with the relative abundance of H. parahaemolyticus , with the associations consistent across independent datasets ( Fig. 6c, d ). Notably, individuals carrying H. parahaemolyticus showed significantly elevated white blood cell count (WBC; β = –0.066, P = 5.46 × 10 –9 ), triglycerides (TG; β = –0.061, P = 2.95 × 10⁻⁸), and hemoglobin (HGB; β = –0.055, P = 1.02 × 10⁻ 6 ) concentrations, when compared to those without the bacterium ( Fig. 6e, f ). These results implicate the critical role of the POLI locus in interactions between oral microbes and host immunity/metabolism. Causal links between tongue dorsum microbiota and host metabolism To reveal the links and causalities between the microbiome and host phenotypes more comprehensively, we performed observational correlation and bidirectional MR analyses for 94 microbial features associated with at least one variant at genome-wide significance in our M-GWAS, and 43 host traits representing host metabolism, dental conditions, and diseases. The observational correlation analysis identified a total of 239 significant associations after Bonferroni correction ( P < 1.22 × 10 −5 ; Fig. 7a , Supplementary Table 14 ), using multivariate linear regression with adjustment for gender, age, BMI, and the top ten host PCs in the 8,331 samples in the CHARLS cohort that had multi-omics data and complete data for all 43 phenotypic traits. Dentures (n = 42), blood urea nitrogen (n = 25), tooth loss (n = 21), blood gamma-glutamyl transferase (n = 19), creatine (n = 18), and blood glucose (n = 15) were among the host traits associated with the largest number of microbial features ( Supplementary Fig. 13 ). 
The class Betaproteobacteria, SGB1469, Haemophilus sputorum, and Neisseria subflava were linked to over eight blood traits. These results further extend prior findings and suggest quantitative relationships among oral microbial taxa/functions, dental conditions, and plasma metabolites. Leveraging the availability of comprehensive phenotypic data in 8,331 individuals, we first performed one-sample MR analyses to infer causal relationships for the 239 observationally significant associations. The selected instrumental variables were robust, with mean F statistics of 289 for microbial features and 186 for host traits, explaining on average 3.5% and 2.3% of the variance for microbial features and host traits, respectively ( Supplementary Fig. 14 ). We observed seven significant causal effects in the direction from microbiome to phenotypes after multiple test correction ( P < 2.09 × 10 −4 = 0.05/239, Fig. 7b , Supplementary Table 15 ). Moreover, to increase statistical power and robustness, we also used a two-sample MR method to analyze summary data from 8,331 samples with microbial features and 15,459 samples with host traits. The seven causal associations identified by one-sample MR were also significant in the two-sample MR analyses ( P = 8.5 × 10 −3 ~ 1.6 × 10 −4 , Supplementary Table 16 ). The two-sample MR analyses identified four additional causal associations: two showed that tongue dorsum microbial features have causal effects on blood metabolic traits, and the other two showed that blood metabolic traits have causal effects on tongue dorsum microbial features. These eleven causal relationships were confirmed by one of the one- and two-sample MR analyses and replicated by the other ( Fig. 7b ). The eleven inferred causal relationships revealed several key associations involved in hepatic and renal health. 
Two microbial metabolic pathways, L-histidine biosynthesis (HISTSYN-PWY) and the superpathway of L-tyrosine biosynthesis (PWY-6630), were causally associated with decreased blood gamma-glutamyl transferase (GGT) levels (β = −0.232 and β = −0.213, respectively). This protective effect is supported by recent evidence that sulfur-containing histidine derivatives (thiohistidines) can directly inhibit GGT activity 42. Notably, we identified Neisseria subflava as a key contributor to both pathways (Pearson r = 0.88 for HISTSYN-PWY; r = 0.86 for PWY-6630) and confirmed its nominal causal effect on lowering GGT (β = −0.111). In contrast, the oral taxon Rothia mucilaginosa exhibited a causal link to elevated levels of both blood GGT (β = 0.165, P = 1.46 × 10⁻⁴) and alanine aminotransferase (ALT) (β = 0.163, P = 2.02 × 10⁻⁴). GGT and ALT are liver enzymes whose elevated levels indicate liver damage, suggesting that R. mucilaginosa may be a microbial pathogen contributing to subclinical liver injury. The genus Granulicatella was similarly associated with increased GGT levels, reinforcing the role of specific oral bacteria in hepatic dysfunction. In addition to hepatic biomarkers, the oral microbiome influenced renal function and purine metabolism. L-arginine biosynthesis III (via N-acetyl-L-citrulline, PWY-5154) and Rothia mucilaginosa were linked to decreased creatine levels, and blood creatine level was correlated with reduced abundance of GGB1144_SGB1468, suggesting a previously unexplored oral-kidney axis. The prevalence of Oribacterium SGB5283 was positively associated with blood uric acid level (β = 0.358, P = 4.36 × 10⁻⁶), consistent with prior observations of its enrichment in individuals with both hyperuricemia (HUA) and obstructive sleep apnea (OSA) relative to those with OSA only 43.
Conversely, consistent with a feedback mechanism, blood uric acid levels were causally associated with the prevalence of experimentally confirmed uric acid-degrading bacteria, such as Lachnoanaerobaculum sp. ICM7 (at nominal significance) and Candidatus Nanosyncoccus (β = 0.177, P = 1.52 × 10⁻⁵). In addition, we identified Leptotrichia hongkongensis as causally linked to an increased risk of dentures (β = 0.237, P = 2.93 × 10⁻⁵), suggesting that this oral bacterium may affect local dental health. Discussion This study establishes the tongue dorsum microbiome as a powerful model for elucidating host-genetic control of microbial ecosystems. By integrating multi-cohort data from 13,397 individuals spanning young to elderly Chinese populations, we performed the largest and most comprehensive oral M-GWAS to date, encompassing taxonomic, pathway, and gene-family levels. We demonstrate that: (i) the oral microbiome exhibits a stronger and more replicable host genetic signature than the gut microbiome, (ii) host-microbiome genetic architecture is characterized by both cross-population convergence and population-specific allelic heterogeneity, and (iii) genetically informed Mendelian randomization uncovers robust, causally implicated oral microbes and pathways modulating systemic host physiology, particularly hepatic and renal functions. These findings collectively shift the paradigm from descriptive association to mechanistic and causal understanding of the oral host-microbiome axis. First, through an M-GWAS, we identified 10 genetic loci that were significantly associated with microbial traits and replicated in independent datasets, 8 of which reached study-wide significance.
These loci are functionally interpretable and relate to immune or metabolic traits. They include the FUT2 locus previously reported in gut microbiome studies; the AMY1 locus, which encodes salivary amylase and has co-evolved with the host; the POLI gene, which is differentially expressed in multiple tissues and blood immune cells and is linked to white blood cell counts; the blood urate transporter gene SLC2A9; and the HLA genes of the human immune region. Notably, the discovery of eight study-wide significant host loci in our oral M-GWAS stands in stark contrast to the gut microbiome field, where only two loci (LCT and ABO) have been consistently replicated at study-wide significance 10–12, 44–49. This disparity not only underscores a potentially stronger host genetic influence on the oral microbiome, likely due to its direct exposure to host-derived factors such as saliva and mucosal immunity, but also highlights the unique power of the oral cavity as a model system for dissecting host genetic effects on commensal communities. Our results also highlight the power of combining multiple independent cohorts into a larger sample for M-GWAS analyses 50, as this approach enables the discovery of robust and replicable results. Second, we compared our results with a recent study by Kamitaki et al., which identified 11 host genetic loci associated with the salivary microbiome in a large European cohort using microbial principal components (mPCs) 51. Despite differences in sampling site (saliva in Kamitaki et al. versus tongue dorsum in this study), population ancestry (European versus East Asian), and analytical framework (community-level mPC-based GWAS versus taxon- and pathway-level GWAS), the two studies show striking convergence: five loci, FUT2, POLI, AMY1, SLC2A9, and PRB, were well replicated (Supplementary Tables 17, 18). However, our study moves beyond replication to deliver new biological insights. (i) Ancestry-stratified genetic architecture at shared loci.
Although both studies implicate the same genomic regions as key determinants of the oral microbiome, the lead variants differ across populations (Supplementary Table 19). For FUT2, Kamitaki et al. identified the high-frequency European loss-of-function allele rs601338 (W154X; MAF = 0.45 in Europeans but 0.0087 in Chinese) in saliva, whereas our analysis highlights rs1047781 (I140F), a missense variant enriched in East Asians, as the strongest signal for the composition of the tongue dorsum microbiota. Likewise, the lead SNPs at the POLI and TLR1 loci identified by Kamitaki et al. were European-specific and very rare in Chinese. All lead variants at the shared loci associated with the same microbes differed between the two populations. These findings illustrate how distinct functional alleles within the same gene can yield convergent phenotypic effects, underscoring the need to include diverse ancestries to fully characterize host–microbiome genetic interactions. (ii) Novel Chinese-specific loci. Beyond the cross-ancestry replication of five shared loci, our analysis identified five novel genome-wide significant and internally replicated loci that were not reported by Kamitaki et al. (Supplementary Table 17), again suggesting population-specific genetic architecture. (iii) Mechanistic resolution of the FUT2 signal and reduction of dental confounding. In our cohort, the FUT2 association remains independent of ABO blood group, supporting a fucose-dependent mucosal mechanism, in contrast to the ABO-linked effect reported for saliva. Moreover, several loci highlighted by Kamitaki et al. appear partially driven by dental health and prosthesis-related confounding, whereas our genome-wide significant loci are enriched in genes involved in core metabolic and immune functions and show no association with denture use, tooth loss, or chewing ability (all P > 1 × 10⁻⁴).
These observations indicate that our signals more likely capture fundamental host–microbe biology rather than secondary effects of oral prostheses. (iv) Integration of host genetics, the oral microbiome and systemic traits. Leveraging extensive host-phenotype data together with Mendelian randomization, we mapped the systemic correlates of 94 genetically associated microbial features, identifying 239 significant host–microbe associations and 11 robust causal relationships. This integrative framework delineates specific microbes and microbial functions as putative modulators of host physiology, providing a foundation for microbiome-targeted interventions beyond descriptive associations. Compared to previous studies that focused solely on microbial taxonomic units 18 , this current M-GWAS study is more systematic, encompassing not only microbial taxonomic units but also pathways and gene families. This multidimensional analysis enabled us to explore potential adaptive mechanisms, such as the FUT2 signal, from diverse perspectives. At the species level, we observed significant associations between FUT2 genetic variation and multiple bacterial species, with the strongest signal detected in H. sputorum . At the pathway level, we identified the fucose degradation metabolic pathway as most strongly associated with the FUT2 signal, suggesting that host fucosylation may represent a core functional target. Further analysis at the gene family level revealed associations between FUT2 variation and multiple functionally significant gene families, with the top six annotated proteins including an ABC transporter permease and an extracellular solute-binding protein from Streptococcus pneumoniae , along with an ABC transporter substrate-binding protein and an alpha-L-fucosidase from Streptococcus mitis, as well as an L-fucose-proton symporter and an L-fucose isomerase from Haemophilus sputorum. Notably, although the strongest association with FUT2 was observed for H. 
sputorum, the most significant functional annotations at the gene family level were primarily from several species of the genus Streptococcus. The Streptococcus group participates in extracellular carbohydrate uptake, while H. sputorum directly hydrolyzes host-derived fucosylated glycans. These results indicate that the mechanism underlying the FUT2 signal is not a single-species effect but may reflect cross-species metabolic cooperation. Therefore, multi-level M-GWAS analysis incorporating species, pathways, and gene functions not only yields statistically significant associations but also reveals potential mechanisms of host-microbe interactions, advancing our understanding from single-point associations to mechanistic explanations. Finally, using host genetic variants identified through M-GWAS as instrumental variables, we applied MR to infer causal relationships between the oral microbiome and host physiology, an approach increasingly adopted in microbiome research 12,52,53. Building on observationally significant correlations, our MR analysis supports causal links between the oral microbiota and host hepatic and renal functions: microbial metabolic pathways (e.g., L-histidine and L-tyrosine biosynthesis) may inhibit the activity of the liver enzyme GGT through bioactive metabolites (e.g., thiohistidines) 42, while specific bacteria such as Rothia mucilaginosa and Granulicatella causally elevate GGT and ALT levels, suggesting their pathogenic potential in liver injury. We also identified an oral-kidney axis in which Oribacterium SGB5283 increases blood uric acid levels, consistent with the function of the host SLC2A9 gene, while blood uric acid in turn promotes uric acid-degrading bacteria 41. These MR findings highlight the potential for microbiome-targeted interventions in the management of chronic diseases, particularly hepatic and renal disorders, consistent with previous studies 54.
However, while MR provides strong evidence for causality, experimental validation through animal models or in vitro systems remains essential to confirm these mechanisms and establish clinical applicability 55,56. Future studies should combine MR findings with functional experiments to translate these causal relationships into therapeutic interventions, ultimately advancing personalized microbiome-based therapies. Methods Study participants The primary study population was derived from the China Health and Retirement Longitudinal Study (CHARLS), a nationally representative cohort study of China's middle-aged and older adult population that is harmonized with the Health and Retirement Study (HRS) family of aging cohorts, including ELSA and SHARE 57,58. CHARLS was launched in 2011, covering 17,705 respondents from 450 village-level units and 150 county-level units randomly selected across China. Respondents were followed up in 2013, 2015, 2018, 2020, and 2021-23, with a follow-up rate above 85% for baseline respondents. The CHARLS data have been widely used in the scientific community, with more than 160,000 users worldwide. CHARLS routinely collected anthropometric and biomarker data, along with a rich set of self-reported health, behavioral, and socioeconomic data 59. From the 2021-2023 wave of the CHARLS survey, we initially included 11,931 individuals with whole metagenomic sequencing (WMS) data from tongue dorsum samples; of these, 8,590 had matched blood samples for whole-genome sequencing. For the remaining 3,341 individuals without blood samples, we extracted host genomic reads directly from the tongue dorsum specimens, yielding an additional 3,341 host–microbiome paired datasets. To include more samples across different ages and thereby increase power, we also incorporated 2,017 individuals with high-depth blood WGS and WMS data from the 4DSZ cohort 18.
Sample collection and sequencing protocols for blood and tongue dorsum specimens followed methods established in our prior work. Genomic DNA from blood was extracted using the MagPure Buffy Coat DNA Midi KF Kit (No. D3537-02) per the manufacturer’s protocol. Tongue dorsum samples were collected via swab, preserved in 2 mL of stabilization buffer, and processed with the MagPure Stool DNA KF Kit B (No. MD5115-02B), which includes a bead-beating step to enhance mechanical lysis of bacterial and fungal cells and improve microbial DNA yield. DNA concentrations were quantified using a Qubit fluorometer (Invitrogen). Libraries were constructed from 500 ng of DNA per sample and sequenced on the DNBSEQ platform with paired-end 100 bp reads. All procedures involving human participants were approved by the Institutional Review Boards (IRBs) of the CHARLS cohort (IRB00001052-11014) and BGI Shenzhen. Written informed consent was obtained from all participants before enrollment. Tongue dorsum microbiome sequencing, quality control and profiling Metagenomic sequencing was performed on the DNBSEQ platform for a total of 11,931 samples, with all samples sequenced using 100-bp paired-end reads and four libraries constructed per sequencing lane. For tongue dorsum samples, we generated an average of 19 GB of raw data per sample. Raw paired-end reads were first processed with fastp (v0.23.4) to remove sequencing adapters, trim low-quality bases, and discard short fragments, with adapter trimming enabled for both R1 and R2 and a minimum read-length cutoff of 30 bp. The resulting quality-filtered reads were then mapped to the human reference genome hg38 using Bowtie2 (v2.4.4) in end-to-end, very-sensitive mode, and non-host read pairs were extracted using samtools fastq (v1.12) by retaining only pairs in which both mates were unmapped. The host-removed clean reads were subsequently used for microbial profiling. 
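The read-cleaning steps described above can be sketched as three command lines per sample. The following is a minimal illustration, not the authors' pipeline: only the 30 bp minimum length, adapter trimming on both mates, the end-to-end very-sensitive Bowtie2 mode, and the retention of pairs with both mates unmapped come from the text. The function name, file names, index prefix, and thread count are hypothetical, and the commands are only assembled here, not executed.

```python
from typing import List


def host_removal_commands(r1: str, r2: str, bt2_index: str, prefix: str,
                          threads: int = 8) -> List[List[str]]:
    """Assemble fastp, Bowtie2, and samtools command lines for one sample.

    All output file names are hypothetical placeholders derived from `prefix`.
    """
    trim1, trim2 = f"{prefix}.trim_1.fq.gz", f"{prefix}.trim_2.fq.gz"
    sam = f"{prefix}.hg38.sam"
    clean1, clean2 = f"{prefix}.clean_1.fq.gz", f"{prefix}.clean_2.fq.gz"

    fastp = [
        "fastp", "-i", r1, "-I", r2, "-o", trim1, "-O", trim2,
        "--detect_adapter_for_pe",  # adapter detection/trimming for R1 and R2
        "-l", "30",                 # discard reads shorter than 30 bp
        "-w", str(threads),
    ]
    bowtie2 = [
        "bowtie2", "--end-to-end", "--very-sensitive",
        "-p", str(threads), "-x", bt2_index,
        "-1", trim1, "-2", trim2, "-S", sam,
    ]
    # SAM flag 12 = read unmapped (4) + mate unmapped (8):
    # keep only pairs in which both mates failed to align to the host genome.
    samtools = [
        "samtools", "fastq", "-f", "12",
        "-1", clean1, "-2", clean2,
        "-0", "/dev/null", "-s", "/dev/null", "-n", sam,
    ]
    return [fastp, bowtie2, samtools]


commands = host_removal_commands("S1_1.fq.gz", "S1_2.fq.gz", "hg38", "S1")
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` on a machine where the three tools are installed; any further quality-trimming options the authors may have used are not stated in the text and are therefore omitted.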
The microbial taxonomic profile was calculated using MetaPhlAn4 60 (v4.0.6) based on the mpa_vOct22_CHOCOPhlAnSGB_202212 database. The marker gene database utilized by MetaPhlAn4 comprises approximately 1,000,000 fully annotated genomes, including 236,600 bacterial/archaeal reference genomes and 771,500 metagenome-assembled genomes, covering a broad spectrum of microbial diversity. Pathway and gene family level functional profiles were annotated and predicted using HUMAnN3 61 (v3.8; nucleotide-database: chocophlan v201901_v31; protein-database: uniref90_annotated_v201901b). Ultimately, we obtained a raw microbial taxonomic dataset encompassing 3890 taxa (26 phyla, 116 classes, 168 orders, 269 families, 842 genera, 2469 species), along with a dataset containing 559 pathways or functions. Whole-genome sequencing in the CHARLS discovery cohort The raw discovery cohort comprised 8,590 individuals with whole-genome sequencing depth of a predefined 20× for blood samples. Reads were filtered for adapter contamination and low-quality bases using SOAPnuke 62 (v1.5.6; -n 0.05, -q 0.2, -l 12, -M 2) and aligned to the GRCh38/hg38 reference genome using BWA 63 (v0.7.15) with default parameters. Aligned reads were converted to indexed BAM format using SAMtools (v0.1.18), and PCR duplicates were marked using Picard Tools (v1.62) for downstream filtering. Base quality score recalibration was performed using the Genome Analysis Toolkit (GATK 64 , v4.3.0). The BaseRecalibrator module generated a recalibration table by identifying known SNPs and indels from dbSNP (build 151) in the BAM files. Subsequent base quality recalibration was carried out with GATK Lite (v2.2.15), and read pairs flagged as misaligned by Stampy were removed. Variant calling was conducted with GATK’s HaplotypeCaller, producing gVCF files containing SNPs and indels. These were jointly processed with GATK (v4.3.0) to perform multi-sample genotyping and variant-quality filtering. 
CombineGVCFs merged gVCFs from all samples, followed by GenotypeGVCFs for joint genotyping. Variant quality scores were recalibrated using VariantRecalibrator, with filtering applied via ApplyRecalibration based on a Gaussian mixture model trained on high-confidence resources: for SNPs, HapMap 3.3, dbSNP (build 151), the 1000 Genomes Omni 2.5M array, and 1000 Genomes Phase 1 high-confidence SNPs; for indels, the Mills and 1000 Genomes gold standard indels and dbSNP (build 151). Variants were filtered at sensitivity thresholds of 99.5% for SNPs and 99.0% for indels. Stringent variant inclusion criteria were then applied: (1) mean depth > 5×; (2) Hardy–Weinberg equilibrium (HWE) P > 10⁻⁵; and (3) genotype call rate > 98%. Samples were retained only if they met the following criteria: (1) mean depth > 6×; (2) variant call rate > 98%; (3) absence of population stratification as assessed by principal component analysis (PCA) using PLINK 65 (v1.9); and (4) no relatedness to other individuals based on identity-by-descent (IBD) estimates (Pi-hat threshold = 0.1875). Finally, 8,331 individuals with 5,589,561 high-quality common variants (MAF ≥ 5%) were used for subsequent M-GWAS analysis. Host genome extraction and genotype imputation from tongue dorsum metagenomic samples Blood-derived host genomic samples were unavailable for 3,341 participants from the CHARLS replication cohort. Given the moderate host read rate in tongue dorsum samples that we previously reported, we extracted host genomic information from the tongue dorsum metagenomic sequencing data. Sequencing reads were aligned to the human hg38_noalt reference genome using BWA-MEM, followed by duplicate removal and base-quality score recalibration using the Genome Analysis Toolkit (GATK). Genotype calling and likelihood estimation were performed using GATK and BCFtools.
Low-pass sequencing genotypes were imputed using the Lowpass_v5-Human pipeline on the DCS platform (https://cloud.stomics.tech), which employs a hidden Markov model integrating genotype likelihoods with the refpanel_hg38 haplotype reference panel. This reference panel comprises 3,202 deeply sequenced samples from the 1000 Genomes Project, encompassing approximately 68 million variant sites. We retained only variants with an imputation information score above 0.7. The imputed genotype data (impute.raw.vcf.gz) were merged across samples and converted to PLINK binary format (Bfile). This dataset was further refined to include only samples with no evidence of population stratification and no kinship (related individuals were excluded in PLINK based on pairwise identity by descent (IBD), with a Pi-hat threshold of 0.1875). Ultimately, 3,049 individuals with 5,589,561 variants identical to those used in the discovery cohort were included for M-GWAS replication analysis. To evaluate the quality of host genome data extracted from tongue dorsum samples, we also sequenced 14 blood samples from the cohort to a mean depth of 22× (ranging from 11× to 47×). We assessed concordance rates (CR) between the blood-derived and the extracted host genome data. Compared with the blood-derived data, the extracted host genome data had a missing rate of 0.32% and a high genotype concordance of 99% (Supplementary Table 17). Further, PCA showed no population stratification among the three datasets used in the M-GWAS study; when compared with the 1000 Genomes dataset, these individuals clearly clustered with the East Asian population and were separated from the African, American, European, and South Asian populations (Supplementary Fig. 2).
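As an illustration of the blood-versus-swab comparison above, the sketch below computes a missing rate and a genotype concordance rate for one individual. The exact definitions used in the study are not given in the text, so the ones here are assumptions: the missing rate is the fraction of blood-called sites with no call in the extracted data, and concordance is the fraction of sites called in both datasets whose genotypes agree. The function name and the toy genotypes are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Genotype = Optional[int]  # alternate-allele count (0, 1, 2) or None if missing


def missing_and_concordance(blood: Sequence[Genotype],
                            extracted: Sequence[Genotype]) -> Tuple[float, float]:
    """Return (missing_rate, concordance_rate) of `extracted` relative to `blood`."""
    if len(blood) != len(extracted):
        raise ValueError("genotype vectors must cover the same sites")

    # Only sites with a blood-derived call can be evaluated.
    evaluable = [(b, e) for b, e in zip(blood, extracted) if b is not None]
    if not evaluable:
        raise ValueError("no sites with a blood-derived genotype")

    n_missing = sum(1 for _, e in evaluable if e is None)
    called = [(b, e) for b, e in evaluable if e is not None]
    n_match = sum(1 for b, e in called if b == e)

    missing_rate = n_missing / len(evaluable)
    concordance = n_match / len(called) if called else float("nan")
    return missing_rate, concordance


# Toy example with eight sites (values are illustrative only).
blood_gt = [0, 1, 2, 1, 0, None, 2, 1]
swab_gt = [0, 1, 2, None, 0, 1, 2, 0]
miss, conc = missing_and_concordance(blood_gt, swab_gt)
print(f"missing rate = {miss:.3f}, concordance = {conc:.3f}")
```

In practice the two vectors would be read from matched VCF records restricted to the same variant set, and the rates would be averaged over the 14 paired samples.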
Correlation analysis of host PCs with microbial alpha diversity and PCoAs Based on species-level abundance data, microbial alpha diversity (Shannon and Simpson indices) and beta diversity (Bray-Curtis dissimilarity) were calculated using the 'diversity' and 'vegdist' functions from the R package 'vegan', respectively. Principal coordinate analysis (PCoA) was performed on the Bray-Curtis dissimilarity matrix using the 'capscale' function in 'vegan'. The association of each of the top 10 host genetic principal components with each alpha diversity index and each principal coordinate was analyzed using multivariable linear models, adjusting for sex, age, and BMI. Association analysis for microbial taxa and functions To preserve the statistical power of the M-GWAS, we retained microbial taxa and functional pathways with an occurrence rate > 20%. After filtering, 744 microbial taxa and 335 pathways were kept. The representative genera of these microbial taxa covered 99.7% of the entire community in the cohort. Because many microbial taxa and functions were highly correlated, we used pairwise Spearman correlations to identify independent features and thereby reduce the number of GWAS tests. Pairwise Spearman correlations between all taxa were calculated and used to construct an adjacency matrix, in which correlations > 0.995 defined edges between taxa. The resulting graph guided a greedy selection of representative taxa. Nodes (microbial taxa) were sorted by degree, and the node with the highest degree was selected as a representative taxon (with random selection in case of ties). This taxon and its connected nodes were then removed from the network, and the process was repeated until a final set of representative taxa was identified, ensuring that each discarded taxon was correlated with at least one selected taxon. This filtering ultimately yielded 845 microbial features (534 taxa and 311 functional pathways) for association analysis.
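The greedy selection of representative taxa described above can be sketched as follows. This is a minimal illustration under stated assumptions: the correlation matrix is supplied as a precomputed list of lists, node degrees are recomputed after each removal (the text does not say whether they are), and the function name, the seed parameter, and the toy matrix are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Set


def select_representatives(names: Sequence[str],
                           corr: Sequence[Sequence[float]],
                           threshold: float = 0.995,
                           seed: int = 0) -> Dict[str, List[str]]:
    """Greedily pick representative features from a correlation graph.

    Returns a mapping from each selected feature to the features it replaces.
    """
    rng = random.Random(seed)
    n = len(names)

    # Build the adjacency structure: an edge joins features whose
    # Spearman correlation exceeds the threshold.
    neighbours: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if corr[i][j] > threshold:
                neighbours[i].add(j)
                neighbours[j].add(i)

    remaining: Set[int] = set(range(n))
    selected: Dict[str, List[str]] = {}
    while remaining:
        # Degree within the remaining network (assumption: recomputed each round).
        degree = {i: len(neighbours[i] & remaining) for i in remaining}
        top = max(degree.values())
        candidates = sorted(i for i in remaining if degree[i] == top)
        pick = rng.choice(candidates)  # random choice among ties

        covered = neighbours[pick] & remaining
        selected[names[pick]] = sorted(names[j] for j in covered)
        remaining -= covered | {pick}  # drop the pick and its neighbours
    return selected


# Toy example: A is nearly identical to B and C; D is independent.
taxa = ["A", "B", "C", "D"]
rho = [
    [1.000, 0.999, 0.998, 0.10],
    [0.999, 1.000, 0.500, 0.20],
    [0.998, 0.500, 1.000, 0.30],
    [0.100, 0.200, 0.300, 1.00],
]
print(select_representatives(taxa, rho))
```

Because every removed feature is a direct neighbour of the feature picked in that round, each discarded feature is guaranteed to be correlated above the threshold with at least one selected feature, which is the property the text relies on.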
We tested associations between host genetics and the oral microbiome using linear regression models based on relative abundance or logistic regression models based on presence/absence. Among these, 102 taxa and 243 pathways were analyzed using linear models, while the remaining 432 taxa and 68 pathways were analyzed using logistic models (Supplementary Tables 5 and 6). Specifically, for the 345 microbial features present in > 95% of individuals, relative abundances were log-transformed. Residuals were then calculated using 'lm' in R with the following model: log10(microbial abundance) ~ age + sex + BMI + top 10 PCs. Model residuals were extracted using the residuals() function from the stats package and then used in univariate linear models to test for associations with genotypes. The 500 microbial features present in > 20% but < 95% of individuals were instead dichotomized into presence/absence to avoid zero-inflation. Presence/absence was then analyzed as a binary trait using logistic regression, with the same covariates as above. These M-GWAS analyses were first performed in the CHARLS primary discovery cohort, and the significant associations were then tested in the CHARLS replication and 4DSZ cohorts. Finally, a meta-analysis of the association results from the three cohorts was performed using sample-size-weighted fixed-effect meta-analysis in METAL 66 (updated 2020-05-05, https://genome.sph.umich.edu/wiki/METAL). Genetic variants were annotated to genes using the ANNOVAR 67 tool. The eQTL information for significant genetic variants was obtained from the GTEx v8 dataset. The associations of significant genetic variants with reported phenotypes were investigated by searching the GWAS Catalog (https://www.ebi.ac.uk/gwas/), the BBJ dataset, and the Chinese 4DSZ dataset. Regional plots were created from our own GWAS results at https://statgen.github.io/localzoom/.
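The sample-size-weighted combination of the three cohorts can be illustrated with the standard weighted Z-score formula: each cohort's P value is converted to a signed Z-score, weighted by the square root of its sample size, and the weighted sum is rescaled by the square root of the total weight. The sketch below is a stand-alone illustration of that formula, not a call to METAL itself; the function name and the example P values and effect directions are hypothetical, while the cohort sizes (8,331, 3,049, and 2,017) are those reported in the text.

```python
from math import erfc, sqrt
from statistics import NormalDist
from typing import Sequence, Tuple

_STD_NORMAL = NormalDist()


def sample_size_weighted_meta(pvalues: Sequence[float],
                              betas: Sequence[float],
                              sample_sizes: Sequence[int]) -> Tuple[float, float]:
    """Combine per-cohort results into a meta-analysis Z-score and two-sided P."""
    if not (len(pvalues) == len(betas) == len(sample_sizes)):
        raise ValueError("inputs must have one entry per cohort")

    numerator = 0.0
    weight_sq_sum = 0.0
    for p, beta, n in zip(pvalues, betas, sample_sizes):
        # Two-sided P -> |Z|; using the lower tail keeps precision for tiny P.
        z = -_STD_NORMAL.inv_cdf(p / 2.0)
        if beta < 0:
            z = -z  # sign follows the direction of effect
        w = sqrt(n)
        numerator += w * z
        weight_sq_sum += w * w

    z_meta = numerator / sqrt(weight_sq_sum)
    # erfc avoids the cancellation that 2 * cdf(-|z|) suffers for large |z|.
    p_meta = erfc(abs(z_meta) / sqrt(2.0))
    return z_meta, p_meta


# Hypothetical per-cohort P values with concordant effect directions.
z, p = sample_size_weighted_meta(
    pvalues=[1e-6, 1e-3, 1e-2],
    betas=[0.12, 0.10, 0.09],
    sample_sizes=[8331, 3049, 2017],
)
print(f"Z = {z:.3f}, P = {p:.3e}")
```

Weighting by the square root of the sample size means that the discovery cohort dominates the combined statistic, and cohorts with opposite effect directions partially cancel rather than reinforce each other.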
Association analysis for microbial alpha diversity and beta diversity GWAS for alpha diversity and the first 10 principal coordinates (PCoAs) was performed using linear analysis implemented in PLINK v1.9, with sex, age, BMI, and the top ten host genetic principal components as covariates. These M-GWAS analyses were performed across the three cohorts, and a meta-analysis was subsequently used for integration. Association analysis for microbial gene families To refine M-GWAS signals to the molecular function level, we further conducted association analysis for the identified 74 genome-wide significant loci with microbial gene families. Specifically, based on approximately 700,000 microbial gene family features output by HUMAnN3, we screened genome-wide variants using the same statistical framework and covariate adjustments (including sex, age, BMI, and the top ten host genetic PCs) as applied in the species and pathway analyses. These association analyses were performed in the three cohorts, respectively, and finally, a meta-analysis was used for integration. Blood type determination We performed genetic ABO blood-type assignment in the Chinese population using the method described by Wang et al. 68 . This approach utilizes the genotype combinations of three single-nucleotide polymorphisms (SNPs), rs8176719, rs635634, and rs7030248, to predict ABO blood type. These three SNPs combined were reported to be sufficient to predict blood type and achieved high accuracy (0.98) and F1 scores (micro 0.99 and macro 0.97) within the Chinese population. Consistently, we also achieved 98% accuracy (=1725/1760) while evaluating this approach on our in-house dataset. Thus, we extracted the genotypes of these three SNPs from the CHARLS-4DSZ cohort. For each individual, if any of the three SNPs were missing, the blood type was recorded as NA; otherwise, the ABO blood type was determined according to the SNP combination rules provided by Wang et al. 
The assigned blood types were O, A, B, and AB. Association analysis for host traits The 43 host traits included in the association analysis spanned hematological indices (hemoglobin, platelet counts), lipid profiles (triglycerides, HDL, LDL), renal markers (creatinine, cystatin C), systemic indicators (CRP, HbA1c), dental conditions (dentures, tooth loss, chewing ability), and diseases (diabetes, hypertension, digestive tract disease, etc.). To mitigate skewness, all quantitative traits were natural log-transformed, followed by outlier exclusion (observations > 4 standard deviations from the mean) and standardization (mean = 0, SD = 1). A linear model was used for quantitative traits and a logistic model for binary traits, both implemented in PLINK. Age, sex, BMI, and the top ten PCs were included as covariates. Observational correlation analysis The 94 microbial features associated with at least one variant at a genome-wide significant level (P < 1 × 10⁻⁸) were tested for associations with 43 host traits in the 8,331 individuals from the CHARLS discovery cohort who had both microbiome and complete phenotypic data. Associations were assessed via multivariable linear regression adjusted for age, sex, BMI, and the top ten PCs, with significance determined by FDR correction (P FDR < 0.05, Benjamini–Hochberg method). One-sample and two-sample MR analysis. To maximize robustness, causal inference was limited to the 94 microbial features associated with at least one variant at a genome-wide significant level (P < 1 × 10⁻⁸). To investigate the causal relationships between the 94 microbial features and 43 host traits, we first performed one-sample bidirectional MR analysis in the 8,331 individuals with both microbiome and complete phenotypic data. We set P < 1 × 10⁻⁵ as the threshold to select SNP/INDEL instrumental variables for microbial features.
SNP/INDEL instruments for blood metabolic traits and disease exposures were chosen at a genome-wide significance threshold ( P < 5 × 10 −8 ). Because no genome-wide significant variants were identified for dental conditions, SNP/INDEL instruments for dental conditions were also set to P < 1 × 10⁻ 5 . Subsequently, LD-clumping with a strict threshold ( r 2 < 0.1 in this high-depth CHARLS dataset) was performed to select independent genetic instruments with the lowest P values for exposure. We additionally calculated F-statistics and explained variance to demonstrate instrument strength directly ( Supplementary Table 15 ). The mean value of instrumental F statistics is 289 for microbial features and 186 for host traits. On average, 3.5% and 2.3% of variance could be explained by instruments for microbial features and host traits, respectively ( Supplementary Fig. 14 ). After selection of instrumental variables, unweighted polygenic risk scores (PRS) were calculated for each individual using PLINK v1.9. Each independent genetic variant was coded as 0, 1, or 2 based on the number of trait-specific risk-increasing alleles an individual carried. Then, a two-stage least squares (TSLS) regression approach 69 was employed for one-sample MR analysis. In the first stage, for each exposure trait, a linear regression model was used to assess the association between PRS and observed phenotypic values, obtaining predicted fitted values based on the instrumental variables. In the second stage, a linear regression was performed to relate the outcome trait to the genetically predicted exposure levels from the first stage. Both stages were adjusted for age, sex, BMI, and the top ten principal components of population structure. TSLS analysis for each trait was conducted using the 'ivreg' command from the AER package in R. 
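The one-sample procedure above (an unweighted score followed by two regression stages) can be sketched in a deliberately simplified form. The example below assumes a single score as the only instrument and omits the covariates that the study adjusts for (age, sex, BMI, and the top ten PCs); it is not a substitute for 'ivreg' in the AER package. The function names and the toy data are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def unweighted_prs(genotypes: Sequence[Sequence[int]]) -> List[int]:
    """Sum risk-increasing allele counts (0, 1, or 2) across instruments."""
    return [sum(row) for row in genotypes]


def ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Simple linear regression of y on x; returns (slope, intercept)."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tsls_estimate(prs: Sequence[float],
                  exposure: Sequence[float],
                  outcome: Sequence[float]) -> float:
    """Two-stage least squares point estimate with one instrument."""
    # Stage 1: predict the exposure from the genetic score.
    slope1, intercept1 = ols(prs, exposure)
    predicted = [intercept1 + slope1 * s for s in prs]
    # Stage 2: regress the outcome on the genetically predicted exposure.
    slope2, _ = ols(predicted, outcome)
    return slope2


# Toy data: five individuals, two instruments each (illustrative values only).
geno = [[0, 0], [1, 0], [1, 1], [2, 1], [2, 2]]
score = unweighted_prs(geno)
exposure_values = [0.1, 0.9, 2.2, 2.8, 4.0]
outcome_values = [2.0 * x + 1.0 for x in exposure_values]
print(tsls_estimate(score, exposure_values, outcome_values))
```

Running the two stages by hand reproduces the point estimate only; the standard errors from a naive second-stage regression are not valid, which is one reason a dedicated instrumental-variable routine such as 'ivreg' is used for inference.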
To maximize sample size in the MR analysis and to confirm causal effects between microbial features and host traits, we also performed a two-sample bidirectional MR analysis using the GCTA-GSMR approach 70, both as a robust validation and for new discovery. GWAS for host traits was performed in a total of 15,459 individuals, and the resulting summary statistics were used for the two-sample MR analysis. Genetic variants with P < 1 × 10−8 and LD r² < 0.1 were selected as instrumental variables for metabolic traits and diseases, whereas variants with P < 1 × 10−5 and LD r² < 0.1 were selected for dental conditions.

Declarations

Data availability

The metagenomic data have been deposited at https://db.cngb.org/data_resources/project/CNP0008650 (upload in progress). The summary statistics of associations between host genetics and the microbiome have been uploaded to the GSA database (https://ngdc.cncb.ac.cn/gsub/submit/bioproject/subPRO073610/overview). The release of these data was approved by the National Health Commission of China (Project ID: xxx, in preparation). Access to individual-level host genetic data requires approval from the corresponding authors ( [email protected] , [email protected] , [email protected] ) and compliance with the regulations of the Human Genetic Resources Administration of China. The human reference genome hg38 dataset is publicly available at http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/.

Code availability

Host genome sequencing reads were aligned to the GRCh38/hg38 reference genome using BWA (v0.7.15). Alignments were converted to indexed BAM format using SAMtools (v0.1.18), and PCR duplicates were marked with Picard Tools (v1.62). Genomic variant calling and base quality recalibration were performed using the Genome Analysis Toolkit (GATK) (v3.8) and GATK Lite (v2.2.15). Low-pass sequencing genotypes were imputed using BCFtools and the Lowpass_v5-Human platform with the refpanel_hg38 reference panel.
Metagenomic reads were aligned to hg38 using BWA-MEM, and taxonomic profiling was conducted with MetaPhlAn4 (v4.1.1). Functional annotation used HUMAnN3 (v3.0.0.alpha.3). Quality control, association analyses, and principal component analysis (PCA) were implemented in PLINK (v1.9). Statistical analyses, including Mendelian randomization, were performed in R. One-sample MR employed the TSLS method, while two-sample MR used GSMR (v1.0.7).

Acknowledgements

We thank all the participants for agreeing to join this study. We are very grateful to our colleagues from the CHARLS cohort for sample collection, and to our colleagues at BGI Research for DNA extraction, library construction, and sequencing. This work was supported by the Ministry of Science and Technology of China (Grant Nos. 2022ZD0211600 and 2023YFC3603300). We thank the Shenzhen Key Laboratory of Neurogenomics (BGI Genomics, Project No. CXB201108250094A) for support with sequencing and analysis. D.W. is supported by the Netherlands Organization for Scientific Research (NWO)-VENI grant VI.Veni.222.016.

Authors' contributions

Y.Z. conceived and directed the construction of the CHARLS cohort. Y.Z., T.Z., and C.N. conceived and directed this study. G.W., Q.M., B.C., X.C., Y.L., R.Z., and J.G. established a detailed end-to-end process for sample management, including sample collection, transportation, and storage, ensuring standardization throughout the sample reception process. X.L. led the bioinformatics analyses with contributions from L.Z., Y.W., J.C., X.H., and D.W. X.L. conceived the framework of the article and wrote the initial manuscript. All authors contributed to the revision and discussion of the manuscript.

Competing interests

The authors have declared no competing interests.

References

Ley, R. E., Turnbaugh, P. J., Klein, S. & Gordon, J. I. Microbial ecology: human gut microbes associated with obesity. Nature 444 , 1022–1023 (2006). Qin, J. et al.
A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490 , 55–60 (2012). Wang, J. & Jia, H. Metagenome-wide association studies: fine-mining the microbiome. Nat Rev Microbiol 14 , 508–522 (2016). Microbiota in health and diseases | Signal Transduction and Targeted Therapy. https://www.nature.com/articles/s41392-022-00974-4. The oral–gut microbiome axis in health and disease | Nature Reviews Microbiology. https://www.nature.com/articles/s41579-024-01075-5. Hajishengallis, G. Periodontitis: from microbial immune subversion to systemic inflammation. Nat Rev Immunol 15 , 30–44 (2015). Kilian, M. et al. The oral microbiome – an update for oral healthcare professionals. Br Dent J 221 , 657–666 (2016). Hajishengallis, G. & Chavakis, T. Local and systemic mechanisms linking periodontal disease and inflammatory comorbidities. Nat Rev Immunol 21 , 426–440 (2021). Sedghi, L., DiMassa, V., Harrington, A., Lynch, S. V. & Kapila, Y. L. The oral microbiome: Role of key organisms and complex networks in oral health and disease. Periodontol 2000 87 , 107–131 (2021). Kurilshikov, A. et al. Large-scale association analyses identify host factors influencing human gut microbiome composition. Nat Genet 53 , 156–165 (2021). Lopera-Maya, E. A. et al. Effect of host genetics on the gut microbiome in 7,738 participants of the Dutch Microbiome Project. Nat Genet 54 , 143–151 (2022). Qin, Y. et al. Combined effects of host genetics and diet on human gut microbiota and incident disease in a single population cohort. Nat Genet 54 , 134–142 (2022). Medina-Gomez, C. et al. Challenges in conducting genome-wide association studies in highly admixed multi-ethnic populations: the Generation R Study. Eur J Epidemiol 30 , 317–330 (2015). Gurdasani, D., Barroso, I., Zeggini, E. & Sandhu, M. S. Genomics of disease risk in globally diverse populations. Nat Rev Genet 20 , 520–535 (2019). Weyrich, L. S. 
The evolutionary history of the human oral microbiota and its implications for modern health. Periodontology 2000 85 , 90–100 (2021). Santonocito, S. et al. A Cross-Talk between Diet and the Oral Microbiome: Balance of Nutrition on Inflammation and Immune System’s Response during Periodontitis. Nutrients 14 , 2426 (2022). Shaw, L. et al. The Human Salivary Microbiome Is Shaped by Shared Environment Rather than Genetics: Evidence from a Large Family of Closely Related Individuals. mBio 8 , e01237-17 (2017). Liu, X. et al. Metagenome-genome-wide association studies reveal human genetic impact on the oral microbiome. Cell Discov 7 , 117 (2021). Mark Welch, J. L., Ramírez-Puebla, S. T. & Borisy, G. G. Oral Microbiome Geography: Micron-Scale Habitat and Niche. Cell Host Microbe 28 , 160–168 (2020). Carr, V. R. et al. Abundance and diversity of resistomes differ between healthy human oral cavities and gut. Nat Commun 11 , 693 (2020). Roldán, S., Herrera, D. & Sanz, M. Biofilms and the tongue: therapeutical approaches for the control of halitosis. Clin Oral Investig 7 , 189–197 (2003). Strain profiling and epidemiology of bacterial species from metagenomic sequencing | Nature Communications. https://www.nature.com/articles/s41467-017-02209-5. Walters, R. G. et al. Genotyping and population characteristics of the China Kadoorie Biobank. Cell Genomics 3 , 100361 (2023). Li, L. et al. The ChinaMAP reference panel for the accurate genotype imputation in Chinese populations. Cell Res 31 , 1308–1310 (2021). He, Y. et al. East Asian-specific and cross-ancestry genome-wide meta-analyses provide mechanistic insights into peptic ulcer disease. Nat Genet 55 , 2129–2138 (2023). Genome-wide association study in 8,956 German individuals identifies influence of ABO histo-blood groups on gut microbiome | Nature Genetics. https://www.nature.com/articles/s41588-020-00747-1. Kelly, R. J., Rouquier, S., Giorgi, D., Lennon, G. G. & Lowe, J. B. 
Sequence and expression of a candidate for the human Secretor blood group alpha(1,2)fucosyltransferase gene (FUT2). Homozygosity for an enzyme-inactivating nonsense mutation commonly correlates with the non-secretor phenotype. J Biol Chem 270 , 4640–4649 (1995). Rausch, P. et al. Colonic mucosa-associated microbiota is influenced by an interaction of Crohn disease and FUT2 (Secretor) genotype. Proc Natl Acad Sci U S A 108 , 19030–19035 (2011). Human Leukocyte Antigen (HLA) System: Genetics and Association with Bacterial and Viral Infections - Medhasi - 2022 - Journal of Immunology Research - Wiley Online Library. https://onlinelibrary.wiley.com/doi/10.1155/2022/9710376. Poole, A. C. et al. Human Salivary Amylase Gene Copy Number Impacts Oral and Gut Microbiomes. Cell Host Microbe 25 , 553-564.e7 (2019). Perry, G. H. et al. Diet and the evolution of human amylase gene copy number variation. Nat Genet 39 , 1256–1260 (2007). Esberg, A., Haworth, S., Hasslöf, P., Lif Holgerson, P. & Johansson, I. Oral Microbiota Profile Associates with Sugar Intake and Taste Preference Genes. Nutrients 12 , 681 (2020). Stubbs, M. et al. Encoding of human basic and glycosylated proline-rich proteins by the PRB gene complex and proteolytic processing of their precursor proteins. Arch Oral Biol 43 , 753–770 (1998). Choi, S. H. et al. Six Novel Loci Associated with Circulating VEGF Levels Identified by a Meta-analysis of Genome-Wide Association Studies. PLoS Genet 12 , e1005874 (2016). Mochizuki, Y. et al. Phosphatidylinositol 3-Phosphatase Myotubularin-related Protein 6 (MTMR6) Is Regulated by Small GTPase Rab1B in the Early Secretory and Autophagic Pathways *. Journal of Biological Chemistry 288 , 1009–1021 (2013). Castro-Sánchez, P., Ramirez-Munoz, R. & Roda-Navarro, P. Gene Expression Profiles of Human Phosphotyrosine Phosphatases Consequent to Th1 Polarisation and Effector Function. Journal of Immunology Research 2017 , 8701042 (2017). 
The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369 , 1318–1330 (2020). Yin, J. et al. Single-Cell Genomics Elucidates Molecular Variations and Regulatory Mechanisms in Circulating Immune Cells. bioRxiv 2025.01.26.634963 (2025) doi:10.1101/2025.01.26.634963. Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res 42 , D1001-1006 (2014). Ishigaki, K. et al. Large-scale genome-wide association study in a Japanese population identifies novel susceptibility loci across different diseases. Nat Genet 52 , 669–679 (2020). Liu, Y. et al. A widely distributed gene cluster compensates for uricase loss in hominids. Cell 186 , 3400-3413.e20 (2023). Brancaccio, M. et al. Sulfur-containing histidine compounds inhibit γ-glutamyl transpeptidase activity in human cancer cells. Journal of Biological Chemistry 294 , 14603–14614 (2019). Lu, Y. et al. Association between Serum Uric Acid Levels and Salivary Microbiota in Patients with Obstructive Sleep Apnea. J. Microbiol. Biotechnol. 35 , e2503042 (2025). Blekhman, R. et al. Host genetic variation impacts microbiome composition across human body sites. Genome Biol 16 , 191 (2015). Goodrich, J. K. et al. Human Genetics Shape the Gut Microbiome. Cell 159 , 789–799 (2014). Wang, J. et al. Genome-wide association analysis identifies variation in vitamin D receptor and other host factors influencing the gut microbiota. Nat Genet 48 , 1396–1406 (2016). GEM Project Research Consortium et al. Association of host genome with intestinal microbial composition in a large healthy cohort. Nat Genet 48 , 1413–1417 (2016). Bonder, M. J. et al. The effect of host genetics on the gut microbiome. Nat Genet 48 , 1407–1412 (2016). Rothschild, D. et al. Environment dominates over host genetics in shaping human gut microbiota. Nature 555 , 210–215 (2018). Sanna, S., Kurilshikov, A., Van Der Graaf, A., Fu, J. & Zhernakova, A. 
Challenges and future directions for studying effects of host genetics on the gut microbiome. Nat Genet 54 , 100–106 (2022). Kamitaki, N. et al. Human and bacterial genetic variation shape oral microbiomes and health. medRxiv 2025.03.31.25324952 (2025) doi:10.1101/2025.03.31.25324952. Liu, X. et al. Mendelian randomization analyses support causal relationships between blood metabolites and the gut microbiome. Nat Genet 54 , 52–61 (2022). Boulund, U. et al. Gut microbiome associations with host genotype vary across ethnicities and potentially influence cardiometabolic traits. Cell Host & Microbe 30 , 1464-1480.e6 (2022). Sumida, K. et al. Gut Microbiota-Targeted Interventions in the Management of Chronic Kidney Disease. Seminars in Nephrology 43 , 151408 (2023). Walter, J., Armet, A. M., Finlay, B. B. & Shanahan, F. Establishing or Exaggerating Causality for the Gut Microbiome: Lessons from Human Microbiota-Associated Rodents. Cell 180 , 221–232 (2020). Schmidt, T. S. B., Raes, J. & Bork, P. The Human Gut Microbiome: From Association to Modulation. Cell 172 , 1198–1215 (2018). Zhao, Y., Hu, Y., Smith, J. P., Strauss, J. & Yang, G. Cohort profile: the China Health and Retirement Longitudinal Study (CHARLS). Int J Epidemiol 43 , 61–68 (2014). Chen, X. et al. Venous Blood-Based Biomarkers in the China Health and Retirement Longitudinal Study: Rationale, Design, and Results From the 2015 Wave. Am J Epidemiol 188 , 1871–1877 (2019). Chen, X. et al. Venous Blood-Based Biomarkers in the China Health and Retirement Longitudinal Study: Rationale, Design, and Results From the 2015 Wave. Am J Epidemiol 188 , 1871–1877 (2019). Blanco-Míguez, A. et al. Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. Nat Biotechnol 41 , 1633–1644 (2023). Beghini, F. et al. Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. eLife 10 , e65088 (2021). Chen, Y. et al. 
SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. GigaScience 7 , (2018). Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25 , 1754–1760 (2009). McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20 , 1297–1303 (2010). Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81 , 559–575 (2007). Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26 , 2190–2191 (2010). Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38 , e164 (2010). Wang, M., Gao, J., Liu, J., Zhao, X. & Lei, Y. Genomic Association vs. Serological Determination of ABO Blood Types in a Chinese Cohort, with Application in Mendelian Randomization. Genes (Basel) 12 , 959 (2021). Permutt, T. & Hebel, J. R. Simultaneous-equation estimation in a clinical trial of the effect of smoking on birth weight. Biometrics 45 , 619–622 (1989). Zhu, Z. et al. Causal associations between risk factors and common diseases inferred from GWAS summary data. Nat Commun 9 , 224 (2018).

Table

Table 1 | Ten replicated host genetic loci associated with the tongue microbiome.

| Loci | Variant | MAF | Taxon/Functions | Discovery β (P) | Replication β (P) | 4DSZ β (P) | P meta | Phenome-wide GWAS information |
|---|---|---|---|---|---|---|---|---|
| FUT2 | 19:48703374:T:A | 0.444 | s. Haemophilus sputorum_HB | -0.443 (7.42E-41) | -0.278 (2.99E-07) | -0.336 (3.54E-07) | 9.71E-51 | cancer biomarker measurement (6E-209), Alkaline phosphatase (2.3E-163), serum carcinoembryonic antigen (3E-81), vitamin B12 (4E-36), serum alanine aminotransferase (9E-13), Cholelithiasis (1E-11), BBJ_Gastric ulcer (1.7E-8) |
| | 19:48703374:T:A | 0.444 | s. Granulicatella SGB8239_HB | -0.251 (1.35E-15) | -0.184 (5.90E-04) | -0.256 (1.12E-04) | 3.80E-21 | |
| | 19:48703374:T:A | 0.444 | s. Veillonella SGB6928_HB | -0.264 (3.57E-13) | -0.273 (1.55E-05) | -0.227 (1.98E-02) | 2.64E-18 | |
| | 19:48709897:C:T | 0.472 | DARABCATK12-PWY: D-arabinose degradation I_HB | -0.195 (9.59E-10) | -0.087 (4.53E-02) | -0.148 (3.69E-02) | 1.33E-10 | |
| POLI, C18orf54 | 18:54497190:A:T | 0.297 | s. Haemophilus parahaemolyticus_HB | -0.234 (7.76E-12) | -0.340 (2.06E-09) | -0.121 (9.12E-02) | 4.47E-19 | cortical thickness (2E-13), neuroticism measurement (8E-10), blood white blood cell counts (4.8E-8) |
| | 18:54499286:T:A | 0.295 | PWY-7204: pyridoxal 5'-phosphate salvage II (plants)_HB | -0.249 (4.07E-11) | -0.356 (2.20E-07) | -0.114 (1.41E-01) | 1.25E-16 | |
| AMY1C | 1:103823102:T:A | 0.397 | g. Stomatobaculum_LOGres | 0.076 (1.05E-11) | 0.144 (4.11E-02) | 0.000 (9.99E-01) | 2.22E-10 | alpha-amylase 1 measurement (3E-69), amylase measurement (1E-16) |
| PRB3 | 12:11242238:A:C | 0.432 | s. Streptococcus infantis_LOGres | -0.072 (4.66E-11) | -0.091 (4.12E-02) | -0.056 (1.36E-02) | 1.08E-12 | bl_vitamin D (1E-7) |
| SLC2A9 | 4:10009906:C:T | 0.406 | g. Lachnoanaerobaculum_LOGres | -0.072 (8.97E-11) | -0.168 (1.23E-02) | -0.092 (4.80E-05) | 3.46E-15 | urate measurement (9E-3353), blood uric acid (6E-496) |
| | 4:9988548:C:T | 0.374 | g. Candidatus Nanosyncoccus_HB | -0.196 (1.42E-09) | -0.087 (1.09E-01) | -0.195 (3.88E-03) | 2.80E-11 | |
| | 4:10047243:C:T | 0.404 | g. Lachnoanaerobaculum sp. ICM7_HB | -0.188 (2.74E-09) | -0.212 (6.06E-05) | -0.260 (1.08E-04) | 5.56E-16 | |
| | 4:10014328:G:C | 0.380 | PWY-6353: purine nucleotides degradation II (aerobic)_LOGres | -0.066 (3.01E-09) | -0.064 (2.21E-02) | -0.082 (4.20E-04) | 1.05E-12 | |
| | 4:10014328:G:C | 0.380 | SALVADEHYPOX-PWY: adenosine nucleotides degradation II_LOGres | -0.066 (3.37E-09) | -0.065 (2.55E-02) | -0.081 (4.96E-04) | 1.58E-12 | |
| | 4:10057718:TA:T | 0.372 | s. Oribacterium SGB5283_HB | 0.183 (4.83E-08) | 0.189 (5.75E-04) | NA | 1.10E-10 | |
| HLA-DRA, HLA-DRB5 | 6:32466946:A:T | 0.061 | p. Bacteroidetes \| s. GGB1611_SGB2208_HB | -0.425 (1.05E-09) | -0.467 (3.34E-05) | -0.274 (4.88E-02) | 3.91E-14 | rheumatoid arthritis (6.6E-38), autoimmune disease (2.7E-27), staphylococcus seropositivity (2E-26), white blood cell counts (1.1E-17) |
| TMCO3 | 13:113536340:CA:C | 0.381 | s. Capnocytophaga sp. oral taxon 863_HB | 0.191 (6.00E-09) | 0.116 (2.54E-02) | 0.090 (1.85E-01) | 6.61E-10 | blood urea nitrogen (1.6E-4) |
| MRPS18A, VEGFA | 6:43728831:GT:G | 0.059 | s. GGB12441_SGB19290_HB | 0.395 (9.31E-09) | 0.237 (2.96E-02) | 0.375 (1.04E-02) | 5.36E-11 | white blood cell count (1.1E-4) |
| MTMR12, ZFR | 5:32449710:T:C | 0.167 | p. Bacteroidetes \| c. CFGB570_LOGres | 0.061 (2.54E-08) | 0.18 (2.36E-02) | 0 (0.972) | 4.41E-8 | heat disease (2.6E-4) |
| MTMR6, NUP58 | 13:25293309:T:C | 0.411 | PWY-2941: L-lysine biosynthesis II_LOGres | -0.057 (3.70E-07) | -0.085 (8.94E-06) | NA | 2.82E-11 | |
| | 13:25293309:T:C | 0.411 | PWY-5910: superpathway of geranylgeranyl diphosphate biosynthesis I (via mevalonate)_LOGres | -0.054 (1.21E-06) | -0.114 (2.32E-05) | NA | 2.17E-10 | |
| | 13:25293309:T:C | 0.411 | LACTOSECAT-PWY: lactose and galactose degradation I_LOGres | -0.057 (2.39E-07) | -0.108 (2.49E-04) | NA | 2.62E-10 | |
| | 13:25293309:T:C | 0.411 | o. Lactobacillales_LOGres | -0.051 (2.74E-06) | -0.175 (1.52E-05) | NA | 4.07E-10 | |
| | 13:25293309:T:C | 0.411 | PWY-922: mevalonate pathway I (eukaryotes and bacteria)_LOGres | -0.057 (2.92E-07) | -0.087 (3.96E-03) | NA | 6.64E-10 | |
| | 13:25293309:T:C | 0.411 | c. Bacilli_LOGres | -0.052 (2.43E-06) | -0.166 (3.53E-05) | NA | 1.65E-09 | |
| | 13:25293309:T:C | 0.411 | g. Streptococcus_LOGres | -0.049 (7.34E-06) | -0.171 (2.26E-05) | NA | 4.14E-09 | |

Ten loci were identified through M-GWAS, with eight reaching study-wide significance. Association statistics include effect size (β) and P values for the discovery cohort, replication cohort and combined 4DSZ cohort, along with the meta-analysis P values (P meta). Previously reported phenotype associations from phenome-wide GWAS are annotated where available, listed on the first row of each locus. MAF, minor allele frequency.

Additional Declarations

There is NO Competing Interest.
Supplementary Files SupplementaryTables1216.xlsx Supplementary Table 1-20 SupplementaryFigures.charls.mgwas.20251118.docx Supplementary Figure 1-14 © Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-8406553","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Biological Sciences - Article","associatedPublications":[],"authors":[{"id":569478094,"identity":"a88e5f11-2905-484d-8e67-27996474eae2","order_by":0,"name":"Xiaomin
Liu","email":"","orcid":"https://orcid.org/0000-0003-2155-3627","institution":"BGI-Shenzhen","correspondingAuthor":true,"prefix":"","firstName":"Xiaomin","middleName":"","lastName":"Liu","suffix":""},{"id":569478095,"identity":"4698f5e7-7143-411d-b778-927c840b38b8","order_by":1,"name":"Yaohui Zhao","email":"","orcid":"","institution":"Dongfureng School of Social and Economic Development, Wuhan University","correspondingAuthor":false,"prefix":"","firstName":"Yaohui","middleName":"","lastName":"Zhao","suffix":""},{"id":569478096,"identity":"71303eff-0f5c-48eb-a009-1f006cd212ac","order_by":2,"name":"Yafeng Wang","email":"","orcid":"","institution":"Institute for Social Science Survey, Peking University","correspondingAuthor":false,"prefix":"","firstName":"Yafeng","middleName":"","lastName":"Wang","suffix":""},{"id":569478097,"identity":"9190eb39-9c76-4f7b-b67f-7fa1e621f649","order_by":3,"name":"Longke Zeng","email":"","orcid":"","institution":"BGI Research, Wuhan","correspondingAuthor":false,"prefix":"","firstName":"Longke","middleName":"","lastName":"Zeng","suffix":""},{"id":569478098,"identity":"4c60658e-63e2-419e-a745-c0b361a9b720","order_by":4,"name":"Gewei Wang","email":"","orcid":"","institution":"Dongfureng School of Social and Economic Development, Wuhan University","correspondingAuthor":false,"prefix":"","firstName":"Gewei","middleName":"","lastName":"Wang","suffix":""},{"id":569478099,"identity":"b4caeee9-1700-4667-a2bf-3a2bca470ba5","order_by":5,"name":"Junhong
Chen","email":"","orcid":"https://orcid.org/0000-0002-8313-8317","institution":"BGI Research, Wuhan","correspondingAuthor":false,"prefix":"","firstName":"Junhong","middleName":"","lastName":"Chen","suffix":""},{"id":569478100,"identity":"9ec7b4d1-3f3b-4a83-bfe5-cc231f52fc48","order_by":6,"name":"Xinxin Chen","email":"","orcid":"","institution":"Dongfureng School of Social and Economic Development, Wuhan University","correspondingAuthor":false,"prefix":"","firstName":"Xinxin","middleName":"","lastName":"Chen","suffix":""},{"id":569478101,"identity":"a11b9185-e1cc-43ba-9d78-f2be3f2e967f","order_by":7,"name":"Yan Li","email":"","orcid":"","institution":"","correspondingAuthor":false,"prefix":"","firstName":"Yan","middleName":"","lastName":"Li","suffix":""},{"id":569478102,"identity":"931341db-c9d4-4019-80e8-bc6f8e5c0b60","order_by":8,"name":"Rui Zhao","email":"","orcid":"","institution":"Dongfureng School of Social and Economic Development, Wuhan University","correspondingAuthor":false,"prefix":"","firstName":"Rui","middleName":"","lastName":"Zhao","suffix":""},{"id":569478103,"identity":"aad37e74-065b-4441-a8b7-50141df9546a","order_by":9,"name":"Daoming Wang","email":"","orcid":"https://orcid.org/0000-0003-4623-8527","institution":"University Medical Center Groningen","correspondingAuthor":false,"prefix":"","firstName":"Daoming","middleName":"","lastName":"Wang","suffix":""},{"id":569478104,"identity":"0fa23e3e-112e-4c9d-b85c-f1949a3b873b","order_by":10,"name":"Xuanlin Huang","email":"","orcid":"","institution":"State Key Laboratory of Genome and Multi-omics Technologies,BGI Research","correspondingAuthor":false,"prefix":"","firstName":"Xuanlin","middleName":"","lastName":"Huang","suffix":""},{"id":569478105,"identity":"5888db15-9d7c-4e68-8b91-5b8613762505","order_by":11,"name":"Bing 
Chen","email":"","orcid":"","institution":"BGI-Shenzhen","correspondingAuthor":false,"prefix":"","firstName":"Bing","middleName":"","lastName":"Chen","suffix":""},{"id":569478106,"identity":"bac04303-a163-45cb-ad98-43e64e42199f","order_by":12,"name":"Qinqin Meng","email":"","orcid":"","institution":"Dongfureng School of Social and Economic Development, Wuhan University","correspondingAuthor":false,"prefix":"","firstName":"Qinqin","middleName":"","lastName":"Meng","suffix":""},{"id":569478107,"identity":"071dcb80-044e-4b35-a6f2-e33e11d8f3a9","order_by":13,"name":"Jinquan Gong","email":"","orcid":"","institution":"Dongfureng School of Social and Economic Development, Wuhan University","correspondingAuthor":false,"prefix":"","firstName":"Jinquan","middleName":"","lastName":"Gong","suffix":""},{"id":569478108,"identity":"8deafb10-2ffa-4c13-921b-2dceb3dd7630","order_by":14,"name":"Yong Zhang","email":"","orcid":"https://orcid.org/0000-0001-9950-1793","institution":"BGI-Research","correspondingAuthor":false,"prefix":"","firstName":"Yong","middleName":"","lastName":"Zhang","suffix":""},{"id":569478109,"identity":"47322fb7-43cc-49d5-a60d-053d1cf31a4d","order_by":15,"name":"Jian Wang","email":"","orcid":"","institution":"Dong Fureng Institute of Economic and Social Development, Wuhan University, 54 Dongsi Lishi Hutong, Beijing, China; Center for Health Economics and Management at School of Economics and Management","correspondingAuthor":false,"prefix":"","firstName":"Jian","middleName":"","lastName":"Wang","suffix":""},{"id":569478110,"identity":"0e25cff8-ff6f-4bff-8e33-96be8d524209","order_by":16,"name":"Min Guo","email":"","orcid":"","institution":"Dongfureng School of Social and Economic Development, Wuhan University; University of International Business and Economics","correspondingAuthor":false,"prefix":"","firstName":"Min","middleName":"","lastName":"Guo","suffix":""},{"id":569478111,"identity":"3d424c61-91ad-4ecd-8676-9844e86d41a8","order_by":17,"name":"Yuxiang 
Yang","email":"","orcid":"","institution":"Dongfureng School of Social and Economic Development, Wuhan University","correspondingAuthor":false,"prefix":"","firstName":"Yuxiang","middleName":"","lastName":"Yang","suffix":""},{"id":569478112,"identity":"8f2297a0-9c57-4a1b-8138-c9a21dcc44e6","order_by":18,"name":"Hui Wang","email":"","orcid":"","institution":"Beijing Zhongguancun Hospital","correspondingAuthor":false,"prefix":"","firstName":"Hui","middleName":"","lastName":"Wang","suffix":""},{"id":569478113,"identity":"508fd222-85f1-4300-ab64-4a8141a2c2fc","order_by":19,"name":"Hongyan Zhou","email":"","orcid":"","institution":"Dongfureng School of Social and Economic Development, Wuhan University","correspondingAuthor":false,"prefix":"","firstName":"Hongyan","middleName":"","lastName":"Zhou","suffix":""},{"id":569478114,"identity":"cab7abd4-b086-4d5b-9106-eb5578f83220","order_by":20,"name":"Jun Wang","email":"","orcid":"","institution":"Dongfureng School of Social and Economic Development, Wuhan University","correspondingAuthor":false,"prefix":"","firstName":"Jun","middleName":"","lastName":"Wang","suffix":""},{"id":569478115,"identity":"0aad0262-e35a-47e7-a146-4ae357b35e11","order_by":21,"name":"Yuan Jia","email":"","orcid":"","institution":"Dongfureng School of Social and Economic Development, Wuhan University","correspondingAuthor":false,"prefix":"","firstName":"Yuan","middleName":"","lastName":"Jia","suffix":""},{"id":569478116,"identity":"3e758ea6-243b-4e84-90b2-00132f9b3972","order_by":22,"name":"Chuan Chen","email":"","orcid":"","institution":"Dongfureng School of Social and Economic Development, Wuhan University","correspondingAuthor":false,"prefix":"","firstName":"Chuan","middleName":"","lastName":"Chen","suffix":""},{"id":569478117,"identity":"a039ab15-4052-47f6-98d7-51358ecab07f","order_by":23,"name":"Jingwei Huang","email":"","orcid":"","institution":"Dongfureng School of Social and Economic Development, Wuhan 
University","correspondingAuthor":false,"prefix":"","firstName":"Jingwei","middleName":"","lastName":"Huang","suffix":""},{"id":569478118,"identity":"231d1702-00cf-49b1-b31f-6d8356094509","order_by":24,"name":"Rudai Bi","email":"","orcid":"","institution":"Dongfureng School of Social and Economic Development, Wuhan University","correspondingAuthor":false,"prefix":"","firstName":"Rudai","middleName":"","lastName":"Bi","suffix":""},{"id":569478119,"identity":"0c256ea5-5c50-4f0b-a535-082c2fab4972","order_by":25,"name":"Zheng Zhang","email":"","orcid":"","institution":"Dongfureng School of Social and Economic Development, Wuhan University","correspondingAuthor":false,"prefix":"","firstName":"Zheng","middleName":"","lastName":"Zhang","suffix":""},{"id":569478120,"identity":"bda36762-5e8f-484d-aaa6-7b1633229c2c","order_by":26,"name":"Xun Xu","email":"","orcid":"https://orcid.org/0000-0002-5338-5173","institution":"BGI Research","correspondingAuthor":false,"prefix":"","firstName":"Xun","middleName":"","lastName":"Xu","suffix":""},{"id":569478121,"identity":"94d45083-355d-4f30-9c13-d38618a5c75b","order_by":27,"name":"Xin Jin","email":"","orcid":"https://orcid.org/0000-0001-7554-4975","institution":"BGI Research","correspondingAuthor":false,"prefix":"","firstName":"Xin","middleName":"","lastName":"Jin","suffix":""},{"id":569478122,"identity":"19da2a24-bf8c-4ae4-b6b5-3c9b9dd304b8","order_by":28,"name":"Liang Xiao","email":"","orcid":"https://orcid.org/0000-0003-0836-4397","institution":"BGI-Shenzhen","correspondingAuthor":false,"prefix":"","firstName":"Liang","middleName":"","lastName":"Xiao","suffix":""},{"id":569478123,"identity":"639fb22a-7d4c-4fbf-9023-0f60e46909ea","order_by":29,"name":"Zhenhua Mao","email":"","orcid":"","institution":"Dongfureng School of Social and Economic Development, Wuhan University; School of Business, Hong Kong 
University","correspondingAuthor":false,"prefix":"","firstName":"Zhenhua","middleName":"","lastName":"Mao","suffix":""},{"id":569478124,"identity":"1b1518b1-5ef3-40ab-a0a4-e3b128ed6346","order_by":30,"name":"Chao Nie","email":"","orcid":"","institution":"State Key Laboratory of Genome and Multi-omics Technologies,BGI Research","correspondingAuthor":false,"prefix":"","firstName":"Chao","middleName":"","lastName":"Nie","suffix":""},{"id":569478125,"identity":"44178ee6-e76b-4701-9994-a13d64c9b364","order_by":31,"name":"Tao Zhang","email":"","orcid":"https://orcid.org/0000-0003-2765-2802","institution":"BGI-Shenzhen","correspondingAuthor":false,"prefix":"","firstName":"Tao","middleName":"","lastName":"Zhang","suffix":""}],"badges":[],"createdAt":"2025-12-19 16:05:26","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-8406553/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8406553/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":100408683,"identity":"35e12cac-ecf1-4eb6-88fe-0c87c21c7e23","added_by":"auto","created_at":"2026-01-16 13:06:24","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":66403,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe CHARLS-4DSZ cohort revealed that host genetic structure and age strongly contributed to the oral microbiome composition.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ea, \u003c/strong\u003eOverview of study populations and analytical framework. This study included a total of 13,397 participants from the CHARLS-4DSZ cohort, comprising a discovery stage of 8,831 individuals (tongue dorsum WMS and blood WGS) and a replication stage of 3,036 individuals (tongue dorsum WMS and extracted WGS) from the CHARLS cohort, as well as 2,017 participants from the 4DSZ cohort incorporated for cross-cohort validation. 
The analytical framework included: (i) assessing the effects of covariates including host genetic principal components (PCs), sex, age, and BMI on the microbiome composition; (ii) conducting metagenome-genome-wide association studies (M-GWAS) to identify host genetic variants associated with microbial features; and (iii) performing observational correlation and Mendelian randomization (MR) analyses using genetic variants as instrumental variables to evaluate potential causal effects between microbial features and host traits. \u003cstrong\u003eb, \u003c/strong\u003eCorrelations of the top ten host PCs, age, sex, and BMI with microbial diversity and composition. Standardized β values were estimated using linear regression models to assess the associations of host genetic PC1-10, age, sex, and BMI with microbial α-diversity indices and the top ten microbial PCs (PCoA1-10). The explained variances of host potential covariates on microbial composition were also evaluated using the permutational multivariate analysis of variance (PERMANOVA, \u003cem\u003eP\u003c/em\u003e value calculated based on 1000 permutations).\u003c/p\u003e","description":"","filename":"Binder11.png","url":"https://assets-eu.researchsquare.com/files/rs-8406553/v1/b69845f88b13789e62313db4.png"},{"id":100408453,"identity":"17f4e206-2e85-4ab7-baff-4c151b059304","added_by":"auto","created_at":"2026-01-16 13:06:16","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":26976,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eHost genetic signals associated with tongue dorsum microbial taxa and functions.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eMirror Manhattan plots illustrated host genetic associations with microbial taxa (n = 534) (upper panel) and pathways (n = 311) (lower panel). 
The light blue and purple dashed lines indicated the genome-wide and study-wide significance thresholds, respectively, with their corresponding values shown in the upper-left corner of the plot. The well-replicated associations in either of the two validation cohorts with the same effect direction were marked in red, while the others were marked in blue/grey. The ten well-replicated signals of host genes and their associated microbial features were listed.\u003c/p\u003e","description":"","filename":"Binder12.png","url":"https://assets-eu.researchsquare.com/files/rs-8406553/v1/55c94832f623e0d3743e61dd.png"},{"id":100408604,"identity":"7fa5d2c6-1fc9-4980-90e1-5377ff1cbefd","added_by":"auto","created_at":"2026-01-16 13:06:21","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":133687,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003e\u003cstrong\u003eFUT2\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e influenced the presence and abundance of three tongue dorsum species, regardless of \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eABO\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e blood groups.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ea-b, \u003c/strong\u003eThe strongest signal was for \u003cem\u003eFUT2\u003c/em\u003e missense SNP rs1047781 A \u0026gt; T (A418T, Ile140Phe, determines the secretor status of ABO blood group antigens in Asians), which was associated not only with the presence/absence status of three tongue dorsum species, namely \u003cem\u003eHaemophilus sputorum\u003c/em\u003e, \u003cem\u003eGranulicatella SGB8239\u003c/em\u003e and \u003cem\u003eVeillonella SGB6928\u003c/em\u003e, but also with the relative abundance of the three microbial species.
\u003cstrong\u003ec-d\u003c/strong\u003e, Secretor individuals (genotypes rs1047781-AA/TA) had, on average, a higher prevalence and relative abundance of the three microbial species than non-secretors (genotype rs1047781-TT). \u003cstrong\u003ee-f\u003c/strong\u003e, Individuals with different ABO blood groups (A, AB, B, O) showed weak or no differences in the prevalence or relative abundance of these species. \u003cstrong\u003eg-h\u003c/strong\u003e, After stratification by ABO blood groups, secretor status remained consistently associated with increased prevalence and abundance of the three species, indicating effects independent of ABO blood groups. All statistical comparisons were performed using the Wilcoxon rank-sum test on relative abundance values. In box plots, the centre line represents the median, boxes denote the interquartile range (IQR), and whiskers extend to 1.5 × IQR. Violin plots show the kernel density distribution of the data.\u003c/p\u003e","description":"","filename":"Binder13.png","url":"https://assets-eu.researchsquare.com/files/rs-8406553/v1/c5156cc1668fed6fe31c7b73.png"},{"id":100408458,"identity":"6b0bc810-41a7-4076-ba0a-a5f7ec68939d","added_by":"auto","created_at":"2026-01-16 13:06:16","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":89417,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eHost \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eFUT2\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e polymorphism shapes tongue dorsum microbial gene families associated with fucose metabolism.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ea\u003c/strong\u003e, Manhattan plot displaying the association of the 74 genome-wide significant MAL with 700,000 microbial gene families annotated using HUMAnN3.
The well replicated associations in either of the two validation cohorts with the same effect direction were marked in red, while the others were marked in grey. The top three well-replicated host genes that reached the study-wide significance were marked. The top four associated microbial gene families with \u003cem\u003eFUT2\u003c/em\u003e loci were listed. \u003cstrong\u003eb\u003c/strong\u003e, Schematic of the \u003cem\u003eFUT2\u003c/em\u003e-mediated fucosylation of the H antigen biosynthesis pathway. In homozygous carriers of the \u003cem\u003eFUT2\u003c/em\u003e rs1047781-TT variant (non-secretors), the absence of functional \u003cem\u003eFUT2\u003c/em\u003e leads to a lack of fucosylated H antigen on mucosal surfaces, whereas in secretors, type I H-antigen is synthesized and can be further modified into A-antigen or B-antigen. \u003cstrong\u003ec\u003c/strong\u003e, Proposed mechanism schematic diagram of host-microbe interactions shaped by \u003cem\u003eFUT2\u003c/em\u003e polymorphism. In secretor individuals, H antigens are synthesized intracellularly by the \u003cem\u003eFUT2\u003c/em\u003e-encoded fucosyltransferase and subsequently secreted into saliva. Depending on the individual's ABO genotype, intracellular A or B glycosyltransferases can convert H antigens into A or B antigens, which are also secreted into saliva. O blood group individuals lack A or B glycosyltransferases, resulting in saliva predominantly containing H antigens. In non-O secretors, salivary A and B antigens may be hydrolyzed by oral microbiota into H antigen precursors and corresponding products, which are further converted by \u003cem\u003eStreptococcus\u003c/em\u003e-derived α-L-fucosidases into α-L-fucopyranose, subsequently utilized by \u003cem\u003eHaemophilus sputorum\u003c/em\u003e via the FUCCAT-PWY. 
Gene families encoding key enzymes in this pathway are significantly associated with the rs1047781 locus, indicating that host genotype directly modulates microbial metabolic potential.\u003c/p\u003e","description":"","filename":"Binder14.png","url":"https://assets-eu.researchsquare.com/files/rs-8406553/v1/4d7449c424964a4a7315e3ed.png"},{"id":100408687,"identity":"851563d2-2b15-4ad7-b187-a2d9414ec351","added_by":"auto","created_at":"2026-01-16 13:06:24","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":227414,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePheWAS analysis linked MAL to host metabolic and immune traits.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ea,\u003c/strong\u003e Cross-trait Manhattan plots showing associations of MAL with host traits. The top panel summarizes corresponding associations of MAL with phenotypes in the BBJ and GWAS catalog, and the three lower panels show associations of MAL with serum biochemical traits, oral health indices, and diseases in the CHARLS cohort. Study-wide significant MAL and their associated host traits are listed in detail. For the \u003cem\u003eSLC2A9\u003c/em\u003e locus, p-values for urate and uric acid smaller than 10\u003csup\u003e-300\u003c/sup\u003e were capped at 10\u003csup\u003e-300\u003c/sup\u003e in the plot. \u003cstrong\u003eb,\u003c/strong\u003e Association of the \u003cem\u003eSLC2A9\u003c/em\u003e SNP rs3796835 with serum uric acid levels in the CHARLS and 4DSZ cohorts, respectively. \u003cstrong\u003ec,\u003c/strong\u003e Association of the \u003cem\u003eSLC2A9\u003c/em\u003e SNP rs3796835 with the oral bacterial genus \u003cem\u003eLachnoanaerobaculum\u003c/em\u003e in the CHARLS and 4DSZ cohorts, respectively.
\u003cstrong\u003ed,\u003c/strong\u003e Positive correlation between \u003cem\u003eLachnoanaerobaculum\u003c/em\u003e abundance and serum uric acid levels across both cohorts. Spearman’s correlation coefficient (Rho) and p-values are shown. \u003cstrong\u003ee,\u003c/strong\u003e Schematic representation of the interaction among \u003cem\u003eSLC2A9\u003c/em\u003e SNP rs3796835, serum uric acid, and uric acid-degrading bacteria: \u003cem\u003eSLC2A9\u003c/em\u003e encodes a uric acid transporter; the minor allele T of SNP rs3796835 is associated with higher serum uric acid levels, and higher serum uric acid levels promote the growth of uric acid-degrading bacteria such as \u003cem\u003eLachnoanaerobaculum\u003c/em\u003e and \u003cem\u003eLachnoanaerobaculum umeaense \u003c/em\u003ethat harbor the uric acid-utilizing gene cluster \u003cem\u003eygeX\u003c/em\u003e, \u003cem\u003eygeY\u003c/em\u003e, \u003cem\u003eygeW\u003c/em\u003e, \u003cem\u003essnA\u003c/em\u003e, \u003cem\u003eygfK\u003c/em\u003e, etc.\u003c/p\u003e","description":"","filename":"Binder15.png","url":"https://assets-eu.researchsquare.com/files/rs-8406553/v1/763993025153f229c16295b2.png"},{"id":100408223,"identity":"56284f59-de32-4c31-a905-497fdf764fe3","added_by":"auto","created_at":"2026-01-16 13:05:48","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":38429,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eInteractions among the \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003ePOLI\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e locus, the presence and abundance of \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eHaemophilus parahaemolyticus\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e, and host immunometabolic traits.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ea–b\u003c/strong\u003e, Association of the lead SNP rs12954177 at the \u003cem\u003ePOLI\u003c/em\u003e locus with the presence of 
\u003cem\u003eH. parahaemolyticus\u003c/em\u003e in the CHARLS discovery and replication datasets, respectively. \u003cstrong\u003ec–d\u003c/strong\u003e, Association of \u003cem\u003ePOLI\u003c/em\u003e rs12954177 with the relative abundance of \u003cem\u003eH. parahaemolyticus\u003c/em\u003e in the CHARLS discovery and replication datasets, respectively. \u003cstrong\u003ee\u003c/strong\u003e, Volcano plot showing significant negative correlations between \u003cem\u003eH. parahaemolyticus\u003c/em\u003e presence status and host immunometabolic traits, including white blood cell count (WBC), triglycerides (TG), and hemoglobin (HGB). \u003cstrong\u003ef\u003c/strong\u003e, Comparison of WBC levels between \u003cem\u003eH. parahaemolyticus\u003c/em\u003e–absent and –present individuals. P-values were calculated using the GLM test with adjustment for age, sex, BMI and the top ten host PCs in both \u003cstrong\u003ee-f\u003c/strong\u003e.\u003c/p\u003e","description":"","filename":"Binder16.png","url":"https://assets-eu.researchsquare.com/files/rs-8406553/v1/65903bdda34a8975610673bd.png"},{"id":100408345,"identity":"db142a64-17ec-4bc5-a991-3bc4dfa01f1c","added_by":"auto","created_at":"2026-01-16 13:06:01","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":62950,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe observational correlations and inferred causal relationships between tongue dorsum microbiome and host traits.\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ea\u003c/strong\u003e, Heatmap showing the observational correlation results of 94 heritable microbial features and 43 host traits with 239 significant associations between them after Bonferroni correction (\u003cem\u003eP\u003c/em\u003e \u0026lt; 1.22 × 10\u003csup\u003e−5\u003c/sup\u003e) marked in cells with “*”. 
The GLM test was used for observational correlation analysis, adjusting for age, sex, BMI and the top ten host PCs. \u003cstrong\u003eb\u003c/strong\u003e, Forest plot showing the MR estimates and 95% CI values of eleven causal relationships between microbial features and host traits. The beta estimates and p-values from the observational correlation, one-sample (TSLS) and two-sample (GCTA-GSMR) MR analysis were listed.\u003c/p\u003e","description":"","filename":"Binder17.png","url":"https://assets-eu.researchsquare.com/files/rs-8406553/v1/21e553473f9df60b0e696ba8.png"},{"id":102295679,"identity":"e8da87ea-d746-4718-bde5-579d4fa21f9b","added_by":"auto","created_at":"2026-02-10 10:13:49","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":3302162,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8406553/v1/fc0056ed-4b9e-45df-86cb-9c629aa59b75.pdf"},{"id":100408558,"identity":"39f056fc-8a3b-44e5-8da9-917c5e5baabc","added_by":"auto","created_at":"2026-01-16 13:06:21","extension":"xlsx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":8262354,"visible":true,"origin":"","legend":"Supplementary Table 1-20","description":"","filename":"SupplementaryTables1216.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-8406553/v1/13630e654a1ee1a2b38fbf41.xlsx"},{"id":100408237,"identity":"89d5206a-a9c0-4991-9826-b6fabcac076b","added_by":"auto","created_at":"2026-01-16 13:05:49","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":127525070,"visible":true,"origin":"","legend":"Supplementary Figure 1-14","description":"","filename":"SupplementaryFigures.charls.mgwas.20251118.docx","url":"https://assets-eu.researchsquare.com/files/rs-8406553/v1/7002d5d993dc6a14737428a1.docx"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing 
Interest.","formattedTitle":"Host genetic control of the oral microbiome and its links to human metabolism and immunity","fulltext":[{"header":"Introduction","content":"\u003cp\u003eThe importance of microbiota in human health and diseases has been increasingly highlighted as the sequencing technology developed\u003csup\u003e1–4\u003c/sup\u003e. While the gut microbiome has long been the central focus of related research, the oral cavity, as a highly dynamic microbial environment that harbors diverse microbial communities, is increasingly recognized for its impact on both local and systemic health\u003csup\u003e5\u003c/sup\u003e. The oral microbial community is not only closely associated with local oral diseases such as dental caries and periodontitis, but its dysbiosis can also influence the risk of cardiovascular diseases, diabetes, gastrointestinal tumors, and other conditions through pathways such as inducing systemic inflammation, releasing specific metabolites, and modulating host immune responses\u003csup\u003e6–9\u003c/sup\u003e. Despite the significant clinical implications of the oral microbiome, research into its genetic and environmental determinants remains insufficient, particularly in non-European populations. This knowledge gap limits our ability to translate microbiome-host interactions into precision medicine and underscores the urgent need for population-specific studies.\u003c/p\u003e\n\u003cp\u003eThe metagenome-genome-wide association study (M-GWAS) has emerged as a powerful tool for deciphering host genetic influences on the microbiome. In several large-scale M-GWAS studies, the genetic effects of \u003cem\u003eLCT\u0026nbsp;\u003c/em\u003eand \u003cem\u003eABO\u0026nbsp;\u003c/em\u003elocus variants on gut microbial abundance have been consistently replicated\u003csup\u003e10–12\u003c/sup\u003e. 
However, these studies primarily focused on fecal microbiomes while overlooking the oral cavity, an evolutionarily conserved site where host-microbe interactions occur more directly. Notably, a recent preprint using a large European cohort identified 11 significant host genetic signals affecting the salivary microbiota. This study mainly investigated populations of European ancestry\u003csup\u003e13,14\u003c/sup\u003e, neglecting the genetic and environmental diversity of non-European groups, which may obscure population-specific host-microbe interaction signals shaped by localized evolutionary pressures\u003csup\u003e15\u003c/sup\u003e, dietary habits\u003csup\u003e16\u003c/sup\u003e, and environmental exposures\u003csup\u003e17\u003c/sup\u003e. Our previous research based on the Chinese 4DSZ cohort identified three and two study-wide significant host genetic determinants of the tongue dorsum and salivary microbiota, respectively\u003csup\u003e18\u003c/sup\u003e, and demonstrated that host genetic factors explain more variation in the oral microbiome (tongue dorsum and saliva) than environmental factors, highlighting that host–microbe interactions extend beyond the gut to other body niches. Compared to the fluid saliva, which acts as a mixing reservoir, the tongue dorsum constitutes a more stable and nutrient-rich ecological niche with structured biofilms\u003csup\u003e19–21\u003c/sup\u003e. This enhanced temporal stability, as evidenced by the persistence of specific strain mixtures over extended periods\u003csup\u003e22\u003c/sup\u003e, makes the tongue microbiome a superior model for discerning the subtle effects of host genetics from transient environmental influences. Although our previous study identified several significant host genetic signals, it was limited to 2,984 younger individuals with a mean age of 30 years, resulting in insufficient population representativeness and detection power\u003csup\u003e18\u003c/sup\u003e.
Expanding M-GWAS analyses to larger, more geographically and age-diverse natural Chinese populations will be essential for identifying stable and robust associations between host genetics and oral microbiota.\u003c/p\u003e\n\u003cp\u003eHere, we conducted a large-scale tongue dorsum M-GWAS involving 11,380 individuals from the CHARLS cohort through whole genome sequencing and whole metagenomic data integration, and further incorporating our previous 4DSZ cohort (2,017 out of the 2,984 individuals with high-depth\u0026nbsp;whole genome and tongue dorsum metagenome sequencing data) to comprehensively investigate the host genetic determinants of the tongue dorsum microbiome. Additionally, our cohort includes basic questionnaires, blood test parameters, dental conditions, and disease-related phenotypic data. Through microbiome and host phenotypic GWAS, observational correlation analysis, and Mendelian randomization, we further explored the interactions and potential causal relationships between the tongue dorsum microbiota and host blood chemistry, dental, and disease phenotypes. This study not only reveals mechanisms of the host-microbe interaction but also provides data references for developing targeted microbial interventions and therapies.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e\u003cstrong\u003eHost genetic structure and age strongly influenced tongue dorsum microbiome composition in an extensive Chinese multi-omics dataset\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo systematically characterize the host genetic effects on the oral microbiome, we assembled a large-scale, high-quality multi-omics Chinese cohort (CHARLS-4DSZ) comprising 13,397 individuals, integrating an extensive dataset that included host whole-genome sequencing (WGS), tongue dorsum whole metagenome sequencing (WMS), blood biochemistry,\u0026nbsp;and phenotypic information. After strict sample selection and quality control (\u003cstrong\u003eSupplementary Fig. 
1\u003c/strong\u003e), the CHARLS-4DSZ cohort encompassed three distinct sub-datasets (\u003cstrong\u003eFig. 1a, Supplementary Table 1\u003c/strong\u003e): (1) the primary CHARLS discovery dataset (N = 8,331, mean age of 65 years; average depth of 20× for blood WGS and mean 17.60 ± 3.61 Gb for tongue dorsum WMS; \u003cstrong\u003eSupplementary Fig. 2a, b\u003c/strong\u003e); (2) the CHARLS replication dataset (N = 3,049, mean age of 65 years; mean 20.07 ± 5.13 Gb for tongue dorsum WMS, from which host reads at a mean depth of 4.2× were obtained by aligning sequencing reads to the human reference genome, \u003cstrong\u003eMethods\u003c/strong\u003e;\u003cstrong\u003e\u0026nbsp;Supplementary Fig. 2c, d\u003c/strong\u003e); and (3) the 4DSZ dataset collected in 2018 in our prior work (N = 2,017, mean age of 30 years; average depth of 33× for blood WGS and mean 19.18 ± 7.90 Gb for tongue dorsum WMS; \u003cstrong\u003eSupplementary Fig. 2e, f\u003c/strong\u003e). Prior to integrated analysis, we assessed the overall genetic structural similarity of the individuals across the CHARLS discovery, replication, and 4DSZ cohorts. Principal component analysis (PCA) on host genetics revealed no evident stratification of the three Chinese cohorts. The cohorts showed the typical Chinese population structure, with the first principal component (PC1) distinguishing northern from southern Chinese and the second principal component (PC2) distinguishing west-to-east populations, consistent with reported Chinese population studies\u003csup\u003e23,24\u003c/sup\u003e (\u003cstrong\u003eSupplementary Fig. 3a\u003c/strong\u003e). All three cohorts clustered within the East Asian genetic background. They were clearly separated from other populations, such as African, American, European, and South Asian, when compared with the 1000 Genomes dataset (\u003cstrong\u003eSupplementary Fig. 3b\u003c/strong\u003e), thereby minimizing confounding due to population stratification.
The principal coordinates analysis (PCoA) of microbial communities indicated a slight deviation of the replication cohort from the other two cohorts along PCoA1 and PCoA2, characterized by higher Streptococcus abundance and lower Prevotella abundance in the replication cohort (\u003cstrong\u003eSupplementary Fig. 4\u003c/strong\u003e,\u003cstrong\u003e\u0026nbsp;Supplementary Table 2\u003c/strong\u003e). \u0026nbsp;\u003c/p\u003e\n\u003cp\u003eWe next assessed the influence of host confounders, including age, sex, BMI, and the top ten host genetic principal components (PCs), on microbiome diversity using multivariable linear regression models with \u003cem\u003eBonferroni\u003c/em\u003e correction (\u003cstrong\u003eFig. 1b\u003c/strong\u003e,\u003cstrong\u003e\u0026nbsp;Supplementary Table 3\u003c/strong\u003e). Sex and host PC2 were significantly associated with the two microbial alpha-diversity indices, namely the Shannon and Simpson indices (\u003cem\u003eP\u003csub\u003eBonferroni\u003c/sub\u003e\u0026nbsp;\u003c/em\u003e\u0026lt; 1.00 × 10\u003csup\u003e−5\u003c/sup\u003e). Age emerged as a significant negative predictor of microbial Simpson diversity (β = −0.076, \u003cem\u003eP\u003csub\u003eBonferroni\u0026nbsp;\u003c/sub\u003e\u003c/em\u003e= 3.52 × 10\u003csup\u003e−16\u003c/sup\u003e), reinforcing an established trend of microbial richness decline with aging.
Notably, host genetic ancestry, captured by PC1 and PC2, showed the strongest associations with the top two microbial principal coordinates, PCoA1 (\u003cem\u003eP\u003csub\u003eBonferroni\u003c/sub\u003e\u0026nbsp;\u003c/em\u003e= 1.93\u0026nbsp;× 10\u003csup\u003e−11\u0026nbsp;\u003c/sup\u003efor PC1 and \u003cem\u003eP\u003csub\u003eBonferroni\u003c/sub\u003e\u0026nbsp;\u003c/em\u003e= 0.038 for PC2) and PCoA2 (\u003cem\u003eP\u003csub\u003eBonferroni\u003c/sub\u003e\u0026nbsp;\u003c/em\u003e= 5.35\u0026nbsp;× 10\u003csup\u003e−6\u0026nbsp;\u003c/sup\u003efor PC1 and \u003cem\u003eP\u003csub\u003eBonferroni\u003c/sub\u003e\u0026nbsp;\u003c/em\u003e= 7.81\u0026nbsp;× 10\u003csup\u003e−37\u003c/sup\u003e for PC2), which collectively explained over 40% of the variance of the total microbial composition, highlighting the role of genetic background in shaping microbial ecology. When estimating the effects of host potential confounders on beta diversity, age, sex, host PC1, and PC2 consistently contributed the most to the microbial composition (each explained variance R\u003csup\u003e2\u0026nbsp;\u003c/sup\u003e\u0026gt; 0.2%; \u003cem\u003eP\u0026nbsp;\u003c/em\u003e= 0.001 for 1,000 permutations in the permutational multivariate analysis of variance (PERMANOVA) test; \u003cstrong\u003eFig. 1b\u003c/strong\u003e, \u003cstrong\u003eSupplementary Table 4\u003c/strong\u003e).
These results underscore that host genetic structure, age, and sex significantly shape the oral microbiome in this Chinese cohort.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eHost genetic loci significantly associated with \u003c/strong\u003e\u003cstrong\u003emicrobial taxa and pathways\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWith this cohort, the largest to date with paired whole-genome and whole-metagenome data, we first performed M-GWAS analysis on 5.6 million common (minor allele frequency (MAF) \u0026gt; 0.05) genetic variants to test their association with 534 independent taxa and 311 bacterial pathways with prevalence \u0026gt; 10% in participants from the CHARLS discovery dataset (\u003cstrong\u003eSupplementary Tables 5 and 6\u003c/strong\u003e). M-GWAS was performed using linear regression for relative abundance traits and logistic regression for presence–absence traits of bacterial taxa or pathways, with adjustment for age, sex, BMI, and the top ten host genetic PCs (\u003cstrong\u003eMethods\u003c/strong\u003e). Next, we performed the same M-GWAS analysis on the two validation datasets: namely, the CHARLS replication and 4DSZ datasets. To ensure robustness of results, genome-wide significant results (\u003cem\u003eP\u003c/em\u003e \u0026lt; 5 × 10\u003csup\u003e−8\u003c/sup\u003e) from the discovery dataset were defined as replicated when supported by nominal significance (\u003cem\u003eP\u0026nbsp;\u003c/em\u003e\u0026lt; 0.05) and by a fully consistent effect direction for the same allele in at least one validation dataset.
Finally, per-dataset results were combined in a meta-analysis.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn the discovery stage, we identified 74 independent host genetic loci significantly associated with 66 microbial taxa and 28 functional pathways, meeting genome-wide significance (\u003cem\u003eP\u003c/em\u003e \u0026lt; 5 × 10\u003csup\u003e−8\u003c/sup\u003e; r\u003csup\u003e2\u0026nbsp;\u003c/sup\u003e\u0026lt; 0.1 in the ± 1 Mb flanking region; \u003cstrong\u003eSupplementary Table 7\u003c/strong\u003e).\u0026nbsp;Among these, the associations of nine genetic loci with 14 taxa and four pathways were well replicated (\u003cstrong\u003eFig. 2, Table 1\u003c/strong\u003e):\u0026nbsp;\u003cem\u003eFUT2\u003c/em\u003e,\u0026nbsp;\u003cem\u003ePOLI\u003c/em\u003e-\u003cem\u003eC18orf54\u003c/em\u003e,\u0026nbsp;\u003cem\u003eAMY1C\u003c/em\u003e-\u003cem\u003eAMY2B\u003c/em\u003e,\u0026nbsp;\u003cem\u003ePRB3\u003c/em\u003e-\u003cem\u003ePRB4\u003c/em\u003e,\u0026nbsp;\u003cem\u003eSLC2A9\u003c/em\u003e,\u0026nbsp;\u003cem\u003eHLA-DRA-DRB5\u003c/em\u003e,\u0026nbsp;\u003cem\u003eMRPS18A\u003c/em\u003e-\u003cem\u003eVEGFA\u003c/em\u003e\u003cem\u003e,\u0026nbsp;\u003c/em\u003e\u003cem\u003eZFR\u003c/em\u003e-\u003cem\u003eSUB1\u003c/em\u003e, and \u003cem\u003eTMCO3\u003c/em\u003e-\u003cem\u003eTFDP1\u003c/em\u003e\u003cem\u003e.\u0026nbsp;\u003c/em\u003eAfter applying a more conservative Bonferroni correction for the number of features tested (\u003cem\u003eP\u0026nbsp;\u003c/em\u003e\u0026lt; 9.36 × 10\u003csup\u003e−11\u003c/sup\u003e for 534 taxa and \u003cem\u003eP\u0026nbsp;\u003c/em\u003e\u0026lt; 1.60 × 10\u003csup\u003e−10\u003c/sup\u003e for 311 pathways), we identified seven study-wide significant genomic loci, namely \u003cem\u003eFUT2\u003c/em\u003e, 
\u003cem\u003ePOLI\u003c/em\u003e-\u003cem\u003eC18orf54\u003c/em\u003e,\u0026nbsp;\u003cem\u003eAMY1C\u003c/em\u003e-\u003cem\u003eAMY2B\u003c/em\u003e,\u0026nbsp;\u003cem\u003eSLC2A9\u003c/em\u003e,\u0026nbsp;\u003cem\u003eHLA\u003c/em\u003e-\u003cem\u003eDRA\u003c/em\u003e-\u003cem\u003eDRB5\u003c/em\u003e and \u003cem\u003eMGST1\u003c/em\u003e, significantly associated with 16 tongue dorsum microbial features. An additional locus in \u003cem\u003eMTMR6\u003c/em\u003e-\u003cem\u003eNUP58\u003c/em\u003e also reached the study-wide significance in the meta-analysis, although it did not reach the genome-wide significance in the discovery dataset (\u003cstrong\u003eTable 1\u003c/strong\u003e). Together, we discovered ten genome-wide significant and well-replicated loci, including eight study-wide significant loci, which were associated with 17 microbial taxa and eight pathways. Four loci showed pleiotropic effects and were related to multiple taxa and pathways: \u003cem\u003eFUT2\u003c/em\u003e, \u003cem\u003ePOLI\u003c/em\u003e\u003cem\u003e–C18orf54\u003c/em\u003e\u003cem\u003e, SLC2A9,\u0026nbsp;\u003c/em\u003eand\u0026nbsp;\u003cem\u003eMTMR6\u003c/em\u003e. There was no evidence of any excess false positive rate in the GWAS analyses (genomic inflation factors λ\u003csub\u003eGC\u003c/sub\u003e ranged from 0.981 to 1.045 with a median of 1.01; \u003cstrong\u003eSupplementary Fig. 5\u003c/strong\u003e). 
All genome-wide significant associations identified in the discovery cohort, together with their replication and meta-analysis results, are listed in\u0026nbsp;\u003cstrong\u003eSupplementary Table 8\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003eThe strongest signal was observed for the missense mutation \u003cem\u003eFUT2\u003c/em\u003e rs1047781 (A\u0026gt;T, p.I140F, resulting in an Ile140Phe amino acid substitution), an East Asian-specific common variant (MAF = 0.439) that determines ABO antigen secretor status\u003csup\u003e25\u003c/sup\u003e. This variant showed significant associations with multiple tongue dorsum taxa and pathways (\u003cstrong\u003eFig. 3\u003c/strong\u003e), including the presence/absence status of three tongue dorsum species, namely \u003cem\u003eHaemophilus\u0026nbsp;\u003c/em\u003e\u003cem\u003esputorum\u003c/em\u003e (\u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u0026nbsp;\u003c/sub\u003e= 9.71 × 10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e51\u003c/sup\u003e), \u003cem\u003eGranulicatella SGB8239\u003c/em\u003e (\u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u0026nbsp;\u003c/sub\u003e= 3.80\u0026nbsp;×\u0026nbsp;10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e21\u003c/sup\u003e), and \u003cem\u003eVeillonella SGB6928\u0026nbsp;\u003c/em\u003e(\u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u0026nbsp;\u003c/sub\u003e= 2.64\u0026nbsp;×\u0026nbsp;10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e18\u003c/sup\u003e), as well as the relative abundance of the three microbial species (\u003cstrong\u003eFig. 3a,b\u003c/strong\u003e). The A-allele of rs1047781 defines secretor status, with AA/TA genotypes representing secretors and TT genotypes indicating weak or non-secretors.
Compared to non-secretor individuals, the secretor individuals exhibited a significantly higher average prevalence\u0026nbsp;(\u003cem\u003eP\u003c/em\u003e = 1.1\u0026nbsp;×\u0026nbsp;10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e23\u003c/sup\u003e ~ 7.6\u0026nbsp;×\u0026nbsp;10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e62\u003c/sup\u003e,\u003cstrong\u003e\u0026nbsp;Fig. 3c\u003c/strong\u003e)\u0026nbsp;and relative abundances of these three taxa (\u003cem\u003eP\u003c/em\u003e= 5.2\u0026nbsp;×\u0026nbsp;10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e4\u003c/sup\u003e ~ 7.9×10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e12\u003c/sup\u003e,\u003cstrong\u003e\u0026nbsp;Fig. 3d)\u003c/strong\u003e. Because \u003cem\u003eFUT2\u003c/em\u003e I140F determines the secretor status of \u003cem\u003eABO\u003c/em\u003e blood group antigens, we next examined the potential interactions between \u003cem\u003eFUT2\u003c/em\u003e and \u003cem\u003eABO\u003c/em\u003e on these three microbial taxa. We inferred blood groups according to the genotypes of three genetic variants in the East Asian population (Methods),\u0026nbsp;yielding the following blood type distribution: O (33.5%), B (29.3%), A (28.6%), and AB (8.6%). No significant differences in the prevalence or abundances of these three bacteria were observed among ABO blood groups (O, A, B, and AB; \u003cstrong\u003eFig. 3e, f\u003c/strong\u003e).\u0026nbsp;Notably, regardless of ABO blood groups, \u003cem\u003eFUT2\u003c/em\u003e-determined secretor individuals consistently showed higher bacterial prevalence and abundance across all three bacterial species than non-secretors (\u003cstrong\u003eFig. 3g, h\u003c/strong\u003e). 
In contrast to previous gut microbiome studies reporting an interaction between the \u003cem\u003eABO\u003c/em\u003e and \u003cem\u003eFUT2\u0026nbsp;\u003c/em\u003egenotypes\u003csup\u003e12,26\u003c/sup\u003e, our study indicated that \u003cem\u003eFUT2\u003c/em\u003e exerted a dominant and ABO-independent effect on the tongue dorsum microbiome.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn addition to the \u003cem\u003eFUT2\u003c/em\u003e locus, the other identified microbiome-associated loci (MAL) were not randomly distributed but were significantly enriched in genes involved in key biological pathways central to host-microbiome interaction (\u003cstrong\u003eSupplementary Fig. 6\u003c/strong\u003e):\u003c/p\u003e\n\u003col\u003e\n \u003cli\u003e\u003cstrong\u003eImmune recognition and antigen presentation.\u003c/strong\u003e For instance, \u003cem\u003eFUT2\u0026nbsp;\u003c/em\u003edetermines mucosal antigen synthesis and glycosylation patterns that are known to regulate bacterial adhesion and gut microbiome composition\u003csup\u003e27,28\u003c/sup\u003e; \u003cem\u003eHLA-DRA\u003c/em\u003e-\u003cem\u003eDRB5\u003c/em\u003e, which participates in host adaptive immunity through MHC-II-mediated antigen presentation\u003csup\u003e29\u003c/sup\u003e, was associated with an unknown \u003cem\u003eSGB2208\u0026nbsp;\u003c/em\u003efrom the phylum Bacteroidetes (\u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u0026nbsp;\u003c/sub\u003e= 2.13 × 10\u003csup\u003e−51\u003c/sup\u003e);\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eNutrient metabolism and digestion.\u003c/strong\u003e For example, \u003cem\u003eAMY1\u003c/em\u003e, encoding the salivary amylase that promotes dietary starch digestion and directly influences nutrient availability for carbohydrate-metabolizing bacteria\u003csup\u003e30,31\u003c/sup\u003e, was associated with the abundance of the genus 
\u003cem\u003eStomatobaculum\u0026nbsp;\u003c/em\u003e(\u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u0026nbsp;\u003c/sub\u003e= 2.22 × 10\u003csup\u003e−10\u003c/sup\u003e), and \u003cem\u003eStomatobaculum\u003c/em\u003e was previously reported to be more abundant in cluster ASV1 (highest sucrose intake) than in cluster ASV2 (lowest sucrose intake)\u003csup\u003e32\u003c/sup\u003e. \u003cem\u003ePRB\u003c/em\u003e, encoding the basic salivary proline-rich proteins that modulate oral lubrication and bacterial aggregation\u003csup\u003e33\u003c/sup\u003e, was associated with the abundance of\u003cem\u003e\u0026nbsp;Streptococcus infantis\u003c/em\u003e (\u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u0026nbsp;\u003c/sub\u003e=\u0026nbsp;1.08\u0026nbsp;× 10\u003csup\u003e−12\u003c/sup\u003e); \u003cem\u003eSLC2A9,\u0026nbsp;\u003c/em\u003eencoding the urate transporter that affects serum uric acid levels, was associated with multiple microbial features, most strongly with \u003cem\u003eLachnoanaerobaculum\u0026nbsp;\u003c/em\u003e(\u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u0026nbsp;\u003c/sub\u003e=\u0026nbsp;3.46\u0026nbsp;× 10\u003csup\u003e−15\u003c/sup\u003e);\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eCellular housekeeping, signaling, and transport mechanisms\u003c/strong\u003e. 
For example, \u003cem\u003ePOLI\u003c/em\u003e-\u003cem\u003eC18orf54\u0026nbsp;\u003c/em\u003elocus, involved in DNA repair and genomic maintenance, was associated with the presence/absence status of \u003cem\u003eHaemophilus parahaemolyticus\u0026nbsp;\u003c/em\u003e(\u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u0026nbsp;\u003c/sub\u003e= 4.47 × 10\u003csup\u003e−19\u003c/sup\u003e); \u003cem\u003eMRPS18A\u003c/em\u003e–\u003cem\u003eVEGFA\u003c/em\u003e, linking mitochondrial ribosomal function to angiogenesis and tissue microenvironment regulation\u003csup\u003e34\u003c/sup\u003e, was associated with a Proteobacteria SGB19290 (\u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u0026nbsp;\u003c/sub\u003e=\u0026nbsp;5.36\u0026nbsp;× 10\u003csup\u003e−11\u003c/sup\u003e); both \u003cem\u003eMTMR6\u0026nbsp;\u003c/em\u003eand \u003cem\u003eMTMR12\u0026nbsp;\u003c/em\u003eloci, involved in the phosphoinositide signaling and nucleocytoplasmic transport that maintain the normal cell function and may regulate cell immune response\u003csup\u003e35,36\u003c/sup\u003e, were associated with multiple microbial traits: \u003cem\u003eMTMR12\u0026nbsp;\u003c/em\u003ewith Bacteroidetes CFGB570 (\u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u0026nbsp;\u003c/sub\u003e=\u0026nbsp;2.13\u0026nbsp;× 10\u003csup\u003e−51\u003c/sup\u003e) and \u003cem\u003eMTMR6\u003c/em\u003e with microbial pathways (most significant for PWY-2941: L-lysine biosynthesis II;\u0026nbsp;\u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u0026nbsp;\u003c/sub\u003e=\u0026nbsp;2.82\u0026nbsp;× 10\u003csup\u003e−11\u003c/sup\u003e), respectively.\u0026nbsp;\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eTogether, this robust functional clustering, particularly the enrichment in immune-related pathways and metabolic processes, suggests that these genetic variants exert their influence by modulating the host physiological landscape, including alterations in nutrient availability, immune surveillance, and 
cellular homeostasis, thereby defining the ecological niche of resident microbiota.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn addition to microbial taxa and pathways, we also tested the associations between host genetics and microbial diversity. No genome-wide significant associations were detected for microbial alpha diversity (Shannon and Simpson index). Five independent loci, \u003cem\u003eLINC01739-LINC00466\u003c/em\u003e (PCoA1), \u003cem\u003eBMP2-LINC01428\u003c/em\u003e (PCoA2), \u003cem\u003eHDAC9\u003c/em\u003e (PCoA7), \u003cem\u003eCLDN10\u003c/em\u003e (PCoA9), and \u003cem\u003eFNDC3B\u003c/em\u003e (PCoA10), showed significant associations with at least one of the microbial top ten PCoAs (\u003cem\u003eP\u003c/em\u003e \u0026lt; 5 × 10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e8\u003c/sup\u003e, \u003cstrong\u003eSupplementary Table 9\u003c/strong\u003e,\u003cstrong\u003e\u0026nbsp;Supplementary Fig. 7\u003c/strong\u003e). The associated genes may play core roles in epithelial barrier integrity (\u003cem\u003eCLDN10\u003c/em\u003e tight junctions, \u003cem\u003eFNDC3B\u003c/em\u003e cell adhesion), chromatin remodeling (\u003cem\u003eHDAC9\u003c/em\u003e), and skeletal or cartilage formation (\u003cem\u003eBMP2\u003c/em\u003e).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMAL linking to microbial gene families helps understand host–microbiome interactions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo further decipher the microbial gene functions involved in host genotype-microbiome interactions, we next performed a gene-family-level M-GWAS by testing associations between the 74 identified genome-wide significant MAL and ~700,000 microbial gene families, along with their contributing bacterial species. 
This analysis identified three loci involving 8,619 associations with 1,783 gene families, surpassing study-wide significance in the discovery dataset and replicated in at least one replication dataset (\u003cem\u003eP\u003c/em\u003e\u0026lt;\u0026nbsp;7.0\u0026nbsp;× 10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e14\u003c/sup\u003e; \u003cstrong\u003eFig. 4a, Supplementary Table 10\u003c/strong\u003e).\u003c/p\u003e\n\u003cp\u003eNotably, 98.3% (8,474/8,619) of the associations involved the \u003cem\u003eFUT2\u0026nbsp;\u003c/em\u003elocus, which was linked to 1,773 unique gene families, with the main contributing bacteria belonging to the genera \u003cem\u003eHaemophilus\u0026nbsp;\u003c/em\u003eand \u003cem\u003eStreptococcus.\u0026nbsp;\u003c/em\u003eThe most significant gene families were involved in H-antigen transport and fucose metabolism, suggesting that the \u003cem\u003eFUT2\u003c/em\u003e locus shapes the functional capacity of oral microbes to utilize host-secreted antigen through the uptake, breakdown, and metabolic conversion of liberated fucose. This functional reshaping was further substantiated by our analysis of species-stratified pathway abundances, which revealed that the \u003cem\u003eFUT2\u003c/em\u003e locus was very strongly associated with multiple pathways contributed by specific bacteria (\u003cstrong\u003eSupplementary Fig. 8\u003c/strong\u003e). Specifically, \u003cem\u003eH. sputorum\u003c/em\u003e-contributed pathways, including fucose degradation and L-isoleucine biosynthesis, were strongly linked to \u003cem\u003eFUT2\u003c/em\u003e, providing direct evidence at the pathway level that bacteria adapt their metabolic repertoire to utilize host-derived fucose. Thus, based on the M-GWAS with taxa, pathways, and gene families, we propose the following mechanistic pathway modulated by host \u003cem\u003eFUT2\u003c/em\u003e (\u003cstrong\u003eFig. 
4b, c\u003c/strong\u003e):\u0026nbsp;\u003c/p\u003e\n\u003col\u003e\n \u003cli\u003e\u003cstrong\u003eHost genotype dictates mucosal nutrient landscape\u003c/strong\u003e. At the \u003cem\u003eFUT2\u003c/em\u003e I140F locus (rs1047781, A\u0026gt;T), individuals carrying at least one functional A allele (AA or AT) are secretors and exhibit robust expression of α-1,2-linked fucose (the H-antigen) on mucosal glycans, thereby imposing a powerful selective pressure on the tongue dorsum microbiome.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eBacteria adapt to utilize host-derived fucose\u003c/strong\u003e. The mucosal H-antigen served as a primary nutrient source for the enrichment of specific bacterial gene families essential for harvesting and consuming host-derived fucose (\u003cstrong\u003eSupplementary Fig. 9\u003c/strong\u003e): (i) Recognition, binding and import: Bacteria utilize its extracellular solute-binding proteins (A0A0B7M2T5 annotated from the UniRef90 database; \u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u003c/sub\u003e = 2.36 × 10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e76\u003c/sup\u003e) , ABC transporter permeases (e.g., A0A0T7SSD5; \u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u003c/sub\u003e = 3.39 × 10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e77\u003c/sup\u003e), and ABC transporter substrate-binding protein (F9HPT4; \u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u003c/sub\u003e = 2.49 × 10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e76\u003c/sup\u003e) to recognize, bind and import the H-antigen from the mucosal environment; (ii) Cleavage: the alpha-L-fucosidase (F9HM02; \u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u003c/sub\u003e = 2.49 × 10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e72\u003c/sup\u003e primarily contributed by \u003cem\u003eStreptococcus mitis\u003c/em\u003e) removes terminal α-L-fucosyl residues from fucosylated glycans (e.g., H antigen); and (iii) Metabolism: downstream metabolic 
enzymes, including L-fucose-proton symporter (P44776; \u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u003c/sub\u003e = 3.77 × 10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e37\u003c/sup\u003e) and L-fucose isomerase (B8F6Y0; \u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u003c/sub\u003e = 1.53 × 10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e34\u003c/sup\u003e) from \u003cem\u003eHaemophilus sputorum\u003c/em\u003e, as well as L-fucose isomerases (Q97N97) from \u003cem\u003eS. mitis\u003c/em\u003e (\u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u003c/sub\u003e = 5.54 × 10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e39\u003c/sup\u003e), \u003cem\u003eS. pneumoniae\u003c/em\u003e (\u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u003c/sub\u003e = 4.30 × 10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e23\u003c/sup\u003e), and \u003cem\u003eS. pseudopneumoniae\u003c/em\u003e (\u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u003c/sub\u003e = 2.51 × 10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e26\u003c/sup\u003e), convert L-fucose into L-fuculose and support its entry into central metabolism. Thus, the host \u003cem\u003eFUT2\u003c/em\u003e missense variant determines the availability of fucosylated H-antigens in the oral ecosystem and drives microbial adaptation to the host's fucose landscape, highlighting a key mechanism of host-microbe co-metabolism.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eIn addition to \u003cem\u003eFUT2\u003c/em\u003e, two other genetic loci were associated with microbial gene family abundance at study-wide significance (\u003cstrong\u003eFig. 4a, Supplementary Table 10\u003c/strong\u003e).\u0026nbsp;These two loci were also significantly linked to the species-stratified pathways (\u003cstrong\u003eSupplementary Fig. 
8\u003c/strong\u003e).\u0026nbsp;On chromosome 18, the\u0026nbsp;\u003cem\u003ePOLI-\u003c/em\u003e\u003cem\u003eC18orf54\u003c/em\u003e locus\u0026nbsp;was associated with various functional proteins\u0026nbsp;of\u0026nbsp;\u003cem\u003eS. mitis\u0026nbsp;\u003c/em\u003eand \u003cem\u003eH. parahaemolyticus\u003c/em\u003e, including exo-alpha-sialidase, bacteriophage proteins, anaerobic C4-dicarboxylate transporter DcuB, and site-specific DNA-methyltransferases\u0026nbsp;(\u003cem\u003eP\u003c/em\u003e\u0026lt;\u0026nbsp;7.3 × 10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e14\u003c/sup\u003e).\u0026nbsp;It also showed study-wide significant correlations with three pathways contributed by \u003cem\u003eH. parahaemolyticus,\u003c/em\u003e including ANAGLYCOLYSIS-PWY: glycolysis III (from glucose), PWY-1042: glycolysis IV, and COA-PWY-1: superpathway of coenzyme A biosynthesis III (mammals).\u0026nbsp;On chromosome 4, the \u003cem\u003eSLC2A9\u003c/em\u003e locus\u0026nbsp;was significantly associated with multiple gene families\u0026nbsp;of\u0026nbsp;\u003cem\u003eLeptotrichia\u0026nbsp;\u003c/em\u003esp. oral taxon 212, including transporters, DNA polymerase III subunit alpha, DUF3290 domain-containing proteins, beta-eliminating lyases, and UDP-galactopyranose mutase\u0026nbsp;(\u003cem\u003eP\u0026nbsp;\u003c/em\u003e\u0026lt; 7.3 × 10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e14\u003c/sup\u003e). Additionally, the \u003cem\u003eSLC2A9\u003c/em\u003e locus was significantly associated with three pathways contributed by \u003cem\u003eLeptotrichia\u0026nbsp;\u003c/em\u003esp. oral taxon 212, including HSERMETANA-PWY: L-methionine biosynthesis III, PEPTIDOGLYCANSYN-PWY: peptidoglycan biosynthesis I (meso-diaminopimelate containing), and PWY-3841: folate transformations II (plants). 
Collectively, our findings delineate a multi-layered architectural blueprint whereby human genetic variation orchestrates the\u0026nbsp;oral\u0026nbsp;microbiome.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMAL enriched for host metabolism and immune by eQTL and PheWAS analysis\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo further explore the potential gene functions of the identified MAL, we performed functional mapping and annotation of genetic associations through mapping expression quantitative trait loci (eQTLs) and phenome-wide association studies (PheWAS). Through the colocalization of MAL with eQTLs information from the Genotype-Tissue Expression (GTEx) database, spanning 49 tissue types\u003csup\u003e37\u003c/sup\u003e, 39% (29/74) of MAL were associated with tissue-specific gene expression in 136 genes (\u003cstrong\u003eSupplementary Table 11\u003c/strong\u003e). For example, the top two MAL showing the strongest links to microbial features were associated with expression of \u003cem\u003eFUT2\u003c/em\u003e and \u003cem\u003ePOLI\u003c/em\u003e, respectively, across 12 tissues, particularly for digestive tract tissues such as esophagus mucosa, pancreas, stomach, colon transverse, small intestine terminal ileum, and minor salivary gland (\u003cstrong\u003eSupplementary Fig. 10\u003c/strong\u003e). The MAL linked to \u003cem\u003eLachnoanaerobaculum\u003c/em\u003e mainly regulated the expression of \u003cem\u003eSLC2A9\u003c/em\u003e in 18 tissues. The MAL linked to \u003cem\u003eStomatobaculum\u003c/em\u003e mainly regulated the expression of \u003cem\u003eAMY2B\u003c/em\u003e in the brain putamen and basal ganglia. The MAL linked to \u003cem\u003eStreptococcus infantis\u003c/em\u003e mainly regulated \u003cem\u003ePRH1\u003c/em\u003e expression across 18 tissues. 
The MAL linked to \u003cem\u003eBacteroidetes SGB2208\u003c/em\u003e mainly regulated the expression of \u003cem\u003eHLA-DQB1\u003c/em\u003e, -\u003cem\u003eDRB5\u003c/em\u003e, and -\u003cem\u003eDQA1\u003c/em\u003e in over 30 tissues. The MAL linked to \u003cem\u003eCapnocytophaga\u0026nbsp;\u003c/em\u003esp. oral taxon 863 was correlated with \u003cem\u003eTMCO3\u003c/em\u003e expression in muscle and skeletal tissues.\u003c/p\u003e\n\u003cp\u003eFurther, we colocalized these MAL with single-cell-specific eQTLs and chromatin accessibility quantitative trait loci (caQTLs) from the Chinese Immune Multi-Omics Atlas (CIMA) study\u003csup\u003e38\u003c/sup\u003e. We observed that 22% (16/74) of MAL were associated with single-cell-specific eQTLs or caQTLs (\u003cstrong\u003eSupplementary Table 12\u003c/strong\u003e). The MAL linked to\u0026nbsp;\u003cem\u003eH. parahaemolyticus\u003c/em\u003e showed concordant associations with both \u003cem\u003ePOLI\u003c/em\u003e expression and chromatin accessibility\u0026nbsp;in all immune cell types, with the strongest signal in CD4 cells (\u003cstrong\u003eSupplementary Fig. 11\u003c/strong\u003e). The MAL linked to \u003cem\u003eLachnoanaerobaculum\u003c/em\u003e primarily regulated \u003cem\u003eSLC2A9\u003c/em\u003e expression in CD4 cells. The MAL linked to \u003cem\u003eSolobacterium\u003c/em\u003e SGB6829 were associated with \u003cem\u003eAGAP1\u003c/em\u003e expression in CD4 cells. 
Collectively, these findings indicate that MAL may exert their effects on the tongue dorsum microbiome by regulating gene activation and the differentiation of blood immune cells.\u003c/p\u003e\n\u003cp\u003eNext, PheWAS analysis was performed by examining 74 genome-wide significant loci in the summary statistics of traits from the GWAS catalog\u003csup\u003e39\u003c/sup\u003e, Biobank Japan (BBJ)\u0026nbsp;\u003csup\u003e40\u003c/sup\u003e, and the current study, including 24 blood metabolic traits, 16 diseases, and four dental conditions (\u003cstrong\u003eFig. 5a\u003c/strong\u003e). Six MAL, including five of the ten replicated MAL, were linked to one or more metabolic/immune traits at \u003cem\u003eP\u003c/em\u003e \u0026lt; 5 × 10\u003csup\u003e−8\u003c/sup\u003e (\u003cstrong\u003eSupplementary Fig. 12, Supplementary Table 13\u003c/strong\u003e): \u003cem\u003eAMY1C\u003c/em\u003e linked to alpha-amylase 1 and amylase measurements; \u003cem\u003eSLC2A9\u003c/em\u003e linked to multiple serum metabolites, including urate measurement and uric acid; \u003cem\u003eHLA\u003c/em\u003e loci linked to many immune-related traits/diseases such as rheumatoid arthritis and autoimmune disease; \u003cem\u003ePDE2A\u003c/em\u003e loci linked to blood total protein, non-albumin protein, and insomnia; \u003cem\u003ePOLI\u003c/em\u003e loci linked to cortical thickness, neuroticism measurement, and white blood cell counts; \u003cem\u003eFUT2\u003c/em\u003e loci linked to multiple blood metabolic/immune indices, such as cancer biomarker measurement, alkaline phosphatase, serum carcinoembryonic antigen, vitamin B12, and serum alanine aminotransferase.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe locus showing the strongest association in the PheWAS analysis was SLC2A9, which was significantly associated with urate measurement (\u003cem\u003eP\u003c/em\u003e = 9.0 × 10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e3353\u003c/sup\u003e) and blood uric acid 
(\u003cem\u003eP\u003c/em\u003e = 4.0 × 10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e496\u003c/sup\u003e) in a meta-analysis of GWAS studies, as well as in the BBJ and our study. This is consistent with the known function of \u003cem\u003eSLC2A9\u0026nbsp;\u003c/em\u003eas a uric acid transporter gene. The minor allele C of the index SNP \u003cem\u003eSLC2A9\u0026nbsp;\u003c/em\u003ers3796835 was associated with a lower serum uric acid level in both the CHARLS cohort (\u003cem\u003eP\u0026nbsp;\u003c/em\u003e= 4.1 × 10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e19\u003c/sup\u003e; \u003cstrong\u003eFig. 5b\u003c/strong\u003e) and the 4DSZ cohort (\u003cem\u003eP\u0026nbsp;\u003c/em\u003e= 6.7\u0026nbsp;× 10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e6\u003c/sup\u003e), regardless of age effects. M-GWAS analysis showed that this SNP was most strongly associated with the relative abundance of the genus \u003cem\u003eLachnoanaerobaculum\u003c/em\u003e in both the CHARLS cohort (\u003cem\u003eP\u0026nbsp;\u003c/em\u003e= 1.3\u0026nbsp;×\u0026nbsp;10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e13\u003c/sup\u003e; \u003cstrong\u003eFig. 5c\u003c/strong\u003e) and the 4DSZ cohort (\u003cem\u003eP\u0026nbsp;\u003c/em\u003e= 4.8\u0026nbsp;× 10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e5\u003c/sup\u003e), regardless of age effects. Similarly, these \u003cem\u003eSLC2A9\u003c/em\u003e-associated microbial taxa, such as \u003cem\u003eLachnoanaerobaculum\u003c/em\u003e, exhibited a positive correlation with serum uric acid in both the CHARLS cohort (Spearman r = 0.05, \u003cem\u003eP\u0026nbsp;\u003c/em\u003e= 6.9\u0026nbsp;×\u0026nbsp;10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e7\u003c/sup\u003e; \u003cstrong\u003eFig. 5d\u003c/strong\u003e) and the 4DSZ cohort (r = 0.07, \u003cem\u003eP\u0026nbsp;\u003c/em\u003e= 2.4\u0026nbsp;× 10\u003csup\u003e−\u003c/sup\u003e\u003csup\u003e4\u003c/sup\u003e). 
These findings suggest a serum uric acid-mediated mechanism by which \u003cem\u003eSLC2A9\u003c/em\u003e acts on specific oral bacteria (\u003cstrong\u003eFig. 5e\u003c/strong\u003e): the \u003cem\u003eSLC2A9\u003c/em\u003e rs3796835-T allele is linked to an increase in serum uric acid levels, and the elevated serum uric acid environment selectively promotes the proliferation and colonization of bacteria with uric acid degradation capabilities, such as \u003cem\u003eLachnoanaerobaculum\u0026nbsp;\u003c/em\u003esp. ICM7. The molecular basis of this adaptive change in the bacterial community is that these bacteria carry complete uric acid degradation gene clusters (including key genes such as \u003cem\u003eygeX\u003c/em\u003e, \u003cem\u003eygeY\u003c/em\u003e, \u003cem\u003eygeW\u003c/em\u003e, \u003cem\u003essnA\u003c/em\u003e, and \u003cem\u003eygfK\u003c/em\u003e), which allow them to utilize uric acid efficiently as a nutrient substrate and thus gain a competitive advantage in hosts with high uric acid\u003csup\u003e41\u003c/sup\u003e. \u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn addition to the “\u003cem\u003eSLC2A9-\u003c/em\u003eserum uric acid\u003cem\u003e-Lachnoanaerobaculum\u003c/em\u003e” interactive axis, MAL were also involved in other immunometabolic traits. For example, another SNP, rs12954177 near \u003cem\u003ePOLI\u003c/em\u003e, regulated the expression of \u003cem\u003ePOLI\u003c/em\u003e in 15 tissues, such as esophagus mucosa (\u003cstrong\u003eSupplementary Table 11,\u0026nbsp;\u003c/strong\u003e\u003cem\u003eP\u0026nbsp;\u003c/em\u003e= 2.50 × 10\u003csup\u003e−69\u003c/sup\u003e) and whole blood (\u003cem\u003eP\u0026nbsp;\u003c/em\u003e= 2.20 × 10\u003csup\u003e−43\u003c/sup\u003e) from the GTEx, and was also an eQTL and caQTL for \u003cem\u003ePOLI\u003c/em\u003e in CD4 and CD8 cells (\u003cstrong\u003eSupplementary Table 12\u003c/strong\u003e). 
This SNP was associated with white blood cell count in the BBJ cohort (\u003cem\u003eP\u0026nbsp;\u003c/em\u003e= 6.20 × 10⁻\u003csup\u003e7\u003c/sup\u003e). It exhibited the strongest association with the presence of \u003cem\u003eHaemophilus parahaemolyticus\u003c/em\u003e in M-GWAS analysis of both the CHARLS discovery and replication cohorts (\u003cstrong\u003eFig. 6a, b\u003c/strong\u003e). Furthermore, this SNP was significantly associated with the relative abundance of \u003cem\u003eH. parahaemolyticus\u003c/em\u003e, with the associations consistent across independent datasets (\u003cstrong\u003eFig. 6c, d\u003c/strong\u003e). Notably, individuals carrying \u003cem\u003eH. parahaemolyticus\u0026nbsp;\u003c/em\u003eshowed significantly elevated white blood cell count (WBC; β = –0.066, \u003cem\u003eP\u003c/em\u003e = 5.46 × 10\u003csup\u003e–9\u003c/sup\u003e), triglycerides (TG; β = –0.061, \u003cem\u003eP\u003c/em\u003e = 2.95 × 10⁻⁸), and hemoglobin (HGB; β = –0.055, \u003cem\u003eP\u003c/em\u003e = 1.02 × 10⁻\u003csup\u003e6\u003c/sup\u003e) concentrations, when compared to those without the bacterium (\u003cstrong\u003eFig. 6e, f\u003c/strong\u003e). These results implicate the critical role of the \u003cem\u003ePOLI\u003c/em\u003e locus in interactions between oral microbes and host immunity/metabolism.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCausal links between tongue dorsum microbiota and host metabolism\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo further comprehensively reveal the links and causalities between the microbiome and host phenotypes, we performed observational correlation and bidirectional MR analyses for 94 microbial features associated with at least one variant at genome-wide significance in our M-GWAS, and 43 host traits representing host metabolism, dental conditions, and diseases. 
The observational correlation analysis identified a total of 239 significant associations after \u003cem\u003eBonferroni\u0026nbsp;\u003c/em\u003ecorrection (\u003cem\u003eP\u0026nbsp;\u003c/em\u003e\u0026lt; 1.22 × 10\u003csup\u003e−5\u003c/sup\u003e; \u003cstrong\u003eFig. 7a\u003c/strong\u003e, \u003cstrong\u003eSupplementary Table 14\u003c/strong\u003e), using multivariate linear regression with adjustment for gender, age, BMI, and the top ten host PCs in 8,331 CHARLS samples with multi-omics data and complete data for all 43 phenotypic traits. Dentures (n = 42), blood urea nitrogen (n = 25), tooth loss (n = 21), blood gamma-glutamyl transferase (n = 19), creatine (n = 18), and blood glucose (n = 15) were among the host traits associated with the largest number of microbial features (\u003cstrong\u003eSupplementary Fig. 13\u003c/strong\u003e). The class Betaproteobacteria, SGB1469,\u003cem\u003e\u0026nbsp;Haemophilus sputorum,\u0026nbsp;\u003c/em\u003eand \u003cem\u003eNeisseria subflava\u0026nbsp;\u003c/em\u003ewere linked to more than eight blood traits. These results further extend prior findings and suggest quantitative relationships among oral microbial taxa/functions, dental conditions, and plasma metabolites.\u003c/p\u003e\n\u003cp\u003eLeveraging the availability of comprehensive phenotypic data in 8,331 individuals, we first performed one-sample MR analyses to infer causal relationships for the 239 observationally significant associations. The selected instrumental variables were robust, with mean \u003cem\u003eF\u003c/em\u003e statistics of 289 for microbial features and 186 for host traits, explaining on average 3.5% and 2.3% of the variance for microbial features and host traits, respectively (\u003cstrong\u003eSupplementary Fig. 14\u003c/strong\u003e). 
We observed seven significant causal effects in the direction from microbiome to phenotypes after multiple test correction (\u003cem\u003eP\u0026nbsp;\u003c/em\u003e\u0026lt; 2.09\u0026nbsp;× 10\u003csup\u003e−4\u003c/sup\u003e = 0.05/239, \u003cstrong\u003eFig. 7b\u003c/strong\u003e, \u003cstrong\u003eSupplementary Table 15\u003c/strong\u003e). Moreover, to increase statistical power and robustness, we also used a two-sample MR method to analyze summary data from 8,331 samples with microbial features and 15,459 samples with host traits. The seven causal associations identified by one-sample MR were also significant in the two-sample MR analyses (\u003cem\u003eP\u003c/em\u003e = 8.5\u0026nbsp;× 10\u003csup\u003e−3\u003c/sup\u003e ~ 1.6\u0026nbsp;× 10\u003csup\u003e−4\u003c/sup\u003e, \u003cstrong\u003eSupplementary Table 16\u003c/strong\u003e). The two-sample MR analyses identified four additional causal associations: two showed that tongue dorsum microbial features have causal effects on blood metabolic traits, and the other two showed that blood metabolic traits have causal effects on tongue dorsum microbial features. These eleven causal relationships were confirmed by one of the one- and two-sample MR analyses and replicated by the other (\u003cstrong\u003eFig. 7b\u003c/strong\u003e).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe eleven inferred causal relationships revealed several key associations involved in hepatic and renal\u0026nbsp;health. Two microbial metabolic pathways,\u0026nbsp;L-histidine biosynthesis (HISTSYN-PWY)\u0026nbsp;and the\u0026nbsp;superpathway of L-tyrosine biosynthesis (PWY-6630), were causally associated with decreased blood gamma-glutamyl transferase (GGT) levels (β = –0.232 and β = –0.213, respectively). 
This protective effect is consistent with recent evidence that sulfur-containing histidine derivatives (thiohistidines) can directly inhibit GGT activity\u003csup\u003e42\u003c/sup\u003e.\u0026nbsp;Notably, we identified \u003cem\u003eNeisseria subflava\u0026nbsp;\u003c/em\u003eas a key contributing member of both these pathways (Pearson r = 0.88 for HISTSYN-PWY; r = 0.86 for PWY-6630) and observed a nominally significant causal effect of this species on lowering GGT (β =\u0026nbsp;–0.111). \u0026nbsp;In contrast, the oral taxon \u003cem\u003eRothia mucilaginosa\u0026nbsp;\u003c/em\u003eexhibited a causal link to elevated levels of both blood GGT (β = 0.165, \u003cem\u003eP\u003c/em\u003e = 1.46 × 10\u003csup\u003e−4\u003c/sup\u003e) and alanine aminotransferase (ALT) (β = 0.163, \u003cem\u003eP\u003c/em\u003e = 2.02 × 10\u003csup\u003e−4\u003c/sup\u003e). GGT and ALT are liver enzymes whose elevated levels indicate liver damage, suggesting that \u003cem\u003eR. mucilaginosa\u003c/em\u003e may be a microbial pathogen contributing to subclinical liver injury. The genus \u003cem\u003eGranulicatella\u0026nbsp;\u003c/em\u003ewas similarly associated with increased GGT levels, reinforcing the role of specific oral bacteria in hepatic dysfunction. In addition to the hepatic biomarkers, the oral microbiome was linked to renal function and purine metabolism. L-arginine biosynthesis III (via N-acetyl-L-citrulline, PWY-5154) and \u003cem\u003eRothia mucilaginosa\u003c/em\u003e were linked to decreased creatine levels, and blood creatine level was correlated with reduced abundance of GGB1144_SGB1468, suggesting a previously unexplored oral-kidney axis. 
The prevalence of \u003cem\u003eOribacterium SGB5283\u003c/em\u003e was positively associated with blood uric acid level (β = 0.358, \u003cem\u003eP\u0026nbsp;\u003c/em\u003e= 4.36 × 10\u003csup\u003e−6\u003c/sup\u003e), consistent with prior observations of its enrichment in groups with both hyperuricemia (HUA) and obstructive sleep apnea (OSA) relative to groups with OSA only\u003csup\u003e43\u003c/sup\u003e. Conversely, consistent with a feedback mechanism, blood uric acid levels were causally associated with the prevalence of experimentally confirmed uric acid-degrading bacteria, such as \u003cem\u003eLachnoanaerobaculum\u0026nbsp;\u003c/em\u003esp. ICM7 (at nominal significance), and \u003cem\u003eCandidatus Nanosyncoccus\u003c/em\u003e (β = 0.177, \u003cem\u003eP\u003c/em\u003e = 1.52 × 10\u003csup\u003e−5\u003c/sup\u003e). In addition, we identified \u003cem\u003eLeptotrichia hongkongensis\u003c/em\u003e as causally linked to an increased risk of denture use (β = 0.237, \u003cem\u003eP\u003c/em\u003e = 2.93 × 10\u003csup\u003e−5\u003c/sup\u003e), suggesting that this oral bacterium may affect local dental health.\u0026nbsp;\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eThis study establishes the tongue dorsum microbiome as a powerful model for elucidating host-genetic control of microbial ecosystems. By integrating multi-cohort data from 13,397 Chinese individuals ranging from young to elderly, we performed the largest and most comprehensive oral M-GWAS to date, encompassing taxonomic, pathway, and gene-family levels. 
We demonstrate that: (i) the oral microbiome exhibits a stronger and more replicable host genetic signature than the gut microbiome, (ii) host-microbiome genetic architecture is characterized by both cross-population convergence and population-specific allelic heterogeneity, and (iii) genetically informed Mendelian randomization uncovers robust, causally implicated oral microbes and pathways modulating systemic host physiology, particularly hepatic and renal functions. These findings collectively shift the paradigm from descriptive association to mechanistic and causal understanding of the oral host-microbiome axis.\u003c/p\u003e\n\u003cp\u003eFirst, through an M-GWAS, we identified 10 genetic loci significantly associated with microbial traits that were replicated in independent datasets, with 8 of these reaching study-wide significance. These are functionally interpretable loci related to immune or metabolic traits, including the \u003cem\u003eFUT2\u0026nbsp;\u003c/em\u003elocus previously reported in gut microbiome studies, the \u003cem\u003eAMY1\u0026nbsp;\u003c/em\u003elocus that encodes salivary amylase and has co-evolved with the host, the \u003cem\u003ePOLI\u003c/em\u003e gene, which is differentially expressed in multiple tissues and blood immune cells and linked to white blood cell counts, the blood urate transporter gene \u003cem\u003eSLC2A9\u003c/em\u003e, and the HLA genes of the human immune region. Notably, the discovery of eight study-wide significant host loci in our oral M-GWAS stands in stark contrast to the gut microbiome field, where only two loci (LCT and ABO) have been consistently replicated at study-wide significance\u003csup\u003e10–12\u003c/sup\u003e\u003csup\u003e,\u003c/sup\u003e\u003csup\u003e44–49\u003c/sup\u003e. 
This disparity not only underscores a potentially stronger host genetic influence on the oral microbiome (likely due to its direct exposure to host-derived factors like saliva and mucosal immunity) but also highlights the unique power of the oral cavity as a model system for dissecting host genetic effects on commensal communities. Our results also highlight the power of combining multiple independent cohorts into a larger sample for M-GWAS analyses\u003csup\u003e50\u003c/sup\u003e, as this approach enables the discovery of robust and replicable results.\u003c/p\u003e\n\u003cp\u003eSecond, we compared our results with a recent study by Kamitaki \u003cem\u003eet al\u003c/em\u003e., which identified 11 host genetic loci associated with the salivary microbiome in a large European cohort using microbial principal components (mPCs)\u003csup\u003e51\u003c/sup\u003e. Despite differences in sampling site (saliva in Kamitaki \u003cem\u003eet al\u003c/em\u003e. versus tongue dorsum in this study), population ancestry (European versus East Asian), and analytical framework (microbial community-level mPCs-based GWAS versus taxa/pathways-level GWAS), the two studies demonstrate striking convergence: five loci, including \u003cem\u003eFUT2\u003c/em\u003e, \u003cem\u003ePOLI\u003c/em\u003e, \u003cem\u003eAMY1\u003c/em\u003e, \u003cem\u003eSLC2A9\u003c/em\u003e, and \u003cem\u003ePRB\u003c/em\u003e, were well replicated (\u003cstrong\u003eSupplementary Tables 17,18\u003c/strong\u003e). However, our study moves beyond replication to deliver new biological insights. (i) Ancestry-stratified genetic architecture at shared loci. Although both studies implicate the same genomic regions as key determinants of the oral microbiome, the lead variants differ across populations (\u003cstrong\u003eSupplementary Table 19\u003c/strong\u003e). 
For \u003cem\u003eFUT2\u003c/em\u003e, Kamitaki \u003cem\u003eet al.\u003c/em\u003e identified the high-frequency European loss-of-function allele rs601338 (W154X; MAF = 0.45 in Europeans but MAF = 0.0087 in Chinese) in saliva, whereas our analysis highlights rs1047781 (I140F), a missense variant enriched in East Asians, as the strongest signal for the composition of tongue-dorsum microbiota. Likewise, the lead SNPs of the \u003cem\u003ePOLI\u003c/em\u003e and \u003cem\u003eTLR1\u003c/em\u003e loci identified in Kamitaki \u003cem\u003eet al.’s\u003c/em\u003e study were also European-specific and very rare in Chinese. All lead variants in the shared loci associated with the same microbes differed between the two populations. These findings illustrate how distinct functional alleles within the same gene can yield convergent phenotypic effects, underscoring the necessity of including diverse ancestries to fully characterize host–microbiome genetic interactions. (ii) Novel Chinese-specific loci were identified in this study. Beyond the cross-ancestry replication of five shared loci, our analysis identified five novel genome-wide significant and internally replicated loci that were not reported by Kamitaki \u003cem\u003eet al.\u003c/em\u003e (\u003cstrong\u003eSupplementary Table 17\u003c/strong\u003e), again suggesting population-specific genetic structure. (iii) Mechanistic resolution of the \u003cem\u003eFUT2\u003c/em\u003e signal and reduction of dental confounding. In our cohort, the \u003cem\u003eFUT2\u003c/em\u003e association remains independent of ABO blood group, supporting a fucose-dependent mucosal mechanism, in contrast to the ABO-linked effect reported for saliva. 
Moreover, several loci highlighted by Kamitaki \u003cem\u003eet al.\u003c/em\u003e appear partially driven by dental health and prosthesis-related confounding, whereas our genome-wide significant loci are enriched in genes involved in core metabolic and immune functions and show no association with denture use, tooth loss or chewing ability (all \u003cem\u003eP\u003c/em\u003e \u0026gt; 1 × 10\u003csup\u003e−4\u003c/sup\u003e). These observations indicate that our signals more likely capture fundamental host–microbe biology rather than secondary effects of oral prostheses. (iv) Integration of host genetics, the oral microbiome and systemic traits. Leveraging extensive host-phenotype data together with Mendelian randomization, we mapped the systemic correlates of 94 genetically associated microbial features, identifying 239 significant host–microbe associations and 11 robust causal relationships. This integrative framework delineates specific microbes and microbial functions as putative modulators of host physiology, providing a foundation for microbiome-targeted interventions beyond descriptive associations.\u003c/p\u003e\n\u003cp\u003eCompared to previous studies that focused solely on microbial taxonomic units\u003csup\u003e18\u003c/sup\u003e, the current M-GWAS is more systematic, encompassing not only microbial taxonomic units but also pathways and gene families. This multidimensional analysis enabled us to explore potential adaptive mechanisms, such as the \u003cem\u003eFUT2\u003c/em\u003e signal, from diverse perspectives. At the species level, we observed significant associations between \u003cem\u003eFUT2\u003c/em\u003e genetic variation and multiple bacterial species, with the strongest signal detected in \u003cem\u003eH. sputorum\u003c/em\u003e. 
At the pathway level, we identified the fucose degradation metabolic pathway as most strongly associated with the \u003cem\u003eFUT2\u003c/em\u003e signal, suggesting that host fucosylation may represent a core functional target. Further analysis at the gene family level revealed associations between \u003cem\u003eFUT2\u003c/em\u003e variation and multiple functionally significant gene families, with the top six annotated proteins including an ABC transporter permease and an extracellular solute-binding protein from \u003cem\u003eStreptococcus pneumoniae\u003c/em\u003e, along with an ABC transporter substrate-binding protein and an alpha-L-fucosidase from \u003cem\u003eStreptococcus mitis\u003c/em\u003e, as well as an L-fucose-proton symporter and an L-fucose isomerase from \u003cem\u003eHaemophilus sputorum\u003c/em\u003e.\u0026nbsp;Notably, although the strongest association with \u003cem\u003eFUT2\u003c/em\u003e was observed for \u003cem\u003eH. sputorum\u003c/em\u003e, the most significant functional annotations at the gene family level were primarily from several species of the genus \u003cem\u003eStreptococcus\u003c/em\u003e. The \u003cem\u003eStreptococcus\u0026nbsp;\u003c/em\u003egroup participates in extracellular carbohydrate uptake, while \u003cem\u003eH. sputorum\u0026nbsp;\u003c/em\u003edirectly hydrolyzes host-derived fucosylated glycans. These results indicate that the mechanism underlying the \u003cem\u003eFUT2\u003c/em\u003e signal is not a single-species effect but may reflect cross-species metabolic cooperation. 
Therefore, multi-level M-GWAS analysis incorporating species, pathways, and gene functions not only yields statistical significance but also reveals potential mechanisms of host-microbe interactions, advancing our understanding from single-point associations to mechanistic explanations.\u003c/p\u003e\n\u003cp\u003eFinally, leveraging host genetic variants identified through M-GWAS as instrumental variables, we applied MR to infer causal relationships between the oral microbiome and host physiology, an approach increasingly adopted in microbiome research\u003csup\u003e12,52,53\u003c/sup\u003e. Based on observationally significant correlations, our MR analysis suggests causal links between the oral microbiota and host hepatic/renal functions: microbial metabolic pathways (e.g., L-histidine and L-tyrosine biosynthesis) were causally associated with lower levels of the liver enzyme GGT, possibly through bioactive metabolites (e.g., thiohistidines) that inhibit GGT activity\u003csup\u003e42\u003c/sup\u003e, while specific bacteria like \u003cem\u003eRothia mucilaginosa\u003c/em\u003e and \u003cem\u003eGranulicatella\u003c/em\u003e were causally associated with elevated GGT (and, for \u003cem\u003eR. mucilaginosa\u003c/em\u003e, ALT) levels, suggesting their pathogenic potential in liver injury. Simultaneously, we identified an oral-kidney axis where \u003cem\u003eOribacterium SGB5283\u003c/em\u003e increases blood uric acid levels, consistent with\u0026nbsp;the\u0026nbsp;host \u003cem\u003eSLC2A9\u003c/em\u003e gene’s\u0026nbsp;function, while blood uric acid in turn affects the prevalence of uric acid-degrading bacteria\u003csup\u003e41\u003c/sup\u003e.\u0026nbsp;These MR findings highlight the potential for microbiome-targeted interventions in the management of chronic diseases, particularly for hepatic and renal disorders, consistent with previous studies\u003csup\u003e54\u003c/sup\u003e. 
However, while MR provides genetic evidence supporting causality, experimental validation through animal models or in vitro systems remains essential to confirm these mechanisms and establish clinical applicability\u003csup\u003e55,56\u003c/sup\u003e. Future studies should combine MR findings with functional experiments to translate these causal relationships into therapeutic interventions, ultimately advancing personalized microbiome-based therapies.\u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003e\u003cstrong\u003eStudy participants\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe primary study population was derived from the China Health and Retirement Longitudinal Study (CHARLS), a nationally representative cohort study of China\u0026rsquo;s middle-aged and older adult population that is harmonized with the Health and Retirement Study (HRS) family of aging cohorts, including ELSA and SHARE\u003csup\u003e57,58\u003c/sup\u003e. CHARLS was launched in 2011, covering 17,705 respondents from 450 village-level units and 150 county-level units randomly selected across China. Respondents were followed up in 2013, 2015, 2018, 2020, and 2021-23, with a follow-up rate for baseline respondents above 85%. The CHARLS data have been widely used in the scientific community, with more than 160,000 users worldwide.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eCHARLS routinely collected anthropometric and biomarker data, along with a rich set of self-reported health, behavioral, and socioeconomic data\u003csup\u003e59\u003c/sup\u003e. In the 2021-2023 wave of the CHARLS survey, we initially included 11,931 individuals with whole metagenomic sequencing (WMS) data from tongue dorsum samples; of these, 8,590 had matched blood samples for whole-genome sequencing. For the remaining 3,341 individuals without blood samples, we extracted host genomic reads directly from the tongue dorsum specimens, yielding an additional 3,341 paired host\u0026ndash;microbiome datasets. 
To cover a wider age range and increase power, we also incorporated 2,017 individuals with high-depth blood WGS and WMS data from the 4DSZ cohort\u003csup\u003e18\u003c/sup\u003e. Sample collection and sequencing protocols for blood and tongue dorsum specimens followed methods established in our prior work. Genomic DNA from blood was extracted using the MagPure Buffy Coat DNA Midi KF Kit (No. D3537-02) per the manufacturer\u0026rsquo;s protocol. Tongue dorsum samples were collected via swab, preserved in 2 mL of stabilization buffer, and processed with the MagPure Stool DNA KF Kit B (No. MD5115-02B), which includes a bead-beating step to enhance mechanical lysis of bacterial and fungal cells and improve microbial DNA yield. DNA concentrations were quantified using a Qubit fluorometer (Invitrogen).\u0026nbsp;Libraries were constructed from 500 ng of DNA per sample and sequenced on the DNBSEQ platform with paired-end 100 bp reads.\u003c/p\u003e\n\u003cp\u003eAll procedures involving human participants were approved by the Institutional Review Boards (IRBs) of the CHARLS cohort (IRB00001052-11014) and BGI Shenzhen. Written informed consent was obtained from all participants before enrollment.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTongue dorsum microbiome sequencing, quality control and profiling\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eMetagenomic sequencing was performed on the DNBSEQ platform for a total of 11,931 samples, with all samples sequenced using 100-bp paired-end reads and four libraries constructed per sequencing lane. For tongue dorsum samples, we generated an average of 19 GB of raw data per sample. Raw paired-end reads were first processed with fastp (v0.23.4) to remove sequencing adapters, trim low-quality bases, and discard short fragments, with adapter trimming enabled for both R1 and R2 and a minimum read-length cutoff of 30 bp. 
The resulting quality-filtered reads were then mapped to the human reference genome hg38 using Bowtie2 (v2.4.4) in end-to-end, very-sensitive mode, and non-host read pairs were extracted using samtools fastq (v1.12) by retaining only pairs in which both mates were unmapped. The host-removed clean reads were subsequently used for microbial profiling.\u003c/p\u003e\n\u003cp\u003eThe microbial taxonomic profile was calculated using MetaPhlAn4\u003csup\u003e60\u003c/sup\u003e (v4.0.6) based on the mpa_vOct22_CHOCOPhlAnSGB_202212 database. The marker gene database utilized by MetaPhlAn4 comprises approximately 1,000,000 fully annotated genomes, including 236,600 bacterial/archaeal reference genomes and 771,500 metagenome-assembled genomes, covering a broad spectrum of microbial diversity. Pathway- and gene family-level functional profiles were generated using HUMAnN3\u003csup\u003e61\u003c/sup\u003e (v3.8;\u0026nbsp;nucleotide-database: chocophlan v201901_v31; protein-database: uniref90_annotated_v201901b). Ultimately, we obtained a raw microbial taxonomic dataset encompassing 3,890 taxa (26 phyla, 116 classes, 168 orders, 269 families, 842 genera, 2,469 species), along with a dataset containing 559 pathways or functions.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eWhole-genome sequencing in the CHARLS discovery cohort\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe initial discovery cohort comprised 8,590 individuals whose blood samples were whole-genome sequenced at a target depth of 20\u0026times;. Reads were filtered for adapter contamination and low-quality bases using SOAPnuke\u003csup\u003e62\u003c/sup\u003e (v1.5.6; -n 0.05, -q 0.2, -l 12, -M 2) and aligned to the GRCh38/hg38 reference genome using BWA\u003csup\u003e63\u003c/sup\u003e (v0.7.15) with default parameters. Aligned reads were converted to indexed BAM format using SAMtools (v0.1.18), and PCR duplicates were marked using Picard Tools (v1.62) for downstream filtering. 
Base quality score recalibration was performed using the Genome Analysis Toolkit (GATK\u003csup\u003e64\u003c/sup\u003e, v4.3.0). The BaseRecalibrator module generated a recalibration table by identifying known SNPs and indels from dbSNP (build 151) in the BAM files. Subsequent base quality recalibration was carried out with GATK Lite (v2.2.15), and read pairs flagged as misaligned by Stampy were removed. Variant calling was conducted with GATK\u0026rsquo;s HaplotypeCaller, producing gVCF files containing SNPs and indels. These were jointly processed with GATK (v4.3.0) to perform multi-sample genotyping and variant-quality filtering. CombineGVCFs merged gVCFs from all samples, followed by GenotypeGVCFs for joint genotyping. Variant quality scores were recalibrated using VariantRecalibrator, with filtering applied via ApplyRecalibration based on a Gaussian mixture model trained on high-confidence resources: for SNPs, HapMap 3.3, dbSNP (build 151), 1000 Genomes Omni 2.5M array, and 1000 Genomes Phase 1 high-confidence SNPs; for indels, Mills and 1000 Genomes gold standard indels and dbSNP (build 151). Variants were filtered at sensitivity thresholds of 99.5% for SNPs and 99.0% for indels. Further, stringent variant inclusion criteria were applied: (1) mean depth \u0026gt; 5\u0026times;; (2) Hardy\u0026ndash;Weinberg equilibrium (HWE) \u003cem\u003eP\u003c/em\u003e \u0026gt; 10\u003csup\u003e\u0026minus;5\u003c/sup\u003e; and (3) genotype call rate \u0026gt; 98%. Samples were retained only if they met the following criteria: (1) mean depth \u0026gt; 6\u0026times;; (2) variant call rate \u0026gt; 98%; (3) absence of population stratification as assessed by principal component analysis (PCA) using PLINK\u003csup\u003e65\u003c/sup\u003e (v1.9); and (4) no close relatedness to other individuals based on identity-by-descent (IBD) estimates (Pi-hat threshold = 0.1875). 
Finally, 8,331 individuals with 5,589,561 high-quality common variants (MAF \u0026ge; 5%) were used for subsequent M-GWAS analysis.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eHost genome extraction and genotype imputation from tongue dorsum metagenomic samples\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eBecause blood-derived host genomic samples were unavailable for the 3,341 participants in the CHARLS replication cohort, and given the moderate proportion of host reads in tongue dorsum samples that we previously reported, we extracted host genomic information from tongue dorsum metagenomic sequencing data. Sequencing reads were aligned to the human hg38_noalt reference genome using BWA-MEM, followed by duplicate removal and base-quality score recalibration using the Genome Analysis Toolkit (GATK). Genotype calling and likelihood estimation were performed using GATK and BCFtools. Low-pass sequencing genotypes were imputed using the Lowpass_v5-Human pipeline on the DCS platform (https://cloud.stomics.tech), which employs a hidden Markov model integrating genotype likelihoods with the refpanel_hg38 haplotype reference panel. This reference panel comprises 3,202 deeply sequenced samples from the 1000 Genomes Project, encompassing approximately 68 million variant sites. We retained only variants with an imputation information score above 0.7. The imputed genotype data (impute.raw.vcf.gz) were merged across samples and converted to PLINK binary format (Bfile). This dataset was further refined to include only samples with no evidence of population stratification and no kinship (excluding related individuals based on pairwise identity by descent (IBD), with a Pi-hat threshold of 0.1875, in PLINK). 
Ultimately, 3,049 individuals with 5,589,561 variants identical to those used in the discovery cohort were included for M-GWAS replication analysis.\u003c/p\u003e\n\u003cp\u003eTo evaluate the quality of host genome data extracted from tongue dorsum samples, we also sequenced 14 blood samples from the cohort to a mean depth of 22\u0026times; (ranging from 11\u0026times; to 47\u0026times;). We assessed concordance rates (CR) between the blood and the extracted host genome data. Compared with blood host genome data, the extracted host genome data had a missing rate of 0.32% and a high genotype concordance of 99% (\u003cstrong\u003eSupplementary Table 17\u003c/strong\u003e). Further, PCA showed no population stratification among the three datasets used in the M-GWAS study, and these individuals clearly clustered with the East Asian population and were separated from other populations, such as African, American, European, and South Asian populations, when compared to the 1000 Genomes dataset (\u003cstrong\u003eSupplementary Fig. 2\u003c/strong\u003e).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCorrelation analysis of host PCs with microbial alpha diversity and PCoAs\u003c/strong\u003e\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eBased on species-level abundance data, microbial alpha diversity (Shannon and Simpson indices) and beta diversity (Bray-Curtis dissimilarity) were calculated using the \u0026apos;diversity\u0026apos; and \u0026apos;vegdist\u0026apos; functions from the R package \u0026apos;vegan\u0026apos;, respectively. Principal coordinate analysis (PCoA) was performed on the calculated beta-diversity dissimilarity using the \u0026apos;capscale\u0026apos; function in \u0026apos;vegan\u0026apos;. 
Associations of each of the top 10 host genetic principal components with each alpha diversity index and each principal coordinate were analyzed using multivariable linear models, adjusting for sex, age, and BMI.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAssociation analysis for microbial taxa and functions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo ensure adequate statistical power for M-GWAS, we retained only microbial taxa and functional pathways with an occurrence rate \u0026gt; 20%. After filtering, 744 microbial taxa and 335 pathways were kept. The representative genera of these microbial taxa covered 99.7% of the entire community in the cohort. Given the high correlation among many microbial taxa and functions, we used pairwise Spearman correlations to identify independent taxa for M-GWAS analysis and thereby reduce the number of GWAS tests. Pairwise Spearman correlations between all taxa were calculated and used to construct an adjacency matrix, where correlations \u0026gt; 0.995 indicated edges between taxa. The graphical representation of this matrix was used to guide the greedy selection of representative taxa. Nodes (microbial taxa) were sorted by degree, and the node with the highest degree was selected as a representative taxon (with random selection in case of ties). This taxon and its connected nodes were removed from the network, and the process was repeated until a final set of taxa was identified, ensuring that each discarded taxon was correlated with at least one selected taxon. This filtering ultimately yielded 845 microbial features (534 taxa and 311 functional pathways) for association analysis. We tested associations between host genetics and the oral microbiome using linear regression models based on relative abundance or logistic regression models based on presence/absence. 
Among these, 102 taxa and 243 pathways were analyzed using linear models, while the remaining 432 taxa and 68 pathways were analyzed using logistic models (\u003cstrong\u003eSupplementary Tables 5 and 6\u003c/strong\u003e). Specifically, for the 345 microbial features present in \u0026gt; 95% of individuals, relative abundances were log-transformed. Residuals were then calculated using \u0026apos;lm\u0026apos; in R with the following model: log\u003csub\u003e10\u003c/sub\u003e(microbial abundance) ~ age + sex + BMI + top 10 PCs. Model residuals were extracted using the residuals() function from the stats package and then used in univariate linear models to assess associations with genotypes. In contrast, the 500 microbial features present in \u0026gt; 20% but \u0026lt; 95% of individuals were dichotomized into presence/absence to avoid zero inflation. Presence/absence was then analyzed as a binary trait using logistic regression, with the same covariates as above.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThese M-GWAS analyses were first performed in the CHARLS primary discovery cohort, and then the significant associations were further confirmed in the CHARLS replication and 4DSZ cohorts. Finally, a meta-analysis was performed on the association results of the three cohorts, using sample-size weighted fixed-effect meta-analysis in METAL\u003csup\u003e66\u003c/sup\u003e (updated 2020-05-05, https://genome.sph.umich.edu/wiki/METAL).\u003c/p\u003e\n\u003cp\u003eGenetic variants were annotated to genes using the ANNOVAR\u003csup\u003e67\u003c/sup\u003e tool. The eQTL information of significant genetic variants was investigated by searching the GTEx v8 dataset. The associations of significant genetic variants with reported phenotypes were investigated by searching the GWAS catalog (https://www.ebi.ac.uk/gwas/), the BBJ dataset and the Chinese 4DSZ dataset. 
The regional plots were created from our own GWAS results at https://statgen.github.io/localzoom/.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAssociation analysis for microbial alpha diversity and beta diversity\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eGWAS for alpha diversity and the first 10 principal coordinates (PCoAs) was performed using linear regression implemented in PLINK v1.9, with sex, age, BMI, and the top ten host genetic principal components as covariates. These M-GWAS analyses were performed across the three cohorts, and a meta-analysis was subsequently used for integration.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAssociation analysis for microbial gene families\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo refine M-GWAS signals to the molecular function level, we further conducted association analysis for the identified 74 genome-wide significant loci with microbial gene families. Specifically, based on approximately 700,000 microbial gene family features output by HUMAnN3, we screened genome-wide variants using the same statistical framework and covariate adjustments (including sex, age, BMI, and the top ten host genetic PCs) as applied in the species and pathway analyses. These association analyses were performed separately in the three cohorts and then integrated by meta-analysis.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eBlood type determination\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe performed genetic ABO blood-type assignment in the Chinese population using the method described by Wang et al.\u003csup\u003e68\u003c/sup\u003e. This approach utilizes the genotype combinations of three single-nucleotide polymorphisms (SNPs), rs8176719, rs635634, and rs7030248, to predict ABO blood type.\u0026nbsp;These three SNPs combined were reported to be sufficient to predict blood type and achieved high accuracy (0.98) and F1 scores (micro 0.99 and macro 0.97) within the Chinese population. 
Consistently, we also achieved 98% accuracy (1,725/1,760) when evaluating this approach on our in-house dataset. Thus, we extracted the genotypes of these three SNPs from the CHARLS-4DSZ cohort. For each individual, if any of the three SNPs was missing, the blood type was recorded as NA; otherwise, the ABO blood type was determined according to the SNP combination rules provided by Wang et al. The assigned blood types were O, A, B, and AB.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAssociation analysis for host traits\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe 43 host traits, spanning hematological indices (hemoglobin, platelet counts), lipid profiles (triglycerides, HDL, LDL), renal markers (creatinine, cystatin C), systemic indicators (CRP, HbA1c), dental conditions (dentures, tooth loss, chewing ability), and diseases (diabetes, hypertension, digestive tract disease, etc.), were included for association analysis. To mitigate skewness, all quantitative traits were subject to a natural log-transformation, followed by outlier exclusion (observations \u0026gt; 4 standard deviations from the mean) and standardization (mean = 0, SD = 1). A linear model was used for quantitative traits, and a logistic model was used for binary traits, both implemented in PLINK. Age, sex, BMI and the top ten PCs were included as covariates.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eObservational correlation analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe 94 microbial features associated with at least one variant at a genome-wide significant level (\u003cem\u003eP\u0026nbsp;\u003c/em\u003e\u0026lt; 1 \u0026times; 10\u003csup\u003e\u0026minus;8\u003c/sup\u003e) were tested for associations with 43 host traits in 8,331 individuals from the CHARLS discovery cohort who had both microbiome and complete phenotypic data. 
Associations were assessed via multivariable linear regression adjusted for age, sex, BMI and the top ten PCs, with significance determined by FDR correction (\u003cem\u003eP\u003c/em\u003e\u003csub\u003eFDR\u003c/sub\u003e \u0026lt; 0.05, Benjamini\u0026ndash;Hochberg method).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eOne-sample and two-sample MR analysis.\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo maximize robustness, causal analyses were limited to the 94 microbial features associated with at least one variant at a genome-wide significant level (\u003cem\u003eP\u0026nbsp;\u003c/em\u003e\u0026lt; 1 \u0026times; 10\u003csup\u003e\u0026minus;8\u003c/sup\u003e). To investigate the causal relationships between the 94 microbial features and 43 host traits, we first performed one-sample bidirectional MR analysis in 8,331 individuals with both microbiome and complete phenotypic data. We set \u003cem\u003eP\u003c/em\u003e \u0026lt; 1 \u0026times; 10\u003csup\u003e\u0026minus;5\u003c/sup\u003e as the threshold to select SNP/INDEL instrumental variables for microbial features. SNP/INDEL instruments for blood metabolic traits and disease exposures were chosen at a genome-wide significance threshold (\u003cem\u003eP\u0026nbsp;\u003c/em\u003e\u0026lt; 5 \u0026times; 10\u003csup\u003e\u0026minus;8\u003c/sup\u003e). Because no genome-wide significant variants were identified for dental conditions, the threshold for SNP/INDEL instruments for dental conditions was also set to \u003cem\u003eP\u003c/em\u003e \u0026lt; 1 \u0026times; 10\u003csup\u003e\u0026minus;5\u003c/sup\u003e. Subsequently, LD-clumping with a strict threshold (\u003cem\u003er\u003c/em\u003e\u003csup\u003e2\u003c/sup\u003e \u0026lt; 0.1 in this high-depth CHARLS dataset) was performed to select independent genetic instruments with the lowest \u003cem\u003eP\u003c/em\u003e values for each exposure. 
We additionally calculated F-statistics and explained variance to demonstrate instrument strength directly (\u003cstrong\u003eSupplementary Table 15\u003c/strong\u003e). The mean instrument \u003cem\u003eF\u003c/em\u003e statistic was 289 for microbial features and 186 for host traits. On average, the instruments explained 3.5% and 2.3% of the variance in microbial features and host traits, respectively (\u003cstrong\u003eSupplementary Fig. 14\u003c/strong\u003e).\u003c/p\u003e\n\u003cp\u003eAfter selection of instrumental variables, unweighted polygenic risk scores (PRS) were calculated for each individual using PLINK v1.9. Each independent genetic variant was coded as 0, 1, or 2 based on the number of trait-specific risk-increasing alleles an individual carried. Then, a two-stage least squares (TSLS) regression approach\u003csup\u003e69\u003c/sup\u003e was employed for one-sample MR analysis. In the first stage, for each exposure trait, the observed phenotypic values were regressed on the PRS to obtain genetically predicted (fitted) values. In the second stage, a linear regression was performed to relate the outcome trait to the genetically predicted exposure levels from the first stage. Both stages were adjusted for age, sex, BMI, and the top ten principal components of population structure. TSLS analysis for each trait was conducted using the \u0026apos;ivreg\u0026apos; function from the AER package in R.\u003c/p\u003e\n\u003cp\u003eTo maximize sample size in MR analysis and confirm causal effects between microbial features and host traits, we also performed a two-sample BMR analysis using a GCTA-GSMR approach\u003csup\u003e70\u003c/sup\u003e, both as validation and for new discovery. GWAS analysis for host traits was performed in a total of 15,459 individuals, and the resulting summary statistics were used for two-sample MR analysis. 
Genetic variants with \u003cem\u003eP\u0026nbsp;\u003c/em\u003e\u0026lt; 1 \u0026times; 10\u003csup\u003e\u0026minus;8\u003c/sup\u003e and LD \u003cem\u003er\u003c/em\u003e\u003csup\u003e2\u003c/sup\u003e \u0026lt; 0.1 were selected as instrumental variables for metabolic traits/diseases, whereas \u003cem\u003eP\u0026nbsp;\u003c/em\u003e\u0026lt; 1 \u0026times; 10\u003csup\u003e\u0026minus;5\u003c/sup\u003e and LD \u003cem\u003er\u003c/em\u003e\u003csup\u003e2\u003c/sup\u003e \u0026lt; 0.1 were used for dental conditions.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eData availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe metagenomic data have been deposited at https://db.cngb.org/data_resources/project/CNP0008650 (upload in progress). The summary statistics of associations between host genetics and microbiome have been uploaded to the GSA database (https://ngdc.cncb.ac.cn/gsub/submit/bioproject/subPRO073610/overview). The release of these data was approved by the National Health Commission of China (Project ID: xxx, in preparation). Access to individual-level host genetic data requires approval from the corresponding authors (
[email protected],
[email protected],
[email protected]) and compliance with the regulations of the Human Genetic Resources Administration of China. The human reference genome hg38 dataset is publicly available at http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCode availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eHost genome sequencing reads were aligned to the GRCh38/hg38 reference genome using BWA (v0.7.15). Alignments were converted to indexed BAM format using SAMtools (v0.1.18), and PCR duplicates were marked with Picard Tools (v1.62). Genomic variant calling and base quality recalibration were performed using the Genome Analysis Toolkit (GATK) (v3.8) and GATK Lite (v2.2.15). Low-pass sequencing genotypes were imputed using BCFtools and the Lowpass_v5-Human platform with the refpanel_hg38 reference panel. Metagenomic reads were aligned to hg38 using BWA-MEM, and taxonomic profiling was conducted with MetaPhlAn4 (v4.1.1). Functional annotation utilized HUMAnN3 (v3.0.0.alpha.3). Quality control, association analyses, and principal component analysis (PCA) were implemented in PLINK (v1.9). Statistical analyses, including Mendelian randomization, were performed in R. One-sample MR employed the TSLS method, while two-sample MR used GSMR (v1.0.7).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe thank all the participants for agreeing to join this study. We are very grateful to the colleagues from the CHARLS cohort for sample collection, and the colleagues at BGI Research for DNA extraction, library construction, and sequencing. This work was supported by the Ministry of Science and Technology of China (Grant Nos. 2022ZD0211600 and 2023YFC3603300). We thank the Shenzhen Key Laboratory of Neurogenomics (BGI Genomics, Project No. CXB201108250094A) for support with sequencing and analysis. D.W. 
is supported by the Netherlands Organization for Scientific Research (NWO)-VENI grant VI.Veni.222.016.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026apos; contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eY.Z. conceived and directed the construction of the CHARLS cohort. Y.Z., T.Z., and C.N. conceived and directed this study. G.W., Q.M., B.C., X.C., Y.L., R.Z., and J.G. established a detailed end-to-end process for sample management, including sample collection, transportation, and storage, ensuring standardization throughout the sample reception process. X.L. led the bioinformatics analyses with contributions from L.Z., Y.W., J.C., X.H., and D.W. X.L. conceived the framework of the article and wrote the initial manuscript. All authors contributed to the revision and discussion of the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors have declared no competing interests.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n \u003cli\u003eLey, R. E., Turnbaugh, P. J., Klein, S. \u0026amp; Gordon, J. I. Microbial ecology: human gut microbes associated with obesity. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e444\u003c/strong\u003e, 1022\u0026ndash;1023 (2006).\u003c/li\u003e\n \u003cli\u003eQin, J. \u003cem\u003eet al.\u003c/em\u003e A metagenome-wide association study of gut microbiota in type 2 diabetes. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e490\u003c/strong\u003e, 55\u0026ndash;60 (2012).\u003c/li\u003e\n \u003cli\u003eWang, J. \u0026amp; Jia, H. Metagenome-wide association studies: fine-mining the microbiome. \u003cem\u003eNat Rev Microbiol\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 508\u0026ndash;522 (2016).\u003c/li\u003e\n \u003cli\u003eMicrobiota in health and diseases | Signal Transduction and Targeted Therapy. 
https://www.nature.com/articles/s41392-022-00974-4.\u003c/li\u003e\n \u003cli\u003eThe oral\u0026ndash;gut microbiome axis in health and disease | Nature Reviews Microbiology. https://www.nature.com/articles/s41579-024-01075-5.\u003c/li\u003e\n \u003cli\u003eHajishengallis, G. Periodontitis: from microbial immune subversion to systemic inflammation. \u003cem\u003eNat Rev Immunol\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 30\u0026ndash;44 (2015).\u003c/li\u003e\n \u003cli\u003eKilian, M. \u003cem\u003eet al.\u003c/em\u003e The oral microbiome \u0026ndash; an update for oral healthcare professionals. \u003cem\u003eBr Dent J\u003c/em\u003e \u003cstrong\u003e221\u003c/strong\u003e, 657\u0026ndash;666 (2016).\u003c/li\u003e\n \u003cli\u003eHajishengallis, G. \u0026amp; Chavakis, T. Local and systemic mechanisms linking periodontal disease and inflammatory comorbidities. \u003cem\u003eNat Rev Immunol\u003c/em\u003e \u003cstrong\u003e21\u003c/strong\u003e, 426\u0026ndash;440 (2021).\u003c/li\u003e\n \u003cli\u003eSedghi, L., DiMassa, V., Harrington, A., Lynch, S. V. \u0026amp; Kapila, Y. L. The oral microbiome: Role of key organisms and complex networks in oral health and disease. \u003cem\u003ePeriodontol 2000\u003c/em\u003e \u003cstrong\u003e87\u003c/strong\u003e, 107\u0026ndash;131 (2021).\u003c/li\u003e\n \u003cli\u003eKurilshikov, A. \u003cem\u003eet al.\u003c/em\u003e Large-scale association analyses identify host factors influencing human gut microbiome composition. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e53\u003c/strong\u003e, 156\u0026ndash;165 (2021).\u003c/li\u003e\n \u003cli\u003eLopera-Maya, E. A. \u003cem\u003eet al.\u003c/em\u003e Effect of host genetics on the gut microbiome in 7,738 participants of the Dutch Microbiome Project. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e54\u003c/strong\u003e, 143\u0026ndash;151 (2022).\u003c/li\u003e\n \u003cli\u003eQin, Y. 
\u003cem\u003eet al.\u003c/em\u003e Combined effects of host genetics and diet on human gut microbiota and incident disease in a single population cohort. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e54\u003c/strong\u003e, 134\u0026ndash;142 (2022).\u003c/li\u003e\n \u003cli\u003eMedina-Gomez, C. \u003cem\u003eet al.\u003c/em\u003e Challenges in conducting genome-wide association studies in highly admixed multi-ethnic populations: the Generation R Study. \u003cem\u003eEur J Epidemiol\u003c/em\u003e \u003cstrong\u003e30\u003c/strong\u003e, 317\u0026ndash;330 (2015).\u003c/li\u003e\n \u003cli\u003eGurdasani, D., Barroso, I., Zeggini, E. \u0026amp; Sandhu, M. S. Genomics of disease risk in globally diverse populations. \u003cem\u003eNat Rev Genet\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 520\u0026ndash;535 (2019).\u003c/li\u003e\n \u003cli\u003eWeyrich, L. S. The evolutionary history of the human oral microbiota and its implications for modern health. \u003cem\u003ePeriodontology 2000\u003c/em\u003e \u003cstrong\u003e85\u003c/strong\u003e, 90\u0026ndash;100 (2021).\u003c/li\u003e\n \u003cli\u003eSantonocito, S. \u003cem\u003eet al.\u003c/em\u003e A Cross-Talk between Diet and the Oral Microbiome: Balance of Nutrition on Inflammation and Immune System\u0026rsquo;s Response during Periodontitis. \u003cem\u003eNutrients\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 2426 (2022).\u003c/li\u003e\n \u003cli\u003eShaw, L. \u003cem\u003eet al.\u003c/em\u003e The Human Salivary Microbiome Is Shaped by Shared Environment Rather than Genetics: Evidence from a Large Family of Closely Related Individuals. \u003cem\u003emBio\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, e01237-17 (2017).\u003c/li\u003e\n \u003cli\u003eLiu, X. \u003cem\u003eet al.\u003c/em\u003e Metagenome-genome-wide association studies reveal human genetic impact on the oral microbiome. 
\u003cem\u003eCell Discov\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e, 117 (2021).\u003c/li\u003e\n \u003cli\u003eMark Welch, J. L., Ram\u0026iacute;rez-Puebla, S. T. \u0026amp; Borisy, G. G. Oral Microbiome Geography: Micron-Scale Habitat and Niche. \u003cem\u003eCell Host Microbe\u003c/em\u003e \u003cstrong\u003e28\u003c/strong\u003e, 160\u0026ndash;168 (2020).\u003c/li\u003e\n \u003cli\u003eCarr, V. R. \u003cem\u003eet al.\u003c/em\u003e Abundance and diversity of resistomes differ between healthy human oral cavities and gut. \u003cem\u003eNat Commun\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 693 (2020).\u003c/li\u003e\n \u003cli\u003eRold\u0026aacute;n, S., Herrera, D. \u0026amp; Sanz, M. Biofilms and the tongue: therapeutical approaches for the control of halitosis. \u003cem\u003eClin Oral Investig\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e, 189\u0026ndash;197 (2003).\u003c/li\u003e\n \u003cli\u003eStrain profiling and epidemiology of bacterial species from metagenomic sequencing | Nature Communications. https://www.nature.com/articles/s41467-017-02209-5.\u003c/li\u003e\n \u003cli\u003eWalters, R. G. \u003cem\u003eet al.\u003c/em\u003e Genotyping and population characteristics of the China Kadoorie Biobank. \u003cem\u003eCell Genomics\u003c/em\u003e \u003cstrong\u003e3\u003c/strong\u003e, 100361 (2023).\u003c/li\u003e\n \u003cli\u003eLi, L. \u003cem\u003eet al.\u003c/em\u003e The ChinaMAP reference panel for the accurate genotype imputation in Chinese populations. \u003cem\u003eCell Res\u003c/em\u003e \u003cstrong\u003e31\u003c/strong\u003e, 1308\u0026ndash;1310 (2021).\u003c/li\u003e\n \u003cli\u003eHe, Y. \u003cem\u003eet al.\u003c/em\u003e East Asian-specific and cross-ancestry genome-wide meta-analyses provide mechanistic insights into peptic ulcer disease. 
\u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e55\u003c/strong\u003e, 2129\u0026ndash;2138 (2023).\u003c/li\u003e\n \u003cli\u003eGenome-wide association study in 8,956 German individuals identifies influence of ABO histo-blood groups on gut microbiome | Nature Genetics. https://www.nature.com/articles/s41588-020-00747-1.\u003c/li\u003e\n \u003cli\u003eKelly, R. J., Rouquier, S., Giorgi, D., Lennon, G. G. \u0026amp; Lowe, J. B. Sequence and expression of a candidate for the human Secretor blood group alpha(1,2)fucosyltransferase gene (FUT2). Homozygosity for an enzyme-inactivating nonsense mutation commonly correlates with the non-secretor phenotype. \u003cem\u003eJ Biol Chem\u003c/em\u003e \u003cstrong\u003e270\u003c/strong\u003e, 4640\u0026ndash;4649 (1995).\u003c/li\u003e\n \u003cli\u003eRausch, P. \u003cem\u003eet al.\u003c/em\u003e Colonic mucosa-associated microbiota is influenced by an interaction of Crohn disease and FUT2 (Secretor) genotype. \u003cem\u003eProc Natl Acad Sci U S A\u003c/em\u003e \u003cstrong\u003e108\u003c/strong\u003e, 19030\u0026ndash;19035 (2011).\u003c/li\u003e\n \u003cli\u003eHuman Leukocyte Antigen (HLA) System: Genetics and Association with Bacterial and Viral Infections - Medhasi - 2022 - Journal of Immunology Research - Wiley Online Library. https://onlinelibrary.wiley.com/doi/10.1155/2022/9710376.\u003c/li\u003e\n \u003cli\u003ePoole, A. C. \u003cem\u003eet al.\u003c/em\u003e Human Salivary Amylase Gene Copy Number Impacts Oral and Gut Microbiomes. \u003cem\u003eCell Host Microbe\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e, 553-564.e7 (2019).\u003c/li\u003e\n \u003cli\u003ePerry, G. H. \u003cem\u003eet al.\u003c/em\u003e Diet and the evolution of human amylase gene copy number variation. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e39\u003c/strong\u003e, 1256\u0026ndash;1260 (2007).\u003c/li\u003e\n \u003cli\u003eEsberg, A., Haworth, S., Hassl\u0026ouml;f, P., Lif Holgerson, P. 
\u0026amp; Johansson, I. Oral Microbiota Profile Associates with Sugar Intake and Taste Preference Genes. \u003cem\u003eNutrients\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 681 (2020).\u003c/li\u003e\n \u003cli\u003eStubbs, M. \u003cem\u003eet al.\u003c/em\u003e Encoding of human basic and glycosylated proline-rich proteins by the PRB gene complex and proteolytic processing of their precursor proteins. \u003cem\u003eArch Oral Biol\u003c/em\u003e \u003cstrong\u003e43\u003c/strong\u003e, 753\u0026ndash;770 (1998).\u003c/li\u003e\n \u003cli\u003eChoi, S. H. \u003cem\u003eet al.\u003c/em\u003e Six Novel Loci Associated with Circulating VEGF Levels Identified by a Meta-analysis of Genome-Wide Association Studies. \u003cem\u003ePLoS Genet\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, e1005874 (2016).\u003c/li\u003e\n \u003cli\u003eMochizuki, Y. \u003cem\u003eet al.\u003c/em\u003e Phosphatidylinositol 3-Phosphatase Myotubularin-related Protein 6 (MTMR6) Is Regulated by Small GTPase Rab1B in the Early Secretory and Autophagic Pathways *. \u003cem\u003eJournal of Biological Chemistry\u003c/em\u003e \u003cstrong\u003e288\u003c/strong\u003e, 1009\u0026ndash;1021 (2013).\u003c/li\u003e\n \u003cli\u003eCastro-S\u0026aacute;nchez, P., Ramirez-Munoz, R. \u0026amp; Roda-Navarro, P. Gene Expression Profiles of Human Phosphotyrosine Phosphatases Consequent to Th1 Polarisation and Effector Function. \u003cem\u003eJournal of Immunology Research\u003c/em\u003e \u003cstrong\u003e2017\u003c/strong\u003e, 8701042 (2017).\u003c/li\u003e\n \u003cli\u003eThe GTEx Consortium atlas of genetic regulatory effects across human tissues. \u003cem\u003eScience\u003c/em\u003e \u003cstrong\u003e369\u003c/strong\u003e, 1318\u0026ndash;1330 (2020).\u003c/li\u003e\n \u003cli\u003eYin, J. \u003cem\u003eet al.\u003c/em\u003e Single-Cell Genomics Elucidates Molecular Variations and Regulatory Mechanisms in Circulating Immune Cells. 
\u003cem\u003ebioRxiv\u003c/em\u003e 2025.01.26.634963 (2025) doi:10.1101/2025.01.26.634963.\u003c/li\u003e\n \u003cli\u003eWelter, D. \u003cem\u003eet al.\u003c/em\u003e The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. \u003cem\u003eNucleic Acids Res\u003c/em\u003e \u003cstrong\u003e42\u003c/strong\u003e, D1001-1006 (2014).\u003c/li\u003e\n \u003cli\u003eIshigaki, K. \u003cem\u003eet al.\u003c/em\u003e Large-scale genome-wide association study in a Japanese population identifies novel susceptibility loci across different diseases. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e52\u003c/strong\u003e, 669\u0026ndash;679 (2020).\u003c/li\u003e\n \u003cli\u003eLiu, Y. \u003cem\u003eet al.\u003c/em\u003e A widely distributed gene cluster compensates for uricase loss in hominids. \u003cem\u003eCell\u003c/em\u003e \u003cstrong\u003e186\u003c/strong\u003e, 3400-3413.e20 (2023).\u003c/li\u003e\n \u003cli\u003eBrancaccio, M. \u003cem\u003eet al.\u003c/em\u003e Sulfur-containing histidine compounds inhibit \u0026gamma;-glutamyl transpeptidase activity in human cancer cells. \u003cem\u003eJournal of Biological Chemistry\u003c/em\u003e \u003cstrong\u003e294\u003c/strong\u003e, 14603\u0026ndash;14614 (2019).\u003c/li\u003e\n \u003cli\u003eLu, Y. \u003cem\u003eet al.\u003c/em\u003e Association between Serum Uric Acid Levels and Salivary Microbiota in Patients with Obstructive Sleep Apnea. \u003cem\u003eJ. Microbiol. Biotechnol.\u003c/em\u003e \u003cstrong\u003e35\u003c/strong\u003e, e2503042 (2025).\u003c/li\u003e\n \u003cli\u003eBlekhman, R. \u003cem\u003eet al.\u003c/em\u003e Host genetic variation impacts microbiome composition across human body sites. \u003cem\u003eGenome Biol\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 191 (2015).\u003c/li\u003e\n \u003cli\u003eGoodrich, J. K. \u003cem\u003eet al.\u003c/em\u003e Human Genetics Shape the Gut Microbiome. 
\u003cem\u003eCell\u003c/em\u003e \u003cstrong\u003e159\u003c/strong\u003e, 789\u0026ndash;799 (2014).\u003c/li\u003e\n \u003cli\u003eWang, J. \u003cem\u003eet al.\u003c/em\u003e Genome-wide association analysis identifies variation in vitamin D receptor and other host factors influencing the gut microbiota. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e48\u003c/strong\u003e, 1396\u0026ndash;1406 (2016).\u003c/li\u003e\n \u003cli\u003eGEM Project Research Consortium \u003cem\u003eet al.\u003c/em\u003e Association of host genome with intestinal microbial composition in a large healthy cohort. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e48\u003c/strong\u003e, 1413\u0026ndash;1417 (2016).\u003c/li\u003e\n \u003cli\u003eBonder, M. J. \u003cem\u003eet al.\u003c/em\u003e The effect of host genetics on the gut microbiome. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e48\u003c/strong\u003e, 1407\u0026ndash;1412 (2016).\u003c/li\u003e\n \u003cli\u003eRothschild, D. \u003cem\u003eet al.\u003c/em\u003e Environment dominates over host genetics in shaping human gut microbiota. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e555\u003c/strong\u003e, 210\u0026ndash;215 (2018).\u003c/li\u003e\n \u003cli\u003eSanna, S., Kurilshikov, A., Van Der Graaf, A., Fu, J. \u0026amp; Zhernakova, A. Challenges and future directions for studying effects of host genetics on the gut microbiome. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e54\u003c/strong\u003e, 100\u0026ndash;106 (2022).\u003c/li\u003e\n \u003cli\u003eKamitaki, N. \u003cem\u003eet al.\u003c/em\u003e Human and bacterial genetic variation shape oral microbiomes and health. \u003cem\u003emedRxiv\u003c/em\u003e 2025.03.31.25324952 (2025) doi:10.1101/2025.03.31.25324952.\u003c/li\u003e\n \u003cli\u003eLiu, X. \u003cem\u003eet al.\u003c/em\u003e Mendelian randomization analyses support causal relationships between blood metabolites and the gut microbiome. 
\u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e54\u003c/strong\u003e, 52\u0026ndash;61 (2022).\u003c/li\u003e\n \u003cli\u003eBoulund, U. \u003cem\u003eet al.\u003c/em\u003e Gut microbiome associations with host genotype vary across ethnicities and potentially influence cardiometabolic traits. \u003cem\u003eCell Host \u0026amp; Microbe\u003c/em\u003e \u003cstrong\u003e30\u003c/strong\u003e, 1464-1480.e6 (2022).\u003c/li\u003e\n \u003cli\u003eSumida, K. \u003cem\u003eet al.\u003c/em\u003e Gut Microbiota-Targeted Interventions in the Management of Chronic Kidney Disease. \u003cem\u003eSeminars in Nephrology\u003c/em\u003e \u003cstrong\u003e43\u003c/strong\u003e, 151408 (2023).\u003c/li\u003e\n \u003cli\u003eWalter, J., Armet, A. M., Finlay, B. B. \u0026amp; Shanahan, F. Establishing or Exaggerating Causality for the Gut Microbiome: Lessons from Human Microbiota-Associated Rodents. \u003cem\u003eCell\u003c/em\u003e \u003cstrong\u003e180\u003c/strong\u003e, 221\u0026ndash;232 (2020).\u003c/li\u003e\n \u003cli\u003eSchmidt, T. S. B., Raes, J. \u0026amp; Bork, P. The Human Gut Microbiome: From Association to Modulation. \u003cem\u003eCell\u003c/em\u003e \u003cstrong\u003e172\u003c/strong\u003e, 1198\u0026ndash;1215 (2018).\u003c/li\u003e\n \u003cli\u003eZhao, Y., Hu, Y., Smith, J. P., Strauss, J. \u0026amp; Yang, G. Cohort profile: the China Health and Retirement Longitudinal Study (CHARLS). \u003cem\u003eInt J Epidemiol\u003c/em\u003e \u003cstrong\u003e43\u003c/strong\u003e, 61\u0026ndash;68 (2014).\u003c/li\u003e\n \u003cli\u003eChen, X. \u003cem\u003eet al.\u003c/em\u003e Venous Blood-Based Biomarkers in the China Health and Retirement Longitudinal Study: Rationale, Design, and Results From the 2015 Wave. \u003cem\u003eAm J Epidemiol\u003c/em\u003e \u003cstrong\u003e188\u003c/strong\u003e, 1871\u0026ndash;1877 (2019).\u003c/li\u003e\n \u003cli\u003eChen, X. 
\u003cem\u003eet al.\u003c/em\u003e Venous Blood-Based Biomarkers in the China Health and Retirement Longitudinal Study: Rationale, Design, and Results From the 2015 Wave. \u003cem\u003eAm J Epidemiol\u003c/em\u003e \u003cstrong\u003e188\u003c/strong\u003e, 1871\u0026ndash;1877 (2019).\u003c/li\u003e\n \u003cli\u003eBlanco-M\u0026iacute;guez, A. \u003cem\u003eet al.\u003c/em\u003e Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. \u003cem\u003eNat Biotechnol\u003c/em\u003e \u003cstrong\u003e41\u003c/strong\u003e, 1633\u0026ndash;1644 (2023).\u003c/li\u003e\n \u003cli\u003eBeghini, F. \u003cem\u003eet al.\u003c/em\u003e Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. \u003cem\u003eeLife\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, e65088 (2021).\u003c/li\u003e\n \u003cli\u003eChen, Y. \u003cem\u003eet al.\u003c/em\u003e SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. \u003cem\u003eGigaScience\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e, (2018).\u003c/li\u003e\n \u003cli\u003eLi, H. \u0026amp; Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e, 1754\u0026ndash;1760 (2009).\u003c/li\u003e\n \u003cli\u003eMcKenna, A. \u003cem\u003eet al.\u003c/em\u003e The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. \u003cem\u003eGenome Res\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 1297\u0026ndash;1303 (2010).\u003c/li\u003e\n \u003cli\u003ePurcell, S. \u003cem\u003eet al.\u003c/em\u003e PLINK: a tool set for whole-genome association and population-based linkage analyses. 
\u003cem\u003eAm J Hum Genet\u003c/em\u003e \u003cstrong\u003e81\u003c/strong\u003e, 559\u0026ndash;575 (2007).\u003c/li\u003e\n \u003cli\u003eWiller, C. J., Li, Y. \u0026amp; Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e26\u003c/strong\u003e, 2190\u0026ndash;2191 (2010).\u003c/li\u003e\n \u003cli\u003eWang, K., Li, M. \u0026amp; Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. \u003cem\u003eNucleic Acids Res\u003c/em\u003e \u003cstrong\u003e38\u003c/strong\u003e, e164 (2010).\u003c/li\u003e\n \u003cli\u003eWang, M., Gao, J., Liu, J., Zhao, X. \u0026amp; Lei, Y. Genomic Association vs. Serological Determination of ABO Blood Types in a Chinese Cohort, with Application in Mendelian Randomization. \u003cem\u003eGenes (Basel)\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 959 (2021).\u003c/li\u003e\n \u003cli\u003ePermutt, T. \u0026amp; Hebel, J. R. Simultaneous-equation estimation in a clinical trial of the effect of smoking on birth weight. \u003cem\u003eBiometrics\u003c/em\u003e \u003cstrong\u003e45\u003c/strong\u003e, 619\u0026ndash;622 (1989).\u003c/li\u003e\n \u003cli\u003eZhu, Z. \u003cem\u003eet al.\u003c/em\u003e Causal associations between risk factors and common diseases inferred from GWAS summary data. 
\u003cem\u003eNat Commun\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 224 (2018).\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Table","content":"\u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"100%\"\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"9\" style=\"width: 100px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eTable 1 |\u003c/strong\u003e \u003cstrong\u003eTen replicated host genetic loci associated with the tongue microbiome.\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eLoci\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eVariant\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMAF\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eTaxon/Functions\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eDiscovery \u0026beta; (\u003cem\u003eP\u003c/em\u003e)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eReplication \u0026beta; (\u003cem\u003eP\u003c/em\u003e)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e4DSZ \u0026beta; (\u003cem\u003eP\u003c/em\u003e)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e\u003cem\u003eP\u003c/em\u003e\u003c/strong\u003e\u003cstrong\u003e\u003csub\u003emeta\u003c/sub\u003e\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 26px;\"\u003e\n 
\u003cp\u003e\u003cstrong\u003ePhenome-wide\u0026nbsp;GWAS\u0026nbsp;information\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"4\" style=\"width: 7px;\"\u003e\n \u003cp\u003e\u003cem\u003eFUT2\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e19:48703374:T:A\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.444\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e\u003cem\u003es. Haemophilus sputorum\u003c/em\u003e_HB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e-0.443 (7.42E-41)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e-0.278 (2.99E-07)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e-0.336 (3.54E-07)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e9.71E-51\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"4\" style=\"width: 26px;\"\u003e\n \u003cp\u003ecancer biomarker measurement (6E-209), Alkaline phosphatase (2.3E-163), serum carcinoembryonic antigen (3E-81), vitamin B12 (4E-36), serum alanine aminotransferase (9E-13), Cholelithiasis (1E-11), BBJ_Gastric ulcer (1.7E-8)\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e19:48703374:T:A\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.444\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e\u003cem\u003es. 
Granulicatella SGB8239\u003c/em\u003e_HB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e-0.251 (1.35E-15)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e-0.184 (5.90E-04)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e-0.256 (1.12E-04)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e3.80E-21\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e19:48703374:T:A\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.444\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e\u003cem\u003es. Veillonella SGB6928\u003c/em\u003e_HB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e-0.264 (3.57E-13)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e-0.273 (1.55E-05)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e-0.227 (1.98E-02)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e2.64E-18\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e19:48709897:C:T\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.472\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003eDARABCATK12-PWY: D-arabinose\u0026nbsp;degradation\u0026nbsp;I_HB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e-0.195 (9.59E-10)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e-0.087 (4.53E-02)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e-0.148 (3.69E-02)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 
4px;\"\u003e\n \u003cp\u003e1.33E-10\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"2\" style=\"width: 7px;\"\u003e\n \u003cp\u003e\u003cem\u003ePOLI,\u0026nbsp;C18orf54\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e18:54497190:A:T\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.297\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e\u003cem\u003es. Haemophilus parahaemolyticus\u003c/em\u003e_HB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e-0.234 (7.76E-12)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e-0.340 (2.06E-09)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e-0.121 (9.12E-02)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e4.47E-19\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 26px;\"\u003e\n \u003cp\u003ecortical\u0026nbsp;thickness\u0026nbsp;(2E-13),\u0026nbsp;neuroticism\u0026nbsp;measurement\u0026nbsp;(8E-10),\u0026nbsp;blood\u0026nbsp;white\u0026nbsp;blood\u0026nbsp;cell\u0026nbsp;counts\u0026nbsp;(4.8E-8)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e18:54499286:T:A\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.295\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003ePWY-7204:\u0026nbsp;pyridoxal\u0026nbsp;5\u0026apos;-phosphate\u0026nbsp;salvage\u0026nbsp;II\u0026nbsp;(plants)_HB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e-0.249 (4.07E-11)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e-0.356 (2.20E-07)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
style=\"width: 7px;\"\u003e\n \u003cp\u003e-0.114 (1.41E-01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e1.25E-16\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e\u003cem\u003eAMY1C\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e1:103823102:T:A\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.397\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003eg. \u003cem\u003eStomatobaculum\u003c/em\u003e_LOGres\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e0.076 (1.05E-11)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e0.144 (4.11E-02)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e0.000 (9.99E-01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e2.22E-10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 26px;\"\u003e\n \u003cp\u003ealpha-amylase\u0026nbsp;1\u0026nbsp;measurement\u0026nbsp;(3E-69),\u0026nbsp;amylase\u0026nbsp;measurement\u0026nbsp;(1E-16)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e\u003cem\u003ePRB3\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e12:11242238:A:C\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.432\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003es. 
\u003cem\u003eStreptococcus infantis\u003c/em\u003e_LOGres\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e-0.072 (4.66E-11)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e-0.091 (4.12E-02)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e-0.056 (1.36E-02)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e1.08E-12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 26px;\"\u003e\n \u003cp\u003ebl_vitamin\u0026nbsp;D\u0026nbsp;(1E-7)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"6\" style=\"width: 7px;\"\u003e\n \u003cp\u003e\u003cem\u003eSLC2A9\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e4:10009906:C:T\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.406\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003eg. 
\u003cem\u003eLachnoanaerobaculum\u003c/em\u003e_LOGres\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e-0.072 (8.97E-11)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e-0.168 (1.23E-02)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e-0.092 (4.80E-05)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e3.46E-15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"6\" style=\"width: 26px;\"\u003e\n \u003cp\u003eurate\u0026nbsp;measurement\u0026nbsp;(9E-3353),\u0026nbsp;blood\u0026nbsp;uric\u0026nbsp;acid\u0026nbsp;(6E-496)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e4:9988548:C:T\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.374\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003eg. \u003cem\u003eCandidatus Nanosyncoccus\u003c/em\u003e_HB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e-0.196 (1.42E-09)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e-0.087 (1.09E-01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e-0.195 (3.88E-03)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e2.80E-11\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e4:10047243:C:T\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.404\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e\u003cem\u003eg. Lachnoanaerobaculum\u0026nbsp;\u003c/em\u003esp. 
ICM7_HB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e-0.188 (2.74E-09)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e-0.212 (6.06E-05)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e-0.260 (1.08E-04)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e5.56E-16\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e4:10014328:G:C\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.380\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003ePWY-6353:\u0026nbsp;purine\u0026nbsp;nucleotides\u0026nbsp;degradation II\u0026nbsp;(aerobic)_LOGres\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e-0.066 (3.01E-09)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e-0.064 (2.21E-02)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e-0.082 (4.20E-04)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e1.05E-12\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e4:10014328:G:C\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.380\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003eSALVADEHYPOX-PWY:\u0026nbsp;adenosine\u0026nbsp;nucleotides\u0026nbsp;degradation II_LOGres\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e-0.066 (3.37E-09)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e-0.065 (2.55E-02)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e-0.081 (4.96E-04)\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e1.58E-12\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e4:10057718:TA:T\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.372\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003es. \u003cem\u003eOribacterium SGB5283\u003c/em\u003e_HB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e0.183 (4.83E-08)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e0.189 (5.75E-04)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003eNA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e1.10E-10\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e\u003cem\u003eHLA-DRA,HLA-DRB5\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e6:32466946:A:T\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.061\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003ep. Bacteroidetes | s. 
\u003cem\u003eGGB1611_SGB2208\u003c/em\u003e_HB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e-0.425 (1.05E-09)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e-0.467 (3.34E-05)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e-0.274 (4.88E-02)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e3.91E-14\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 26px;\"\u003e\n \u003cp\u003erheumatoid\u0026nbsp;arthritis\u0026nbsp;(6.6E-38),\u0026nbsp;autoimmune\u0026nbsp;disease\u0026nbsp;(2.7E-27),\u0026nbsp;staphylococcus\u0026nbsp;seropositivity\u0026nbsp;(2E-26),\u0026nbsp;white\u0026nbsp;blood\u0026nbsp;cell\u0026nbsp;counts\u0026nbsp;(1.1E-17)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e\u003cem\u003eTMCO3\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e13:113536340:CA:C\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.381\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003es. \u003cem\u003eCapnocytophaga\u0026nbsp;\u003c/em\u003esp\u003cem\u003e. 
oral taxon 863\u003c/em\u003e_HB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e0.191 (6.00E-09)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e0.116 (2.54E-02)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e0.090 (1.85E-01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e6.61E-10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 26px;\"\u003e\n \u003cp\u003eblood\u0026nbsp;urea\u0026nbsp;nitrogen\u0026nbsp;(1.6E-4)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e\u003cem\u003eMRPS18A,VEGFA\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e6:43728831:GT:G\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.059\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003es. 
\u003cem\u003eGGB12441_SGB19290\u003c/em\u003e_HB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e0.395 (9.31E-09)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e0.237 (2.96E-02)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e0.375 (1.04E-02)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e5.36E-11\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 26px;\"\u003e\n \u003cp\u003ewhite\u0026nbsp;blood\u0026nbsp;cell\u0026nbsp;count\u0026nbsp;(1.1E-4)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e\u003cem\u003eMTMR12,ZFR\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e5:32449710:T:C\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.167\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003ep. Bacteroidetes | c. 
CFGB570_LOGres\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e0.061 (2.54E-08)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e0.18 (2.36E-02)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003e0 (0.972)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e4.41E-8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 26px;\"\u003e\n \u003cp\u003eheart\u0026nbsp;disease\u0026nbsp;(2.6E-4)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"7\" style=\"width: 7px;\"\u003e\n \u003cp\u003e\u003cem\u003eMTMR6,NUP58\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e13:25293309:T:C\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.411\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003ePWY-2941:L-lysine\u0026nbsp;biosynthesis\u0026nbsp;II_LOGres\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e-0.057 (3.70E-07)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e-0.085 (8.94E-06)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003eNA\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e2.82E-11\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"7\" style=\"width: 26px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e13:25293309:T:C\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.411\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003ePWY-5910: superpathway\u0026nbsp;of\u0026nbsp;geranylgeranyl 
diphosphate\u0026nbsp;biosynthesis\u0026nbsp;I\u0026nbsp;(via\u0026nbsp;mevalonate)_LOGres\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e-0.054 (1.21E-06)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e-0.114 (2.32E-05)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003eNA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e2.17E-10\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e13:25293309:T:C\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.411\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003eLACTOSECAT-PWY:\u0026nbsp;lactose\u0026nbsp;and\u0026nbsp;galactose\u0026nbsp;degradation\u0026nbsp;I_LOGres\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e-0.057 (2.39E-07)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e-0.108 (2.49E-04)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003eNA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e2.62E-10\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e13:25293309:T:C\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.411\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003eo. 
Lactobacillales_LOGres\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e-0.051 (2.74E-06)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e-0.175 (1.52E-05)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003eNA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e4.07E-10\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e13:25293309:T:C\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.411\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003ePWY-922: mevalonate\u0026nbsp;pathway\u0026nbsp;I\u0026nbsp;(eukaryotes\u0026nbsp;and\u0026nbsp;bacteria)_LOGres\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e-0.057 (2.92E-07)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e-0.087 (3.96E-03)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003eNA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e6.64E-10\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e13:25293309:T:C\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.411\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003ec. 
Bacilli_LOGres\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e-0.052 (2.43E-06)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e-0.166 (3.53E-05)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003eNA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e1.65E-09\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e13:25293309:T:C\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e0.411\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003eg. \u003cem\u003eStreptococcus\u003c/em\u003e_LOGres\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e-0.049 (7.34E-06)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 9px;\"\u003e\n \u003cp\u003e-0.171 (2.26E-05)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 7px;\"\u003e\n \u003cp\u003eNA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 4px;\"\u003e\n \u003cp\u003e4.14E-09\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"9\" style=\"width: 100px;\"\u003e\n \u003cp\u003eTen loci were identified through M-GWAS, with eight reaching study-wide significance. Association statistics include effect size (\u0026beta;) and \u003cem\u003eP\u003c/em\u003e values for the discovery cohort, replication cohort and combined 4DSZ cohort, along with the meta-analysis \u003cem\u003eP\u003c/em\u003e values (\u003cem\u003eP\u003c/em\u003e\u003csub\u003emeta\u003c/sub\u003e). Previously reported phenotype associations from phenome-wide GWAS are annotated where available. 
MAF, minor allele frequency.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"nature-portfolio","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Nature Portfolio","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"ejp","reportingPortfolio":"","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-8406553/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8406553/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"Host genetic influence on the oral microbiota, their functions, and associations with host phenotypes remain under investigated. Here, we present a large-scale genome-wide association study of the tongue dorsum microbiome in 13,397 Chinese participants from the CHARLS-4DSZ dataset, identifying ten genome-wide significant and replicated loci associated with 17 microbial taxa, eight pathways, and 1,783 gene families. The strongest signal maps to the missense variant FUT2 I140F, which regulates three taxa, with the most significant effect on Haemophilus sputorum (P = 9.71 × 10−51), and this association is independent of ABO blood groups. FUT2 is further associated with microbial D-arabinose degradation pathway and 134 gene families, including α-L-fucosidases and ABC transporters, implicating fucose-mediated host genetic regulation of microbial metabolism. Most identified microbiome-associated loci are functionally interpretable, affecting tissue/single-cell gene expression and linking to host immunometabolic traits: the POLI locus associates with Haemophilus parahaemolyticus and influences white blood cell counts and triglyceride levels, while SLC2A9 (urate transporter) regulates serum uric acid and uric acid-degrading bacteria harboring uric acid-utilizing gene clusters. 
Additionally, 239 significant associations are observed between 94 microbial features and 43 host phenotypes. Mendelian randomization further confirms eleven causal relationships, including microbial effects on blood gamma-glutamyl transferase, creatine, and uric acid, suggesting microbial roles in the host liver and kidney metabolism. Together, our study provides a comprehensive map of oral microbiome genetics, advancing mechanistic understanding of host-microbe interactions.","manuscriptTitle":"Host genetic control of the oral microbiome and its links to human metabolism and immunity","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-01-16 11:02:58","doi":"10.21203/rs.3.rs-8406553/v1","editorialEvents":[],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"nature-genetics","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"ng","sideBox":"Learn more about [Nature Genetics](http://www.nature.com/ng/)","snPcode":"","submissionUrl":"","title":"Nature Genetics","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Nature Research","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"cea79724-088e-481c-b6b0-d115846ae3f2","owner":[],"postedDate":"January 16th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":60593078,"name":"Biological sciences/Genetics/Genetic association study/Genome-wide association studies"},{"id":60593079,"name":"Biological sciences/Immunology"}],"tags":[],"updatedAt":"2026-03-20T18:35:31+00:00","versionOfRecord":[],"versionCreatedAt":"2026-01-16 11:02:58","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8406553","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8406553","identity":"rs-8406553","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}