A Map of Multi-omics Quantitative Trait Loci in a Chinese Population Reveals Regulatory Variations and Disease Links

2026 · doi:10.21203/rs.3.rs-8193543/v1

preprint OA: closed

Peilin Jia, Peng Yang, Shuhua Li, Qiwen Zheng, Xinxuan Liu, Siyu Pan, and 12 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8193543/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review · Version 1

Abstract

Quantitative trait loci (QTL) studies have been pivotal in mapping the genetic regulation of molecular traits but have been primarily conducted in European populations, limiting insights into diverse ethnic groups. To close this knowledge gap, we conducted large-scale multi-omics QTL analyses using blood samples from 3,102 Chinese individuals, systematically characterizing the regulatory effects of genetic variants on DNA methylation, protein levels, and metabolites.
Our study identified 209 protein QTLs (pQTLs) for 155 proteins and 587 metabolite QTLs (metabQTLs) for 369 metabolites. By integrating these findings with cis-methylation QTL (meQTL) associations identified in our previous work, we defined the shared genetic architecture across these three molecular layers. Colocalization analyses, both within our cohort and with external xQTLs, revealed 3,665 pairs of shared causal variants across traits, supported by strong mediation evidence for a regulatory cascade in 187 pairs. To link these molecular findings to health outcomes, we performed Mendelian randomization (MR) analyses, identifying 497 potential causal relationships between molecular traits and diseases. These findings were further validated through observational and colocalization studies. Collectively, we present a comprehensive genomic atlas of meQTLs, pQTLs, and metabQTLs specific to East Asian populations, providing critical insight into shared regulatory networks and candidate causal variants across molecular and disease phenotypes.

Subject areas: Biological sciences/Genetics/Genetic association study/Genome-wide association studies; Biological sciences/Genetics/Population genetics; Biological sciences/Biochemistry/Metabolomics; Biological sciences/Biochemistry/Proteomics; Biological sciences/Genetics/Genomics/Epigenomics

Keywords: methylation QTL (meQTL), protein QTL (pQTL), metabolite QTL (metabQTL), East Asian cohorts, colocalization, mendelian randomization

Introduction

Genome-wide association studies (GWASs) have identified thousands of genetic variants associated with a wide range of complex traits and diseases 1. However, the majority of these variants are located in noncoding regions 2, suggesting that they exert their effects through regulatory mechanisms 3, 4.
Quantitative trait loci (QTL) analyses have emerged as powerful tools for elucidating the genetic architecture of molecular phenotypes and understanding the regulatory functions of genetic variants in disease pathogenesis 5–7. Among these, DNA methylation QTL (meQTL) studies have identified genetic variants associated with CpG methylation levels. For example, Min et al. integrated data from 36 cohorts comprising over 30,000 individuals and identified more than 270,000 independent meQTLs 8. Similarly, Huan et al. reported over 4.7 million meQTLs in a study of 4,170 individuals 9. Recently, we conducted a meQTL study in an East Asian population, analyzing whole blood samples from approximately 5,000 Chinese individuals 10. Protein QTL (pQTL) analyses have also advanced, including the UK Biobank Pharma Proteomics Project (UKB-PPP) 11 and the deCODE 12 projects, which mainly focus on individuals of European ancestry. Furthermore, the Canadian Longitudinal Study on Aging identified 1,702 metabolite QTL (metabQTL) associations for 690 metabolites in over 8,000 Canadians 13. Notably, most of these QTL studies have focused on European populations, leaving regulatory genetic mechanisms in other ancestries, such as East Asians, underexplored 7–9. Given the distinct genetic architecture, dietary patterns, and lifestyles in different populations, findings from European studies may not be directly transferable to East Asian populations 13–18.

Integrating multi-omics data is essential for disentangling the regulatory mechanisms underlying human traits and diseases. Analyzing multiple molecular layers within the same cohort can reduce the confounding effects from linkage disequilibrium (LD) and enhance the detection of shared genetic signals 19.
For example, a recent study 20 integrating transcriptomic, proteomic, and metabolomic QTLs revealed extensive genetic regulation of molecular traits and identified therapeutic targets such as WARS1 for hypertension and IFNAR2 for COVID-19. Such integrative analyses provide a holistic view of the interactions among epigenetic modifications, gene expression, protein activity, and metabolism, thereby accelerating biomarker discovery and therapeutic development.

In the current study, we performed pQTL and metabQTL mapping using plasma protein and plasma metabolite data from 3,102 genotyped Chinese individuals (Fig. 1; Figure S1). Through colocalization, mediation, and partial correlation analyses, we identified shared genetic mechanisms across gene expression QTLs (eQTLs), meQTLs, pQTLs, and metabQTLs. Our analyses revealed complex regulatory interactions among epigenetic modifications, gene expression, protein abundance, and metabolite levels. Furthermore, by leveraging data from BioBank Japan (BBJ), we conducted two-sample Mendelian randomization (MR) analyses to assess the causal effects of proteins and metabolites on 220 traits and diseases in East Asians 21. Overall, our study expands the QTL data available for East Asian populations, providing valuable insights into the regulatory mechanisms underlying complex traits and diseases and supporting the discovery of potential biomarkers and therapeutic targets.

Results

Overview of the Chinese Academy of Sciences (CAS) Cohort Multi-omics Data

The CAS cohort is a prospective study with baseline data collected between August 2020 and October 2021. The clinical characteristics of 3,102 participants are summarized in Table S1. Blood samples were collected after overnight fasting.
We generated genotyping data (Illumina Infinium Asian Screening Array + MultiDisease-24 array followed by imputation for all participants), protein abundances (the Olink Explore 384 Inflammation panel 22 for 1,056 participants), and metabolite levels (untargeted mass spectrometry for 2,470 participants), and combined them with previously reported methylation data (Illumina Infinium MethylationEPIC BeadChip for 1,060 participants). Building upon these resources, we conducted pQTL, protein ratio QTL (prQTL), metabQTL, and metabolite ratio QTL (mrQTL) analyses, generating a comprehensive atlas of multi-omics QTLs (xQTLs) for the East Asian population. MeQTL analysis for the CAS cohort was reported in our previous study 10, in which we identified 20,917,614 meQTLs (14,508,327 cis-meQTLs, involving 2,121,167 SNPs and 170,047 CpGs) and successfully replicated 93.81% of them in cross-cohort comparisons. In this study, we primarily focused on protein and metabolite QTLs and conducted integrative analyses across all types of omics data.

Genetic Structure of Immune Plasma Proteome

We conducted association analyses on the plasma protein abundances for 384 proteins measured in 1,056 CAS cohort participants (Table S2). In total, 19,719 significant pQTLs involving 18,577 variants (pSNPs) were identified for 155 proteins (P < 5 × 10⁻⁸; Fig. 1b, Figure S2, Table S3), representing 209 independent loci: 154 (74%) cis- and 55 (26%) trans-pQTLs. The genomic inflation factor lambda for all proteins ranged from 0.971 to 1.022, with an average value of 0.997, indicating no significant inflation (Table S2). Of the 155 proteins, 106 (68%) had only cis-pQTLs, 28 (18%) had only trans-pQTLs, and 21 (14%) had both. Each protein was associated with one to four independent loci, with most (70.3%) having a single pQTL and seven (4.5%) having three or more pQTLs (Figure S3b). SNP-based heritability of 155 proteins was estimated using all independent lead pQTLs (Methods; Fig. 2a, d-e; Table S4).
The median variance explained was 0.086 (range: 0.024 for the protein MEPE to 0.609 for PNLIPRP2) for 106 proteins with only cis-pQTLs, 0.049 (range: 0.025 for BSG to 0.357 for PTPRM) for 28 proteins with only trans-pQTLs, and 0.108 (range: 0.049 for EPHA1 to 0.498 for ENPP7) for 21 proteins with both cis- and trans-pQTLs. Most variants had a single associated protein, while 31 variants were linked to up to nine proteins (Figure S3c). A notable hotspot was observed at the ABO locus (rs8176693), associated with nine proteins: CD200, CD79B, CLEC4G, CTRC, ICAM4, IFNGR1, IL3RA, LIFR, and PTPRM (Fig. 2b). Notably, the associations between the ABO locus and CTRC and CD200 have been reported previously 23, confirming our findings.

To assess the novelty of the identified pQTLs, we compared our results with 15 recent studies (two East Asian and 13 European; Table S5). Overall, 104 pQTLs were replicated in East Asian datasets (considering LD R² > 0.8) and 161 pQTLs were replicated in European datasets, with matching alleles and effect directions. Importantly, 34 (16.3%) pQTLs were absent from both populations and were deemed novel (Fig. 2c). Further comparison with the UKB-PPP 11 study, which profiled 2,923 proteins in 54,219 participants (262 East Asians and 34,557 Europeans) and covered all 384 inflammation-related proteins in our study, showed that 12.9% and 56.9% of the 209 independent pQTLs were replicated in the UKB-PPP East Asian and European data, respectively (P < 5 × 10⁻⁸). Effect sizes were highly concordant, with Pearson correlation coefficients (r) of 0.99 for the UKB-PPP East Asian data and 0.93 for the UKB-PPP European data (Figure S4a-b). Similar replication patterns were observed for all 19,719 significant associations (8% in East Asians and 62.5% in Europeans; r = 0.98 and 0.92, respectively; Figure S4c-d).
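
The replication criteria described above (matching alleles, matching effect directions, correlated effect sizes) can be illustrated with a minimal sketch. The record layout, the function names `harmonize` and `concordance`, and the toy SNP identifiers and effect sizes are assumptions made for illustration; they are not taken from the study's pipeline, and strand flips and LD proxies are not handled.

```python
from math import sqrt

def harmonize(discovery, replication):
    """Align replication betas to the discovery effect allele.

    Each record is (effect_allele, other_allele, beta). Returns a list of
    (beta_discovery, beta_replication) pairs. Variants whose alleles cannot
    be matched are dropped.
    """
    pairs = []
    for snp, (ea, oa, beta) in discovery.items():
        if snp not in replication:
            continue
        r_ea, r_oa, r_beta = replication[snp]
        if (r_ea, r_oa) == (ea, oa):
            pairs.append((beta, r_beta))
        elif (r_ea, r_oa) == (oa, ea):
            pairs.append((beta, -r_beta))  # alleles swapped: flip the sign
    return pairs

def concordance(pairs):
    """Return (fraction of pairs with the same sign, Pearson r)."""
    n = len(pairs)
    same = sum(1 for x, y in pairs if x * y > 0) / n
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return same, sxy / sqrt(sxx * syy)

# Hypothetical toy data: snp2 has swapped alleles, snp4 has mismatched alleles.
discovery = {
    "snp1": ("A", "G", 0.50),
    "snp2": ("C", "T", -0.30),
    "snp3": ("G", "A", 0.20),
    "snp4": ("T", "C", 0.10),
}
replication = {
    "snp1": ("A", "G", 0.45),
    "snp2": ("T", "C", 0.28),
    "snp3": ("G", "A", 0.22),
    "snp4": ("A", "C", 0.05),
}
pairs = harmonize(discovery, replication)
same_direction, pearson_r = concordance(pairs)
```

Flipping the sign when the effect and reference alleles are swapped is what makes "matching effect direction" meaningful across studies; without it, a replicated signal can look discordant purely because of allele coding.
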

In addition to single-protein QTLs, we analyzed protein-ratio QTLs (prQTLs) and identified 13 significant prQTLs involving seven protein ratios, of which several shared genetic signals with one or more individual proteins (see Supplementary Materials for details).

Genetic Structure of Plasma Metabolites

We applied precise metabolomics techniques to semi-quantitatively analyze 841 plasma metabolites from 2,470 participants in the CAS cohort 24. These metabolites are classified into 10 superclasses 25, including lipids and organic acids (Table S8). Our analyses identified 45,001 metabQTLs (587 independent) for 369 metabolites across 194 loci at P < 5 × 10⁻⁸ (Fig. 1b; Fig. 3a; Table S9). The genomic inflation factor ranged from 0.97 to 1.03 with a median of 1.00, indicating minimal confounding from population structure (Table S10). Significant pleiotropy was observed, with 86 of the 194 loci associated with multiple metabolites (range: 2–28). The fatty acid desaturase (FADS) locus on chromosome 11 (lead SNP: rs174548) exhibited the strongest pleiotropy, associating with 28 metabolites, mainly lipids or lipid-like molecules (27/28) (Fig. 3b), consistent with previous studies linking this locus to 75 lipids 13. SNP-based heritability of metabolite levels was estimated using GCTA-GREML 26 (Table S11), with a median of 0.103. Lipids, lipid-like molecules, and organic oxygen compounds exhibited higher heritability, whereas alkaloids and nucleosides were lower (Fig. 3c).

To evaluate the novelty of the identified metabQTLs, we curated a comparison set of 50 recent studies, comprising two from East Asian, 39 from European, and nine from other populations 27 (Table S12).
Our results demonstrated significant validation: nine of 587 metabQTLs were replicated in East Asian cohorts (covering 8 of 9 previously investigated metabolites), and 191 (covering 130 of 171 previously investigated metabolites) were validated in European populations, for a total of 193 replicated metabQTLs (Table S10). The effect directions also showed strong concordance: they were fully concordant with East Asian studies and > 90% concordant with European studies 13, 16, 28 (Figure S5). This suggested largely shared but partially distinct genetic architectures across populations. Overall, 394 metabQTLs were identified as novel, involving 261 metabolites across all 10 superclasses and spanning all 22 chromosomes (Fig. 3d).

In addition, a metabolite ratio QTL (mrQTL) analysis was performed. A total of 256 biologically informed metabolite ratios were analyzed, exhibiting a median heritability of 0.110. From these, 129 independent mrQTLs were identified across 58 loci, including 102 novel mrQTLs. Notably, 31 mrQTLs showed no significant association with their constituent metabolites, emphasizing that ratio-based analysis can reveal genetic determinants missed by examining individual metabolites. Detailed methods and results are provided in Supplementary Materials.

Shared Causal Variants Across Molecular Traits: Within the CAS Cohort

Collectively, we generated xQTLs using our CAS cohort (CAS-meQTL, CAS-pQTL, and CAS-metabQTL) and performed two complementary colocalization analyses to investigate shared genetic regulation of molecular traits. First, we used our within-cohort xQTLs to circumvent LD confounding and identify multi-omics causal variants. Second, we integrated external eQTLs to assess general and tissue-specific overlap. Throughout these analyses, effect direction refers to the allelic effect of the lead SNP (i.e., the SNP with the highest PP.H4 within each colocalized pair) on each pair of molecular traits.
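
To make the PP.H4 quantity concrete, the following is a minimal sketch of a two-trait colocalization posterior under the single-causal-variant assumption, using Wakefield approximate Bayes factors and five hypotheses (H0 to H4). It assumes the commonly used formulation behind PP.H4-style outputs; the prior standard deviation of 0.15, the priors `p1`, `p2`, `p12`, the function names, and the toy summary statistics are all assumptions for illustration, and the configuration actually used in this study is described in its Methods, which are not reproduced in this section.

```python
from math import exp, log

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + log(sum(exp(v - m) for v in values))

def log_abf(beta, se, prior_sd=0.15):
    """Wakefield log approximate Bayes factor for one SNP (assumed prior_sd)."""
    v = se ** 2
    w = prior_sd ** 2
    r = w / (w + v)
    z = beta / se
    return 0.5 * (log(1.0 - r) + r * z * z)

def coloc_posteriors(betas1, ses1, betas2, ses2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Return ([PP.H0..PP.H4], index of the SNP with the highest H4 support)."""
    l1 = [log_abf(b, s) for b, s in zip(betas1, ses1)]
    l2 = [log_abf(b, s) for b, s in zip(betas2, ses2)]
    n = len(l1)
    h0 = 0.0                                   # no association with either trait
    h1 = log(p1) + log_sum_exp(l1)             # trait 1 only
    h2 = log(p2) + log_sum_exp(l2)             # trait 2 only
    # H3: both traits associated, through two different causal SNPs
    h3 = log(p1) + log(p2) + log_sum_exp(
        [l1[i] + l2[j] for i in range(n) for j in range(n) if i != j]
    )
    # H4: both traits associated through the same causal SNP
    shared = [a + b for a, b in zip(l1, l2)]
    h4 = log(p12) + log_sum_exp(shared)
    hyps = [h0, h1, h2, h3, h4]
    denom = log_sum_exp(hyps)
    lead = max(range(n), key=lambda i: shared[i])
    return [exp(h - denom) for h in hyps], lead

# Toy region of five SNPs (hypothetical numbers).
ses = [0.05] * 5
trait1 = [0.0, 0.5, 0.0, 0.0, 0.0]
same_snp = [0.0, 0.4, 0.0, 0.0, 0.0]   # trait 2 signal at the same SNP
other_snp = [0.0, 0.0, 0.0, 0.4, 0.0]  # trait 2 signal at a different SNP

pp_same, lead_same = coloc_posteriors(trait1, ses, same_snp, ses)
pp_diff, _ = coloc_posteriors(trait1, ses, other_snp, ses)
```

In the first toy case almost all posterior mass falls on H4 (shared variant), while in the second it falls on H3 (distinct variants). The H3 term is enumerated over all SNP pairs here for clarity rather than speed.
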

Colocalization analysis of CAS-meQTLs and CAS-pQTLs revealed 785 CpG-protein pairs (681 CpG sites, 125 proteins; Table S17). SNPs predominantly exhibited opposing effects on methylation and protein levels (54.9%) (Fig. 4a), with negative correlations between methylation and protein levels (Fig. 4b). This is consistent with the hypothesis that methylation suppresses gene expression and lowers protein levels. For example, at cg25259754-FCRL3 (mRNA)-FCRL3 (protein), all three signals were colocalized. The lead SNP was rs2210913, where the T allele increased FCRL3 mRNA (eQTL: beta = 0.73, P = 3.27 × 10⁻³¹⁰) and protein (pQTL: beta = 1.1, P = 2.97 × 10⁻²⁰²) while decreasing methylation at cg25259754 (meQTL: beta = -0.5, P = 3.66 × 10⁻¹⁴²). cg25259754 was strongly negatively correlated with FCRL3 protein (Pearson's r = -0.54, P = 4.45 × 10⁻⁸¹). In contrast, at cg10700560-LHPP (mRNA)-LHPP (protein), all three signals were colocalized (Fig. 4e), where the T allele of lead SNP rs11245086 showed negative effects on all three traits (meQTL: beta = -0.31, P = 3.28 × 10⁻¹⁷; eQTL: beta = -0.37, P = 3.27 × 10⁻³¹⁰; pQTL: beta = -0.34, P = 4.71 × 10⁻²⁰), while cg10700560 was positively correlated with LHPP protein (Pearson's r = 0.19, P = 8.13 × 10⁻¹⁰).

Colocalization analysis of CAS-meQTLs and CAS-metabQTLs revealed 2,874 CpG-metabolite pairs (984 CpG sites, 261 metabolites; Table S18). SNP effects on methylation and metabolite levels predominantly exhibited concordant direction (53.2%) (Fig. 4a), corresponding to positive associations between methylation and metabolite levels (Fig. 4b). DNA methylation may regulate metabolites by modulating the expression of enzyme genes. For example, colocalization was observed among cg21029357, FADS2, and docosapentaenoic acid (Fig. 4f).
The A allele of the lead SNP rs174559 was associated with decreased methylation at cg21029357 (meQTL: beta = -0.17, P = 1.69 × 10⁻⁷), increased FADS2 mRNA expression (eQTL: beta = 0.64, P = 3.27 × 10⁻³¹⁰), and reduced docosapentaenoic acid levels (metabQTL: beta = -0.19, P = 4.76 × 10⁻¹¹). Additionally, cg21029357 showed a significant positive correlation with docosapentaenoic acid (Pearson's r = 0.09, P = 0.0025).

Colocalization analysis of CAS-pQTLs and CAS-metabQTLs revealed six protein-metabolite pairs (five proteins, five metabolites; Table S19), with SNP effects concordant in three pairs and opposite in the other three (Fig. 4a). Four pairs showed negative associations (Fig. 4b).

Shared Causal Variants Across Molecular Traits: Colocalization with External eQTLs

Colocalization analysis with external eQTL datasets included eQTLGen 29, which encompassed blood eQTLs, and GTEx 30, which encompassed eQTLs for 49 tissues grouped into five categories (epithelial, immune, mesenchymal, neural, and others), allowing for the assessment of tissue-specific colocalization 31. Using CAS-meQTL and eQTLGen, we identified 20,543 CpG-gene pairs (20,543 CpG sites and 5,636 genes), with 52% of lead SNPs showing opposite effects on methylation and expression (Fig. 4c; Table S20). Colocalization analyses with GTEx tissue eQTLs identified 6,421 colocalized pairs (6,421 CpGs and 1,176 genes), with 50.4% of SNPs showing opposite effects (Fig. 4c; Table S21). GTEx analyses highlighted tissue-specific regulation: 36.6% (2,351/6,421) of pairs were category-specific, while 17% were found across all tissue categories (Fig. 4d). The number of colocalized pairs varied significantly by tissue, ranging from the fewest in kidney cortex (n = 175, 44% effect consistency) to the most in whole blood (n = 1,595, 51.7% opposite effects).
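
The category-specific versus shared split reported above amounts to counting how many tissue categories each colocalized pair spans. A minimal sketch follows; the tissue-to-category assignments, the pair identifiers, and the function name `classify_pairs` are hypothetical placeholders, since the mapping used in the study is not given in this section.

```python
CATEGORIES = ("epithelial", "immune", "mesenchymal", "neural", "others")

def classify_pairs(pair_tissues, tissue_category):
    """Label each colocalized pair by the tissue categories it spans."""
    labels = {}
    for pair, tissues in pair_tissues.items():
        spanned = {tissue_category[t] for t in tissues}
        if len(spanned) == 1:
            labels[pair] = "category-specific"
        elif len(spanned) == len(CATEGORIES):
            labels[pair] = "all categories"
        else:
            labels[pair] = "partially shared"
    return labels

# Hypothetical tissue-to-category mapping (illustration only).
tissue_category = {
    "Whole_Blood": "immune",
    "Spleen": "immune",
    "Lung": "epithelial",
    "Muscle_Skeletal": "mesenchymal",
    "Brain_Cortex": "neural",
    "Testis": "others",
}

# Hypothetical colocalized pairs and the tissues in which they colocalize.
pair_tissues = {
    ("cpg_A", "GENE1"): ["Whole_Blood", "Spleen"],
    ("cpg_B", "GENE2"): ["Whole_Blood", "Lung", "Muscle_Skeletal",
                         "Brain_Cortex", "Testis"],
    ("cpg_C", "GENE3"): ["Lung", "Brain_Cortex"],
}
labels = classify_pairs(pair_tissues, tissue_category)

# The reported share of category-specific CpG-gene pairs, recomputed.
category_specific_pct = round(100 * 2351 / 6421, 1)
```

A pair seen in several tissues of the same category still counts as category-specific under this definition, which is why the grouping of the 49 tissues into five categories matters for the reported percentages.
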

Colocalization analysis between CAS-pQTLs and eQTLs for the corresponding genes confirmed a modest overall degree of colocalization, consistent with prior studies 19. Using eQTLGen, we identified 21 gene-protein pairs where the allelic effects on mRNA and protein abundance were highly concordant (95.2% of SNPs; Fig. 4c; Table S22). Colocalization with GTEx tissue eQTLs yielded seven gene-protein pairs (Fig. 4c; Table S23). Of these, four were category-specific and three were shared (Fig. 4d). Notably, BTN3A2 demonstrated the most pervasive regulation, colocalizing across all 49 tissues, which suggests a broadly active regulatory mechanism.

Colocalization analysis between CAS-metabQTLs and eQTLs identified 362 potential effector genes for metabQTLs associated with 230 metabolites (Table S24). In eQTLGen, we discovered colocalization evidence for 134 metabolites and 105 genes, including the key example glutarylcarnitine-GCDH (colocalization PP = 0.99; opposite effect direction) and deoxycholic acid 3-glucuronide with UGT2B17 (PP = 0.88; same effect direction). Interestingly, GCDH encodes glutaryl-CoA dehydrogenase, and its deficiency leads to glutarylcarnitine accumulation. In GTEx, 955 metabolite-gene pairs (170 metabolites, 285 genes) across 49 tissues (Table S25) were detected. We identified 181 pairs (35 genes and 76 metabolites) in whole blood, of which 84 pairs (46.4%) replicated the eQTLGen findings. The GTEx data also demonstrated regulatory diversity: 57.5% (549/955) of pairs were category-specific, while only 12.3% (117/955) of pairs spanned all tissue categories (e.g., glutarylcarnitine-GCDH in 44 tissues), highlighting widespread tissue-specific regulatory mechanisms (Figs. 4d and S8).

Mediation and Partial Correlation Analyses

Our CAS cohort contained 1,054 individuals with matched genotype, methylation, proteomic, and metabolomic profiles, enabling mediation and partial correlation analyses that would otherwise not be possible.
Using 3,665 colocalized pairs (785 CpG-protein pairs, 2,874 CpG-metabolite pairs, and six protein-metabolite pairs), we performed bidirectional mediation analyses to evaluate whether the effects of genetic variants on one molecular trait were mediated through another, and vice versa 32 – 34 . We identified 187 mediation pairs (Sobel P 0) (Fig. 5 a) across four models: SNP-CpG-Protein (SCP), SNP-Protein-CpG (SPC), SNP-CpG-Metabolite (SCM), and SNP-Metabolite-CpG (SMC) ( Table S26 ). Among these, 32 pairs exhibited unidirectional mediation, while 155 exhibited bidirectional mediation. The overall median mediation proportion was 0.095 (Fig. 5 b), with the SCM model demonstrating the highest estimates (median = 0.146) and the SMC model the lowest (median = 0.056). After Bonferroni correction (Sobel P 0), 17 significant pairs remained (8 unidirectional, 9 bidirectional) (Fig. 5 a), with an increased median mediation proportion of 0.249 (Fig. 5 b). The SCM model again had the highest mediation proportion (median = 0.416). We conducted partial correlation analysis on the 3,665 colocalized pairs to determine if the relationship between the two colocalized molecular phenotypes was independent of the causal genetic variant. This analysis tests for pleiotropy: if the variants independently influence both traits, the SNP-adjusted residuals should exhibit no residual correlation 32 . As expected, correlations decreased after adjusting for genetic effects (Fig. 5 c). Out of all pairs, 315 (8.6%) showed a significant partial correlation ( P < 0.05), with only 25 pairs (0.7%) remaining significant after Bonferroni correction ( Table S27 ). The results from the partial correlation analysis highly align with the mediation analysis: all 187 significant mediation pairs were captured within the 315 significant partial correlations (Fig. 5 d). 
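
The two analyses above can be sketched on simulated data for one SNP-CpG-Protein (SCP) model. This is a minimal illustration, assuming a standard product-of-coefficients mediation estimate with a Sobel test and a partial correlation computed on SNP-adjusted residuals; the simulated effect sizes, the function names, and the definition of the mediation proportion as indirect effect over total effect are assumptions, not the study's exact procedure.

```python
import random
from math import erfc, sqrt

def simple_ols(x, y, extra_params=0):
    """Fit y ~ intercept + x; return (slope, standard error, residuals).

    extra_params counts covariates already regressed out of both x and y,
    so that the residual degrees of freedom stay correct.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    resid = [(b - my) - slope * (a - mx) for a, b in zip(x, y)]
    sigma2 = sum(r * r for r in resid) / (n - 2 - extra_params)
    return slope, sqrt(sigma2 / sxx), resid

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mediation(snp, mediator, outcome):
    a, se_a, m_res = simple_ols(snp, mediator)       # SNP -> mediator
    total, _, y_res = simple_ols(snp, outcome)       # SNP -> outcome (total)
    # Mediator -> outcome adjusted for the SNP, obtained by regressing the
    # SNP-adjusted outcome on the SNP-adjusted mediator.
    b, se_b, _ = simple_ols(m_res, y_res, extra_params=1)
    indirect = a * b
    sobel_z = indirect / sqrt(b * b * se_a * se_a + a * a * se_b * se_b)
    return {
        "a": a,
        "b": b,
        "proportion": indirect / total,
        "sobel_z": sobel_z,
        "sobel_p": erfc(abs(sobel_z) / sqrt(2)),
        # Correlation that remains once the SNP effect is removed from both.
        "partial_r": pearson(m_res, y_res),
    }

# Simulated SCP example (all effect sizes are made up).
rng = random.Random(0)
n = 5000
snp = rng.choices([0, 1, 2], weights=[0.25, 0.5, 0.25], k=n)
cpg = [0.5 * g + rng.gauss(0, 1) for g in snp]
protein = [0.4 * m + 0.2 * g + rng.gauss(0, 1) for m, g in zip(cpg, snp)]
result = mediation(snp, cpg, protein)
```

In this simulation the CpG truly sits between the SNP and the protein, so the Sobel statistic is large and the partial correlation stays clearly non-zero. Under pure pleiotropy, with no CpG term in the protein model, both would shrink toward zero.
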

For example, the rs28548211–cg09540471–AMN locus, where cg09540471 is located in the first exon of AMN, showed strong mediation and partial correlation. rs28548211 had a significant effect on cg09540471 (meQTL: beta = -0.34, P = 8.28 × 10⁻¹⁰) and on AMN (pQTL: beta = 0.37, P = 4.63 × 10⁻⁸). Mediation analysis found that cg09540471 methylation mediated 22.6% of rs28548211's effect on AMN and maintained a residual negative correlation after adjustment for the SNP, suggesting that cg09540471 methylation may causally influence AMN levels. Similarly, the rs11245086–cg10700560–LHPP locus exhibited bidirectional mediation (SCP and SPC Sobel P < 0.05) and significant partial correlation (Pearson's P < 0.05). After removing the effect of rs11245086, cg10700560 and LHPP remained positively correlated, further supporting the regulatory role of cg10700560 methylation in LHPP expression. Additionally, we identified the triplet rs174559–cg21029357–docosapentaenoic acid exhibiting bidirectional mediation, with significant Sobel P-values for both SCM and SMC (P < 0.05) and significant partial correlation (Pearson's P < 0.05). After adjusting for rs174559, cg21029357 and docosapentaenoic acid maintained a significant positive correlation, providing further evidence that cg21029357 methylation plays a regulatory role in docosapentaenoic acid levels.

Causal Associations Between Proteins or Metabolites and Human Complex Phenotypes

We performed two-sample MR analysis 35 to evaluate potential causal relationships between protein/metabolite levels and complex human phenotypes. Specifically, we used proteins/metabolites involved in QTLs as exposures (Tables S28-29), while outcome data were sourced from GWAS results for 220 phenotypes in the Biobank Japan (BBJ, East Asian) cohort 21 (Table S30).
After Bonferroni correction (P < 2.27 × 10⁻⁴; 0.05/220), we identified 229 significant protein-outcome associations (55 proteins and 62 outcomes) and 290 significant metabolite-outcome associations (86 metabolites and 86 outcomes). Reverse MR indicated bidirectional causality for two protein-outcome pairs (C1QA with serum creatinine; ICAM4 with red blood cell count) and 20 metabolite-outcome pairs (15 metabolites, with 10 pairs driven by total bilirubin). These associations were excluded from subsequent analyses, yielding 227 protein–outcome and 270 metabolite–outcome pairs (Tables S31-32).

We employed two distinct strategies to validate our MR results: observational consistency analysis and colocalization analysis. First, we examined linear relationships between 25 biomarker outcomes and their corresponding exposures in the CAS cohort. Among 105 protein-outcome pairs, 40 (38.1%) exhibited statistically significant observational associations (P < 0.05), with 26 (65%) showing directionally consistent regression coefficients relative to MR beta estimates (Table S33; Figure S9). For 130 metabolite-outcome pairs, 69 (53.1%) were significant (P < 0.05), with 45 (65.2%) demonstrating consistency with MR beta estimates (Table S34; Figure S9). This concordance between observational and MR-based associations supports the reliability of our causal inference framework. Second, we performed colocalization analysis to validate shared genetic determinants between exposures and outcomes. Specifically, we found that 132 (58.1%) protein–outcome pairs and 169 (62.6%) metabolite–outcome pairs showed significant colocalization (PP > 0.75; Methods), involving 81 exposures (34 proteins and 47 metabolites) and 89 outcomes (Tables S31-32). The colocalization results suggested that protein/metabolite levels and the related outcomes may be co-regulated by shared genetic variants, further supporting the robustness of our MR results.
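
For readers less familiar with two-sample MR, here is a minimal sketch of two standard estimators, the single-instrument Wald ratio and the fixed-effect inverse-variance weighted (IVW) estimate, together with the Bonferroni threshold quoted above. This section does not state which estimator the study used, so the choice is an assumption. The function names are illustrative, and the outcome standard error is back-calculated from a reported P value under a normal approximation rather than taken from the paper. The inputs are the rs2603153 values quoted elsewhere in this article (metabQTL beta = 1.25 for DCA-3G; BBJ colorectal cancer GWAS beta = -0.12, P = 4.94 × 10⁻⁶).

```python
from math import erfc, sqrt
from statistics import NormalDist

def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return erfc(abs(z) / sqrt(2))

def se_from_p(beta, p):
    """Back-calculate a standard error from a two-sided P value."""
    z = NormalDist().inv_cdf(1 - p / 2)
    return abs(beta) / z

def wald_ratio(beta_exp, beta_out, se_out):
    """Single-instrument MR estimate (first-order delta-method SE)."""
    est = beta_out / beta_exp
    se = se_out / abs(beta_exp)
    return est, se, two_sided_p(est / se)

def ivw(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance weighted MR estimate."""
    weights = [bx * bx / (s * s) for bx, s in zip(beta_exp, se_out)]
    num = sum(bx * by / (s * s) for bx, by, s in zip(beta_exp, beta_out, se_out))
    est = num / sum(weights)
    se = sqrt(1 / sum(weights))
    return est, se, two_sided_p(est / se)

# Bonferroni threshold for 220 outcomes, as quoted in the text.
bonferroni = 0.05 / 220

# rs2603153: exposure = DCA-3G level, outcome = colorectal cancer.
se_crc = se_from_p(-0.12, 4.94e-6)   # assumption: normal approximation
est, se, p = wald_ratio(1.25, -0.12, se_crc)
ivw_est, ivw_se, ivw_p = ivw([1.25], [-0.12], [se_crc])
```

With these inputs the Wald ratio is -0.12 / 1.25 = -0.096 and its P value equals the GWAS P value, which is arithmetically consistent with the DCA-3G estimate reported in this article coming from a single instrument. With one instrument the IVW estimate reduces to the same number.
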

One of our MR results showed a significant causal association between elevated FCRL3 levels and increased Graves' disease risk (MR: beta = 0.162, P = 8.37 × 10⁻¹²; Fig. 6a). MR analyses identified five independent SNPs as instrumental variables (IVs), among which rs2210912 demonstrated the strongest association with FCRL3 protein levels (pQTL: P = 1.64 × 10⁻²⁰², Fig. 6a). Colocalization analysis further supported this finding, indicating shared genetic variants between FCRL3 pQTL and Graves' disease (colocalization PP = 0.93, Fig. 6a). Additionally, rs2210912 was also found to be significantly associated with enhanced FCRL3 expression in blood (eQTLGen: beta = 0.69, P = 3.27 × 10⁻³¹⁰, Fig. 6a), consistent with previous reports of upregulated FCRL3 expression in Graves' disease patients 36 (Fig. 6b). These convergent lines of evidence strongly support the pathogenic role of FCRL3 in Graves' disease etiology.

Another MR result revealed a protective association between deoxycholic acid 3-glucuronide (DCA-3G) and colorectal cancer (CRC) risk (MR: beta = -0.096, P = 4.94 × 10⁻⁶), corroborating previous findings 37, which also associated reduced DCA-3G levels with higher CRC risk (MR: beta = -0.041, P = 4.17 × 10⁻⁷). Fecal DCA-3G levels are reported to be lower in CRC patients compared to healthy controls 38, suggesting a potential relationship between reduced metabolite levels and the disease. Specifically, the candidate rs2603153 demonstrated robust association with elevated DCA-3G levels across multiple cohorts (metabQTL in CAS cohort: beta = 1.25, P = 3.51 × 10⁻²⁹⁴; CLSA metabQTL: beta = 0.51, P = 8.7 × 10⁻²⁵²; METSIM metabQTL: beta = 0.61, P = 3.3 × 10⁻¹⁸⁶) and reduced CRC risk (BBJ GWAS for CRC: beta = -0.12, P = 4.94 × 10⁻⁶).
Further colocalization analysis supported shared causal variants between DCA-3G and CRC at rs2603153 (colocalization PP = 0.98) and indicated that the gene UGT2B17 was associated with both DCA-3G (colocalization PP = 0.88) and CRC (colocalization PP = 0.86) in this region (Fig. 6c). Because UGT2B17 catalyzes the glucuronidation of DCA to form DCA-3G, and glucuronidation is a critical detoxification pathway for the clearance of toxic substances 39, we hypothesized that rs2603153 upregulates UGT2B17 expression, leading to lower DCA levels and reduced toxicity to the colon or rectum, thereby lowering CRC risk (Fig. 6d). Previous work combining 10 CRC datasets has shown that UGT2B17 expression is significantly lower in CRC tissues 40 (differential expression analysis: logFC = -1.9, P = 1.04 × 10⁻¹¹), providing further support for the protective role of UGT2B17 in CRC.

Discussion

In this study, we conducted a comprehensive analysis of blood samples from 3,102 genotyped Chinese individuals and generated matched multi-omics data, including DNA methylation, protein expression, and metabolite measurements. Our findings revealed the broad regulatory effects of genetic variation on blood biomarkers, providing novel insights into complex regulatory networks. Together with the meQTLs reported in our previous study, the pQTLs, prQTLs, metabQTLs, and mrQTLs generated here come from the same cohort, providing valuable information regarding the Chinese population.

By generating and integrating various types of QTLs, including in-house meQTLs, pQTLs, and metabQTLs, as well as external eQTLs, we explored shared genetic mechanisms among different molecular traits (e.g., methylation, protein expression, and metabolites) through colocalization analysis, mediation analysis, and partial correlation analysis. For the colocalization analysis, we investigated pairs of eQTL–meQTL, eQTL–pQTL, eQTL–metabQTL, meQTL–pQTL, meQTL–metabQTL, and pQTL–metabQTL.
In the colocalization analysis with eQTLs, the effect directions of SNPs for colocalized pQTLs and eQTLs were highly consistent (over 90%), confirming that our reported pQTLs were highly reliable, regardless of population differences. We observed that more than 50% of colocalized meQTL–eQTL and meQTL–pQTL pairs tended to have opposite effect directions (e.g., cg25259754-FCRL3), supporting the notion that high DNA methylation suppresses transcription and, consequently, protein levels 41 . However, a notable proportion (over 40%; e.g., cg10700560-LHPP) showed concordant effect directions, meaning that CpGs with high DNA methylation levels were correlated with high gene and protein expression, highlighting the complexity of the relationship between DNA methylation and gene/protein expression 41 . The GTEx tissue eQTLs also allowed us to investigate tissue specificity of the pairs. Indeed, we found that 36.6%, 57.1%, and 57.5% of colocalized meQTLs, pQTLs, and metabQTLs were identified exclusively in a single tissue category, emphasizing tissue-specific associations. Mediation analysis identified 187 significant pairs, with the SCM model showing the highest mediation proportion (median = 0.146). Partial correlation analysis further supported these findings, because all significant pairs in the mediation analysis remained significant in the partial correlation analysis (e.g., cg21029357 and docosapentaenoic acid). These results emphasized the complex interplay between genetic variants, epigenetic regulation, and downstream molecular traits, providing valuable insights into the regulatory pathways influencing human disease. The comprehensive xQTL profiles, particularly for the East Asian population, allowed us to deeply investigate the potential biological mechanisms underlying complex traits using GWAS summary statistics. Through MR analysis, we identified potential causal relationships between 227 protein–outcome pairs and 270 metabolite–outcome pairs. 
By integrating multi-omics data, we depicted the potential regulatory framework of FCRL3 in Graves’ disease and revealed how DCA-3G exerts a protective role in colorectal cancer. The current study had several limitations. First, the participants in the CAS cohort were mainly employees of institutes affiliated with the CAS, most of whom were engaged in academic work, which may limit the generalizability of the findings to the broader population. Second, this study included only cis- meQTLs and did not evaluate the effects of distal DNA methylation regulation on proteins and metabolites. Third, the proteins tested were limited to those measurable in plasma and included in the Olink Inflammation panel. Consequently, the identified pQTLs did not encompass the full spectrum of proteins across various cell types or tissues, constraining the interpretation of their biological functions. This also led to a small number of colocalization results between proteins and metabolites. Additionally, the validation cohort was relatively small, and further validation in larger cohorts with more comprehensive proteomic coverage is necessary. Fourth, because of the absence of an East Asian validation cohort with metabolic profiles closely matching those of the CAS cohort, some of the metabQTL results were not independently validated, and their reliability requires further assessment. Fifth, because blood samples from the CAS cohort did not meet the requirements for transcriptomic profiling, we were unable to obtain transcriptomic data for eQTL discovery. Therefore, the colocalization of xQTLs with eQTLs relied on two external eQTL datasets, which may have affected the detection of colocalization signals and prevented mediation analyses from exploring the causal relationship between xQTLs and eQTLs. Finally, in the MR analyses, exposure data were derived from a Chinese population, while outcome data were derived from a Japanese population. 
Although both are East Asian populations, subtle differences in genetic architecture could have influenced the results. Furthermore, most exposures in the MR analyses were represented by a single instrumental variable, limiting the capacity to systematically test heterogeneity and horizontal pleiotropy, which typically requires multiple instrumental variables. Therefore, despite the statistical significance of the MR results, their biological relevance should be interpreted with caution.

In summary, the current study used extensive multi-omics data from the CAS cohort to build a comprehensive multi-omics genomic atlas of meQTLs, pQTLs, and metabQTLs in an East Asian population, discovering many new QTLs. Using colocalization and mediation analyses, we identified shared genetic factors and causal pathways that connect DNA methylation, gene expression, protein levels, and metabolites. This research not only expands the population diversity of QTL studies but also offers important insights into the genetic architecture and regulatory mechanisms underlying complex traits and diseases. This resource will help deepen understanding of the causal links between molecular traits and disease development, aid in identifying clinical biomarkers, and support drug target validation.

Methods

The Chinese Academy of Sciences (CAS) Cohort

The CAS cohort is a prospective study aimed at identifying risk factors for physical and mental health through traditional epidemiological and multi-omics analyses. A total of 3,102 participants, primarily employees of CAS institutes in Beijing, were recruited between August 2020 and October 2021. All participants underwent standardized physical examinations at Zhongguancun Hospital, conducted by trained physicians and nurses. Fasting blood samples were collected for omics profiling.
We generated genotyping data for all participants, DNA methylation data for 1,060, proteomic data for 1,056, and metabolomic data for 2,479 individuals, with 1,054 participants having all four data types (Figure S1a). The study was approved by the Institutional Review Board of the Beijing Institute of Genomics, CAS, and Beijing Zhongguancun Hospital (approval IDs: 2020H020, 2021H001, and 20201229). All participants provided written informed consent.

Array Genotyping

DNA was extracted from peripheral blood samples stored at −80 °C. Genotyping data were generated using the Infinium Asian Screening Array + MultiDisease-24 BeadChip. Genotypes were called using GenTrain v2.0 in GenomeStudio. Individuals with sex mismatches, cryptic relatedness, potential contamination, or a genotyping call rate below 98% were excluded. At the SNP level, we excluded duplicates, non-autosomal variants, and SNPs with a call rate below 95%, MAF less than 1%, or Hardy-Weinberg equilibrium p-value (P_HWE) […] 0.6 were retained. Quality checks of the genotype data used for the proteomic and metabolomic analyses were conducted separately, including removal of individuals with heterozygosity > 5 standard deviations, and SNPs with a missing rate > 5%, MAF < 1%, or P_HWE < 1 × 10⁻⁶. Finally, 4,825,573 SNPs from 1,056 samples and 4,903,063 SNPs from 2,470 samples were retained for the subsequent pQTL and metabQTL analyses, respectively. The genotype data were merged with the 1000 Genomes Project data, and principal component analysis (PCA) was performed. We found that the CAS cohort clustered well with East Asian populations from the 1000 Genomes Project, indicating that the CAS cohort is highly representative of East Asian populations (Figure S1b).
DNA Methylation Profiling and meQTL Discovery

DNA methylation profiling and meQTL discovery were conducted in our previous study 10, which included detailed information regarding protocols, sample preprocessing, quality control, normalization, and statistical analysis.

Proteomics Profiling

The Olink Explore-384 Inflammation panel 22 was used to measure the levels of 384 inflammation-related plasma proteins in 1,056 participants. Olink reports protein levels in Normalized Protein Expression (NPX) units on the log2 scale. Background levels were established using blank control samples for each protein, and the lower limit of detection was defined as 3 standard deviations above the background. Proteins with quality control (QC) or assay warnings were excluded. A total of 365 proteins were retained after QC for further analysis (Table S2).

Metabolomics Profiling

Untargeted plasma metabolomics profiling was conducted on 2,479 samples from the CAS cohort (LipidALL Technologies, Changzhou, China). ACQUITY UPLC HSS T3 1.8 µm, 2.1 × 100 mm columns (Waters, Dublin, Ireland) were used for reverse-phase chromatography, and ACQUITY UPLC BEH Amide 1.7 µm, 2.1 × 100 mm columns (Waters, Dublin, Ireland) were used for normal-phase chromatography. Liquid chromatography-mass spectrometry analysis was carried out using an ultra-high-performance liquid chromatography system (Agilent 1290, Agilent Technologies, Germany) coupled with a high-resolution mass spectrometer (5600 Triple TOF Plus, AB Sciex, Singapore). A total of 3,784 distinct metabolites were detected: 848 were successfully identified, and the remaining 2,936, categorized into 492 unknown groups, remained unidentified. After removing metabolites with a missing rate greater than 50%, 841 metabolites were retained for further analysis.

Identification of pQTL and Protein Ratio QTL

NPX values were inverse-normalized and corrected for covariates (age, sex, and the first 10 genomic PCs) to obtain residuals.
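The preprocessing step described above (inverse normalization followed by covariate adjustment) can be sketched as follows. This is a minimal illustration, not the authors' script: the rank offset `c = 3/8` (Blom) and the function names `inverse_normal` and `residualize` are assumptions, since the text does not state which rank-based transform or regression routine was used.

```python
from statistics import NormalDist

import numpy as np


def inverse_normal(values, c=3.0 / 8.0):
    """Rank-based inverse normal transform (Blom offset c is an assumption)."""
    x = np.asarray(values, dtype=float)
    n = x.size
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(n, dtype=float)
    ranks[order] = np.arange(1, n + 1)
    # Give tied observations the average of their ranks.
    for v in np.unique(x):
        tied = x == v
        ranks[tied] = ranks[tied].mean()
    quantiles = (ranks - c) / (n - 2.0 * c + 1.0)
    normal = NormalDist()
    return np.array([normal.inv_cdf(q) for q in quantiles])


def residualize(y, covariates):
    """Residuals of an ordinary least-squares fit of y on covariates plus intercept."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    npx = rng.lognormal(size=200)            # hypothetical protein levels
    covars = rng.normal(size=(200, 12))      # stand-in for age, sex, 10 PCs
    residuals = residualize(inverse_normal(npx), covars)
    print(residuals[:5])
```

For the metabolite data, the text describes applying the transform a second time to the residuals, which in this sketch would be `inverse_normal(residualize(inverse_normal(y), covars))`.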
QTL analysis was then performed using PLINK v1.9 45. Associations were classified as cis-pQTLs if variants were located within 1 Mb of the transcription start site (TSS), while those beyond 1 Mb were considered trans-pQTLs. We defined a pQTL as a region including the association signals that reached genome-wide significance (P < 5 × 10⁻⁸), and the lead variant of each pQTL as the variant with the lowest P-value in the region for a given protein. For protein ratios, we computed the partial correlation between all protein-protein pairs using the R function ggm.estimate.pcor from the package GeneNet 46. At a Bonferroni-corrected significance threshold of P = 7.5 × 10⁻⁷ (0.05 × 2 / [365 × 364]), 28 significant partial correlations were identified. For these ratios, inverse-normal scaled differences between the two NPX values were used as dependent variables, based on the relation log(A/B) = log(A) − log(B). For genotypes, we followed the filtering options for pQTL mapping. The GWAS on the 28 ratios was conducted using PLINK v2.0 47 with the --glm option, using age, sex, and the first genetic PC as covariates. The protein ratio associations with Bonferroni levels of significance for P-values ( P 28 × 10 7 ) were defined as protein ratio QTLs. We then refined the associations within ± 500 kb of the respective lead variant using the R package coloc (v.5.1.0) 48.

Identification of metabQTL and Metabolite Ratio QTL

Metabolite levels were inverse-normalized, corrected for covariates (age, sex, and the first 10 genomic PCs), and inverse-normalized again to obtain residuals. QTL analysis was conducted using PLINK v2.0 47 to identify metabQTLs. For metabolite ratios, we queried HMDB 25 to identify enzymes or transporters associated with the metabolites, identifying 256 pairs of metabolites sharing the same enzymes or transporters (Table S13). Metabolite ratios were calculated by dividing the level of one metabolite by the other.
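As a worked check of the arithmetic in the protein ratio analysis, the snippet below reproduces the Bonferroni threshold for all unordered pairs of 365 proteins and shows why a difference of two NPX values is a log ratio. It is only an illustrative sketch; the helper name `npx_log_ratio` is hypothetical and the example NPX values are made up.

```python
from math import comb

N_PROTEINS = 365

# Number of unordered protein-protein pairs: 365 * 364 / 2.
n_pairs = comb(N_PROTEINS, 2)

# Bonferroni threshold quoted in the text as 0.05 * 2 / (365 * 364).
threshold = 0.05 / n_pairs


def npx_log_ratio(npx_a, npx_b):
    """NPX is on the log2 scale, so NPX_A - NPX_B equals log2(A / B)."""
    return npx_a - npx_b


if __name__ == "__main__":
    print(f"pairs tested: {n_pairs}")
    print(f"Bonferroni threshold: {threshold:.2e}")
    # Hypothetical NPX values: a difference of 2 means a 4-fold abundance ratio.
    diff = npx_log_ratio(5.0, 3.0)
    print(f"log2 ratio = {diff}, fold ratio = {2 ** diff}")
```

Working on the difference of log-scale values avoids dividing raw abundances, which is why the ratio phenotype can be built directly from the two NPX measurements.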
Similar to the metabolite levels, ratios were inverse-normalized, corrected for covariates, and inverse-normalized again before performing genome-wide association analysis using PLINK v2.0 47.

Identification of Independent QTL and Definition of Loci

We used the GCTA-COJO 49 method to identify independent QTLs from multiple SNPs in LD. The LD reference was calculated using 1,056 and 2,470 unrelated East Asian individuals from our study. Other COJO parameters were set as follows: cojo-p 5e-8, maf 0.01, cojo-wind 5000, and cojo-collinear 0.9. SNPs with P < 5.0 × 10⁻⁸ in the COJO results were considered significant independent QTLs. For each independent QTL, we defined a locus as the region extending 500 kb upstream and downstream. If two loci overlapped, they were merged into a single larger locus.

Identification of Novel QTL

To identify novel pQTLs, we collected 15 studies published after 2020 from the metabolomix website 50 (accessed July 2, 2024) and the publication list from UKB-PPP 11 (Table S5). These studies encompassed three assay platforms (Olink, SomaScan, and mass spectrometry) and two populations (East Asian and European). Previously reported pQTLs were defined as variants that (i) reached the reported significance threshold, (ii) had a consistent direction of effect, (iii) were associated with the same protein, and (iv) were in LD (R² > 0.8) with the pQTL in East Asian studies, whereas LD was not considered in European studies. All information about novelty identification is included in Table S3. For novel metabQTLs, we compiled data from 50 articles curated by Kastenmüller et al. 27 (accessed January 10, 2024). For articles published before 2021, we used the list curated by Yin et al. and filtered by Chen et al. 13,28. For studies published after 2021, we manually curated the data. We extracted association results from those papers using a P-value threshold of P < 5.0 × 10⁻⁸ or study-specific P-value thresholds.
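The locus definition given in this section (a ±500 kb window around each independent QTL, with overlapping windows merged) can be sketched as a simple interval merge. This is an illustration under stated assumptions rather than the authors' implementation: the function name `define_loci` is hypothetical, and windows that merely touch are treated here as overlapping.

```python
def define_loci(lead_variants, flank=500_000):
    """Build merged loci from (chromosome, position) pairs.

    Each variant gets a window of +/- `flank` bp; windows on the same
    chromosome that overlap (or touch, by assumption) are merged.
    Returns a list of (chromosome, start, end) tuples.
    """
    windows = sorted(
        (chrom, max(0, pos - flank), pos + flank) for chrom, pos in lead_variants
    )
    merged = []
    for chrom, start, end in windows:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    # Hypothetical lead variants; the first two windows overlap and merge.
    leads = [("1", 1_000_000), ("1", 1_800_000), ("1", 5_000_000), ("2", 1_000_000)]
    for locus in define_loci(leads):
        print(locus)
```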
MetabQTLs identified in our CAS cohort were considered known if they were located within 1 Mb of previously reported associations; otherwise, they were deemed novel.

Validation of QTL

To validate our pQTL results, we downloaded the pQTL summary statistics from UKB-PPP, which included all proteins measured in our study. We included all significant pQTLs (P < 5 × 10⁻⁸) from UKB-PPP for the 365 plasma immune proteins in our analysis and compared the shared pQTLs separately for the East Asian and European populations. To validate metabQTLs, we conducted a comparison with three previously published studies: one in East Asian populations (the Nagahama study 16) and two in European populations (CLSA 13 and METSIM 28). We compared the significance and effect direction between our results and those from the three reference studies.

Protein Variance Explained by pQTL

We used the following equation to estimate the proportion of variance explained (PVE) by the lead independent pQTLs for each protein: \(PVE = 2 \times MAF \times (1 - MAF) \times \beta^{2}\), where MAF denotes minor allele frequency and β denotes the SNP effect size. Details of the method have been previously reported 51,52.

SNP-based Metabolite Heritability

SNP-based heritability of metabolites and metabolite ratios was assessed using GCTA-GREML (v1.94.1) 26. We calculated the genetic relationship matrix using the quality-controlled genotype data and estimated heritability with default parameters. Before performing GREML, we inverse-normalized the phenotypes (metabolites or metabolite ratios) and adjusted for age, sex, and the first 10 genomic PCs using the --qcovar parameter 14.

Colocalization Analysis

We conducted colocalization analysis to identify shared causal variants between meQTL/pQTL/metabQTL and eQTL.
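The PVE formula given in the "Protein Variance Explained by pQTL" section can be evaluated directly. The sketch below assumes the phenotype has unit variance (which is approximately true after inverse normalization), so that 2 × MAF × (1 − MAF) × β² is already a proportion; the function name `pve` and the example values are illustrative, not taken from the study.

```python
def pve(maf, beta):
    """Proportion of variance explained by one SNP: 2 * MAF * (1 - MAF) * beta^2.

    Assumes beta is expressed in standard-deviation units of the phenotype.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    return 2.0 * maf * (1.0 - maf) * beta ** 2


if __name__ == "__main__":
    # Hypothetical lead pQTL: MAF = 0.30, effect size = 0.5 SD per allele.
    print(f"PVE = {pve(0.30, 0.5):.3f}")
```

The term 2 × MAF × (1 − MAF) is the genotype variance under Hardy-Weinberg equilibrium, so the formula is the variance of the additive genetic contribution of a single biallelic SNP.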
We utilized two external datasets: (1) eQTLGen 29 , a meta-analysis of 37 cohorts (N = 31,684) with cis- eQTL data for 19,251 genes in peripheral blood, and (2) GTEx 30 , encompassing 49 tissues from European populations. We extracted SNPs within 1 Mb of each independent QTL and performed colocalization analysis using the R package coloc (v.5.1.0) with the recommended parameters ( P 1 = 1 × 10 − 4 , P 2 = 1 × 10 − 4 , P 12 = 1 × 10 − 5 ), considering posterior probability (PP) > 0.75 as colocalized. Additionally, following the classification approach by Carreras-Torres et al. 31 , we grouped the GTEx tissues into four main categories and an “Others” category to assess tissue-specific associations. For meQTL-eQTL and pQTL-eQTL colocalization, we restricted the analyses to CpG, mRNA, and protein annotations of the same gene. For metabQTL-eQTL colocalization, we explored all possible pairs of variants. Effect directions of colocalized SNPs with the highest PP were used to evaluate consistency across traits. Additionally, we conducted colocalization analyses within the CAS cohort for different types of QTLs. Performing colocalization within the same cohort ensures consistency in genetic structure, eliminates the impact of LD, and allows for better identification of signals that might not be detectable across populations. Mediation Analysis The mediation test investigates whether the effect of a genetic variant on a phenotype is mediated through an intermediate molecular trait. In this case, using data from 1,054 individuals with matched multi-omics profiles, we performed mediation analyses on 3,665 pairs that were identified by the colocalization test. For each pair, we conducted bidirectional mediation analysis, testing whether the SNP influenced the phenotype through the mediator, and, conversely, whether the SNP influenced the mediator through the phenotype. 
While mediation implies a directional hypothesis, bidirectional testing was performed to explore the potential complexity and feedback in biological regulation. Mediation effect sizes were estimated using the R package mediation (v4.5.0) 34 , and statistical significance was assessed using the Sobel test implemented in the bda package (v18.3.2) 53 . Pairs with a Sobel P -value < 0.05 and a positive mediation effect were considered to show evidence of mediation, while those with a Sobel P -value < 0.05/3,665 (Bonferroni correction) and a positive mediation effect were considered to have statistically significant mediation. Before mediation analysis, we prepared each omics dataset to satisfy model assumptions and reduce technical confounding. For the methylation data, we applied inverse normalization, adjusted for age, sex, and the first 10 genomic PCs, methylation batch, and estimated blood cell fraction, and then performed inverse normalization on the residuals. For the proteomic data, we applied inverse normalization, adjusted for age, sex, the first 10 genomic PCs, and Olink batch, and again inverse normalized the residuals. For the metabolomic data, we applied inverse normalization, adjusted for age, sex, and the first 10 genomic PCs, and inverse normalized the residuals. Partial Correlation Analysis We performed partial correlation analysis on pairs of molecules identified by the colocalization analyses (N = 3,665) 32 . Before computing correlations, the data were processed using the same steps described in the mediation analysis section. First, we calculated the Pearson correlation coefficient and its significance for each pair. Next, we regressed out the effect of the colocalized SNP to obtain the residuals and computed the Pearson correlation between these residuals to derive the partial correlation for each pair. The original and partial correlations were compared to assess the impact of SNPs on the relationship between traits. 
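The partial correlation procedure described above (Pearson correlation before and after regressing the colocalized SNP out of both traits) can be sketched as follows. This is a minimal illustration on simulated data, not the authors' code; the function names and the simulation parameters are assumptions.

```python
import numpy as np


def regress_out(trait, genotype):
    """Residuals of trait after a linear fit on genotype dosage plus intercept."""
    design = np.column_stack([np.ones(len(genotype)), genotype])
    beta, *_ = np.linalg.lstsq(design, trait, rcond=None)
    return trait - design @ beta


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    return float(np.corrcoef(x, y)[0, 1])


def snp_adjusted_correlation(trait_a, trait_b, genotype):
    """Return (raw correlation, correlation after removing the SNP effect)."""
    raw = pearson(trait_a, trait_b)
    partial = pearson(regress_out(trait_a, genotype), regress_out(trait_b, genotype))
    return raw, partial


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    genotype = rng.binomial(2, 0.3, size=n).astype(float)
    # Simulated traits that are related only through the shared SNP.
    cpg = genotype + rng.normal(size=n)
    metabolite = genotype + rng.normal(size=n)
    raw, partial = snp_adjusted_correlation(cpg, metabolite, genotype)
    print(f"raw r = {raw:.3f}, SNP-adjusted r = {partial:.3f}")
```

In this simulation the correlation is driven entirely by the SNP, so it shrinks toward zero after adjustment; a pair that remains correlated after adjustment would instead point to a relationship beyond the shared variant.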
Two-Sample MR Exposure and Instrumental Variable Selection. We included all proteins and metabolites with QTLs as exposures, encompassing 155 proteins and 369 metabolites that had QTLs. We then processed the exposure data using the “clump_data” function from the R package TwoSampleMR 54 (v.0.5.6) to obtain independent instrumental variables (LD R² < 0.05, distance ≥ 500 kb, pop = “EAS”) ( Tables S28-29 ). Instrumental Variable Proxies. To prevent missing instrumental variable SNPs in the outcome, we identified proxies for each instrumental variable using the 1000 Genomes Project East Asian as the reference panel 44 (LD R² > 0.8). All instrumental variables and their proxies are listed in Tables S28-29 . To control for horizontal pleiotropy, instrumental variables and their proxies associated with fewer than five exposures were retained 14 . After filtering, a total of 449 exposures (155 proteins and 294 metabolites) remained for analysis. Outcomes. We used data from 220 large-scale GWASs conducted by BBJ (East Asian) as outcomes 21 , covering 159 diseases, 23 drug prescriptions, and 38 biomarkers. Detailed information on the outcomes is available in Table S30 . There was no participant overlap between the exposure and outcome cohorts. MR Analysis. MR analysis was performed using the “mr” function from the R package TwoSampleMR. We employed the Wald ratio method for exposure–outcome pairs with a single instrumental variable and the inverse variance weighted method for those with multiple instrumental variables. Results that passed correction for multiple testing (Bonferroni corrected P = 0.05/220) were retained. Supplemental Analyses for MR Reverse MR. To assess bidirectional causality, we performed reverse MR analysis using 220 GWASs from BBJ as exposures and 155 proteins and 294 metabolites as outcomes. The instrumental variables for the exposures were selected using the same criteria as in forward MR. 
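The two forward MR estimators named above can be sketched from summary statistics. This is a simplified illustration, not the TwoSampleMR implementation: the Wald ratio uses the first-order standard error se_out / |beta_exp|, the inverse variance weighted estimate is the fixed-effect form, and the function names and example numbers are assumptions.

```python
from math import erfc, sqrt

# Bonferroni threshold stated in the text for 220 outcomes.
BONFERRONI_P = 0.05 / 220


def wald_ratio(beta_exp, beta_out, se_out):
    """Single-instrument estimate: outcome effect divided by exposure effect."""
    return beta_out / beta_exp, se_out / abs(beta_exp)


def ivw_fixed(beta_exp, beta_out, se_out):
    """Fixed-effect inverse variance weighted estimate across instruments."""
    weights = [1.0 / s ** 2 for s in se_out]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_exp, beta_out))
    den = sum(w * bx * bx for w, bx in zip(weights, beta_exp))
    return num / den, sqrt(1.0 / den)


def two_sided_p(estimate, se):
    """Two-sided normal P-value for estimate / se."""
    return erfc(abs(estimate / se) / sqrt(2.0))


def mr_estimate(beta_exp, beta_out, se_out):
    """Wald ratio for one instrument, IVW when several are available."""
    if len(beta_exp) == 1:
        method = "Wald ratio"
        est, se = wald_ratio(beta_exp[0], beta_out[0], se_out[0])
    else:
        method = "IVW"
        est, se = ivw_fixed(beta_exp, beta_out, se_out)
    return method, est, se, two_sided_p(est, se)


if __name__ == "__main__":
    # Hypothetical summary statistics for one exposure-outcome pair.
    method, est, se, p = mr_estimate([0.5], [0.1], [0.02])
    print(method, round(est, 3), round(se, 3), p, p < BONFERRONI_P)
```

With a single instrument, heterogeneity and pleiotropy tests are not available, which is the limitation the Discussion raises for most exposures in this study.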
Results passing multiple testing correction (Bonferroni corrected P = 0.05/220) were considered significant. If these results overlapped with those from the forward MR analysis, they were considered to have bidirectional causality and were removed from further analyses. Observational Linear Regression Analysis. To validate our MR results, we conducted observational linear regression analyses to evaluate directional concordance between observational regression coefficients and causal MR beta estimates. Observational linear regression was conducted within the CAS cohort, which retained metrics from health examinations. We identified 25 outcomes shared between our CAS cohort and the BBJ cohort that also had significant MR results: alanine aminotransferase, aspartate transaminase, body mass index, blood urea nitrogen, body weight, diastolic blood pressure, eosinophil count, γ-glutamyl transpeptidase, glucose, hemoglobin, hemoglobin A1c (HbA1c), high-density lipoprotein cholesterol (HDL-cholesterol), height, low-density lipoprotein cholesterol (LDL-cholesterol), lymphocyte count, monocyte count, platelet count, pulse pressure, red blood cell count, systolic blood pressure, serum creatinine, total cholesterol, triglycerides, uric acid, and white blood cell count. These outcomes were used for the observational linear regression analyses. Both exposures and outcomes were inverse-normalized before regression. Covariates included sex, age, smoking, alcohol consumption, and body mass index (BMI) for all outcomes except body weight, height, and BMI itself, for which BMI was excluded as a covariate. Directional concordance between observational regression coefficients and MR beta estimates was considered to provide evidence supporting the MR findings. Colocalization Analysis of Exposures and Outcomes. To investigate whether exposures and outcomes share the same genetic determinants, we performed colocalization analysis using the R package coloc on all significant MR pairs.
SNPs within 1 Mb of the instrumental variables were included in the analysis. The parameters and thresholds were the same as those described in the Colocalization Analysis section.

Declarations

Funding

This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA0460202, XDA0460204) and the National Natural Science Foundation of China (NSFC) (92374103, 32270706, 62473355).

Acknowledgments

We thank all the participants. We thank Benjamin Knight, MSc., from Scribendi (www.scribendi.com) for editing a draft of this manuscript.

Code availability

All custom Bash, R (version 4.3.2) and Python (version 3.11.6) scripts used in this study are available at GitHub (https://github.com/YangPeng-CNCB/multi-omics-QTL).

Data availability

The multi-omics QTL summary statistics have been deposited in the OMIX, China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences (https://ngdc.cncb.ac.cn/omix). The accession IDs are OMIX004116 for meQTL, OMIX008230 for pQTL, and OMIX011747 for metabQTL.

References

1. Uffelmann E et al (2021) Genome-wide association studies. Nat Rev Methods Primer 1:59
2. Maurano MT et al (2012) Systematic Localization of Common Disease-Associated Variation in Regulatory DNA. Science 337:1190–1195
3. Gallagher MD, Chen-Plotkin AS (2018) The Post-GWAS Era: From Association to Function. Am J Hum Genet 102:717–730
4. Musunuru K et al (2010) From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature 466:714–719
5. Nicolae DL et al (2010) Trait-Associated SNPs Are More Likely to Be eQTLs: Annotation to Enhance Discovery from GWAS. PLoS Genet 6:e1000888
6. Joehanes R et al (2017) Integrated genome-wide analysis of expression quantitative trait loci aids interpretation of genomic association studies. Genome Biol 18:16
7. Hawe JS et al (2022) Genetic variation influencing DNA methylation provides insights into molecular mechanisms regulating genomic function. Nat Genet 54:18–29
8. Min JL et al (2021) Genomic and phenotypic insights from an atlas of genetic effects on DNA methylation. Nat Genet 53:1311–1321
9. Huan T et al (2019) Genome-wide identification of DNA methylation QTLs in whole blood highlights pathways for cardiovascular disease. Nat Commun 10:4267
10. Peng Q et al (2024) Analysis of blood methylation quantitative trait loci in East Asians reveals ancestry-specific impacts on complex traits. Nat Genet 56:846–860
11. Sun BB et al (2023) Plasma proteomic associations with genetics and health in the UK Biobank. Nature 622:329–338
12. Ferkingstad E et al (2021) Large-scale integration of the plasma proteome with genetics and disease. Nat Genet 53:1712–1721
13. Chen Y et al (2023) Genomic atlas of the plasma metabolome prioritizes metabolites implicated in human diseases. Nat Genet 55:44–53
14. Karjalainen MK et al (2024) Genome-wide characterization of circulating metabolic biomarkers. Nature 628:130–138
15. MacTel Consortium et al (2021) A cross-platform approach identifies genetic regulators of human metabolism and health. Nat Genet 53:54–64
16. Iwasaki T et al (2023) Genetic influences on human blood metabolites in the Japanese population. iScience 26:105738
17. Wang Z et al (2021) Genome-wide association study of metabolites in patients with coronary artery disease identified novel metabolite quantitative trait loci. Clin Transl Med 11:e290
18. Cheng C et al (2025) Genetic mapping of serum metabolome to chronic diseases among Han Chinese. Cell Genomics 5:100743
19. Wang QS et al (2024) Statistically and functionally fine-mapped blood eQTLs and pQTLs from 1,405 humans reveal distinct regulation patterns and disease relevance. Nat Genet 56:2054–2067
20. Tokolyi A et al (2025) The contribution of genetic determinants of blood gene expression and splicing to molecular phenotypes and health outcomes. Nat Genet. https://doi.org/10.1038/s41588-025-02096-3
21. Sakaue S et al (2021) A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet 53:1415–1424
22. Lundberg M, Eriksson A, Tran B, Assarsson E, Fredriksson S (2011) Homogeneous antibody-based proximity extension assays provide sensitive and specific detection of low-abundant proteins in human blood. Nucleic Acids Res 39:e102
23. Höglund J, Karlsson T, Johansson T, Ek WE, Johansson Å (2021) Characterization of the human ABO genotypes and their association to common inflammatory and cardiovascular diseases in the UK Biobank. Am J Hematol 96:1350–1362
24. Tian H et al (2022) Precise Metabolomics Reveals a Diversity of Aging-Associated Metabolic Features. Small Methods 6:e2200130
25. Wishart DS et al (2022) HMDB 5.0: the Human Metabolome Database for 2022. Nucleic Acids Res 50:D622–D631
26. Yang J, Zeng J, Goddard ME, Wray NR, Visscher PM (2017) Concepts, estimation and interpretation of SNP-based heritability. Nat Genet 49:1304–1310
27. Kastenmüller G, Raffler J, Gieger C, Suhre K (2015) Genetics of human metabolism: an update. Hum Mol Genet 24:R93–R101
28. Yin X et al (2022) Genome-wide association studies of metabolites in Finnish men identify disease-relevant loci. Nat Commun 13:1644
29. Võsa U et al (2021) Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat Genet 53:1300–1310
30. The GTEx Consortium atlas of genetic regulatory effects across human tissues
31. Carreras-Torres R et al (2024) Multiomic integration analysis identifies atherogenic metabolites mediating between novel immune genes and cardiovascular risk. Genome Med 16:122
32. Pierce BL et al (2018) Co-occurring expression and methylation QTLs allow detection of common causal variants and shared biological mechanisms. Nat Commun 9:804
33. Shang L et al (2023) meQTL mapping in the GENOA study reveals genetic determinants of DNA methylation in African Americans. Nat Commun 14:2711
34. Imai K, Keele L, Tingley D (2010) A general approach to causal mediation analysis. Psychol Methods 15:309–334
35. Smith GD (2004) Mendelian randomization: prospects, potentials, and limitations. Int J Epidemiol 33:30–42
36. Wojciechowska-Durczynska K, Stepniak J, Lewinski A, Karbownik-Lewinska M (2024) The Increased FCRL mRNA Expression in Patients with Graves’ Disease Is Associated with Hyperthyroidism (But Not with Positive Thyroid Antibodies). J Clin Med 13:5289
37. Sun J et al (2024) Systematic investigation of genetically determined plasma and urinary metabolites to discover potential interventional targets for colorectal cancer. JNCI J Natl Cancer Inst 116:1303–1312
38. Cai C et al (2021) Gut microbiota imbalance in colorectal cancer patients, the risk factor of COVID-19 mortality. Gut Pathog 13:70
39. Yang G et al (2017) Glucuronidation: driving factors and their impact on glucuronide disposition. Drug Metab Rev 49:105–138
40. Wei F-Z et al (2020) Differential Expression Analysis Revealing CLCA1 to Be a Prognostic and Diagnostic Biomarker for Colorectal Cancer. Front Oncol 10:573295
41. Moore LD, Le T, Fan G (2013) DNA Methylation and Its Basic Function. Neuropsychopharmacology 38:23–38
42. Delaneau O, Marchini J, Zagury J-F (2011) A linear complexity phasing method for thousands of genomes. Nat Methods 9:179–181
43. van Leeuwen EM et al (2015) Population-specific genotype imputations using minimac or IMPUTE2. Nat Protoc 10:1285–1296
44. The 1000 Genomes Project Consortium (2015) A global reference for human genetic variation. Nature 526:68–74
45. Purcell S et al (2007) PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. Am J Hum Genet 81:559–575
46. Schäfer J, Strimmer K (2005) A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat Appl Genet Mol Biol 4:Article32
47. Chang CC et al (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4:s13742-015-0047-8
48. Giambartolomei C et al (2014) Bayesian Test for Colocalisation between Pairs of Genetic Association Studies Using Summary Statistics. PLoS Genet 10:e1004383
49. Genetic Investigation of ANthropometric Traits (GIANT) Consortium (2012) Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet 44:369–375
50. Suhre K, McCarthy MI, Schwenk JM (2021) Genetics meets proteomics: perspectives for large population-based studies. Nat Rev Genet 22:19–37
51. Xu F et al (2023) Genome-wide genotype-serum proteome mapping provides insights into the cross-ancestry differences in cardiometabolic disease susceptibility. Nat Commun 14:896
52. Folkersen L et al (2020) Genomic and drug target evaluation of 90 cardiovascular proteins in 30,931 individuals. Nat Metab 2:1135–1148
53. Wang B (2024) Bda: Binned Data Analysis
54. Hemani G et al (2018) The MR-Base platform supports systematic causal inference across the human phenome. eLife 7:e34408

Additional Declarations

There is no competing interest.

Supplementary Files

SupplementaryMaterials.docx (Supplementary Materials)
SupplementaryFigures.docx (Supplementary Figures)
SupplementaryTables.xlsx (Supplementary Tables)
Chinese Academy of Sciences","correspondingAuthor":false,"prefix":"","firstName":"Yin-Ying","middleName":"","lastName":"Wang","suffix":""},{"id":560673960,"identity":"dbbe4d23-85ae-4d67-b41b-86a6c68d1d1c","order_by":14,"name":"Jian Wang","email":"","orcid":"","institution":"China National Center for Bioinformation, Beijing Institute of Genomics, Chinese Academy of Sciences","correspondingAuthor":false,"prefix":"","firstName":"Jian","middleName":"","lastName":"Wang","suffix":""},{"id":560673961,"identity":"0328b93f-6644-4263-9dc6-45c320f26c63","order_by":15,"name":"Guanghou Shui","email":"","orcid":"https://orcid.org/0000-0002-1621-9643","institution":"Institute of Genetics and Developmental Biology, CAS","correspondingAuthor":false,"prefix":"","firstName":"Guanghou","middleName":"","lastName":"Shui","suffix":""},{"id":560673962,"identity":"afdb7a59-f4a9-4f6c-b136-d6465771ce05","order_by":16,"name":"Fan Liu","email":"","orcid":"https://orcid.org/0000-0001-9241-8161","institution":"NAIF ARAB UNIVERSITY FOR SECURITY SCIENCES","correspondingAuthor":false,"prefix":"","firstName":"Fan","middleName":"","lastName":"Liu","suffix":""},{"id":560673963,"identity":"a1a7fdc6-fc09-434c-a79b-49282d6251fb","order_by":17,"name":"Changqing Zeng","email":"","orcid":"https://orcid.org/0000-0002-0037-1771","institution":"Beijing Institute of Genomics","correspondingAuthor":false,"prefix":"","firstName":"Changqing","middleName":"","lastName":"Zeng","suffix":""}],"badges":[],"createdAt":"2025-11-24 12:40:41","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-8193543/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8193543/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":99633305,"identity":"38199bd6-d790-4c14-b71a-164cead8e08d","added_by":"auto","created_at":"2026-01-06 16:28:46","extension":"png","order_by":1,"title":"Figure 
1","display":"","copyAsset":false,"role":"figure","size":650078,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eOverview of the Study Design and Genome-Wide Association Results. (a)\u003c/strong\u003e Schematic overview of the study design. (\u003cstrong\u003eb)\u003c/strong\u003e Circos Manhattan plot showing genome-wide association results for CpGs, proteins, and metabolites. Red diamonds mark significant protein and metabolite ratio QTLs. The blue line represents the genome-wide significance threshold (\u003cem\u003eP\u003c/em\u003e = 5 × 10\u003csup\u003e-8\u003c/sup\u003e).\u003c/p\u003e","description":"","filename":"Figure1331.png","url":"https://assets-eu.researchsquare.com/files/rs-8193543/v1/eea9420ffd9b80490a1649ec.png"},{"id":99794288,"identity":"8a39dd9a-aee3-4784-aaa8-68db3027f9bb","added_by":"auto","created_at":"2026-01-08 13:34:28","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":569451,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eCharacteristics of Independent pQTLs. (a\u003c/strong\u003e, \u003cstrong\u003ed-e)\u003c/strong\u003e Proportion of variance explained (PVE) by the sentinel variants associated with each protein. Proteins are divided into three groups: \u003cem\u003ecis\u003c/em\u003e-pQTLs only (orange), \u003cem\u003etrans\u003c/em\u003e-pQTLs only (blue), and those with both \u003cem\u003ecis\u003c/em\u003e- and \u003cem\u003etrans\u003c/em\u003e-pQTLs (red). (\u003cstrong\u003eb\u003c/strong\u003e) Circos plot showing the \u003cem\u003etrans-\u003c/em\u003epQTL “hotspot” at the \u003cem\u003eABO\u003c/em\u003e locus on chromosome 9, associated with six proteins. (\u003cstrong\u003ec\u003c/strong\u003e) Chromosome location of 168 identified pQTLs categorized by type (known or novel). 
The green dots represent previously reported pQTLs, while the orange dots represent newly identified pQTLs.\u003c/p\u003e","description":"","filename":"Figure1332.png","url":"https://assets-eu.researchsquare.com/files/rs-8193543/v1/59debaf49b43b3cccf13d58a.png"},{"id":99795350,"identity":"a4550096-46cb-4bc4-8425-302675885efe","added_by":"auto","created_at":"2026-01-08 13:37:48","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":732607,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eCharacteristics of Independent metabQTLs. (a)\u003c/strong\u003e The Manhattan plot displays the significant association results (\u003cem\u003eP\u003c/em\u003e \u0026lt; 5.0 × 10\u003csup\u003e-8\u003c/sup\u003e) of all metabolites across the genome. Color represents superclass types. (\u003cstrong\u003eb\u003c/strong\u003e) The number of metabolites associated with each locus. (\u003cstrong\u003ec\u003c/strong\u003e) Distribution of heritability explained by genotyped variants for metabolites in each superclass. Black lines represent the median heritability of metabolites within each superclass, while the blue dashed line marks the median heritability across all metabolites. (\u003cstrong\u003ed\u003c/strong\u003e) The distribution of all metabQTLs across 22 chromosomes.\u003c/p\u003e","description":"","filename":"Figure1333.png","url":"https://assets-eu.researchsquare.com/files/rs-8193543/v1/13c58f98f7b6b50e2af3b9bd.png"},{"id":99794239,"identity":"20adca81-ef22-4c66-b5e7-9a391171ac4d","added_by":"auto","created_at":"2026-01-08 13:34:19","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":800111,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSummary of QTL Colocalization Results. \u003c/strong\u003e(\u003cstrong\u003ea\u003c/strong\u003e) Direction of lead SNP effects in colocalized pairs between xQTLs of the CAS cohort. 
(\u003cstrong\u003eb\u003c/strong\u003e) Associations between colocalized pairs in the CAS cohort. (\u003cstrong\u003ec\u003c/strong\u003e) Direction of lead SNP effects in colocalized pairs between xQTLs of the CAS cohort and eQTLs. (\u003cstrong\u003ed\u003c/strong\u003e) Distribution of colocalized genes between xQTLs of the CAS cohort and eQTLs (GTEx) across tissue categories. In this plot, colocalization signals found in a single category are represented in lighter colors, while those found across multiple categories are indicated with darker colors. (\u003cstrong\u003ee\u003c/strong\u003e) Example of colocalization cases: cg10700560 – \u003cem\u003eLHPP\u003c/em\u003e (mRNA) – LHPP (protein). (\u003cstrong\u003ef\u003c/strong\u003e) Example of colocalization cases: cg21029357 – \u003cem\u003eFADS2\u003c/em\u003e – docosapentaenoic acid. Red lines: \u003cem\u003eP\u003c/em\u003e = 5.0 × 10\u003csup\u003e-8\u003c/sup\u003e.\u003c/p\u003e","description":"","filename":"Figure1334.png","url":"https://assets-eu.researchsquare.com/files/rs-8193543/v1/e3468720af7ef02ebeb53222.png"},{"id":99633309,"identity":"a34ee9e3-715d-4353-be29-2fd82ed18236","added_by":"auto","created_at":"2026-01-06 16:28:46","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":790701,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eMediation and Partial Correlation Analyses Reveal Shared Regulatory Mechanisms. \u003c/strong\u003e(\u003cstrong\u003ea\u003c/strong\u003e) Histogram of Sobel test \u003cem\u003eP\u003c/em\u003e-value for colocalization pairs. The blue dashed line indicates \u003cem\u003eP\u003c/em\u003e-value \u0026lt; 0.05, while the red dashed line represents the \u003cem\u003eP\u003c/em\u003e-value \u0026lt; 1.36 × 10\u003csup\u003e-5\u003c/sup\u003e (0.05/3,665). (\u003cstrong\u003eb\u003c/strong\u003e) Histogram of mediation proportions among significant mediation results. 
Yellow bars represent pairs with Sobel \u003cem\u003eP\u003c/em\u003e-value \u0026lt; 0.05, with the blue dashed line indicating the median, while green bars represent pairs with adjusted Sobel \u003cem\u003eP\u003c/em\u003e-value \u0026lt; 0.05, with the red dashed line indicating the median. (\u003cstrong\u003ec\u003c/strong\u003e) Comparison of correlation and partial correlation among colocalization pairs. Light-colored dots within the red dashed lines indicate correlations with \u003cem\u003eP\u003c/em\u003e-value \u0026gt; 0.05. (\u003cstrong\u003ed\u003c/strong\u003e) Venn diagram illustrating the overlap among the results of the mediation analysis, partial correlation analysis, and correlation analysis.\u003c/p\u003e","description":"","filename":"Figure1335.png","url":"https://assets-eu.researchsquare.com/files/rs-8193543/v1/2bd5fecc50630fe493654000.png"},{"id":99633306,"identity":"938b0f07-1280-46cd-869f-036df0048230","added_by":"auto","created_at":"2026-01-06 16:28:46","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":205676,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eExamples of Causal Effects of Proteins and Metabolites on Complex Phenotypes. \u003c/strong\u003e(\u003cstrong\u003ea\u003c/strong\u003e) Multi-omics evidence for Graves’ disease. pQTL: Box plot for FCRL3 protein level with genotype at rs2210912. Colocalization: Locus zoom plots of FCRL3 pQTL and Graves’ disease showing colocalization through the rs2210912 locus. eQTL: eQTLGen database search result. Differential expression: Literature search result. Mendelian randomization: MR result. (\u003cstrong\u003eb\u003c/strong\u003e)A graphical summary of the regulatory network linking the FCRL3 pQTL (rs2210912) to Graves’ disease. (\u003cstrong\u003ec\u003c/strong\u003e) Multi-omics evidence for colorectal cancer. 
\u003cem\u003eUGT2B17\u003c/em\u003e, deoxycholic acid 3-glucuronide, and colorectal cancer were colocalized at rs2603153. rs2603153 was not detected in eQTLGen. rs145450963 and rs2603153 were in strong LD (R2: EAS = 0.65, EUR = 0.94). (\u003cstrong\u003ed\u003c/strong\u003e) A graphical summary of the regulatory network linking deoxycholic acid 3-glucuronide to colorectal cancer.\u003c/p\u003e","description":"","filename":"Figure1336.png","url":"https://assets-eu.researchsquare.com/files/rs-8193543/v1/5e861793823523d2ef5c999d.png"},{"id":99804621,"identity":"7daee8b2-09ca-4267-9d41-285ef0203606","added_by":"auto","created_at":"2026-01-08 14:14:05","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":5061356,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8193543/v1/9b1237e7-c83e-4185-aa61-8cbcdcca4182.pdf"},{"id":99633303,"identity":"3f564ab1-f8a1-44c6-b2e8-362d8580c597","added_by":"auto","created_at":"2026-01-06 16:28:46","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":36403,"visible":true,"origin":"","legend":"Supplementary Materials","description":"","filename":"SupplementaryMaterials.docx","url":"https://assets-eu.researchsquare.com/files/rs-8193543/v1/b3910e068a018d0320a0fe8a.docx"},{"id":99793549,"identity":"54bdf1cd-0572-4f21-b898-b580a8dcc214","added_by":"auto","created_at":"2026-01-08 13:31:50","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":2409617,"visible":true,"origin":"","legend":"Supplementary Figures","description":"","filename":"SupplementaryFigures.docx","url":"https://assets-eu.researchsquare.com/files/rs-8193543/v1/58de37695929bc0c34e5fc08.docx"},{"id":99633311,"identity":"fc6dcc62-355a-4757-9fc6-a49908eb36f7","added_by":"auto","created_at":"2026-01-06
16:28:46","extension":"xlsx","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":12318662,"visible":true,"origin":"","legend":"Supplementary Tables","description":"","filename":"SupplementaryTables.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-8193543/v1/355a4d053af5445e9143c47b.xlsx"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing Interest.","formattedTitle":"A Map of Multi-omics Quantitative Trait Loci in a Chinese Population Reveals Regulatory Variations and Disease Links","fulltext":[{"header":"Introduction","content":"\u003cp\u003eGenome-wide association studies (GWASs) have identified thousands of genetic variants associated with a wide range of complex traits and diseases\u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u003c/sup\u003e. However, the majority of these variants are located in noncoding regions\u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e, suggesting that they exert their effects through regulatory mechanisms\u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e,\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e. Quantitative trait loci (QTL) analyses have emerged as powerful tools for elucidating the genetic architecture of molecular phenotypes and understanding the regulatory functions of genetic variants in disease pathogenesis\u003csup\u003e\u003cspan additionalcitationids=\"CR6\" citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eAmong these, DNA methylation QTL (meQTL) studies have identified genetic variants associated with CpG methylation levels. For example, Min et al. 
integrated data from 36 cohorts comprising over 30,000 individuals and identified more than 270,000 independent meQTLs\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e. Similarly, Huan et al. reported over 4.7\u0026nbsp;million meQTLs in a study of 4,170 individuals\u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e. Recently, we conducted a meQTL study in an East Asian population, analyzing whole blood samples from approximately 5,000 Chinese individuals\u003csup\u003e\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e. Protein QTL (pQTL) analyses have also advanced, including the UK Biobank Pharma Proteomics Project (UKB-PPP)\u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e and the deCODE\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e projects, mainly focusing on the European ancestry. Furthermore, the Canadian Longitudinal Study on Aging identified 1,702 metabolite QTL (metabQTL) associations for 690 metabolites in over 8,000 Canadians\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e. Notably, most of these QTL studies have focused on European populations, leaving regulatory genetic mechanisms in other ancestries, such as East Asians, underexplored\u003csup\u003e\u003cspan additionalcitationids=\"CR8\" citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e. 
Given the distinct genetic architecture, dietary patterns, and lifestyles in different populations, findings from European studies may not be directly transferable to East Asian populations\u003csup\u003e\u003cspan additionalcitationids=\"CR14 CR15 CR16 CR17\" citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eIntegrating multi-omics data is essential for disentangling the regulatory mechanisms underlying human traits and diseases. Analyzing multiple molecular layers within the same cohort can reduce the confounding effects from linkage disequilibrium (LD) and enhance the detection of shared genetic signals\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e. For example, a recent study\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e integrating transcriptomic, proteomic, and metabolomic QTLs revealed extensive genetic regulation of molecular traits and identified therapeutic targets such as \u003cem\u003eWARS1\u003c/em\u003e for hypertension and \u003cem\u003eIFNAR2\u003c/em\u003e for COVID-19. Such integrative analyses provide a holistic view of the interactions among epigenetic modifications, gene expression, protein activity, and metabolism, thereby accelerating biomarker discovery and therapeutic development.\u003c/p\u003e \u003cp\u003eIn the current study, we performed pQTL and metabQTL mapping using plasma protein and plasma metabolite data from 3,102 genotyped Chinese individuals (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e; \u003cb\u003eFigure \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e\u003c/b\u003e). 
Through colocalization, mediation, and partial correlation analyses, we identified shared genetic mechanisms across gene expression QTLs (eQTLs), meQTLs, pQTLs, and metabQTLs. Our analyses revealed complex regulatory interactions among epigenetic modifications, gene expression, protein abundance, and metabolite levels. Furthermore, by leveraging data from BioBank Japan (BBJ), we conducted two-sample Mendelian randomization (MR) analyses to assess the causal effects of proteins and metabolites on 220 traits and diseases in East Asians\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e. Overall, our study enhanced QTL data regarding East Asian populations, providing valuable insights into the regulatory mechanisms underlying complex traits and diseases and supporting the discovery of potential biomarkers and therapeutic targets.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eOverview of the Chinese Academy of Sciences (CAS) Cohort Multi-omics Data\u003c/h2\u003e \u003cp\u003eThe CAS cohort is a prospective study with baseline data collected between August 2020 and October 2021. The clinical characteristics of 3,102 participants are summarized in \u003cb\u003eTable \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e\u003c/b\u003e. Blood samples were collected after overnight fasting. We generated genotyping data (Illumina Infinium Asian Screening Array + MultiDisease-24 array followed by imputation for all participants), protein abundances (the Olink Explore 384 Inflammation panel\u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e for 1,056 participants), metabolites (untargeted mass spectrometry for 2,470 participants), plus previously-reported methylation data (Illumina Infinium MethylationEPIC BeadChip for 1,060 participants). 
Building upon these resources, we conducted pQTL, protein ratio QTL (prQTL), metabQTL, and metabolite ratio QTL (mrQTL), generating a comprehensive atlas for multi-omics QTLs (xQTLs) for the East Asian population. MeQTL analysis for the CAS cohort was reported in our previous study\u003csup\u003e\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e, in which we identified 20,917,614 meQTLs (14,508,327 \u003cem\u003ecis-\u003c/em\u003emeQTLs, involving 2,121,167 SNPs and 170,047 CpGs) and successfully replicated 93.81% of them in cross-cohorts. In this study, we primarily focused on protein and metabolite QTLs and conducted integrative analyses across all types of omics data.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eGenetic Structure of Immune Plasma Proteome\u003c/h3\u003e\n\u003cp\u003eWe conducted association analyses on the plasma protein abundances for 384 proteins measured in 1,056 CAS cohort participants (\u003cb\u003eTable \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e\u003c/b\u003e). In total, 19,719 significant pQTLs involving 18,577 variants (pSNPs) were identified for 155 proteins (\u003cem\u003eP\u003c/em\u003e \u0026lt; 5 × 10\u003csup\u003e− 8\u003c/sup\u003e; Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eb, \u003cb\u003eFigure \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e, Table S3\u003c/b\u003e), representing 209 independent loci: 154 (74%) \u003cem\u003ecis\u003c/em\u003e- and 55 (26%) \u003cem\u003etrans\u003c/em\u003e-pQTLs. The genomic inflation factor lambda for all proteins ranged from 0.971 to 1.022, with an average value of 0.997, indicating no significant inflation (\u003cb\u003eTable \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e\u003c/b\u003e). 
Of the 155 proteins, 106 (68%) had only \u003cem\u003ecis\u003c/em\u003e-pQTLs, 28 (18%) had only \u003cem\u003etrans\u003c/em\u003e-pQTLs, and 21 (14%) had both. Each protein was associated with one to four independent loci, with most (70.3%) having a single pQTL and seven (4.5%) having three or more pQTLs (\u003cb\u003eFigure S3b\u003c/b\u003e). SNP-based heritability of 155 proteins was estimated using all independent lead pQTLs (\u003cb\u003eMethods;\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea, d-e; \u003cb\u003eTable S4\u003c/b\u003e). The median variance explained was 0.086 (range: 0.024 for the protein MEPE to 0.609 for PNLIPRP2) for 106 proteins with only \u003cem\u003ecis\u003c/em\u003e-pQTLs, 0.049 (range: 0.025 for BSG to 0.357 for PTPRM) for 28 proteins with only \u003cem\u003etrans\u003c/em\u003e-pQTLs, and 0.108 (range: 0.049 for EPHA1 to 0.498 for ENPP7) for 21 proteins with both \u003cem\u003ecis\u003c/em\u003e- and \u003cem\u003etrans\u003c/em\u003e-pQTLs. Most variants had a single associated protein, while 31 variants were linked to up to nine proteins (\u003cb\u003eFigure S3c\u003c/b\u003e). A notable hotspot was observed at the \u003cem\u003eABO\u003c/em\u003e locus (rs8176693), associated with nine proteins: CD200, CD79B, CLEC4G, CTRC, ICAM4, IFNGR1, IL3RA, LIFR, and PTPRM (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb). Notably, the associations between the \u003cem\u003eABO\u003c/em\u003e locus and CTRC and CD200 have been reported previously\u003csup\u003e\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e, confirming our findings.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eTo assess the novelty of the identified pQTLs, we compared our results with 15 recent studies (two East Asian and 13 European; \u003cb\u003eTables S5\u003c/b\u003e). 
Overall, 104 pQTLs were replicated in East Asian datasets (considering LD R2 \u0026gt; 0.8) and 161 in European datasets, with matching alleles and effect directions. Importantly, 34 (16.3%) pQTLs were absent from both populations and were deemed novel (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ec). Further comparison with the UKB-PPP\u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e study, which profiled 2,923 proteins in 54,219 participants (262 East Asians and 34,557 Europeans) and covered all 384 inflammation-related proteins in our study, showed that 12.9% and 56.9% of the 209 independent pQTLs were replicated in the UKB-PPP East Asian and European data, respectively (\u003cem\u003eP\u003c/em\u003e \u0026lt; 5 × 10\u003csup\u003e−8\u003c/sup\u003e). Effect sizes were highly concordant, with Pearson correlation coefficients (r) of 0.99 for the UKB-PPP East Asian data and 0.93 for the UKB-PPP European data (\u003cb\u003eFigure S4a-b\u003c/b\u003e). Similar replication patterns were observed for all 19,719 significant associations (8% in East Asians and 62.5% in Europeans; r = 0.98 and 0.92, respectively; \u003cb\u003eFigure S4c-d\u003c/b\u003e).\u003c/p\u003e \u003cp\u003eIn addition to single-protein QTLs, we analyzed protein-ratio QTLs (prQTLs) and identified 13 significant prQTLs involving seven protein ratios, several of which shared genetic signals with one or more individual proteins (see Supplementary Materials for details).\u003c/p\u003e\n\u003ch3\u003eGenetic Structure of Plasma Metabolites\u003c/h3\u003e\n\u003cp\u003eWe applied precise metabolomics techniques to semi-quantitatively analyze 841 plasma metabolites from 2,470 participants in the CAS cohort\u003csup\u003e\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u003c/sup\u003e. 
These metabolites are classified into 10 superclasses\u003csup\u003e\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e, including lipids and organic acids (\u003cb\u003eTable S8\u003c/b\u003e). Our analyses identified 45,001 metabQTLs (587 independent) for 369 metabolites across 194 loci at \u003cem\u003eP\u003c/em\u003e \u0026lt; 5 × 10\u003csup\u003e− 8\u003c/sup\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eb; Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ea; \u003cb\u003eTable S9\u003c/b\u003e). The genomic inflation factor ranged from 0.97 to 1.03 with a median of 1.00, indicating minimal confounding from population structure (\u003cb\u003eTable S10\u003c/b\u003e). Significant pleiotropy was observed, with 86 of the 194 loci associated with multiple metabolites (range: 2–28). The fatty acid desaturase (\u003cem\u003eFADS\u003c/em\u003e) locus on chromosome 11 (lead SNP: rs174548) exhibited the strongest pleiotropy, associating with 28 metabolites, mainly lipids or lipid-like molecules (27/28) (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eb), consistent with previous studies linking this locus to 75 lipids\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e. SNP-based heritability of metabolite levels was estimated using GCTA-GREML\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e (\u003cb\u003eTable S11\u003c/b\u003e), with a median of 0.103. 
Lipids, lipid-like molecules, and organic oxygen compounds exhibited higher heritability, whereas alkaloids and nucleosides exhibited lower heritability (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ec).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eTo evaluate the novelty of the identified metabQTLs, we curated a comparison set of 50 recent studies, comprising two from East Asian, 39 from European, and nine from other populations\u003csup\u003e\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e (\u003cb\u003eTable S12\u003c/b\u003e). Our results demonstrated significant validation: nine of 587 metabQTLs were replicated in East Asian cohorts (covering 8 of 9 previously investigated metabolites), and 191 (covering 130 of 171 previously investigated metabolites) were validated in European populations, for a total of 193 replicated metabQTLs (\u003cb\u003eTable S10\u003c/b\u003e). Effect directions also showed strong concordance: full concordance with East Asian studies and \u0026gt; 90% concordance with European studies\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e,\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e,\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e (\u003cb\u003eFigure S5\u003c/b\u003e). This suggested largely shared but partially distinct genetic architectures across populations. Overall, 394 metabQTLs were identified as novel, involving 261 metabolites across all 10 superclasses and spanning all 22 chromosomes (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ed).\u003c/p\u003e \u003cp\u003eIn addition, a metabolite ratio QTL (mrQTL) analysis was performed. A total of 256 biologically informed metabolite ratios were analyzed, exhibiting a median heritability of 0.110. 
From these, 129 independent mrQTLs were identified across 58 loci, including 102 novel mrQTLs. Notably, 31 mrQTLs showed no significant association with their constituent metabolites, emphasizing that ratio-based analysis can reveal genetic determinants missed by examining individual metabolites. Detailed methods and results are provided in Supplementary Materials.\u003c/p\u003e\n\u003ch3\u003eShared Causal Variants Across Molecular Traits: Within the CAS Cohort\u003c/h3\u003e\n\u003cp\u003eCollectively, we generated xQTLs using our CAS cohort (CAS-meQTL, CAS-pQTL, and CAS-metabQTL) and performed two complementary colocalization analyses to investigate shared genetic regulation of molecular traits. First, we used our within-cohort xQTLs to circumvent LD confounding and identify multi-omics causal variants. Second, we integrated external eQTLs to assess general and tissue-specific overlap. Throughout these analyses, effect direction refers to the allelic effect of the lead SNP (i.e., SNPs with the highest PP.H4 within each colocalized pair) on each pair of molecular traits.\u003c/p\u003e \u003cp\u003eColocalization analysis of CAS-meQTLs and CAS-pQTLs revealed 785 CpG-protein pairs (681 CpG sites, 125 proteins, \u003cb\u003eTable S17\u003c/b\u003e). SNPs predominantly exhibited opposing effects on methylation and protein levels (54.9%) (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ea), with negative correlations between methylation and protein levels (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eb). This is consistent with the hypothesis that methylation suppresses gene expression and lowers protein levels. For example, at cg25259754-\u003cem\u003eFCRL3\u003c/em\u003e (mRNA)-FCRL3 (protein), all three signals were colocalized. 
The lead SNP was rs2210913, where the T allele increased \u003cem\u003eFCRL3\u003c/em\u003e mRNA (eQTL: beta = 0.73, \u003cem\u003eP\u003c/em\u003e = 3.27 × 10\u003csup\u003e− 310\u003c/sup\u003e) and protein (pQTL: beta = 1.1, \u003cem\u003eP\u003c/em\u003e = 2.97 × 10\u003csup\u003e− 202\u003c/sup\u003e) while decreasing methylation at cg25259754 (meQTL: beta = -0.5, \u003cem\u003eP\u003c/em\u003e = 3.66 × 10\u003csup\u003e− 142\u003c/sup\u003e). cg25259754 was strongly negatively correlated with FCRL3 protein (Pearson’s r = -0.54, \u003cem\u003eP\u003c/em\u003e = 4.45 × 10\u003csup\u003e− 81\u003c/sup\u003e). In contrast, at cg10700560-\u003cem\u003eLHPP\u003c/em\u003e (mRNA)-LHPP (protein), all three signals were colocalized (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ee), where the T allele of lead SNP rs11245086 showed negative effects on all three traits (meQTL: beta = -0.31, \u003cem\u003eP\u003c/em\u003e = 3.28 × 10\u003csup\u003e− 17\u003c/sup\u003e; eQTL: beta = -0.37, \u003cem\u003eP\u003c/em\u003e = 3.27 × 10\u003csup\u003e− 310\u003c/sup\u003e; pQTL: beta = -0.34, \u003cem\u003eP\u003c/em\u003e = 4.71 × 10\u003csup\u003e− 20\u003c/sup\u003e), while cg10700560 was positively correlated with LHPP protein (Pearson’s r = 0.19, \u003cem\u003eP\u003c/em\u003e = 8.13 × 10\u003csup\u003e− 10\u003c/sup\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eColocalization analysis of CAS-meQTLs and CAS-metabQTLs revealed 2,874 CpG-metabolite pairs (984 CpG sites, 261 metabolites; \u003cb\u003eTable S18\u003c/b\u003e). SNP effects on methylation and metabolite levels predominantly exhibited concordant direction (53.2%) (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ea), corresponding to positive associations between methylation and metabolite levels (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eb). 
DNA methylation may regulate metabolites via enzyme gene expression modulation. For example, colocalization was observed among cg21029357, \u003cem\u003eFADS2\u003c/em\u003e, and docosapentaenoic acid (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ef). The A allele of the lead SNP rs174559 was associated with decreased methylation at cg21029357 (meQTL: beta = -0.17, \u003cem\u003eP\u003c/em\u003e = 1.69 × 10\u003csup\u003e− 7\u003c/sup\u003e), increased \u003cem\u003eFADS2\u003c/em\u003e mRNA expression (eQTL: beta = 0.64, \u003cem\u003eP\u003c/em\u003e = 3.27 × 10\u003csup\u003e− 310\u003c/sup\u003e), and reduced docosapentaenoic acid levels (metabQTL: beta = -0.19, \u003cem\u003eP\u003c/em\u003e = 4.76 × 10\u003csup\u003e− 11\u003c/sup\u003e). Additionally, cg21029357 showed a significant positive correlation with docosapentaenoic acid (Pearson’s r = 0.09, \u003cem\u003eP\u003c/em\u003e = 0.0025).\u003c/p\u003e \u003cp\u003eColocalization analysis of CAS-pQTLs and CAS-metabQTLs revealed six protein-metabolite pairs (five proteins, five metabolites; \u003cb\u003eTable S19\u003c/b\u003e), with SNP effects concordant in three pairs and opposite in the other three (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ea). 
Four pairs showed negative associations (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eb).\u003c/p\u003e\n\u003ch3\u003eShared Causal Variants Across Molecular Traits: Colocalization with External eQTLs\u003c/h3\u003e\n\u003cp\u003eColocalization analysis with external eQTL datasets included eQTLGen\u003csup\u003e\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u003c/sup\u003e, which encompassed blood eQTLs, and GTEx\u003csup\u003e\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u003c/sup\u003e, which encompassed eQTLs for 49 tissues categorized into five categories: epithelial, immune, mesenchymal, neural, and others, allowing for the assessment of tissue-specific colocalization\u003csup\u003e\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eUsing CAS-meQTL and eQTLGen, we identified 20,543 CpG-gene pairs (20,543 CpG sites and 5,636 genes), with 52% of lead SNPs showing opposite effects on methylation and expression (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ec; \u003cb\u003eTable S20\u003c/b\u003e). Colocalization analyses with GTEx tissue eQTLs identified 6,421 colocalized pairs (6,421 CpGs and 1,176 genes), with 50.4% of SNPs showing opposite effects (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ec; \u003cb\u003eTable S21\u003c/b\u003e). GTEx analyses highlighted tissue-specific regulation: 36.6% (2,351/6,421) of pairs were category-specific, while 17% were found across all tissue categories (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ed). 
The number of colocalized pairs varied significantly by tissue, ranging from the fewest in kidney cortex (n = 175, 44% effect consistency) to the most in whole blood (n = 1,595, 51.7% opposite effects).\u003c/p\u003e \u003cp\u003eColocalization analysis between CAS-pQTLs and eQTLs for the corresponding genes confirmed a modest overall degree of colocalization, consistent with prior studies\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e. Using eQTLGen, we identified 21 gene-protein pairs where the allelic effects on mRNA and protein abundance were highly concordant (95.2% of SNPs; Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ec; \u003cb\u003eTable S22\u003c/b\u003e). Colocalization with GTEx tissue eQTLs yielded seven gene-protein pairs (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ec; \u003cb\u003eTable S23\u003c/b\u003e). Of these, four were category-specific and three were shared (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ed). Notably, \u003cem\u003eBTN3A2\u003c/em\u003e demonstrated the most pervasive regulation, colocalizing across all 49 tissues, which suggests a broadly active regulatory mechanism.\u003c/p\u003e \u003cp\u003eColocalization analysis between CAS-metabQTLs and eQTLs identified 362 potential effector genes for metabQTLs associated with 230 metabolites (\u003cb\u003eTable S24\u003c/b\u003e). In eQTLGen, we discovered colocalization evidence for 134 metabolites and 105 genes, including the key example glutarylcarnitine-\u003cem\u003eGCDH\u003c/em\u003e (colocalization PP = 0.99; opposite effect direction) and deoxycholic acid 3-glucuronide with \u003cem\u003eUGT2B17\u003c/em\u003e (PP = 0.88; same effect direction). Interestingly, \u003cem\u003eGCDH\u003c/em\u003e encodes glutaryl-CoA dehydrogenase, and its deficiency leads to glutarylcarnitine accumulation. 
In GTEx, 955 metabolite-gene pairs (170 metabolites, 285 genes) across 49 tissues (\u003cb\u003eTable S25\u003c/b\u003e) were detected. We identified 181 pairs (35 genes and 76 metabolites) in whole blood, of which 84 pairs (46.4%) replicated the eQTLGen findings. The GTEx data also demonstrated regulatory diversity: 57.5% (549/955) pairs were category-specific, while only 12.3% (117/955) pairs were pan-tissue categories (e.g., glutarylcarnitine-\u003cem\u003eGCDH\u003c/em\u003e in 44 tissues), highlighting widespread tissue-specific regulatory mechanisms (Figs.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ed and S8).\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eMediation and Partial Correlation Analyses\u003c/h2\u003e \u003cp\u003e Our CAS cohort contained 1,054 individuals with matched genotype, methylation, proteomic, and metabolomic profiles, enabling mediation and partial correlation analyses that were otherwise not applicable. Using 3,665 colocalized pairs (785 CpG-protein pairs, 2,874 CpG-metabolite pairs, and six protein-metabolite pairs), we performed bidirectional mediation analyses to evaluate whether the effects of genetic variants on one molecular trait were mediated through another, and vice versa\u003csup\u003e\u003cspan additionalcitationids=\"CR33\" citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e–\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e\u003c/sup\u003e. We identified 187 mediation pairs (Sobel \u003cem\u003eP\u003c/em\u003e \u0026lt; 0.05 and mediation \u0026gt; 0) (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ea) across four models: SNP-CpG-Protein (SCP), SNP-Protein-CpG (SPC), SNP-CpG-Metabolite (SCM), and SNP-Metabolite-CpG (SMC) (\u003cb\u003eTable S26\u003c/b\u003e). Among these, 32 pairs exhibited unidirectional mediation, while 155 exhibited bidirectional mediation. 
The overall median mediation proportion was 0.095 (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eb), with the SCM model demonstrating the highest estimates (median = 0.146) and the SMC model the lowest (median = 0.056). After Bonferroni correction (Sobel \u003cem\u003eP\u003c/em\u003e \u0026lt; 0.05/3,665 and mediation \u0026gt; 0), 17 significant pairs remained (8 unidirectional, 9 bidirectional) (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ea), with an increased median mediation proportion of 0.249 (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eb). The SCM model again had the highest mediation proportion (median = 0.416).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eWe conducted partial correlation analysis on the 3,665 colocalized pairs to determine whether the relationship between the two colocalized molecular phenotypes was independent of the causal genetic variant. This analysis tests for pleiotropy: if the variants independently influence both traits, the SNP-adjusted residuals should exhibit no residual correlation\u003csup\u003e\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e\u003c/sup\u003e. As expected, correlations decreased after adjusting for genetic effects (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ec). Of all pairs, 315 (8.6%) showed a significant partial correlation (\u003cem\u003eP\u003c/em\u003e \u0026lt; 0.05), with only 25 pairs (0.7%) remaining significant after Bonferroni correction (\u003cb\u003eTable S27\u003c/b\u003e).\u003c/p\u003e \u003cp\u003eThe results of the partial correlation analysis aligned closely with those of the mediation analysis: all 187 significant mediation pairs were captured within the 315 significant partial correlations (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ed). 
For example, the rs28548211 – cg09540471 – AMN locus, where cg09540471 is located in the first exon of \u003cem\u003eAMN\u003c/em\u003e, showed strong mediation and partial correlation. rs28548211 had a significant effect on cg09540471 (meQTL: beta = -0.34, \u003cem\u003eP\u003c/em\u003e = 8.28 × 10\u003csup\u003e− 10\u003c/sup\u003e) and on AMN (pQTL: beta = 0.37, \u003cem\u003eP\u003c/em\u003e = 4.63 × 10\u003csup\u003e− 8\u003c/sup\u003e). Mediation analysis found that cg09540471 methylation mediated 22.6% of rs28548211’s effect on AMN and maintained a residual negative correlation after adjustment for the SNP, suggesting that cg09540471 methylation may causally influence AMN levels. Similarly, the rs11245086 – cg10700560 – LHPP locus exhibited bidirectional mediation (SCP and SPC Sobel \u003cem\u003eP\u003c/em\u003e \u0026lt; 0.05) and significant partial correlation (Pearson’s \u003cem\u003eP\u003c/em\u003e \u0026lt; 0.05). After removing the effect of rs11245086, cg10700560 and LHPP remained positively correlated, further supporting the regulatory role of cg10700560 methylation in LHPP expression.\u003c/p\u003e \u003cp\u003eAdditionally, we identified the triplet rs174559 – cg21029357 – docosapentaenoic acid exhibiting bidirectional mediation, with significant Sobel \u003cem\u003eP\u003c/em\u003e-values for both SCM and SMC (\u003cem\u003eP\u003c/em\u003e \u0026lt; 0.05) and significant partial correlation (Pearson’s \u003cem\u003eP\u003c/em\u003e \u0026lt; 0.05). 
After adjusting for rs174559, cg21029357 and docosapentaenoic acid maintained a significant positive correlation, providing further evidence that cg21029357 methylation plays a regulatory role in docosapentaenoic acid levels.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eCausal Associations Between Proteins or Metabolites and Human Complex Phenotypes\u003c/h3\u003e\n\u003cp\u003eWe performed two-sample MR analysis\u003csup\u003e\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e\u003c/sup\u003e to evaluate potential causal relationships between protein/metabolite levels and complex human phenotypes. Specifically, we used proteins/metabolites involved in QTLs as exposures (\u003cb\u003eTables S28-29\u003c/b\u003e), while outcome data were sourced from GWAS results for 220 phenotypes in the Biobank Japan (BBJ, East Asian) cohort\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e (\u003cb\u003eTable S30\u003c/b\u003e). After Bonferroni correction (\u003cem\u003eP\u003c/em\u003e \u0026lt; 2.27 × 10\u003csup\u003e− 4\u003c/sup\u003e; 0.05/220), we identified 229 significant protein-outcome associations (55 proteins and 62 outcomes) and 290 significant metabolite-outcome associations (86 metabolites and 86 outcomes). Reverse MR indicated bidirectional causality for two protein-outcome pairs (C1QA with serum creatinine; ICAM4 with red blood cell count) and 20 metabolite-outcome pairs (15 metabolites, with 10 pairs driven by total bilirubin). These associations were excluded from subsequent analyses, yielding 227 protein–outcome and 270 metabolite–outcome pairs (\u003cb\u003eTables S31-32\u003c/b\u003e).\u003c/p\u003e \u003cp\u003eWe employed two distinct strategies to validate our MR results: observational consistency analysis and colocalization analysis. First, we examined linear relationships between 25 biomarker outcomes and their corresponding exposures in the CAS cohort. 
Among 105 protein-outcome pairs, 40 (38.1%) exhibited statistically significant observational associations (\u003cem\u003eP\u003c/em\u003e \u0026lt; 0.05), with 26 (65%) showing directionally consistent regression coefficients relative to MR beta estimates (\u003cb\u003eTable S33\u003c/b\u003e; \u003cb\u003eFigure S9\u003c/b\u003e). For 130 metabolite-outcome pairs, 69 (53.1%) were significant (\u003cem\u003eP\u003c/em\u003e \u0026lt; 0.05), with 45 (65.2%) demonstrating consistency with MR beta estimates (\u003cb\u003eTable S34\u003c/b\u003e; \u003cb\u003eFigure S9\u003c/b\u003e). This concordance between observational and MR-based associations supports the reliability of our causal inference framework. Second, we performed colocalization analysis to validate shared genetic determinants between exposures and outcomes. Specifically, 132 (58.1%) protein–outcome pairs and 169 (62.6%) metabolite–outcome pairs showed significant colocalization (PP \u0026gt; 0.75; \u003cb\u003eMethods\u003c/b\u003e), involving 81 exposures (34 proteins and 47 metabolites) and 89 outcomes (\u003cb\u003eTables S31-32\u003c/b\u003e). The colocalization results suggested that protein/metabolite levels and the related outcomes may be co-regulated by shared genetic variants, further supporting the robustness of our MR results.\u003c/p\u003e \u003cp\u003eOne of our MR results showed a significant causal association between elevated FCRL3 levels and increased Graves’ disease risk (MR: beta = 0.162, \u003cem\u003eP\u003c/em\u003e = 8.37×10\u003csup\u003e− 12\u003c/sup\u003e; Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ea). 
MR analyses identified five independent SNPs as instrumental variables (IVs), among which rs2210912 demonstrated the strongest association with FCRL3 protein levels (pQTL: \u003cem\u003eP\u003c/em\u003e = 1.64 × 10\u003csup\u003e− 202\u003c/sup\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ea). Colocalization analysis further supported this finding, indicating shared genetic variants between FCRL3 pQTL and Graves’ disease (colocalization PP = 0.93, Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ea). Additionally, rs2210912 was significantly associated with enhanced \u003cem\u003eFCRL3\u003c/em\u003e expression in blood (eQTLGen: beta = 0.69, \u003cem\u003eP\u003c/em\u003e = 3.27 × 10\u003csup\u003e− 310\u003c/sup\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ea), consistent with previous reports of upregulated \u003cem\u003eFCRL3\u003c/em\u003e expression in Graves’ disease patients\u003csup\u003e\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u003c/sup\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eb). These convergent lines of evidence strongly support the pathogenic role of FCRL3 in Graves' disease etiology.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eAnother MR result revealed a protective association between deoxycholic acid 3-glucuronide (DCA-3G) and colorectal cancer (CRC) risk (MR: beta = − 0.096, \u003cem\u003eP\u003c/em\u003e = 4.94 × 10\u003csup\u003e− 6\u003c/sup\u003e), corroborating previous findings\u003csup\u003e\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e\u003c/sup\u003e that also linked reduced DCA-3G levels to higher CRC risk (MR: beta = − 0.041, \u003cem\u003eP\u003c/em\u003e = 4.17 × 10\u003csup\u003e− 7\u003c/sup\u003e). 
Fecal DCA-3G levels are reported to be lower in CRC patients compared to healthy controls\u003csup\u003e\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e\u003c/sup\u003e, suggesting a potential relationship between reduced metabolite levels and the disease. Specifically, the candidate rs2603153 demonstrated robust association with elevated DCA-3G levels across multiple cohorts (metabQTL in CAS cohort: beta = 1.25, \u003cem\u003eP\u003c/em\u003e = 3.51 × 10\u003csup\u003e− 294\u003c/sup\u003e; CLSA metabQTL: beta = 0.51, \u003cem\u003eP\u003c/em\u003e = 8.7 × 10\u003csup\u003e− 252\u003c/sup\u003e; METSIM metabQTL: beta = 0.61, \u003cem\u003eP\u003c/em\u003e = 3.3 × 10\u003csup\u003e− 186\u003c/sup\u003e) and reduced CRC risk (BBJ GWAS for CRC: beta = − 0.12, \u003cem\u003eP\u003c/em\u003e = 4.94 × 10\u003csup\u003e− 6\u003c/sup\u003e). Further colocalization analysis confirmed shared causal variants between DCA-3G and CRC at rs2603153 (colocalization PP = 0.98) and implicated that the gene \u003cem\u003eUGT2B17\u003c/em\u003e was associated with both DCA-3G (colocalization PP = 0.88) and CRC (colocalization PP = 0.86) in this region (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ec). Because \u003cem\u003eUGT2B17\u003c/em\u003e catalyzes the glucuronidation of DCA to form DCA-3G and glucuronidation is a critical detoxification pathway in the body for the clearance of toxic substances\u003csup\u003e\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u003c/sup\u003e, we hypothesized that rs2603153 upregulates \u003cem\u003eUGT2B17\u003c/em\u003e expression, leading to lower levels of DCA and toxicity to the colon or rectum, thereby lowering CRC risk (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ed). 
Previous studies have shown, by combining 10 CRC datasets, that \u003cem\u003eUGT2B17\u003c/em\u003e expression was significantly lower in CRC tissues\u003csup\u003e\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e\u003c/sup\u003e (differential expression analysis: logFC = − 1.9, \u003cem\u003eP\u003c/em\u003e = 1.04 × 10\u003csup\u003e− 11\u003c/sup\u003e), providing further support for the protective role of \u003cem\u003eUGT2B17\u003c/em\u003e in CRC.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eIn this study, we conducted a comprehensive analysis of blood samples from 3,102 genotyped Chinese individuals and generated matched multi-omics data, including DNA methylation, protein expression, and metabolite measurements. Our findings revealed the broad regulatory effects of genetic variation on blood biomarkers, providing novel insights into complex regulatory networks. Together with the meQTLs reported in our previous study, the pQTLs, prQTLs, metabQTLs, and mrQTLs generated here from the same cohort provide valuable information regarding the Chinese population.\u003c/p\u003e \u003cp\u003eBy generating and integrating various types of QTLs, including in-house meQTLs, pQTLs, and metabQTLs, as well as external eQTLs, we explored shared genetic mechanisms among different molecular traits (e.g., methylation, protein expression, and metabolites) through colocalization analysis, mediation analysis, and partial correlation analysis. For the colocalization analysis, we investigated pairs of eQTL–meQTL, eQTL–pQTL, eQTL–metabQTL, meQTL–pQTL, meQTL–metabQTL, and pQTL–metabQTL. In the colocalization analysis with eQTLs, the effect directions of SNPs for colocalized pQTLs and eQTLs were highly consistent (over 90%), supporting the reliability of our reported pQTLs despite population differences. 
We observed that more than 50% of colocalized meQTL–eQTL and meQTL–pQTL pairs tended to have opposite effect directions (e.g., cg25259754-FCRL3), supporting the notion that high DNA methylation suppresses transcription and, consequently, protein levels\u003csup\u003e\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e\u003c/sup\u003e. However, a notable proportion (over 40%; e.g., cg10700560-LHPP) showed concordant effect directions, meaning that CpGs with high DNA methylation levels were correlated with high gene and protein expression, highlighting the complexity of the relationship between DNA methylation and gene/protein expression\u003csup\u003e\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e\u003c/sup\u003e. The GTEx tissue eQTLs also allowed us to investigate tissue specificity of the pairs. Indeed, we found that 36.6%, 57.1%, and 57.5% of colocalized meQTLs, pQTLs, and metabQTLs were identified exclusively in a single tissue category, emphasizing tissue-specific associations. Mediation analysis identified 187 significant pairs, with the SCM model showing the highest mediation proportion (median = 0.146). Partial correlation analysis further supported these findings, because all significant pairs in the mediation analysis remained significant in the partial correlation analysis (e.g., cg21029357 and docosapentaenoic acid). These results emphasized the complex interplay between genetic variants, epigenetic regulation, and downstream molecular traits, providing valuable insights into the regulatory pathways influencing human disease.\u003c/p\u003e \u003cp\u003eThe comprehensive xQTL profiles, particularly for the East Asian population, allowed us to deeply investigate the potential biological mechanisms underlying complex traits using GWAS summary statistics. Through MR analysis, we identified potential causal relationships between 227 protein–outcome pairs and 270 metabolite–outcome pairs. 
By integrating multi-omics data, we depicted the potential regulatory framework of FCRL3 in Graves’ disease and revealed how DCA-3G exerts a protective role in colorectal cancer.\u003c/p\u003e \u003cp\u003eThe current study had several limitations. First, the participants in the CAS cohort were mainly employees of institutes affiliated with the CAS, most of whom were engaged in academic work, which may limit the generalizability of the findings to the broader population. Second, this study included only \u003cem\u003ecis-\u003c/em\u003emeQTLs and did not evaluate the effects of distal DNA methylation regulation on proteins and metabolites. Third, the proteins tested were limited to those measurable in plasma and included in the Olink Inflammation panel. Consequently, the identified pQTLs did not encompass the full spectrum of proteins across various cell types or tissues, constraining the interpretation of their biological functions. This also led to a small number of colocalization results between proteins and metabolites. Additionally, the validation cohort was relatively small, and further validation in larger cohorts with more comprehensive proteomic coverage is necessary. Fourth, because of the absence of an East Asian validation cohort with metabolic profiles closely matching those of the CAS cohort, some of the metabQTL results were not independently validated, and their reliability requires further assessment. Fifth, because blood samples from the CAS cohort did not meet the requirements for transcriptomic profiling, we were unable to obtain transcriptomic data for eQTL discovery. Therefore, the colocalization of xQTLs with eQTLs relied on two external eQTL datasets, which may have affected the detection of colocalization signals and prevented mediation analyses from exploring the causal relationship between xQTLs and eQTLs. 
Finally, in the MR analyses, exposure data were derived from a Chinese population, while outcome data were derived from a Japanese population. Although both are East Asian populations, subtle differences in genetic architecture could have influenced the results. Furthermore, most exposures in the MR analyses were represented by a single instrumental variable, limiting the capacity to systematically test heterogeneity and horizontal pleiotropy, which typically requires multiple instrumental variables. Therefore, despite the statistical significance of the MR results, their biological relevance should be interpreted with caution.\u003c/p\u003e \u003cp\u003eIn summary, the current study utilized extensive multi-omics data from the CAS cohort to build a comprehensive multi-omics genomic atlas of meQTL, pQTL, and metabQTL in an East Asian population, discovering many new QTLs. Using colocalization and mediation analyses, we identified shared genetic factors and causal pathways that connect DNA methylation, gene expression, protein levels, and metabolites. This research not only expands the population diversity of QTL studies but also offers important insights into the genetic architecture and regulatory mechanisms underlying complex traits and diseases. This resource will help deepen understanding of the causal links between molecular traits and disease development, aid in identifying clinical biomarkers, and support drug target validation.\u003c/p\u003e "},{"header":"Methods","content":"\u003ch2\u003eThe Chinese Academy of Sciences (CAS) Cohort\u003c/h2\u003e\u003cp\u003eThe CAS cohort is a prospective study aimed at identifying risk factors for physical and mental health through traditional epidemiological and multi-omics analyses. A total of 3,102 participants, primarily employees of CAS institutes in Beijing, were recruited between August 2020 and October 2021. 
All participants underwent standardized physical examinations at Zhongguancun Hospital, conducted by trained physicians and nurses. Fasting blood samples were collected for omics profiling. We generated genotyping data for all participants, DNA methylation data for 1,060, proteomic data for 1,056, and metabolomic data for 2,479 individuals, with 1,054 participants having all four types of multi-omics data (\u003cb\u003eFigure \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003ea\u003c/b\u003e). The study was approved by the Institutional Review Board of the Beijing Institute of Genomics, CAS, and Beijing Zhongguancun Hospital (approval IDs: 2020H020, 2021H001, and 20201229). All participants provided written informed consent.\u003c/p\u003e\u003ch2\u003eArray Genotyping\u003c/h2\u003e\u003cp\u003eDNA was extracted from peripheral blood samples stored in -80°C freezers. Genotyping data were generated using the Infinium Asian Screening Array + MultiDisease-24 BeadChip. Genotypes were called using GenTrain v2.0 in GenomeStudio. Individuals with sex mismatches, cryptic relatedness, potential contamination, or genotyping call rate below 98% were excluded. At the SNP level, we excluded duplicates, non-autosomal variants, and SNPs with a call rate below 95%, MAF less than 1%, or Hardy-Weinberg equilibrium p-value (\u003cem\u003eP_HWE\u003c/em\u003e) \u0026lt; 1 × 10\u003csup\u003e− 4\u003c/sup\u003e. Unmeasured SNPs were imputed using SHAPEIT2\u003csup\u003e42\u003c/sup\u003e and IMPUTE2\u003csup\u003e43\u003c/sup\u003e (reference panel: 1000 Genomes Project phase 3 reference\u003csup\u003e\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e\u003c/sup\u003e, GRCh37), and SNPs with an imputation info score \u0026gt; 0.6 were retained. 
Quality checks of genotype data used for the proteomic and metabolomic data analyses were conducted separately, including removal of individuals with heterozygosity \u0026gt; 5 standard deviations, and SNPs with a missing rate \u0026gt; 5%, MAF \u0026lt; 1%, and \u003cem\u003eP_HWE\u003c/em\u003e \u0026lt; 1 × 10\u003csup\u003e− 6\u003c/sup\u003e. Finally, 4,825,573 SNPs from 1,056 samples and 4,903,063 SNPs from 2,470 samples were retained for subsequent pQTL and metabQTL analyses, respectively. The genotype data were merged with the 1000 Genomes Project data, and principal component analysis (PCA) was performed. We found that the CAS cohort clustered well with East Asian populations from the 1000 Genomes Project, indicating that the CAS cohort is highly representative of East Asian populations (\u003cb\u003eFigure \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003eb\u003c/b\u003e).\u003c/p\u003e\u003ch2\u003eDNA Methylation Profiling and meQTL Discovery\u003c/h2\u003e\u003cp\u003eDNA methylation profiling and meQTL discovery were conducted in our previous study\u003csup\u003e\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e, which included detailed information regarding protocols, sample preprocessing, quality control, normalization, and statistical analysis.\u003c/p\u003e\u003ch2\u003eProteomics Profiling\u003c/h2\u003e\u003cp\u003eThe Olink Explore-384 Inflammation panel\u003csup\u003e22\u003c/sup\u003e was used to measure the levels of 384 inflammation-related plasma proteins in 1,056 participants. Olink employed Normalized Protein Expression (NPX) as the unit of protein level on the log2 scale. Background levels were established using blank control samples for each protein, and the lower limit of detection was defined as 3 standard deviations above the background. Proteins with quality control (QC) or assay warnings were excluded. 
A total of 365 proteins were retained after QC for further analysis (\u003cb\u003eTable \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e\u003c/b\u003e).\u003c/p\u003e\u003ch2\u003eMetabolomics Profiling\u003c/h2\u003e\u003cp\u003eUntargeted plasma metabolomics profiling was conducted on 2,479 samples from the CAS cohort (LipidALL Technologies, Changzhou, China). ACQUITY UPLC HSS T3 1.8 µm, 2.1 × 100 mm columns (Waters, Dublin, Ireland) were used for reverse-phase chromatography, and ACQUITY UPLC BEH Amide 1.7 µm, 2.1 × 100 mm columns (Waters, Dublin, Ireland) were used for normal-phase chromatography. Liquid chromatography-mass spectrometry analysis was carried out using an ultra-high-performance liquid chromatography system (Agilent 1290, Agilent Technologies, Germany) coupled with a high-resolution mass spectrometer (5600 Triple TOF Plus, AB Sciex, Singapore). A total of 3,784 distinct metabolites were detected. Of these, 848 were successfully identified; the remaining 2,936 were unidentified and were categorized into 492 unknown groups. After removing metabolites with a missing rate greater than 50%, 841 metabolites were retained for further analysis.\u003c/p\u003e\u003ch2\u003eIdentification of pQTL and Protein Ratio QTL\u003c/h2\u003e\u003cp\u003eNPX values were inverse-normalized and corrected for covariates (age, sex, and the first 10 genomic PCs) to obtain residuals. QTL analysis was then performed using PLINK v1.9\u003csup\u003e45\u003c/sup\u003e. Associations were classified as \u003cem\u003ecis-\u003c/em\u003epQTLs if variants were located within 1 Mb of the transcription start site (TSS) of the gene encoding the protein, while those beyond 1 Mb were considered \u003cem\u003etrans-\u003c/em\u003epQTLs. 
We defined a pQTL as a region containing association signals that reached genome-wide significance (\u003cem\u003eP\u003c/em\u003e \u0026lt; 5 × 10\u003csup\u003e−8\u003c/sup\u003e), and the lead variant of each pQTL as the variant with the lowest \u003cem\u003eP\u003c/em\u003e-value in that region for a given protein.\u003c/p\u003e\u003cp\u003eFor protein ratios, we computed the partial correlation between all protein-protein pairs using the R function ggm.estimate.pcor from the package GeneNet\u003csup\u003e\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e\u003c/sup\u003e. At a Bonferroni-corrected significance threshold of \u003cem\u003eP\u003c/em\u003e \u0026lt; 7.5 × 10\u003csup\u003e−7\u003c/sup\u003e (0.05 × 2 / [365 × 364]), 28 significant partial correlations were identified. For these 28 protein pairs, inverse-normalized differences between the two NPX values were used as dependent variables; because NPX values are on a log2 scale, the difference corresponds to the log ratio, log(A/B) = log(A) − log(B). For genotypes, we followed the filtering options used for pQTL mapping. The GWAS on the 28 ratios was conducted using PLINK v2.0\u003csup\u003e47\u003c/sup\u003e with the --glm option, using age, sex, and the first genetic PC as covariates. Protein ratio associations that reached Bonferroni-corrected significance for both \u003cem\u003eP\u003c/em\u003e-values (\u003cem\u003eP\u003c/em\u003e \u0026lt; 5 × 10\u003csup\u003e−8\u003c/sup\u003e/28) and p-gains (p-gain \u0026gt; 28 × 10\u003csup\u003e7\u003c/sup\u003e) were defined as protein ratio QTLs. 
We then refined the associations within ± 500 kb of the respective lead variant using the R package coloc (v.5.1.0)\u003csup\u003e48\u003c/sup\u003e.\u003c/p\u003e\u003ch2\u003eIdentification of metabQTL and Metabolite Ratio QTL\u003c/h2\u003e\u003cp\u003eMetabolite levels were inverse-normalized, corrected for covariates (age, sex, and the first 10 genomic PCs), and inverse-normalized again to obtain residuals. 
QTL analysis was conducted using PLINK v2.0\u003csup\u003e47\u003c/sup\u003e to identify metabQTLs.\u003c/p\u003e\u003cp\u003eFor metabolite ratios, we queried HMDB\u003csup\u003e\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e for enzymes or transporters associated with the metabolites and identified 256 pairs of metabolites sharing the same enzymes or transporters (\u003cb\u003eTable S13\u003c/b\u003e). Metabolite ratios were calculated by dividing the level of one metabolite by the other. As with metabolite levels, ratios were inverse-normalized, corrected for covariates, and inverse-normalized again before genome-wide association analysis using PLINK v2.0\u003csup\u003e47\u003c/sup\u003e.\u003c/p\u003e\u003ch2\u003eIdentification of Independent QTL and Definition of Loci\u003c/h2\u003e\u003cp\u003eWe used the GCTA-COJO\u003csup\u003e\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e\u003c/sup\u003e method to identify independent QTLs among multiple SNPs in LD. The LD reference was calculated using the 1,056 (pQTL) and 2,470 (metabQTL) unrelated East Asian individuals from our study. Other COJO parameters were set as follows: --cojo-p 5e-8, --maf 0.01, --cojo-wind 5000, and --cojo-collinear 0.9. SNPs with \u003cem\u003eP\u003c/em\u003e \u0026lt; 5.0 × 10\u003csup\u003e−8\u003c/sup\u003e in the COJO results were considered significant independent QTLs. For each independent QTL, we defined loci as regions extending 500 kb upstream and downstream. 
If two loci overlapped, they were merged into a single larger locus.\u003c/p\u003e\u003ch2\u003eIdentification of Novel QTL\u003c/h2\u003e\u003cp\u003eTo identify novel pQTLs, we collected 15 studies published after 2020 from the metabolomix website\u003csup\u003e\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e\u003c/sup\u003e (accessed July 2, 2024) and the publication list from UKB-PPP\u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e (\u003cb\u003eTable S5\u003c/b\u003e). These studies encompassed three assay platforms (Olink, SomaScan, and mass spectrometry) and two populations (East Asian and European). Previously reported pQTLs were defined as variants that (i) reached the reported significance threshold, (ii) had a consistent direction of effect, (iii) were associated with the same protein, and (iv) for East Asian studies, were in LD (R² \u0026gt; 0.8) with our pQTL; LD was not considered for European studies. All information about novelty identification is included in \u003cb\u003eTables S3\u003c/b\u003e.\u003c/p\u003e\u003cp\u003eFor novel metabQTLs, we compiled data from 50 articles curated by Kastenmüller et al.\u003csup\u003e27\u003c/sup\u003e (accessed January 10, 2024). For articles published before 2021, we used the list curated by Yin et al. and filtered by Chen et al.\u003csup\u003e13,28\u003c/sup\u003e For studies published after 2021, we manually curated the data. We extracted association results from those papers using a threshold of \u003cem\u003eP\u003c/em\u003e \u0026lt; 5.0 × 10\u003csup\u003e−8\u003c/sup\u003e or the study-specific \u003cem\u003eP\u003c/em\u003e-value threshold. 
MetabQTLs identified in our CAS cohort were considered known if they were located within 1 Mb of previously reported associations; otherwise, they were deemed novel.\u003c/p\u003e\u003ch2\u003eValidation of QTL\u003c/h2\u003e\u003cp\u003eTo validate our pQTL results, we downloaded the pQTL summary statistics from UKB-PPP, which included all proteins measured in our study. We included all significant pQTLs (\u003cem\u003eP\u003c/em\u003e \u0026lt; 5 × 10\u003csup\u003e−8\u003c/sup\u003e) from UKB-PPP for the 365 plasma immune proteins in our analysis and compared the shared pQTLs separately for the East Asian and European populations.\u003c/p\u003e\u003cp\u003eTo validate metabQTLs, we conducted a comparison with three previously published studies: one in East Asian populations (the Nagahama study\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e) and two in European populations (CLSA\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e and METSIM\u003csup\u003e\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e). We compared the significance and effect direction between our results and those from the three reference studies.\u003c/p\u003e\u003ch2\u003eProtein Variance Explained by pQTL\u003c/h2\u003e\u003cp\u003eWe used the following equation to estimate the proportion of variance explained (PVE) by the lead independent pQTLs for each protein: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(PVE = 2 \\times MAF \\times \\left(1 - MAF\\right) \\times \\beta^{2}\\)\u003c/span\u003e\u003c/span\u003e, where MAF denotes minor allele frequency and β denotes the SNP effect size. 
Details of the method have been previously reported\u003csup\u003e\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e,\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\u003ch2\u003eSNP-based Metabolite Heritability\u003c/h2\u003e\u003cp\u003eSNP-based heritability of metabolites and metabolite ratios was assessed using GCTA-GREML (v1.94.1)\u003csup\u003e26\u003c/sup\u003e. We calculated the genetic relationship matrix using the quality-controlled genotype data and estimated heritability with default parameters. Before performing GREML, we inverse-normalized the phenotypes (metabolites or metabolite ratios) and adjusted for age, sex, and the first 10 genomic PCs using the --qcovar parameter\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\u003ch2\u003eColocalization Analysis\u003c/h2\u003e\u003cp\u003eWe conducted colocalization analysis to identify shared causal variants between meQTL/pQTL/metabQTL and eQTL. We utilized two external datasets: (1) eQTLGen\u003csup\u003e\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u003c/sup\u003e, a meta-analysis of 37 cohorts (N = 31,684) with \u003cem\u003ecis-\u003c/em\u003eeQTL data for 19,251 genes in peripheral blood, and (2) GTEx\u003csup\u003e\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u003c/sup\u003e, encompassing 49 tissues from European populations. 
We extracted SNPs within 1 Mb of each independent QTL and performed colocalization analysis using the R package coloc (v.5.1.0) with the recommended parameters (\u003cem\u003eP\u003c/em\u003e\u003csub\u003e\u003cem\u003e1\u003c/em\u003e\u003c/sub\u003e = 1 × 10\u003csup\u003e−4\u003c/sup\u003e, \u003cem\u003eP\u003c/em\u003e\u003csub\u003e\u003cem\u003e2\u003c/em\u003e\u003c/sub\u003e = 1 × 10\u003csup\u003e−4\u003c/sup\u003e, \u003cem\u003eP\u003c/em\u003e\u003csub\u003e\u003cem\u003e12\u003c/em\u003e\u003c/sub\u003e = 1 × 10\u003csup\u003e−5\u003c/sup\u003e), considering posterior probability (PP) \u0026gt; 0.75 as colocalized. Additionally, following the classification approach by Carreras-Torres et al.\u003csup\u003e31\u003c/sup\u003e, we grouped the GTEx tissues into four main categories and an “Others” category to assess tissue-specific associations.\u003c/p\u003e\u003cp\u003eFor meQTL-eQTL and pQTL-eQTL colocalization, we restricted the analyses to CpG sites, mRNAs, and proteins annotated to the same gene. For metabQTL-eQTL colocalization, we explored all possible pairs of variants. Effect directions of colocalized SNPs with the highest PP were used to evaluate consistency across traits.\u003c/p\u003e\u003cp\u003eAdditionally, we conducted colocalization analyses within the CAS cohort for different types of QTLs. Performing colocalization within the same cohort ensures a consistent genetic background, avoids mismatches in LD structure between datasets, and may reveal shared signals that are harder to detect across populations.\u003c/p\u003e\u003ch2\u003eMediation Analysis\u003c/h2\u003e\u003cp\u003eThe mediation test investigates whether the effect of a genetic variant on a phenotype is mediated through an intermediate molecular trait. Using data from 1,054 individuals with matched multi-omics profiles, we performed mediation analyses on the 3,665 pairs identified by the colocalization test. 
For each pair, we conducted bidirectional mediation analysis, testing whether the SNP influenced the phenotype through the mediator, and, conversely, whether the SNP influenced the mediator through the phenotype. While mediation implies a directional hypothesis, bidirectional testing was performed to explore the potential complexity and feedback in biological regulation.\u003c/p\u003e\u003cp\u003eMediation effect sizes were estimated using the R package mediation (v4.5.0)\u003csup\u003e34\u003c/sup\u003e, and statistical significance was assessed using the Sobel test implemented in the bda package (v18.3.2)\u003csup\u003e53\u003c/sup\u003e. Pairs with a Sobel \u003cem\u003eP\u003c/em\u003e-value \u0026lt; 0.05 and a positive mediation effect were considered to show evidence of mediation, while those with a Sobel \u003cem\u003eP\u003c/em\u003e-value \u0026lt; 0.05/3,665 (Bonferroni correction) and a positive mediation effect were considered to have statistically significant mediation.\u003c/p\u003e\u003cp\u003eBefore mediation analysis, we prepared each omics dataset to satisfy model assumptions and reduce technical confounding. For the methylation data, we applied inverse normalization, adjusted for age, sex, and the first 10 genomic PCs, methylation batch, and estimated blood cell fraction, and then performed inverse normalization on the residuals. For the proteomic data, we applied inverse normalization, adjusted for age, sex, the first 10 genomic PCs, and Olink batch, and again inverse normalized the residuals. For the metabolomic data, we applied inverse normalization, adjusted for age, sex, and the first 10 genomic PCs, and inverse normalized the residuals.\u003c/p\u003e\u003ch2\u003ePartial Correlation Analysis\u003c/h2\u003e\u003cp\u003eWe performed partial correlation analysis on pairs of molecules identified by the colocalization analyses (N = 3,665)\u003csup\u003e32\u003c/sup\u003e. 
Before computing correlations, the data were processed using the same steps described in the mediation analysis section. First, we calculated the Pearson correlation coefficient and its significance for each pair. Next, we regressed out the effect of the colocalized SNP from both traits and computed the Pearson correlation between the residuals to derive the partial correlation for each pair. The original and partial correlations were compared to assess the impact of SNPs on the relationship between traits.\u003c/p\u003e\u003ch2\u003eTwo-Sample MR\u003c/h2\u003e\u003cp\u003e \u003cb\u003eExposure and Instrumental Variable Selection.\u003c/b\u003e We included all 155 proteins and 369 metabolites with QTLs as exposures. We then processed the exposure data using the “clump_data” function from the R package TwoSampleMR\u003csup\u003e\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e\u003c/sup\u003e (v.0.5.6) to obtain independent instrumental variables (LD R² \u0026lt; 0.05, distance ≥ 500 kb, pop = “EAS”) (\u003cb\u003eTables S28-29\u003c/b\u003e).\u003c/p\u003e\u003cp\u003e \u003cb\u003eInstrumental Variable Proxies.\u003c/b\u003e To avoid losing instrumental variables that were absent from the outcome data, we identified proxies for each instrumental variable using the 1000 Genomes Project East Asian populations as the reference panel\u003csup\u003e\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e\u003c/sup\u003e (LD R² \u0026gt; 0.8). All instrumental variables and their proxies are listed in \u003cb\u003eTables S28-29\u003c/b\u003e. To control for horizontal pleiotropy, only instrumental variables and proxies associated with fewer than five exposures were retained\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e. 
After filtering, a total of 449 exposures (155 proteins and 294 metabolites) remained for analysis.\u003c/p\u003e\u003cp\u003e \u003cb\u003eOutcomes.\u003c/b\u003e We used data from 220 large-scale GWASs conducted by BBJ (East Asian) as outcomes\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e, covering 159 diseases, 23 drug prescriptions, and 38 biomarkers. Detailed information on the outcomes is available in \u003cb\u003eTable S30\u003c/b\u003e. There was no participant overlap between the exposure and outcome cohorts.\u003c/p\u003e\u003cp\u003e \u003cb\u003eMR Analysis.\u003c/b\u003e MR analysis was performed using the “mr” function from the R package TwoSampleMR. We employed the Wald ratio method for exposure–outcome pairs with a single instrumental variable and the inverse variance weighted method for those with multiple instrumental variables. Results that passed correction for multiple testing (Bonferroni corrected \u003cem\u003eP\u003c/em\u003e = 0.05/220) were retained.\u003c/p\u003e\u003ch2\u003eSupplemental Analyses for MR\u003c/h2\u003e\u003cp\u003e \u003cb\u003eReverse MR.\u003c/b\u003e To assess bidirectional causality, we performed reverse MR analysis using 220 GWASs from BBJ as exposures and 155 proteins and 294 metabolites as outcomes. The instrumental variables for the exposures were selected using the same criteria as in forward MR. Results passing multiple testing correction (Bonferroni corrected \u003cem\u003eP\u003c/em\u003e = 0.05/220) were considered significant. 
Reverse MR results that overlapped with those from the forward MR analysis were taken to indicate bidirectional causality, and the corresponding pairs were removed from further analyses.\u003c/p\u003e\u003cp\u003e \u003cb\u003eObservational Linear Regression Analysis.\u003c/b\u003e To support our MR results, we conducted observational linear regression analyses to evaluate directional concordance between observational regression coefficients and causal MR beta estimates. Observational linear regression was conducted within the CAS cohort, for which health examination measurements were available. We identified 25 outcomes shared between our CAS cohort and the BBJ cohort that also had significant MR results: alanine aminotransferase, aspartate transaminase, body mass index, blood urea nitrogen, body weight, diastolic blood pressure, eosinophil count, γ-glutamyl transpeptidase, glucose, hemoglobin, hemoglobin A1c (HbA1c), high-density lipoprotein cholesterol (HDL-cholesterol), height, low-density lipoprotein cholesterol (LDL-cholesterol), lymphocyte count, monocyte count, platelet count, pulse pressure, red blood cell count, systolic blood pressure, serum creatinine, total cholesterol, triglycerides, uric acid, and white blood cell count. These outcomes were used for the observational linear regression analyses. Both exposures and outcomes were inverse-normalized before regression. Covariates included sex, age, smoking, alcohol consumption, and body mass index (BMI) for all outcomes, except for body weight, height, and BMI itself, where BMI was excluded. 
Directional concordance between observational regression coefficients and MR beta estimates was considered to provide evidence supporting MR findings.\u003c/p\u003e\u003cp\u003e \u003cb\u003eColocalization Analysis of Exposures and Outcomes.\u003c/b\u003e To investigate whether exposures and outcomes share the same genetic determinants, we performed colocalization analysis using the R package \u003cem\u003ecoloc\u003c/em\u003e on all significant MR pairs. SNPs within 1 Mb of instrumental variables were included in the study. The parameters and thresholds were the same as those described in the Colocalization Analysis section.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eFunding\u003c/h2\u003e \u003cp\u003eThis work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA0460202, XDA0460204) and the National Natural Science Foundation of China (NSFC) (92374103, 32270706, 62473355).\u003c/p\u003e\u003ch2\u003eACKNOWLEDGMENTS\u003c/h2\u003e \u003cp\u003eWe thank all the participants. 
We thank Benjamin Knight, MSc, from Scribendi (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ewww.scribendi.com\u003c/span\u003e\u003cspan address=\"http://www.scribendi.com\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) for editing a draft of this manuscript.\u003c/p\u003e\n\u003ch3\u003eCode availability\u003c/h3\u003e\n\u003cp\u003eAll custom Bash, R (version 4.3.2) and Python (version 3.11.6) scripts used in this study are available on GitHub (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/YangPeng-CNCB/multi-omics-QTL\u003c/span\u003e\u003cspan address=\"https://github.com/YangPeng-CNCB/multi-omics-QTL\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e\u003ch2\u003eData availability\u003c/h2\u003e\u003cp\u003eThe multi-omics QTL summary statistics have been deposited in OMIX, China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://ngdc.cncb.ac.cn/omix\u003c/span\u003e\u003cspan address=\"https://ngdc.cncb.ac.cn/omix\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). The accession IDs are OMIX004116 for meQTL, OMIX008230 for pQTL, and OMIX011747 for metabQTL.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eUffelmann E et al (2021) Genome-wide association studies. Nat Rev Methods Primers 1:59\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMaurano MT et al (2012) Systematic Localization of Common Disease-Associated Variation in Regulatory DNA. Science 337:1190\u0026ndash;1195\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGallagher MD, Chen-Plotkin AS (2018) The Post-GWAS Era: From Association to Function. 
Am J Hum Genet 102:717\u0026ndash;730\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMusunuru K et al (2010) From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature 466:714\u0026ndash;719\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNicolae DL et al (2010) Trait-Associated SNPs Are More Likely to Be eQTLs: Annotation to Enhance Discovery from GWAS. PLoS Genet 6:e1000888\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJoehanes R et al (2017) Integrated genome-wide analysis of expression quantitative trait loci aids interpretation of genomic association studies. Genome Biol 18:16\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHawe JS et al (2022) Genetic variation influencing DNA methylation provides insights into molecular mechanisms regulating genomic function. Nat Genet 54:18\u0026ndash;29\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMin JL et al (2021) Genomic and phenotypic insights from an atlas of genetic effects on DNA methylation. Nat Genet 53:1311\u0026ndash;1321\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuan T et al (2019) Genome-wide identification of DNA methylation QTLs in whole blood highlights pathways for cardiovascular disease. Nat Commun 10:4267\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePeng Q et al (2024) Analysis of blood methylation quantitative trait loci in East Asians reveals ancestry-specific impacts on complex traits. Nat Genet 56:846\u0026ndash;860\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSun BB et al (2023) Plasma proteomic associations with genetics and health in the UK Biobank. Nature 622:329\u0026ndash;338\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFerkingstad E et al (2021) Large-scale integration of the plasma proteome with genetics and disease. 
Nat Genet 53:1712\u0026ndash;1721\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen Y et al (2023) Genomic atlas of the plasma metabolome prioritizes metabolites implicated in human diseases. Nat Genet 55:44\u0026ndash;53\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKarjalainen MK et al (2024) Genome-wide characterization of circulating metabolic biomarkers. Nature 628:130\u0026ndash;138\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMacTel, Consortium et al (2021) A cross-platform approach identifies genetic regulators of human metabolism and health. Nat Genet 53:54\u0026ndash;64\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIwasaki T et al (2023) Genetic influences on human blood metabolites in the Japanese population. iScience 26:105738\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang Z et al (2021) Genome-wide association study of metabolites in patients with coronary artery disease identified novel metabolite quantitative trait loci. Clin Transl Med 11:e290\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCheng C et al (2025) Genetic mapping of serum metabolome to chronic diseases among Han Chinese. Cell Genomics 5:100743\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang QS et al (2024) Statistically and functionally fine-mapped blood eQTLs and pQTLs from 1,405 humans reveal distinct regulation patterns and disease relevance. Nat Genet 56:2054\u0026ndash;2067\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTokolyi A et al (2025) The contribution of genetic determinants of blood gene expression and splicing to molecular phenotypes and health outcomes. Nat Genet. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41588-025-02096-3\u003c/span\u003e\u003cspan address=\"10.1038/s41588-025-02096-3\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSakaue S et al (2021) A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet 53:1415\u0026ndash;1424\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLundberg M, Eriksson A, Tran B, Assarsson E, Fredriksson S (2011) Homogeneous antibody-based proximity extension assays provide sensitive and specific detection of low-abundant proteins in human blood. Nucleic Acids Res 39:e102\u0026ndash;e102\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eH\u0026ouml;glund J, Karlsson T, Johansson T, Ek WE, Johansson \u0026Aring; (2021) Characterization of the human ABO genotypes and their association to common inflammatory and cardiovascular diseases in the UK Biobank. Am J Hematol 96:1350\u0026ndash;1362\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTian H et al (2022) Precise Metabolomics Reveals a Diversity of Aging-Associated Metabolic Features. Small Methods 6:e2200130\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWishart DS et al (2022) HMDB 5.0: the Human Metabolome Database for 2022. Nucleic Acids Res 50:D622\u0026ndash;D631\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYang J, Zeng J, Goddard ME, Wray NR, Visscher PM (2017) Concepts, estimation and interpretation of SNP-based heritability. Nat Genet 49:1304\u0026ndash;1310\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKastenm\u0026uuml;ller G, Raffler J, Gieger C, Suhre K (2015) Genetics of human metabolism: an update. 
Hum Mol Genet 24:R93\u0026ndash;R101\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYin X et al (2022) Genome-wide association studies of metabolites in Finnish men identify disease-relevant loci. Nat Commun 13:1644\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eV\u0026otilde;sa U et al (2021) Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat Genet 53:1300\u0026ndash;1310\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThe GTEx Consortium (2020) The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369:1318\u0026ndash;1330\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCarreras-Torres R et al (2024) Multiomic integration analysis identifies atherogenic metabolites mediating between novel immune genes and cardiovascular risk. Genome Med 16:122\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePierce BL et al (2018) Co-occurring expression and methylation QTLs allow detection of common causal variants and shared biological mechanisms. Nat Commun 9:804\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShang L et al (2023) meQTL mapping in the GENOA study reveals genetic determinants of DNA methylation in African Americans. Nat Commun 14:2711\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eImai K, Keele L, Tingley D (2010) A general approach to causal mediation analysis. Psychol Methods 15:309\u0026ndash;334\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSmith GD (2004) Mendelian randomization: prospects, potentials, and limitations. 
Int J Epidemiol 33:30\u0026ndash;42\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWojciechowska-Durczynska K, Stepniak J, Lewinski A, Karbownik-Lewinska M (2024) The Increased FCRL mRNA Expression in Patients with Graves\u0026rsquo; Disease Is Associated with Hyperthyroidism (But Not with Positive Thyroid Antibodies). 
J Clin Med 13:5289\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSun J et al (2024) Systematic investigation of genetically determined plasma and urinary metabolites to discover potential interventional targets for colorectal cancer. JNCI J Natl Cancer Inst 116:1303\u0026ndash;1312\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCai C et al (2021) Gut microbiota imbalance in colorectal cancer patients, the risk factor of COVID-19 mortality. Gut Pathog 13:70\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYang G et al (2017) Glucuronidation: driving factors and their impact on glucuronide disposition. Drug Metab Rev 49:105\u0026ndash;138\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWei F-Z et al (2020) Differential Expression Analysis Revealing CLCA1 to Be a Prognostic and Diagnostic Biomarker for Colorectal Cancer. Front Oncol 10:573295\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMoore LD, Le T, Fan G (2013) DNA Methylation and Its Basic Function. Neuropsychopharmacology 38:23\u0026ndash;38\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDelaneau O, Marchini J, Zagury J-F (2011) A linear complexity phasing method for thousands of genomes. Nat Methods 9:179\u0026ndash;181\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003evan Leeuwen EM et al (2015) Population-specific genotype imputations using minimac or IMPUTE2. Nat Protoc 10:1285\u0026ndash;1296\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThe 1000 Genomes Project Consortium (2015) A global reference for human genetic variation. Nature 526:68\u0026ndash;74\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePurcell S et al (2007) PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. 
Am J Hum Genet 81:559\u0026ndash;575\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSch\u0026auml;fer J, Strimmer K (2005) A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat Appl Genet Mol Biol 4:Article32\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChang CC et al (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, s13742-015-0047\u0026ndash;8\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGiambartolomei C et al (2014) Bayesian Test for Colocalisation between Pairs of Genetic Association Studies Using Summary Statistics. PLoS Genet 10:e1004383\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGenetic Investigation of ANthropometric Traits (GIANT) Consortium (2012) Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet 44:369\u0026ndash;375\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSuhre K, McCarthy MI, Schwenk JM (2021) Genetics meets proteomics: perspectives for large population-based studies. Nat Rev Genet 22:19\u0026ndash;37\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXu F et al (2023) Genome-wide genotype-serum proteome mapping provides insights into the cross-ancestry differences in cardiometabolic disease susceptibility. Nat Commun 14:896\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFolkersen L et al (2020) Genomic and drug target evaluation of 90 cardiovascular proteins in 30,931 individuals. Nat Metab 2:1135\u0026ndash;1148\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang B (2024) \u003cem\u003eBda: Binned Data Analysis\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHemani G et al (2018) The MR-Base platform supports systematic causal inference across the human phenome. 
eLife 7:e34408\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"nature-portfolio","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Nature Portfolio","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"ejp","reportingPortfolio":"","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"methylation QTL (meQTL), protein QTL (pQTL), metabolite QTL (metabQTL), East Asian cohorts, colocalization, mendelian randomization","lastPublishedDoi":"10.21203/rs.3.rs-8193543/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8193543/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eQuantitative trait loci (QTL) studies have been pivotal in mapping the genetic regulation of molecular traits but have been primarily conducted in European populations, limiting insights into diverse ethnic groups. To close this knowledge gap, we conducted a large-scale multi-omics QTL analyses using blood samples from 3,102 Chinese individuals, systematically characterizing the regulatory effects of genetic variants on DNA methylation, protein levels, and metabolites. Our study identified 209 protein QTLs (pQTLs) for 155 proteins and 587 metabolite QTLs (metabQTLs) for 369 metabolites. 
By integrating these findings with cis-methylation QTL (meQTL) associations identified in our previous work, we defined the shared genetic architecture across these three molecular layers. Colocalization analyses, both within our cohort and with external xQTLs, revealed 3,665 pairs of shared causal variants across traits, supported by strong mediation evidence for a regulatory cascade in 187 pairs. To link these molecular findings to health outcomes, we performed Mendelian randomization (MR) analyses, identifying 497 potential causal relationships between molecular traits and diseases. These findings were further validated through observational and colocalization studies. Collectively, we present a comprehensive genomic atlas of meQTLs, pQTLs, and metabQTLs specific to East Asian populations, providing critical insight into shared regulatory networks and candidate causal variants across molecular and disease phenotypes.\u003c/p\u003e","manuscriptTitle":"A Map of Multi-omics Quantitative Trait Loci in a Chinese Population Reveals Regulatory Variations and Disease Links","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-01-06 16:28:41","doi":"10.21203/rs.3.rs-8193543/v1","editorialEvents":[],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"nature-communications","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"NCOMMS","sideBox":"Learn more about [Nature Communications](http://www.nature.com/ncomms/)","snPcode":"","submissionUrl":"https://mts-ncomms.nature.com/","title":"Nature Communications","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Nature Communications","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"d328f629-a183-4b2f-9873-7865741020f1","owner":[],"postedDate":"January 6th, 
2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":59689988,"name":"Biological sciences/Genetics/Genetic association study/Genome-wide association studies"},{"id":59689989,"name":"Biological sciences/Genetics/Population genetics"},{"id":59689990,"name":"Biological sciences/Biochemistry/Metabolomics"},{"id":59689991,"name":"Biological sciences/Biochemistry/Proteomics"},{"id":59689992,"name":"Biological sciences/Genetics/Genomics/Epigenomics"}],"tags":[],"updatedAt":"2026-01-06T16:28:41+00:00","versionOfRecord":[],"versionCreatedAt":"2026-01-06 16:28:41","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8193543","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8193543","identity":"rs-8193543","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
