Transforming polygenic risk prediction: functional annotation and digital twin modeling with whole-exome sequencing

Research Square | Research Article
Alejandro Correa Rojo, Toomas Kivisild, Dirk Valkenborg, Gökhan Ertaylan
This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-6169446/v1. This work is licensed under a CC BY 4.0 License. Status: Under Revision. Version 1.
Abstract
Background
Polygenic risk scores (PRSs) are widely used to assess genetic predisposition, but genotyping arrays typically target non-coding variants with limited functional annotation. In contrast, whole-exome sequencing (WES) maps variants to protein-coding regions, providing functional insights that can enrich PRS interpretation and support novel computational frameworks to infer individual genetic predisposition.
Results
We evaluated WES for polygenic risk modeling and functional interpretation using common exonic variants across 27 clinical biomarkers and 17 disease outcomes in the UK Biobank (N = 105,506) and applied the approach to the VITO IAM Frontier cohort (N = 30). WES achieved a 70.63% mapping rate of single-nucleotide polymorphisms (SNPs) to functional genomic information, compared to 11.64% for genotyping arrays, with most associations observed for lipid, hepatic, and renal biomarkers. PRS performance was comparable to that derived from imputed array data and linked to 11 disease outcomes, including cardiovascular conditions. The best-performing PRS in the target cohort was used to develop a digital twin model that integrates biological pathways, gene tissue expression signatures, and disease associations, validated by existing clinical and metabolomic data.
Conclusions
Our study demonstrates that WES-derived PRSs can effectively capture clinically relevant disease associations. Moreover, through functional characterization of associated exonic variants, we show that a PRS, as a digital twin model, could potentially explain individual-level variation and provide biological information on how genetic variants mediate genetic risk.
Keywords: digital twin, whole-exome sequencing, polygenic risk scores, clinical biomarkers, disease risk, UK Biobank, IAM Frontier
Introduction
Polygenic risk scores (PRSs), which quantify an individual’s genetic susceptibility or propensity for a specific trait or disease, have become valuable tools in biomedical research (1, 2). Derived from genome-wide association studies (GWAS) and large-scale genomic resources (3–5), PRSs are often employed in population-based studies to predict disease risk, especially when combined with molecular datasets and electronic health records (EHR) (6–8).
For many complex diseases, PRSs improve risk stratification models by enhancing predictive accuracy and linking genetic risk to disease states. However, a key limitation affecting their clinical utility is the uncertainty about whether they can reveal the molecular mechanisms underlying disease development (9–11). Traditionally, PRSs are calculated using genotyping or single-nucleotide polymorphism (SNP) arrays, which rely on genetic variants identified in large population cohorts. While these arrays are cost-effective and enable the identification of numerous common variants associated with complex diseases, they mainly capture non-coding variants, most of which lack functional annotation, thus limiting their biological interpretation (12). Whole-exome sequencing (WES), which targets the protein-coding regions of the genome (the exome), is well established as a diagnostic tool for inherited disorders due to its ability to identify rare coding variants associated with monogenic diseases (13, 14). WES also captures common exonic variants, offering useful insights into the genetics of complex traits (15, 16). Despite this advantage, its application in polygenic risk modeling has been limited because PRSs typically rely on genome-wide common SNPs, and WES does not provide the same breadth of coverage. Nevertheless, WES has been used to investigate the contribution of rare variants to PRS accuracy and to offer functional insights through gene-based variant characterization, highlighting its potential for enhancing polygenic risk assessments (17–21). Because WES targets variants located in genes, its functional insights may enrich PRSs by adding biological context to associated variants, offering an initial gauge of individual risk. Large-scale population cohorts, such as the UK Biobank (UKB), provide extensive sequencing data and phenotyping for large-scale genetic studies, yielding novel insights into clinically relevant complex traits (22–24).
Furthermore, several studies indicate that intermediate phenotypes, such as biomarker measurements, can offer a clearer perspective on multifactorial diseases than binary traits, particularly when integrated with EHRs and multi-omics (7, 25–28). Understanding these intermediate phenotypes sheds light on a disease’s genetic foundation and underlying biological processes. Consequently, coupling genetic risk scores with functional data from associated SNPs could enable computational frameworks that simulate individual health outcomes through biomarker measurements (as in digital twin applications in health). Although WES is not yet fully leveraged for disease risk modeling of complex traits, it can serve as a tool for characterizing the functional effects of genetic variants in these conditions (29–32).
Polygenic risk scores consist of large sets of SNP predictors that collectively describe an individual’s genetic risk. Given that each SNP may carry additional functional information, such as trait associations, biological impacts, and clinical pathogenicity, one potential strategy for characterizing a PRS is digital twin modeling: virtual models designed to simulate and predict attributes of a physical system, which in the health context is the patient or individual (33–35). In a medical context, digital twins have been proposed as tools to integrate multi-layered data, thereby capturing individual characteristics pertinent to disease outcomes (36). From this viewpoint, a PRS model not only predicts genetic risk but also, through the functional or biological details of its associated SNPs, can elucidate key molecular features of complex traits, particularly when applied to intermediate phenotypes. To explore whether functional information can inform a PRS model, we conducted a systematic analysis of WES for polygenic risk modeling and developed a digital twin framework to contextualize individual genetic liability.
Using the UKB as a reference cohort with WES, genotyping, and extensive phenotypic data, we performed a large-scale genetic study of 24 blood biomarkers and three physical measurements in European populations. Specifically, we conducted genetic association testing, functional characterization, and polygenic risk prediction using common variants in WES, benchmarking our findings against imputed genotype data from the UKB. We then examined the correlation between these PRSs and 17 disease conditions pertinent to prevention and clinical care. Additionally, we generated PRSs for a target dataset, the VITO IAM Frontier (IAF) cohort of 30 healthy Flemish individuals with whole-genome sequencing (WGS) and deep phenotyping (37), by leveraging both individual-level and summary-level UKB data. Hence, we evaluated the transferability of WES-derived PRSs and developed a PRS-informed digital twin by integrating functional insights from associated SNPs identified in our genetic association study. A schematic overview of the study, including the definitions of clinical biomarkers and disease outcomes, is presented in Fig. 1.
Study design
An overview of the methodology is shown in Fig. 2. We used the UKB as a reference for population analysis on the IAF target cohort, genetic association analysis, and PRS estimation using a quality-controlled set of 105,506 unrelated individuals. The UKB data were split into a base set (70%) and a hold-out set (30%). The base set (73,817 individuals) provided summary statistics from genetic analyses of clinical measurements, which were used for PRS construction. The hold-out set was further divided into a training set (9,473 individuals) for polygenic prediction and method selection, and a validation set (22,118 individuals) for PRS estimation and integration with the IAF dataset (30 individuals) using the selected clinical biomarkers.
Population and genomic characteristics of the study cohorts
We first performed a population structure analysis of the IAF target cohort, using the UKB as the reference dataset, by conducting principal component analysis (PCA) with FlashPCA2 (39) on variants filtered by minor allele frequency (MAF < 0.01) and linkage disequilibrium (LD; r² < 0.1). In both the WES (N_ind = 200,673; N_SNPs = 8,201) and imputed array datasets (N_ind = 487,439; N_SNPs = 66,063), IAF participants clustered with individuals identified as “British,” “Irish,” or “Any other white background” (Fig. 3a; Additional file 1: Fig. S2a). To check for fine-scale structure and identify outliers, we next applied uniform manifold approximation and projection (UMAP) on the first 40 principal components (PCs) from both genotype datasets. In both analyses, IAF participants clustered closely with UKB “British” individuals, as observed in the PCA (Fig. 3b; Additional file 1: Fig. S2b). Consequently, we retained UKB individuals with matched ancestry for subsequent analyses and polygenic risk modeling.
Prior to association analyses, we conducted quality control on the reference genotype datasets for unrelated individuals with matched ancestry (see Additional file 1: Supplementary methodology). Before LD pruning, we compared autosomal common SNPs from the UKB datasets (WES and imputed array) with IAF whole-genome sequencing (WGS) data, accounting for platform-specific SNP content differences (Fig. 3c, 3d). The imputed array included 4,054,517 SNPs, primarily intronic (49.64%) or intergenic (38.27%), while the exome dataset contained 138,368 SNPs, mainly intronic (40.06%), missense (19.11%), or synonymous (17.17%). The IAF WGS data comprised 9,629,741 raw autosomal SNPs, chiefly intronic (49.19%) or intergenic (38.66%). The number of shared SNPs reflects variants with dbSNP identifiers that mapped to post-imputed variants in the imputed array (imputation score > 0.3).
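The two variant filters named above, minor allele frequency and pairwise LD, can be illustrated with a small sketch. It is not the pipeline used in the study: the function names, the toy genotypes, the greedy pruning order, and the choice to keep variants at or above the MAF cut-off are all assumptions made for illustration, and r² is approximated here as the squared Pearson correlation of allele dosages.

```python
def minor_allele_frequency(dosages):
    """MAF from allele dosages (0, 1 or 2 copies of the counted allele)."""
    freq = sum(dosages) / (2 * len(dosages))
    return min(freq, 1 - freq)


def ld_r2(a, b):
    """Squared Pearson correlation between two dosage vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic variant carries no LD information
    return cov ** 2 / (var_a * var_b)


def prune(genotypes, maf_min=0.01, r2_max=0.1):
    """Greedy pruning: keep a variant if it is common enough and in low LD
    with every variant already kept (assumed order: dictionary order)."""
    kept = []
    for variant, dosages in genotypes.items():
        if minor_allele_frequency(dosages) < maf_min:
            continue
        if all(ld_r2(dosages, genotypes[k]) < r2_max for k in kept):
            kept.append(variant)
    return kept


# Hypothetical genotypes for eight individuals (not study data).
toy_genotypes = {
    "snpA": [0, 1, 2, 0, 1, 2, 0, 1],
    "snpB": [0, 1, 2, 0, 1, 2, 0, 1],  # perfect LD with snpA
    "snpC": [0, 0, 0, 0, 0, 0, 0, 0],  # monomorphic, fails the MAF filter
    "snpD": [0, 0, 1, 2, 2, 0, 1, 1],  # weakly correlated with snpA
}
kept_variants = prune(toy_genotypes)
```

In this toy input only `snpA` and `snpD` survive, because `snpB` duplicates `snpA` and `snpC` never varies. Dedicated tools apply the same idea in sliding windows over far larger genotype matrices.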
Genetic association of common variants
To generate base (summary) statistics from the UKB reference dataset for PRS calculations, we used regenie (40) to perform single-variant association tests of autosomal SNPs from both quality-controlled WES (N_ind = 73,817; N_SNPs = 39,687) and imputed array (N_ind = 73,817; N_SNPs = 317,175) datasets against the biomarker measurements. In the WES dataset (Fig. 4; Additional file 1: Figures S3-S29; Additional file 2: Table S1), 626 associations were genome-wide significant (P-value < 5 × 10⁻⁸), of which 494 remained significant after Bonferroni correction (P-value < 1.85 × 10⁻⁹). Among these genome-wide associations, 182 unique SNPs were coding-related variants (missense or synonymous). Cardiovascular-related biomarkers showed the highest number of associations (N_SNPs = 271), primarily driven by lipid-related traits such as cholesterol and apolipoprotein levels (e.g., LDL, APOB, CHO). Across the association tests, the genomic inflation factor (λGC) ranged from 1.07 to 1.17, indicating minimal population structure bias (Additional file 1: Table S6). In contrast, the imputed array dataset yielded 1,331 SNPs at genome-wide significance, with 986 surviving Bonferroni correction, though only 49 of the significant SNPs were coding-related (Additional file 1: Figures S30-S56; Additional file 2: Table S2). The λGC values here ranged from 1.03 to 1.10, suggesting similarly low inflation (Additional file 1: Table S7). As in the WES dataset, cardiovascular-related biomarkers had the highest number of associated variants (N_SNPs = 487).
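Two quantities in this section lend themselves to a short worked sketch. The Bonferroni cut-off of 1.85 × 10⁻⁹ is consistent with dividing the genome-wide threshold by the 27 measurements, although that divisor is an inference rather than something the text states. The genomic inflation factor is shown in its usual form, the median observed 1-df chi-square statistic over its expected median under the null; the function names and the example P-values are hypothetical.

```python
from statistics import NormalDist, median


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def genomic_inflation(p_values):
    """Lambda GC from two-sided P-values of 1-df association tests."""
    normal = NormalDist()
    # Convert each P-value back to a chi-square statistic (z squared).
    chi2 = [normal.inv_cdf(p / 2) ** 2 for p in p_values]
    expected_median = normal.inv_cdf(0.25) ** 2  # about 0.455
    return median(chi2) / expected_median


# Assumed divisor: the 27 biomarker measurements analysed.
threshold = bonferroni_threshold(5e-8, 27)

# Evenly spread P-values behave like the null, so lambda is 1.
lambda_null = genomic_inflation([0.1, 0.3, 0.5, 0.7, 0.9])
```

Values of λGC above 1 indicate test statistics larger than expected under the null, which is why the reported ranges close to 1 are read as limited inflation.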
Finally, using PhenoScanner V2 (41) and gwasrapidd (42), we compared our results to previous genomic studies (Additional file 2: Tables S3–S8) and found that while most associations had been reported, we identified 97 novel SNPs across 27 biomarker measurements in the WES dataset and 311 novel SNPs in the imputed array dataset, with the WES data containing more unreported missense variants (N_SNPs = 27 versus 6). Notably, these novel SNPs were not observed in the large GWAS meta-analysis of the same cohort by Sinnott-Armstrong et al. (2021) (7) and were predominantly related to liver, renal, cancer, and bone and joint measurements.
Beyond the association analyses, we assessed SNP-based heritability (h²g) to quantify how much of each trait’s variance is explained by SNPs, since a base dataset with h²g > 0.05 is recommended for robust PRS construction (4). Using BOLT-REML (via BOLT-LMM (43)) on the UKB datasets (WES and imputed array), we found h²g values in the WES dataset ranging from 0.043 (GLUC) to 0.252 (TBIL), and from 0.082 (GLUC) to 0.429 (TBIL) in the imputed array (Fig. 5a; Additional file 1: Table S7). With the exception of APOB for WES (h²g = 0.172), heritability estimates for most traits were higher in the array-based data, and all traits surpassed the 0.05 threshold except GLUC in the WES dataset (h²g = 0.043), which we therefore excluded from further analyses.
We also computed genetic correlations (rg) among the biomarkers using SCORE (44), as rg describes the genetic relationship between two traits and thus provides insights into shared biological pathways or potential causal relationships (45). From each dataset, 729 correlation estimates were obtained (Fig. 5b; Additional file 1: Fig. S57; Additional file 2: Table S7). For the WES summary statistics, rg ranged from −0.530 (HDL and TRIG) to 0.960 (LDL and APOB), whereas the array-based results spanned from −0.619 (HDL and TRIG) to 0.975 (LDL and APOB).
Lipid traits were most prominently correlated, with 12 pairs showing rg > 0.6. Notably, no significant differences emerged between datasets (Wilcoxon test statistic = 262,170; P-value = 0.66), suggesting that the same genetic architecture drives these correlations despite the differing SNP content in WES and imputed array data.
Functional characterization of the base summary dataset
To functionally annotate associated SNPs in each UKB base summary dataset, we used Functional Mapping and Annotation of Genome-Wide Association Studies (FUMA) (46), using the 1000 Genomes Project (1KG; European background) (47) as annotation reference and a Fisher’s exact test for trait-level significance (Fig. 6a; Additional file 3: Tables S1-S2). From the WES summary data, 28,029 of 39,687 variants (70.63%) were functionally annotated; intronic variants were the largest group (56.731%), followed by intergenic (24.022%) and exonic (2.419%). In contrast, 36,302 of 317,175 variants (11.45%) in the imputed array dataset were annotated, with intronic variants again most common (53.507%), followed by intergenic (29.235%) and exonic (1.519%). At the trait level (Fig. 6b), intronic and intergenic variants were significantly enriched (Fisher’s P-value < 0.05) for most measurements except for intergenic variants in BMI, VITD, and ALT (for both datasets), and intronic variants in ALB (imputed array only). In the WES dataset, exonic variants were significantly associated with all traits except URT, ALT, CREA, and BMI, whereas in the imputed array data, 13 traits showed significant exonic variant enrichment.
Following annotation, we identified independent and lead SNPs associated with the biomarker measurements, where independent and lead SNPs serve as signals to facilitate the functional interpretation of genetic regions (Additional file 3: Tables S3-S4).
In the WES dataset, we detected 606 independent variants (r² < 0.6) and 359 genomic risk loci across 26 traits, of which 595 variants were designated as lead SNPs (r² < 0.1). In the imputed array dataset, 1,147 independent variants and 429 genomic risk loci were identified, with 938 qualifying as lead SNPs. Overall, 44 unique SNPs were common independent signals across both datasets. These signals were used as input for the fine-mapping analyses described in the following sections.
Gene-based and gene-set enrichment analyses
Using FUMA, we also performed gene-based and gene-set analyses to group variants into associated genes and functional pathways, drawing on MAGMA (48) and multiple reference datasets provided by the platform. From a curated set of 14,857 genes, we identified 162 unique associations from the WES-based summary statistics and 200 from the imputed array-based summary statistics (Bonferroni P-value < 3.36 × 10⁻⁶) (Additional file 1: Fig. S58). Across both datasets, 60 unique genes were commonly associated with 21 biomarkers (Fig. 7). Among all genes, APOB, PCSK9, LDLR, and TOMM40 were most frequently linked to lipid-related measurements. Next, we used the significantly associated genes to map variants to relevant biological pathways using the MSigDB database (via FUMA), identifying 437 significantly enriched pathways (Bonferroni P-value < 3.36 × 10⁻⁶) from the WES summary data and 256 from the imputed array data (Fig. 8a, 8b). Cardiovascular-related traits, particularly lipid-related traits, were most prominently represented, with enrichment in pathways related to lipid metabolism, familial hyperlipidemia, lipid particle composition, and statin inhibition of cholesterol production (Additional file 3: Tables S1-S2). We then investigated gene expression signatures for these associated genes using MAGMA and the Genotype-Tissue Expression (GTEx) RNA-sequencing dataset (via FUMA).
Across 54 related tissues (Bonferroni P-value < 9.26 × 10⁻⁴), the WES dataset showed more associated genes per trait (N_genes = 55; Fig. 8c) compared to the imputed array dataset (N_genes = 12; Fig. 8d). Importantly, genes associated with lipid and renal biomarkers mapped to metabolic-related tissues such as liver, kidney, and gut, while those associated with BMI were mapped to brain-related tissues (Additional file 3: Tables S3-S4).
Pathogenicity prediction
We performed a pathogenicity analysis of significantly associated missense variants from the base datasets to evaluate their potential functional impacts on protein translation, using AlphaMissense (49) and ESM1b (50) via the ProtVar (51) web server. In the WES-based summary data, 87 missense variants were identified, with five predicted as likely pathogenic; in contrast, the imputed array data yielded 24 missense variants, only two of which were deemed likely pathogenic (Additional file 5: Tables S1-S2). Among the WES-based variants, four showed concordant pathogenic predictions (AlphaMissense score > 0.8 and ESM1b score < −10): rs1740032, rs3798220, rs1801689, and rs1801272, located in PDE11A, LPA, APOH, and CYP2A6, respectively. Notably, variants rs3798220 and rs1801689 have been described as having a functional impact associated with elevated lipoprotein(a) levels (52, 53).
Fine-mapping analyses
We conducted a fine-mapping analysis to identify potential causal variants using FINEMAP (54) from PolyFun (55), focusing on independent variants (r² < 0.6) derived from FUMA and individual-level UKB genotype data. SNPs were fine-mapped within a 5 Mb window of each independent variant, yielding 919,857 configurations across both WES and imputed array datasets that contained 15 or fewer causal variants (Fig. 9a; Additional file 6: Tables S1-S2).
The WES dataset contributed 91,970 genetic variants, with APOB having the highest number of contributing variants (N_SNPs = 12,437) and BMI the lowest (N_SNPs = 900). In the imputed array dataset, 821,887 genetic configurations were fine-mapped, with ALP having the most (N_SNPs = 130,120) and DBP the fewest (N_SNPs = 6,744). Among all configurations, 7,172 genetic variants surpassed a posterior inclusion probability (PIP) > 95% in both datasets, of which 6,565 were prioritized as causal (PIP > 99%), with TBIL exhibiting the highest number of causal variants (Fig. 9b). From these prioritized variants, 90 unique missense variants appeared in the WES dataset (Fig. 9c), primarily linked to cardiovascular biomarkers (N_SNPs = 27), while only two were associated with anthropometric traits. Meanwhile, 25 unique missense variants emerged in the imputed array dataset for cardiovascular, renal, liver, bone and joint, and cancer-related biomarkers (Fig. 9d), again showing the highest representation in cardiovascular traits (N_SNPs = 12), and the fewest in anthropometric traits (N_SNPs = 1).
Polygenic risk modeling of clinical biomarker measurements
After functionally characterizing the UKB-derived base summary datasets, we estimated PRSs for 26 clinical biomarkers using clumping and thresholding (PRSice2 (56)), Bayesian regression (LDpred2 (57), RapidoPGS (58), PRS-CS (59)), and penalized regression (lassosum2 (60)). In a training set (N_ind = 9,473) from the same quality-controlled UKB reference dataset, we identified the best-performing method by comparing the explained variance (R²) from a full model (PRS + covariates) versus a null model (covariates only). After adjusting for sex, age, 40 ancestry PCs, and genotype batch (for imputed array data), LDpred2 (grid and auto) emerged as the top method for both WES and imputed array datasets (Additional file 6: Tables S1-S2).
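The model comparison described above (full model with PRS and covariates versus a covariates-only null model) can be sketched in plain Python. This is an illustration under assumptions, not the authors' code: the PRS is taken to be the usual weighted sum of allele dosages, R² comes from ordinary least squares, and the weights, dosages, covariates, and phenotype values are invented toy numbers.

```python
def polygenic_score(dosages, weights):
    """PRS as the weighted sum of allele dosages for one individual."""
    return sum(w * g for w, g in zip(weights, dosages))


def _solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(predictors, y):
    """R² of a least-squares fit of y on the predictors plus an intercept."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def incremental_r2(prs, covariates, y):
    """Variance explained by the PRS beyond the covariates."""
    full = r_squared([[s] + list(c) for s, c in zip(prs, covariates)], y)
    null = r_squared(covariates, y)
    return full - null


# Toy data (hypothetical): three SNP weights, six individuals.
weights = [0.5, 0.25, 0.25]
dosages = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 0, 0], [1, 2, 2], [2, 2, 1]]
covariates = [[50, 0], [61, 1], [45, 1], [70, 0], [55, 0], [66, 1]]  # age, sex
prs = [polygenic_score(d, weights) for d in dosages]
phenotype = [2 * s + 0.1 * c[0] for s, c in zip(prs, covariates)]
gain = incremental_r2(prs, covariates, phenotype)
```

Because the toy phenotype is built as an exact linear function of the PRS and age, the full model fits perfectly and `gain` equals whatever the covariates alone fail to explain. Real analyses would use a statistics package and, as in the text, many more covariates such as ancestry PCs.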
In the WES training set, PRSs explained between 0.005 (URA) and 0.245 (TBIL) of the phenotypic variance, while in the imputed array data, they ranged from 0.012 (URA) to 0.295 (TBIL) (Fig. 10a). Using LDpred2-derived weights, we then calculated PRSs for these biomarkers in a validation set (N_ind = 22,118), also drawn from the reference dataset, observing broadly similar performance between WES and imputed array PRSs (Fig. 10b; Additional file 1: Table S8). Notably, five WES-based risk scores (APOB, LDL, CHO, SHBG, ALB) explained slightly more variance than their array-based counterparts, and Spearman analysis indicated strong concordance (P-value < 1.13 × 10⁻²⁸¹) between exome- and array-based PRSs, with LDL, VITD, APOB, CHO, and TBIL showing correlations above 0.6 (Fig. 10c; Additional file 1: Table S9).
Targeted phenome-wide association and Mendelian randomization analyses
To assess the clinical relevance of our generated PRSs, we performed a targeted phenome-wide association study (PheWAS) in the validation dataset on 17 disease phenotypes, with 15 available for analyses (N_cases > 200; Additional file 1: Table S10). Using logistic regression adjusted for sex, age, 40 ancestry PCs, and genotype batch (imputed array data only), we found 12 WES-based PRSs (biomarkers) significantly associated (FDR P-value < 0.05) with 11 disease outcomes; of the 33 associations, 28 were risk-increasing (OR > 1) while five were risk-decreasing (OR < 1). The strongest effect was between SBP and HYP (OR = 1.164, 95% CI 1.127–1.202; FDR P-value = 6.05 × 10⁻¹⁹), and the weakest was between TPROT and COPD (OR = 0.923, 95% CI 0.871–0.978; FDR P-value = 4.89 × 10⁻²). Cardiovascular-related PRSs (APOB, CHO, CRP, DBP, LDL, SBP) were frequently associated with hypertension, cardiovascular conditions, and obesity (HYP, CAD, ISC, MYO, OBS). Meanwhile, 15 imputed array-based PRSs were significantly associated (FDR P-value < 0.05) with 10 disease outcomes (Fig. 11b; Additional file 8: Table S2), showing 30 risk-increasing and eight risk-decreasing associations.
The strongest was between BMI and OBS (OR = 1.417, 95% CI 1.340–1.498; FDR P-value = 2.93 × 10⁻³³), whereas the weakest was between ALB and MYO (OR = 1.111, 95% CI 1.031–1.197; FDR P-value = 4.95 × 10⁻²). Across datasets, 15 WES-based associations replicated in the array-based PheWAS, most often involving hypertension and coronary artery disease. These replicated associations showed highly concordant effect sizes (Spearman r = 0.804; P-value = 4.87 × 10⁻⁴) between WES and array-based PRSs.
From the significant associations identified in the PheWAS, we performed a two-sample Mendelian randomization (MR) analysis to explore potential causal relationships between biomarker measurements and the associated disease phenotypes, through associated SNPs in the PRSs. We examined 71 PRS associations: 33 from the WES-based PheWAS and 38 from the imputed array-based PheWAS. All genome-wide significant SNPs (P-value < 5 × 10⁻⁸) were used as genetic instruments, while the outcome data were sourced from GIANT (61), CARDIoGRAMplusC4D (62), and FinnGen (63) summary statistics (Additional file 1: Table S11). In the main inverse variance weighted (IVW) analysis of the WES-based data, 17 associations were significant (FDR P-value < 0.05; Fig. 11c, 11e). The strongest effect was observed for SBP on HYP (OR = 8.160 per SD, 95% CI 5.656–11.773; FDR P-value = 1.23 × 10⁻²⁸), while the weakest was for APOB on OBS (OR = 0.837 per SD, 95% CI 0.711–0.985; FDR P-value = 4.29 × 10⁻²) (Additional file 1: Tables S12-S13). No evidence of pleiotropy emerged (Additional file 1: Table S14), and IVW estimates were directionally consistent with MR-Egger and median-based methods, although ten associations did not reach significance under MR-Egger (FDR P-value > 0.05). Compared to the WES-based MR analysis, the imputed array-based IVW analysis identified 18 significant associations (FDR P-value < 0.05; Fig. 11d, 11f; Additional file 1: Tables S15-S16).
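For readers unfamiliar with the IVW estimator named above, a minimal sketch follows. It shows the fixed-effect form with first-order weights, which may differ from the exact settings of the MR software used in the study; the function names and the per-SNP effect sizes are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse variance weighted causal estimate and its SE."""
    weights = [1 / se ** 2 for se in se_outcome]
    numerator = sum(
        w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome)
    )
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, math.sqrt(1 / denominator)


def odds_ratio(beta, se, z=1.96):
    """Convert a log-odds estimate into an OR with an approximate 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical instruments: SNP effects on the biomarker (per SD) and on
# the disease (log odds), with standard errors for the outcome effects.
beta_x = [0.10, 0.20, 0.30]
beta_y = [0.05, 0.10, 0.15]
se_y = [0.02, 0.03, 0.05]

beta_ivw, se_ivw = ivw_estimate(beta_x, beta_y, se_y)
or_ivw, ci_low, ci_high = odds_ratio(beta_ivw, se_ivw)
```

Every toy instrument has the same outcome-to-exposure ratio (0.5), so the pooled estimate equals that ratio; with real instruments the estimate is a precision-weighted average of heterogeneous ratios, which is why sensitivity analyses such as MR-Egger and median-based methods are reported alongside it.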
In the array-based IVW analysis, the strongest effect was observed between SBP and HYP (OR = 12.893 per SD, 95% CI 5.656–11.773; FDR P-value = 3.00 × 10⁻²⁰), while the weakest involved DBP and CAD (OR = 2.603 per SD, 95% CI 1.067–6.345; FDR P-value = 4.72 × 10⁻²). HBA1C was associated with T2D (OR = 1.53 per SD, 95% CI 1.183–2.003; FDR P-value = 0.003) in the IVW results; however, this finding was not replicated in the weighted median (OR = 1.18 per SD, 95% CI 0.976–1.427; FDR P-value = 0.117) or MR-Egger (OR = 0.982 per SD, 95% CI 0.464–2.078; FDR P-value = 0.963) models. No evidence of horizontal pleiotropy was detected (Additional file 1: Table S17), and except for the HBA1C–T2D association, IVW estimates were directionally consistent with the sensitivity analyses. Nevertheless, 11 associations were not significant according to MR-Egger. From the WES-based MR analysis, six associations replicated in the array-based MR analysis, involving lipid-related traits (APOB, LDL, CHO) with CAD, blood pressure (SBP, DBP) with HYP, and body weight (BMI) with OBS. Although MR-PRESSO detected outliers in several disease associations, the significant results remained unchanged after excluding these variants (Additional file 1: Tables S18-S19).
Prediction of the PRSs for the target cohort
We next assessed the utility of the UKB reference dataset for polygenic risk prediction in the IAF target cohort by merging each UKB test set (WES- and array-based) with IAF WGS data and generating PRSs for 26 biomarkers (Additional file 1: Table S20). Nineteen PRSs showed significant correlations, the strongest of which (r > 0.6) notably included APOB (r = 0.788), CHO (r = 0.753), CRP (r = 0.690), and TBIL (r = 0.846). Stratifying IAF participants by PRS quartiles revealed frequent mismatches for PRSs with weaker correlations (e.g., BMI, AST, HBA1C) but minimal mismatches for those with strong correlations (e.g., TBIL, APOB, LDL) (Fig. 12). We also compared each PRS to clinical measurements (Additional file 1: Tables S21-S22).
Among WES-based PRSs, seven (APOA, APOB, CRP, LDL, SBP, TRIG, GGT) significantly correlated (P < 0.05) with their corresponding biomarkers, led by APOB (r = 0.617). Similarly, seven array-based PRSs (ALB, APOB, CRP, GGT, SHBG, TBIL, TPROT) were significant, with three (APOB, CRP, GGT) replicating the WES-based correlations, reinforcing the consistency of these scores across genotyping platforms. These findings show the suitability of UKB as a matched-ancestry reference for trans-geographic PRS prediction, demonstrating that WES-based PRSs can describe variation in clinical biomarkers and prove particularly valuable for lipid-related traits.
Contextualizing PRSs at the individual level from a digital twin approach
In our polygenic risk modeling, we showed that WES-based PRSs not only associate with clinical biomarkers but also map exonic variants to functional gene information, including protein mutations, pathways, and tissue-specific expression. Using the APOB PRS as a test case, we identified ten top exonic variants indicative of high genetic risk for elevated APOB (Fig. 13a, 13b). Among 19 genome-wide significant SNPs (mapped to 28 genes), three were missense variants, and two (rs3798220 in LPA and rs1801689 in APOH) were flagged as pathogenic by AlphaMissense and likely causal in fine-mapping analyses. Gene- and pathway-level enrichments pointed to lipid metabolism and familial hyperlipidemia, with relevant genes overexpressed in metabolic tissues (liver, pancreas, kidney, stomach). Together, these findings illustrate how a WES-based PRS can integrate genetic risk indicators with their functional and biological contexts at the individual level.
Building on these findings, we developed a digital twin representation of the APOB WES-PRS model as a multi-layered framework linking risk scores with the functional data from the SNPs used for PRS estimation.
By consolidating gene-level annotations, protein variation (pathogenicity), pathway associations, tissue-expression profiles, fine-mapped causal loci, and broader clinical evidence (e.g., PheWAS, MR analyses), we propose that the top k SNP predictors form an “individual genetic profile”. As illustrated in Fig. 13c, a PRS should thus be interpreted both quantitatively (the aggregate risk score) and qualitatively (functional context of each variant), providing a more holistic view of genetic risk for the associated trait. In our test case, the quantitative context of the APOB PRS model includes the score and model specifications (cohort summary data, population ancestry, covariates, and statistical metrics). The qualitative context encompasses all functional information mapped from the contributing SNP predictors, including pathogenicity, gene-set analyses, and reported disease-related associations (targeted PheWAS or MR analyses).
Evaluation of the PRS-informed digital twin model
To evaluate whether the WES-based PRS digital twin model for APOB accurately reflects relevant biological features, we performed two analyses centered on biological pathway associations and gene-tissue expression signatures. First, we tested how the APOB PRS values in IAF participants correlated with their median metabolite levels from Nightingale NMR Metabolomics, under the assumption that a genetic risk score tied to lipid pathways would also correlate with related metabolite components. Out of 250 measured metabolites, 31 were significantly associated (FDR P-value < 0.05), all pertaining to lipids or lipoprotein subclasses (Additional file 8: Table S3). Notably, phospholipids, free cholesterol, and LDL-related measures showed strong correlations with the APOB PRS, with S_LDL_PL (phospholipids in small LDL particles) displaying the highest correlation (r = 0.620, FDR = 0.021) (Fig. 14).
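Testing one PRS against 250 metabolites requires a multiple-testing adjustment. The text reports FDR-adjusted P-values without naming the procedure, so the sketch below assumes the widely used Benjamini-Hochberg step-up method; the function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so each adjusted value is the
    # minimum of p * m / rank over all equal or larger ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Four hypothetical raw P-values from PRS-metabolite correlation tests.
raw_p = [0.01, 0.04, 0.03, 0.5]
fdr_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in fdr_p]
```

In this toy input only the first test stays below 0.05 after adjustment, even though three raw P-values were below that level, which illustrates why the count of significant metabolites depends on the correction applied.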
For the gene–tissue expression signatures, which included genes in significantly associated tissues, we constructed tissue-specific networks sourced from the GIANT database by prioritizing genes identified in our functional and fine-mapping analyses, using the HumanBase web platform ( https://hb.flatironinstitute.org ). For gene prioritization, we selected genes with genome-wide significant, contributing SNPs characterized as “likely causal” (fine-mapping) and significantly associated with the trait (MAGMA gene-based analyses). We hypothesized that a PRS could describe tissue-specific molecular processes potentially involved in the genetic risk for the trait. The prioritized genes with genome-wide significant SNPs and associations (from gene-based and fine-mapping analyses) included CELSR2 (rs3895559), SARS/SARS1 (rs685653), POC5 (rs888789), LPA (rs3798220), BUD13 (rs11820589), APOH (rs1801689), PLCG1 (rs2076148), TOMM40 (rs157581), and LDLR (rs6413504). Using these genes, we constructed tissue-specific networks for the liver, kidney, pancreas, and stomach (Fig. 15), where links represent interaction confidence scores. Across tissues, strong gene–gene connections emerged, with TOMM40 appearing as the most interconnected gene in the liver, pancreas, and kidney (interaction confidence > 0.6). To further support these findings, we queried the Cardiovascular Disease Knowledge Portal ( 64 ) for lipid metabolism or cardiovascular evidence on the 57 mapped genes. Of these, 45 showed moderate to compelling support (Huge Score > 3.0), linking them to cholesterol, lipoprotein metabolism, and various cardiovascular conditions such as coronary artery disease, hypertension, and cardiomyopathy (Additional file 1: Table S23). These results suggest that SNPs in PRSs with robust gene-level and functional annotations can help elucidate the underlying molecular characteristics of associated biomarkers. 
Finally, we performed an enrichment analysis on the tissue-specific HumanBase networks using the g:Profiler ( 65 ) web tool (Additional file 1: Fig. S60). The most prominent pathways involved lipase activity, DNA replication, RNA processing, and metabolic processes, with the liver network notably enriched for DNA unwinding, helicase activity, and nucleic acid metabolism. Other significant processes, including blood vessel morphogenesis, tRNA aminoacylation, and protein translation regulation, were enriched across kidney and pancreas networks (g:SCS P-value < 0.05).

Discussion

In this study, we set out to investigate the potential of using WES for polygenic risk modeling and individual-level interpretation of clinical biomarkers (as part of the digital twin creation process) in disease diagnostics. Utilizing genomic data and biomarker measurements from the UKB, we demonstrated that PRSs derived from common exonic variants perform similarly to those generated from array-based approaches. Additionally, we showed that PRSs from WES can describe disease associations and potential causal relationships of genetic variants through targeted analysis of 17 disease outcomes. However, through functional annotation of genetic variants, we found that WES offers greater biological context than genotyping arrays, linking SNPs to molecular entities and properties such as genes, pathogenicity, biological pathways, and tissue-related signatures. Furthermore, in an application case involving PRSs in a target population set (IAF), we illustrated how these molecular characteristics can describe individual-level variation based on the functional characteristics of the predictors included in a PRS model of a biomarker measurement, using a digital twin representation approach. 
Although WES has traditionally been used to identify rare variants in monogenic disorders, our study broadens its utility by showing that common exonic variants also significantly contribute to disease etiology through their associations with clinical biochemistry biomarkers. We found that WES achieved a 70.63% mapping rate of tested SNPs to functional genomic information, compared to 11.64% for genotyping arrays. Most of the significant associations involved cardiovascular, hepatic, and renal biomarkers, with lipid-related measures showing the strongest links. Genes integral to lipid metabolism, including LPA, LDLR, PCSK9, and APOB, were strongly associated with these biomarkers, in line with extensive prior research on lipid genetics and cardiovascular diseases ( 66 – 69 ). Our analyses further demonstrate that WES can generate PRSs whose performance is comparable to those derived from imputed array data. Scores based on 26 of the 27 biomarkers examined aligned with array-based results and, when combined with targeted PheWAS and MR analyses, revealed genetic predispositions to multiple disease outcomes, including those previously described in genotyping array studies ( 7 , 70 ). Most notably, lipid measurement PRSs were associated with cardiovascular diseases, supporting previous evidence of causal links between lipid-related variants and conditions such as ischemic heart disease (ISC), myocardial infarction (MYO), and coronary artery disease (CAD) ( 26 , 71 , 72 ). Furthermore, our approach shows that WES-based PRSs can be transferred across European subpopulations, illustrated by the results for seven PRSs in the IAF Flemish cohort. Our results support the view that matched local cohorts can serve as controls in polygenic risk prediction ( 73 , 74 ). In our study, we demonstrated that integrating functional information from exonic variants within a PRS model can reveal key molecular aspects of individual genetic risk. 
Using APOB lipid measurements from the WES PRS model as a test case, we identified contributing SNP predictors associated with cardiovascular genes linked to lipid metabolism and expressed in organs such as the liver and kidneys. Building on these insights, we propose a PRS-informed digital twin framework that combines risk scores with functional data, encompassing risk genes, identified causal loci, biological pathways, gene-tissue expression signatures, and associated disease outcomes, to create an “individual genetic profile.” This multi-layered representation not only quantifies genetic risk within a population context, as described by the APOB PRS model, but also provides qualitative insights into how lipid metabolism mediates this risk. Using NMR metabolomics measurements from the IAF cohort, we identified significant associations between APOB WES-based risk scores and various lipoprotein and cholesterol subclasses, supporting our hypothesis that a PRS captures key molecular pathways, as illustrated in our PRS-informed digital twin model. Elevated levels of lipoprotein and cholesterol-related particles are well-recognized drivers of dyslipidemia and cardiometabolic risk, with APOB playing a central role as a molecular transporter of these metabolites ( 75 – 77 ). Our PRS-informed digital twin model corroborates existing evidence on the validity of the APOB PRS, aligning with recent studies that show genetic effects on APOB levels mediate the abundance of LDL-related particles, which are linked to increased risk for CAD, peripheral artery disease, and venous thromboembolism ( 78 , 79 ). Furthermore, our tissue-specific network analysis of the contributing SNPs in the APOB WES-based PRS model revealed several PRS-associated genes implicated in cardiometabolic risk. 
Notably, TOMM40 emerged as a hub gene in the liver, kidney, and pancreas, where it has previously been described for its roles in mitochondrial function, oxidative stress, and lipid metabolism ( 80 – 82 ). Several interacting genes, particularly those partnering with TOMM40 in the liver, such as the MCM complex (DNA metabolism) ( 83 ), MRPL3 (energy metabolism) ( 84 ), HSPD1 (mitochondrial and lipid metabolism) ( 85 ), and RUVBL2 (glucose and lipid regulation) ( 86 ), further suggest that the APOB PRS encompasses direct and potentially indirect links to lipid metabolism, as well as broader molecular mechanisms influencing metabolic processes across multiple tissues. Previous studies suggest that genes involved in metabolic pathways also influence regulatory processes tied to various diseases, including obesity, diabetes, and cancer ( 87 , 88 ). The omnigenic model, a recently proposed framework, posits that, beyond core genes exerting direct effects, peripheral genes in broader cellular processes also shape disease risk through gene-regulatory pathways ( 89 , 90 ). In the context of our study, genes such as LDLR, APOB, and PCSK9 have been previously described as core genes in monogenic dyslipidemias, directly affecting lipid metabolism and contributing to disease risk ( 91 ). Yet our analyses also highlight numerous regulatory variants across multiple genes, including HNF1A (associated with monogenic diabetes), that contribute to CAD risk ( 92 ). Furthermore, tissue-specific networks from our digital twin model identify genes involved not only in metabolic processes but also in regulatory roles like protein translation, cell cycle control, and DNA replication, supporting the omnigenic model for lipid traits. Nonetheless, our analysis does not provide direct confirmation, as the genetic architecture of lipid metabolism across multiple disease outcomes, including its gene-regulatory aspects, remains poorly understood. 
Although the omnigenic model remains hypothetical, developing a PRS-informed digital twin framework could help determine whether a polygenic trait follows an omnigenic architecture. By integrating functional genomic data with proteomics, transcriptomics, and clinical datasets, researchers can evaluate the “omnigenic trait” hypothesis across conditions such as cardiovascular disorders ( 93 ), neurological disorders ( 94 ), and cancer ( 95 ). Our APOB PRS-informed digital twin case study also demonstrates how functional genomics, particularly from sequencing, can elucidate variant-driven metabolic interactions, even with limited statistical power. These variants affect multiple biological levels, underscoring the need for a systems genetics approach to refine the biological relevance of PRSs ( 96 ). By first focusing on key tissues and then expanding to related genes and pathways, researchers can uncover novel gene-molecular interactions that underlie tissue-specific processes and regulatory mechanisms in complex diseases. Ultimately, a digital twin framework enhances genomic methodologies by integrating functional evidence to prioritize molecular targets, automate prediction, and deepen insights into complex genetic architectures, paving the way for more precise, personalized medicine. There are several limitations to our study worth noting. First, we focused on populations with European ancestry, primarily due to data availability for our in-house IAF cohort. Future work with multi-ancestry cohorts is essential to confirm the utility of WES-based PRSs and improve their transferability across diverse populations, while also aiding in functional gene characterization. Second, our prediction analyses were limited by the small size of the target cohort (N Ind = 30). Although using the UKB as a proxy dataset revealed significant associations, larger cohorts would strengthen these findings. 
Third, our analysis was confined to uncorrelated SNPs (r² < 0.1) to facilitate the mapping of functional characteristics and validate reported associations. We also used genome-wide significant variants in MR analyses to evaluate their effects as instrumental variables, underscoring the need for larger studies to capture additional associated variants. Fourth, we focused exclusively on common variants (MAF > 1%), excluding rare variants that may contribute to disease risk but pose challenges for PRS inclusion due to low allele frequencies and reduced predictive impact. Lastly, this study was an exploratory application of WES for polygenic risk modeling. While PRS performance was generally similar for most traits, determining whether WES outperforms SNP-array platforms for every trait was beyond our scope. Further sequencing studies and integrative analyses linking genomic, molecular, and clinical data are needed to refine PRS methodologies and strengthen risk stratification models. Such efforts would enhance our understanding of individual genetic variation, particularly when PRS approaches are integrated within a digital twin framework.

Conclusions

In this study, we demonstrate that PRSs for clinical biomarkers can be estimated from WES and associated with clinically relevant disease outcomes. Additionally, we showed that the functional characterization of genetic variants provides biological insights into the associated biomarkers, which can be linked to disease risk. Moreover, we illustrated that a PRS, as a digital twin model, could potentially explain individual-level variation based on the functional information of the predictors. Finally, we showed that the UKB can be used as a proxy dataset to predict PRSs for small population studies, potentially accelerating biomedical research for local or small cohorts. 
Materials and methods

Description of the study cohorts

The UKB resource is a large, prospective cohort that incorporates phenotype, genotyping, sequencing, clinical, and health-related data from more than 500,000 participants recruited throughout the United Kingdom between 2006 and 2010, all aged 40 to 70 at the time of assessment. Additional information on data collection methods can be found elsewhere. The VITO IAF study is a smaller, prospective longitudinal cohort of 30 healthy participants aged 47 to 54 at recruitment, in which clinical biochemistry, WGS, multi-omics, health questionnaires, and physical characteristics were measured over one year. Further details on design, eligibility criteria, and data collection are provided by Dries et al. (2024) ( 37 ).

Phenotype definition of clinical biomarkers

We focused our analysis on 24 serum biomarkers and three physical measurements, available in both UKB and IAF datasets, known for their associations with diseases and diagnostic value ( 97 ). These measurements included anthropometric, bone and joint, cancer, cardiovascular, diabetes, liver, and renal-related biomarkers. Summary information on baseline characteristics and phenotype summary statistics for the datasets used in this study can be found in Additional file 1: Tables S1-S5. For the UKB, we conducted quality control at the phenotype level. We excluded participants who had withdrawn from the biobank and those with missing measurements. We then identified individuals taking cholesterol-lowering and anti-hypertensive medications using the medication questionnaire data (Data fields: 6177 and 6153). We removed participants taking cholesterol-lowering drugs. For individuals who reported using an anti-hypertensive medication, we adjusted their SBP and DBP values by adding 15 and 10 mmHg, respectively ( 98 , 99 ).

Sequencing and genotyping data

For the UKB, we utilized the interim release of the population-level exome variants (Data field: 23156). 
This dataset comprises exonic variants in PLINK format available for 200,643 individuals. In addition to WES, we used the imputed array data available for 487,159 individuals (Data field: 22828). For this dataset, we selected variants with an imputation “INFO” score greater than 0.3 and restricted our analyses to individuals with available exome sequencing data. For the IAF, we used whole-genome data (30X) from all participants. Information on the sequencing procedure can be found in Additional file 1: Supplementary methodology. We processed the raw FASTQ files for variant calling according to the OQFE protocol ( 100 ). Briefly, raw reads were mapped to the GRCh38 human reference genome with BWA-MEM ( 101 ) to generate mapped BAM files. For each BAM file, we called variants using DeepVariant ( 102 ) with the parameter “--model_type WGS” to obtain genomic variant call files (gVCF). All gVCFs were then joint-genotyped with GLNexus ( 103 ) and converted into BED format. For association analyses and polygenic risk modeling, we applied stringent quality control to the genomic data of the UKB reference dataset, following the recommendations by Chang (2020) and Marees et al. (2018) ( 104 , 105 ). We focused our analysis on common variants with MAF > 0.01 and LD r² < 0.1 for unrelated UKB individuals whose ancestry matched that of the IAF cohort, identified through population structure analysis using PCA and UMAP. Detailed information on the population structure analysis and variant quality control can be found in Additional file 1: Supplementary methodology.

Single-variant association analysis

We performed single-variant association testing using regenie ( 40 ) via a two-step regression approach. In the first step, regenie fits a whole-genome regression model to capture individual trait variability using the leave-one-chromosome-out (LOCO) scheme and ridge regression. 
In the second step, the LOCO predictions are used as offsets for association testing using linear regression models. For single-variant association testing, we first applied a rank-based inverse normal transformation (RINT) ( 106 ) to each quantitative trait. We then fitted the linear regression models adjusting for age, sex, and the first 40 ancestry PCs. For the reference imputed array data (UKB), we included the genotype batch as a covariate in the regression models. Finally, we checked whether the obtained associations had been previously reported in other studies, using the PhenoScanner V2 database and the GWAS Catalog through the R packages PhenoScanner ( 41 ) and gwasrapidd ( 42 ).

Heritability measurements and genetic correlation

We estimated the SNP-based heritability (h²_g) from the UKB sequencing and genotyping reference datasets using BOLT-REML (via BOLT-LMM) ( 43 ), which employs a Monte Carlo algorithm-based Restricted Maximum Likelihood (REML) estimation. For each trait, we fitted the model using the individual-level genotype data and adjusted for age, sex, and the first 40 ancestry PCs. Additionally, for the imputed array data, we included the genotype batch as a covariate in the model. To calibrate BOLT-REML statistics, we utilized the 1KG LD scores (European background) as a reference panel, provided by the developers. To detect shared genetic architecture between traits, we calculated pairwise genetic correlations (r_g) with SCORE ( 44 ), using a randomized method of moments estimator. As with BOLT-REML, we applied SCORE to the individual-level genotype data and adjusted for covariates for both sequencing and genotyping base datasets.

Functional annotation and characterization with FUMA

We used the FUMA ( 46 ) web-based platform to annotate the base summary statistics using its integrated ANNOVAR resource ( 107 ) and to identify independent significant SNPs. 
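The rank-based inverse normal transformation applied to each quantitative trait before association testing can be illustrated with a minimal sketch. The Blom offset (c = 3/8) and the average-rank handling of ties are assumptions made here for illustration; the text does not state which offset or tie rule was used, and the function name `rank_inverse_normal` is hypothetical.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8.0):
    """Map values to normal quantiles of their ranks.

    c is the rank offset (3/8 is the Blom offset, assumed here).
    Tied values receive the average of the ranks they span.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


# Toy measurements (hypothetical values, not study data).
transformed = rank_inverse_normal([5.0, 1.0, 3.0])
```

The transformed values keep the ordering of the raw measurements while following an approximately standard normal distribution, which is why the transformation is commonly applied before fitting linear models to skewed biomarkers.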
Independent SNPs were defined at a genome-wide significance level (P-value < 5 × 10⁻⁸) and r² < 0.6. From this subset, lead SNPs were defined at r² < 0.1. The 1KG European panel was used as a reference panel, merging between LD blocks at a maximum distance of 2.5 Mb. To identify independent signals, we analyzed SNPs outside the major histocompatibility complex (MHC) region. In addition to functional annotation, we conducted a generalized gene-set analysis using the MAGMA ( 48 ) implementation in FUMA. MAGMA performs multiple linear principal component regression analyses to map SNPs to gene properties using several reference datasets provided by FUMA, including a curated set of 14,857 protein-coding genes, the MSigDB for biological pathways, and the GTEx RNA-sequencing dataset for tissue-specificity. To account for multiple testing, we applied the Bonferroni correction.

Pathogenicity analysis

To assess the functional impact of missense variants detected in the single-variant association testing, we predicted their pathogenicity using AlphaMissense ( 49 ) and ESM1b ( 50 ) scores, obtained through the ProtVar ( 51 ) web server. For AlphaMissense, a variant is predicted as likely “pathogenic” if its score is greater than 0.56. For the ESM1b model, we considered a variant likely pathogenic if its score is lower than −10.

Fine-mapping

We performed a fine-mapping analysis on the base summary statistics (UKB) to identify causal variants affecting the clinical biomarkers, using FINEMAP ( 54 ) with PolyFun ( 55 ). From the independent loci detected by FUMA, we extended each associated locus to a minimum size of ±5 Mb to obtain a test region for fine-mapping. For each locus, we calculated the LD score matrix from the individual-level genotype data (in-sample LD) using PolyFun. We then applied FINEMAP with default parameters and a maximum number of 12 signals or credible sets. Priors were defined as 1/number of SNPs located in the genomic region, following default options. 
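The pathogenicity cut-offs stated in the Pathogenicity analysis paragraph can be written down as a small rule, sketched below. The thresholds (AlphaMissense > 0.56, ESM1b < −10) are taken from the text; the function name, the returned keys, and the combined "both" flag are hypothetical conveniences, and the example scores are made up.

```python
# Thresholds as stated in the text.
ALPHAMISSENSE_THRESHOLD = 0.56  # likely pathogenic if the score is greater
ESM1B_THRESHOLD = -10.0         # likely pathogenic if the score is lower


def pathogenicity_flags(alphamissense_score, esm1b_score):
    """Flag a missense variant under each predictor and under both jointly."""
    by_alphamissense = alphamissense_score > ALPHAMISSENSE_THRESHOLD
    by_esm1b = esm1b_score < ESM1B_THRESHOLD
    return {
        "alphamissense": by_alphamissense,
        "esm1b": by_esm1b,
        "both": by_alphamissense and by_esm1b,  # hypothetical consensus flag
    }


# Hypothetical scores for illustration only (not values from the study).
example = pathogenicity_flags(alphamissense_score=0.91, esm1b_score=-12.4)
```

Both comparisons are strict, so a variant scoring exactly at a threshold is not flagged.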
We summarized the output from FINEMAP to report the posterior inclusion probability (PIP) and causal effect sizes of each SNP and their credible set.

Polygenic risk scores

We generated PRSs for the traits using five different approaches based on clumping and thresholding (PRSice2 ( 56 )), Bayesian regression (LDpred2 ( 57 ), RápidoPGS ( 58 ), PRS-CS ( 59 )), and penalized regression (lassosum2 ( 60 )). We estimated the risk scores for the UKB reference training and validation sets using the base summary statistics generated from the single-variant association analysis. PRSs were calculated as the sum of dosages of SNPs multiplied by their adjusted effect sizes or weights. Information on the application of the methods can be found in Additional file 1: Supplementary methodology. For each method, we initially estimated PRSs on the training set to compare and select the best method for the validation set based on performance. Linear regression models were used to evaluate the association between the scores and the measurements. We adjusted each model for age, sex, and the first 40 ancestry PCs ( 108 ). For the imputed array data, the genotyping batch was also included as a covariate. We evaluated the predictive performance of the PRSs using the R² estimate for explained variation. This was calculated by subtracting the R² of the null model (only covariates) from the R² of the full model (PRSs and covariates). We calculated the normal-based 95% confidence interval (CI) using bootstrapping with 100 replicates.

Targeted phenome-wide association analysis

We performed a targeted PheWAS in the UKB validation set to determine associations between the risk scores and multiple chronic diseases within the context of the target data (IAF). For this endeavor, we focused on the most prevalent diseases in the Belgian population reported by Van Wilder et al. 
(2022) ( 38 ), studying 17 diseases that include cardiovascular, infectious, and respiratory diseases, musculoskeletal disorders, cancer, mental disorders, and endocrine-related conditions. Summary details on the phenotype definitions can be found in Additional file 1: Table S1. From the UKB, we used summary diagnoses from hospital inpatient data (Data fields 41202 and 41203), coded according to the International Classification of Diseases (ICD-9 and ICD-10). We then manually mapped ICD codes to Phecodes ( 109 ) and derived cases and controls using the R package PheWAS ( 110 ). For the selected diseases in the PheWAS, we only included outcomes with at least 200 cases in the validation dataset. Summary details on Phecodes are provided in Additional file 1: Table S10. We tested associations between PRSs and each disease using logistic regression models adjusted for age, sex, the first 40 ancestry PCs, and (for the imputed array data) genotype batch. Confidence intervals were estimated with a normal-based 95% CI. To correct for multiple testing, we applied the false discovery rate (FDR) method, considering associations significant at FDR P-value < 0.05.

Mendelian randomization analyses

From the significant associations obtained from the targeted PheWAS, we conducted a two-sample MR analysis to assess the causal relationship between genetic variants included in PRSs and disease outcomes. We followed the guidelines provided by Burgess et al. (2019) ( 111 ). For MR, we defined SNPs reaching genome-wide significance (P-value < 5 × 10⁻⁸) as genetic instruments (instrumental variables) associated with clinical biomarkers (exposure), using the summary statistics from the generated UKB base datasets. For outcome data, we used data from the GIANT ( 61 ) consortium, CARDIoGRAMplusC4D ( 62 ), and FinnGen ( 63 ). Summary information about the outcome datasets used in this study can be found in Additional file 1: Table S11. Outcome datasets were retrieved from the MR-Base platform ( 112 ). 
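The score definition and the explained-variation metric from the Polygenic risk scores section can be sketched as follows: a PRS is the sum of SNP dosages multiplied by their weights, and the incremental R² is the R² of the full model (PRS and covariates) minus the R² of the null model (covariates only). This is a toy, standard-library illustration rather than the PLINK, PRSice2, or LDpred2 workflow used in the study; all function names and data values are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Sum of allele dosages multiplied by per-SNP weights for one individual."""
    return sum(d * w for d, w in zip(dosages, weights))


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(y, predictors):
    """R^2 of an ordinary least-squares fit of y on predictors plus an intercept."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def incremental_r2(y, prs, covariates):
    """R^2 of the full model (PRS + covariates) minus R^2 of the null model."""
    full = r_squared(y, [[s] + list(c) for s, c in zip(prs, covariates)])
    null = r_squared(y, [list(c) for c in covariates])
    return full - null


# Toy data: three SNP dosages with hypothetical weights for one individual.
score = polygenic_score([0, 1, 2], [0.1, -0.2, 0.3])
```

In practice the covariate columns would hold age, sex, and ancestry PCs, and the bootstrap confidence interval would be obtained by recomputing `incremental_r2` on resampled individuals.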
For causal inference, we used the multiplicative random-effect IVW method as our primary approach for estimating causal effects. Heterogeneity was assessed with Cochran’s Q to identify potential outliers, while MR-Egger regression (Egger intercept P-value < 0.05) tested for horizontal pleiotropy. We further applied the simple and weighted median methods (assuming at least 50% valid instruments), and MR-PRESSO to correct outliers, comparing results across methods to support our IVW estimates. All analyses were performed in R using the TwoSampleMR and MRPRESSO packages, with data clumped at r² < 0.01 and harmonized by correcting non-palindromic SNP strands and removing palindromic variants. Multiple testing was controlled using FDR, with significance set at FDR P-value < 0.05.

Prediction of PRSs for the target cohort

We estimated clinical biomarker PRSs for the IAF cohort using our UKB base summary and validation datasets. After merging IAF genotype data by retaining common variants from individuals with matched ancestry, we computed risk scores for each biomarker using PLINK and the best-performing PRS method. PRSs were generated from both exome-based and array-based summary statistics and compared using Spearman’s correlation analysis. Finally, we examined correlations between the risk scores and the median biomarker measurements.

PRS-informed digital twin model

From the IAF target cohort PRSs, we selected the WES-based model, chosen for its high concordance with the array-based approach and strong correlation with biomarker measurements, to construct a digital twin model. Following Vallée (2023) ( 36 ), who defines a digital twin as an integration of multi-scale knowledge with a data-driven model, we conceptualized the PRS as a combination of risk scores and the functional information of contributing SNPs. 
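The multiplicative random-effect IVW estimator named in the Mendelian randomization analyses can be illustrated with a short sketch. It is not the TwoSampleMR implementation used in the study: here the standard error is inflated by the residual dispersion only when that dispersion exceeds 1, which is one common convention and an assumption of this example. The function name and the summary statistics are hypothetical.

```python
import math


def ivw_multiplicative_random_effects(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted MR estimate with an over-dispersion-scaled SE.

    Returns (estimate, standard_error, cochran_q).
    """
    weights = [1.0 / s ** 2 for s in se_outcome]
    numerator = sum(
        w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome)
    )
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    estimate = numerator / denominator
    se_fixed = math.sqrt(1.0 / denominator)
    # Cochran's Q: weighted squared deviations from the fitted line.
    cochran_q = sum(
        w * (by - estimate * bx) ** 2
        for w, bx, by in zip(weights, beta_exposure, beta_outcome)
    )
    dispersion = cochran_q / (len(beta_exposure) - 1)
    # Assumed convention: never shrink the SE below the fixed-effect value.
    standard_error = se_fixed * math.sqrt(max(1.0, dispersion))
    return estimate, standard_error, cochran_q


# Hypothetical SNP-exposure and SNP-outcome effects for three instruments.
estimate, standard_error, cochran_q = ivw_multiplicative_random_effects(
    beta_exposure=[0.10, 0.20, 0.30],
    beta_outcome=[0.05, 0.10, 0.15],
    se_outcome=[0.01, 0.01, 0.01],
)
```

A large Cochran's Q relative to its degrees of freedom signals heterogeneity among instruments, which is the situation in which the random-effect standard error departs from the fixed-effect one.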
We then mapped the top k SNPs to functional annotations derived from our characterization analyses and disease association tests (targeted PheWAS and MR analyses). Using this functional information, we developed a multi-layered representation with the following categories:

Associated genes: genes linked to the top contributing, genome-wide significant SNPs.
Pathogenicity: contributing SNPs predicted as pathogenic by AlphaMissense and ESM1b.
Biological pathways: key pathways identified via FUMA (MAGMA) enrichment analysis.
GTEx signatures: tissues significantly enriched according to GTEx data via FUMA (MAGMA).
Causal loci (candidate genes): SNPs deemed “likely causal” in fine-mapping and mapped to significant genes via FUMA enrichment.
Associated outcomes: disease outcomes linked to the PRS model from targeted PheWAS and MR analyses.

Validation of the PRS-informed digital twin

To validate that the WES PRS model reflects the biological characteristics of our digital twin model, we conducted two key analyses focusing on biological pathways and GTEx signatures. For the pathway analysis, we hypothesized that if the SNPs in the PRS are linked to relevant pathways, then the risk scores should correlate with molecular components of these pathways. We tested this by performing a Spearman correlation between the PRS values in the IAF target cohort and its measurements from the Nightingale NMR metabolomics dataset, which includes 250 biomarkers covering lipid, amino acid, and glycolysis-related pathways. These biomarkers serve as proxies for the metabolic processes associated with the enriched pathways, allowing us to assess whether the PRS captures pathway-level molecular perturbations. For the GTEx signatures analysis, we hypothesized that if genes mapped from the contributing SNPs in the PRS exhibit significant tissue-expression profiles, then the PRS should capture the tissue-specific molecular processes underlying trait variation. 
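The six-layer representation listed above can be sketched as a small data structure that keeps the quantitative score alongside its qualitative annotation layers. The class, field, and method names are hypothetical and do not come from the authors' code; the example is populated only with items named in the text for the APOB test case (the two pathogenic, likely causal variants and the enriched tissues), and the score value is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PRSDigitalTwin:
    """One individual's PRS together with its functional annotation layers."""

    trait: str
    score: float
    associated_genes: List[str] = field(default_factory=list)
    pathogenic_variants: Dict[str, str] = field(default_factory=dict)  # rsID -> gene
    biological_pathways: List[str] = field(default_factory=list)
    gtex_tissues: List[str] = field(default_factory=list)
    causal_loci: Dict[str, str] = field(default_factory=dict)  # rsID -> gene
    associated_outcomes: List[str] = field(default_factory=list)

    def layers(self) -> Dict[str, object]:
        """Return the qualitative layers keyed by the category names in the text."""
        return {
            "associated_genes": self.associated_genes,
            "pathogenicity": self.pathogenic_variants,
            "biological_pathways": self.biological_pathways,
            "gtex_signatures": self.gtex_tissues,
            "causal_loci": self.causal_loci,
            "associated_outcomes": self.associated_outcomes,
        }

    def populated_layers(self) -> List[str]:
        """Names of the layers that currently hold at least one annotation."""
        return [name for name, content in self.layers().items() if content]


# Partial example; the score is a placeholder, not a study result.
apob_twin = PRSDigitalTwin(
    trait="APOB",
    score=0.0,
    associated_genes=["LPA", "APOH"],
    pathogenic_variants={"rs3798220": "LPA", "rs1801689": "APOH"},
    gtex_tissues=["liver", "pancreas", "kidney", "stomach"],
    causal_loci={"rs3798220": "LPA", "rs1801689": "APOH"},
)
```

Keeping each layer as a separate field makes it straightforward to see which parts of an individual profile are supported by evidence and which remain empty.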
To test this, we used candidate genes with SNPs identified as “likely causal” in our PRS-informed digital twin model to construct tissue-specific gene interaction networks via the HumanBase platform ( https://hb.flatironinstitute.org ) using data from the GIANT database. These networks integrate gene co-expression and tissue-specific expression data to elucidate gene functions in each tissue. We then performed gene-set enrichment analysis using g:Profiler ( 65 ) to identify the biological categories associated with each network, applying the g:SCS algorithm for multiple testing correction.

Declarations

Acknowledgements

We would like to thank all participants of the UK Biobank and the participants from the VITO IAM Frontier Study. UK Biobank data access was registered under the project application number 71521.

Authors' contributions

A.C.R. and G.E. conceptualized the study. A.C.R. developed the methodology. Investigation was performed by A.C.R., D.K., and G.E. T.K., D.K., and G.E. supervised this work. The original draft was written by A.C.R. A.C.R., T.K., and G.E. reviewed and edited the draft.

Funding

This study was supported by the Flemish Special Research Fund (BOF): BOF21DOC23.

Declaration of interests

The authors declare that they have no competing interests.

Ethics declaration

For the UK Biobank, informed consent was obtained from all participants, and the study received approval from the North West Multi-Centre Research Ethics Committee (MREC). For the VITO IAM Frontier Study, ethical approval was obtained from the Ethical Committee of the University Hospital Antwerp (UZA) and the University of Antwerp. Informed consent was provided by all participants.

Data availability

The VITO IAM Frontier dataset has not been deposited in a public repository since it includes personal data from a private research institution. However, data access can be made available upon request from the corresponding author. 
For the UK Biobank, data access can be requested at https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access as an approved research project.

Code availability
Analysis scripts and code are available on GitHub at https://github.com/alejocrojo09/wes_prs/.

References
1. Lewis CM, Vassos E. Polygenic risk scores: From research tools to clinical instruments. Genome Med. 2020;12(1):1–11.
2. Xiang R, Kelemen M, Xu Y, Harris LW, Parkinson H, Inouye M, et al. Recent advances in polygenic scores: translation, equitability, methods and FAIR tools. Genome Med. 2024;16(1):1–14.
3. Collister JA, Liu X, Clifton L. Calculating Polygenic Risk Scores (PRS) in UK Biobank: A Practical Guide for Epidemiologists. Front Genet. 2022;13(February):1–17.
4. Choi SW, Mak TSH, O’Reilly PF. Tutorial: a guide to performing polygenic risk score analyses. Nat Protoc. 2020;15(9):2759–72.
5. Lambert SA, Gil L, Jupp S, Ritchie SC, Xu Y, Buniello A, et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat Genet. 2021;53(4):416–25.
6. Lennon NJ, Kottyan LC, Kachulis C, Abul-Husn NS, Arias J, Belbin G, et al. Selection, optimization and validation of ten chronic disease polygenic risk scores for clinical implementation in diverse US populations. Nat Med. 2024;30(2):480–7.
7. Sinnott-Armstrong N, Tanigawa Y, Amar D, Mars N, Benner C, Aguirre M, et al. Genetics of 35 blood and urine biomarkers in the UK Biobank. Nat Genet [Internet]. 2021;53(2):185–94. Available from: http://dx.doi.org/10.1038/s41588-020-00757-z
8. Xu Y, Ritchie SC, Liang Y, Timmers PRHJ, Pietzner M, Lannelongue L, et al. An atlas of genetic scores to predict multi-omic traits. Nature. 2023;616(7955):123–31.
9. Warren TL, Tubbs JD, Lesh TA, Corona MB, Pakzad SS, Albuquerque MD, et al. Association of neurotransmitter pathway polygenic risk with specific symptom profiles in psychosis. Mol Psychiatry. 2024;(January).
10. Zhang W, Zhang K. Understanding the Biological Basis of Polygenic Risk Scores and Disparities in Prostate Cancer: A Comprehensive Genomic Analysis. Cancer Inf. 2024;23.
11. Hari Dass SA, McCracken K, Pokhvisneva I, Chen LM, Garg E, Nguyen TTT, et al. A biologically-informed polygenic score identifies endophenotypes and clinical conditions associated with the insulin receptor function on specific brain regions. EBioMedicine [Internet]. 2019;42:188–202. Available from: https://doi.org/10.1016/j.ebiom.2019.03.051
12. Gaulton KJ, Preissl S, Ren B. Interpreting non-coding disease-associated human variants using single-cell epigenomics. Nat Rev Genet. 2023;24(8):516–34.
13. Niguidula N, Alamillo C, Shahmirzadi Mowlavi L, Powis Z, Cohen JS, Farwell Hagman KD. Clinical whole-exome sequencing results impact medical management. Mol Genet Genomic Med. 2018;6(6):1068–78.
14. Liu Y, Hao C, Li K, Hu X, Gao H, Zeng J, et al. Clinical Application of Whole Exome Sequencing for Monogenic Disorders in PICU of China. Front Genet. 2021;12(September).
15. Van Hout CV, Tachmazidou I, Backman JD, Hoffman JD, Liu D, Pandey AK, et al. Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature [Internet]. 2020;586(7831):749–56. Available from: http://dx.doi.org/10.1038/s41586-020-2853-0
16. Samuels DC, Han L, Li J, Quanghu S, Clark TA, Shyr Y, et al. Finding the lost treasures in exome sequencing data. Trends Genet [Internet]. 2013;29(10):593–9. Available from: http://dx.doi.org/10.1016/j.tig.2013.07.006
17. Wang Z, Choi SW, Chami N, Boerwinkle E, Fornage M, Redline S, et al. The Value of Rare Genetic Variation in the Prediction of Common Obesity in European Ancestry Populations. Front Endocrinol (Lausanne). 2022;13(May):1–12.
18. Dou J, Wu D, Ding L, Wang K, Jiang M, Chai X, et al. Using off-target data from whole-exome sequencing to improve genotyping accuracy, association analysis and polygenic risk prediction. Brief Bioinform. 2021;22(3):1–12.
19. Alkhamis FA, Alabdali MM, Alsulaiman AA, Alamri AS, Alali R, Akhtar MS, et al. Whole-exome sequencing analyses in a Saudi Ischemic Stroke Cohort reveal association signals, and shows polygenic risk scores are related to Modified Rankin Scale Risk. Funct Integr Genomics [Internet]. 2023;23(2):1–9. Available from: https://doi.org/10.1007/s10142-023-01039-7
20. Yuan J, Qiu R, Wang Y, Chen ZJ, Sun H, Dai W, et al. Exome-wide genetic risk score (ExGRS) to predict high myopia across multi-ancestry populations. 2024;1–10.
21. Aldisi R, Hassanin E, Sivalingam S, Buness A, Klinkhammer H, Mayr A, et al. Gene-based burden scores identify rare variant associations for 28 blood biomarkers. BMC Genomic Data [Internet]. 2023;24(1):1–11. Available from: https://doi.org/10.1186/s12863-023-01155-0
22. Wang Q, Dhindsa RS, Carss K, Harper AR, Nag A, Tachmazidou I, et al. Rare variant contribution to human disease in 281,104 UK Biobank exomes. Nature. 2021;597(7877):527–32.
23. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562(7726):203–9.
24. Park J, Talozzi L, Greicius MD. Rare genetic associations with human lifespan in UK Biobank are enriched for oncogenic genes. Nat Commun [Internet]. 2025; Available from: http://dx.doi.org/10.1038/s41467-025-57315-6
25. Nag A, Dhindsa RS, Middleton L, Jiang X, Vitsios D, Wigmore E, et al. Effects of protein-coding variants on blood metabolite measurements and clinical biomarkers in the UK Biobank. Am J Hum Genet. 2023;110(3):487–98.
26. Allara E, Morani G, Carter P, Gkatzionis A, Zuber V, Foley CN, et al. Genetic Determinants of Lipids and Cardiovascular Disease Outcomes: A Wide-Angled Mendelian Randomization Investigation. Circ Genomic Precis Med. 2019;12(12):543–51.
27. Tabassum R, Rämö JT, Ripatti P, Koskela JT, Kurki M, Karjalainen J, et al. Genetic architecture of human plasma lipidome and its link to cardiovascular disease. Nat Commun. 2019;10(1).
28. Stevenson-Hoare J, Heslegrave A, Leonenko G, Fathalla D, Bellou E, Luckcuck L, et al. Plasma biomarkers and genetics in the diagnosis and prediction of Alzheimer’s disease. Brain [Internet]. 2023;146(2):690–9. Available from: https://doi.org/10.1093/brain/awac128
29. Gui H, Schriemer D, Cheng WW, Chauhan RK, Antiñolo G, Berrios C, et al. Whole exome sequencing coupled with unbiased functional analysis reveals new Hirschsprung disease genes. Genome Biol. 2017;18(1):1–13.
30. Liang H, Cheung LWT, Li J, Ju Z, Yu S, Stemke-Hale K, et al. Whole-exome sequencing combined with functional genomics reveals novel candidate driver cancer genes in endometrial cancer. Genome Res. 2012;22(11):2120–9.
31. Wilcox N, Dumont M, González-Neira A, Carvalho S, Joly Beauparlant C, Crotti M, et al. Exome sequencing identifies breast cancer susceptibility genes and defines the contribution of coding variants to breast cancer risk. Nat Genet. 2023;55(9):1435–9.
32. Bomba L, Walter K, Guo Q, Surendran P, Kundu K, Nongmaithem S, et al. Whole-exome sequencing identifies rare genetic variants associated with human plasma metabolites. Am J Hum Genet [Internet]. 2022;109(6):1038–54. Available from: https://doi.org/10.1016/j.ajhg.2022.04.009
33. Katsoulakis E, Wang Q, Wu H, Shahriyari L, Fletcher R, Liu J, et al. Digital twins for health: a scoping review. npj Digit Med. 2024;7(1):1–11.
34. Li X, Loscalzo J, Mahmud AKMF, Aly DM, Rzhetsky A, Zitnik M, et al. Digital twins as global learning health and disease models for preventive and personalized medicine. Genome Med [Internet]. 2025;17(1):11. Available from: http://www.ncbi.nlm.nih.gov/pubmed/39920778
35. Björnsson B, Borrebaeck C, Elander N, Gasslander T, Gawel DR, Gustafsson M, et al. Digital twins to personalize medicine. Genome Med. 2020;12(1):10–3.
36. Vallée A. Digital twin for healthcare systems. Front Digit Health [Internet]. 2023;5(September):1–6. Available from: https://doi.org/10.3389/fdgth.2023.1253050
37. Heylen D, De Clerck C, Pusparum M, Rojo AC, Van Den Heuvel R, Standaert A, et al. Cohort profile: The I AM Frontier prospective cohort study in Flanders. A healthcare transition towards personalized prevention; 2024.
38. Van Wilder L, Devleesschauwer B, Clays E, Van der Heyden J, Charafeddine R, Scohy A, et al. QALY losses for chronic diseases and its social distribution in the general population: results from the Belgian Health Interview Survey. BMC Public Health [Internet]. 2022;22(1):1–9. Available from: https://doi.org/10.1186/s12889-022-13675-y
39. Abraham G, Qiu Y, Inouye M. FlashPCA2: principal component analysis of Biobank-scale genotype datasets. Bioinformatics. 2017;33(17):2776–8.
40. Mbatchou J, Barnard L, Backman J, Marcketta A, Kosmicki JA, Ziyatdinov A, et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat Genet [Internet]. 2021;53(7):1097–103. Available from: http://dx.doi.org/10.1038/s41588-021-00870-7
41. Kamat MA, Blackshaw JA, Young R, Surendran P, Burgess S, Danesh J, et al. PhenoScanner V2: An expanded tool for searching human genotype-phenotype associations. Bioinformatics. 2019;35(22):4851–3.
42. Magno R, Maia AT. Gwasrapidd: An R package to query, download and wrangle GWAS catalog data. Bioinformatics. 2020;36(2):649–50.
43. Orliac EJ, Banos DT, Ojavee SE, Läll K, Mägi R, Visscher PM, et al. Improving GWAS discovery and genomic prediction accuracy in biobank data. Proc Natl Acad Sci U S A. 2022;119(31):1–8.
44. Wu Y, Burch KS, Ganna A, Pajukanta P, Pasaniuc B, Sankararaman S. Fast estimation of genetic correlation for biobank-scale data. Am J Hum Genet. 2022;109(1):24–32.
45. van Rheenen W, Peyrot WJ, Schork AJ, Lee SH, Wray NR. Genetic correlations of polygenic disease traits: from theory to practice. Nat Rev Genet [Internet]. 2019;20(10):567–81. Available from: http://dx.doi.org/10.1038/s41576-019-0137-z
46. Watanabe K, Taskesen E, Van Bochoven A, Posthuma D. Functional mapping and annotation of genetic associations with FUMA. Nat Commun [Internet]. 2017;8(1):1–10. Available from: http://dx.doi.org/10.1038/s41467-017-01261-5
47. Auton A, Abecasis GR, Altshuler DM, Durbin RM, Bentley DR, Chakravarti A, et al. A global reference for human genetic variation. Nature. 2015;526(7571):68–74.
48. de Leeuw CA, Mooij JM, Heskes T, Posthuma D. MAGMA: Generalized Gene-Set Analysis of GWAS Data. PLoS Comput Biol. 2015;11(4):1–19.
49. Cheng J, Novati G, Pan J, Bycroft C, Žemgulyte A, Applebaum T, et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science. 2023;381(6664).
50. Brandes N, Goldman G, Wang CH, Ye CJ, Ntranos V. Genome-wide prediction of disease variant effects with a deep protein language model. Nat Genet. 2023;55(9):1512–22.
51. Stephenson JD, Totoo P, Burke DF, Jänes J, Beltrao P, Martin MJ. ProtVar: mapping and contextualizing human missense variation. Nucleic Acids Res. 2024;52(W1):W140–7.
52. Epstein ES. Genetic Testing in Patients with High Lipoprotein(a): Experience from the UCSD Lipoprotein(a) Specialty Clinic†. J Clin Lipidol [Internet]. 2022;16(1, Supplement):e12–3. Available from: https://www.sciencedirect.com/science/article/pii/S1933287421002178
53. Hoekstra M, Chen HY, Rong J, Dufresne L, Yao J, Guo X, et al. Genome-Wide Association Study Highlights APOH as a Novel Locus for Lipoprotein(a) Levels-Brief Report. Arterioscler Thromb Vasc Biol. 2021;41(1):458–64.
54. Benner C, Spencer CCA, Havulinna AS, Salomaa V, Ripatti S, Pirinen M. FINEMAP: Efficient variable selection using summary data from genome-wide association studies. Bioinformatics. 2016;32(10):1493–501.
55. Weissbrod O, Hormozdiari F, Benner C, Cui R, Ulirsch J, Gazal S, et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat Genet [Internet]. 2020;52(12):1355–63. Available from: http://dx.doi.org/10.1038/s41588-020-00735-5
56. Choi SW, O’Reilly PF. PRSice-2: Polygenic Risk Score software for biobank-scale data. Gigascience. 2019;8(7):1–6.
57. Privé F, Arbel J, Vilhjálmsson BJ. LDpred2: Better, faster, stronger. Bioinformatics. 2020;36(22–23):5424–31.
58. Reales G, Vigorito E, Kelemen M, Wallace C. RápidoPGS: A rapid polygenic score calculator for summary GWAS data without a test dataset. Bioinformatics. 2021;37(23):4444–50.
59. Ge T, Chen CY, Ni Y, Feng YCA, Smoller JW. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat Commun [Internet]. 2019;10(1):1–10. Available from: http://dx.doi.org/10.1038/s41467-019-09718-5
60. Privé F, Arbel J, Aschard H, Vilhjálmsson BJ. Identifying and correcting for misspecifications in GWAS summary statistics and polygenic scores. Hum Genet Genomics Adv. 2022;3(4):1–15.
61. Yengo L, Sidorenko J, Kemper KE, Zheng Z, Wood AR, Weedon MN, et al. Meta-analysis of genome-wide association studies for height and body mass index in ~700 000 individuals of European ancestry. Hum Mol Genet. 2018;27(20):3641–9.
62. Deloukas P, Kanoni S, Willenborg C, Farrall M, Assimes TL, Thompson JR, et al. Large-scale association analysis identifies new risk loci for coronary artery disease. Nat Genet. 2013;45(1):25–33.
63. Kurki MI, Karjalainen J, Palta P, Sipilä TP, Kristiansson K, Donner KM, et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature. 2023;613(7944):508–18.
64. Costanzo MC, Roselli C, Brandes M, Duby M, Hoang Q, Jang D, et al. Cardiovascular Disease Knowledge Portal: A Community Resource for Cardiovascular Disease Research. Circ Genomic Precis Med. 2023;16(6):583–6.
65. Kolberg L, Raudvere U, Kuzmin I, Adler P, Vilo J, Peterson H. G:Profiler-interoperable web service for functional enrichment analysis and gene identifier mapping (2023 update). Nucleic Acids Res. 2023;51(W1):W207–12.
66. Selvaraj MS, Li X, Li Z, Pampana A, Zhang DY, Park J, et al. Whole genome sequence analysis of blood lipid levels in >66,000 individuals. Nat Commun. 2022;13(1).
67. Kathiresan S, Srivastava D. Genetics of human cardiovascular disease. Cell [Internet]. 2012;148(6):1242–57. Available from: http://dx.doi.org/10.1016/j.cell.2012.03.001
68. Dron JS, Hegele RA. Genetics of Lipid and Lipoprotein Disorders and Traits. Curr Genet Med Rep. 2016;4(3):130–41.
69. Erdmann J, Kessler T, Munoz Venegas L, Schunkert H. A decade of genome-wide association studies for coronary artery disease: The challenges ahead. Cardiovasc Res. 2018;114(9):1241–57.
70. Richardson TG, Harrison S, Hemani G, Smith GD. An atlas of polygenic risk score associations to highlight putative causal relationships across the human phenome. Elife. 2019;8:1–24.
71. O’Sullivan JW, Raghavan S, Marquez-Luna C, Luzum JA, Damrauer SM, Ashley EA, et al. Polygenic Risk Scores for Cardiovascular Disease: A Scientific Statement From the American Heart Association. Circulation. 2022;146(8):E93–118.
72. Fu L, Liu Q, Cheng H, Zhao X, Xiong J, Mi J. Insights Into Causal Effects of Genetically Proxied Lipids and Lipid-Modifying Drug Targets on Cardiometabolic Diseases. J Am Heart Assoc. 2025;14(3):1–20.
73. Artomov M, Loboda AA, Artyomov MN, Daly MJ. Public platform with 39,472 exome control samples enables association studies without genotype sharing. Nat Genet. 2024;56(February).
74. Wojcik GL, Murphy J, Edelson JL, Gignoux CR, Ioannidis AG, Manning A, et al. Opportunities and challenges for the use of common controls in sequencing studies. Nat Rev Genet. 2022;23(11):665–79.
75. Urbina EM, McCoy CE, Gao Z, Khoury PR, Shah AS, Dolan LM, et al. Lipoprotein particle number and size predict vascular structure and function better than traditional lipids in adolescents and young adults. J Clin Lipidol [Internet]. 2017;11(4):1023–31. Available from: https://doi.org/10.1016/j.jacl.2017.05.011
76. Aday AW, Lawler PR, Cook NR, Ridker PM, Mora S, Pradhan AD. Lipoprotein Particle Profiles, Standard Lipids, and Peripheral Artery Disease Incidence: Prospective Data from the Women’s Health Study. Circulation. 2018;138(21):2330–41.
77. Galimberti F, Casula M, Olmastroni E. Apolipoprotein B compared with low-density lipoprotein cholesterol in the atherosclerotic cardiovascular diseases risk assessment. Pharmacol Res [Internet]. 2023;195(May):106873. Available from: https://doi.org/10.1016/j.phrs.2023.106873
78. Lee J, Gilliland TC, Dron J, Koyama S, Nakao T, Lannery K, et al. Integrative Metabolomics Differentiate Coronary Artery Disease, Peripheral Artery Disease, and Venous Thromboembolism Risks. Arterioscler Thromb Vasc Biol. 2024;44(9):2108–17.
79. Zuber V, Gill D, Ala-Korpela M, Langenberg C, Butterworth A, Bottolo L, et al. High-throughput multivariable Mendelian randomization analysis prioritizes apolipoprotein B as key lipid risk factor for coronary artery disease. Int J Epidemiol. 2021;50(3):893–901.
80. Sayeed N, Sugaya K. Exosome mediated Tom40 delivery protects against hydrogen peroxide-induced oxidative stress by regulating mitochondrial function. PLoS One [Internet]. 2022;17(8 August):1–16. Available from: http://dx.doi.org/10.1371/journal.pone.0272511
81. Wang X, Wang S, Liu W, Wang T, Wang J, Gao X, et al. Epigenetic upregulation of miR-126 induced by heat stress contributes to apoptosis of rat cardiomyocytes by promoting Tomm40 transcription. J Mol Cell Cardiol [Internet]. 2019;129:39–48. Available from: https://doi.org/10.1016/j.yjmcc.2018.10.005
82. Humphries AD, Streimann IC, Stojanovski D, Johnston AJ, Yano M, Hoogenraad NJ, et al. Dissection of the mitochondrial import and assembly pathway for human Tom40. J Biol Chem [Internet]. 2005;280(12):11535–43. Available from: http://dx.doi.org/10.1074/jbc.M413816200
83. Cao T, Yi SJ, Wang LX, Zhao JX, Xiao J, Xie N, et al. Identification of the DNA Replication Regulator MCM Complex Expression and Prognostic Significance in Hepatic Carcinoma. Biomed Res Int. 2020;2020.
84. Lin X, Guo L, Lin X, Wang Y, Zhang G. Expression and prognosis analysis of mitochondrial ribosomal protein family in breast cancer. Sci Rep [Internet]. 2022;12(1):1–13. Available from: https://doi.org/10.1038/s41598-022-14724-7
85. Parma B, Ramesh V, Gollavilli PN, Siddiqui A, Pinna L, Schwab A, et al. Metabolic impairment of non-small cell lung cancers by mitochondrial HSPD1 targeting. J Exp Clin Cancer Res [Internet]. 2021;40(1):1–20. Available from: https://doi.org/10.1186/s13046-021-02049-8
86. Javary J, Allain-Courtois N, Saucisse N, Costet P, Heraud C, Benhamed F, et al. Liver Reptin/RUVBL2 controls glucose and lipid metabolism with opposite actions on mTORC1 and mTORC2 signalling. Gut [Internet]. 2018;67(12):2192–203. Available from: http://gut.bmj.com/content/67/12/2192.abstract
87. Dahik VD, Frisdal E, Le Goff W. Rewiring of lipid metabolism in adipose tissue macrophages in obesity: Impact on insulin resistance and type 2 diabetes. Int J Mol Sci. 2020;21(15):1–30.
88. Snaebjornsson MT, Janaki-Raman S, Schulze A. Greasing the Wheels of the Cancer Machine: The Role of Lipid Metabolism in Cancer. Cell Metab [Internet]. 2020;31(1):62–76. Available from: https://www.sciencedirect.com/science/article/pii/S1550413119306175
89. Boyle EA, Li YI, Pritchard JK. An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell [Internet]. 2017;169(7):1177–86. Available from: http://dx.doi.org/10.1016/j.cell.2017.05.038
90. Mathieson I. The omnigenic model and polygenic prediction of complex traits. Am J Hum Genet [Internet]. 2021;108(9):1558–63. Available from: https://doi.org/10.1016/j.ajhg.2021.07.003
91. Liu X, Li YI, Pritchard JK. Trans Effects on Gene Expression Can Drive Omnigenic Inheritance. Cell [Internet]. 2019;177(4):1022–1034.e6. Available from: https://doi.org/10.1016/j.cell.2019.04.014
92. Hughes MF, Lenighan YM, Godson C, Roche HM. Exploring Coronary Artery Disease GWAs Targets With Functional Links to Immunometabolism. Front Cardiovasc Med. 2018;5(November):1–12.
93. Van Der Harst P, Verweij N. Identification of 64 novel genetic loci provides an expanded view on the genetic architecture of coronary artery disease. Circ Res. 2018;122(3):433–43.
94. Liu Y, Ren H, Zhang Y, Deng W, Ma X, Zhao L, et al. Temporal changes in brain morphology related to inflammation and schizophrenia: An omnigenic Mendelian randomization study. Psychol Med. 2024.
95. Wang B, Dong X, Hu J, Gao L. Multi-omics peripheral and core regions of cancer. npj Syst Biol Appl. 2022;8(1).
96. Allayee H, Farber CR, Seldin MM, Williams EG, James DE, Lusis AJ. Systems genetics approaches for understanding complex traits with relevance for human disease. Elife. 2023;12:1–29.
97. Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLoS Med. 2015;12(3):1–10.
98. Warren HR, Evangelou E, Cabrera CP, Gao H, Ren M, Mifsud B, et al. Genome-wide association analysis identifies novel blood pressure loci and offers biological insights into cardiovascular risk. Nat Genet. 2017;49(3):403–15.
99. Fung K, Ramírez J, Warren HR, Aung N, Lee AM, Tzanis E, et al. Genome-wide association study identifies loci for arterial stiffness index in 127,121 UK Biobank participants. Sci Rep. 2019;9(1):1–8.
100. Szustakowski JD, Balasubramanian S, Kvikstad E, Khalid S, Bronson PG, Sasson A, et al. Advancing human genetics research and drug discovery through exome sequencing of the UK Biobank. Nat Genet. 2021;53(7):942–8.
101. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60.
102. Poplin R, Chang PC, Alexander D, Schwartz S, Colthurst T, Ku A, et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol. 2018;36(10):983.
103. Lin MF, Rodeh O, Penn J, Bai X, Reid JG, Krasheninina O, et al. GLnexus: joint variant calling for large cohort sequencing. bioRxiv [Internet]. 2018;343970. Available from: https://www.biorxiv.org/content/10.1101/343970v1
104. Marees AT, de Kluiver H, Stringer S, Vorspan F, Curis E, Marie-Claire C, et al. A tutorial on conducting genome-wide association studies: Quality control and statistical analysis. Int J Methods Psychiatr Res. 2018;27(2):1–10.
105. Chang CC. Data Management and Summary Statistics with PLINK. Methods Mol Biol. 2020;2090:49–65.
106. McCaw ZR, Lane JM, Saxena R, Redline S, Lin X. Operating characteristics of the rank-based inverse normal transformation for quantitative trait analysis in genome-wide association studies. Biometrics. 2020;76(4):1262–72.
107. Wang K, Li M, Hakonarson H. ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38(16):1–7.
108. Dor E, Margaliot I, Brandes N, Zuk O, Linial M, Rappoport N. Selecting Covariates for Genome-Wide Association Studies. bioRxiv [Internet]. 2023;2023.02.07.527425. Available from: https://www.biorxiv.org/content/10.1101/2023.02.07.527425v1
109. Wei WQ, Bastarache LA, Carroll RJ, Marlo JE, Osterman TJ, Gamazon ER, et al. Evaluating phecodes, clinical classification software, and ICD-9-CM codes for phenome-wide association studies in the electronic health record. PLoS ONE. 2017;12(7):1–16.
110. Carroll RJ, Bastarache L, Denny JC. R PheWAS: Data analysis and plotting tools for phenome-wide association studies in the R environment. Bioinformatics. 2014;30(16):2375–6.
111. Burgess S, Davey Smith G, Davies NM, Dudbridge F, Gill D, Glymour MM, et al. Guidelines for performing Mendelian randomization investigations. Wellcome Open Res. 2019;4:186.
112. Hemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, et al. The MR-base platform supports systematic causal inference across the human phenome. Elife. 2018;7:1–29.

Additional Declarations
No competing interests reported.
Supplementary Files
Additionalfile1.docx
Additionalfile2.xlsx
Additionalfile3.xlsx
Additionalfile4.xlsx
Additionalfile5.xlsx
Additionalfile6.xlsx
Additionalfile7.xlsx
Additionalfile8.xlsx
GA.png (Graphical abstract)

Status: Under Revision
Editorial decision: Revision requested 29 Apr, 2026
Reviews received at journal 06 Apr, 2026
Reviewers agreed at journal 27 Feb, 2026
Reviews received at journal 14 Apr, 2025
Reviewers agreed at journal 28 Mar, 2025
Reviewers invited by journal 28 Mar, 2025
Editor assigned by journal 17 Mar, 2025
Submission checks completed at journal 06 Mar, 2025
First submitted to journal 06 Mar, 2025
Author affiliations
Alejandro Correa Rojo: Flemish Institute for Technological Research
Toomas Kivisild: KU Leuven
Dirk Valkenborg: Hasselt University
Gökhan Ertaylan (corresponding author): Flemish Institute for Technological Research

Figure legends

Figure 1. Overview of the study. We analyzed 27 clinical biomarkers to evaluate WES-based polygenic risk modeling using UKB as the reference dataset. We also assessed the use of UKB data for small cohort studies (IAF target) and developed a PRS-informed digital twin model integrating functional SNP data. Biomarkers were classified according to UKB physical and biochemistry panels, and outcomes were chosen based on the Belgian Health Interview Survey (Van Wilder et al., 2022) (38). Detailed study design, population characteristics, and phenotype summaries are provided in Additional file 1: Tables S1–S5.

Figure 2. Overview of the study methodology. Summary representation of the genetic association analysis, polygenic risk modeling, and association testing with the selected clinical measurements and disease outcomes. a Comparative analysis of WES and imputed genotype array data available from the UKB used as a reference cohort. b PRS prediction for the IAF target cohort and development of a PRS-informed digital twin model using UKB summary statistics and validation data.

Figure 3. Population structure of the study cohorts and genetic content of the reference (UKB) and target (IAF) datasets. a PCA from the WES-based reference dataset (N_Ind = 200,673; N_SNPs = 8,201). IAF individuals are shown in cyan and labeled as “IAF”. b UMAP from the WES-based reference dataset. UMAP was applied to the first 40 PCs. c Proportion of SNPs by biological consequence. d Number of common and unique SNPs across technological datasets. The reported numbers correspond to SNPs before LD pruning.

Figure 4. Circular representation of the significant associations obtained from the WES-based dataset. Each dot corresponds to an associated SNP (mapped locus or gene). Larger dots highlight pleiotropic associations, i.e., the same SNP associated across traits. The top left includes a bar plot of the number of associations per biomarker panel. Plot generated with Fujiplot (7).

Figure 5. Post-association analyses on the UKB base summary datasets. a Heritability measurements estimated with BOLT-REML. b Boxplot representation of genetic correlation values estimated with SCORE.

Figure 6. Functional annotation of the base summary dataset (UKB). a Distribution of genetic variants based on functional groups. b Distribution of genetic variants based on functional groups across traits.

Figure 7. Chord diagram of common associated genes between platforms. Genes with Bonferroni P-value < 3.36 × 10⁻⁶ are shown.

Figure 8. Summary of the gene-set analyses. a Distribution of significantly associated biological pathways per trait across blood biochemistry panels (WES dataset; Bonferroni P-value < 3.36 × 10⁻⁶). b Distribution of significantly associated biological pathways per trait across blood biochemistry panels (imputed array dataset; Bonferroni P-value < 3.36 × 10⁻⁶). c Distribution of the tissue-specific gene expression hits per mapped organ across traits (WES dataset; Bonferroni P-value < 9.26 × 10⁻⁴). d Distribution of the tissue-specific gene expression hits per mapped organ across traits (imputed array dataset; Bonferroni P-value < 9.26 × 10⁻⁴).

Figure 9. Summary of the fine-mapping analysis. Using independent variants detected by FUMA, 919,857 SNP configurations were fine-mapped to 12 or fewer causal variants across all 26 traits in both datasets. a Total number of fine-mapped variants across datasets and traits. b Fine-mapped variants with PIP > 95%; SNPs with PIP ≥ 99% were prioritized as causal. c UpSet plot of unique missense variants with PIP ≥ 99% in the WES dataset. d UpSet plot of unique missense variants with PIP ≥ 99% in the imputed array dataset. In the UpSet plots, bars indicate the number of missense variants in each intersection, and dark circles mark the sets included.

Figure 10. Predictive performance of PRSs estimated for UKB training and validation datasets. a Benchmarking of models with the training dataset (N_Ind = 9,473). b Comparison of performances between platforms and across traits for the validation datasets (N_Ind = 22,118). The best-performing method obtained from the training dataset was used. c Spearman correlation matrix of PRSs between datasets. The diagonal shows the level of concordance between same-trait scores.

Figure 11. Summary of the targeted PheWAS and two-sample MR analyses. a Volcano plot of targeted PheWAS from the WES validation dataset. b Volcano plot of targeted PheWAS from the imputed-array validation dataset.
\u003cstrong\u003ec \u003c/strong\u003eNetwork representation of causal links between WES-based genetic variants included in PRSs and disease outcome. \u003cstrong\u003ed \u003c/strong\u003eNetwork representation of causal links between imputed array-based genetic variants included in PRSs and disease outcome. \u003cstrong\u003ee \u003c/strong\u003eForest plot of the significant causal estimates (FDR P-value \u0026lt; 0.05) obtained from the IVW-method (WES-based MR). \u003cstrong\u003ef \u003c/strong\u003eForest plot of the significant causal estimates (FDR P-value \u0026lt; 0.05) obtained from the IVW-method (Imputed array-based MR).\u003c/p\u003e","description":"","filename":"image12.png","url":"https://assets-eu.researchsquare.com/files/rs-6169446/v1/702b8dc30d4d025da01b1abf.png"},{"id":78142523,"identity":"8ea91110-3e5b-4a16-a952-0ecbb70d0e06","added_by":"auto","created_at":"2025-03-10 10:26:44","extension":"png","order_by":12,"title":"Figure 12","display":"","copyAsset":false,"role":"figure","size":490583,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eRisk heatmap for the IAF target cohort using the UKB summary datasets for polygenic risk modeling and the validation set as a proxy dataset for the target cohort. \u003c/strong\u003ePRSs were estimated for the IAF cohort (N\u003csub\u003eInd\u003c/sub\u003e = 30) and individuals were stratified based on the quartiles of their scores. \u003cstrong\u003ea \u003c/strong\u003eRisk matrix obtained from the WES-based reference dataset. \u003cstrong\u003eb \u003c/strong\u003eRisk matrix obtained from the imputed array-based reference dataset. 
Individual names were anonymized and randomized for the presentation of results.\u003c/p\u003e","description":"","filename":"image13.png","url":"https://assets-eu.researchsquare.com/files/rs-6169446/v1/38de7107be0ea76c5cd8982e.png"},{"id":78142533,"identity":"39e396ec-56cc-41e2-bfa4-e7e412e8c492","added_by":"auto","created_at":"2025-03-10 10:26:45","extension":"png","order_by":13,"title":"Figure 13","display":"","copyAsset":false,"role":"figure","size":258175,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eDescription of the PRS model and proposed approach for PRS-informed digital twin mapping. \u003c/strong\u003ePRS model contextualization using the WES-based APOB model as test case. \u003cstrong\u003ea \u003c/strong\u003eTop 10 contributing SNPs in the PRS model (LDpred2_grid). \u003cstrong\u003eb \u003c/strong\u003eGroup stratification of the test dataset (UKB and IAF individuals) based on the PRS model and the APOB concentration levels. \u003cstrong\u003ec \u003c/strong\u003eProposed representation of a PRS-informed digital twin with functional data from associated SNPs.\u003c/p\u003e","description":"","filename":"image14.png","url":"https://assets-eu.researchsquare.com/files/rs-6169446/v1/4b4dd033e72a3a7511204f20.png"},{"id":78141361,"identity":"9089a6e3-3f58-4a13-a327-d603d2b85a05","added_by":"auto","created_at":"2025-03-10 10:18:45","extension":"png","order_by":14,"title":"Figure 14","display":"","copyAsset":false,"role":"figure","size":45859,"visible":true,"origin":"","legend":"\u003cp\u003eAssociated lipid metabolites from the IAF NMR Nightingale metabolomics panel with the APOB PRS model (FDR P-value \u0026lt; 0.05).\u003c/p\u003e","description":"","filename":"image15.png","url":"https://assets-eu.researchsquare.com/files/rs-6169446/v1/b71637aad03d9b054ce330bf.png"},{"id":78141318,"identity":"d4af34e6-9b54-4713-a534-a506724cd78a","added_by":"auto","created_at":"2025-03-10 10:18:43","extension":"png","order_by":15,"title":"Figure 
15","display":"","copyAsset":false,"role":"figure","size":706925,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eTissue-specific networks generated from the HumanBase platform with the associated PRS-genes mapped to the associated tissue signatures described in the functional analyses\u003c/strong\u003e. \u003cstrong\u003ea \u003c/strong\u003eStomach. \u003cstrong\u003eb \u003c/strong\u003eLiver. \u003cstrong\u003ec \u003c/strong\u003eKidney. \u003cstrong\u003ed \u003c/strong\u003ePancreas. Networks were constructed with a minimum interaction confidence of 0.10 and maximum 20 genes.\u003c/p\u003e","description":"","filename":"image16.png","url":"https://assets-eu.researchsquare.com/files/rs-6169446/v1/8547025d572fb57b11af0f4c.png"},{"id":78144359,"identity":"9c05d463-1085-4d42-9688-e108a7cc6ca0","added_by":"auto","created_at":"2025-03-10 10:42:46","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":5676074,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6169446/v1/0d5a9ac4-98e4-456c-8115-de3b56901b9b.pdf"},{"id":78141326,"identity":"8fc82233-8ef6-4456-9d5d-18db4ffc386c","added_by":"auto","created_at":"2025-03-10 10:18:44","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":7849936,"visible":true,"origin":"","legend":"","description":"","filename":"Additionalfile1.docx","url":"https://assets-eu.researchsquare.com/files/rs-6169446/v1/1355ae19541941d60d2ad06b.docx"},{"id":78141357,"identity":"1c19172d-1b8f-438d-9ec5-24b480297b23","added_by":"auto","created_at":"2025-03-10 
10:18:45","extension":"xlsx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":2192145,"visible":true,"origin":"","legend":"","description":"","filename":"Additionalfile2.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6169446/v1/4433814190bbac0a31a20b1e.xlsx"},{"id":78141316,"identity":"c713139e-17f8-4fca-950e-b8b7f83b8d79","added_by":"auto","created_at":"2025-03-10 10:18:43","extension":"xlsx","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":134618,"visible":true,"origin":"","legend":"","description":"","filename":"Additionalfile3.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6169446/v1/e2ef231a75b7e7065f9c5e2a.xlsx"},{"id":78142529,"identity":"af1805ba-b72b-4f69-a4c3-35b0916a1a1c","added_by":"auto","created_at":"2025-03-10 10:26:44","extension":"xlsx","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":76123,"visible":true,"origin":"","legend":"","description":"","filename":"Additionalfile4.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6169446/v1/453a5e55864fad5ce3c0f2f6.xlsx"},{"id":78141311,"identity":"a092bd27-3af7-4b0e-ae38-6d843baed778","added_by":"auto","created_at":"2025-03-10 10:18:43","extension":"xlsx","order_by":5,"title":"","display":"","copyAsset":false,"role":"supplement","size":96415,"visible":true,"origin":"","legend":"","description":"","filename":"Additionalfile5.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6169446/v1/3d6743b8e0837505961306b5.xlsx"},{"id":78141377,"identity":"ad98a069-7a9f-4ad8-88f6-dd8ef8695bf2","added_by":"auto","created_at":"2025-03-10 
10:18:46","extension":"xlsx","order_by":6,"title":"","display":"","copyAsset":false,"role":"supplement","size":19292186,"visible":true,"origin":"","legend":"","description":"","filename":"Additionalfile6.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6169446/v1/54cf091c5357b4c9bca2a7b6.xlsx"},{"id":78142810,"identity":"e3e57a8f-4e78-4b1f-9888-c2b7a1bac7c4","added_by":"auto","created_at":"2025-03-10 10:34:45","extension":"xlsx","order_by":7,"title":"","display":"","copyAsset":false,"role":"supplement","size":31762,"visible":true,"origin":"","legend":"","description":"","filename":"Additionalfile7.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6169446/v1/2c7fe8168c3ebc4f93765da9.xlsx"},{"id":78141343,"identity":"55281ce6-8298-4543-8160-170561853fe6","added_by":"auto","created_at":"2025-03-10 10:18:44","extension":"xlsx","order_by":8,"title":"","display":"","copyAsset":false,"role":"supplement","size":127314,"visible":true,"origin":"","legend":"","description":"","filename":"Additionalfile8.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6169446/v1/b9c0379212ceeedd386c2a01.xlsx"},{"id":78141313,"identity":"fb1a8605-99cc-4f1c-a159-92aa876b0f9c","added_by":"auto","created_at":"2025-03-10 10:18:43","extension":"png","order_by":9,"title":"","display":"","copyAsset":false,"role":"supplement","size":130347,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eGraphical abstract\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"GA.png","url":"https://assets-eu.researchsquare.com/files/rs-6169446/v1/02f6e40f7819d9f4ff290126.png"}],"financialInterests":"No competing interests reported.","formattedTitle":"Transforming polygenic risk prediction: functional annotation and digital twin modeling with whole-exome sequencing","fulltext":[{"header":"Introduction","content":"\u003cp\u003ePolygenic risk scores (PRSs), which quantify an individual\u0026rsquo;s genetic susceptibility or propensity for a specific trait or 
disease, have become valuable tools in biomedical research (\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e). Derived from genome-wide association studies (GWAS) and large-scale genomic resources (\u003cspan additionalcitationids=\"CR4\" citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e), PRSs are often employed in population-based studies to predict disease risk, especially when combined with molecular datasets and electronic health records (EHR) (\u003cspan additionalcitationids=\"CR7\" citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e). For many complex diseases, PRSs improve risk stratification models by enhancing predictive accuracy and linking genetic risk to disease states. However, a key limitation affecting their clinical utility is the uncertainty about whether they can reveal the molecular mechanisms underlying disease development (\u003cspan additionalcitationids=\"CR10\" citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e). Traditionally, PRSs are calculated using genotyping or single-nucleotide polymorphism (SNP) arrays, which rely on genetic variants identified in large population cohorts. 
While these arrays are cost-effective and enable the identification of numerous common variants associated with complex diseases, they mainly capture non-coding variants, most of which lack functional annotation, thus limiting their biological interpretation (\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eWhole-exome sequencing (WES), which targets the protein-coding regions of the genome (the exome), is well established as a diagnostic tool for inherited disorders due to its ability to identify rare coding variants associated with monogenic diseases (\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e). WES also captures common exonic variants, offering useful insights into the genetics of complex traits (\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e). Despite this advantage, its application in polygenic risk modeling has been limited because PRSs typically rely on genome-wide common SNPs, and WES does not provide the same breadth of coverage. Nevertheless, WES has been used to investigate the contribution of rare variants to PRS accuracy and to offer functional insights through gene-based variant characterization, highlighting its potential for enhancing polygenic risk assessments (\u003cspan additionalcitationids=\"CR18 CR19 CR20\" citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eBecause WES targets variants located in genes, its functional insights may enrich PRSs by adding biological context to associated variants, offering an initial gauge of individual risk. 
Large-scale population cohorts, such as the UK Biobank (UKB), provide extensive sequencing data and phenotyping for large-scale genetic studies, yielding novel insights into clinically relevant complex traits (\u003cspan additionalcitationids=\"CR23\" citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e). Furthermore, several studies indicate that \u003cem\u003eintermediate phenotypes\u003c/em\u003e, such as biomarker measurements, can offer a clearer perspective on multifactorial diseases than binary traits, particularly when integrated with EHRs and multi-omics (\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan additionalcitationids=\"CR26 CR27\" citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e). Understanding these \u003cem\u003eintermediate phenotypes\u003c/em\u003e sheds light on a disease\u0026rsquo;s genetic foundation and underlying biological processes. Consequently, coupling genetic risk scores with functional data from associated SNPs could enable computational frameworks that simulate individual health outcomes through biomarker measurements (as in digital twin applications in health). Although WES is not yet fully leveraged for disease risk modeling of complex traits, it can serve as a tool for characterizing the functional effects of genetic variants in these conditions (\u003cspan additionalcitationids=\"CR30 CR31\" citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e).\u003c/p\u003e \u003cp\u003ePolygenic risk scores consist of large sets of SNP predictors that collectively describe an individual\u0026rsquo;s genetic risk. 
Given that each SNP may carry additional functional information, such as trait associations, biological impacts, and clinical pathogenicity, one potential strategy for characterizing a PRS is digital twin modeling: virtual models designed to simulate and predict the attributes of a physical system, which in a health context is the patient or individual (\u003cspan additionalcitationids=\"CR34\" citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e). In a medical context, digital twins have been proposed as tools to integrate multi-layered data, thereby capturing individual characteristics pertinent to disease outcomes (\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e). From this viewpoint, a PRS model not only predicts genetic risk but also, through the functional or biological details of its associated SNPs, can elucidate key molecular features of complex traits, particularly when applied to intermediate phenotypes.\u003c/p\u003e \u003cp\u003eTo explore whether functional information can inform a PRS model, we conducted a systematic analysis of WES for polygenic risk modeling and developed a digital twin framework to contextualize individual genetic liability. Using the UKB as a reference cohort with WES, genotyping, and extensive phenotypic data, we performed a large-scale genetic study of 24 blood biomarkers and three physical measurements in European populations. Specifically, we conducted genetic association testing, functional characterization, and polygenic risk prediction using common variants in WES, benchmarking our findings against imputed genotype data from the UKB. We then examined the correlation between these PRSs and 17 disease conditions pertinent to prevention and clinical care. 
Additionally, we generated PRSs for a target dataset, the VITO IAM Frontier (IAF) cohort of 30 healthy Flemish individuals with WGS and deep phenotyping (\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e), by leveraging both individual-level and summary-level UKB data. Hence, we evaluated the transferability of WES-derived PRSs and developed a PRS-informed digital twin by integrating functional insights from associated SNPs identified in our genetic association study. A schematic overview of the study, including the definitions of clinical biomarkers and disease outcomes, is presented in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"Study design","content":"\u003cp\u003eAn overview of the methodology is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. We used the UKB as a reference for population analysis on the IAF target cohort, genetic association analysis, and PRS estimation using a quality-controlled set of 105,506 unrelated individuals. The UKB data were split into a base set (70%) and a hold-out set (30%). The base set (73,817 individuals) provided summary statistics from genetic analyses of clinical measurements, which were used for PRS construction. 
The hold-out set was further divided into a training set (9,473 individuals) for polygenic prediction and method selection, and a validation set (22,118 individuals) for PRS estimation and integration with the IAF dataset (30 individuals) using the selected clinical biomarkers.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003ePopulation and genomic characteristics of the study cohorts\u003c/h2\u003e \u003cp\u003eWe first performed a population structure analysis of the IAF target cohort, using the UKB as the reference dataset, by conducting principal component analysis (PCA) with FlashPCA2 (\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e) on variants filtered by minor allele frequency (MAF\u0026thinsp;\u0026lt;\u0026thinsp;0.01) and linkage disequilibrium (LD; r\u0026sup2; \u0026lt; 0.1). In both the WES (N\u003csub\u003eInd\u003c/sub\u003e = 200,673; N\u003csub\u003eSNPs\u003c/sub\u003e = 8,201) and imputed array datasets (N\u003csub\u003eInd\u003c/sub\u003e = 487,439; N\u003csub\u003eSNPs\u003c/sub\u003e = 66,063), IAF participants clustered with individuals identified as \u0026ldquo;British,\u0026rdquo; \u0026ldquo;Irish,\u0026rdquo; or \u0026ldquo;Any other white background\u0026rdquo; (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ea; Additional file 1: Fig. \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003ea). To check for fine-scale structure and identify outliers, we next applied uniform manifold approximation and projection (UMAP) on the first 40 principal components (PCs) from both genotype datasets. In both analyses, IAF participants clustered closely with UKB \u0026ldquo;British\u0026rdquo; individuals, as observed in the PCA (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eb, Additional file 1: Fig. 
\u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003eb). Consequently, we retained UKB individuals with matched ancestry for subsequent analyses and polygenic risk modeling.\u003c/p\u003e \u003cp\u003ePrior to association analyses, we conducted quality control on the reference genotype datasets for unrelated individuals with matched ancestry (See Additional file 1: Supplementary methodology). Before LD pruning, we compared autosomal common SNPs from the UKB datasets (WES and imputed array) with IAF whole-genome sequencing (WGS) data, accounting for platform-specific SNP content differences (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ec, \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ed). The imputed array included 4,054,517 SNPs, primarily intronic (49.64%) or intergenic (38.27%), while the exome dataset contained 138,368 SNPs, mainly intronic (40.06%), missense (19.11%), or synonymous (17.17%). The IAF WGS data comprised 9,629,741 raw autosomal SNPs, chiefly intronic (49.19%) or intergenic (38.66%). The number of shared SNPs reflects variants with dbSNP identifiers that mapped to post-imputed variants in the imputed array (imputation score\u0026thinsp;\u0026gt;\u0026thinsp;0.3).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eGenetic association of common variants\u003c/h3\u003e\n\u003cp\u003eTo generate base (summary) statistics from the UKB reference dataset for PRS calculations, we used regenie (\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e) to perform single-variant association tests between autosomal SNPs from both quality-controlled WES (N\u003csub\u003eInd\u003c/sub\u003e = 73,817; N\u003csub\u003eSNPs\u003c/sub\u003e = 39,687) and imputed array (N\u003csub\u003eInd\u003c/sub\u003e = 73,817; N\u003csub\u003eSNPs\u003c/sub\u003e = 317,175) datasets against the biomarker measurements. 
In the WES dataset (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e; Additional file 1: Figures \u003cspan refid=\"MOESM3\" class=\"InternalRef\"\u003eS3\u003c/span\u003e-S29; Additional file 2: Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e), 626 associations were genome-wide significant (P-value\u0026thinsp;\u0026lt;\u0026thinsp;5 \u0026times; 10\u003csup\u003e\u0026minus;\u0026thinsp;8\u003c/sup\u003e), of which 494 remained significant after Bonferroni correction (P-value\u0026thinsp;\u0026lt;\u0026thinsp;1.85 \u0026times; 10\u003csup\u003e\u0026minus;\u0026thinsp;9\u003c/sup\u003e). Among these genome-wide associations, 182 unique SNPs were coding-related variants (missense or synonymous). Cardiovascular-related biomarkers showed the highest number of associations (N\u003csub\u003eSNPs\u003c/sub\u003e = 271), primarily driven by lipid-related traits such as cholesterol and apolipoprotein levels (e.g., LDL, APOB, CHO). From the association tests, the genomic inflation factor (λ\u003csub\u003eGC\u003c/sub\u003e) ranged from 1.07 to 1.17, indicating minimal population structure bias (Additional file 1: Table \u003cspan refid=\"MOESM6\" class=\"InternalRef\"\u003eS6\u003c/span\u003e). In contrast, the imputed array dataset yielded 1,331 SNPs at genome-wide significance, with 986 surviving Bonferroni correction, though only 49 of the significant SNPs were coding-related (Additional file 1: Figures S30-S56; Additional file 2: Table \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e). The λ\u003csub\u003eGC\u003c/sub\u003e values here ranged from 1.03 to 1.10, suggesting similarly low inflation (Additional file 1: Table \u003cspan refid=\"MOESM7\" class=\"InternalRef\"\u003eS7\u003c/span\u003e). As in the WES dataset, cardiovascular-related biomarkers had the highest number of associated variants (N\u003csub\u003eSNPs\u003c/sub\u003e = 487). 
Finally, using PhenoScanner V2 (\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e) and gwasrapidd (\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e), we compared our results to previous genomic studies (Additional file 2: Tables S3\u0026ndash;S8) and found that while most associations had been reported, we identified 97 novel SNPs across 27 biomarker measurements in the WES dataset and 311 novel SNPs in the imputed array dataset, with the WES data containing more unreported missense variants (N\u003csub\u003eSNPs\u003c/sub\u003e = 27 versus 6). Notably, these novel SNPs were not observed in the large GWAS meta-analysis of the same cohort by Sinnott-Armstrong et al. (2021) (\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e) and were predominantly related to liver, renal, cancer, and bone and joint measurements.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eBeyond the association analyses, we assessed SNP-based heritability (h\u003csup\u003e2\u003c/sup\u003e\u003csub\u003eg\u003c/sub\u003e) to quantify how much of each trait\u0026rsquo;s variance is explained by SNPs, since a base dataset with h\u003csup\u003e2\u003c/sup\u003e\u003csub\u003eg\u003c/sub\u003e\u0026thinsp;\u0026gt;\u0026thinsp;0.05 is recommended for robust PRS construction (\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e). Using BOLT-REML (via BOLT-LMM (\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e)) on the UKB datasets (WES and imputed array), we found h\u003csup\u003e2\u003c/sup\u003e\u003csub\u003eg\u003c/sub\u003e values in the WES dataset ranging from 0.043 (GLUC) to 0.252 (TBIL), and from 0.082 (GLUC) to 0.429 (TBIL) in the imputed array (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ea; Additional file 1: Table \u003cspan refid=\"MOESM7\" class=\"InternalRef\"\u003eS7\u003c/span\u003e). 
With the exception of APOB for WES (h\u003csup\u003e2\u003c/sup\u003e\u003csub\u003eg\u003c/sub\u003e\u0026thinsp;=\u0026thinsp;0.172), heritability estimates for most traits were higher in the array-based data, and all traits surpassed the 0.05 threshold except GLUC in the WES dataset (h\u003csup\u003e2\u003c/sup\u003e\u003csub\u003eg\u003c/sub\u003e\u0026thinsp;=\u0026thinsp;0.043), which we therefore excluded from further analyses. We also computed genetic correlations (r\u003csub\u003eg\u003c/sub\u003e) among the biomarkers using SCORE (\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e), as r\u003csub\u003eg\u003c/sub\u003e describes the genetic relationship between two traits and thus provides insights into shared biological pathways or potential causal relationships (\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e). From each dataset, 729 correlation estimates were obtained (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eb; Additional file 1: Fig. S57; Additional file 2: Table \u003cspan refid=\"MOESM7\" class=\"InternalRef\"\u003eS7\u003c/span\u003e). For the WES summary statistics, r\u003csub\u003eg\u003c/sub\u003e ranged from \u0026minus;\u0026thinsp;0.530 (HDL and TRIG) to 0.960 (LDL and APOB), whereas the array-based results spanned from \u0026minus;\u0026thinsp;0.619 (HDL and TRIG) to 0.975 (LDL and APOB). Lipid traits were most prominently correlated, with 12 pairs showing r\u003csub\u003eg\u003c/sub\u003e \u0026gt; 0.6. 
Notably, no significant differences emerged between datasets (Wilcoxon-test\u0026thinsp;=\u0026thinsp;262,170; P-value\u0026thinsp;=\u0026thinsp;0.66), suggesting that the same genetic architecture drives these correlations despite the differing SNP content in WES and imputed array data.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\n\u003ch3\u003eFunctional characterization of the base summary dataset\u003c/h3\u003e\n\u003cp\u003eTo functionally annotate associated SNPs in each UKB base summary dataset, we used Functional Mapping and Annotation of Genome-Wide Association Studies (FUMA) (\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e), using the 1000 Genomes Project (1KG; European background) (\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e) as annotation reference and a Fisher\u0026rsquo;s exact test for trait-level significance (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ea; Additional file 3: Tables S1-S2). From the WES summary data, 28,029 of 39,687 variants (70.63%) were functionally annotated; intronic variants were the largest group (56.731%), followed by intergenic (24.022%), and exonic (2.419%). In contrast, 36,302 of 317,175 variants (11.45%) in the imputed array dataset were annotated, with intronic variants again most common (53.507%), followed by intergenic (29.235%) and exonic (1.519%). At the trait level (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eb), intronic and intergenic variants were significantly enriched (Fisher\u0026rsquo;s P-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05) for most measurements except for intergenic variants in BMI, VITD, and ALT (for both datasets), and intronic variants in ALB (imputed array only). 
In the WES dataset, exonic variants were significantly associated with all traits except URT, ALT, CREA, and BMI, whereas in the imputed array data, 13 traits showed significant exonic variant enrichment.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFollowing annotation, we identified independent and lead SNPs associated with the biomarker measurements, where independent and lead SNPs serve as signals to facilitate the functional interpretation of genetic regions (Additional file 3: Tables S3-S4). In the WES dataset, we detected 606 independent variants (r\u0026sup2; \u0026lt; 0.6) and 359 genomic risk loci across 26 traits, of which 595 variants were designated as lead SNPs (r\u0026sup2; \u0026lt; 0.1). In the imputed array dataset, 1,147 independent variants and 429 genomic risk loci were identified, with 938 qualifying as lead SNPs. Overall, 44 unique SNPs were common independent signals across both datasets. These signals were used as input for the fine-mapping analyses described in the following sections.\u003c/p\u003e\n\u003ch3\u003eGene-based and gene-set enrichment analyses\u003c/h3\u003e\n\u003cp\u003eUsing FUMA, we also performed gene-based and gene-set analyses to group variants into associated genes and functional pathways, drawing on MAGMA (\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e) and multiple reference datasets provided by the platform. From a curated set of 14,857 genes, we identified 162 unique associations from the WES-based summary statistics and 200 from the imputed array-based summary statistics (Bonferroni P-value\u0026thinsp;\u0026lt;\u0026thinsp;3.36 \u0026times; 10⁻⁶) (Additional file 1: Fig. S58). Across both datasets, 60 unique genes were commonly associated with 21 biomarkers (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e). 
Among all genes, APOB, PCSK9, LDLR, and TOMM40 were most frequently linked to lipid-related measurements.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eNext, we used the significantly associated genes to map variants to relevant biological pathways using the MsigDB database (via FUMA), identifying 437 significantly enriched pathways (Bonferroni P-value\u0026thinsp;\u0026lt;\u0026thinsp;3.36 \u0026times; 10⁻⁶) from the WES summary data and 256 from the imputed array data (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003ea, \u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003eb). Cardiovascular-related traits, particularly lipid-related traits, were most prominently represented, with enrichment in pathways related to lipid metabolism, familial hyperlipidemia, lipid particle composition, and statin inhibition of cholesterol production (Additional file 3: Tables S1-S2). We then investigated gene expression signatures for these associated genes using MAGMA and the Genotype-Tissue Expression (GTEx) RNA-sequencing dataset (via FUMA). Across 54 related tissues (Bonferroni P-value\u0026thinsp;\u0026lt;\u0026thinsp;9.26 \u0026times; 10⁻⁴), the WES dataset showed more associated genes per trait (N\u003csub\u003eGenes\u003c/sub\u003e = 55; Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003ec) compared to the imputed array dataset (N\u003csub\u003eGenes\u003c/sub\u003e = 12; Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003ed). 
Importantly, genes associated with lipid, kidney, and renal biomarkers mapped to metabolic-related tissues such as liver, kidney, and gut, while those associated with BMI were mapped to brain-related tissues (Additional file 3: Tables S3-S4).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\n\u003ch3\u003ePathogenicity prediction\u003c/h3\u003e\n\u003cp\u003eWe performed a pathogenicity analysis of significantly associated missense variants from the base datasets to evaluate their potential functional impacts on protein translation, using AlphaMissense (\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e) and ESM1b (\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e) via the ProtVar (\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e) web server. In the WES-based summary data, 87 missense variants were identified, with five predicted as likely pathogenic; in contrast, the imputed array data yielded 24 missense variants, only two of which were deemed likely pathogenic (Additional file 5: Tables S1-S2). Among the WES-based variants, four showed concordant pathogenic predictions (AlphaMissense score\u0026thinsp;\u0026gt;\u0026thinsp;0.8 and ESM1b score \u0026lt; \u0026minus;\u0026thinsp;10): rs1740032, rs3798220, rs1801689, and rs1801272, located in PDE11A, LPA, APOH, and CYP2A6, respectively. 
Notably, variants rs3798220 and rs1801689 have been described as having a functional impact associated with elevated lipoprotein(a) levels (\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e, \u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e).\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eFine-mapping analyses\u003c/h2\u003e \u003cp\u003eWe conducted a fine-mapping analysis to identify potential causal variants using FINEMAP (\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e) from PolyFun (\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e), focusing on independent variants (r\u0026sup2; \u0026lt; 0.6) derived from FUMA and individual-level UKB genotype data. SNPs were fine-mapped within a 5 Mb window of each independent variant, yielding 919,857 configurations across both WES and imputed array datasets that contained 15 or fewer causal variants (Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003ea; Additional file 6: Tables S1-S2). The WES dataset contributed 91,970 genetic variants, with APOB having the highest number of contributing variants (N\u003csub\u003eSNPs\u003c/sub\u003e = 12,437) and BMI the lowest (N\u003csub\u003eSNPs\u003c/sub\u003e = 900). In the imputed array dataset, 821,887 genetic configurations were fine-mapped, with ALP having the most (N\u003csub\u003eSNPs\u003c/sub\u003e = 130,120) and DBP the fewest (N\u003csub\u003eSNPs\u003c/sub\u003e = 6,744). 
Among all configurations, 7,172 genetic variants surpassed a posterior inclusion probability (PIP)\u0026thinsp;\u0026gt;\u0026thinsp;95% in both datasets, of which 6,565 were prioritized as causal (PIP\u0026thinsp;\u0026gt;\u0026thinsp;99%), with TBIL exhibiting the highest number of causal variants (Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003eb).\u003c/p\u003e \u003cp\u003eFrom these prioritized variants, 90 unique missense variants appeared in the WES dataset (Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003ec), primarily linked to cardiovascular biomarkers (N\u003csub\u003eSNPs\u003c/sub\u003e = 27), while only two were associated with anthropometric traits. Meanwhile, 25 unique missense variants emerged in the imputed array dataset for cardiovascular, renal, liver, bone and joint, and cancer-related biomarkers (Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003ed), again showing the highest representation in cardiovascular traits (N\u003csub\u003eSNPs\u003c/sub\u003e = 12), and the fewest in anthropometric traits (N\u003csub\u003eSNPs\u003c/sub\u003e = 1).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003ePolygenic risk modeling of clinical biomarker measurements\u003c/h3\u003e\n\u003cp\u003eAfter functionally characterizing the UKB-derived base summary datasets, we estimated PRSs for 26 clinical biomarkers using clumping and thresholding (PRSice2 (\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e)), Bayesian regression (LDpred2 (\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e), RapidoPGS (\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e), PRS-CS (\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e)), and penalized regression (lassosum2 (\u003cspan citationid=\"CR60\" 
class=\"CitationRef\"\u003e60\u003c/span\u003e)). In a training set (N\u003csub\u003eind\u003c/sub\u003e = 9,473) from the same quality-controlled UKB reference dataset, we identified the best-performing method by comparing the explained variance (R\u0026sup2;) from a full model (PRS\u0026thinsp;+\u0026thinsp;covariates) versus a null model (covariates only). After adjusting for sex, age, 40 ancestry PCs, and genotype batch (for imputed array data), LDpred2 (grid and auto) emerged as the top method for both WES and imputed array datasets (Additional file 6: Tables S1-S2). In the WES training set, PRSs explained between 0.005 (URA) and 0.245 (TBIL) of the phenotypic variance, while in the imputed array data, they ranged from 0.012 (URA) to 0.295 (TBIL) (Fig.\u0026nbsp;\u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e10\u003c/span\u003ea). Using LDpred2-derived weights, we then calculated PRSs for these biomarkers in a validation set (N\u003csub\u003eind\u003c/sub\u003e = 22,118), also drawn from the reference dataset, observing broadly similar performance between WES and imputed array PRSs (Fig.\u0026nbsp;\u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e10\u003c/span\u003eb; Additional file 1: Table \u003cspan refid=\"MOESM8\" class=\"InternalRef\"\u003eS8\u003c/span\u003e). 
Notably, five WES-based risk scores (APOB, LDL, CHO, SHBG, ALB) explained slightly more variance than their array-based counterparts, and Spearman analysis indicated strong concordance (P-value\u0026thinsp;\u0026lt;\u0026thinsp;1.13 \u0026times; 10⁻\u0026sup2;⁸\u0026sup1;) between exome- and array-based PRSs, with LDL, VITD, APOB, CHO, and TBIL showing correlations above 0.6 (Fig.\u0026nbsp;\u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e10\u003c/span\u003ec; Additional file 1: Table S9).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\n\u003ch3\u003eTargeted phenome-wide association and Mendelian randomization analyses\u003c/h3\u003e\n\u003cp\u003eTo assess the clinical relevance of our generated PRSs, we performed a targeted PheWAS in the validation dataset on 17 disease phenotypes, with 15 available for analyses (N\u003csub\u003eCases\u003c/sub\u003e \u0026gt; 200; Additional file 1: Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e0). Using logistic regression adjusted for sex, age, 40 ancestry PCs, and genotype batch (imputed array data only), we found 12 WES-based PRSs (biomarkers) significantly associated (FDR P-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05) with 11 phenotypes (Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e11\u003c/span\u003ea; Additional file 8: Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e). Among these associations, 28 showed risk-increasing effects (OR\u0026thinsp;\u0026gt;\u0026thinsp;1), while five showed risk-decreasing effects (OR\u0026thinsp;\u0026lt;\u0026thinsp;1). The strongest effect was between SBP and HYP (OR\u0026thinsp;=\u0026thinsp;1.164, 95% CI 1.127\u0026ndash;1.202; FDR P-value\u0026thinsp;=\u0026thinsp;6.05 \u0026times; 10⁻\u0026sup1;⁹), and the weakest was between TPROT and COPD (OR\u0026thinsp;=\u0026thinsp;0.923, 95% CI 0.871\u0026ndash;0.978; FDR P-value\u0026thinsp;=\u0026thinsp;4.89 \u0026times; 10⁻\u0026sup2;). 
Cardiovascular-related PRSs (APOB, CHO, CRP, DBP, LDL, SBP) were frequently associated with hypertension, cardiovascular conditions, and obesity (HYP, CAD, ISC, MYO, OBS). Meanwhile, 15 imputed array\u0026ndash;based PRSs were significantly associated (FDR P-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05) with 10 disease outcomes (Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e11\u003c/span\u003eb; Additional file 8: Table \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e), showing 30 risk-increasing and eight risk-decreasing associations. The strongest was between BMI and OBS (OR\u0026thinsp;=\u0026thinsp;1.417, 95% CI 1.340\u0026ndash;1.498; FDR P-value\u0026thinsp;=\u0026thinsp;2.93 \u0026times; 10⁻\u0026sup3;\u0026sup3;), whereas the weakest was between ALB and MYO (OR\u0026thinsp;=\u0026thinsp;1.111, 95% CI 1.031\u0026ndash;1.197; FDR P-value\u0026thinsp;=\u0026thinsp;4.95 \u0026times; 10⁻\u0026sup2;). Across datasets, 15 WES-based associations replicated in the array-based PheWAS, most often involving hypertension and coronary artery disease. These replicated associations showed highly concordant effect sizes (Spearman r\u0026thinsp;=\u0026thinsp;0.804; P-value\u0026thinsp;=\u0026thinsp;4.87 \u0026times; 10⁻⁴) between WES and array-based PRSs.\u003c/p\u003e \u003cp\u003eBased on the significant associations identified in the PheWAS, we performed a two-sample Mendelian randomization (MR) analysis to explore potential causal relationships between biomarker measurements and the associated disease phenotypes, through associated SNPs in the PRSs. We examined 71 PRS associations: 33 from the WES-based PheWAS and 38 from the imputed array\u0026ndash;based PheWAS. 
All genome-wide significant SNPs (P-value\u0026thinsp;\u0026lt;\u0026thinsp;5 \u0026times; 10⁻⁸) were used as genetic instruments, while the outcome data were sourced from GIANT (\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e), CARDIoGRAMplusC4D (\u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e62\u003c/span\u003e), and FINNGEN (\u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e) summary statistics (Additional file 1: Table S11). In the main inverse variance weighted (IVW) analysis of the WES-based data, 17 associations were significant (FDR P-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05; Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e11\u003c/span\u003ec, \u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e11\u003c/span\u003ee). The strongest effect was observed for SBP on HYP (OR\u0026thinsp;=\u0026thinsp;8.160 per SD, 95% CI 5.656\u0026ndash;11.773; FDR P-value\u0026thinsp;=\u0026thinsp;1.23 \u0026times; 10⁻\u0026sup2;⁸), while the weakest was for APOB on OBS (OR\u0026thinsp;=\u0026thinsp;0.837 per SD, 95% CI 0.711\u0026ndash;0.985; FDR P-value\u0026thinsp;=\u0026thinsp;4.29 \u0026times; 10⁻\u0026sup2;) (Additional file 1: Tables S12-S13). No evidence of pleiotropy emerged (Additional file 1: Table S14), and IVW estimates were directionally consistent with MR-Egger and median-based methods, although ten associations did not reach significance under MR-Egger (FDR P-value\u0026thinsp;\u0026gt;\u0026thinsp;0.05).\u003c/p\u003e \u003cp\u003eCompared to the WES-based MR analysis, the imputed array\u0026ndash;based IVW analysis identified 18 significant associations (FDR P-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05; Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e11\u003c/span\u003ed, \u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e11\u003c/span\u003ef; Additional file 1: Tables S15-S16). 
The strongest was observed between SBP and HYP (OR\u0026thinsp;=\u0026thinsp;12.893 per SD, 95% CI 5.656\u0026ndash;11.773; FDR P-value\u0026thinsp;=\u0026thinsp;3.00 \u0026times; 10⁻\u0026sup2;⁰), while the weakest involved DBP and CAD (OR\u0026thinsp;=\u0026thinsp;2.603 per SD, 95% CI 1.067\u0026ndash;6.345; FDR P-value\u0026thinsp;=\u0026thinsp;4.72 \u0026times; 10⁻\u0026sup2;). HBA1C was associated with T2D (OR\u0026thinsp;=\u0026thinsp;1.53 per SD, 95% CI 1.183\u0026ndash;2.003; FDR P-value\u0026thinsp;=\u0026thinsp;0.003) in the IVW results; however, this finding was not replicated in the weighted median (OR\u0026thinsp;=\u0026thinsp;1.18 per SD, 95% CI 0.976\u0026ndash;1.427; FDR P-value\u0026thinsp;=\u0026thinsp;0.117) or MR-Egger (OR\u0026thinsp;=\u0026thinsp;0.982 per SD, 95% CI 0.464\u0026ndash;2.078; FDR P-value\u0026thinsp;=\u0026thinsp;0.963) models. No evidence of horizontal pleiotropy was detected (Additional file 1: Table S17), and except for the HBA1C\u0026ndash;T2D association, IVW estimates were directionally consistent with the sensitivity analyses. Nevertheless, 11 associations were not significant according to MR-Egger.\u003c/p\u003e \u003cp\u003eFrom the WES-based MR analysis, six associations replicated in the array-based MR analysis, involving lipid-related traits (APOB, LDL, CHO) with CAD, blood pressure (SBP, DBP) with HYP, and body weight (BMI) with OBS. 
Although MR-PRESSO detected outliers in several disease associations, the significant results remained unchanged after excluding these variants (Additional file 1: Tables S18-S19).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003ePrediction of the PRSs for the target cohort\u003c/h2\u003e \u003cp\u003eWe next assessed the utility of the UKB reference dataset for polygenic risk prediction in the IAF target cohort by merging each UKB test set (WES- and array-based) with IAF WGS data and generating PRSs for 26 biomarkers (Additional file 1: Table S20). Nineteen PRSs showed significant correlations (P-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05) between the WES-based and array-based scores, with eight exhibiting strong concordance (r\u0026thinsp;\u0026gt;\u0026thinsp;0.6), notably APOB (r\u0026thinsp;=\u0026thinsp;0.788), CHO (r\u0026thinsp;=\u0026thinsp;0.753), CRP (r\u0026thinsp;=\u0026thinsp;0.690), and TBIL (r\u0026thinsp;=\u0026thinsp;0.846). Stratifying IAF participants by PRS quartiles revealed frequent mismatches for PRSs with weaker correlations (e.g., BMI, AST, HBA1C) but minimal mismatches for those with strong correlations (e.g., TBIL, APOB, LDL) (Fig.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e12\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eWe also compared each PRS to clinical measurements (Additional file 1: Tables S21-S22). Among WES-based PRSs, seven (APOA, APOB, CRP, LDL, SBP, TRIG, GGT) significantly correlated (P\u0026thinsp;\u0026lt;\u0026thinsp;0.05) with their corresponding biomarkers, led by APOB (r\u0026thinsp;=\u0026thinsp;0.617). Similarly, seven array-based PRSs (ALB, APOB, CRP, GGT, SHBG, TBIL, TPROT) were significant, with three (APOB, CRP, GGT) replicating the WES-based correlations, reinforcing the consistency of these scores across genotyping platforms. 
These findings show the suitability of UKB as a matched-ancestry reference for trans-geographic PRS prediction, demonstrating that WES-based PRSs can describe clinical biomarker variation and prove particularly valuable for lipid-related traits.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eContextualizing PRSs at the individual level from a digital twin approach\u003c/h2\u003e \u003cp\u003eIn our polygenic risk modeling, we showed that WES-based PRSs not only associate with clinical biomarkers but also map exonic variants to functional gene information, including protein mutations, pathways, and tissue-specific expression. Using the APOB PRS as a test case, we identified ten top exonic variants indicative of high genetic risk for elevated APOB (Fig.\u0026nbsp;\u003cspan refid=\"Fig13\" class=\"InternalRef\"\u003e13\u003c/span\u003ea, \u003cspan refid=\"Fig13\" class=\"InternalRef\"\u003e13\u003c/span\u003eb). Among 19 genome-wide significant SNPs (mapped to 28 genes), three were missense variants, and two (rs3798220 in LPA and rs1801689 in APOH) were flagged as pathogenic by AlphaMissense and likely causal in fine-mapping analyses. Gene- and pathway-level enrichments pointed to lipid metabolism and familial hyperlipidemia, with relevant genes overexpressed in metabolic tissues (liver, pancreas, kidney, stomach). Together, these findings illustrate how a WES-based PRS can integrate genetic risk indicators with their functional and biological contexts at the individual level. Building on these findings, we developed a \u003cb\u003edigital twin\u003c/b\u003e representation of the APOB WES-PRS model as a multi-layered framework linking risk scores with the functional data from the SNPs used for PRS estimation. 
By consolidating gene-level annotations, protein variation (pathogenicity), pathway associations, tissue-expression profiles, fine-mapped causal loci, and broader clinical evidence (e.g., PheWAS, MR analyses), we propose that the top \u003cem\u003ek\u003c/em\u003e SNP predictors form an \u0026ldquo;individual genetic profile\u0026rdquo;. As illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig13\" class=\"InternalRef\"\u003e13\u003c/span\u003ec, a PRS should thus be interpreted both quantitatively (the aggregate risk score) and qualitatively (functional context of each variant), providing a more holistic view of genetic risk for the associated trait.\u003c/p\u003e \u003cp\u003eIn our test case, the quantitative context of the APOB PRS model includes the score and model specifications (cohort summary data, population ancestry, covariates, and statistical metrics). The qualitative context encompasses all functional information mapped from the contributing SNP predictors, including pathogenicity, gene-set analyses, and reported disease-related associations (targeted PheWAS or MR analyses).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eEvaluation of the PRS-informed digital twin model\u003c/h2\u003e \u003cp\u003eTo evaluate whether the WES-based PRS digital twin model for APOB accurately reflects relevant biological features, we performed two analyses centered on biological pathway associations and gene-tissue expression signatures. First, we tested how the APOB PRS values in IAF participants correlated with their median metabolite levels from Nightingale NMR Metabolomics, under the assumption that a genetic risk score tied to lipid pathways would also correlate with related metabolite components. 
Out of 250 measured metabolites, 31 were significantly associated (FDR P-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05), all pertaining to lipids or lipoprotein subclasses (Additional file 8: Table \u003cspan refid=\"MOESM3\" class=\"InternalRef\"\u003eS3\u003c/span\u003e). Notably, phospholipids, free cholesterol, and LDL-related measures showed strong correlations with the APOB PRS, with S_LDL_PL (phospholipids in small LDL particles) displaying the highest correlation (r\u0026thinsp;=\u0026thinsp;0.620, FDR\u0026thinsp;=\u0026thinsp;0.021) (Fig.\u0026nbsp;\u003cspan refid=\"Fig14\" class=\"InternalRef\"\u003e14\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFor the gene\u0026ndash;tissue expression signatures, which included genes in significantly associated tissues, we constructed tissue-specific networks sourced from the GIANT database by prioritizing genes identified in our functional and fine-mapping analyses, using the HumanBase web platform (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://hb.flatironinstitute.org\u003c/span\u003e\u003cspan address=\"https://hb.flatironinstitute.org\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). For gene prioritization, we selected genes with genome-wide significant, contributing SNPs characterized as \u0026ldquo;likely causal\u0026rdquo; (fine-mapping) and significantly associated with the trait (MAGMA gene-based analyses). We hypothesized that a PRS could describe tissue-specific molecular processes potentially involved in the genetic risk for the trait.\u003c/p\u003e \u003cp\u003eThe prioritized genes with genome-wide significant SNPs and associations (from gene-based and fine-mapping analyses) included CELSR2 (rs3895559), SARS/SARS1 (rs685653), POC5 (rs888789), LPA (rs3798220), BUD13 (rs11820589), APOH (rs1801689), PLCG1 (rs2076148), TOMM40 (rs157581), and LDLR (rs6413504). 
Using these genes, we constructed tissue-specific networks for the liver, kidney, pancreas, and stomach (Fig.\u0026nbsp;\u003cspan refid=\"Fig15\" class=\"InternalRef\"\u003e15\u003c/span\u003e), where links represent interaction confidence scores. Across tissues, strong gene\u0026ndash;gene connections emerged, with TOMM40 appearing as the most interconnected gene in the liver, pancreas, and kidney (interaction confidence\u0026thinsp;\u0026gt;\u0026thinsp;0.6). To further support these findings, we queried the Cardiovascular Disease Knowledge Portal (\u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e64\u003c/span\u003e) for lipid metabolism or cardiovascular evidence on the 57 mapped genes. Of these, 45 showed moderate to compelling support (Huge Score\u0026thinsp;\u0026gt;\u0026thinsp;3.0), linking them to cholesterol, lipoprotein metabolism, and various cardiovascular conditions such as coronary artery disease, hypertension, and cardiomyopathy (Additional file 1: Table S23). These results suggest that SNPs in PRSs with robust gene-level and functional annotations can help elucidate the underlying molecular characteristics of associated biomarkers.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFinally, we performed an enrichment analysis on the tissue-specific HumanBase networks using the g:Profiler (\u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e65\u003c/span\u003e) web tool (Additional file 1: Fig. S60). The most prominent pathways involved lipase activity, DNA replication, RNA processing, and metabolic processes, with the liver network notably enriched for DNA unwinding, helicase activity, and nucleic acid metabolism. 
Other significant processes, including blood vessel morphogenesis, tRNA aminoacylation, and protein translation regulation, were enriched across kidney and pancreas networks (g:SCS P-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05).\u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eIn this study, we set out to investigate the potential of using WES for polygenic risk modeling and individual-level interpretation of clinical biomarkers (as part of the digital twin creation process) in disease diagnostics. Utilizing genomic data and biomarker measurements from the UKB, we demonstrated that PRSs derived from common exonic variants perform similarly to those generated from array-based approaches. Additionally, we showed that PRSs from WES can describe disease associations and potential causal relationships of genetic variants through targeted analysis of 17 disease outcomes. However, through functional annotation of genetic variants, we found that WES offers a greater biological context compared to genotyping arrays, linking SNPs to molecular entities and properties such as genes, pathogenicity, biological pathways, and tissue-related signatures. Furthermore, in an application case involving PRSs in a target population set (IAF), we illustrated how these molecular characteristics can describe individual-level variation based on the functional characteristics of the predictors included in a PRS model of a biomarker measurement, using a digital twin representation approach.\u003c/p\u003e \u003cp\u003eAlthough WES has traditionally been used to identify rare variants in monogenic disorders, our study broadens its utility by showing that common exonic variants also significantly contribute to disease etiology through their associations with clinical biochemistry biomarkers. We found that WES achieved a 70.63% mapping rate of tested SNPs to functional genomic information, compared to 11.45% for genotyping arrays. 
Most of the significant associations involved cardiovascular, hepatic, and renal biomarkers, with lipid-related measures showing the strongest links. Genes integral to lipid metabolism, including LPA, LDLR, PCSK9, and APOB, were strongly associated with these biomarkers, in line with extensive prior research on lipid genetics and cardiovascular diseases (\u003cspan additionalcitationids=\"CR67 CR68\" citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e69\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eOur analyses further demonstrate that WES can generate PRSs whose performance is comparable to those derived from imputed array data. Scores based on 26 of the 27 biomarkers examined aligned with array-based results and, when combined with targeted PheWAS and MR analyses, revealed genetic predispositions to multiple disease outcomes, including those previously described in genotyping array studies (\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e70\u003c/span\u003e). Most notably, lipid measurement PRSs were associated with cardiovascular diseases, supporting previous evidence of causal links between lipid-related variants and conditions such as ischemic heart disease (ISC), myocardial infarction (MYO), and coronary artery disease (CAD) (\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e, \u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e71\u003c/span\u003e, \u003cspan citationid=\"CR72\" class=\"CitationRef\"\u003e72\u003c/span\u003e). Furthermore, our approach shows that WES-based PRSs can be transferred across European subpopulations, illustrated by the results for seven PRSs in the IAF Flemish cohort. 
Our results support the view that matched local cohorts can serve as controls in polygenic risk prediction (\u003cspan citationid=\"CR73\" class=\"CitationRef\"\u003e73\u003c/span\u003e, \u003cspan citationid=\"CR74\" class=\"CitationRef\"\u003e74\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eIn our study, we demonstrated that integrating functional information from exonic variants within a PRS model can reveal key molecular aspects of individual genetic risk. Using APOB lipid measurements from the WES PRS model as a test case, we identified contributing SNP predictors associated with cardiovascular genes linked to lipid metabolism and expressed in organs such as the liver and kidneys. Building on these insights, we propose a PRS-informed digital twin framework that combines risk scores with functional data, encompassing risk genes, identified causal loci, biological pathways, gene-tissue expression signatures, and associated disease outcomes, to create an \u0026ldquo;individual genetic profile.\u0026rdquo; This multi-layered representation not only quantifies genetic risk within a population context, as described by the APOB PRS model, but also provides qualitative insights into how lipid metabolism mediates this risk.\u003c/p\u003e \u003cp\u003eUsing NMR metabolomics measurements from the IAF cohort, we identified significant associations between APOB WES-based risk scores and various lipoprotein and cholesterol subclasses, supporting our hypothesis that a PRS captures key molecular pathways, as illustrated in our PRS-informed digital twin model. Elevated levels of lipoprotein and cholesterol-related particles are well-recognized drivers of dyslipidemia and cardiometabolic risk, with APOB playing a central role as a molecular transporter of these metabolites (\u003cspan additionalcitationids=\"CR76\" citationid=\"CR75\" class=\"CitationRef\"\u003e75\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR77\" class=\"CitationRef\"\u003e77\u003c/span\u003e). 
Our PRS-informed digital twin model corroborates existing evidence on the validity of the APOB PRS, aligning with recent studies that show genetic effects on APOB levels mediate the abundance of LDL-related particles, which are linked to increased risk for CAD, peripheral artery disease, and venous thromboembolism (\u003cspan citationid=\"CR78\" class=\"CitationRef\"\u003e78\u003c/span\u003e, \u003cspan citationid=\"CR79\" class=\"CitationRef\"\u003e79\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eFurthermore, our tissue-specific network analysis of the contributing SNPs in the APOB WES-based PRS model revealed several PRS-associated genes implicated in cardiometabolic risk. Notably, TOMM40 emerged as a hub gene in the liver, kidney, and pancreas, where it has previously been described for its roles in mitochondrial function, oxidative stress, and lipid metabolism (\u003cspan additionalcitationids=\"CR81\" citationid=\"CR80\" class=\"CitationRef\"\u003e80\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR82\" class=\"CitationRef\"\u003e82\u003c/span\u003e). 
Several interacting genes, particularly those partnering with TOMM40 in the liver, such as the MCM complex (DNA metabolism) (\u003cspan citationid=\"CR83\" class=\"CitationRef\"\u003e83\u003c/span\u003e), MRPL3 (energy metabolism) (\u003cspan citationid=\"CR84\" class=\"CitationRef\"\u003e84\u003c/span\u003e), HSPD1 (mitochondrial and lipid metabolism) (\u003cspan citationid=\"CR85\" class=\"CitationRef\"\u003e85\u003c/span\u003e), and RUVBL2 (glucose and lipid regulation) (\u003cspan citationid=\"CR86\" class=\"CitationRef\"\u003e86\u003c/span\u003e), further suggest that the APOB PRS encompasses direct and potentially indirect links to lipid metabolism, as well as broader molecular mechanisms influencing metabolic processes across multiple tissues.\u003c/p\u003e \u003cp\u003ePrevious studies suggest that genes involved in metabolic pathways also influence regulatory processes tied to various diseases, including obesity, diabetes, and cancer (\u003cspan citationid=\"CR87\" class=\"CitationRef\"\u003e87\u003c/span\u003e, \u003cspan citationid=\"CR88\" class=\"CitationRef\"\u003e88\u003c/span\u003e). The omnigenic model, a recently proposed framework, posits that, beyond core genes exerting direct effects, peripheral genes in broader cellular processes also shape disease risk through gene-regulatory pathways (\u003cspan citationid=\"CR89\" class=\"CitationRef\"\u003e89\u003c/span\u003e, \u003cspan citationid=\"CR90\" class=\"CitationRef\"\u003e90\u003c/span\u003e). In the context of our study, genes such as LDLR, APOB, and PCSK9 have been previously described as core genes in monogenic dyslipidemias, directly affecting lipid metabolism and contributing to disease risk (\u003cspan citationid=\"CR91\" class=\"CitationRef\"\u003e91\u003c/span\u003e). 
Yet our analyses also highlight numerous regulatory variants across multiple genes, including HNF1A (associated with monogenic diabetes), that contribute to CAD risk (\u003cspan citationid=\"CR92\" class=\"CitationRef\"\u003e92\u003c/span\u003e). Furthermore, tissue-specific networks from our digital twin model identify genes involved not only in metabolic processes but also in regulatory roles like protein translation, cell cycle control, and DNA replication, supporting the omnigenic model for lipid traits. Nonetheless, our analysis does not provide direct confirmation, as the genetic architecture of lipid metabolism across multiple disease outcomes, including its gene-regulatory aspects, remains poorly understood.\u003c/p\u003e \u003cp\u003eAlthough the omnigenic model remains hypothetical, developing a PRS-informed digital twin framework could help determine whether a polygenic trait follows an omnigenic architecture. By integrating functional genomic data with proteomics, transcriptomics, and clinical datasets, researchers can evaluate the \u0026ldquo;omnigenic trait\u0026rdquo; hypothesis across conditions such as cardiovascular disorders (\u003cspan citationid=\"CR93\" class=\"CitationRef\"\u003e93\u003c/span\u003e), neurological disorders (\u003cspan citationid=\"CR94\" class=\"CitationRef\"\u003e94\u003c/span\u003e), and cancer (\u003cspan citationid=\"CR95\" class=\"CitationRef\"\u003e95\u003c/span\u003e). Our APOB PRS-informed digital twin case study also demonstrates how functional genomics, particularly from sequencing, can elucidate variant-driven metabolic interactions, even with limited statistical power. These variants affect multiple biological levels, underscoring the need for a systems genetics approach to refine the biological relevance of PRSs (\u003cspan citationid=\"CR96\" class=\"CitationRef\"\u003e96\u003c/span\u003e). 
By first focusing on key tissues and then expanding to related genes and pathways, researchers can uncover novel gene-molecular interactions that underlie tissue-specific processes and regulatory mechanisms in complex diseases. Ultimately, a digital twin framework enhances genomic methodologies by integrating functional evidence to prioritize molecular targets, automate prediction, and deepen insights into complex genetic architectures, paving the way for more precise, personalized medicine.\u003c/p\u003e \u003cp\u003eThere are several limitations to our study worth noting. First, we focused on populations with European ancestry, primarily due to data availability for our in-house IAF cohort. Future work with multi-ancestry cohorts is essential to confirm the utility of WES-based PRSs and improve their transferability across diverse populations, while also aiding in functional gene characterization. Second, our prediction analyses were limited by the small size of the target cohort (N\u003csub\u003eInd\u003c/sub\u003e = 30). Although using the UKB as a proxy dataset revealed significant associations, larger cohorts would strengthen these findings. Third, our analysis was confined to uncorrelated SNPs (r\u0026sup2; \u0026lt; 0.1) to facilitate the mapping of functional characteristics and validate reported associations. We also used genome-wide significant variants in MR analyses to evaluate their effects as instrumental variables, underscoring the need for larger studies to capture additional associated variants. Fourth, we focused exclusively on common variants (MAF\u0026thinsp;\u0026gt;\u0026thinsp;1%), excluding rare variants that may contribute to disease risk but pose challenges for PRS inclusion due to low allele frequencies and reduced predictive impact. Lastly, this study was an exploratory application of WES for polygenic risk modeling. 
While PRS performance was generally similar for most traits, determining whether WES outperforms SNP-array platforms for every trait was beyond our scope. Further sequencing studies and integrative analyses linking genomic, molecular, and clinical data are needed to refine PRS methodologies and strengthen risk stratification models. Such efforts would enhance our understanding of individual genetic variation, particularly when PRS approaches are integrated within a digital twin framework.\u003c/p\u003e"},{"header":"Conclusions","content":"\u003cp\u003eIn this study, we demonstrate that PRSs for clinical biomarkers can be estimated from WES and associated with clinically relevant disease outcomes. Additionally, we showed that the functional characterization of genetic variants provides biological insights into the associated biomarkers, which can be linked to disease risk. Moreover, we illustrated that a PRS, as a digital twin model, could potentially explain individual-level variation based on the functional information of the predictors. Finally, we showed that the UKB can be used as a proxy dataset to predict PRSs for small population studies, potentially accelerating biomedical research for local or small cohorts.\u003c/p\u003e"},{"header":"Materials and methods","content":"\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eDescription of the study cohorts\u003c/h2\u003e \u003cp\u003e The UKB resource is a large, prospective cohort that incorporates phenotype, genotyping, sequencing, clinical, and health-related data from more than 500,000 participants recruited throughout the United Kingdom between 2006 and 2010, all aged 40 to 70 at the time of assessment. 
Additional information on data collection methods can be found elsewhere.\u003c/p\u003e \u003cp\u003eThe VITO IAF study is a smaller, prospective longitudinal cohort of 30 healthy participants aged 47 to 54 at recruitment, in which clinical biochemistry, WGS, multi-omics, health questionnaires, and physical characteristics were measured over one year. Further details on design, eligibility criteria, and data collection are provided by Dries et al. (2024) (\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003ePhenotype definition of clinical biomarkers\u003c/h2\u003e \u003cp\u003eWe focused our analysis on 24 serum biomarkers and three physical measurements, available in both UKB and IAF datasets, known for their associations with diseases and diagnostic value (\u003cspan citationid=\"CR97\" class=\"CitationRef\"\u003e97\u003c/span\u003e). These measurements included anthropometric, bone and joint, cancer, cardiovascular, diabetes, liver, and renal-related biomarkers. Summary information on baseline characteristics and phenotype summary statistics for the datasets used in this study can be found in Additional file 1: Tables S1-S5.\u003c/p\u003e \u003cp\u003eFor the UKB, we conducted quality control at the phenotype level. We excluded participants who had withdrawn from the biobank and those with missing measurements. We then identified individuals taking cholesterol-lowering and anti-hypertensive medications using the medication questionnaire data (Data fields: 6177 and 6153). We removed participants taking cholesterol-lowering drugs. 
For individuals who reported using an anti-hypertensive medication, we adjusted their SBP and DBP values by adding 15 and 10 mmHg, respectively (\u003cspan citationid=\"CR98\" class=\"CitationRef\"\u003e98\u003c/span\u003e, \u003cspan citationid=\"CR99\" class=\"CitationRef\"\u003e99\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003eSequencing and genotyping data\u003c/h2\u003e \u003cp\u003eFor the UKB, we utilized the interim release of the population-level exome variants (Data field: 23156). This dataset comprises exonic variants in PLINK format available for 200,643 individuals. In addition to WES, we used the imputed array data available for 487,159 individuals (Data field: 22828). For this dataset, we selected variants with an imputation \u0026ldquo;INFO\u0026rdquo; score greater than 0.3 and restricted our analyses to individuals with available exome sequencing data.\u003c/p\u003e \u003cp\u003eFor the IAF, we used whole-genome data (30X) from all participants. Information on the sequencing procedure can be found in Additional file 1: Supplementary methodology. We processed the raw FASTQ files for variant calling according to the OQFE protocol (\u003cspan citationid=\"CR100\" class=\"CitationRef\"\u003e100\u003c/span\u003e). Briefly, raw reads were mapped to the GRCh38 human reference genome with BWA-MEM (\u003cspan citationid=\"CR101\" class=\"CitationRef\"\u003e101\u003c/span\u003e) to generate mapped BAM files. For each BAM file, we called variants using DeepVariant (\u003cspan citationid=\"CR102\" class=\"CitationRef\"\u003e102\u003c/span\u003e) with the parameter \u0026ldquo;--model_type WGS\u0026rdquo; to obtain genomic variant call files (gVCF). 
All gVCFs were then joint-genotyped with GLNexus (\u003cspan citationid=\"CR103\" class=\"CitationRef\"\u003e103\u003c/span\u003e) and converted into BED format.\u003c/p\u003e \u003cp\u003eFor association analyses and polygenic risk modeling, we applied stringent quality control to the genomic data of the UKB reference dataset, following the recommendations by Chang (2020) and Marees et al. (2018) (\u003cspan citationid=\"CR104\" class=\"CitationRef\"\u003e104\u003c/span\u003e, \u003cspan citationid=\"CR105\" class=\"CitationRef\"\u003e105\u003c/span\u003e). We focused our analysis on common variants with MAF\u0026thinsp;\u0026gt;\u0026thinsp;0.01 and LD r\u003csup\u003e2\u003c/sup\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.1 for unrelated UKB individuals whose ancestry matched that of the IAF cohort, identified through population structure analysis using PCA and UMAP. Detailed information on the population structure analysis and variant quality control can be found in Additional file 1: Supplementary methodology.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003eSingle-variant association analysis\u003c/h2\u003e \u003cp\u003eWe performed single-variant association testing using regenie (\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e) via a two-step regression approach. In the first step, regenie fits a whole-genome regression model to capture individual trait variability using the leave-one-chromosome-out (LOCO) scheme and ridge regression. In the second step, the LOCO predictions are used as offsets for association testing using linear regression models. For single-variant association testing, we first applied a rank-based inverse normal transformation (RINT) (\u003cspan citationid=\"CR106\" class=\"CitationRef\"\u003e106\u003c/span\u003e) to each quantitative trait. We then fitted the linear regression models adjusting for age, sex, and the first 40 ancestry PCs. 
For the reference imputed array data (UKB), we included the genotype batch as a covariate in the regression models. Finally, we checked whether the obtained associations had been previously reported in other studies, querying the PhenoScanner V2 database and the GWAS Catalog through the R packages \u003cem\u003ePhenoScanner\u003c/em\u003e (\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e) and \u003cem\u003egwasrapidd\u003c/em\u003e (\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003eHeritability measurements and genetic correlation\u003c/h2\u003e \u003cp\u003eWe estimated the SNP-based heritability (h\u003csup\u003e2\u003c/sup\u003e\u003csub\u003eg\u003c/sub\u003e) from the UKB sequencing and genotyping reference datasets using BOLT-REML (via BOLT-LMM) (\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e), which employs a Monte Carlo algorithm-based Restricted Maximum Likelihood (REML) estimation. For each trait, we fitted the model using the individual-level genotype data and adjusted for age, sex, and the first 40 ancestry PCs. Additionally, for the imputed array data, we included the genotype batch as a covariate in the model. To calibrate BOLT-REML statistics, we utilized the 1KG LD scores (European background) as a reference panel, provided by the developers.\u003c/p\u003e \u003cp\u003eTo detect shared genetic architecture between traits, we calculated pairwise genetic correlations (r\u003csub\u003eg\u003c/sub\u003e) with SCORE (\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e), using a randomized method of moments estimator. 
As with BOLT-REML, we applied SCORE to the individual-level genotype data and adjusted for covariates for both sequencing and genotyping base datasets.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003eFunctional annotation and characterization with FUMA\u003c/h2\u003e \u003cp\u003eWe used the FUMA (\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e) web-based platform to annotate the base summary statistics using its integrated ANNOVAR resource (\u003cspan citationid=\"CR107\" class=\"CitationRef\"\u003e107\u003c/span\u003e) and to identify independent significant SNPs. Independent SNPs were defined at a genome-wide significance level (P-value\u0026thinsp;\u0026lt;\u0026thinsp;5 \u0026times; 10\u003csup\u003e\u0026minus;8\u003c/sup\u003e) and r\u003csup\u003e2\u003c/sup\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.6. From this subset, lead SNPs were defined at r\u003csup\u003e2\u003c/sup\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.1. The 1KG European panel was used as a reference panel, merging between LD blocks at a maximum distance of 2.5 Mb. To identify independent signals, we analyzed SNPs outside the major histocompatibility complex (MHC) region.\u003c/p\u003e \u003cp\u003eIn addition to functional annotation, we conducted a generalized gene-set analysis using the MAGMA (\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e) implementation in FUMA. MAGMA (\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e) performs multiple linear principal component regression analyses to map SNPs into gene properties using several reference datasets provided by FUMA, including a curated set of 14,857 protein-coding genes, the MSigDB for biological pathways, and the GTEx RNA-sequencing dataset for tissue-specificity. 
To account for multiple testing, we applied the Bonferroni correction.\u003c/p\u003e \u003cdiv id=\"Sec23\" class=\"Section3\"\u003e \u003ch2\u003ePathogenicity analysis\u003c/h2\u003e \u003cp\u003eTo assess the functional impact of missense variants detected in the single-variant association testing, we predicted their pathogenicity using AlphaMissense (\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e) and ESM1b (\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e) scores, obtained from the ProtVar (\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e) web server. For AlphaMissense, a variant was predicted as \u0026ldquo;likely pathogenic\u0026rdquo; if its score was greater than 0.56. For the ESM1b model, we considered a variant likely pathogenic if its score was lower than \u0026minus;10.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec24\" class=\"Section2\"\u003e \u003ch2\u003eFine-mapping\u003c/h2\u003e \u003cp\u003eWe performed a fine-mapping analysis on the base summary statistics (UKB) to identify causal variants affecting the clinical biomarkers, using FINEMAP (\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e) with PolyFun (\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e). From the independent loci detected by FUMA, we extended each associated locus to a minimum size of \u0026plusmn;\u0026thinsp;5 Mb to obtain a test region for fine-mapping. For each locus, we calculated the LD score matrix from the individual-level genotype data (in-sample LD) using PolyFun. We then applied FINEMAP with default parameters and a maximum number of 12 signals or credible sets. Priors were defined as 1/number of SNPs located in the genomic region, following default options. 
We summarized the output from FINEMAP to report the posterior inclusion probability (PIP) and causal effect size of each SNP, together with its credible set.\u003c/p\u003e \u003cdiv id=\"Sec25\" class=\"Section3\"\u003e \u003ch2\u003ePolygenic risk scores\u003c/h2\u003e \u003cp\u003eWe generated PRSs for the traits using five different approaches based on clumping and thresholding (PRSice2 (\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e)), Bayesian regression (LDpred2 (\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e), R\u0026aacute;pidoPGS (\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e), PRS-CS (\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e)), and penalized regression (lassosum2 (\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e)). We estimated the risk scores for the UKB reference training and validation sets using the base summary statistics generated from the single-variant association analysis. PRSs were calculated as the sum of dosages of SNPs multiplied by their adjusted effect sizes or weights. Information on the application of the methods can be found in Additional file 1: Supplementary methodology.\u003c/p\u003e \u003cp\u003eFor each method, we initially estimated PRSs on the training set to compare and select the best method for the validation set based on performance. Linear regression models were used to evaluate the association between the scores and the measurements. We adjusted each model for age, sex, and the first 40 ancestry PCs (\u003cspan citationid=\"CR108\" class=\"CitationRef\"\u003e108\u003c/span\u003e). For the imputed array data, the genotyping batch was also included as a covariate. We evaluated the predictive performance of the PRSs using the R\u003csup\u003e2\u003c/sup\u003e estimate for explained variation. 
This was calculated by subtracting the R\u003csup\u003e2\u003c/sup\u003e of the null model (covariates only) from the R\u003csup\u003e2\u003c/sup\u003e of the full model (PRSs and covariates). We calculated the normal-based 95% confidence interval (CI) using bootstrapping with 100 replicates.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec26\" class=\"Section3\"\u003e \u003ch2\u003eTargeted phenome-wide association analysis\u003c/h2\u003e \u003cp\u003eWe performed a targeted PheWAS in the UKB validation set to determine associations between the risk scores and multiple chronic diseases within the context of the target data (IAF). To this end, we focused on the most prevalent diseases in the Belgian population reported by Van Wilder et al. (2022) (\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e), studying 17 diseases that include cardiovascular, infectious, and respiratory diseases, musculoskeletal disorders, cancer, mental disorders, and endocrine-related conditions. Summary details on the phenotype definitions can be found in Additional file 1: Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eFrom the UKB, we used summary diagnoses from hospital inpatient data (Data fields 41202 and 41203), coded according to the International Classification of Diseases (ICD-9 and ICD-10). We then manually mapped ICD codes to Phecodes (\u003cspan citationid=\"CR109\" class=\"CitationRef\"\u003e109\u003c/span\u003e) and derived cases and controls using the R package \u003cem\u003ePheWAS\u003c/em\u003e (\u003cspan citationid=\"CR110\" class=\"CitationRef\"\u003e110\u003c/span\u003e). For the selected diseases in the PheWAS, we only included outcomes with at least 200 cases in the validation dataset. Summary details on Phecodes are provided in Additional file 1: Table S10. 
We tested associations between PRSs and each disease outcome using logistic regression models adjusted for age, sex, the first 40 ancestry PCs, and (for the imputed array data) genotype batch. Confidence intervals were estimated with a normal-based 95% CI. To correct for multiple testing, we applied the false discovery rate (FDR) method, considering associations significant at FDR P-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec27\" class=\"Section3\"\u003e \u003ch2\u003eMendelian randomization analyses\u003c/h2\u003e \u003cp\u003eFrom the significant associations obtained from the targeted PheWAS, we conducted a two-sample MR analysis to assess the causal relationship between genetic variants included in PRSs and disease outcomes. We followed the guidelines provided by Burgess et al. (2019) (\u003cspan citationid=\"CR111\" class=\"CitationRef\"\u003e111\u003c/span\u003e). For MR, we defined SNPs reaching genome-wide significance (P-value\u0026thinsp;\u0026lt;\u0026thinsp;5 \u0026times; 10\u003csup\u003e\u0026minus;8\u003c/sup\u003e) as genetic instruments (instrumental variables) associated with clinical biomarkers (exposure), using the summary statistics from the generated UKB base datasets. For outcome data, we used the GIANT (\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e) consortium, CARDIoGRAMplusC4D (\u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e62\u003c/span\u003e), and FinnGen (\u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e). Summary information about the outcome datasets used in this study can be found in Additional file 1: Table S11. Outcome datasets were retrieved from the MR-Base platform (\u003cspan citationid=\"CR112\" class=\"CitationRef\"\u003e112\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eFor causal inference, we used the multiplicative random-effect IVW method as our primary approach for estimating causal effects. 
Heterogeneity was assessed with Cochran\u0026rsquo;s Q to identify potential outliers, while MR-Egger regression (Egger intercept P-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05) tested for horizontal pleiotropy. We further applied the simple and weighted median methods (assuming at least 50% valid instruments), and MR-PRESSO to correct outliers, comparing results across methods to support our IVW estimates. All analyses were performed in R using the \u003cem\u003eTwoSampleMR\u003c/em\u003e and \u003cem\u003eMRPRESSO\u003c/em\u003e packages, with data clumped at r\u0026sup2; \u0026lt; 0.01 and harmonized by correcting non-palindromic SNP strands and removing palindromic variants. Multiple testing was controlled using FDR, with significance set at FDR P-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec28\" class=\"Section2\"\u003e \u003ch2\u003ePrediction of PRSs for the target cohort\u003c/h2\u003e \u003cp\u003eWe estimated clinical biomarker PRSs for the IAF cohort using our UKB base summary and validation datasets. After merging IAF genotype data by retaining common variants from individuals with matched ancestry, we computed risk scores for each biomarker using PLINK and the best-performing PRS method. PRSs were generated from both exome-based and array-based summary statistics and compared using Spearman\u0026rsquo;s correlation analysis. Finally, we examined correlations between the risk scores and the median biomarker measurements.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec29\" class=\"Section2\"\u003e \u003ch2\u003ePRS-informed digital twin model\u003c/h2\u003e \u003cp\u003eFrom the IAF target cohort PRSs, we selected the WES-based model, chosen for its high concordance with the array-based approach and strong correlation with biomarker measurements, to construct a digital twin model. 
Following Vall\u0026eacute;e (2023) (\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e), who defines a digital twin as an integration of multi-scale knowledge with a data-driven model, we conceptualized the PRS as a combination of risk scores and the functional information of contributing SNPs. We then mapped the top \u003cem\u003ek\u003c/em\u003e SNPs to functional annotations derived from our characterization analyses and disease association tests (targeted PheWAS and MR analyses).\u003c/p\u003e \u003cp\u003eUsing this functional information, we developed a multi-layered representation with the following categories:\u003c/p\u003e \u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eAssociated Genes\u003c/b\u003e: genes linked to the top contributing, genome-wide significant SNPs.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003ePathogenicity\u003c/b\u003e: contributing SNPs predicted as pathogenic by AlphaMissense and ESM1b.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eBiological pathways\u003c/b\u003e: Key pathways identified via FUMA (MAGMA) enrichment analysis.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eGTEx signatures\u003c/b\u003e: Tissues significantly enriched according to GTEx data via FUMA (MAGMA).\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eCausal loci (candidate genes)\u003c/b\u003e: SNPs deemed \u0026ldquo;likely causal\u0026rdquo; in fine-mapping and mapped to significant genes via FUMA enrichment.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eAssociated outcomes\u003c/b\u003e: Disease outcomes linked to the PRS model from targeted PheWAS and MR analyses.\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eValidation of the PRS-informed digital twin\u003c/h3\u003e\n\u003cp\u003eTo validate that the WES PRS model reflects the biological characteristics of our 
digital twin model, we conducted two key analyses focusing on biological pathways and GTEx signatures. For the pathway analysis, we hypothesized that if the SNPs in the PRS are linked to relevant pathways, then the risk scores should correlate with molecular components of these pathways. We tested this by performing a Spearman correlation between the PRS values in the IAF target cohort and the corresponding measurements from the Nightingale NMR metabolomics dataset, which includes 250 biomarkers covering lipid, amino acid, and glycolysis-related pathways. These biomarkers serve as proxies for the metabolic processes associated with the enriched pathways, allowing us to assess whether the PRS captures pathway-level molecular perturbations.\u003c/p\u003e \u003cp\u003eFor the GTEx signatures analysis, we hypothesized that if genes mapped from the contributing SNPs in the PRS exhibit significant tissue-expression profiles, then the PRS should capture the tissue-specific molecular processes underlying trait variation. To test this, we used candidate genes with SNPs identified as \u0026ldquo;likely causal\u0026rdquo; in our PRS-informed digital twin model to construct tissue-specific gene interaction networks via the HumanBase platform (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://hb.flatironinstitute.org\u003c/span\u003e\u003cspan address=\"https://hb.flatironinstitute.org\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) using data from the GIANT database. These networks integrate gene co-expression and tissue-specific expression data to elucidate gene functions in each tissue. 
We then performed gene-set enrichment analysis using g:Profiler (\u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e65\u003c/span\u003e) to identify the biological categories associated with each network, applying the g:SCS algorithm for multiple testing correction.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe would like to thank all participants of the UK Biobank and the participants from the VITO IAM Frontier Study. UK Biobank data access was registered under the project application number 71521.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eA.C.R. and G.E. conceptualized the study. A.C.R. developed the methodology. Investigation was performed by A.C.R., D.K., and G.E. T.K., D.K., and G.E. supervised this work. The original draft was written by A.C.R. A.C.R., T.K., and G.E. reviewed and edited the draft.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study was supported by the Flemish Special Research Fund (BOF): BOF21DOC23.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDeclaration of interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics declaration\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFor the UK Biobank, informed consent was obtained from all participants, and the study received approval from the North West Multi-Centre Research Ethics Committee (MREC).\u003c/p\u003e\n\u003cp\u003eFor the VITO IAM Frontier Study, ethical approval was obtained from the Ethical Committee of the University Hospital Antwerp (UZA) and the University of Antwerp. 
Informed consent was provided by all participants.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe VITO IAM Frontier dataset has not been deposited in a public repository since it includes personal data from a private research institution. However, data access can be granted upon request to the corresponding author. For the UK Biobank, data access can be requested at https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access as an approved research project.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCode availability\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAnalysis scripts and codes are available on GitHub at https://github.com/alejocrojo09/wes_prs/.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eLewis CM, Vassos E. Polygenic risk scores: From research tools to clinical instruments. Genome Med. 2020;12(1):1\u0026ndash;11.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXiang R, Kelemen M, Xu Y, Harris LW, Parkinson H, Inouye M, et al. Recent advances in polygenic scores: translation, equitability, methods and FAIR tools. Genome Med. 2024;16(1):1\u0026ndash;14.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCollister JA, Liu X, Clifton L. Calculating Polygenic Risk Scores (PRS) in UK Biobank: A Practical Guide for Epidemiologists. Front Genet. 2022;13(February):1\u0026ndash;17.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChoi SW, Mak TSH, O\u0026rsquo;Reilly PF. Tutorial: a guide to performing polygenic risk score analyses. Nat Protoc. 2020;15(9):2759\u0026ndash;72.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLambert SA, Gil L, Jupp S, Ritchie SC, Xu Y, Buniello A, et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat Genet. 
2021;53(4):416\u0026ndash;25.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLennon NJ, Kottyan LC, Kachulis C, Abul-Husn NS, Arias J, Belbin G, et al. Selection, optimization and validation of ten chronic disease polygenic risk scores for clinical implementation in diverse US populations. Nat Med. 2024;30(2):480\u0026ndash;7.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSinnott-Armstrong N, Tanigawa Y, Amar D, Mars N, Benner C, Aguirre M et al. Genetics of 35 blood and urine biomarkers in the UK Biobank. Nat Genet [Internet]. 2021;53(2):185\u0026ndash;94. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://dx.doi.org/10.1038/s41588-020-00757-z\u003c/span\u003e\u003cspan address=\"10.1038/s41588-020-00757-z\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXu Y, Ritchie SC, Liang Y, Timmers PRHJ, Pietzner M, Lannelongue L, et al. An atlas of genetic scores to predict multi-omic traits. Nature. 2023;616(7955):123\u0026ndash;31.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWarren TL, Tubbs JD, Lesh TA, Corona MB, Pakzad SS, Albuquerque MD et al. Association of neurotransmitter pathway polygenic risk with specific symptom profiles in psychosis. Mol Psychiatry. 2024;(January).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang W, Zhang K. Understanding the Biological Basis of Polygenic Risk Scores and Disparities in Prostate Cancer: A Comprehensive Genomic Analysis. Cancer Inf. 2024;23.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHari Dass SA, McCracken K, Pokhvisneva I, Chen LM, Garg E, Nguyen TTT et al. A biologically-informed polygenic score identifies endophenotypes and clinical conditions associated with the insulin receptor function on specific brain regions. EBioMedicine [Internet]. 2019;42:188\u0026ndash;202. 
Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ebiom.2019.03.051\u003c/span\u003e\u003cspan address=\"10.1016/j.ebiom.2019.03.051\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGaulton KJ, Preissl S, Ren B. Interpreting non-coding disease-associated human variants using single-cell epigenomics. Nat Rev Genet. 2023;24(8):516\u0026ndash;34.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNiguidula N, Alamillo C, Shahmirzadi Mowlavi L, Powis Z, Cohen JS, Farwell Hagman KD. Clinical whole-exome sequencing results impact medical management. Mol Genet genomic Med. 2018;6(6):1068\u0026ndash;78.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu Y, Hao C, Li K, Hu X, Gao H, Zeng J et al. Clinical Application of Whole Exome Sequencing for Monogenic Disorders in PICU of China. Front Genet. 2021;12(September).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVan Hout CV, Tachmazidou I, Backman JD, Hoffman JD, Liu D, Pandey AK et al. Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature [Internet]. 2020;586(7831):749\u0026ndash;56. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://dx.doi.org/10.1038/s41586-020-2853-0\u003c/span\u003e\u003cspan address=\"10.1038/s41586-020-2853-0\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSamuels DC, Han L, Li J, Quanghu S, Clark TA, Shyr Y et al. Finding the lost treasures in exome sequencing data. Trends Genet [Internet]. 2013;29(10):593\u0026ndash;9. 
Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://dx.doi.org/10.1016/j.tig.2013.07.006\u003c/span\u003e\u003cspan address=\"10.1016/j.tig.2013.07.006\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang Z, Choi SW, Chami N, Boerwinkle E, Fornage M, Redline S, et al. The Value of Rare Genetic Variation in the Prediction of Common Obesity in European Ancestry Populations. Front Endocrinol (Lausanne). 2022;13(May):1\u0026ndash;12.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDou J, Wu D, Ding L, Wang K, Jiang M, Chai X, et al. Using off-target data from whole-exome sequencing to improve genotyping accuracy, association analysis and polygenic risk prediction. Brief Bioinform. 2021;22(3):1\u0026ndash;12.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlkhamis FA, Alabdali MM, Alsulaiman AA, Alamri AS, Alali R, Akhtar MS et al. Whole-exome sequencing analyses in a Saudi Ischemic Stroke Cohort reveal association signals, and shows polygenic risk scores are related to Modified Rankin Scale Risk. Funct Integr Genomics [Internet]. 2023;23(2):1\u0026ndash;9. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s10142-023-01039-7\u003c/span\u003e\u003cspan address=\"10.1007/s10142-023-01039-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYuan J, Qiu R, Wang Y, Chen ZJ, Sun H, Dai W et al. Exome-wide genetic risk score (ExGRS) to predict high myopia across multi-ancestry populations. 2024;1\u0026ndash;10.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAldisi R, Hassanin E, Sivalingam S, Buness A, Klinkhammer H, Mayr A et al. Gene-based burden scores identify rare variant associations for 28 blood biomarkers. BMC Genomic Data [Internet]. 
2023;24(1):1\u0026ndash;11. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s12863-023-01155-0\u003c/span\u003e\u003cspan address=\"10.1186/s12863-023-01155-0\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang Q, Dhindsa RS, Carss K, Harper AR, Nag A, Tachmazidou I, et al. Rare variant contribution to human disease in 281,104 UK Biobank exomes. Nature. 2021;597(7877):527\u0026ndash;32.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562(7726):203\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePark J, Talozzi L, Greicius MD. Rare genetic associations with human lifespan in UK Biobank are enriched for oncogenic genes. Nat Commun [Internet]. 2025; Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://dx.doi.org/10.1038/s41467-025-57315-6\u003c/span\u003e\u003cspan address=\"10.1038/s41467-025-57315-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNag A, Dhindsa RS, Middleton L, Jiang X, Vitsios D, Wigmore E, et al. Effects of protein-coding variants on blood metabolite measurements and clinical biomarkers in the UK Biobank. Am J Hum Genet. 2023;110(3):487\u0026ndash;98.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAllara E, Morani G, Carter P, Gkatzionis A, Zuber V, Foley CN, et al. Genetic Determinants of Lipids and Cardiovascular Disease Outcomes: A Wide-Angled Mendelian Randomization Investigation. Circ Genomic Precis Med. 
2019;12(12):543\u0026ndash;51.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTabassum R, R\u0026auml;m\u0026ouml; JT, Ripatti P, Koskela JT, Kurki M, Karjalainen J et al. Genetic architecture of human plasma lipidome and its link to cardiovascular disease. Nat Commun. 2019;10(1).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStevenson-Hoare J, Heslegrave A, Leonenko G, Fathalla D, Bellou E, Luckcuck L et al. Plasma biomarkers and genetics in the diagnosis and prediction of Alzheimer\u0026rsquo;s disease. Brain [Internet]. 2023;146(2):690\u0026ndash;9. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/brain/awac128\u003c/span\u003e\u003cspan address=\"10.1093/brain/awac128\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGui H, Schriemer D, Cheng WW, Chauhan RK, Antiňolo G, Berrios C, et al. Whole exome sequencing coupled with unbiased functional analysis reveals new Hirschsprung disease genes. Genome Biol. 2017;18(1):1\u0026ndash;13.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiang H, Cheung LWT, Li J, Ju Z, Yu S, Stemke-Hale K, et al. Whole-exome sequencing combined with functional genomics reveals novel candidate driver cancer genes in endometrial cancer. Genome Res. 2012;22(11):2120\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWilcox N, Dumont M, Gonz\u0026aacute;lez-Neira A, Carvalho S, Joly Beauparlant C, Crotti M, et al. Exome sequencing identifies breast cancer susceptibility genes and defines the contribution of coding variants to breast cancer risk. Nat Genet. 2023;55(9):1435\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBomba L, Walter K, Guo Q, Surendran P, Kundu K, Nongmaithem S et al. Whole-exome sequencing identifies rare genetic variants associated with human plasma metabolites. 
Am J Hum Genet [Internet]. 2022;109(6):1038\u0026ndash;54. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ajhg.2022.04.009\u003c/span\u003e\u003cspan address=\"10.1016/j.ajhg.2022.04.009\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKatsoulakis E, Wang Q, Wu H, Shahriyari L, Fletcher R, Liu J, et al. Digital twins for health: a scoping review. npj Digit Med. 2024;7(1):1\u0026ndash;11.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi X, Loscalzo J, Mahmud AKMF, Aly DM, Rzhetsky A, Zitnik M et al. Digital twins as global learning health and disease models for preventive and personalized medicine. Genome Med [Internet]. 2025;17(1):11. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.ncbi.nlm.nih.gov/pubmed/39920778\u003c/span\u003e\u003cspan address=\"http://www.ncbi.nlm.nih.gov/pubmed/39920778\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBj\u0026ouml;rnsson B, Borrebaeck C, Elander N, Gasslander T, Gawel DR, Gustafsson M, et al. Digital twins to personalize medicine. Genome Med. 2020;12(1):10\u0026ndash;3.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVall\u0026eacute;e A. Digital twin for healthcare systems. Front Digit Health [Internet]. 2023;5(September):1\u0026ndash;6. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fdgth.2023.1253050\u003c/span\u003e\u003cspan address=\"10.3389/fdgth.2023.1253050\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHeylen D, De Clerck C, Pusparum M, Rojo AC, Van Den Heuvel R, Standaert A, et al. 
Cohort profile: The I AM Frontier prospective cohort study in Flanders. A healthcare transition towards personalized prevention; 2024.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVan Wilder L, Devleesschauwer B, Clays E, Van der Heyden J, Charafeddine R, Scohy A et al. QALY losses for chronic diseases and its social distribution in the general population: results from the Belgian Health Interview Survey. BMC Public Health [Internet]. 2022;22(1):1\u0026ndash;9. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s12889-022-13675-y\u003c/span\u003e\u003cspan address=\"10.1186/s12889-022-13675-y\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAbraham G, Qiu Y, Inouye M. FlashPCA2: principal component analysis of Biobank-scale genotype datasets. Bioinformatics. 2017;33(17):2776\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMbatchou J, Barnard L, Backman J, Marcketta A, Kosmicki JA, Ziyatdinov A et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat Genet [Internet]. 2021;53(7):1097\u0026ndash;103. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://dx.doi.org/10.1038/s41588-021-00870-7\u003c/span\u003e\u003cspan address=\"10.1038/s41588-021-00870-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKamat MA, Blackshaw JA, Young R, Surendran P, Burgess S, Danesh J, et al. PhenoScanner V2: An expanded tool for searching human genotype-phenotype associations. Bioinformatics. 2019;35(22):4851\u0026ndash;3.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMagno R, Maia AT. Gwasrapidd: An R package to query, download and wrangle GWAS catalog data. Bioinformatics. 
2020;36(2):649\u0026ndash;50.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOrliac EJ, Banos DT, Ojavee SE, L\u0026auml;ll K, M\u0026auml;gi R, Visscher PM, et al. Improving GWAS discovery and genomic prediction accuracy in biobank data. Proc Natl Acad Sci U S A. 2022;119(31):1\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWu Y, Burch KS, Ganna A, Pajukanta P, Pasaniuc B, Sankararaman S. Fast estimation of genetic correlation for biobank-scale data. Am J Hum Genet. 2022;109(1):24\u0026ndash;32.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003evan Rheenen W, Peyrot WJ, Schork AJ, Lee SH, Wray NR. Genetic correlations of polygenic disease traits: from theory to practice. Nat Rev Genet [Internet]. 2019;20(10):567\u0026ndash;81. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://dx.doi.org/10.1038/s41576-019-0137-z\u003c/span\u003e\u003cspan address=\"10.1038/s41576-019-0137-z\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWatanabe K, Taskesen E, Van Bochoven A, Posthuma D. Functional mapping and annotation of genetic associations with FUMA. Nat Commun [Internet]. 2017;8(1):1\u0026ndash;10. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://dx.doi.org/10.1038/s41467-017-01261-5\u003c/span\u003e\u003cspan address=\"10.1038/s41467-017-01261-5\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAuton A, Abecasis GR, Altshuler DM, Durbin RM, Bentley DR, Chakravarti A, et al. A global reference for human genetic variation. Nature. 2015;526(7571):68\u0026ndash;74.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ede Leeuw CA, Mooij JM, Heskes T, Posthuma D. MAGMA: Generalized Gene-Set Analysis of GWAS Data. PLoS Comput Biol. 
2015;11(4):1\u0026ndash;19.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCheng J, Novati G, Pan J, Bycroft C, Žemgulyte A, Applebaum T et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science. 2023;381(6664).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBrandes N, Goldman G, Wang CH, Ye CJ, Ntranos V. Genome-wide prediction of disease variant effects with a deep protein language model. Nat Genet. 2023;55(9):1512\u0026ndash;22.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStephenson JD, Totoo P, Burke DF, J\u0026auml;nes J, Beltrao P, Martin MJ. ProtVar: mapping and contextualizing human missense variation. Nucleic Acids Res. 2024;52(W1):W140\u0026ndash;7.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEpstein ES. Genetic Testing in Patients with High Lipoprotein(a): Experience from the UCSD Lipoprotein(a) Specialty Clinic\u0026dagger;. J Clin Lipidol [Internet]. 2022;16(1, Supplement):e12\u0026ndash;3. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.sciencedirect.com/science/article/pii/S1933287421002178\u003c/span\u003e\u003cspan address=\"https://www.sciencedirect.com/science/article/pii/S1933287421002178\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHoekstra M, Chen HY, Rong J, Dufresne L, Yao J, Guo X, et al. Genome-Wide Association Study Highlights APOH as a Novel Locus for Lipoprotein(a) Levels-Brief Report. Arterioscler Thromb Vasc Biol. 2021;41(1):458\u0026ndash;64.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBenner C, Spencer CCA, Havulinna AS, Salomaa V, Ripatti S, Pirinen M. FINEMAP: Efficient variable selection using summary data from genome-wide association studies. Bioinformatics. 
2016;32(10):1493\u0026ndash;501.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWeissbrod O, Hormozdiari F, Benner C, Cui R, Ulirsch J, Gazal S et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat Genet [Internet]. 2020;52(12):1355\u0026ndash;63. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://dx.doi.org/10.1038/s41588-020-00735-5\u003c/span\u003e\u003cspan address=\"10.1038/s41588-020-00735-5\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChoi SW, O\u0026rsquo;Reilly PF. PRSice-2: Polygenic Risk Score software for biobank-scale data. Gigascience. 2019;8(7):1\u0026ndash;6.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePriv\u0026eacute; F, Arbel J, Vilhj\u0026aacute;lmsson BJ. LDpred2: Better, faster, stronger. Bioinformatics. 2020;36(22\u0026ndash;23):5424\u0026ndash;31.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eReales G, Vigorito E, Kelemen M, Wallace C. R\u0026aacute;pidoPGS: A rapid polygenic score calculator for summary GWAS data without a test dataset. Bioinformatics. 2021;37(23):4444\u0026ndash;50.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGe T, Chen CY, Ni Y, Feng YCA, Smoller JW. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat Commun [Internet]. 2019;10(1):1\u0026ndash;10. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://dx.doi.org/10.1038/s41467-019-09718-5\u003c/span\u003e\u003cspan address=\"10.1038/s41467-019-09718-5\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePriv\u0026eacute; F, Arbel J, Aschard H, Vilhj\u0026aacute;lmsson BJ. 
Identifying and correcting for misspecifications in GWAS summary statistics and polygenic scores. Hum Genet Genomics Adv. 2022;3(4):1\u0026ndash;15.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYengo L, Sidorenko J, Kemper KE, Zheng Z, Wood AR, Weedon MN, et al. Meta-analysis of genome-wide association studies for height and body mass index in ~\u0026thinsp;700 000 individuals of European ancestry. Hum Mol Genet. 2018;27(20):3641\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDeloukas P, Kanoni S, Willenborg C, Farrall M, Assimes TL, Thompson JR, et al. Large-scale association analysis identifies new risk loci for coronary artery disease. Nat Genet. 2013;45(1):25\u0026ndash;33.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKurki MI, Karjalainen J, Palta P, Sipil\u0026auml; TP, Kristiansson K, Donner KM, et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature. 2023;613(7944):508\u0026ndash;18.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCostanzo MC, Roselli C, Brandes M, Duby M, Hoang Q, Jang D, et al. Cardiovascular Disease Knowledge Portal: A Community Resource for Cardiovascular Disease Research. Circ Genomic Precis Med. 2023;16(6):583\u0026ndash;6.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKolberg L, Raudvere U, Kuzmin I, Adler P, Vilo J, Peterson H. G:Profiler-interoperable web service for functional enrichment analysis and gene identifier mapping (2023 update). Nucleic Acids Res. 2023;51(W1):W207\u0026ndash;12.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSelvaraj MS, Li X, Li Z, Pampana A, Zhang DY, Park J et al. Whole genome sequence analysis of blood lipid levels in \u0026gt;\u0026thinsp;66,000 individuals. Nat Commun. 2022;13(1).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKathiresan S, Srivastava D. Genetics of human cardiovascular disease. Cell [Internet]. 
2012;148(6):1242\u0026ndash;57. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://dx.doi.org/10.1016/j.cell.2012.03.001\u003c/span\u003e\u003cspan address=\"10.1016/j.cell.2012.03.001\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDron JS, Hegele RA. Genetics of Lipid and Lipoprotein Disorders and Traits. Curr Genet Med Rep. 2016;4(3):130\u0026ndash;41.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eErdmann J, Kessler T, Munoz Venegas L, Schunkert H. A decade of genome-wide association studies for coronary artery disease: The challenges ahead. Cardiovasc Res. 2018;114(9):1241\u0026ndash;57.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRichardson TG, Harrison S, Hemani G, Smith GD. An atlas of polygenic risk score associations to highlight putative causal relationships across the human phenome. Elife. 2019;8:1\u0026ndash;24.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eO\u0026rsquo;Sullivan JW, Raghavan S, Marquez-Luna C, Luzum JA, Damrauer SM, Ashley EA, et al. Polygenic Risk Scores for Cardiovascular Disease: A Scientific Statement From the American Heart Association. Circulation. 2022;146(8):E93\u0026ndash;118.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFu L, Liu Q, Cheng H, Zhao X, Xiong J, Mi J. Insights Into Causal Effects of Genetically Proxied Lipids and Lipid-Modifying Drug Targets on Cardiometabolic Diseases. J Am Heart Assoc. 2025;14(3):1\u0026ndash;20.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eArtomov M, Loboda AA, Artyomov MN, Daly MJ. Public platform with 39,472 exome control samples enables association studies without genotype sharing. Nat Genet. 2024;56(February).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWojcik GL, Murphy J, Edelson JL, Gignoux CR, Ioannidis AG, Manning A, et al. 
Opportunities and challenges for the use of common controls in sequencing studies. Nat Rev Genet. 2022;23(11):665\u0026ndash;79.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUrbina EM, McCoy CE, Gao Z, Khoury PR, Shah AS, Dolan LM et al. Lipoprotein particle number and size predict vascular structure and function better than traditional lipids in adolescents and young adults. J Clin Lipidol [Internet]. 2017;11(4):1023\u0026ndash;31. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.jacl.2017.05.011\u003c/span\u003e\u003cspan address=\"10.1016/j.jacl.2017.05.011\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAday AW, Lawler PR, Cook NR, Ridker PM, Mora S, Pradhan AD. Lipoprotein Particle Profiles, Standard Lipids, and Peripheral Artery Disease Incidence: Prospective Data from the Women\u0026rsquo;s Health Study. Circulation. 2018;138(21):2330\u0026ndash;41.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGalimberti F, Casula M, Olmastroni E. Apolipoprotein B compared with low-density lipoprotein cholesterol in the atherosclerotic cardiovascular diseases risk assessment. Pharmacol Res [Internet]. 2023;195(May):106873. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.phrs.2023.106873\u003c/span\u003e\u003cspan address=\"10.1016/j.phrs.2023.106873\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLee J, Gilliland TC, Dron J, Koyama S, Nakao T, Lannery K, et al. Integrative Metabolomics Differentiate Coronary Artery Disease, Peripheral Artery Disease, and Venous Thromboembolism Risks. Arterioscler Thromb Vasc Biol. 
2024;44(9):2108\u0026ndash;17.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZuber V, Gill D, Ala-Korpela M, Langenberg C, Butterworth A, Bottolo L, et al. High-throughput multivariable Mendelian randomization analysis prioritizes apolipoprotein B as key lipid risk factor for coronary artery disease. Int J Epidemiol. 2021;50(3):893\u0026ndash;901.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSayeed N, Sugaya K. Exosome mediated Tom40 delivery protects against hydrogen peroxide-induced oxidative stress by regulating mitochondrial function. PLoS One [Internet]. 2022;17(8 August):1\u0026ndash;16. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://dx.doi.org/10.1371/journal.pone.0272511\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0272511\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang X, Wang S, Liu W, Wang T, Wang J, Gao X et al. Epigenetic upregulation of miR-126 induced by heat stress contributes to apoptosis of rat cardiomyocytes by promoting Tomm40 transcription. J Mol Cell Cardiol [Internet]. 2019;129:39\u0026ndash;48. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.yjmcc.2018.10.005\u003c/span\u003e\u003cspan address=\"10.1016/j.yjmcc.2018.10.005\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHumphries AD, Streimann IC, Stojanovski D, Johnston AJ, Yano M, Hoogenraad NJ et al. Dissection of the mitochondrial import and assembly pathway for human Tom40. J Biol Chem [Internet]. 2005;280(12):11535\u0026ndash;43. 
Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://dx.doi.org/10.1074/jbc.M413816200\u003c/span\u003e\u003cspan address=\"10.1074/jbc.M413816200\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCao T, Yi SJ, Wang LX, Zhao JX, Xiao J, Xie N et al. Identification of the DNA Replication Regulator MCM Complex Expression and Prognostic Significance in Hepatic Carcinoma. Biomed Res Int. 2020;2020.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLin X, Guo L, Lin X, Wang Y, Zhang G. Expression and prognosis analysis of mitochondrial ribosomal protein family in breast cancer. Sci Rep [Internet]. 2022;12(1):1\u0026ndash;13. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41598-022-14724-7\u003c/span\u003e\u003cspan address=\"10.1038/s41598-022-14724-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eParma B, Ramesh V, Gollavilli PN, Siddiqui A, Pinna L, Schwab A et al. Metabolic impairment of non-small cell lung cancers by mitochondrial HSPD1 targeting. J Exp Clin Cancer Res [Internet]. 2021;40(1):1\u0026ndash;20. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s13046-021-02049-8\u003c/span\u003e\u003cspan address=\"10.1186/s13046-021-02049-8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJavary J, Allain-Courtois N, Saucisse N, Costet P, Heraud C, Benhamed F et al. Liver Reptin/RUVBL2 controls glucose and lipid metabolism with opposite actions on mTORC1 and mTORC2 signalling. Gut [Internet]. 2018;67(12):2192\u0026ndash;203. 
Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://gut.bmj.com/content/67/12/2192.abstract\u003c/span\u003e\u003cspan address=\"http://gut.bmj.com/content/67/12/2192.abstract\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDahik VD, Frisdal E, Le Goff W. Rewiring of lipid metabolism in adipose tissue macrophages in obesity: Impact on insulin resistance and type 2 diabetes. Int J Mol Sci. 2020;21(15):1\u0026ndash;30.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSnaebjornsson MT, Janaki-Raman S, Schulze A. Greasing the Wheels of the Cancer Machine: The Role of Lipid Metabolism in Cancer. Cell Metab [Internet]. 2020;31(1):62\u0026ndash;76. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.sciencedirect.com/science/article/pii/S1550413119306175\u003c/span\u003e\u003cspan address=\"https://www.sciencedirect.com/science/article/pii/S1550413119306175\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBoyle EA, Li YI, Pritchard JK. An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell [Internet]. 2017;169(7):1177\u0026ndash;86. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://dx.doi.org/10.1016/j.cell.2017.05.038\u003c/span\u003e\u003cspan address=\"10.1016/j.cell.2017.05.038\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMathieson I. The omnigenic model and polygenic prediction of complex traits. Am J Hum Genet [Internet]. 2021;108(9):1558\u0026ndash;63. 
Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ajhg.2021.07.003\u003c/span\u003e\u003cspan address=\"10.1016/j.ajhg.2021.07.003\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu X, Li YI, Pritchard JK. Trans Effects on Gene Expression Can Drive Omnigenic Inheritance. Cell [Internet]. 2019;177(4):1022\u0026ndash;1034.e6. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.cell.2019.04.014\u003c/span\u003e\u003cspan address=\"10.1016/j.cell.2019.04.014\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHughes MF, Lenighan YM, Godson C, Roche HM. Exploring Coronary Artery Disease GWAs Targets With Functional Links to Immunometabolism. Front Cardiovasc Med. 2018;5(November):1\u0026ndash;12.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVan Der Harst P, Verweij N. Identification of 64 novel genetic loci provides an expanded view on the genetic architecture of coronary artery disease. Circ Res. 2018;122(3):433\u0026ndash;43.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu Y, Ren H, Zhang Y, Deng W, Ma X, Zhao L et al. Temporal changes in brain morphology related to inflammation and schizophrenia: An omnigenic Mendelian randomization study. Psychol Med. 2024.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang B, Dong X, Hu J, Gao L. Multi-omics peripheral and core regions of cancer. npj Syst Biol Appl. 2022;8(1).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAllayee H, Farber CR, Seldin MM, Williams EG, James DE, Lusis AJ. Systems genetics approaches for understanding complex traits with relevance for human disease. Elife. 
2023;12:1\u0026ndash;29.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLoS Med. 2015;12(3):1\u0026ndash;10.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWarren HR, Evangelou E, Cabrera CP, Gao H, Ren M, Mifsud B, et al. Genome-wide association analysis identifies novel blood pressure loci and offers biological insights into cardiovascular risk. Nat Genet. 2017;49(3):403\u0026ndash;15.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFung K, Ram\u0026iacute;rez J, Warren HR, Aung N, Lee AM, Tzanis E, et al. Genome-wide association study identifies loci for arterial stiffness index in 127,121 UK Biobank participants. Sci Rep. 2019;9(1):1\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSzustakowski JD, Balasubramanian S, Kvikstad E, Khalid S, Bronson PG, Sasson A, et al. Advancing human genetics research and drug discovery through exome sequencing of the UK Biobank. Nat Genet. 2021;53(7):942\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754\u0026ndash;60.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePoplin R, Chang PC, Alexander D, Schwartz S, Colthurst T, Ku A, et al. A universal snp and small-indel variant caller using deep neural networks. Nat Biotechnol. 2018;36(10):983.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLin MF, Dnanexus OR, Penn J, Bai X, Reid JG, Krasheninina O et al. GLnexus: joint variant calling for large cohort sequencing. bioRxiv [Internet]. 2018;343970. 
Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.biorxiv.org/content/10.1101/343970v1%0Ahttps://www.biorxiv.org/content/10.1101/343970v1.abstract\u003c/span\u003e\u003cspan address=\"https://www.biorxiv.org/content/10.1101/343970v1%0Ahttps://www.biorxiv.org/content/10.1101/343970v1.abstract\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMarees AT, de Kluiver H, Stringer S, Vorspan F, Curis E, Marie-Claire C, et al. A tutorial on conducting genome-wide association studies: Quality control and statistical analysis. Int J Methods Psychiatr Res. 2018;27(2):1\u0026ndash;10.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChang CC. Data Management and Summary Statistics with PLINK. Methods Mol Biol. 2020;2090:49\u0026ndash;65.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMcCaw ZR, Lane JM, Saxena R, Redline S, Lin X. Operating characteristics of the rank-based inverse normal transformation for quantitative trait analysis in genome-wide association studies. Biometrics. 2020;76(4):1262\u0026ndash;72.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang K, Li M, Hakonarson H. ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38(16):1\u0026ndash;7.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDor E, Margaliot I, Brandes N, Zuk O, Linial M, Rappoport N. Selecting Covariates for Genome-Wide Association Studies. bioRxiv [Internet]. 2023;2023.02.07.527425. 
Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.biorxiv.org/content/10.1101/2023.02.07.527425v1%0Ahttps://www.biorxiv.org/content/\u003c/span\u003e\u003cspan address=\"https://www.biorxiv.org/content/10.1101/2023.02.07.527425v1%0Ahttps://www.biorxiv.org/content/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1101/2023.02.07.527425v1.abstract\u003c/span\u003e\u003cspan address=\"10.1101/2023.02.07.527425v1.abstract\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWei WQ, Bastarache LA, Carroll RJ, Marlo JE, Osterman TJ, Gamazon ER, et al. Evaluating phecodes, clinical classification software, and ICD-9-CM codes for phenome-wide association studies in the electronic health record. PLoS ONE. 2017;12(7):1\u0026ndash;16.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCarroll RJ, Bastarache L, Denny JC. R PheWAS: Data analysis and plotting tools for phenome-wide association studies in the R environment. Bioinformatics. 2014;30(16):2375\u0026ndash;6.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBurgess S, Davey Smith G, Davies NM, Dudbridge F, Gill D, Glymour MM, et al. Guidelines for performing Mendelian randomization investigations. Wellcome Open Res. 2019;4:186.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, et al. The MR-base platform supports systematic causal inference across the human phenome. Elife. 
2018;7:1\u0026ndash;29.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"genome-biology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"gbio","sideBox":"Learn more about [Genome Biology](https://genomebiology.biomedcentral.com/)","snPcode":"13059","submissionUrl":"https://submission.springernature.com/new-submission/13059/3","title":"Genome Biology","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"BMC/SO AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"digital twin, whole-exome sequencing, polygenic risk scores, clinical biomarkers, disease risk, UK Biobank, IAM Frontier","lastPublishedDoi":"10.21203/rs.3.rs-6169446/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6169446/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cstrong\u003eBackground\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003ePolygenic risk scores (PRSs) are widely used to assess genetic predisposition, but genotyping arrays typically target non-coding variants with limited functional annotation. 
In contrast, whole-exome sequencing (WES) maps variants to protein-coding regions, providing functional insights that can enrich PRS interpretation and support novel computational frameworks to infer individual genetic predisposition.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eResults\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe evaluated WES for polygenic risk modeling and functional interpretation using common exonic variants across 27 clinical biomarkers and 17 disease outcomes in the UK Biobank (N = 105,506) and applied the approach to the VITO IAM Frontier cohort (N = 30). WES achieved a 70.63% mapping rate of single-nucleotide polymorphisms (SNPs) to functional genomic information, compared to 11.64% for genotyping arrays, with most associations observed for lipid, hepatic, and renal biomarkers. PRS performance was comparable to that derived from imputed array data and linked to 11 disease outcomes, including cardiovascular conditions. The best-performing PRS in the target cohort was used to develop a digital twin model that integrates biological pathways, gene tissue expression signatures, and disease associations, validated by existing clinical and metabolomic data.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConclusions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eOur study demonstrates that WES-derived PRSs can effectively capture clinically relevant disease associations. 
However, through functional characterization of associated exonic variants, we show that a PRS, as a digital twin model, could potentially explain individual-level variation and provide biological information on how genetic variants mediate genetic risk.\u003c/p\u003e","manuscriptTitle":"Transforming polygenic risk prediction: functional annotation and digital twin modeling with whole-exome sequencing","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-03-10 10:18:37","doi":"10.21203/rs.3.rs-6169446/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2026-04-29T17:32:06+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-04-06T17:13:35+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"52353245443256524056449017851777549190","date":"2026-02-27T14:46:28+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-04-14T20:13:06+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"12896353403136538380972314658633263501","date":"2025-03-28T18:58:17+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-03-28T16:01:23+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-03-17T11:41:39+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-03-06T10:33:24+00:00","index":"","fulltext":""},{"type":"submitted","content":"Genome Biology","date":"2025-03-06T10:06:48+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"genome-biology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"gbio","sideBox":"Learn more about [Genome Biology](https://genomebiology.biomedcentral.com/)","snPcode":"13059","submissionUrl":"https://submission.springernature.com/new-submission/13059/3","title":"Genome 
Biology","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"BMC/SO AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"721ffe23-030c-461c-8d12-86483b8f0ab2","owner":[],"postedDate":"March 10th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"in-revision","subjectAreas":[],"tags":[],"updatedAt":"2026-04-29T17:39:18+00:00","versionOfRecord":[],"versionCreatedAt":"2025-03-10 10:18:37","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6169446","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6169446","identity":"rs-6169446","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
