Exploring and visualizing stratified genome-wide association study results with PheWeb 2

doi:10.21203/rs.3.rs-7463215/v1

2025 · doi:10.21203/rs.3.rs-7463215/v1

preprint OA: closed


Brief Communication

Justin Bellavance*, Hongyu Xiao*, Le Chang, Mehrdad Kazemi, Seyla Wickramasinghe, and 5 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7463215/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 19 Jan, 2026 (Nature Genetics). Version 1 posted.

Abstract

The limited functionality of web-based tools for interacting with stratified genome-wide association study (GWAS) summary-level results currently hinders researchers from advancing knowledge of how ancestry and sex shape the genetics of complex human diseases and traits.
Here we introduce PheWeb 2, a completely rewritten and enhanced version of our original web-based tool, which offers intuitive and efficient interactive navigation and visual comparison across stratified GWAS results within a single framework.

*Justin Bellavance & Hongyu Xiao contributed equally.

Subject areas: Biological sciences/Genetics/Genetic association study/Genome-wide association studies; Health sciences/Medical research/Genetics research

Keywords: sex-stratified, ancestry-stratified, interactive browser, summary statistics, data sharing, genome-wide association analysis, PheWeb, CLSA

Introduction

The increasing scale and diversity of population-based biobanks now enable genome-wide association studies (GWAS) in hundreds of thousands of individuals [1-3]. This scale enables stratified analyses to uncover sex- or ancestry-differential genetic effects (e.g. [4-6]). However, a critical bottleneck remains: existing interactive web-based tools lack the functionality needed for intuitive exploration and comparison of stratified GWAS summary-level data, which hinders the translation of results into biological insights. Our original PheWeb, for instance, has facilitated biobank-scale exploration of GWAS results and has been adopted by numerous international consortia [7].

Here, we introduce PheWeb 2, a complete rewrite of the software designed to support the next generation of genetic discoveries. Within a single interactive environment, PheWeb 2 enables researchers to integrate and query GWAS results not only across hundreds of phenotypes but also across numerous cohort stratifications. Key functionality enhancements in PheWeb 2 include interactive Miami plots and stacked LocusZoom and PheWAS plots, where researchers can dynamically select GWAS stratifications of interest for visual side-by-side comparison. Each plot is synchronized with the accompanying dynamic table, which joins variant-level association results across any selected pair of stratifications in real time.
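The pairwise join performed by the dynamic table can be illustrated with a small sketch. This is not PheWeb 2's implementation: the function name, the record layout (`pval`, `beta`) and the variant `rs_example` are hypothetical, and only the rs7412 values are taken from the non-HDL cholesterol results reported in this paper. The sketch assumes each stratification's results are already loaded as a dictionary keyed by variant ID and performs a full outer join, so variants tested in only one stratification remain visible.

```python
from typing import Dict, List, Optional

Record = Dict[str, float]


def join_stratifications(
    results_a: Dict[str, Record],
    results_b: Dict[str, Record],
) -> List[dict]:
    """Full outer join of two per-variant result sets keyed by variant ID."""
    rows = []
    for variant_id in sorted(set(results_a) | set(results_b)):
        a: Optional[Record] = results_a.get(variant_id)
        b: Optional[Record] = results_b.get(variant_id)
        rows.append({
            "variant": variant_id,
            # None marks a variant that is absent from that stratification.
            "pval_a": a["pval"] if a else None,
            "beta_a": a["beta"] if a else None,
            "pval_b": b["pval"] if b else None,
            "beta_b": b["beta"] if b else None,
        })
    return rows


# Toy input: rs7412 uses the values reported in this paper;
# "rs_example" is a made-up variant present in one stratification only.
female = {"rs7412": {"pval": 3.20e-66, "beta": -0.37}}
male = {
    "rs7412": {"pval": 1.00e-15, "beta": -0.17},
    "rs_example": {"pval": 0.02, "beta": 0.05},
}
table = join_stratifications(female, male)
for row in table:
    print(row)
```

An outer join is used here rather than an inner join because a variant missing from one stratification (for example, one that is monomorphic in a subgroup) is itself informative when comparing stratified results.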
When exploring Miami plots, the analyst can instantly switch between views displaying statistics of genotype-phenotype associations and sex-by-genotype interactions. This new interaction feature supports an integrated, dynamic workflow for investigating potential differential genetic effects across GWAS stratifications of the same trait and across traits within a single PheWeb 2 instance.

To exemplify the utility of PheWeb 2, we applied it to visualize and query stratified GWAS results for human traits using data from the Canadian Longitudinal Study on Aging (CLSA) [8,9]. In brief, the CLSA is a national, longitudinal research platform that included 51,338 participants aged 45 to 85 years at baseline from the 10 Canadian provinces, of whom 30,097 from 7 provinces (the comprehensive cohort) completed in-person interviews including physical assessments, blood and urine samples, and body composition assessed using dual energy x-ray absorptiometry (DXA) [8]. Details regarding the study design have been previously published. Over 26,000 participants from the CLSA comprehensive cohort had genetic data available at baseline [9]. The CLSA PheWeb 2 instance displays genotype-phenotype association results for 48,784,824 well-imputed (imputation quality r² > 0.3) autosomal and chromosome X variants with at least five minor alleles, and sex-by-genotype interaction results for 11,274,702 variants with at least 160 minor alleles in the study population. In total, we analyzed 226 unique traits across six stratifications, of which 224 had at least one genome-wide significant locus in at least one stratification (p < 5×10⁻⁸). Details on quality control and analyses are provided in the Online Methods.

We showcase how PheWeb 2 facilitates the discovery of sex-differential genetic effects using sex-stratified results of non-high-density lipoprotein (non HDL) cholesterol levels as an illustrative example (Figure 1a, Supplementary Figure 1).
Variants in the APOE gene, such as missense variant rs7412, were significantly associated with non HDL in both the female and male stratifications of CLSA participants from all genetic ancestries (females: N = 13,187, p = 3.20×10⁻⁶⁶, β[SE] = -0.37[0.022]; males: N = 13,205, p = 1.00×10⁻¹⁵, β[SE] = -0.17[0.022]) (Figure 1a). When investigating the sex-by-genotype interaction results for non HDL via PheWeb 2’s new interaction Manhattan plot (Figure 1b), the T allele of rs7412 was identified as being differentially associated between the female and male stratifications (p = 1.1×10⁻⁸, β[SE] = -0.19[0.033]). Stacked regional LocusZoom plots in PheWeb 2 (Figure 1c) allowed for intuitive comparison between two selected stratifications and a quick deduction that rs7412 is in linkage disequilibrium (r² > 0.8) with rs1065853. The GWAS Catalog track integrated above the stacked regional plots showed that there are many known trait associations within this region, such as Alzheimer’s disease, blood protein levels, cerebral amyloid deposit (PET imaging) and low-density lipoprotein (LDL) cholesterol levels. In PheWeb 2’s stacked PheWAS plots for rs7412 in the all ancestries female and all ancestries male stratifications, this variant shows evidence of association with other related continuous traits assessed in CLSA PheWeb 2, including total cholesterol (females: p = 1.80×10⁻⁴²; males: p = 2.00×10⁻¹²), low-density lipoprotein (females: p = 2.00×10⁻⁹⁵; males: p = 5.90×10⁻³⁹), and triglycerides (females: p = 1.10×10⁻⁶; males: p = 2.40×10⁻¹⁷), among others (Figure 1d). In total, our sex-stratified comparison allowed us to highlight 227 significant loci unique to the female stratification and 263 significant loci unique to the male stratification. Among the 277 significant loci that overlapped between the female and male stratifications, none were located on the X chromosome.
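As a rough cross-check of the sex difference at rs7412, the two stratified effect estimates reported above can be compared with a simple two-sample z-test. This is a sketch under stated assumptions, not the analysis performed in this study: it treats the female and male estimates as independent and normally distributed, and the function name is hypothetical. It is a different test from the sex-by-genotype interaction model fitted in the sex-combined analysis, so it is not expected to reproduce the reported interaction p-value.

```python
from math import sqrt
from statistics import NormalDist


def heterogeneity_z_test(beta_a, se_a, beta_b, se_b):
    """Two-sided z-test for a difference between two independent effect estimates."""
    z = (beta_a - beta_b) / sqrt(se_a ** 2 + se_b ** 2)
    p = 2 * NormalDist().cdf(-abs(z))
    return z, p


# Effect sizes and standard errors for rs7412 and non-HDL cholesterol,
# as reported in the text (females first, then males).
z, p = heterogeneity_z_test(-0.37, 0.022, -0.17, 0.022)
print(f"z = {z:.2f}, p = {p:.2e}")
```

A large absolute z-score indicates that the two stratified estimates differ by more than their sampling error would explain, which points in the same direction as the interaction result shown in Figure 1b.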
Our results highlight the importance of assessing sex-differential effects genome-wide, which could have important implications for personalized medicine.

When using PheWeb 2 to cross-compare results from ancestry stratifications, the genetic associations with glycosylated hemoglobin (HbA1c) levels were the most prominent example in CLSA (Supplementary Figure 2). The T allele of missense variant rs1050828 within the G6PD gene on chromosome X was associated with decreased levels of HbA1c in the analysis considering all genetic ancestries (p = 5.90×10⁻¹¹, β[SE] = -0.91[0.14]), which agrees with findings from published GWAS of non-European ancestries or multi-ancestry meta-analyses [10-13]. The accompanying interactive table (Supplementary Figure 3) shows that the T allele was very rare in the CLSA combined genetic ancestries analysis (allele frequency 0.1%) and was absent in the European genetic ancestry stratification. PheWeb 2’s stacked regional plots for this locus did not show evidence of association in the European genetic ancestry stratifications (Supplementary Figure 4). This result exemplifies the importance of routinely including in GWAS both chromosome X variants and study participants of non-European genetic ancestries, even in studies like the CLSA that are largely (>90%) European.

By making stratified genotype-phenotype and sex-by-genotype interaction results readily explorable and cross-comparable, our extension and reworking of PheWeb into PheWeb 2 enables researchers to systematically identify and contextualize contrasting results from many GWAS within a single web browser tab. Overall, PheWeb 2 transitions from the paradigm of “one cohort - one GWAS per phenotype” to a more realistic model of “one cohort - multiple GWAS per phenotype.” Beyond the cohort stratifications currently implemented in PheWeb 2, its framework could also be used to cross-compare other sets of results, such as those from various non-linear association models.
However, the utility of any interactive web-based tool is contingent on the quality of the underlying summary statistics. Users must remain mindful of potential biases in the input data, including stratification imbalances, suboptimal phenotype definitions, or residual population structure. In summary, PheWeb 2 provides an essential, intuitive platform for exploring the differential genetic basis of complex traits in large-scale datasets, aligning with the critical need to understand sex and ancestry as key variables in the genetic architecture of human health and disease.

Online Methods

Inclusion and ethics statement

This study has been approved by the Ethics Committee of the Montreal Heart Institute, project number 2023-3206. The work carried out in this study was approved by the Canadian Longitudinal Study on Aging, project number 23ME002.

Genotype data quality control

Data from the CLSA comprehensive cohort at the baseline assessment were used. Specifically, 794,409 genotyped single nucleotide polymorphisms (SNPs) or short indels aligned on GRCh37 and assessed on the Affymetrix UK Biobank Axiom array were available from 26,622 study participants. Before proceeding with further analysis, we applied quality control measures at the level of individuals and genetic variants as previously described [9]. In brief, variants were initially filtered out based on significant deviation from Hardy-Weinberg equilibrium (p < 3.15×10⁻⁶), significant differences in genotype frequencies between five genotyping batches after accounting for genetic ancestry (Fisher’s exact test p 0.05 in at least one control sample across all batches. Lastly, variants that were found to be discordant between sexes in at least one genotyping batch were removed (Fisher’s exact test p < 3.15×10⁻¹⁵). In total, this removed 37,706 variants.
Individuals were excluded from further analyses if they were outliers based on heterozygosity and genotype missingness while accounting for their genetic ancestry, or if their self-reported sex did not match chromosomal sex (i.e. to exclude possible sample swaps). Individuals with sex chromosome aneuploidy were also excluded from further analysis due to limitations of downstream statistical methods. In total, 63 samples and 14,682 private genetic variants were excluded. As a result, the filtered genotype data included 26,563 individuals and 742,021 genetic variants. We lifted the filtered data to the GRCh38 human reference genome build using the UCSC LiftOver tool and excluded short indels and variants that did not map to GRCh38 or mapped to alternate contigs of GRCh38, were palindromic, or had alleles that did not match the reference genome after position lifting [14]. This step resulted in 682,191 genetic variants in the final filtered dataset on the GRCh38 human reference genome build, which were used in subsequent analyses.

Genotype imputation

We imputed the filtered genotype array data on the GRCh38 human reference genome build using the TOPMed r3 panel through the TOPMed Imputation Server [3]. This panel includes 133,597 deeply genotyped samples and 445,600,184 unique variants across the autosomes and chromosome X. As input for the TOPMed Imputation Server, we split the genotype array data into two overlapping batches, each containing 25,000 randomly allocated individuals, and imputed each batch separately. We merged the imputed batches using hds-util (https://github.com/statgen/hds-util.git). Specifically, we used imputed genotype dosages for 25,000 individuals from the first batch and imputed genotype dosages for the remaining 1,563 individuals from the second batch. We represented genotype dosages for males in the non-pseudo-autosomal regions of chromosome X as haploids (i.e. imputed genotype dosage ≤ 1).
[3,15,16] As a result of imputation, a total of 445,125,610 variants across the autosomes and chromosome X were available, leaving 138,546,712 after filtering for imputation quality r² ≥ 0.3.

Stratification by estimated genetic ancestry

We used LASER v2.04 [17] to project genotyping array data on 26,563 CLSA study individuals into the 20-dimensional principal component analysis (PCA) space of the combined Human Genome Diversity Project and 1000 Genomes Project reference panel (HGDP+1000G) [18,19]. Then, following the example from gnomAD v4 [20], we used a random forest model, trained on HGDP+1000G genetic ancestry labels, to predict the genetic ancestry labels of the projected CLSA study individuals. The 24,505 CLSA study individuals with ≥0.79 assignment probability to the European genetic ancestry label and genetically similar to their 10 closest neighbors from the reference panel (LASER’s Z score absolute value < 5) were included in the European stratification group.

Relatedness and population structure modelling

We used genotyping array data to account for the relatedness between individuals and for population stratification in our genome-wide association models. We selected only biallelic SNPs with minor allele frequency (MAF) > 1%, Hardy-Weinberg test p < 10⁻¹⁵, and per-variant and per-sample missing genotype frequencies < 10%. We further pruned SNPs based on linkage disequilibrium (LD) using the PLINK2 [21] --indep-pairwise option with a window size of 1000 kbp, a step size of 100 SNPs, and an r² LD threshold of 0.9 (i.e. two SNPs with r² > 0.9 were considered in LD). We also removed SNPs in known long-range LD regions [22] and low-complexity regions defined in the UCSC Genome Browser (RepeatMasker and WM+SDust tracks). We performed principal component analysis (PCA) on the pruned genotyping array data using the PLINK2 --pca option and extracted the top 20 principal components.
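The idea behind pairwise LD pruning can be illustrated with a simplified greedy sketch. This is not PLINK2's algorithm, which uses a sliding window with a step size and its own rule for choosing which variant of a correlated pair to drop. The sketch assumes a single chromosome, complete genotype dosages per variant, and hypothetical function names; it keeps a variant only if its squared Pearson correlation with every previously kept variant within the window is at or below the threshold.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant: correlation undefined, treated as unlinked
    return sxy * sxy / (sxx * syy)


def ld_prune(dosages, positions, window_bp=1_000_000, r2_max=0.9):
    """Return indices of variants kept after greedy pairwise pruning.

    dosages[i] holds the per-individual dosages of variant i and positions[i]
    its base-pair coordinate; all variants are assumed to be on one chromosome.
    """
    kept = []
    for idx in sorted(range(len(positions)), key=lambda i: positions[i]):
        in_ld = any(
            positions[idx] - positions[k] <= window_bp
            and r_squared(dosages[idx], dosages[k]) > r2_max
            for k in kept
        )
        if not in_ld:
            kept.append(idx)
    return kept


# Toy data: variants 0 and 1 are perfectly correlated, variant 2 is not.
toy_dosages = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [2, 0, 1, 1, 2, 0],
]
toy_positions = [100, 200, 300]
print(ld_prune(toy_dosages, toy_positions))
```

The defaults mirror the window size (1000 kbp) and r² threshold (0.9) stated above; the step-size parameter has no counterpart in this simplified version.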
We performed this variant pruning procedure and PCA on genotyping array data separately for all 26,563 individuals and for the 24,505 individuals in the European genetic ancestry stratification, resulting in 339,020 and 338,934 independent SNPs, respectively.

Phenotypes

From the CLSA comprehensive cohort baseline data, we retained 828 continuous traits, defined as those with more than 20 unique non-missing values [23-25]. Individuals with outlying phenotypic values (more than five standard deviations above or below the mean) were subsequently removed. Then, only the 587 phenotypes with more than 1000 non-missing values were considered. Lastly, 226 traits (of which one, age of menopause onset, was sex-specific) were retained for analysis after manual curation and removal of redundancy. These criteria were applied to the European genetic ancestry subset for the three stratifications (sex-combined, female-specific and male-specific), and then the same set of traits was assessed in the all ancestries analysis. Traits were classified into four categories (behavior, blood measure, health, physical measure) for PheWeb’s PheWAS view, based on the CLSA Data Preview Portal assignments.

Single variant association testing

To perform single variant genome-wide association testing, we employed Regenie v4.1 [26] on all traits across the six stratifications: All Combined (N = 26,559), European Combined (N = 24,505), All females (N = 13,240), European females (N = 12,324), All males (N = 13,319), European males (N = 12,181). Individuals with missing information for a trait were excluded from the analysis of that trait, leading to a small variation in sample sizes across tested traits. We applied rank inverse normal transformation to all continuous traits using the built-in Regenie option during both the estimation of polygenic effects on each trait and the association testing.
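A rank inverse normal transformation replaces each trait value with the standard normal quantile of its rank. The sketch below is illustrative only and is not Regenie's code: the function name is hypothetical, the Blom offset of 3/8 and the use of average ranks for ties are assumptions, and missing values are assumed to have been removed beforehand.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.375):
    """Map values to standard normal quantiles by rank (ties share the average rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j to the end of the current group of tied values.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    # Blom-type formula: quantile of (rank - c) / (n - 2c + 1).
    return [
        standard_normal.inv_cdf((rank - offset) / (n - 2 * offset + 1))
        for rank in ranks
    ]


# Toy trait values (made up); the two tied values receive the same score.
trait = [3.1, 1.2, 5.0, 1.2, 4.4]
transformed = rank_inverse_normal(trait)
print([round(x, 3) for x in transformed])
```

The transformation depends only on ranks, so it is insensitive to the scale and skew of the original trait, which is why it is commonly applied before association testing of continuous traits.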
To estimate polygenic effects, we used the pruned genotyping array data that corresponded to the stratification. We also included the following covariates: the first 20 principal components corresponding to the stratification, genotyping batch (1-5), sex, participant age at the data collection site visit, and age squared to account for possible non-linear relationships. For sex-stratified analyses, sex was not included as a covariate. For sex-combined analyses, we estimated genotype-by-sex interaction effects using Regenie’s built-in --interaction option. Variants with a minor allele count (MAC) < 5 were not used in the single variant association tests, as recommended in Regenie’s documentation.

Counting overlapping significant GWAS loci between stratifications

For this analysis, we used only tested variants with effect allele frequency (EAF) > 0.01 that showed a statistically significant association with a phenotype (p < 5×10⁻⁸). For each phenotype within each stratification, we first defined significant GWAS loci as 500 kbp regions around each lead variant (the variant with the most significant association). Then, we merged any overlapping loci within the same phenotype and stratification into a single locus (chromosome X PAR regions were treated separately from non-PAR). This resulted in a set of non-overlapping loci of ≥1 Mbp in length for each phenotype within each stratification. When comparing loci for a phenotype between two stratifications (e.g. female vs male), we declared two or more loci equivalent if they overlapped (i.e. even if the corresponding lead variants were different).

CLSA PheWeb instance

We deployed an instance of PheWeb for the CLSA study on a single Linux-based virtual machine (8 vCPUs and 32 GB of memory) on the Secure Data for Health (SD4H, http://www.sd4health.ca/) cloud, which uses the open-source OpenStack [27] cloud computing infrastructure software.
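The locus-counting procedure described in the Methods can be sketched as follows. This is one reading of the text rather than the authors' code (which is linked under Code Availability): it assumes a flank of 500 kbp on each side of a lead variant, consistent with the stated minimum locus length of 1 Mbp, picks lead variants greedily by p-value, and assumes the EAF filter was applied upstream. All names and the toy variants are hypothetical; PAR and non-PAR regions of chromosome X could be kept separate by giving them distinct chromosome labels.

```python
from dataclasses import dataclass

FLANK_BP = 500_000   # assumed flank on each side of a lead variant
P_THRESHOLD = 5e-8   # genome-wide significance threshold from the text


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    pvalue: float


def lead_variant_windows(variants):
    """Greedily pick lead variants; return one (chrom, start, end) window per lead."""
    hits = sorted((v for v in variants if v.pvalue < P_THRESHOLD),
                  key=lambda v: v.pvalue)
    windows = []
    for v in hits:
        covered = any(c == v.chrom and s <= v.pos <= e for c, s, e in windows)
        if not covered:  # not inside the window of a stronger lead variant
            windows.append((v.chrom, max(0, v.pos - FLANK_BP), v.pos + FLANK_BP))
    return windows


def merge_overlapping(windows):
    """Merge overlapping windows on the same chromosome into single loci."""
    merged = []
    for chrom, start, end in sorted(windows):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def call_loci(variants):
    return merge_overlapping(lead_variant_windows(variants))


def split_shared_unique(loci_a, loci_b):
    """Split loci_a into loci that overlap any locus in loci_b and loci that do not."""
    def overlaps(x, y):
        return x[0] == y[0] and x[1] <= y[2] and y[1] <= x[2]
    shared = [a for a in loci_a if any(overlaps(a, b) for b in loci_b)]
    unique = [a for a in loci_a if a not in shared]
    return shared, unique


# Toy summary statistics (made-up positions and p-values).
female = [Variant("19", 1_000_000, 1e-20), Variant("19", 1_600_000, 1e-10),
          Variant("2", 5_000_000, 1e-9)]
male = [Variant("19", 1_200_000, 1e-12), Variant("7", 3_000_000, 1e-3)]
female_loci, male_loci = call_loci(female), call_loci(male)
shared, female_only = split_shared_unique(female_loci, male_loci)
print(shared, female_only)
```

Because equivalence is defined by overlap rather than by identical lead variants, the chromosome 19 locus in this toy example counts as shared even though the female and male lead variants differ.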
Terabytes of compressed GWAS summary-level results are stored in a cloud object store and queried by PheWeb 2 using the S3 protocol, which reduces the disk footprint of the virtual machine. Only genotype-phenotype association results for variants with genotype imputation quality r² > 0.3 were ingested into the CLSA PheWeb instance. The sex-by-genotype interaction results were ingested into the CLSA PheWeb instance only for genetic variants with at least 160 minor alleles in the study population.

Declarations

Data Availability

Individual-level data are available from the Canadian Longitudinal Study on Aging (http://www.clsa-elcv.ca) for researchers who meet the criteria for access to de-identified CLSA data. The presented summary-level data are freely available for querying and download through CLSA PheWeb 2 at https://clsa-pheweb.cerc-genomic-medicine.ca.

Code Availability

PheWeb 2’s codebase is open source and hosted on GitHub at https://github.com/GaglianoTaliun-Lab/PheWeb2 and the corresponding API at https://github.com/GaglianoTaliun-Lab/PheWeb2-api. The code used to generate specific results described in this manuscript is at https://github.com/GaglianoTaliun-Lab/PheWeb2-manuscript.

Acknowledgements

The authors thank Andrew P Boughton and Ryan P Welch for their contributions to LocusZoom features. The authors acknowledge Yann Ilboudo, Divya Joshi, Joosung Min, Olga Vishnyakova, and Satoshi Yoshiji, as well as Isabel Fortier and Rita Wissa (from Maelstrom Research), for discussion and feedback on important aspects of this work. The AB SCREEN™ II assessment tool is owned by Dr. Heather Keller. Use of the AB SCREEN™ II assessment tool was made under license from the University of Guelph. This research was made possible using the data/biospecimens collected by the Canadian Longitudinal Study on Aging (CLSA).
Funding for the Canadian Longitudinal Study on Aging (CLSA) is provided by the Government of Canada through the Canadian Institutes of Health Research (CIHR) under grant reference LSA 94473 and the Canada Foundation for Innovation, as well as the following provinces: Newfoundland, Nova Scotia, Quebec, Ontario, Manitoba, Alberta, and British Columbia. This research has been conducted using the CLSA datasets baseline comprehensive dataset – version 7.0 and Genome-wide genetic data – version 3.0, under Application Number 23ME002. The CLSA is led by Drs. Parminder Raina, Christina Wolfson and Susan Kirkland. The time and commitment of the participants in the CLSA study platform, without whom this research would not be possible, are gratefully acknowledged. The opinions expressed in this manuscript are the authors' own and do not reflect the views of the Canadian Longitudinal Study on Aging. This research was enabled in part by support provided by Calcul Quebec (https://www.calculquebec.ca), Digital Research Alliance of Canada (https://www.alliancecan.ca), and Secure Data for Health (SD4H, http://www.sd4health.ca/). SAGT and DT acknowledge support from the Canadian Institutes of Health Research (CIHR) (AD6-192920, PJT-197954 and AD7-200181). SAGT acknowledges salary support from a Fonds de Recherche du Québec – Santé (FRQS; https://frq.gouv.qc.ca) Junior 2 Award (https://doi.org/10.69777/347366). DT acknowledges the support of the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant (RGPIN-2025-06572). JB and MK were supported by CIHR Canada Graduate Scholarships – Master's (CGS M) awards.

Author Contributions

JB, HX, LC, MK, SW, PV, SAGT and DT performed analyses. AJM curated the selection of DXA-related phenotypes. PR contributed to data curation. SAGT and DT supervised the work, secured funding and ensured that the necessary computational resources were available. All authors approved the final manuscript.

Competing Interests

The authors declare no competing interests.

References

1. All of Us Research Program Genomics Investigators. Genomic data in the All of Us Research Program. Nature 627, 340-346 (2024). https://doi.org/10.1038/s41586-023-06957-x
2. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203-209 (2018). https://doi.org/10.1038/s41586-018-0579-z
3. Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290-299 (2021). https://doi.org/10.1038/s41586-021-03205-y
4. Zhang, X. et al. Whole genome sequencing analysis of body mass index identifies novel African ancestry-specific risk allele. Nat Commun 16, 3470 (2025). https://doi.org/10.1038/s41467-025-58420-2
5. Yang, M. L. et al. Sex-specific genetic architecture of blood pressure. Nat Med 30, 818-828 (2024). https://doi.org/10.1038/s41591-024-02858-2
6. Silveira, P. P., Pokhvisneva, I., Howard, D. M. & Meaney, M. J. A sex-specific genome-wide association study of depression phenotypes in UK Biobank. Mol Psychiatry 28, 2469-2479 (2023). https://doi.org/10.1038/s41380-023-01960-0
7. Gagliano Taliun, S. A. et al. Exploring and visualizing large-scale genetic associations by using PheWeb. Nat Genet 52, 550-552 (2020). https://doi.org/10.1038/s41588-020-0622-5
8. Raina, P. et al. Cohort Profile: The Canadian Longitudinal Study on Aging (CLSA). Int J Epidemiol 48, 1752-1753j (2019). https://doi.org/10.1093/ije/dyz173
9. Forgetta, V. et al. Cohort profile: genomic data for 26 622 individuals from the Canadian Longitudinal Study on Aging (CLSA). BMJ Open 12, e059021 (2022). https://doi.org/10.1136/bmjopen-2021-059021
10. Wheeler, E. et al. Impact of common genetic determinants of Hemoglobin A1c on type 2 diabetes risk and diagnosis in ancestrally diverse populations: A transethnic genome-wide meta-analysis. PLoS Med 14, e1002383 (2017). https://doi.org/10.1371/journal.pmed.1002383
11. Willems, S. M. et al. Large-scale exome array summary statistics resources for glycemic traits to aid effector gene prioritization. Wellcome Open Res 8, 483 (2023). https://doi.org/10.12688/wellcomeopenres.18754.1
12. Moon, J. Y. et al. A Genome-Wide Association Study Identifies Blood Disorder-Related Variants Influencing Hemoglobin A(1c) With Implications for Glycemic Status in U.S. Hispanics/Latinos. Diabetes Care 42, 1784-1791 (2019). https://doi.org/10.2337/dc19-0168
13. Chen, J. et al. The trans-ancestral genomic architecture of glycemic traits. Nat Genet 53, 840-860 (2021). https://doi.org/10.1038/s41588-021-00852-9
14. Perez, G. et al. The UCSC Genome Browser database: 2025 update. Nucleic Acids Res 53, D1243-D1249 (2025). https://doi.org/10.1093/nar/gkae974
15. Das, S. et al. Next-generation genotype imputation service and methods. Nat Genet 48, 1284-1287 (2016). https://doi.org/10.1038/ng.3656
16. Loh, P. R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat Genet 48, 1443-1448 (2016). https://doi.org/10.1038/ng.3679
17. Taliun, D. et al. LASER server: ancestry tracing with genotypes or sequence reads. Bioinformatics 33, 2056-2058 (2017). https://doi.org/10.1093/bioinformatics/btx075
18. Koenig, Z. et al. A harmonized public resource of deeply sequenced diverse human genomes. Genome Res 34, 796-809 (2024). https://doi.org/10.1101/gr.278378.123
19. Wang, C., Zhan, X., Liang, L., Abecasis, G. R. & Lin, X. Improved ancestry estimation for both genotyping and sequencing data using projection procrustes analysis and genotype imputation. Am J Hum Genet 96, 926-937 (2015). https://doi.org/10.1016/j.ajhg.2015.04.018
20. Chen, S. et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 625, 92-100 (2024). https://doi.org/10.1038/s41586-023-06045-0
21. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015). https://doi.org/10.1186/s13742-015-0047-8
22. Anderson, C. A. et al. Data quality control in genetic case-control association studies. Nat Protoc 5, 1564-1573 (2010). https://doi.org/10.1038/nprot.2010.116
23. Teng, E. The Mental Alternations Test (MAT). The Clinical Neuropsychologist 9, 287 (1995).
24. O'Connell, M. E. et al. Methodological considerations when establishing reliable and valid normative data: Canadian Longitudinal Study on Aging (CLSA) neuropsychological battery. Clin Neuropsychol 36, 2168-2187 (2022). https://doi.org/10.1080/13854046.2021.1954243
25. Kessler, R. C. et al. Screening for serious mental illness in the general population. Arch Gen Psychiatry 60, 184-189 (2003). https://doi.org/10.1001/archpsyc.60.2.184
26. Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat Genet 53, 1097-1103 (2021). https://doi.org/10.1038/s41588-021-00870-7
27. Sefraoui, O., Aissaoui, M. & Eleuldj, M. OpenStack: Toward an Open-Source Solution for Cloud Computing. International Journal of Computer Applications 55, 38-42 (2012).

Additional Declarations

There is NO Competing Interest.

Supplementary Files

SupplementaryInformation.pdf
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-7463215","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Brief Communication","associatedPublications":[],"authors":[{"id":505847991,"identity":"8993e64e-9bb6-4fc1-b3bd-83a9d129cfa0","order_by":0,"name":"Justin Bellavance*","email":"","orcid":"https://orcid.org/0009-0006-8828-4942","institution":"Université de Montréal","correspondingAuthor":false,"prefix":"","firstName":"Justin","middleName":"","lastName":"Bellavance*","suffix":""},{"id":505847992,"identity":"150ef778-5d22-4b41-ab82-55ea3949c3b8","order_by":1,"name":"Hongyu Xiao*","email":"","orcid":"","institution":"McGill University","correspondingAuthor":false,"prefix":"","firstName":"Hongyu","middleName":"","lastName":"Xiao*","suffix":""},{"id":505847993,"identity":"cf208010-9960-49f1-bfc6-c7e01c91dbae","order_by":2,"name":"Le Chang","email":"","orcid":"","institution":"Université de Montréal","correspondingAuthor":false,"prefix":"","firstName":"Le","middleName":"","lastName":"Chang","suffix":""},{"id":505847994,"identity":"b68f8184-be33-4565-acdc-6391591b524c","order_by":3,"name":"Mehrdad Kazemi","email":"","orcid":"","institution":"McGill University","correspondingAuthor":false,"prefix":"","firstName":"Mehrdad","middleName":"","lastName":"Kazemi","suffix":""},{"id":505847995,"identity":"334414b7-e3bd-4e74-8268-5615217ba9bb","order_by":4,"name":"Seyla Wickramasinghe","email":"","orcid":"","institution":"McGill 
University","correspondingAuthor":false,"prefix":"","firstName":"Seyla","middleName":"","lastName":"Wickramasinghe","suffix":""},{"id":505847996,"identity":"b65eddd6-8dd8-4005-9a77-12ec2a614126","order_by":5,"name":"Alexandra J. Mayhew","email":"","orcid":"","institution":"McMaster University","correspondingAuthor":false,"prefix":"","firstName":"Alexandra","middleName":"J.","lastName":"Mayhew","suffix":""},{"id":505847997,"identity":"87ed26bb-1837-457b-87b1-af3e46c35a10","order_by":6,"name":"Parminder Raina","email":"","orcid":"https://orcid.org/0000-0002-8107-3193","institution":"McMaster University","correspondingAuthor":false,"prefix":"","firstName":"Parminder","middleName":"","lastName":"Raina","suffix":""},{"id":505847998,"identity":"44ec44e1-ba2d-4db8-998e-baa9b35c0d34","order_by":7,"name":"Peter VandeHaar","email":"","orcid":"","institution":"University of Michigan","correspondingAuthor":false,"prefix":"","firstName":"Peter","middleName":"","lastName":"VandeHaar","suffix":""},{"id":505847999,"identity":"36db5b66-2496-44db-a50c-7aa118abdfd4","order_by":8,"name":"Daniel Taliun","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABA0lEQVRIiWNgGAWjYDCCAyBUAGSwMzYwMFSAhBgbDzwgqMWAQYKBGaTlDFhLw4EEAloYIFpAituggvi08B0/e/DABwOGOv5m5jbpwnmHo/n7F4NssZPHpUXyTF7CwRlAWyQOM7ZJz9x2OHfGjYcgLcmGDTi0GBzIMTjMA3IYSAsvUMsGiYMgLQcYcWo5/8bg8B+gFnmwljkILfY4tdwA2gLyvgFYSwNQC38jWEsiLi2SN94YHOwxkJDceJix2ZrnWDrQL6BANkhOxqWF73yO8YcfFTb8csfbH97mqbHO7e8//vDBhwo7W1xaoEACmZ0AcjB+9WiA/wBJykfBKBgFo2D4AwDEHWF1LfNHEwAAAABJRU5ErkJggg==","orcid":"","institution":"McGill University","correspondingAuthor":true,"prefix":"","firstName":"Daniel","middleName":"","lastName":"Taliun","suffix":""},{"id":505847990,"identity":"5a36a4af-b6cd-4b85-aa8d-569e240cf861","order_by":9,"name":"Sarah A. 
Gagliano Taliun","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABRklEQVRIie3PMUvDQBQH8HccxOVK1icHva8Q6SJEcfBjuFgK7VSoW4YaIoFk0qwJFPsV2sW5EEiXuAe6WAqdCwURKuo1KdWgcRa8/3C8u3c/3h2AisofDc3XA+LAZ2Htu8fVhBYEiyLdN/AXUmwwL4hXTYT/OF/34FqIc3qz7llg6y71+NV9fKHzIFmt+gi6P/lKjLTT4CFMj8YxcXmYAmJMPB49xN1oENMoTBAwvSwRaANnkJCxSxxe894RtqQmyShrUco0lHdKRARLupHkTBJ3U/MARU4GO/L6Jon+VPpM1tbklH5zSLc3JTFy4uwI8STB8sOypWYyY9Iaya7JUjjcfspkSacbha0Gub1DhllpigjadMYs+3To+wtZgF6f+vMZ65vdAJsLeHk+qetBacpuVgzG5PtxEfbzsQ3CqSIqKioq/z4fPDBskxDLnfgAAAAASUVORK5CYII=","orcid":"https://orcid.org/0000-0003-1306-1868","institution":"Université de Montréal","correspondingAuthor":true,"prefix":"","firstName":"Sarah","middleName":"A. Gagliano","lastName":"Taliun","suffix":""}],"badges":[],"createdAt":"2025-08-26 13:06:13","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7463215/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7463215/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41588-025-02469-8","type":"published","date":"2026-01-19T05:00:00+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":90965848,"identity":"11c4cec5-b511-4219-923a-f11c7f61f928","added_by":"auto","created_at":"2025-09-10 06:36:59","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":1138833,"visible":true,"origin":"","legend":"\u003cp\u003eScreenshots of CLSA PheWeb 2 as of June 2025. \u003cstrong\u003ea\u003c/strong\u003e) Miami plot view for GWAS results of non-high-density lipoprotein cholesterol in blood (non HDL) in individuals of all genetic ancestries. The upper portion displays GWAS results in the female-only stratification (N = 13,187), whereas the lower portion displays male-specific GWAS results for the male-only stratification (N =13,205). 
\u003cstrong\u003eb\u003c/strong\u003e) Manhattan plot view for sex-by-genotype interaction genome-wide testing results. \u003cstrong\u003ec\u003c/strong\u003e) Regional view (LocusZoom) for variant rs7412 for female, all ancestries (top) and male, all ancestries (bottom) stratifications, with known variants within the NHGRI-EBI GWAS Catalog presented in the track above (“Hits in GWAS Catalog”), and nearby genes displayed below. \u003cstrong\u003ed\u003c/strong\u003e) Phenome-wide view (PheWAS) for variant rs7412, showing phenome-wide significant associations with other related traits assessed in CLSA, such as high-density lipoprotein cholesterol (HDL) and low-density lipoprotein cholesterol (LDL). Traits are sorted and colored according to categories from the CLSA’s Maelstrom portal (health, blood measurements, behavior and physical traits). The direction of the effect of the tested allele on each trait is indicated by upward-facing (trait-increasing) or downward-facing (trait-decreasing) triangles.\u003c/p\u003e","description":"","filename":"figure1manuscript3.png","url":"https://assets-eu.researchsquare.com/files/rs-7463215/v1/e12e8576b7bc5a5a9c44f73b.png"},{"id":100662318,"identity":"90612d97-c8f4-4155-ba1e-cf83340023ad","added_by":"auto","created_at":"2026-01-20 08:58:51","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1655249,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7463215/v1/90d45115-d424-4340-966b-20a743a191ef.pdf"},{"id":90965847,"identity":"760015f6-0866-4527-8d75-7bdf8c1d19c6","added_by":"auto","created_at":"2025-09-10 06:36:59","extension":"pdf","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":1033575,"visible":true,"origin":"","legend":"Supplementary 
Material","description":"","filename":"SupplementaryInformation.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7463215/v1/9cd6f9e90e288c069b148a6b.pdf"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing Interest.","formattedTitle":"Exploring and visualizing stratified genome-wide association study results with PheWeb 2","fulltext":[{"header":"Introduction","content":"\u003cp\u003eThe increasing scale and diversity of population-based biobanks now enable genome-wide association studies (GWAS) in hundreds of thousands of individuals.\u003csup\u003e1-3\u003c/sup\u003e This scale enables stratified analyses to uncover sex- or ancestry-differential genetic effects (e.g. \u003csup\u003e4-6\u003c/sup\u003e). However, a critical bottleneck remains: the lack of functionalities in existing interactive web-based tools for intuitive exploration and comparison of stratified GWAS summary-level data, which hinders the interpretation of results into biological insights. Our original PheWeb, for instance, has facilitated biobank-scale exploration of GWAS results and has been adopted by numerous international consortia.\u003csup\u003e7\u003c/sup\u003e Here, we introduce PheWeb 2, a complete rewrite of the software designed to support the next generation of genetic discoveries. Within a single interactive environment, PheWeb 2 enables researchers to integrate and query GWAS results not only across hundreds of phenotypes but also across numerous cohort stratifications.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eKey functionality enhancements in PheWeb 2 include interactive Miami plots and stacked LocusZoom and PheWAS plots, where researchers can dynamically select GWAS stratifications of interest for visual side-by-side comparison. Each plot is synchronized with the accompanying dynamic table, which in real-time joins variant-level association results across any selected pair of stratifications. 
When exploring Miami plots, the analyst can instantly switch between views displaying statistics of genotype-phenotype associations and sex-by-genotype interactions. The addition of this new interaction feature facilitates an integrated dynamic workflow for investigating potential differential genetic effects across GWAS stratifications of the same trait and across the traits within a single PheWeb 2 instance.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTo exemplify the utility of PheWeb 2, we applied it to visualize and query stratified GWAS results for human traits using data from the Canadian Longitudinal Study on Aging (CLSA).\u003csup\u003e8,9\u003c/sup\u003e In brief, the CLSA is a national, longitudinal research platform that included 51,338 participants aged 45 to 85 years at baseline from the 10 Canadian provinces, of whom 30,097 from 7 provinces (comprehensive cohort) completed in-person interviews including physical assessments, blood and urine samples, and body composition assessed using dual-energy X-ray absorptiometry (DXA).\u003csup\u003e8\u003c/sup\u003e Details regarding the study design have been previously published. Over 26,000 participants from the CLSA comprehensive cohort had genetic data available at baseline.\u003csup\u003e9\u003c/sup\u003e \u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe CLSA PheWeb 2 instance displays genotype-phenotype association results for 48,784,824 well-imputed (imputation quality r\u003csup\u003e2\u003c/sup\u003e\u0026gt;0.3) autosomal and chromosome X variants with at least five minor alleles, and sex-by-genotype interaction results for 11,274,702 variants with at least 160 minor alleles in the study population. In total, we analyzed 226 unique traits across six stratifications, of which 224 had at least one genome-wide significant locus in at least one stratification (p \u0026lt; 5\u0026times;10\u003csup\u003e-8\u003c/sup\u003e). 
Details on quality control and analyses are provided in the \u003cstrong\u003eOnline Methods\u003c/strong\u003e. \u0026nbsp;\u003c/p\u003e\n\u003cp\u003eWe showcase how PheWeb 2 facilitates the discovery of sex-differential genetic effects using sex-stratified results of non-high-density lipoprotein (non HDL) cholesterol levels as an illustrative example (\u003cstrong\u003eFigure 1a, Supplementary Figure 1\u003c/strong\u003e). Variants in the \u003cem\u003eAPOE\u003c/em\u003e gene, such as missense variant rs7412, were significantly associated with non HDL in both the female and male stratifications in CLSA participants from all genetic ancestries (females: N = 13,187, p = 3.20\u0026times;10\u003csup\u003e-66\u003c/sup\u003e, \u0026beta;[SE] = -0.37[0.022]; males: N = 13,205, p = 1.00\u0026times;10\u003csup\u003e-15\u003c/sup\u003e, \u0026beta;[SE] = -0.17[0.022]) (\u003cstrong\u003eFigure 1a\u003c/strong\u003e). When investigating the sex-by-genotype interaction results for non HDL via PheWeb 2\u0026rsquo;s new interaction Manhattan plot (\u003cstrong\u003eFigure 1b\u003c/strong\u003e), the T allele of rs7412 was identified as being differentially associated between the female and male stratifications (p = 1.1\u0026times;10\u003csup\u003e-8\u003c/sup\u003e, \u0026beta;[SE] = -0.19[0.033]). Stacked regional LocusZoom plots in PheWeb 2 (\u003cstrong\u003eFigure 1c\u003c/strong\u003e) allowed for intuitive comparison between two selected stratifications and a quick deduction that rs7412 is in linkage disequilibrium (r\u003csup\u003e2\u003c/sup\u003e \u0026gt; 0.8) with rs1065853. The integration of the GWAS Catalog track above the stacked regional plots showed that there are many known trait associations within this region, such as Alzheimer\u0026rsquo;s disease, blood protein levels, cerebral amyloid deposit (PET imaging) and low-density lipoprotein (LDL) cholesterol levels. 
When looking at PheWeb 2\u0026rsquo;s stacked PheWAS plots for the all ancestries female stratification and the all ancestries male stratification for rs7412, this variant shows evidence of association with other related continuous traits assessed in CLSA PheWeb 2, including total cholesterol (females: p = 1.80\u0026times;10\u003csup\u003e-42\u003c/sup\u003e; males: p = 2.00\u0026times;10\u003csup\u003e-12\u003c/sup\u003e), low-density lipoprotein (females: p = 2.00\u0026times;10\u003csup\u003e-95\u003c/sup\u003e; males: p = 5.90\u0026times;10\u003csup\u003e-39\u003c/sup\u003e), and triglycerides (females: p = 1.10\u0026times;10\u003csup\u003e-6\u003c/sup\u003e; males: p = 2.40\u0026times;10\u003csup\u003e-17\u003c/sup\u003e), among others (\u003cstrong\u003eFigure 1d\u003c/strong\u003e).\u0026nbsp;In total, our sex-stratified comparison allowed us to highlight\u0026nbsp;227 significant loci unique to the female stratification and 263 significant loci unique to the male stratification. Among the 277 significant loci that overlapped between the female and male stratifications, none were located on the X chromosome. Our results highlight the importance of assessing sex-differential effects genome-wide, which could have important implications for personalized medicine.\u003c/p\u003e\n\u003cp\u003eWhen using PheWeb 2 to cross-compare results from ancestry stratifications, the genetic associations with glycosylated hemoglobin (HbA1c) levels were the most prominent example in CLSA (\u003cstrong\u003eSupplementary Figure 2\u003c/strong\u003e). 
The T allele of missense variant rs1050828 within the \u003cem\u003eG6PD\u0026nbsp;\u003c/em\u003egene on chromosome X was associated with decreased levels of HbA1c in the analysis considering all genetic ancestries (p = 5.90\u0026times;10\u003csup\u003e-11\u003c/sup\u003e, \u0026beta;[SE] = -0.91[0.14]), which agrees with findings from published GWAS of non-European ancestries or multi-ancestry meta-analyses.\u003csup\u003e10-13\u003c/sup\u003e The accompanying interactive table (\u003cstrong\u003eSupplementary Figure 3\u003c/strong\u003e) shows that the T allele was very rare in the CLSA combined genetic ancestries analysis (allele frequency 0.1%) and was absent in the European genetic ancestry stratification. PheWeb 2\u0026rsquo;s stacked regional plots for this locus did not show evidence of association in the European genetic ancestry stratifications (\u003cstrong\u003eSupplementary Figure 4\u003c/strong\u003e). This result exemplifies the importance of routinely including in GWAS both chromosome X variants and study participants of non-European genetic ancestries, even in studies like the CLSA that are largely (\u0026gt;90%) European.\u003c/p\u003e\n\u003cp\u003eBy making stratified genotype-phenotype and sex-by-genotype interaction results readily explorable and cross-comparable, our extension and reworking of PheWeb into PheWeb 2 enables researchers to systematically identify and contextualize contrasting results from many GWAS within a single web browser tab. Overall, PheWeb 2 transitions from the paradigm of \u0026ldquo;one cohort - one GWAS per phenotype\u0026rdquo; to a more realistic model of \u0026ldquo;one cohort - multiple GWAS per phenotype.\u0026rdquo; In addition to the cohort stratifications currently implemented in PheWeb 2, its framework can be used for cross-comparisons of various non-linear association models, among others. 
However, the utility of any interactive web-based tool is contingent on the quality of the underlying summary statistics. Users must remain mindful of potential biases in the input data, including stratification imbalances, suboptimal phenotype definitions, or residual population structure. In summary, PheWeb 2 provides an essential, intuitive platform for exploring the differential genetic basis of complex traits in large-scale datasets, aligning with the critical need to understand sex and ancestry as key variables in the genetic architecture of human health and disease.\u003c/p\u003e"},{"header":"Online Methods","content":"\u003cp\u003e\u003cem\u003eInclusion and ethics statement\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eThis study was approved by the Ethics Committee of the Montreal Heart Institute, project number 2023-3206. The work carried out in this study was approved by the Canadian Longitudinal Study on Aging, project number 23ME002. \u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eGenotype data quality control\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eData from the CLSA comprehensive cohort at the baseline assessment were used. 
Specifically, 794,409 genotyped single nucleotide polymorphisms (SNPs) or short indels aligned on GRCh37 assessed on the Affymetrix UK Biobank Axiom array were available from 26,622 study participants.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eBefore proceeding with further analysis, we applied quality control measures at the level of individuals and genetic variants as previously described.\u003csup\u003e9\u003c/sup\u003e In brief, variants were initially filtered out based on significant deviation from Hardy-Weinberg equilibrium \u0026nbsp;(p \u0026lt; 3.15\u0026times;10\u003csup\u003e-6\u003c/sup\u003e), significant differences in genotype frequencies between five genotyping batches after accounting for genetic ancestry (Fisher\u0026rsquo;s exact test p \u0026lt; 3.15 \u0026times;10\u003csup\u003e-10\u003c/sup\u003e), and genotype discordance \u0026gt;0.05 in at least one control sample across all batches. Lastly, variants that were found to be discordant between sexes in at least one genotyping batch were removed (Fisher\u0026rsquo;s exact test p \u0026lt; 3.15\u0026times;10\u003csup\u003e-15\u003c/sup\u003e). In total, this removed 37,706 variants.\u003c/p\u003e\n\u003cp\u003eIndividuals were excluded from further analyses if they were outliers based on heterozygosity and genotype missingness while accounting for their genetic ancestry, or if their self-reported sex did not match chromosomal sex (i.e. to exclude possible sample swaps). Individuals with sex chromosome aneuploidy were also excluded from further analysis due to limitations of downstream statistical methods. In total, 63 samples and 14,682 private genetic variants were excluded. As a result, the filtered genotype data included 26,563 individuals and 742,021 genetic variants. 
We lifted the filtered data to the GRCh38 human reference genome build using the UCSC LiftOver tool and excluded short indels and variants that did not map to GRCh38 or mapped to alternate contigs of GRCh38, were palindromic, or that had alleles which didn\u0026rsquo;t match the reference genome after position lifting.\u003csup\u003e14\u003c/sup\u003e This step resulted in 682,191 genetic variants in the final filtered dataset on the GRCh38 human reference genome build, which were used in subsequent analyses.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eGenotype imputation\u0026nbsp;\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eWe imputed the filtered genotype array data on the GRCh38 human reference genome build using the TOPMed r3 panel through the TOPMed Imputation Server.\u003csup\u003e3\u003c/sup\u003e This panel includes 133,597 deeply genotyped samples and 445,600,184 unique variants across the autosomes and chromosome X. As input for the TOPMed Imputation Server, we split the genotype array data into two overlapping batches, each containing 25,000 randomly allocated individuals, and imputed each batch separately. We merged the imputed batches using hds-util (https://github.com/statgen/hds-util.git). Specifically, we used imputed genotype dosages for 25,000 individuals from the first batch and imputed genotype dosages for the remaining 1,563 individuals from the second batch. We represented genotype dosages for males in the non-pseudo-autosomal regions of chromosome X as haploids (i.e. imputed genotype dosage \u0026le; 1).\u003csup\u003e3,15,16\u003c/sup\u003e As a result of imputation, a total of 445,125,610 variants across the autosomes or chromosome X were available, leaving 138,546,712 after filtering for imputation quality r\u003csup\u003e2\u003c/sup\u003e\u0026ge;0.3. 
\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eStratification by estimated genetic ancestry\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eWe used LASER v2.04\u003csup\u003e17\u003c/sup\u003e to project genotyping array data on 26,563 CLSA study individuals into the 20-dimensional principal component analysis (PCA) space of the combined Human Genome Diversity Project and 1000 Genomes Project reference panel (HGDP+1000G).\u003csup\u003e18,19\u003c/sup\u003e Then, following the example from gnomAD v4,\u003csup\u003e20\u003c/sup\u003e we used a random forest model, trained on HGDP+1000G genetic ancestry labels, to predict the genetic ancestry labels of the projected CLSA study individuals. The 24,505 CLSA study individuals with \u0026ge;0.79 assignment probability to the European genetic ancestry label who were genetically similar to their 10 closest neighbors from the reference panel (LASER\u0026rsquo;s Z score absolute value \u0026lt; 5) were included in the European stratification group.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eRelatedness and population structure modelling\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eWe used genotyping array data to account for the relatedness between individuals and for population stratification in our genome-wide association models. We selected only biallelic SNPs with minor allele frequency (MAF) \u0026gt;1%, Hardy-Weinberg test p \u0026lt; 10\u003csup\u003e-15\u003c/sup\u003e, and per-variant and per-sample missing genotype frequencies \u0026lt; 10%. We further pruned SNPs based on linkage disequilibrium (LD) using the PLINK2\u003csup\u003e21\u003c/sup\u003e --indep-pairwise option with a window size of 1000 kbp, a step size of 100 SNPs, and an r\u003csup\u003e2\u003c/sup\u003e LD threshold of 0.9 (i.e. two SNPs with r\u003csup\u003e2\u0026nbsp;\u003c/sup\u003e\u0026gt; 0.9 were considered in LD). 
We also removed SNPs in known long-range LD regions\u003csup\u003e22\u003c/sup\u003e and low-complexity regions defined in the UCSC Genome Browser (RepeatMasker and WM+SDust tracks). We performed principal component analyses (PCA) on pruned genotyping array data using the PLINK2 --pca option and extracted the top 20 principal components. We performed this variant pruning procedure and PCA on genotyping array data separately for all 26,563 individuals and for the 24,505 individuals in the European genetic ancestry stratification, resulting in 339,020 and 338,934 independent SNPs, respectively.\u003cem\u003e\u0026nbsp;\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003e\u003cem\u003ePhenotypes\u0026nbsp;\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eFrom the CLSA comprehensive cohort baseline data, we retained 828 continuous traits, defined as those with more than 20 unique non-missing values.\u003csup\u003e23-25\u003c/sup\u003e Individuals with outlying phenotypic values (values more than five standard deviations above or below the mean) were subsequently removed. Then, only 587 phenotypes with more than 1,000 non-missing values were considered. Lastly, 226 total traits (of which one was sex-specific, age of menopause onset) were retained for analysis after manual curation and removal of redundancy. These criteria were applied to the European genetic ancestry subset for the three stratifications (sex-combined, female-specific and male-specific), and then the same set of traits was assessed in the all ancestries analysis. 
Traits were classified into four categories (behavior, blood measure, health, physical measure) for PheWeb\u0026rsquo;s PheWAS view, based on the CLSA Data Preview Portal assignments.\u003cem\u003e\u0026nbsp;\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eSingle variant association testing\u0026nbsp;\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eTo perform single variant genome-wide association testing, we employed Regenie v4.1\u003csup\u003e26\u003c/sup\u003e on all traits across the six stratifications: All Combined (N = 26,559), European Combined (N = 24,505), All females (N = 13,240), European females (N = 12,324), All males (N = 13,319), European males (N = 12,181). If an individual had missing information for a trait, that individual was excluded from the analysis for that trait, leading to a small variation in sample sizes across tested traits. We applied rank inverse normal transformation for all continuous traits using the built-in Regenie option during the estimation of polygenic effects on each trait and the association testing. To estimate polygenic effects, we used pruned genotyping array data that corresponded to the stratification. We also included the following covariates: the first 20 principal components corresponding to the stratification, genotyping batch (1-5), sex, participant age at data collection site visit and age squared to account for possible non-linear relationships. For sex-stratified analyses, sex was not included as a covariate. For sex-combined analyses, we estimated genotype-by-sex interaction effects using Regenie\u0026rsquo;s built-in --interaction option. 
Variants with a minor allele count (MAC) \u0026lt; 5 were not used in the single variant association tests, as recommended in Regenie\u0026rsquo;s documentation.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eCounting overlapping significant GWAS loci between stratifications\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eFor this analysis, we used only tested variants with an effect allele frequency (EAF) \u0026gt; 0.01 that showed statistically significant association with a phenotype (p \u0026lt; 5\u0026times;10\u003csup\u003e-8\u003c/sup\u003e). For each phenotype within each stratification, we first defined significant GWAS loci as \u0026plusmn;500 kbp regions around each lead variant (the variant with the most significant association). Then, we merged any overlapping loci within the same phenotype and stratification into a single locus (chromosome X PAR regions were treated separately from non-PAR). This resulted in a set of non-overlapping loci of \u0026ge;1 Mbp in length for each phenotype within each stratification. When comparing loci for a phenotype between two stratifications (e.g. female vs male), we declared two or more loci equivalent if they overlapped (i.e. even if the corresponding lead variants were different). 
\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eCLSA PheWeb instance\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eWe deployed an instance of PheWeb 2 for the CLSA study on a single Linux-based virtual machine (8 vCPUs and 32 GB of memory) on the Secure Data for Health (SD4H, http://www.sd4health.ca/) cloud, which uses the open-source OpenStack\u003csup\u003e27\u003c/sup\u003e cloud computing infrastructure software. Terabytes of compressed GWAS summary-level results are stored and queried by PheWeb 2 from the cloud object store using the S3 protocol, reducing its disk use footprint. Only genotype-phenotype association results for variants with genotype imputation quality r\u003csup\u003e2\u003c/sup\u003e\u0026gt;0.3 were ingested into the CLSA PheWeb instance. The sex-by-genotype interaction results were ingested into the CLSA PheWeb instance only for genetic variants with at least 160 minor alleles in the study population.\u003cbr\u003e\u0026nbsp;\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eData Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIndividual-level data are available from the Canadian Longitudinal Study on Aging (http://www.clsa-elcv.ca) for researchers who meet the criteria for access to de-identified CLSA data. The presented summary-level data are freely available for querying and download through CLSA PheWeb 2 at https://clsa-pheweb.cerc-genomic-medicine.ca.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCode Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003ePheWeb 2\u0026rsquo;s codebase is open source and hosted on GitHub at https://github.com/GaglianoTaliun-Lab/PheWeb2 and the corresponding API at https://github.com/GaglianoTaliun-Lab/PheWeb2-api. 
The code used to generate specific results described in this manuscript is at https://github.com/GaglianoTaliun-Lab/PheWeb2-manuscript.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors thank Andrew P Boughton and Ryan P Welch for their contributions to LocusZoom features. The authors acknowledge Yann Ilboudo, Divya Joshi, Joosung Min, Olga Vishnyakova, and Satoshi Yoshiji, as well as Isabel Fortier and Rita Wissa (from Maelstrom Research), for discussion and feedback on important aspects of this work. The AB SCREEN\u0026trade; II assessment tool is owned by Dr. Heather Keller. Use of the AB SCREEN\u0026trade; II assessment tool was made under license from the University of Guelph.\u003c/p\u003e\n\u003cp\u003eThis research was made possible using the data/biospecimens collected by the Canadian Longitudinal Study on Aging (CLSA). Funding for the Canadian Longitudinal Study on Aging (CLSA) is provided by the Government of Canada through the Canadian Institutes of Health Research (CIHR) under grant reference: LSA 94473 and the Canada Foundation for Innovation, as well as the following provinces, Newfoundland, Nova Scotia, Quebec, Ontario, Manitoba, Alberta, and British Columbia. This research has been conducted using the CLSA datasets: baseline comprehensive dataset \u0026ndash; version 7.0 and Genome-wide genetic data \u0026ndash; version 3.0, under Application Number 23ME002. The CLSA is led by Drs. Parminder Raina, Christina Wolfson and Susan Kirkland. The time and commitment of the participants to the CLSA study platform is gratefully acknowledged, without whom this research would not be possible. 
The opinions expressed in this manuscript are the authors\u0026apos; own and do not reflect the views of the Canadian Longitudinal Study on Aging.\u003c/p\u003e\n\u003cp\u003eThis research was enabled in part by support provided by Calcul Qu\u0026eacute;bec (https://www.calculquebec.ca), Digital Research Alliance of Canada (https://www.alliancecan.ca), and Secure Data for Health (SD4H, http://www.sd4health.ca/). SAGT and DT acknowledge support from the Canadian Institutes of Health Research (CIHR) (AD6-192920, PJT-197954 and AD7-200181). SAGT acknowledges salary support from a Fonds de Recherche du Qu\u0026eacute;bec \u0026ndash; Sant\u0026eacute; (FRQS; https://frq.gouv.qc.ca) Junior 2 Award (https://doi.org/10.69777/347366). DT acknowledges the support of the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant (RGPIN-2025-06572). JB and MK were supported by CIHR Canada Graduate Scholarships \u0026ndash; Master\u0026apos;s (CGS M) awards.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eJB, HX, LC, MK, SW, PV, SAGT and DT performed analyses. AJM curated the selection of DXA-related phenotypes. PR contributed to data curation. SAGT and DT supervised the work, secured funding and ensured that the necessary computational resources were available. All authors approved the final manuscript.\u003cbr\u003e \u003cbr\u003e \u003cstrong\u003eCompeting Interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e\n"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAll of Us Research Program Genomics Investigators. Genomic data in the All of Us Research Program. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e627\u003c/strong\u003e, 340-346 (2024). https://doi.org/10.1038/s41586-023-06957-x\u003c/li\u003e\n\u003cli\u003eBycroft, C.\u003cem\u003e et al.\u003c/em\u003e The UK Biobank resource with deep phenotyping and genomic data. 
\u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e562\u003c/strong\u003e, 203-209 (2018). https://doi.org/10.1038/s41586-018-0579-z\u003c/li\u003e\n\u003cli\u003eTaliun, D.\u003cem\u003e et al.\u003c/em\u003e Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e590\u003c/strong\u003e, 290-299 (2021). https://doi.org/10.1038/s41586-021-03205-y\u003c/li\u003e\n\u003cli\u003eZhang, X.\u003cem\u003e et al.\u003c/em\u003e Whole genome sequencing analysis of body mass index identifies novel African ancestry-specific risk allele. \u003cem\u003eNat Commun\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 3470 (2025). https://doi.org/10.1038/s41467-025-58420-2\u003c/li\u003e\n\u003cli\u003eYang, M. L.\u003cem\u003e et al.\u003c/em\u003e Sex-specific genetic architecture of blood pressure. \u003cem\u003eNat Med\u003c/em\u003e \u003cstrong\u003e30\u003c/strong\u003e, 818-828 (2024). https://doi.org/10.1038/s41591-024-02858-2\u003c/li\u003e\n\u003cli\u003eSilveira, P. P., Pokhvisneva, I., Howard, D. M. \u0026amp; Meaney, M. J. A sex-specific genome-wide association study of depression phenotypes in UK Biobank. \u003cem\u003eMol Psychiatry\u003c/em\u003e \u003cstrong\u003e28\u003c/strong\u003e, 2469-2479 (2023). https://doi.org/10.1038/s41380-023-01960-0\u003c/li\u003e\n\u003cli\u003eGagliano Taliun, S. A.\u003cem\u003e et al.\u003c/em\u003e Exploring and visualizing large-scale genetic associations by using PheWeb. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e52\u003c/strong\u003e, 550-552 (2020). https://doi.org/10.1038/s41588-020-0622-5\u003c/li\u003e\n\u003cli\u003eRaina, P.\u003cem\u003e et al.\u003c/em\u003e Cohort Profile: The Canadian Longitudinal Study on Aging (CLSA). \u003cem\u003eInt J Epidemiol\u003c/em\u003e \u003cstrong\u003e48\u003c/strong\u003e, 1752-1753j (2019). 
https://doi.org/10.1093/ije/dyz173\u003c/li\u003e\n\u003cli\u003eForgetta, V.\u003cem\u003e et al.\u003c/em\u003e Cohort profile: genomic data for 26 622 individuals from the Canadian Longitudinal Study on Aging (CLSA). \u003cem\u003eBMJ Open\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, e059021 (2022). https://doi.org/10.1136/bmjopen-2021-059021\u003c/li\u003e\n\u003cli\u003eWheeler, E.\u003cem\u003e et al.\u003c/em\u003e Impact of common genetic determinants of Hemoglobin A1c on type 2 diabetes risk and diagnosis in ancestrally diverse populations: A transethnic genome-wide meta-analysis. \u003cem\u003ePLoS Med\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, e1002383 (2017). https://doi.org/10.1371/journal.pmed.1002383\u003c/li\u003e\n\u003cli\u003eWillems, S. M.\u003cem\u003e et al.\u003c/em\u003e Large-scale exome array summary statistics resources for glycemic traits to aid effector gene prioritization. \u003cem\u003eWellcome Open Res\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 483 (2023). https://doi.org/10.12688/wellcomeopenres.18754.1\u003c/li\u003e\n\u003cli\u003eMoon, J. Y.\u003cem\u003e et al.\u003c/em\u003e A Genome-Wide Association Study Identifies Blood Disorder-Related Variants Influencing Hemoglobin A(1c) With Implications for Glycemic Status in U.S. Hispanics/Latinos. \u003cem\u003eDiabetes Care\u003c/em\u003e \u003cstrong\u003e42\u003c/strong\u003e, 1784-1791 (2019). https://doi.org/10.2337/dc19-0168\u003c/li\u003e\n\u003cli\u003eChen, J.\u003cem\u003e et al.\u003c/em\u003e The trans-ancestral genomic architecture of glycemic traits. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e53\u003c/strong\u003e, 840-860 (2021). https://doi.org/10.1038/s41588-021-00852-9\u003c/li\u003e\n\u003cli\u003ePerez, G.\u003cem\u003e et al.\u003c/em\u003e The UCSC Genome Browser database: 2025 update. \u003cem\u003eNucleic Acids Res\u003c/em\u003e \u003cstrong\u003e53\u003c/strong\u003e, D1243-D1249 (2025). 
https://doi.org:10.1093/nar/gkae974\u003c/li\u003e\n\u003cli\u003eDas, S.\u003cem\u003e et al.\u003c/em\u003e Next-generation genotype imputation service and methods. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e48\u003c/strong\u003e, 1284-1287 (2016). https://doi.org:10.1038/ng.3656\u003c/li\u003e\n\u003cli\u003eLoh, P. R.\u003cem\u003e et al.\u003c/em\u003e Reference-based phasing using the Haplotype Reference Consortium panel. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e48\u003c/strong\u003e, 1443-1448 (2016). https://doi.org:10.1038/ng.3679\u003c/li\u003e\n\u003cli\u003eTaliun, D.\u003cem\u003e et al.\u003c/em\u003e LASER server: ancestry tracing with genotypes or sequence reads. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e33\u003c/strong\u003e, 2056-2058 (2017). https://doi.org:10.1093/bioinformatics/btx075\u003c/li\u003e\n\u003cli\u003eKoenig, Z.\u003cem\u003e et al.\u003c/em\u003e A harmonized public resource of deeply sequenced diverse human genomes. \u003cem\u003eGenome Res\u003c/em\u003e \u003cstrong\u003e34\u003c/strong\u003e, 796-809 (2024). https://doi.org:10.1101/gr.278378.123\u003c/li\u003e\n\u003cli\u003eWang, C., Zhan, X., Liang, L., Abecasis, G. R. \u0026amp; Lin, X. Improved ancestry estimation for both genotyping and sequencing data using projection procrustes analysis and genotype imputation. \u003cem\u003eAm J Hum Genet\u003c/em\u003e \u003cstrong\u003e96\u003c/strong\u003e, 926-937 (2015). https://doi.org:10.1016/j.ajhg.2015.04.018\u003c/li\u003e\n\u003cli\u003eChen, S.\u003cem\u003e et al.\u003c/em\u003e A genomic mutational constraint map using variation in 76,156 human genomes. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e625\u003c/strong\u003e, 92-100 (2024). https://doi.org:10.1038/s41586-023-06045-0\u003c/li\u003e\n\u003cli\u003eChang, C. C.\u003cem\u003e et al.\u003c/em\u003e Second-generation PLINK: rising to the challenge of larger and richer datasets. 
\u003cem\u003eGigascience\u003c/em\u003e \u003cstrong\u003e4\u003c/strong\u003e, 7 (2015). https://doi.org:10.1186/s13742-015-0047-8\u003c/li\u003e\n\u003cli\u003eAnderson, C. A.\u003cem\u003e et al.\u003c/em\u003e Data quality control in genetic case-control association studies. \u003cem\u003eNat Protoc\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 1564-1573 (2010). https://doi.org:10.1038/nprot.2010.116\u003c/li\u003e\n\u003cli\u003eTeng, E. The Mental Alternations Test (MAT). \u003cem\u003eThe Clinical Neuropsychologist\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 287 (1995). \u003c/li\u003e\n\u003cli\u003eO\u0026apos;Connell, M. E.\u003cem\u003e et al.\u003c/em\u003e Methodological considerations when establishing reliable and valid normative data: Canadian Longitudinal Study on Aging (CLSA) neuropsychological battery. \u003cem\u003eClin Neuropsychol\u003c/em\u003e \u003cstrong\u003e36\u003c/strong\u003e, 2168-2187 (2022). https://doi.org:10.1080/13854046.2021.1954243\u003c/li\u003e\n\u003cli\u003eKessler, R. C.\u003cem\u003e et al.\u003c/em\u003e Screening for serious mental illness in the general population. \u003cem\u003eArch Gen Psychiatry\u003c/em\u003e \u003cstrong\u003e60\u003c/strong\u003e, 184-189 (2003). https://doi.org:10.1001/archpsyc.60.2.184\u003c/li\u003e\n\u003cli\u003eMbatchou, J.\u003cem\u003e et al.\u003c/em\u003e Computationally efficient whole-genome regression for quantitative and binary traits. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e53\u003c/strong\u003e, 1097-1103 (2021). https://doi.org:10.1038/s41588-021-00870-7\u003c/li\u003e\n\u003cli\u003eSefraoui, O. A., M.; Eleuldj, M. OpenStack: Toward an Open-Source Solution for Cloud Computing. \u003cem\u003eInternational Journal of Computer Applications\u003c/em\u003e \u003cstrong\u003e55\u003c/strong\u003e, 38-42 (2012). 
\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"nature-portfolio","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Nature Portfolio","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"ejp","reportingPortfolio":"","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"Sex-stratified, ancestry-stratified, interactive browser, summary statistics, data sharing, genome-wide association analysis, PheWeb, CLSA","lastPublishedDoi":"10.21203/rs.3.rs-7463215/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7463215/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThe lack of functionalities in web-based tools for interacting with stratified genome-wide association study (GWAS) summary-level results is currently hindering researchers from advancing knowledge of ancestry and sex on the genetics of complex human diseases and traits. 
Here we introduce PheWeb 2, a completely rewritten enhanced version of our original web-based tool, which offers intuitive and efficient interactive navigation and visual comparison across stratified GWAS results within a single framework.\u003c/p\u003e\n\u003cp\u003e\u003cbr\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e*Justin Bellavance \u0026amp; Hongyu Xiao contributed equally.\u003c/strong\u003e\u003c/p\u003e","manuscriptTitle":"Exploring and visualizing stratified genome-wide association study results with PheWeb 2","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-09-10 06:36:54","doi":"10.21203/rs.3.rs-7463215/v1","editorialEvents":[],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"nature-genetics","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"ng","sideBox":"Learn more about [Nature Genetics](http://www.nature.com/ng/)","snPcode":"","submissionUrl":"","title":"Nature Genetics","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Nature Research","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"5e54a73a-050a-4cc7-b6d4-d0f826f57b51","owner":[],"postedDate":"September 10th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":53730473,"name":"Biological sciences/Genetics/Genetic association study/Genome-wide association studies"},{"id":53730474,"name":"Health sciences/Medical research/Genetics research"}],"tags":[],"updatedAt":"2026-01-20T08:34:41+00:00","versionOfRecord":{"articleIdentity":"rs-7463215","link":"https://doi.org/10.1038/s41588-025-02469-8","journal":{"identity":"nature-genetics","isVorOnly":false,"title":"Nature Genetics"},"publishedOn":"2026-01-19 05:00:00","publishedOnDateReadable":"January 19th, 2026"},"versionCreatedAt":"2025-09-10 
06:36:54","video":"","vorDoi":"10.1038/s41588-025-02469-8","vorDoiUrl":"https://doi.org/10.1038/s41588-025-02469-8","workflowStages":[]},"version":"v1","identity":"rs-7463215","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7463215","identity":"rs-7463215","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}




