Admixed gene expression models expand molecular and neurological insights into 6 major psychiatric disorders

preprint OA: closed CC-BY-4.0
Xavier Bledsoe, Nathan Watkins, Tavian Bowen-Moore, Eric R. Gamazon

Research Square. This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6229829/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review. Version 1 posted.

Abstract

Our understanding of the influence of ancestral background on genetically determined expression remains limited, especially when gene expression models are applied to studies from different or multiple populations. We performed transcriptome wide association studies (TWAS) in 6 different psychiatric conditions, leveraging gene expression models trained in cohorts with different proportions of African, European, and Indigenous American genetic ancestries.
For comparison we repeated each TWAS using a model trained in individuals of predominantly European ancestry. We identified 1,416 statistically significant TWAS associations (FDR p < 0.05) […] >92% correlation in the gene-level effects on disease risk, a statistic that remained robust for TWAS results that only reached statistical significance in one population. Using admixed gene expression models validated and greatly extended the yield of TWAS. The resulting transcriptomic signatures implicated neuroimaging features associated with diagnostic symptoms.

Biological sciences/Genetics/Gene expression; Biological sciences/Computational biology and bioinformatics; Biological sciences/Genetics/Genetic association study

Background

Uncovering the physiologic basis of psychiatric conditions has long been a challenge in medical science. Genome-wide association studies (GWAS) and large-scale cohort studies have heralded a major expansion in molecular associations with psychiatric conditions [1]. Recently, transcriptome-wide association studies (TWAS) have built upon the GWAS framework by identifying imputed genetically determined gene expression measures associated with diagnoses [2–4]. Improvements in TWAS implementation offer further enhancements for evaluation of psychiatric disease mechanisms. One such improvement is the expansion of ancestral diversity in the training data for TWAS models. The TWAS methodology relies on pre-existing reference panels which detail the relationship between single nucleotide polymorphisms and RNA transcript quantity [5]. Model training transforms these data into predictive models [4], which can then be used by formal methods to quantify associations between genetically regulated gene expression (GReX) and disease [6, 7]. Traditionally, TWAS models have been trained primarily on individuals of European ancestry.
[5, 8, 9] It has been shown that the predictive performance of models trained in individuals of European ancestry is diminished when these models are applied to individuals from different populations [8, 10]. As new GWAS meta-analyses for psychiatric conditions from ancestrally diverse study participants are published, the development and implementation of admixed TWAS models may improve interpretation of GWAS findings.

Recently, Kachuri et al. generated TWAS models from two consortia enriched for individuals of admixed ancestry: the Genes-environments and Admixture in Latino Asthmatics (GALA II) and Study of African Americans, Asthma, Genes, and Environments (SAGE) [11]. These models were trained on data from 2,733 individuals who self-identify as African American, Mexican, Puerto Rican, or other Latino American. The individuals studied in each model demonstrate different proportions of genetic admixture across African, American, and European superpopulations [11]. We hypothesize that this increased genetic diversity will enable identification of novel gene-level associations in the context of psychiatric disease. However, it is not yet clear how discordance in ancestry between GWAS cohorts and TWAS models affects the disease-GReX associations identified by TWAS.

Here we used the GALA II/SAGE (G2S) models to perform TWAS for psychiatric conditions. To avoid trait-specific findings, we performed the TWAS in parallel on 6 major psychiatric conditions: major depressive disorder, alcohol use disorder, attention deficit hyperactivity disorder, bipolar disorder, post-traumatic stress disorder, and schizophrenia [12–18]. We then performed TWAS for each condition using a whole blood gene expression model trained in individuals of predominantly European ancestry from the Genotype-Tissue Expression (GTEx) consortium.
[5] Prior to interpretation of the resulting gene-level associations, we first performed an in-depth statistical assessment of the comparative performance of the G2S and GTEx models when applied to GWAS of varying ancestries. We found substantial heterogeneity in the SNP features for the same genes across different models. This heterogeneity was reflected in poorly correlated p-values for the gene-level associations present in both G2S- and GTEx-derived TWAS findings. When we examined the effect size estimates themselves, however, we observed a high degree of correlation. The estimated effect of GReX on disease risk was largely stable against variation in model ancestry, disease type, GWAS ancestry, SNP features, and SNP weights, indicating that SNP-level differences between populations converge on similar gene-level targets in the context of psychiatric disease. Lastly, we show the value of the admixed TWAS approach by presenting each newly expanded set of gene-level associations and imputing the neurological consequences of disease-associated transcriptomic signatures. We provide evidence that the application of ancestrally diverse gene expression models to psychiatric GWAS replicates and substantially expands the set of gene-level and brain-level associations previously obtained from European-ancestry models.

Methods

GWAS summary statistics

We utilized GWAS meta-analysis summary statistics from the Psychiatric Genomics Consortium (PGC) for all 6 psychiatric diseases [12–18]. For each condition, we selected the most recent large multi-ancestry GWAS, with the exception of bipolar disorder 1 (BD1), ADHD, and alcohol use disorder (AUD), for which the largest GWAS included individuals exclusively of European descent (Table S1). For select GWAS analyses, summary statistics for both the full meta-analysis and subsets of ancestry-specific data were available. In such instances, we analyzed all available datasets separately.
Gene expression models

We leveraged 4 PrediXcan (whole blood gene expression) models trained on data from the GALA II/SAGE (G2S) cohort studies. Three models are specific to self-reported ancestry: African American, Mexican American, and Puerto Rican. G2S includes individuals of ‘Other Latino’ ancestry; however, with only 299 individuals, this group was too small to train an independent model. Their genetic and transcriptomic data are included in the fourth model, referred to as ‘All whole blood’. This model includes all data across the full cohort in a single gene expression prediction model. These models are publicly available on Zenodo at https://zenodo.org/doi/10.5281/zenodo.6622367.

For the predominantly European whole blood gene expression model we accessed the GTEx data from the public Zenodo repository https://zenodo.org/doi/10.5281/zenodo.3842263. The model is trained on GTEx v8 data. GTEx version 8 contains 15,201 RNA-sequencing samples quantified from 49 tissues of 838 postmortem donors. 85.3% of the donor population (715) was identified as European American, with additional demographics including 12.3% African American (103), 1.4% Asian American (12), and 1.9% reporting Hispanic or Latino ethnicity (16). 66.4% of donors (557) were male and 33.5% were female (281) [5]. We selected the whole blood PrediXcan model to match the tissue used to train the GALA II/SAGE models, which was also whole blood.

PrediXcan TWAS analyses

We performed TWAS on all GWAS summary statistics for the 6 selected traits obtained from the PGC. We ran the S-PrediXcan command-line tool from the MetaXcan package with each of the 5 gene expression models described above on each set of summary statistics [19]. Each pairing of GWAS summary statistics and TWAS gene expression model is assessed individually.
Prior evidence points to the presence of large inversion structural variants in regions of chromosomes 17 and 8 that are disproportionately detected in European genetic ancestry and also violate the statistical assumptions regarding LD that underlie the TWAS approach [20]. Consistent with prior practice, we removed the results from these regions that are predicted by the European-dominant model [21]. In all models, we removed findings from the MHC region due to similar issues in structural complexity [22].

TWAS multiple testing correction

We performed multiple testing correction using the Benjamini-Hochberg false discovery rate (FDR < 0.05). We assessed TWAS results for each disease and gene expression model pairing as independent studies.

Mean correlation for each disease/ancestry pair as a function of p-value threshold

We performed TWAS of the ancestry-specific GWAS datasets using the 5 gene expression models described above. At each TWAS p-value threshold, we calculated the Pearson correlation of effect size estimates with the corresponding discovery TWAS. We then identified the mean correlation for each disease/ancestry pair at each p-value threshold.

Cross-model SNP feature comparison

We compared the SNP features in the G2S models against the GTEx whole blood model. We first restricted both the G2S and GTEx models to the SNPs present in the GWAS. We then annotated each gene as ‘G2S’ if only a G2S model had SNP features for that gene. We made a similar annotation for GTEx. Some genes had SNP features in both models. We annotated these genes as either ‘shared_distinct’ when the SNP features did not overlap between the two models or ‘shared_overlapping’ if at least one SNP was shared between the two. We defined an additional gene-level annotation depending on whether the gene-disease association reached statistical significance in none of the models, G2S only, GTEx only, or both.
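The annotation rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `annotate_snp_sharing`, the dictionary layout (gene mapped to a set of SNP identifiers), and the toy gene and SNP names are all hypothetical.

```python
from typing import Dict, Set


def annotate_snp_sharing(
    g2s: Dict[str, Set[str]],
    gtex: Dict[str, Set[str]],
    gwas_snps: Set[str],
) -> Dict[str, str]:
    """Label each gene by which model(s) supply SNP features for it."""
    labels: Dict[str, str] = {}
    for gene in set(g2s) | set(gtex):
        # Keep only SNP features that were also reported in the GWAS.
        g2s_snps = g2s.get(gene, set()) & gwas_snps
        gtex_snps = gtex.get(gene, set()) & gwas_snps
        if g2s_snps and gtex_snps:
            # Both models cover the gene: do they share any SNP feature?
            if g2s_snps & gtex_snps:
                labels[gene] = "shared_overlapping"
            else:
                labels[gene] = "shared_distinct"
        elif g2s_snps:
            labels[gene] = "G2S"
        elif gtex_snps:
            labels[gene] = "GTEx"
        # Genes with no usable SNP in either model are left unlabeled.
    return labels


# Hypothetical toy models (gene -> SNP features) and GWAS SNP list.
g2s_model = {"GENE_A": {"rs1", "rs2", "rs3"}, "GENE_B": {"rs4"}, "GENE_C": {"rs5"}}
gtex_model = {"GENE_A": {"rs3", "rs9"}, "GENE_B": {"rs6"}, "GENE_D": {"rs7"}}
gwas = {"rs1", "rs2", "rs3", "rs4", "rs5", "rs6", "rs7"}

labels = annotate_snp_sharing(g2s_model, gtex_model, gwas)
for gene in sorted(labels):
    print(gene, labels[gene])
```

Restricting to GWAS SNPs before comparing matters here: in the toy data, `rs9` is dropped from the GTEx feature set for `GENE_A` because it is absent from the GWAS, and the gene is still labeled `shared_overlapping` through `rs3`.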
Stratifying genes by model-specific statistical significance and by SNP sharing, we calculated the correlation between the normalized effect size estimates (z-scores) from the G2S model and the GTEx model.

Gene annotation by model specificity

The full set of genes evaluated by GTEx is not identical to the set of genes evaluated by G2S. Certain genes are characterized in one model but not the other. We annotated each gene according to its presence across models. Genes that were tested in any of the 4 G2S models but not GTEx were classified as ‘G2S only’, while those tested in GTEx whole blood but none of the G2S models were classified as ‘GTEx only’. The genes that were tested in both GTEx and at least one G2S model were classified as ‘co-tested’.

Transcriptomic similarity across psychiatric diseases

For each disease/ancestry pair, we curated a list of effect size estimates of all associated genes that passed the significance threshold (FDR < 0.05). We then calculated the Pearson correlation of effect size estimates for shared genes across diseases for each gene model. These data were visualized by correlogram.

NeuroimaGene mapping of disease genes to brain structure

We used the NeuroimaGene package in R to identify neuroimaging features implicated by the transcriptomic signatures of each disease [23]. The NeuroimaGene R package enables the user to test for statistically significant associations between GReX measures and MRI-based neuroimaging derived phenotypes (NIDPs). These NIDPs were characterized in European individuals from the UK Biobank and encode measurements of cortical area, volume, and thickness as well as subcortical volumes. The repository of GReX-NIDP associations was generated via application of JTI-PrediXcan to GWAS of over 3,500 NIDPs in the UK Biobank. Full description of the data and methods can be found in the original publication.
[21] We used the NeuroimaGene method to extend the interpretability of our disease TWAS by relating molecular correlates of disease to brain features which may carry psychiatric or psychological import. We first identified transcriptomic correlates with each of the 6 diseases. To assess the broadest single set of results from the G2S models, we subset the TWAS findings to those derived from the aggregate whole blood model (All whole blood). We then used NeuroimaGene to identify the neuroimaging features in the Desikan cortical atlas and ‘Subcortex’ subcortical atlas that are associated (Benjamini-Hochberg FDR < 0.05) with expression of each gene set [21, 23]. We then compared the number of NIDPs from the application of the G2S and GTEx models and quantified the relative increase from the use of admixed models.

Results

A majority of TWAS findings derive from admixed models

We identified 1,416 statistically significant associations (FDR p < 0.05) between GReX and all 6 psychiatric diagnoses (Fig. 1a, S1). Most of these associations derived from analyses of bipolar disorder 1 (263) and schizophrenia (717), with the rest from ADHD (121), major depressive disorder (64), alcohol use disorder (17), and PTSD (235). A majority of associations (62%) were uniquely detected by models trained on individuals of African American, Latino, Puerto Rican, and Mexican self-reported ancestry. This stands in contrast to the 14% of gene-level associations specific to the majority European model trained in GTEx. The high performance of the G2S models relative to GTEx remains true for bipolar disorder 1, alcohol use disorder, and ADHD, which were all derived from GWAS in individuals of exclusively European genetic ancestry [12, 13, 17].

Disease TWAS associations differ based on ancestry background of gene models

We observed that only 24% of all gene-level associations passed TWAS significance thresholds using both predominantly European and admixed models.
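Replication in this sense means that a gene passes the Benjamini-Hochberg threshold (FDR < 0.05) in both models, with each disease and model pairing corrected on its own as described in Methods. The sketch below shows one way to compute that; it is an illustration under stated assumptions rather than the authors' pipeline, and the model labels, gene names, and p-values are hypothetical.

```python
from typing import Dict, List, Set


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values are monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical TWAS p-values for one disease under two expression models.
twas_pvalues: Dict[str, Dict[str, float]] = {
    "G2S_all": {"GENE_A": 0.001, "GENE_B": 0.008, "GENE_C": 0.039,
                "GENE_D": 0.041, "GENE_E": 0.60},
    "GTEx_whole_blood": {"GENE_A": 0.004, "GENE_B": 0.30, "GENE_C": 0.45,
                         "GENE_D": 0.80, "GENE_E": 0.90},
}

# Correct each model's results as an independent study.
significant: Dict[str, Set[str]] = {}
for model, pvals in twas_pvalues.items():
    genes = list(pvals)
    qvals = benjamini_hochberg([pvals[g] for g in genes])
    significant[model] = {g for g, q in zip(genes, qvals) if q < 0.05}

either = significant["G2S_all"] | significant["GTEx_whole_blood"]
both = significant["G2S_all"] & significant["GTEx_whole_blood"]
# Share of significant genes that replicate across both models.
replication_fraction = len(both) / len(either) if either else 0.0
print(sorted(either), sorted(both), replication_fraction)
```

Because the correction is applied per pairing, the same raw p-value can be significant under one model and not under another, depending on the rest of that model's p-value distribution.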
On assessment, 79% (939) of significant TWAS genes were tested in both GTEx and G2S (Fig. 1b). Of the other 247 significant genes, 213 were only assayed by G2S while 34 were only assayed by GTEx. The 79% of co-tested genes contrasts with the 24% of significant gene-level associations that replicated across models. Because the same GWAS summary statistics are used for the different TWAS, the remaining differences in gene-level associations must arise from incongruities in the SNP features for the G2S and GTEx models.

Gene-level TWAS associations demonstrate high correlation of effect sizes

Accordingly, we examined the TWAS effect size estimates across G2S and GTEx. Each gene-level association is characterized by an estimated effect size of GReX on disease risk (Fig. 2a). For all genes and models, the correlations in GReX-disease effect sizes ranged from 0.66 to 0.80 with a mean correlation of 0.76 (Fig. 2b, Figure S2). This high correlation in effect sizes rose to an average of 0.93 when we considered only those genes that were called significant in at least one of the models (Figure S3). For the genes significant in both models, the effect sizes obtained a correlation of 0.99 (Table S2). Notably, for the genes that were called significant in either a G2S model or GTEx but not both, the effect sizes still obtained a mean correlation of 0.88 (Table S2). The 24% replication in significant findings was thus not explained by differences in effect sizes between G2S and GTEx. The effect sizes matched with a high degree of correlation.

Gene-level TWAS associations demonstrate high p-value heterogeneity

We calculated the correlation in p-values across G2S and GTEx associations. Across all genes, we obtained a mean correlation coefficient of 0.50 (Fig. 2c, S4). Restricting the set of analyzed genes to those significant in at least one TWAS resulted in a mean correlation coefficient of −0.04 (0.32 for shared significant genes; Figure S5).
More significant associations demonstrated lower correlation in p-values. This low correlation in significance statistics is much more consistent with the low replication of significant gene-level associations across G2S and GTEx. We highlight an association between APPL2 and bipolar disorder 1 as a representative example. The normalized effect size estimates for this association in GTEx whole blood and the aggregate G2S whole blood model are similar in magnitude and direction (−2.05 and −3.22, respectively). While the effect size estimates are similar, the association reaches statistical significance in the aggregate G2S model (P_FDR = 0.044) but not in GTEx (P_FDR = 0.29). APPL2 has been implicated in exome sequencing of individuals with bipolar disorder 1 [24] and functions as a molecular regulator for the mania-associated DISC1 locus [25]. This association, detected in G2S but not in GTEx, provides the first evidence associating GReX of APPL2 with risk of bipolar disorder.

G2S and GTEx models rely on largely distinct sets of SNP features

We next assessed whether the differences in SNP features across G2S and GTEx informed the high p-value discordance of the gene-level associations. We identified the intersecting set of SNPs that were both reported in the GWAS and used in each gene expression model. The G2S models used a median of 28 SNPs per gene while GTEx used a median of 8 (Fig. 2d, S6). The average number of GWAS-interrogated SNPs shared across both models per gene was 0.90. In 56% of genes tested, no SNPs were shared between the G2S and GTEx models. The G2S models thus used more SNPs in gene expression prediction and used largely different sets of SNPs. Both the G2S and GTEx variants were obtained using whole genome sequencing.

Cross-ancestry heterogeneity exists in SNP features of gene expression

Higher weight magnitude for a SNP feature implies that the SNP predicts a greater change in gene expression.
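In PrediXcan-style models, predicted expression for a gene is a weighted sum of allele dosages, so a SNP's weight is the predicted change in expression per copy of the effect allele. The sketch below illustrates this, together with the summary-statistic approximation of the gene-level z-score published for S-PrediXcan, Z_g ≈ Σ_l w_lg (σ_l / σ_g) z_l, where σ_g² = wᵀΓw and Γ is the reference SNP covariance. The function names and all numeric inputs are hypothetical and are not taken from the G2S or GTEx models.

```python
import math
from typing import List


def predicted_expression(dosages: List[float], weights: List[float]) -> float:
    """Genetically regulated expression as a weighted sum of allele dosages."""
    return sum(w * x for w, x in zip(weights, dosages))


def spredixcan_z(
    weights: List[float],
    gwas_z: List[float],
    snp_cov: List[List[float]],
) -> float:
    """Approximate gene-level z-score from per-SNP GWAS z-scores.

    weights : SNP weights for one gene
    gwas_z  : GWAS z-scores (beta / se) for the same SNPs
    snp_cov : reference covariance matrix of the SNP dosages
    """
    n = len(weights)
    # Variance of predicted expression: w' * Gamma * w
    var_g = sum(
        weights[i] * snp_cov[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )
    sigma_g = math.sqrt(var_g)
    # Each SNP contributes weight * (sigma_l / sigma_g) * z_l
    return sum(
        weights[l] * math.sqrt(snp_cov[l][l]) * gwas_z[l] for l in range(n)
    ) / sigma_g


# Hypothetical two-SNP gene model.
weights = [0.5, -0.2]
cov = [[0.5, 0.1], [0.1, 0.4]]
z_snps = [3.0, -1.0]

grex = predicted_expression([1.0, 2.0], weights)
z_gene = spredixcan_z(weights, z_snps, cov)
z_rescaled = spredixcan_z([2 * w for w in weights], z_snps, cov)
print(grex, z_gene, z_rescaled)
```

One property worth noting in this formulation is that multiplying every weight for a gene by the same positive constant leaves the gene-level z-score unchanged, because the numerator and σ_g scale together. Differences in overall weight magnitude between models therefore do not, by themselves, change the normalized effect size.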
Across diseases, only 1.8% of SNP features were shared for the same genes across the models (Fig. 2d, S6). We assessed the correlation in weights assigned to these SNP features across G2S and GTEx models. Correlations in the SNP weights ranged from 0.40 to 0.43 across diseases and models (Fig. 2e, S7). G2S and GTEx used largely different sets of SNP features, and even when they used the same SNP feature, the assigned weights were only modestly correlated. We thus observed high correlation in effect size estimates for all gene-level associations in the presence of low sharing and low similarity of SNP weights (Fig. 2f).

Broad differences exist across prediction models

Beyond the correlation of weights, the performance of models could be influenced by the distribution of SNP features, the accuracy of prediction performance, and other variables (Fig. 3a). As was previously noted, the G2S models used a median of 28 SNP features per gene compared to the median of 8 in GTEx. We also observed that the median weight of each SNP feature on its gene was less in G2S than GTEx (Fig. 3b). The result of differing SNP sets on gene expression imputation was reflected in the prediction performance statistic of each model. G2S obtained a median \(r^{2}\) of 0.157 while GTEx was significantly lower at 0.124 (Fig. 3c, S8). Conversely, for genes captured by both G2S and GTEx, the \(r^{2}\) was greater in G2S models (Figure S9). Collectively the G2S models predicted gene expression using a greater number of SNPs and more low-weight SNPs than the GTEx model. Regarding the SNP features shared between G2S and GTEx, the weights assigned by G2S were generally higher than those in GTEx (Fig. 2e, S10). Thus, there are systematic differences between the SNP features identified in GTEx and those derived from populations of African American, Mexican American, and Puerto Rican individuals.
Cross-model TWAS correlations remain high despite distinct SNP features

The high effect-size correlation is not intuitive given the systematic differences in the SNP predictors between models. Here we assessed the null hypothesis that the convergent effect size estimates are due to shared, high-impact SNP predictors used in both G2S and GTEx. To test the null, we first curated all genes with models in both G2S and GTEx. We then divided these genes into two categories depending on whether the sets of SNP features for the gene were distinct (shared_distinct) or there was at least one overlapping SNP between the predictor sets (shared_overlapping). We next stratified genes according to their TWAS significance in the G2S- and GTEx-based analyses. Lastly, we calculated the correlation coefficient for TWAS effect size estimates and p-values. The gene-level associations derived from non-overlapping sets of SNP features demonstrated correlation coefficients that were lower than, but similar in range to, those of the gene-level associations derived from overlapping sets of SNP features. While sharing SNP features did increase the correlation of TWAS effect size estimates, correlations greater than 0.84 and 0.9 were still observed for genes that were significant in one or both models, respectively, but had no intersecting SNPs (Fig. 3d, S11). Notably, these findings did not replicate for correlations of p-values, where high discordance persisted across diseases and subcategories (Figure S12).

SNP sharing predicts gene expression with greater prediction accuracy

The \(r^{2}\) cross validation measure represents the correlation between predicted gene expression and measured gene expression from the training sets in GTEx and G2S. Regardless of whether the \(r^{2}\) cross validation is assessed in GTEx or G2S, the genes with the greatest predictive accuracy were those classified in both GTEx and G2S models with overlapping SNP predictors (Fig. 3e, S13).
Additionally, even when the SNP predictor sets were fully distinct, these dual-classified genes still demonstrated greater \(r^{2}\) cross validation than genes called significant by only a single model.

Parallel ancestry-concordant TWAS recapitulate high correlations in effect size estimates

Three multi-ancestry GWAS meta-analyses (PTSD, SCZ, and MDD) provided summary statistics stratified by ancestry. We performed TWAS on each subset of GWAS summary statistics using the G2S and GTEx models (Figure S14). Analyses of the African and Indigenous American subsets were not sufficiently powered to identify any significant gene-level associations for any of the conditions (Figure S15). To perform parallel, ancestry-concordant TWAS analyses, we used two different meta-analyzed GWAS of MDD performed by the PGC. The first is described above and included individuals of African (36%), East Asian (26%), South Asian (6%), and Hispanic/Latin American (32%) ancestry [14]. The second is a GWAS meta-analysis in individuals of only European descent (Fig. 4a) [26]. We performed a TWAS of the European GWAS using the GTEx whole blood model. In parallel, we performed TWAS of the multi-ancestry GWAS using the G2S models. As with the meta-analyzed data, we identified SNPs used in each of the G2S TWAS and compared them to the SNP features used in the GTEx TWAS. The G2S models consistently used more SNPs than GTEx (Fig. 4b) and included SNPs with smaller SNP weight magnitude on gene expression (Fig. 4c). Across shared SNP-gene pairs, we observed a correlation in the SNP prediction weights of approximately 0.42. Considering the estimated associations between GReX and disease, the correlation of effect size estimates with GTEx was approximately 0.75 for all shared genes detected in the G2S models. Restricting the effect size comparison to genes that were significantly associated with MDD in at least one model, we again identified correlation values of greater than 0.95 (Fig. 4d).
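The comparisons in this section reduce to computing a Pearson correlation of effect size estimates over different subsets of co-tested genes. A minimal sketch of that stratification follows; the function names, the `(z-score, FDR q-value)` tuple layout, the minimum group size, and the toy values are assumptions made for illustration and do not reproduce the study's data.

```python
import math
from typing import Dict, List, Sequence, Tuple

GeneStats = Dict[str, Tuple[float, float]]  # gene -> (z-score, FDR q-value)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def stratified_correlation(
    g2s: GeneStats, gtex: GeneStats, alpha: float = 0.05, min_genes: int = 3
) -> Dict[str, float]:
    """Correlate z-scores over all shared genes and significance subsets."""
    groups: Dict[str, List[Tuple[float, float]]] = {
        "all": [], "either": [], "both": []
    }
    for gene in sorted(g2s.keys() & gtex.keys()):
        (z1, q1), (z2, q2) = g2s[gene], gtex[gene]
        groups["all"].append((z1, z2))
        if q1 < alpha or q2 < alpha:
            groups["either"].append((z1, z2))
        if q1 < alpha and q2 < alpha:
            groups["both"].append((z1, z2))
    result: Dict[str, float] = {}
    for name, pairs in groups.items():
        # Skip strata that are too small for a meaningful correlation.
        if len(pairs) >= min_genes:
            xs, ys = zip(*pairs)
            result[name] = pearson(xs, ys)
    return result


# Hypothetical TWAS results for one disease.
g2s_results = {"G1": (4.1, 0.001), "G2": (-3.6, 0.01), "G3": (3.2, 0.04),
               "G4": (0.5, 0.8), "G5": (-0.9, 0.6), "G6": (2.0, 0.2)}
gtex_results = {"G1": (3.8, 0.004), "G2": (-2.7, 0.09), "G3": (2.5, 0.12),
                "G4": (-0.2, 0.9), "G5": (-0.4, 0.7)}

correlations = stratified_correlation(g2s_results, gtex_results)
print(correlations)
```

In the toy data only one gene is significant in both models, so the "both" stratum is omitted rather than reported from too few points; genes tested in only one model (here `G6`) are excluded from every stratum.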
Phenomic correlation analysis using transcriptomic signatures

In the analyses up to this point, we treated each disease as its own entity and compared the details of the different models within the disease. Demonstrating similarities in performance within diseases does not inform how the models will perform across diseases. The predicted relationships of transcriptomic signatures across diseases could differ depending on the model used to impute GReX. As such, we quantified the transcriptomic similarity of all 6 diagnoses. Within each ancestry model, we calculated the Pearson correlation of effect size estimates across all FDR-significant genes associated with each disease (Fig. 5). The rank ordering of disease pairs by correlation coefficient was similar across all models. The greatest transcriptomic correlation was observed between MDD and SCZ, followed by ADHD and PTSD. In all cases, we observed high correlation between BD1 and ADHD. Alcohol use disorder demonstrated little transcriptomic correlation with the other 5 diagnoses. These patterns were robust to variation in the gene expression model used across both G2S and GTEx.

Multivariate imaging correlates of transcriptomic disease profiles

The TWAS methodology identifies statistical associations between gene expression and each psychiatric condition. This approach does not implicate aspects of organ-level biology that may mediate disease risk. Prior studies suggest that GReX changes associated with psychiatric disease have consequences on brain physiology that may affect disease risk [27]. We wanted to assess the extent to which using admixed gene expression models in TWAS improved the identification of putative biological intermediates involved in risk. We used the NeuroimaGene approach to assess the impact of the trait-associated GReX for each condition on measures of neurological structure. NeuroimaGene functions as a repository of associations between GReX and over 3,400 neuroimaging derived phenotypes.
Derived from predominantly healthy individuals, these associations quantify endogenous relationships between imputed gene expression and the physical structure of the brain. We used GReX measures derived from the G2S aggregate model for each disease. Using NeuroimaGene, we tested for associations between these GReX measures and NIDPs from two atlases: the Desikan cortical atlas and an automated segmentation of subcortical regions (Fig. 6). The genes associated with BD1 and SCZ both implicated widespread cortical alterations. These findings are seen most dramatically in the negative correlation between cortical thickness and disease GReX measures. Alcohol use disorder and MDD presented with relatively little cortical targeting by trait-associated genes. This contrasted with subcortical findings, where prominent effects on the bilateral putamen in alcohol use disorder and widespread subcortical involvement in MDD were observed.

We noted that 59% of all NIDPs associated with trait genes were the result of genes that were significant in G2S but not GTEx. This proportion was highest in the volume and thickness measures associated with MDD and subcortical volumes in schizophrenia, all reaching 100%. With only 14 total NIDPs identified, alcohol use disorder presented with no G2S-specific NIDPs. These statistics likely underestimate the breadth of potential information carried by the G2S genes, given that the NeuroimaGene resource does not include information from diverse cohorts. It only reflects gene-level associations derived from Europeans. This limitation notwithstanding, the 489 additional NIDPs associated with the 6 conditions via G2S represent a 183% improvement over those identified by GTEx alone. As with the initial TWAS, we again observed that the use of admixed models improved association detection in the context of predominantly European genetic ancestries, this time in the space of neuroimaging associations.
Discussion

We compared the results from TWAS using 5 gene expression models that differ in their genetic ancestries. To minimize capturing phenotype-specific results, we repeated the analysis for 6 different psychiatric GWAS meta-analyses. Gene expression models trained on admixed populations (AFR, EUR, and AMR) generally identified more significant gene-level (TWAS) associations than models trained on individuals of predominantly European ancestry. We observed this pattern when TWAS was applied to GWAS cohorts of European ancestry as well as cross-ancestry meta-analyses. Our results are consistent with prior findings suggesting that the increased variance in allele frequencies can improve TWAS association power [28]. Overall, we identified 1,416 gene-level associations with psychiatric diagnoses, of which 881 were detected only by the highly admixed ancestry models and not by the European ancestry model. These findings suggest that there may be significant additional utility in increasing the genetic diversity of transcriptomic resources.

One previously unreported result is a significant association between ADHD and DCHS1 in the G2S models. The association does not reach statistical significance in the GTEx whole blood analysis, and DCHS1 has not been associated with ADHD in preexisting literature. DCHS1 codes for a cadherin protein that is most strongly detected in the developing fetal brain [29]. Other cadherins have already been strongly associated with ADHD, supporting the plausibility of DCHS1 [30]. Regarding functional support, observational studies in humans as well as interventional analyses in mice indicate that perturbations of DCHS1 lead to disruptions in cerebral cortical development [31]. Lastly, previous data show that gene expression models trained on non-brain tissues have less power for brain-related TWAS than those trained in the brain [21].
While DCHS1 is not associated with ADHD in the GTEx whole blood model, the association achieves statistical significance when using a gene expression model trained on the brain cortex from GTEx. This cross-tissue replication suggests that some of the inherent limitations of using whole blood models to predict psychiatric conditions may be ameliorated by using admixed gene expression models.

There was incomplete portability of statistically significant gene-level associations across G2S and GTEx models. Specifically, 76% of significant associations were significant in either a G2S model or the GTEx model but not both. When a gene-disease association is statistically significant in one study and not another, it is not immediately clear which p-value should be accorded more weight. This set of p-value-discordant genes is thus of high import, as they may represent false positives or novel associations with potential biological implications. Focusing on the associations with discrepant p-values, we observed a high degree of correlation in the effect size estimates (> 90%). This correlation is not likely to be a consequence of SNP overlap between models, given that the majority of G2S/GTEx comparisons had zero overlapping SNP features per gene. When there were overlapping SNP features between genes, the weights of the features were often different, reflected by correlation coefficients near 0.4. The G2S and GTEx models (1) leveraged largely different SNPs, (2) accorded different weights to the small proportion of shared SNPs, and (3) still arrived at highly correlated effect size estimates for gene-disease associations.

Our finding of convergent gene-level associations from divergent populations of SNP features speaks to an interesting question about the role of ancestry in genetic studies. Variation in the presence and frequency of different alleles across different populations is well described [32].
As a result, when SNP-based prediction algorithms such as polygenic risk scores (PRS) are trained in one ancestry group, they often suffer from reduced accuracy in other populations on account of these differing SNP profiles 33 , 34 . Similarly, GWAS of the same trait can identify different trait-associated SNP variants when performed in populations of different genetic ancestries 35 , 36 . The prevailing hypothesis regarding complex disease is that individuals of different populations share similar molecular disease processes which are merely being tagged by different SNPs 37 , 38 . Our data support this paradigm. Specifically, we observe substantial ancestry-based heterogeneity regarding which GWAS SNPs predict gene expression. The convergence of these different SNP features onto a shared set of highly correlated gene-disease associations suggests that similar transcriptomic markers of psychiatric disease are tagged by different SNPs depending on the population used as the imputation reference. This is further supported by our analysis of MDD, for which two different ancestry-stratified GWAS were each assessed via the TWAS methodology using ancestry-concordant gene expression models.

Linkage disequilibrium (LD) deserves special consideration in the context of this analysis. Our main findings were that (1) the highly admixed G2S models identified more associations than the predominantly European GTEx model, (2) the effect size estimates of gene-disease associations were largely similar across tested ancestry groups, and (3) this sharing was robust to highly discordant SNP features across gene expression models. LD is most relevant to finding 3, and we expect that some degree of cross-ancestry LD exists between SNP features from G2S and GTEx. While LD could explain a proportion of the correlation in gene-disease effect size estimates, it would not invalidate the primary result, which is that the correlation exists.
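The comparison behind findings 2 and 3 can be illustrated with a minimal sketch. This is not the pipeline used in the study: the gene names, SNP identifiers, effect sizes, and FDR values below are hypothetical toy values, the function names `compare_models` and `shared_snp_weights` are introduced here for illustration only, and restricting the comparison to genes tested in both models is an assumption. The sketch shows how one might count significance-discordant genes between two models, correlate their effect size estimates, and list the SNP features shared by both models for a gene.

```python
from math import sqrt

ALPHA = 0.05  # FDR threshold for calling a gene-disease association significant


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def compare_models(model_a, model_b, alpha=ALPHA):
    """Compare two TWAS result sets given as {gene: (effect_size, fdr_p)}.

    Only genes tested in both models are considered (an assumption of
    this sketch). A gene is discordant when exactly one model calls it
    significant at the given threshold.
    """
    shared = sorted(set(model_a) & set(model_b))
    sig_any = [g for g in shared
               if model_a[g][1] < alpha or model_b[g][1] < alpha]
    discordant = [g for g in sig_any
                  if (model_a[g][1] < alpha) != (model_b[g][1] < alpha)]
    fraction = len(discordant) / len(sig_any) if sig_any else None
    r = None
    if len(discordant) >= 2:
        r = pearson([model_a[g][0] for g in discordant],
                    [model_b[g][0] for g in discordant])
    return {
        "n_significant": len(sig_any),
        "n_discordant": len(discordant),
        "fraction_discordant": fraction,
        "effect_size_r": r,
    }


def shared_snp_weights(weights_a, weights_b):
    """Return (snp, weight_a, weight_b) for SNP features used by both models."""
    shared = sorted(set(weights_a) & set(weights_b))
    return [(snp, weights_a[snp], weights_b[snp]) for snp in shared]


# Hypothetical toy results: {gene: (effect size, FDR-adjusted p-value)}
g2s = {"GENE_A": (0.30, 0.01), "GENE_B": (-0.20, 0.03), "GENE_C": (0.10, 0.40),
       "GENE_D": (-0.40, 0.001), "GENE_E": (0.05, 0.90)}
gtex = {"GENE_A": (0.25, 0.20), "GENE_B": (-0.15, 0.30), "GENE_C": (0.12, 0.02),
        "GENE_D": (-0.35, 0.002), "GENE_E": (0.02, 0.80)}

summary = compare_models(g2s, gtex)
print(summary)

# Hypothetical per-gene SNP weights with no shared features between models
print(shared_snp_weights({"rs_a1": 0.2, "rs_a2": -0.1}, {"rs_b1": 0.3}))
```

In this toy example the two result sets disagree on significance for most genes even though their effect sizes point in the same direction, which is the qualitative pattern described above. Assessing the contribution of cross-ancestry LD would require an LD reference panel and is outside the scope of this sketch.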
Should that correlation exist due to cross-ancestry LD of SNP features, such a finding would further validate the claim that there is substantial sharing of gene-disease associations across populations. While we do not assess this question here, such an analysis could be an interesting follow-up to the work presented.

The full set of transcriptome associations for each disease represents a method of describing a disease entity by its molecular correlates. We compared the molecular similarities between diseases (Fig. 5 ). If PTSD, for example, possessed a different molecular profile in one ancestry relative to another, we would expect the transcriptomic relationship of PTSD to other diseases to differ. Instead, we observed a similar relationship between transcriptomic signatures of disease across all 5 models. Lastly, we observed that the inclusion of genes identified by the admixed models increased the number of identified disease-relevant neuroimaging features by 183%, highlighting the potential for these associations to better inform the neurobiology of psychiatric disease.

This study has several relevant limitations. First, the G2S models are defined using self-reported ancestry 11 . Because admixture thresholds were not used for participant inclusion, the models cannot be treated as strictly representative of discrete genetic ancestries. European admixture in the G2S cohorts increases the SNP-level similarity of the G2S and GTEx models. The same is true of admixture in GTEx, given the 12% of participants who were identified as African American and the ~ 3.1% of individuals identifying as Asian American or Latino. While the result of this admixture would be to increase SNP-level similarities in gene expression models, we still observe substantial heterogeneity, which then converges on shared biological intermediates including RNA transcripts and neuroimaging features.
Secondly, we limit these analyses to psychiatric disease GWAS published by the Psychiatric Genomics Consortium to minimize heterogeneity in the meta-analytic methodology for the source data. Therefore, these findings are not guaranteed to extrapolate to non-psychiatric conditions. Our data include just five gene expression models covering only European, Indigenous American, and African genetic ancestries. There is tremendous genetic variation in other populations that is not represented here. We anticipate that the creation and validation of additional TWAS models will further such analyses.

Declarations

Data Availability
All data generated in the production of this manuscript can be found at Zenodo [https://zenodo.org/uploads/14889758] (temporary link to be finalized following revisions).

Code Availability
All code involved in the production of this manuscript and the analyses therein can be found at Zenodo [https://zenodo.org/uploads/14889758] (temporary link to be finalized following revisions).

Author Contributions
X.B. contributed to the design and implementation of the research; X.B., N.W., and T.B. contributed to the analysis of the results; and X.B. and E.R.G. contributed to the writing of the manuscript.

Acknowledgment
This study was supported by the following National Institutes of Health (NIH) grants to E.R.G.: NHGRI R35HG010718, NHGRI R01HG011138, NIA AG068026, NIGMS R01GM140287, and NIMH R01MH126459, and by the Scott Hamilton Foundation.

Competing Interests
E.R.G. has served as a consultant for Thryv Therapeutics.

Materials and Correspondence
All correspondence and material requests should be addressed to Eric R. Gamazon, [email protected]

References

1. Andreassen, O.A., Hindley, G.F.L., Frei, O. & Smeland, O.B. New insights from the last decade of research in psychiatric genetics: discoveries, challenges and clinical implications. World Psychiatry 22, 4-24 (2023).
2. Li, B. & Ritchie, M.D. From GWAS to Gene: Transcriptome-Wide Association Studies and Other Methods to Functionally Understand GWAS Discoveries. Frontiers in Genetics 12 (2021).
3. Wainberg, M. et al. Opportunities and challenges for transcriptome-wide association studies. Nature Genetics 51, 592-599 (2019).
4. Gamazon, E.R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nature Genetics 47, 1091-1098 (2015).
5. The GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318-1330 (2020).
6. Barbeira, A.N. et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nature Communications 9, 1-20 (2018).
7. Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nature Genetics 48, 245-252 (2016).
8. Bhattacharya, A. et al. Best practices for multi-ancestry, meta-analytic transcriptome-wide association studies: lessons from the Global Biobank Meta-analysis Initiative. Cell Genomics 2 (2022).
9. Mai, J., Lu, M., Gao, Q., Zeng, J. & Xiao, J. Transcriptome-wide association studies: recent advances in methods, applications and available databases. Communications Biology 6, 899 (2023).
10. Keys, K.L. et al. On the cross-population generalizability of gene expression prediction models. PLoS Genetics 16, e1008927 (2020).
11. Kachuri, L. et al. Gene expression in African Americans, Puerto Ricans and Mexican Americans reveals ancestry-specific patterns of genetic architecture. Nature Genetics 55, 952-963 (2023).
12. Demontis, D. et al. Genome-wide analyses of ADHD identify 27 risk loci, refine the genetic architecture and implicate several cognitive domains. Nature Genetics 55, 198-208 (2023).
13. Mullins, N. et al. Genome-wide association study of more than 40,000 bipolar disorder cases provides new insights into the underlying biology. Nature Genetics 53, 817-829 (2021).
14. Meng, X. et al. Multi-ancestry genome-wide association study of major depression aids locus discovery, fine mapping, gene prioritization and causal inference. Nature Genetics 56, 222-233 (2024).
15. Howard, D.M. et al. Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nature Neuroscience 22, 343-352 (2019).
16. Trubetskoy, V. et al. Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature 604, 502-508 (2022).
17. Sanchez-Roige, S. et al. Genome-wide association study meta-analysis of the alcohol use disorders identification test (AUDIT) in two population-based cohorts. American Journal of Psychiatry 176, 107-118 (2019).
18. Nievergelt, C.M. et al. Genome-wide association analyses identify 95 risk loci and provide insights into the neurobiology of post-traumatic stress disorder. Nature Genetics, 1-17 (2024).
19. Zhou, D. et al. A unified framework for joint-tissue transcriptome-wide association and Mendelian randomization analysis. Nature Genetics 52, 1239-1246 (2020).
20. Campoy, E., Puig, M., Yakymenko, I., Lerga-Jaso, J. & Cáceres, M. Genomic architecture and functional effects of potential human inversion supergenes. Philosophical Transactions of the Royal Society B 377, 20210209 (2022).
21. Bledsoe, X. & Gamazon, E.R. A transcriptomic atlas of the human brain reveals genetically determined aspects of neuropsychiatric health. The American Journal of Human Genetics 111, 1559-1572 (2024).
22. Horton, R. et al. Gene map of the extended human MHC. Nature Reviews Genetics 5, 889-899 (2004).
23. Bledsoe, X. & Gamazon, E.R. NeuroimaGene: an R package for assessing the neurological correlates of genetically regulated gene expression. BMC Bioinformatics 25, 325 (2024).
24. Goes, F.S. et al. Exome sequencing of familial bipolar disorder. JAMA Psychiatry 73, 590-597 (2016).
25. Teng, S. et al. Rare disruptive variants in the DISC1 Interactome and Regulome: association with cognitive ability and schizophrenia. Molecular Psychiatry 23, 1270-1277 (2018).
26. Howard, D.M. et al. Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nature Neuroscience 22, 343-352 (2019).
27. Arnatkeviciute, A., Fulcher, B.D., Bellgrove, M.A. & Fornito, A. Imaging transcriptomics of brain disorders. Biological Psychiatry Global Open Science 2, 319-331 (2022).
28. Li, Z. et al. METRO: Multi-ancestry transcriptome-wide association studies for powerful gene-trait association detection. The American Journal of Human Genetics 109, 783-801 (2022).
29. Fishilevich, S. et al. Genic insights from integrated human proteomics in GeneCards. Database 2016, baw030 (2016).
30. Hawi, Z. et al. The role of cadherin genes in five major psychiatric disorders: A literature update. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics 177, 168-180 (2018).
31. Cappello, S. et al. Mutations in genes encoding the cadherin receptor-ligand pair DCHS1 and FAT4 disrupt cerebral cortical development. Nature Genetics 45, 1300-1308 (2013).
32. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68 (2015).
33. Duncan, L. et al. Analysis of polygenic risk score usage and performance in diverse human populations. Nature Communications 10, 3328 (2019).
34. Ding, Y. et al. Polygenic scoring accuracy varies across the genetic ancestry continuum. Nature 618, 774-781 (2023).
35. Graff, M. et al. Discovery and fine-mapping of height loci via high-density imputation of GWASs in individuals of African ancestry. The American Journal of Human Genetics 108, 564-582 (2021).
36. Zhang, J., Zhang, S., Qiao, J., Wang, T. & Zeng, P. Similarity and diversity of genetic architecture for complex traits between East Asian and European populations. BMC Genomics 24, 314 (2023).
37. Li, Y.R. & Keating, B.J. Trans-ethnic genome-wide association studies: advantages and challenges of mapping in diverse populations. Genome Medicine 6, 1-14 (2014).
38. Rosenberg, N.A. et al. Genome-wide association studies in diverse populations. Nature Reviews Genetics 11, 356-366 (2010).

Supplementary Files
Supplemental.docx (Supplementary Information)
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-6229829","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":433847586,"identity":"37b91905-3751-437b-a284-d106a980a45f","order_by":0,"name":"Xavier Bledsoe","email":"","orcid":"","institution":"Medical Scientist Training Program, Vanderbilt University, Nashville, TN; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN","correspondingAuthor":false,"prefix":"","firstName":"Xavier","middleName":"","lastName":"Bledsoe","suffix":""},{"id":433847587,"identity":"917196a0-9044-430b-ac20-e545676c77d0","order_by":1,"name":"Nathan Watkins","email":"","orcid":"","institution":"Chapman University, Orange, CA","correspondingAuthor":false,"prefix":"","firstName":"Nathan","middleName":"","lastName":"Watkins","suffix":""},{"id":433847588,"identity":"30b86dbc-8888-425c-8872-967273bedd84","order_by":2,"name":"Tavian Bowen-Moore","email":"","orcid":"","institution":"Gonzaga University, Spokane, WA","correspondingAuthor":false,"prefix":"","firstName":"Tavian","middleName":"","lastName":"Bowen-Moore","suffix":""},{"id":431523349,"identity":"ca271a97-b899-4c8c-b6ff-dfab4530dceb","order_by":3,"name":"Eric R. 
Gamazon","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA3klEQVRIiWNgGAWjYFACxgYE+wPJWhhnkGwhMw8xqvilDzd+YPhlF83ff/jYY9u2w9H8DczHPn7Bo0WyL7FZgrEvOXfGgWPpxrlth4EMtuTZMni0GJxhbGNg7GHObTjYYyadc+Zw7gYGHmNmCTxa7CFa6nPnH+Yxk7YgRosBD1ALww+gymNALQwVEC2M+EJb4gxjs0Riw/HcjWfY0iR7KtJzZxxmS2bGo4OBv4f94YcPf6pz550/fEzih4F1bn9782HGH/j0gEBiGzKPmagI+oPGJ2zLKBgFo2AUjCQAADKQSwUUx2DBAAAAAElFTkSuQmCC","orcid":"https://orcid.org/0000-0003-4204-8734","institution":"Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN; Vanderbilt Memory \u0026 Alzheimer’s Center, Nashville, TN","correspondingAuthor":true,"prefix":"","firstName":"Eric","middleName":"R.","lastName":"Gamazon","suffix":""}],"badges":[],"createdAt":"2025-03-15 02:05:07","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6229829/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6229829/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":78965416,"identity":"6d8fc7c2-09b7-4dbf-82bd-cf693469102b","added_by":"auto","created_at":"2025-03-21 12:44:00","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":268026,"visible":true,"origin":"","legend":"\u003cp\u003eTWAS of 6 major psychiatric conditions.\u003c/p\u003e\n\u003cp\u003eA. Manhattan-style plot of gene level associations with each psychiatric condition displayed. Points are colored according to the gene expression model in which the gene-disease association was identified. The top associations for each disease plot are annotated by gene name. The number of genes surpassing a false discovery rate corrected p-value of 0.05 is annotated in red in the upper right quadrant of each plot.\u003c/p\u003e\n\u003cp\u003eB. Distribution of FDR-significant gene level associations for each disease according to the gene expression model. 
All 4 G2S models are aggregated into the G2S column. Genes that were significant in both GTEx and G2S are colored dark grey. Those that were significant in G2S only but tested in both models are in fuchsia. The co-tested genes that are significant only in GTEx are in aqua while those genes that were both uniquely tested by and significant in G2S and GTEx are colored pink and blue respectively.\u003c/p\u003e\n\u003cp\u003eADHD- Attention deficit hyperactivity disorder; AUD- Alcohol use disorder; BD1- Bipolar disorder 1; MDD- Major depressive disorder; PTSD- Post traumatic stress disorder; SCZ- Schizophrenia; G2S- GALAII/SAGE; GTEX- Genotype tissue expression consortium; WB- Whole Blood.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-6229829/v1/973e60ed954df9e1706e8d2e.png"},{"id":78965742,"identity":"8b934183-148e-4b66-b746-4e69ecf554e6","added_by":"auto","created_at":"2025-03-21 12:52:00","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":328428,"visible":true,"origin":"","legend":"\u003cp\u003eQuantitative analysis of SNP feature and gene-level data.\u003c/p\u003e\n\u003cp\u003ea. Diagrammatic overview of TWAS outputs.\u003c/p\u003e\n\u003cp\u003eb. Comparison of estimated effect sizes of GReX on SCZ as called by the African American (AA) G2S and GTEx models. The x-axis shows the normalized effect size of GReX on SCZ risk in G2S. The y-axis position of a point represents the normalized effect size of that same gene-disease association as identified in the GTEx model. The trendline represents a linear regression with the correlation of points across the two models recorded in the upper left quadrant. These associations are not filtered for significance.\u003c/p\u003e\n\u003cp\u003ec. Comparison of p-values for GReX effects on SCZ as called by the AA G2S and GTEx models. The x-axis shows the p-value for the GReX effect size estimate on SCZ risk in G2S. 
The y-axis position of a point represents the p-value for the effect size estimate of that same gene-disease association as identified in the GTEx model. The trendline represents a linear regression with the correlation of points across the two models recorded in the upper left quadrant. These associations are not filtered for significance.\u003c/p\u003e\n\u003cp\u003ed. Distribution of the number of SNP features per gene unique to the G2S AA training model in red, GTEx training model in green, and those that are shared in blue. Boxplots represent median values and the interquartile range.\u003c/p\u003e\n\u003cp\u003ee. Comparison of SNP feature effect sizes (zscore) matched by gene across the AA G2S and GTEx models. The x-axis shows the weight of a SNP feature on gene expression in G2S. The y-axis position of a point represents the weight of that same SNP-gene association as identified in the GTEx model. The trendline represents a linear regression with the correlation value recorded in the upper left quadrant.\u003c/p\u003e\n\u003cp\u003ef. Study-wide correlation of G2S vs GTEx statistics for SNP prediction weight, global GReX effect size, and significant GReX-disease association effect sizes. Points are colored according to the G2S model that is being compared against GTEx. The y-axis represents the correlation coefficient. Error bars represent the 95% confidence interval. 
Panels are divided according to the disease under analysis.\u003c/p\u003e\n\u003cp\u003eAA- African American; MX- Mexican American; PR- Puerto Rican; ZSC- zscore\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-6229829/v1/233733c179b94b9d13fd9524.png"},{"id":78965417,"identity":"fce29dae-02af-41fd-a536-99c9db49f126","added_by":"auto","created_at":"2025-03-21 12:44:00","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":200799,"visible":true,"origin":"","legend":"\u003cp\u003eRelationships between SNP features and gene expression\u003c/p\u003e\n\u003cp\u003ea. Visual overview of GWAS, SNP, and gene-based analyses.\u003c/p\u003e\n\u003cp\u003eb. Comparison of effect size estimate magnitudes of SNP features on gene expression in SCZ analysis between the African American G2S model and GTEx. Each box plot shows the median and the interquartile range.\u003c/p\u003e\n\u003cp\u003ec. Comparison of prediction performance of SNP features on gene expression in SCZ analysis between the African American G2S model and GTEx. Each box plot shows the median and the interquartile range.\u003c/p\u003e\n\u003cp\u003ed. We plot the correlation in effect size estimates for gene level associations according to which models called the gene-level association as significant (FDR \u0026lt; 0.05). \u0026nbsp;We then divided the genes into those that included overlapping SNP features (shared_overlapping) and genes that had completely distinct sets of SNP features across GTEx and G2S (shared_distinct). Data points represent the Pearson correlation and error bars represent the 95% confidence interval.\u003c/p\u003e\n\u003cp\u003ee. Distribution of the prediction performance statistics \u003cem\u003e(r\u003c/em\u003e\u003csup\u003e\u003cem\u003e2\u003c/em\u003e\u003c/sup\u003e) per gene according to the SNP feature source for the G2S AA model and the GTEx model. 
The prediction performance statistics are presented both for the G2S AA model and the G2S GTEx model along the x-axis. Genes with SNP features only in the G2S system are presented in red. Genes with non-overlapping SNP features in both G2S AA and GTEx models are shown in green. The genes with at least one shared SNP predictor across both G2S AA and GTEx models are shown in blue.\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-6229829/v1/5116b6849d92de020f84175f.png"},{"id":78965743,"identity":"ff392c77-1246-48cd-a777-5353e80f9358","added_by":"auto","created_at":"2025-03-21 12:52:00","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":221411,"visible":true,"origin":"","legend":"\u003cp\u003eCross-ancestry analysis of MDD TWAS\u003c/p\u003e\n\u003cp\u003ea. Schematic of parallel TWAS analyses. We perform a TWAS of MDD using GTEx models applied to GWAS summary statistics from a uniquely European cohort. In parallel, we perform TWAS of MDD using the G2S models applied to GWAS summary statistics from a predominantly African ancestry cohort.\u003c/p\u003e\n\u003cp\u003eb. The number of SNP features per gene used by each model in the TWAS of MDD from the pertinent GWAS analyses.\u003c/p\u003e\n\u003cp\u003ec. The y-axis represents the median weight of SNP features on gene expression for all genes across the different models. Each boxplot represents the interquartile range.\u003c/p\u003e\n\u003cp\u003ed. Correlation of G2S vs GTEx statistics for SNP prediction weight, global GReX effect size estimates, and significant GReX-disease association effect size estimates. Points are colored according to the G2S model that is being compared against GTEx. The y-axis represents the correlation coefficient. 
Error bars represent the 95% confidence interval.\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-6229829/v1/3783ac677fca7ae18525594c.png"},{"id":78965421,"identity":"55758483-6769-4bbe-9e58-132a7fd8fe9f","added_by":"auto","created_at":"2025-03-21 12:44:00","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":164790,"visible":true,"origin":"","legend":"\u003cp\u003eCorrelograms detailing the transcriptomic correlations across 6 major psychiatric disorders according to the G2S and GTEx whole blood models. The axes are categorical representations of each psychiatric condition. The size and color of dots at the intersection of traits describe the transcriptomic correlation of effect size estimates across all tested genes for both traits.\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-6229829/v1/ec48b15dc943dbd232fe420a.png"},{"id":78966681,"identity":"34d7b77e-a8c8-4231-bdea-a67742d3477f","added_by":"auto","created_at":"2025-03-21 13:08:00","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":472434,"visible":true,"origin":"","legend":"\u003cp\u003eEstimated effects of TWAS gene expression on cortical and subcortical morphology.\u003c/p\u003e\n\u003cp\u003ea. We use the NeuroimaGene resource to quantify the effects of GReX on the human brain according to reference data from healthy individuals. Cortical measures are reported according to the Desikan parcellation with area, thickness, and volume measures in both hemispheres shown for all 6 conditions. Subcortical findings reflect the freesurfer automated segmentation protocol and accord with volumetric measurements. The color schema indicates the mean effect size estimate of all associated disease GReX measures on the brain region.\u003c/p\u003e\n\u003cp\u003eb. 
Certain neuroimaging findings are correlates of GReX from GTEx models while others are correlates of GReX from G2S models. We show the percentage of total detected NIDPs that are associated only with GReX from G2S models in panel b with bars split by the measurement type as shown on the x axis.\u003c/p\u003e","description":"","filename":"6.png","url":"https://assets-eu.researchsquare.com/files/rs-6229829/v1/79a7bc324d26e5df6c443d07.png"},{"id":79204664,"identity":"91b9ed9d-15f4-428f-9e69-f0f2a6329bf2","added_by":"auto","created_at":"2025-03-25 15:25:24","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2490789,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6229829/v1/bc87e071-242a-4ed8-abfe-320fb362a29d.pdf"},{"id":78965419,"identity":"e2962f94-abe9-4932-8cfb-7548b4243658","added_by":"auto","created_at":"2025-03-21 12:44:00","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":8983495,"visible":true,"origin":"","legend":"Supplementary Information","description":"","filename":"Supplemental.docx","url":"https://assets-eu.researchsquare.com/files/rs-6229829/v1/5660cf010488edae326ffd3d.docx"}],"financialInterests":"\u003cb\u003eYes\u003c/b\u003e there is potential Competing Interest.\nE.R.G. has served as a consultant for Thryv Therapeutics.","formattedTitle":"Admixed gene expression models expand molecular and neurological insights into 6 major psychiatric disorders","fulltext":[{"header":"Background","content":"\u003cp\u003eUncovering the physiologic basis of psychiatric conditions has long been a challenge in medical science. 
Genome wide-association studies (GWAS) and large-scale cohort studies have heralded a major expansion in molecular associations with psychiatric conditions.\u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u003c/sup\u003e Recently, transcriptome-wide association studies (TWAS) have built upon the GWAS framework by identifying imputed genetically determined gene expression measures associated with diagnoses.\u003csup\u003e\u003cspan additionalcitationids=\"CR3\" citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e–\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e Improvements in TWAS implementation offer further enhancements for evaluation of psychiatric disease mechanisms.\u003c/p\u003e \u003cp\u003eOne such improvement is the expansion of ancestral diversity in the training data for TWAS models. The TWAS methodology relies on pre-existing reference panels which detail the relationship between single nucleotide polymorphisms and RNA transcript quantity.\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e Model training transforms these data into predictive models\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e which can then be used by formal methods to quantify associations between genetically regulated gene expression (GReX) and disease.\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e,\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u003c/sup\u003e\u003c/p\u003e \u003cp\u003eTraditionally, TWAS models have been trained primarily on individuals of European ancestry.\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e,\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e,\u003cspan citationid=\"CR9\" 
class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e It has been shown that the predictive performance of models trained in individuals of European ancestry is diminished when these models are applied to individuals from different populations.\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e,\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e As new GWAS meta-analyses for psychiatric conditions from ancestrally diverse study participants are published, the development and implementation of admixed TWAS models may improve interpretation of GWAS findings.\u003c/p\u003e \u003cp\u003eRecently, Kachuri et al generated TWAS models from two consortia enriched for individuals of admixed ancestry: the Genes-environments and Admixture in Latino Asthmatics (GALA II) and Study of African Americans, Asthma, Genes, and Environments (SAGE).\u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e These models were trained on data from 2,733 individuals who self-identify as African American, Mexican, Puerto Rican, or other Latino American. The individuals studied in each model demonstrate different proportions of genetic admixture across African, American, and European superpopulations\u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e. We hypothesize that this increased genetic diversity will enable identification of novel gene-level associations in the context of psychiatric disease. However, it is not yet clear how discordance in ancestry between GWAS cohorts and TWAS models affect the disease-GReX associations identified by TWAS.\u003c/p\u003e \u003cp\u003eHere we used the G2S models to perform TWAS for psychiatric conditions. 
To avoid trait-specific findings, we performed the TWAS in parallel on 6 major psychiatric conditions: major depressive disorder, alcohol use disorder, attention deficit hyperactivity disorder, bipolar disorder, post-traumatic stress disorder, and schizophrenia.\u003csup\u003e\u003cspan additionalcitationids=\"CR13 CR14 CR15 CR16 CR17\" citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e–\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e We then performed TWAS for each condition using a whole blood gene expression model trained in individuals of predominantly European ancestry from the Genotype-Tissue Expression (GTEx) consortium.\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e\u003c/p\u003e \u003cp\u003ePrior to interpretation of the resulting gene-level associations, we first performed an in-depth statistical assessment into the comparative performance of the G2S and GTEx models when applied to GWAS of varying ancestries. We found substantial heterogeneity in the SNP features for the same genes across different models. This heterogeneity was reflected in poorly correlated p-values for the gene level associations present in both G2S and GTEx derived TWAS findings. When we examined the effect size estimates themselves, however, we observed a high degree of correlation. The estimated effect of GReX on disease risk was largely stable against variation in model ancestry, disease type, GWAS ancestry, SNP features, and SNP weights, indicating that SNP level differences between populations converge on similar gene-level targets in the context of psychiatric disease. Lastly, we show the value of the admixed TWAS approach by presenting each newly expanded set of gene level associations and imputing the neurological consequences of disease-associated transcriptomic signatures. 
We provide evidence that the application of ancestrally diverse gene expression models to psychiatric GWAS replicates and substantially expands the set of gene-level and brain-level associations previously obtained from European-ancestry models.\u003c/p\u003e "},{"header":"Methods","content":"\u003ch3\u003eGWAS summary statistics\u003c/h3\u003e\u003cp\u003eWe utilized GWAS meta-analysis summary statistics from the Psychiatric Genomics Consortium (PGC) for all 6 psychiatric diseases.\u003csup\u003e\u003cspan additionalcitationids=\"CR13 CR14 CR15 CR16 CR17\" citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e–\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e For each condition, we selected the most recent large multi-ancestry GWAS, with the exception of BD1, ADHD, and AUD, for which the largest GWAS included individuals exclusively of European descent (Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e). For select GWAS analyses, summary statistics for both the full meta-analysis and subsets of ancestry-specific data were available. In such instances, we analyzed all available datasets separately.\u003c/p\u003e\u003ch2\u003eGene expression models\u003c/h2\u003e\u003cp\u003eWe leveraged 4 PrediXcan (whole blood gene expression) models trained on data from the GALAII/SAGE (G2S) cohort studies. Three models are specific to self-reported ancestry: African American, Mexican American, and Puerto Rican. G2S includes individuals of ‘Other Latino’ ancestry; however, with only 299 individuals, this group was too small to create an independent training model. Their genetic and transcriptomic data are included in the fourth model, referred to as ‘All whole blood’. This model includes all data across the full cohort in a single gene expression prediction model. 
These models are publicly available on Zenodo at https://zenodo.org/doi/\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.5281/zenodo.6622367\u003c/span\u003e\u003cspan address=\"10.5281/zenodo.6622367\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e\u003cp\u003eFor the predominantly European whole blood gene expression model, we accessed the GTEx data from the public Zenodo repository https://zenodo.org/doi/\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.5281/zenodo.3842263\u003c/span\u003e\u003cspan address=\"10.5281/zenodo.3842263\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. The model is trained on GTEx v8 data. GTEx version 8 contains 15,201 RNA-sequencing samples quantified from 49 tissues of 838 postmortem donors. 85.3% of the donor population (715) was identified as European American, with additional demographics including 12.3% African American (103), 1.4% Asian American (12), and 1.9% reporting Hispanic or Latino ethnicity (16). 66.4% of donors (557) were male and 33.5% were female (281).\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e We selected the PrediXcan model for whole blood to best match the tissue used in the creation of the GALAII/SAGE models, which was also whole blood.\u003c/p\u003e\u003ch3\u003ePrediXcan TWAS analyses\u003c/h3\u003e\u003cp\u003eWe performed TWAS on all GWAS summary statistics for the 6 selected traits obtained from the PGC. We used the S-PrediXcan command-line tool from the MetaXcan package, applying each of the 5 gene expression models described above to each set of summary statistics.\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e Each pairing of GWAS summary statistics and TWAS gene expression model was assessed individually. 
Prior evidence points to the presence of large inversion structural variants in the chromosome 17 and 8 regions that are disproportionately detected in European genetic ancestry and also violate statistical assumptions regarding LD underlying the TWAS approach.\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e Consistent with prior practice, we removed the results from these regions that are predicted by the European dominant model.\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e In all models, we removed findings from the MHC region due to similar issues in structural complexity.\u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e\u003c/p\u003e\u003ch3\u003eTWAS multiple testing correction\u003c/h3\u003e\u003cp\u003eWe performed multiple testing correction using the Benjamini-Hochberg false discovery rate (FDR \u0026lt; 0.05). We assessed TWAS results for each disease and gene expression model pairing as independent studies.\u003c/p\u003e\u003ch3\u003eMean correlation for each disease/ancestry pair as a function of p-value threshold\u003c/h3\u003e\u003cp\u003eWe performed TWAS of the ancestry specific GWAS datasets using the 5 gene expression models described above. At each TWAS p-value threshold, we calculated the Pearson correlation of effect size estimates with the corresponding discovery TWAS. We then identified the mean correlation for each disease/ancestry pair at each p-value threshold.\u003c/p\u003e\u003ch3\u003eCross-model SNP feature comparison\u003c/h3\u003e\u003cp\u003eWe compared the SNP features in the G2S models against the GTEx whole blood model. We first subset both G2S models and GTEx models with the SNPs from the GWAS study. We then annotated each gene as ‘G2S’ if there are only SNP features for that gene in a G2S model. We made a similar annotation for GTEx. 
Some genes had SNP features in both models. We annotated these genes as either ‘shared_distinct’ when the SNP features did not overlap between the two models or ‘shared_overlapping’ if there was at least one SNP shared between the two. We defined an additional gene-level annotation depending on whether the gene-disease association reached statistical significance in none of the models, G2S only, GTEx only, or both. Stratifying genes by model statistical significance and SNP sharing, we calculated the correlation of the normalized effect size estimates (z-score) as calculated in the G2S model and the GTEx model.\u003c/p\u003e\u003ch2\u003eGene annotation by model specificity\u003c/h2\u003e\u003cp\u003eThe full set of genes evaluated by GTEx is not identical to the set of genes evaluated by G2S. Certain genes are characterized in only one model but not the other. We annotated each gene according to its presence across models. Genes that were tested in any of the 4 G2S models but not GTEx were classified as ‘G2S only’ while those tested in GTEx whole blood but none of the G2S models were classified as ‘GTEx only’. The genes that were tested in both GTEx and at least one G2S model were classified as ‘co-tested’.\u003c/p\u003e\u003ch3\u003eTranscriptomic similarity across psychiatric diseases\u003c/h3\u003e\u003cp\u003eFor each disease/ancestry pair, we curated a list of effect size estimates of all associated genes that passed the significance threshold (FDR \u0026lt; 0.05). We then calculated the Pearson correlation of effect size estimates for shared genes across diseases for each gene model. 
These data were visualized by correlogram.\u003c/p\u003e\u003ch3\u003eNeuroimaGene mapping of disease genes to brain structure\u003c/h3\u003e\u003cp\u003eWe used the NeuroimaGene package in R to identify neuroimaging features implicated by the transcriptomic signatures of each disease.\u003csup\u003e\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e The NeuroimaGene R package enables the user to test for statistically significant associations between GReX measures and MRI-based neuroimaging derived phenotypes (NIDPs). These NIDPs were characterized in European individuals from the UK biobank and encode measurements of cortical area, volume, and thickness as well as subcortical volumes. The repository of GReX-NIDP associations was generated via application of JTI-PrediXcan to GWAS of over 3,500 NIDPs in the UK Biobank. Full description of the data and methods can be found in the original publication.\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e We used the NeuroimaGene method to extend the interpretability of our disease TWAS by relating molecular correlates of disease to brain features which may carry psychiatric or psychological import. We first identified transcriptomic correlates with each of the 6 diseases. To assess the broadest single set of results from the G2S models, we subset the TWAS findings to those derived from the aggregate whole blood model ( All whole blood). We then used NeuroimaGene to identify the neuroimaging features in the Desikan cortical atlas and ‘Subcortex’ subcortical atlas that are associated (Benjamini-Hochberg FDR \u0026lt; 0.05) with expression of each gene set\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e,\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e. 
We then compared the number of NIDPs from the application of the G2S and GTEx models and quantified the relative increase from the use of admixed models.\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\n \u003ch2\u003eA majority of TWAS findings derive from admixed models\u003c/h2\u003e\n \u003cp\u003eWe identified 1,416 statistically significant associations (FDR p\u0026thinsp;\u0026lt;\u0026thinsp;0.05) between GReX and all 6 psychiatric diagnoses (Fig. \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003ea, S1). Most of these associations derived from analyses of bipolar disorder 1 (263), and schizophrenia (717), with the rest from ADHD (121), major depressive disorder (64), alcohol use disorder (17), and PTSD (235). A majority of associations (62%) were uniquely detected by models trained on individuals of African American, Latino, Puerto Rican, and Mexican self-reported ancestry. This stands in contrast to the 14% of gene-level associations specific to the majority European model trained in GTEx. The high performance of the G2S models relative to GTEx remains true for bipolar disorder 1, alcohol use disorder, and ADHD, which were all derived from GWAS in individuals of exclusively European genetic ancestry\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e12\u003c/span\u003e,\u003cspan class=\"CitationRef\"\u003e13\u003c/span\u003e,\u003cspan class=\"CitationRef\"\u003e17\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\n \u003ch2\u003eDisease TWAS associations differ based on ancestry background of gene models\u003c/h2\u003e\n \u003cp\u003eWe observed that only 24% of all gene-level associations passed TWAS significance thresholds using both predominantly European and admixed models. On assessment, 79% (939) of significant TWAS genes were tested in both GTEx and G2S (Fig. \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003eb). 
Of the other 247 significant genes, 213 were only assayed by G2S while 34 were only assayed by GTEx. The 79% of co-tested genes contrasts with the 24% of significant gene-level associations that replicated across models. Because the same GWAS summary statistics are used for the different TWAS, the remaining differences in gene-level associations must arise from incongruities in the SNP features for the G2S and GTEx models.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\n \u003ch2\u003eGene-level TWAS associations demonstrate high correlation of effect sizes\u003c/h2\u003e\n \u003cp\u003eAccordingly, we examined the TWAS effect size estimates across G2S and GTEx. Each gene-level association is characterized by an estimated effect size of GReX on disease risk (Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003ea). For all genes and models, the correlations in GReX-disease effect sizes ranged from 0.66 to 0.80 with a mean correlation of 0.76 (Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003eb, Figure S2). This high correlation in effect sizes rose to an average of 0.93 when we considered only those genes that were called significant in at least one of the models (Figure S3). For the genes significant in both models, the effect sizes obtained a correlation of 0.99 (Table S2). Notably, for the genes that were called significant in either a G2S model or GTEx but not both, the effect sizes still obtained a mean correlation of 0.88 (Table S2). The 24% replication in significant findings was thus not explained by differences in effect sizes between G2S and GTEx. The effect sizes matched with a high degree of correlation.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\n \u003ch2\u003eGene-level TWAS associations demonstrate high p-value heterogeneity\u003c/h2\u003e\n \u003cp\u003eWe calculated the correlation in p-values across G2S and GTEx associations. 
Across all genes, we obtained a mean correlation coefficient of 0.50 (Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003ec, S4). Restricting the set of analyzed genes to those significant in at least one TWAS resulted in a mean correlation coefficient of \u0026minus;0.04 (0.32 for shared significant genes; Figure S5). More significant associations demonstrated lower correlation in p-values. This low correlation in significance statistics is much more consistent with the low replication of significant gene-level associations across G2S and GTEx.\u003c/p\u003e\n \u003cp\u003eWe highlight an association between \u003cem\u003eAPPL2\u003c/em\u003e and bipolar disorder 1 as a representative example. The normalized effect size estimates for this association in GTEx whole blood and the aggregate G2S whole blood model are similar in magnitude and direction (\u0026minus;2.05 and \u0026minus;3.22, respectively). While the effect size estimates are similar, the association reaches statistical significance in the aggregate G2S model (P\u003csub\u003eFDR\u003c/sub\u003e = 0.044) but not in GTEx (P\u003csub\u003eFDR\u003c/sub\u003e = 0.29). \u003cem\u003eAPPL2\u003c/em\u003e has been implicated in exome sequencing of individuals with bipolar disorder 1\u003csup\u003e24\u003c/sup\u003e and functions as a molecular regulator for the mania-associated \u003cem\u003eDISC1\u003c/em\u003e locus\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e. This association, detected in G2S but not in GTEx, provides the first evidence associating GReX of \u003cem\u003eAPPL2\u003c/em\u003e with risk of bipolar disorder.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\n \u003ch2\u003eG2S and GTEx models rely on largely distinct sets of SNP features\u003c/h2\u003e\n \u003cp\u003eWe next assessed if the differences in SNP features across G2S and GTEx informed the high p-value discordance of the gene-level associations. 
We identified the intersecting set of SNPs that were both reported in the GWAS and used in each gene expression model. The G2S models used a median of 28 SNPs per gene while GTEx used a median of 8 (Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003ed, S6). The average number of GWAS-interrogated SNPs shared across both models per gene was 0.90. In 56% of genes tested, no SNPs were shared between the G2S and GTEx models. The G2S models thus used more SNPs in gene expression prediction and used largely different sets of SNPs. Both the G2S and GTEx variants were obtained using whole genome sequencing.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e\n \u003ch2\u003eCross-ancestry heterogeneity exists in SNP features of gene expression\u003c/h2\u003e\n \u003cp\u003eHigher weight magnitude for a SNP feature implies that the SNP predicts a greater change in gene expression. Across diseases, only 1.8% of SNP features were shared for the same genes across the models (Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003ed, S6). We assessed the correlation in weights assigned to these SNP features across G2S and GTEx models. Correlations in the SNP weights ranged from 0.40\u0026ndash;0.43 across diseases and models (Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003ee, S7). While G2S and GTEx used largely different sets of SNP features, even when they use the same SNP feature, the assigned weights were only modestly correlated. We thus observed high correlation in effect size estimates for all gene level associations in the presence of low sharing and low similarity of SNP weights (Fig. 
\u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003ef).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec18\" class=\"Section2\"\u003e\n \u003ch2\u003eBroad differences exist across prediction models\u003c/h2\u003e\n \u003cp\u003eBeyond the correlation of weights, the performance of models could be influenced by the distribution of SNP features, the accuracy of prediction performance, and other variables (Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003ea). As was previously noted, the G2S models used a median of 28 SNP features per gene compared to the median of 8 in GTEx. We also observed that the median weight of each SNP feature on its gene was less in G2S than GTEx (Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003eb). The result of differing SNP sets on gene expression imputation was reflected in the prediction performance statistic of each model. G2S obtained a median \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{r}^{2}\\)\u003c/span\u003e\u003c/span\u003e of 0.157 while GTEx was significantly lower at 0.124 (Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003ec, S8). Conversely, for genes captured by both G2S and GTEx, the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{r}^{2}\\)\u003c/span\u003e\u003c/span\u003e was greater in G2S models (Figure S9). Collectively the G2S models predicted gene expression using a greater number of SNPs and more low-weight SNPs than the GTEx model. Regarding the SNP features shared between G2S and GTEx, the weights assigned by G2S were generally higher than those in GTEx (Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003ee, S10). 
Thus, there are systemic differences between the SNP features identified in GTEx vs those derived from populations of African American, Mexican American, and Puerto-Rican individuals.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec19\" class=\"Section2\"\u003e\n \u003ch2\u003eCross-model TWAS correlations remain high despite distinct SNP features\u003c/h2\u003e\n \u003cp\u003eThe high effect-size correlation is not intuitive given the systemic differences in the SNP predictors between models. Here we assessed the null hypothesis that the convergent effect size estimates are due to shared, high impact SNP predictors used in both G2S and GTEx. To test the null, we first curated all genes with models in both G2S and GTEx. We then divided these genes into two categories depending on if the sets of SNP features for the gene are distinct (shared_distinct) or if there was at least one overlapping SNP between the predictor sets (shared_overlapping). We next stratified genes according to their TWAS significance in the G2S and GTEx based analyses. Lastly, we calculated the correlation coefficient for TWAS effect size estimates and p-values. The gene level associations derived from non-overlapping sets of SNP features demonstrated correlation coefficients that were lower but similar in range to the gene level associations derived from overlapping sets of SNP features. While sharing SNP features did increase the correlation of TWAS effect size estimates, correlations greater than 0.84 and 0.9 were still observed for genes that are significant in one or both models respectively but had no intersecting SNPs (Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003ed, S11). 
Notably these findings did not replicate for correlations of p-values, where high discordance persisted across diseases and subcategories (Figure S12).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec20\" class=\"Section2\"\u003e\n \u003ch2\u003eSNP sharing predicts gene expression with greater prediction accuracy\u003c/h2\u003e\n \u003cp\u003eThe \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{r}^{2}\\)\u003c/span\u003e\u003c/span\u003e cross validation measure represents the correlation between predicted gene expression and measured gene expression from the training sets in GTEx and G2S.\u003c/p\u003e\n \u003cp\u003eRegardless of whether the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{r}^{2}\\)\u003c/span\u003e\u003c/span\u003e cross validation is assessed in GTEx or G2S, the genes with the greatest predictive accuracy were those classified in both GTEx and G2S models with overlapping SNP predictors (Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003ee, S13). Additionally, even when the SNP predictor sets were fully distinct, these dual-classified genes still demonstrated greater \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{r}^{2}\\)\u003c/span\u003e\u003c/span\u003e cross validation than genes called significant by only a single model.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec21\" class=\"Section2\"\u003e\n \u003ch2\u003eParallel ancestry-concordant TWAS recapitulate high correlations in effect size estimates\u003c/h2\u003e\n \u003cp\u003eThree multi-ancestry GWAS meta-analyses (PTSD, SCZ, and MDD) provided summary statistics stratified by ancestry. We performed TWAS on each subset of GWAS summary statistics using the G2S and GTEx models (Figure S14). 
Analyses of the African and Indigenous American subsets were not sufficiently powered to identify any significant gene-level associations for any of the conditions (Figure S15). To perform parallel, ancestry-concordant TWAS analyses, we used two different meta-analyzed GWAS of MDD performed by the PGC. The first is described above and included individuals of African (36%), East Asian (26%), South Asian (6%), and Hispanic/Latin American (32%) ancestry.\u003csup\u003e14\u003c/sup\u003e The second is a GWAS meta-analysis in individuals of only European descent (Fig. \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003ea).\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e We performed a TWAS of the European GWAS using the GTEx whole blood model. In parallel, we performed TWAS of the multi-ancestry GWAS using the G2S models. As with the meta-analyzed data, we identified SNPs used in each of the G2S TWAS and compared them to the SNP features used in the GTEx TWAS. The G2S models consistently used more SNPs than GTEx (Fig. \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003eb) and included SNPs with smaller SNP weight magnitude on gene expression (Fig. \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003ec). Across shared SNP-gene pairs, we observed a correlation in the SNP prediction weights of approximately 0.42. Considering the estimated associations between GReX and disease, the correlation of effect size estimates with GTEx was approximately 0.75 for all shared genes detected in the G2S models. Restricting the effect size comparison to genes that were significantly associated with MDD in at least one model, we again identified correlation values of greater than 0.95 (Fig. 
\u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003ed).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec22\" class=\"Section2\"\u003e\n \u003ch2\u003ePhenomic correlation analysis using transcriptomic signatures\u003c/h2\u003e\n \u003cp\u003eIn the analyses up to this point, we treated each disease as its own entity and compared the details of the different models within the disease. Demonstrating similarities in performance within diseases does not inform how the models will perform across diseases. The predicted relationships of transcriptomic signatures across diseases could differ depending on the model used to impute GReX. As such, we quantified the transcriptomic similarity of all 6 diagnoses. Within each ancestry model, we calculated the Pearson correlation of effect size estimates across all FDR-significant genes associated with each disease (Fig. \u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e). The rank ordering of disease pairs by correlation coefficient was similar across all diseases. The greatest transcriptomic correlation was observed between MDD and SCZ, followed by ADHD and PTSD. In all cases, we observed high correlation between BD1 and ADHD. Alcohol use disorder demonstrated little transcriptomic correlation with the other 5 diagnoses. These patterns were robust to variation in the gene expression model used across both G2S and GTEx.\u003c/p\u003e\n \u003cdiv id=\"Sec23\" class=\"Section3\"\u003e\n \u003ch2\u003eMultivariate imaging correlates of transcriptomic disease profiles\u003c/h2\u003e\n \u003cp\u003eThe TWAS methodology identifies statistical associations between gene expression and each psychiatric condition. This approach does not implicate aspects of organ-level biology that may mediate disease risk. Prior studies suggest that GReX changes associated with psychiatric disease have consequences on brain physiology that may affect disease risk\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e. 
We wanted to assess the extent to which using admixed gene expression models in TWAS improved the identification of putative biological intermediates involved in risk. We used the NeuroimaGene approach to assess the impact of the trait-associated GReX for each condition on measures of neurological structure. NeuroimaGene functions as a repository of associations between GReX and over 3,400 neuroimaging derived phenotypes. Derived from predominately healthy individuals, these associations quantify endogenous relationships between imputed gene expression and the physical structure of the brain. We used GReX measures derived from the G2S aggregate model for each disease. Using NeuroimaGene, we tested for associations between these GReX measures and NIDPs from two atlases: the Desikan cortical atlas and an automated segmentation of subcortical regions (Fig. \u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e).\u003c/p\u003e\n \u003cp\u003eThe genes associated with BD1 and SCZ both implicated widespread cortical alterations. These findings are seen most dramatically in the negative correlation between cortical thickness and disease GReX measures. Alcohol use disorder and MDD presented with relatively little cortical targeting by trait-associated genes. This contrasted with subcortical findings where prominent effects on the bilateral putamen in alcohol use disorder and widespread subcortical involvement in MDD were observed. We noted that 59% of all NIDPs associated with trait genes were the result of genes that were significant in G2S but not GTEx. This number was highest in the volume and thickness measures associated with MDD and subcortical volumes in schizophrenia, all reaching 100%. With only 14 total NIDPs identified, alcohol use disorder presented with no G2S specific NIDPs. 
These statistics likely underestimate the breadth of potential information carried by the G2S genes given that the NeuroimaGene resource does not include information from diverse cohorts. It only reflects gene-level associations derived from Europeans. This limitation notwithstanding, the 489 additional NIDPs associated with the 6 conditions via G2S represent a 183% improvement over those identified by GTEx alone. As with the initial TWAS, we again observed that the use of admixed models improved association detection in context of predominately European genetic ancestries, this time in the space of neuroimaging associations.\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eWe compared the results from TWAS using 5 gene expression models that differ in their genetic ancestries. To minimize capturing phenotype specific results, we repeated the analysis for 6 different psychiatric GWAS meta-analyses. Gene expression models trained on admixed populations (AFR, EUR, and AMR) generally identified more significant gene level (TWAS) associations than models trained on individuals of predominantly European ancestry. We observed this pattern when TWAS was applied to GWAS cohorts of European ancestry as well as cross ancestry meta-analyses. Our results are consistent with prior findings suggesting that the increased variance in allele frequencies can improve TWAS association power\u003csup\u003e\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eOverall, we identified 1,416 gene-level associations with psychiatric diagnoses, of which 881 were uniquely detected in highly admixed ancestry models compared to European ancestry models. 
These findings suggest that there may be significant additional utility in increasing the genetic diversity of transcriptomic resources.\u003c/p\u003e \u003cp\u003eOne previously unreported result is a significant association between ADHD and \u003cem\u003eDCHS1\u003c/em\u003e in the G2S models. The association does not reach statistical significance in the GTEx whole blood analysis, and \u003cem\u003eDCHS1\u003c/em\u003e has not been associated with ADHD in preexisting literature. \u003cem\u003eDCHS1\u003c/em\u003e codes for a cadherin protein that is most strongly detected in the developing fetal brain\u003csup\u003e\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u003c/sup\u003e. Other cadherins have already been strongly associated with ADHD, supporting the plausibility of \u003cem\u003eDCHS1\u003c/em\u003e.\u003csup\u003e\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u003c/sup\u003e Regarding functional support, observational studies in humans as well as interventional analyses in mice indicate that perturbations of \u003cem\u003eDCHS1\u003c/em\u003e lead to disruptions in cerebral cortical development.\u003csup\u003e\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u003c/sup\u003e Lastly, previous data show that gene expression models trained on non-brain tissues have less power for brain-related TWAS than those trained in the brain\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e. While \u003cem\u003eDCHS1\u003c/em\u003e is not associated with ADHD in the GTEx whole blood model, the association achieves statistical significance when using a gene expression model trained on the brain cortex from GTEx. 
This cross-tissue replication suggests that some of the inherent limitations of using whole blood models to predict psychiatric conditions may be ameliorated by using admixed gene expression models.\u003c/p\u003e \u003cp\u003eThere was incomplete portability of statistically significant gene-level associations across G2S and GTEx models. Specifically, 76% of significant associations were either significant in a G2S model or the GTEx model but not via both. When one gene-disease association is statistically significant in one study and not another, it is not immediately clear which p-value statistic should be accorded more weight. This set of p-value discordant genes is thus of high import as they may represent false positives or novel associations with potential biological implications.\u003c/p\u003e \u003cp\u003eFocusing in on the associations with discrepant p-values, we observed a high degree of correlation in the effect size estimates (\u0026gt;\u0026thinsp;90%). This correlation is not likely to be a consequence of SNP overlap between models given that the majority of G2S/GTEx comparisons had zero overlapping SNP features per gene. When there were overlapping SNP features between genes, the weights of the features were often different, reflected by correlation coefficients near 0.4. The G2S and GTEx models (1) leveraged largely different SNPs, (2) accorded different weights to the small proportion of shared SNPs and (3) still arrived at highly correlated effect size estimates for gene-disease associations.\u003c/p\u003e \u003cp\u003eOur finding of convergent gene-level associations from divergent populations of SNP features speaks to an interesting question about the role of ancestry in genetic studies. Variation in the presence and frequency of different alleles across different populations is well described\u003csup\u003e\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e\u003c/sup\u003e. 
As a result, when SNP-based prediction algorithms such as PRS are trained in one ancestry group, they often suffer from reduced accuracy in other populations on account of these differing SNP profiles\u003csup\u003e\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e,\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e\u003c/sup\u003e. Similarly, GWAS of the same trait can identify different trait-associated SNP variants when performed in populations of different genetic ancestries\u003csup\u003e\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e,\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u003c/sup\u003e. The prevailing hypothesis regarding complex disease is that individuals from different populations share similar molecular disease processes that are merely tagged by different SNPs\u003csup\u003e\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e,\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e\u003c/sup\u003e. Our data support this paradigm. Specifically, we observe substantial ancestry-based heterogeneity regarding which GWAS SNPs predict gene expression. The convergence of these different SNP features onto a shared set of highly correlated gene-disease associations suggests that similar transcriptomic markers of psychiatric disease are tagged by different SNPs depending on the population used for imputation reference. This is further supported by our analysis of MDD, for which two different ancestry-stratified GWAS were each assessed via the TWAS methodology using ancestry-concordant gene expression models.\u003c/p\u003e \u003cp\u003eLinkage disequilibrium deserves special consideration in the context of this analysis. 
Our main findings were that (1) the highly admixed G2S models identified more associations than the predominantly European GTEx model, (2) the effect size estimates of gene-disease associations were largely similar across tested ancestry groups, and (3) this sharing was robust to highly discordant SNP features across gene expression models. Linkage disequilibrium (LD) is most relevant to finding 3, and we expect that some degree of cross-ancestry LD exists between SNP features from G2S and GTEx. While LD could explain a proportion of the correlation in gene-disease effect size estimates, it would not invalidate the primary result, which is that the correlation exists. Indeed, if the correlation arises from cross-ancestry LD among SNP features, that would further support the claim that there is substantial sharing of gene-disease associations across populations. While we do not assess this question here, such an analysis could be an interesting follow-up to the work presented.\u003c/p\u003e \u003cp\u003eThe full set of transcriptome associations for each disease offers a way to describe a disease entity by its molecular correlates. We compared the molecular similarities between diseases (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e). If PTSD, for example, possessed a different molecular profile in one ancestry relative to another, we would expect the transcriptomic relationship of PTSD to other diseases to differ. Instead, we observed a similar relationship between transcriptomic signatures of disease across all 5 models.\u003c/p\u003e \u003cp\u003eLastly, we observed that the inclusion of genes identified by the admixed models increased the identified disease-relevant neuroimaging features by 183%, highlighting the potential for these associations to better inform the neurobiology of psychiatric disease.\u003c/p\u003e \u003cp\u003eThis study has several relevant limitations. 
First, the G2S models are defined using self-reported ancestry\u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e. Because admixture thresholds were not used for participant inclusion, the models cannot be treated as strictly representative of discrete genetic ancestries. European admixture increases the SNP-level similarity of the G2S and GTEx models. The same is true for GTEx given the 12% of participants who were identified as African American and the ~\u0026thinsp;3.1% of individuals identifying as Asian American or Latino. Although this admixture would tend to increase SNP-level similarity between gene expression models, we still observe substantial SNP-level heterogeneity, which nonetheless converges on shared biological intermediates including RNA transcripts and neuroimaging features. Second, we limit these analyses to psychiatric disease GWAS published by the Psychiatric Genomics Consortium to minimize heterogeneity in the meta-analytic methodology for the source data. Therefore, these findings are not guaranteed to extrapolate to non-psychiatric conditions. Third, our data include only five gene expression models, covering only European, Indigenous American, and African genetic ancestries. There is tremendous genetic variation in other populations that is not represented here. 
We anticipate the creation and validation of additional TWAS models to further such analyses.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eData Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll data generated in the production of this manuscript can be found at Zenodo [https://zenodo.org/uploads/14889758] (temporary link to be finalized following revisions)\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCode Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll code involved in the production of this manuscript and the analyses therein can be found at Zenodo [https://zenodo.org/uploads/14889758] (temporary link to be finalized following revisions)\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eX.B. contributed to the design and implementation of the research, X.B., N.W., and T.B. contributed to the analysis of the results and X.B. and E.R.G. contributed to the writing of the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgment\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study was supported by the following National Institutes of Health (NIH) grants to E.R.G.: NHGRI R35HG010718, NHGRI R01HG011138, NIA AG068026, NIGMS R01GM140287, NIMH R01MH126459, and the Scott Hamilton Foundation.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting Interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eE.R.G. has served as a consultant for Thryv Therapeutics.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMaterials and Correspondence\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll correspondence and material requests should be addressed to Eric R. Gamazon, [email protected]\u0026nbsp;\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAndreassen, O.A., Hindley, G.F.L., Frei, O. \u0026amp; Smeland, O.B. 
New insights from the last decade of research in psychiatric genetics: discoveries, challenges and clinical implications. \u003cem\u003eWorld Psychiatry\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 4-24 (2023).\u003c/li\u003e\n\u003cli\u003eLi, B. \u0026amp; Ritchie, M.D. From GWAS to Gene: Transcriptome-Wide Association Studies and Other Methods to Functionally Understand GWAS Discoveries. \u003cem\u003eFrontiers in Genetics\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e(2021).\u003c/li\u003e\n\u003cli\u003eWainberg, M.\u003cem\u003e et al.\u003c/em\u003e Opportunities and challenges for transcriptome-wide association studies. \u003cem\u003eNature Genetics\u003c/em\u003e \u003cstrong\u003e51\u003c/strong\u003e, 592-599 (2019).\u003c/li\u003e\n\u003cli\u003eGamazon, E.R.\u003cem\u003e et al.\u003c/em\u003e A gene-based association method for mapping traits using reference transcriptome data. \u003cem\u003eNature genetics\u003c/em\u003e \u003cstrong\u003e47\u003c/strong\u003e, 1091-1098 (2015).\u003c/li\u003e\n\u003cli\u003eThe Gtex, C. The GTEx Consortium atlas of genetic regulatory effects across human tissues. \u003cem\u003eScience\u003c/em\u003e \u003cstrong\u003e369\u003c/strong\u003e, 1318-1330 (2020).\u003c/li\u003e\n\u003cli\u003eBarbeira, A.N.\u003cem\u003e et al.\u003c/em\u003e Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. \u003cem\u003eNature Communications\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 1-20 (2018).\u003c/li\u003e\n\u003cli\u003eGusev, A.\u003cem\u003e et al.\u003c/em\u003e Integrative approaches for large-scale transcriptome-wide association studies. 
\u003cem\u003eNature genetics\u003c/em\u003e \u003cstrong\u003e48\u003c/strong\u003e, 245-252 (2016).\u003c/li\u003e\n\u003cli\u003eBhattacharya, A.\u003cem\u003e et al.\u003c/em\u003e Best practices for multi-ancestry, meta-analytic transcriptome-wide association studies: lessons from the Global Biobank Meta-analysis Initiative. \u003cem\u003eCell genomics\u003c/em\u003e \u003cstrong\u003e2\u003c/strong\u003e(2022).\u003c/li\u003e\n\u003cli\u003eMai, J., Lu, M., Gao, Q., Zeng, J. \u0026amp; Xiao, J. Transcriptome-wide association studies: recent advances in methods, applications and available databases. \u003cem\u003eCommunications Biology\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, 899 (2023).\u003c/li\u003e\n\u003cli\u003eKeys, K.L.\u003cem\u003e et al.\u003c/em\u003e On the cross-population generalizability of gene expression prediction models. \u003cem\u003ePLoS genetics\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, e1008927 (2020).\u003c/li\u003e\n\u003cli\u003eKachuri, L.\u003cem\u003e et al.\u003c/em\u003e Gene expression in African Americans, Puerto Ricans and Mexican Americans reveals ancestry-specific patterns of genetic architecture. \u003cem\u003eNature genetics\u003c/em\u003e \u003cstrong\u003e55\u003c/strong\u003e, 952-963 (2023).\u003c/li\u003e\n\u003cli\u003eDemontis, D.\u003cem\u003e et al.\u003c/em\u003e Genome-wide analyses of ADHD identify 27 risk loci, refine the genetic architecture and implicate several cognitive domains. \u003cem\u003eNature genetics\u003c/em\u003e \u003cstrong\u003e55\u003c/strong\u003e, 198-208 (2023).\u003c/li\u003e\n\u003cli\u003eMullins, N.\u003cem\u003e et al.\u003c/em\u003e Genome-wide association study of more than 40,000 bipolar disorder cases provides new insights into the underlying biology. 
\u003cem\u003eNature Genetics\u003c/em\u003e \u003cstrong\u003e53\u003c/strong\u003e, 817-829 (2021).\u003c/li\u003e\n\u003cli\u003eMeng, X.\u003cem\u003e et al.\u003c/em\u003e Multi-ancestry genome-wide association study of major depression aids locus discovery, fine mapping, gene prioritization and causal inference. \u003cem\u003eNature genetics\u003c/em\u003e \u003cstrong\u003e56\u003c/strong\u003e, 222-233 (2024).\u003c/li\u003e\n\u003cli\u003eHoward, D.M.\u003cem\u003e et al.\u003c/em\u003e Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. \u003cem\u003eNature neuroscience\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 343-352 (2019).\u003c/li\u003e\n\u003cli\u003eTrubetskoy, V.\u003cem\u003e et al.\u003c/em\u003e Mapping genomic loci implicates genes and synaptic biology in schizophrenia. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e604\u003c/strong\u003e, 502-508 (2022).\u003c/li\u003e\n\u003cli\u003eSanchez-Roige, S.\u003cem\u003e et al.\u003c/em\u003e Genome-wide association study meta-analysis of the alcohol use disorders identification test (AUDIT) in two population-based cohorts. \u003cem\u003eAmerican Journal of Psychiatry\u003c/em\u003e \u003cstrong\u003e176\u003c/strong\u003e, 107-118 (2019).\u003c/li\u003e\n\u003cli\u003eNievergelt, C.M.\u003cem\u003e et al.\u003c/em\u003e Genome-wide association analyses identify 95 risk loci and provide insights into the neurobiology of post-traumatic stress disorder. \u003cem\u003eNature Genetics\u003c/em\u003e, 1-17 (2024).\u003c/li\u003e\n\u003cli\u003eZhou, D.\u003cem\u003e et al.\u003c/em\u003e A unified framework for joint-tissue transcriptome-wide association and Mendelian randomization analysis. \u003cem\u003eNature Genetics\u003c/em\u003e \u003cstrong\u003e52\u003c/strong\u003e, 1239-1246 (2020).\u003c/li\u003e\n\u003cli\u003eCampoy, E., Puig, M., Yakymenko, I., Lerga-Jaso, J. 
\u0026amp; C\u0026aacute;ceres, M. Genomic architecture and functional effects of potential human inversion supergenes. \u003cem\u003ePhilosophical Transactions of the Royal Society B\u003c/em\u003e \u003cstrong\u003e377\u003c/strong\u003e, 20210209 (2022).\u003c/li\u003e\n\u003cli\u003eBledsoe, X. \u0026amp; Gamazon, E.R. A transcriptomic atlas of the human brain reveals genetically determined aspects of neuropsychiatric health. \u003cem\u003eThe American Journal of Human Genetics\u003c/em\u003e \u003cstrong\u003e111\u003c/strong\u003e, 1559-1572 (2024).\u003c/li\u003e\n\u003cli\u003eHorton, R.\u003cem\u003e et al.\u003c/em\u003e Gene map of the extended human MHC. \u003cem\u003eNature Reviews Genetics\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 889-899 (2004).\u003c/li\u003e\n\u003cli\u003eBledsoe, X. \u0026amp; Gamazon, E.R. NeuroimaGene: an R package for assessing the neurological correlates of genetically regulated gene expression. \u003cem\u003eBMC bioinformatics\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e, 325 (2024).\u003c/li\u003e\n\u003cli\u003eGoes, F.S.\u003cem\u003e et al.\u003c/em\u003e Exome sequencing of familial bipolar disorder. \u003cem\u003eJAMA psychiatry\u003c/em\u003e \u003cstrong\u003e73\u003c/strong\u003e, 590-597 (2016).\u003c/li\u003e\n\u003cli\u003eTeng, S.\u003cem\u003e et al.\u003c/em\u003e Rare disruptive variants in the DISC1 Interactome and Regulome: association with cognitive ability and schizophrenia. \u003cem\u003eMolecular psychiatry\u003c/em\u003e \u003cstrong\u003e23\u003c/strong\u003e, 1270-1277 (2018).\u003c/li\u003e\n\u003cli\u003eHoward, D.M.\u003cem\u003e et al.\u003c/em\u003e Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. \u003cem\u003eNature neuroscience\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 343-352 (2019).\u003c/li\u003e\n\u003cli\u003eArnatkeviciute, A., Fulcher, B.D., Bellgrove, M.A. 
\u0026amp; Fornito, A. Imaging transcriptomics of brain disorders. \u003cem\u003eBiological Psychiatry Global Open Science\u003c/em\u003e \u003cstrong\u003e2\u003c/strong\u003e, 319-331 (2022).\u003c/li\u003e\n\u003cli\u003eLi, Z.\u003cem\u003e et al.\u003c/em\u003e METRO: Multi-ancestry transcriptome-wide association studies for powerful gene-trait association detection. \u003cem\u003eThe American Journal of Human Genetics\u003c/em\u003e \u003cstrong\u003e109\u003c/strong\u003e, 783-801 (2022).\u003c/li\u003e\n\u003cli\u003eFishilevich, S.\u003cem\u003e et al.\u003c/em\u003e Genic insights from integrated human proteomics in GeneCards. \u003cem\u003eDatabase\u003c/em\u003e \u003cstrong\u003e2016\u003c/strong\u003e, baw030 (2016).\u003c/li\u003e\n\u003cli\u003eHawi, Z.\u003cem\u003e et al.\u003c/em\u003e The role of cadherin genes in five major psychiatric disorders: A literature update. \u003cem\u003eAmerican Journal of Medical Genetics Part B: Neuropsychiatric Genetics\u003c/em\u003e \u003cstrong\u003e177\u003c/strong\u003e, 168-180 (2018).\u003c/li\u003e\n\u003cli\u003eCappello, S.\u003cem\u003e et al.\u003c/em\u003e Mutations in genes encoding the cadherin receptor-ligand pair DCHS1 and FAT4 disrupt cerebral cortical development. \u003cem\u003eNature genetics\u003c/em\u003e \u003cstrong\u003e45\u003c/strong\u003e, 1300-1308 (2013).\u003c/li\u003e\n\u003cli\u003eGenomes Project, C. A global reference for human genetic variation. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e526\u003c/strong\u003e, 68 (2015).\u003c/li\u003e\n\u003cli\u003eDuncan, L.\u003cem\u003e et al.\u003c/em\u003e Analysis of polygenic risk score usage and performance in diverse human populations. \u003cem\u003eNature communications\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 3328 (2019).\u003c/li\u003e\n\u003cli\u003eDing, Y.\u003cem\u003e et al.\u003c/em\u003e Polygenic scoring accuracy varies across the genetic ancestry continuum. 
\u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e618\u003c/strong\u003e, 774-781 (2023).\u003c/li\u003e\n\u003cli\u003eGraff, M.\u003cem\u003e et al.\u003c/em\u003e Discovery and fine-mapping of height loci via high-density imputation of GWASs in individuals of African ancestry. \u003cem\u003eThe American Journal of Human Genetics\u003c/em\u003e \u003cstrong\u003e108\u003c/strong\u003e, 564-582 (2021).\u003c/li\u003e\n\u003cli\u003eZhang, J., Zhang, S., Qiao, J., Wang, T. \u0026amp; Zeng, P. Similarity and diversity of genetic architecture for complex traits between East Asian and European populations. \u003cem\u003eBMC genomics\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 314 (2023).\u003c/li\u003e\n\u003cli\u003eLi, Y.R. \u0026amp; Keating, B.J. Trans-ethnic genome-wide association studies: advantages and challenges of mapping in diverse populations. \u003cem\u003eGenome medicine\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, 1-14 (2014).\u003c/li\u003e\n\u003cli\u003eRosenberg, N.A.\u003cem\u003e et al.\u003c/em\u003e Genome-wide association studies in diverse populations. 
\u003cem\u003eNature Reviews Genetics\u003c/em\u003e\u003cstrong\u003e11\u003c/strong\u003e, 356-366 (2010).\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"nature-portfolio","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Nature Portfolio","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"ejp","reportingPortfolio":"","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-6229829/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6229829/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eOur understanding of the influence of ancestral background on genetically determined expression remains limited, especially when gene expression models are applied to studies from different or multiple populations. We performed transcriptome wide association studies (TWAS) in 6 different psychiatric conditions, leveraging gene expression models trained in cohorts with different proportions of African, European, and Indigenous American genetic ancestries. For comparison we repeated each TWAS using a model trained in individuals of predominantly European ancestry. 
We identified 1,416 statistically significant TWAS associations (FDR p\u0026thinsp;\u0026lt;\u0026thinsp;0.05) across the 6 diagnoses, of which 62% were uniquely detected by the admixed gene models. We observed\u0026thinsp;\u0026gt;\u0026thinsp;92% correlation in the gene-level effects on disease risk, a statistic that remained robust for TWAS results that only reached statistical significance in one population. Using admixed gene expression models validated and greatly extended the yield of TWAS. The resulting transcriptomic signatures implicated neuroimaging features associated with diagnostic symptoms.\u003c/p\u003e","manuscriptTitle":"Admixed gene expression models expand molecular and neurological insights into 6 major psychiatric disorders","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-03-21 12:43:55","doi":"10.21203/rs.3.rs-6229829/v1","editorialEvents":[],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"nature-communications","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"NCOMMS","sideBox":"Learn more about [Nature Communications](http://www.nature.com/ncomms/)","snPcode":"","submissionUrl":"https://mts-ncomms.nature.com/","title":"Nature Communications","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Nature Communications","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"ef22c8c4-e1ec-4621-9187-2967576fd918","owner":[],"postedDate":"March 21st, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":45961553,"name":"Biological sciences/Genetics/Gene expression"},{"id":45961554,"name":"Biological sciences/Computational biology and bioinformatics"},{"id":45961555,"name":"Biological sciences/Genetics/Genetic association 
study"}],"tags":[],"updatedAt":"2026-05-12T13:22:21+00:00","versionOfRecord":[],"versionCreatedAt":"2025-03-21 12:43:55","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6229829","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6229829","identity":"rs-6229829","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

License: CC-BY-4.0