Genetic
Endometriosis has been associated with a wide range of reproductive, metabolic, immune/inflammatory, and other chronic pain comorbidities 1 . To investigate the potential for a shared biological basis underlying these phenotypes, we estimated genetic correlations (rg) between endometriosis and 32 traits/diseases with European ancestry-based GWAS results available using LD score regression (LDSC) 34 , 35 . Sample sizes for the GWAS varied from 4,533 cases for coeliac disease to 98,389 cases for back pain ( Supplementary Table 23 ).
Of nine immune/inflammatory conditions tested, asthma (rg=0.17, p=6.41×10 −4 ), and osteoarthritis (rg=0.24, p=2.50×10 −8 ) showed significant (Bonferroni correction, p<1.67×10 −3 ) positive genetic correlations with endometriosis, suggesting a likely shared genetic component. Using the data available, none of the autoimmune conditions showed significant genetic correlation with endometriosis ( Fig. 6 , Supplementary Table 24 ).
Significant genetic correlation between endometriosis and excessive/irregular menstruation (rg=0.69, p=8.14×10 −14 ), shorter cycle length (rg=−0.26, p=6.65×10 −5 ), earlier age at menarche (rg=−0.11, p=8.96×10 −8 ) and risk of uterine fibroids (rg=0.40, p=3.54×10 −19 ) was observed, confirming previous epidemiological findings and consistent with increased exposure to menstruation and hormones 36 , 37 , 38 . Earlier age at natural menopause (rg=−0.25, p=6.09×10 −11 ) and younger age at first birth (rg=−0.35, p=1.11×10 −22 ) were also genetically correlated with risk of endometriosis ( Fig. 6 ). Age at first birth is a complex trait influenced by both sociological and biological factors, including oestrogen regulation and age at menarche, which could explain the association observed 39 .
The increased prevalence of other pain conditions among women with endometriosis-associated pelvic pain has been well recognised 40 , 41 . The experience of dysmenorrhea and acyclic pelvic pain symptoms associated with endometriosis is associated with sensitization of the nervous system 40 , 42 – 44 , increasing the risk of other pain conditions, and this sensitization - or a susceptibility to developing it - could be genetically mediated. We investigated the genetic correlation between endometriosis and 12 different pain conditions available from the UK Biobank ( Fig. 6 , Supplementary Table 24 ). All showed significant (p<1.56×10 −3 ) positive correlations with endometriosis, including migraine (rg=0.29, p=1.05×10 −16 ), headache (rg=0.26, p=4.88×10 −17 ), dorsalgia (rg=0.45, p=9.52×10 −16 ), multi-site chronic pain (rg=0.43, p=5.19×10 −71 ) and chronic back pain (rg=0.33, p=2.89×10 −24 ) Restricting the analysis to women with surgical and/or medically confirmed endometriosis (N=8,390) did not alter the results ( Extended Data Fig. 9 , Supplementary Table 25 ). Analysis of the genetic correlation of endometriosis and 19 brain imaging phenotypes from 7,916 up to 8,427 women in the UK Biobank 45 provided no evidence that the shared susceptibility between endometriosis and other pain conditions is regulated through genetically underpinned alterations in brain structure or resting state activity (rg −0.12 to 0.23, with 17/19 < |0.1|), although we recognise that power is low ( Extended Data Fig. 10 , Supplementary Table 26 – 28 ).
We investigated if endometriosis-associated variants in the 99% credible sets for each distinct endometriosis signal were associated with any of the genetically correlated traits/conditions at genome-wide significance, using Phenoscanner and GWAS Catalog 46 . This revealed 10 genome-wide significant variants shared with 11 different traits and conditions ( Supplementary Table 29 ). Three variants were shared with pain traits: rs1352889 at BSN /3p21.31 and rs10828249 at MLLT10 /10p12.31 with multi-site chronic pain; and rs12030576 at NGF /1p13.2 with migraine and dysmenorrhea. Loci shared with uterine fibroids (3), menstrual cycle length (2), age at menarche (1), age at menopause (1), BMI (2), type 2 diabetes (1), and asthma (1) are discussed in Supplementary Text . Detailed genomic enrichment analyses for endometriosis with each of these traits and conditions is required to fully elucidate the biological basis for their genetic correlations.
Methods
All human research was approved by the relevant Institutional Review Boards and conducted according to the Declaration of Helsinki. All participants provided written informed consent. Study level ethical statements are provided in the Supplementary Information .
The meta-analysis included 24 datasets, which in total contribute 58,961 cases, and 700,345 controls of mainly European ancestry and one dataset from Japanese ancestry (1,713 cases: 1,581 controls). Case and control ascertainment and study descriptions are summarised in Supplementary table 1 . The datasets were genotyped with various genotyping arrays and each dataset was subject to sample and variant quality controls, genomic control adjustments where genomic inflation factor is reported per study ( Supplementary Table 2 ). For each GWAS, samples were pre-phased and imputed up to reference panels from the 1000 Genomes Project (phase 1, March 2012 release; phase 3, October 2014 release) 69 , 70 , Haplotype Reference Consortium 71 , or population-specific whole-genome sequencing 72 , 73 ( Supplementary Table 2 ). SNVs with poor imputation quality were excluded from downstream association analyses ( Supplementary Table 2 ). Association with endometriosis was evaluated in a regression framework, under an additive model in the dosage of the minor allele, with adjustment for study-specific covariates ( Supplementary Table 2 ). Allelic effects and corresponding standard errors that were estimated from a linear (mixed) model were converted to the log-odds scale 74 . Study-level association summary statistics ( P -values and standard error of allelic log-ORs) were corrected for residual structure, not accounted for in the regression analysis, by means of genomic control 75 if the inflation factor was >1 ( Supplementary Table 2 ). Genome-wide association analyses were conducted for overall endometriosis and sub-phenotype GWAS analyses were conducted for stage I/II, stage III/IV, infertile endometriosis case groups in studies with the respective case categorisation ( Supplementary Table 1 ).
To increase power, accuracy of effect estimates and generalisability between studies, GWAS summary statistics were combined in a meta-analysis. Variants were first filtered at the study level for: MAF>0.5%, SE0.4 or r 2 >0.3. The summary statistics for each study were genomic control corrected to adjust for any residual population structure. Per-variant estimates collected from each individual study from 60,674 cases and 701,926 controls, for 10,401,531 SNPs were meta-analysed using the inverse variance weighting fixed-effects method implemented in METAL 76 . We conducted additional sub-phenotype meta-analyses including (1) Stage III/IV endometriosis (4,045 cases), (2) Stage I/II endometriosis (3,916 cases), (3) Infertile endometriosis (3,060 cases). Post meta-analysis, we excluded variants that were not present in more than 50% of the total effective sample size (N=8,163,508 SNPs). We maintained the conventional genome-wide significance threshold 5×10 −8 . Moreover we conducted two meta-analysis for sensitivity testing: (1) a meta-analysis of datasets including >300 cases (15 studies of the 24) for overall endometriosis to ensure smaller studies are not contributing noise, (2) a meta-analysis of GWAS with only female controls (N cases=44,176: N controls=657,747) and a meta-analysis of GWAS with mixed female and male controls (N cases=5,222: N controls=44,176) with the ‘sex-differentiated’ function in GWAMA 77 to estimate heterogeneity in allelic effect sizes due to sex differences in the controls sets. Significant heterogeneity was defined as 0.05/42=1.19×10 −3 .
For each lead SNV, we modelled allelic log-ORs across GWAS in a linear regression framework, weighted by the inverse of the variance of the effect estimates, incorporating the following covariates: (i) an indicator variable representing the ancestry of the study (European or East Asian); and (ii) two indicator variables representing case ascertainment in the study (Self-reported, medical records, mixed (Medical records and self-reported). In this modelling framework, we tested for heterogeneity in allelic effects on endometriosis between GWAS that is: (i) due to ancestry (after adjustment for case ascertainment); (ii) due to ascertainment (after adjustment for ancestry); and (iii) due to other unmeasured confounders (residual after accounting for ancestry and case ascertainment).
We defined a locus as the chromosomal regions 500 kb up- and down-stream of the lead SNP. If we had more than one lead SNP within 500Kb up- and down-stream we combined these into one locus. To identify the presence of distinct associations at genome-wide significant loci, we conducted approximate conditional analysis based on summary meta-analysis results restricted to European ancestry GWAS (because the approximation requires a reference for LD, which is not consistent across ancestry groups) in GCTA software 78 . We used 1000G P3v5 European population as the reference panel. We utilised genome-wide significance p<5×10 −8 . For loci with a single signal of association, we defined the index SNP as the lead SNP from the multi-ancestry unconditional analysis. For loci with multiple signals of association, we defined index SNPs as having the minimum p-value in the European conditional approximate analysis.
Based on the liability threshold model 79 , we estimated the proportion of variance explained by SNPs using the effective allele frequency and odds ratio from GWAS analysis of European ancestry meta-analysis 80 . We assumed population prevalence of 8%, 2.5%, 5.5% and 2.5% for overall endometriosis, stage III/IV endometriosis, stage I/II endometriosis and infertile endometriosis, respectively 9 , 81 , 82 .
Within each locus, for each distinct signal, we first approximated the Bayes’ factor in favour of endometriosis association of each SNV using summary statistics from the European ancestry-specific meta-analysis. Specifically, the Bayes’ factor for the j th SNV at the i th distinct association signal is approximated by
Λ i j = V i j V i j + ω e x p ω β i j 2 2 V i j V i j + ω
where β i j and V i j denote the estimated allelic effect (log-OR) and corresponding variance from the meta-analysis 83 . In loci with multiple distinct signals of association, summary statistics were obtained from the approximate conditional analysis after adjusting for all other index variants in the fine-mapping region. In loci with a single association signal, summary statistics were obtained from the unconditional meta-analysis. The parameter ω denotes the prior variance in allelic effects, taken here to be 0.04 83 .
Within each locus, for each distinct signal, we then calculated the posterior probability of driving the endometriosis association for each SNV under a uniform prior model for causal variants. Specifically, for the j th SNV at the i th distinct signal, the posterior probability is given by π i j ∝ Λ i j , where Λ i j is the Bayes’ factor in favour of association. We then derived a 99% credible set 84 for the i th distinct association signal by: (i) ranking all SNVs according to their posterior probability π i j ; and (ii) including ranked SNVs until their cumulative posterior probability attains or exceeds 0.99.
Variants within the credible sets for each of the 49 distinct signal associated with endometriosis were annotated using Ensembl Variant Effect Predictor (VEP) ( https://github.com/Ensembl/ensembl-vep%20401 ). PhastCon, Grantham, GERP and PolyPhen predictions for all coding variants were accessed via the Exome Variant server ( https://evs.gs.washington.edu/EVS/ ) and from SIFT ( https://sift.bii.a-star.edu.sg/ ). Moreover, the web-based software platform, FUMA 85 , was used to annotate 99% credible sets. Variants were annotated with Combined Annotation Dependent Depletion (CADD) scores, RegulomeDB scores, chromatin states and chromatin interactions using reference datasets available in the software. SNPs were also annotated to valid promoter associated chromatin loops generated from H3K27Ac HiChIP libraries from a normal immortalized endometrial cell line (E6E7hTERT) and three endometrial cancer cell lines (ARK1, Ishikawa and JHUEM-14) 86 . We determined if SNPs fell within chromatin interaction anchor points that interacted with the promoter of a gene.
To determine if genes located in loci associated with endometriosis risk were enriched in particular tissues we conducted a tissue enrichment analysis. Genes with a transcription start site (TSS) within ±200kb of each of the 49 index SNPs were tested for enrichment in specific tissues using the TissueEnrich R package 87 . TissueEnrich uses a hypergeometric test approach to calculate tissue-specific gene enrichment using the available Human Protein Atlas (HPA) dataset containing RNA-Seq data across 35 human tissues 88 and GTEx dataset containing RNA-Seq data across 29 human tissues 89 .
To test for causality or pleiotropy between a variant, expression of a gene and risk of endometriosis we performed SMR analysis 15 . The SMR was applied using cis -QTL summary statistics for tissues enriched for genes in risk loci, as shown in our previous analysis. This included cis -eQTLs summary statistics from endometrium 11 , 90 , uterus 91 and cis -mQTL summary statistics from endometrium 14 . A proportion of QTLs are shared and correlated between tissues 92 , 93 , as such, we repeated the analysis in a large blood eQTL dataset 94 and mQTL dataset 13 as a proxy.
We conducted a meta-analysis for eQTLs in endometrium using a previously published eQTL dataset 11 generated on an Australian cohort (N=206) and an eQTL dataset from a UK cohort (N=163) ( Supplementary Table 34 – 35 ) consisting of endometrial samples from a total of 369 women for 1MB around the 42 lead SNPs. The Australian dataset contained 6,230,993 genotyped and imputed SNPs and 17,255 genes with CPM>0.22 (~10 counts) in 90% samples. The UK dataset contained 7,179,995 genotyped and imputed SNPs and 18,152 genes with CPM>0.22 in 90% samples. Both RNA-sequencing datasets were aligned to GRCh38 and underwent TMM normalisation and log2 transformation using the edgeR R package 95 . eQTL analysis was performed using the nominal pass method in QTLTools 96 , the cis distance was set of 1Mb and data was adjusted to match a normal distribution (--normal). Flow cell, lane, menstrual cycle stage and endometriosis status were included as covariates in the analysis of the Australian dataset. Principal components 1–3 generated from RNAseq data were included as covariates in the analysis of the UK dataset. Effect alleles were harmonised between the datasets. The two eQTL datasets were meta-analysed for the 1MB region around the 42 lead SNPs using the software tool OSCA (OmicS-data-based Complex trait Analysis) 97 and secondary signals were identified using conditional analyses utilising QIMIRHCS as the LD reference cohort in GCTA-COJO 78 .
SMR was performed using these QTL datasets alongside results from the endometriosis meta-analysis of European cohorts in this study. For analyses containing endometrial QTLs, Oxford and Melbourne cohorts were removed from the meta-analysis to ensure GWAS and endometrial eQTL datasets were independent. A total of 531 probes and 571 genes were included from the array and RNA-Seq endometrial eQTLs respectively and SMR p-value thresholds of 9.42×10 −5 (0.05/531) and 8.76×10 −5 (0.05/571) were applied to determine significance. Heterogeneity of effect sizes in cis -eQTL regions can result from colocalization and LD between multiple casual SNPs. Heterogeneity was tested using the HEIDI (heterogeneity in dependent instruments) test. A HEIDI p-value of <0.05/m_SMR_sig, where m_SMR_sig is the number of genes that passed the SMR P-value threshold, suggests heterogeneity in the SNP effects in the region. The SMR and HEIDI test was repeated using eQTLs for 15,513 genes (P eQTL <5×10 −8 ) from the eQTLGen blood dataset 94 and an SMR P-value threshold of 3.22×10 −6 and using eQTLs for 939 (P eQTL <5×10 −8 ) genes with from GTEx uterus 91 and an SMR P-value threshold of 5.33×10 −5 .
Any association between endometriosis risk variants and regulation of methylation was also tested using SMR and HEIDI. Two mQTL datasets were used including an endometrial mQTL dataset 14 (7,803 probes with P mQTL <5×10 −8 ) and large blood mQTL dataset 13 (93,154 probes with P mQTL <5×10 −8 ). SMR p-value thresholds of 6.41×10 −6 and 5.37×10 −7 were applied to the endometrial and blood SMR results respectively.
As described previously for the 99% credible set, significant variants from the SMR analysis were annotated with CADD scores, RegulomeDB scores, chromatin states and chromatin interactions using reference datasets available in the FUMA 85 software. Similarly, SNPs were also annotated to promoter associated chromatin loops from endometrial (E6E7hTERT) and three endometrial cancer cell lines (ARK1, Ishikawa and JHUEM-14) 86 .
The genes identified from the SMR analysis (N=23) were annotated for cell-type specific expression utilising GTEx multi-gene single cell viewer. This is a visualization tool which helps explore single cell expression data for a list of genes from various cell types. The GTEx single cell data is generated from 8 tissues including, breast, esophagus mucosa, esophagus muscularis, heart, lung, skeletal muscle, prostate and skin from 25 archived, frozen tissue sample from 16 donors of GTEx project 98 . We determined an arbitrary cut-off of detection in at least 10% of the respective cells and looked at patterns of expression in these cells.
To determine if any of the endometriosis GWAS meta-analysis associations were driven by particular surgical or clinical features we tested for association of the lead 42 SNPs with 17 sub-phenotypes summarised in three groups: Surgical sub-phenotypes, symptom sub-phenotypes and common morbidity sub-phenotypes. Definitions of the sub-phenotypes, their data sources and study-level statistical analyses are described in detail in the supplementary information: methods . The meta-analysis was performed using an inverse variance weighted approach conducted with METAL software 76 . I 2 measures were reported to estimate heterogeneity between studies included for each sub-phenotype analysis ( Supplementary Table 21 ). Descriptive statistics were generated as forest plots for SNP-sub-phenotype associations using the forestplot package in R 99 . Correlation and scatter plots of sub-phenotype association effect sizes were generated using R. Due to the lack of independence between women contributing to the different phenotypic analyses, no formal statistical tests could be conducted.
To determine the genetic correlation between endometriosis and related traits, we performed linkage disequilibrium (LD) score regression using LDSC v1.0.1 ( https://github.com/bulik/ldsc ) 34 , 35 . The software was installed on local secure cluster GenomeDK. The LDSC program was run using precomputed European LD scores available from Broad Institute ( https://data.broadinstitute.org/alkesgroup/LDSCORE/eur_w_ld_chr.tar.bz2 ), with calculations limited to SNPs from HapMap3 ( https://data.broadinstitute.org/alkesgroup/LDSCORE/w_hm3.snplist.bz2 ). Summary statistics for 49 traits were downloaded and grouped in five categories, relating to the immune system/inflammation, pain, reproduction, obesity and brain volume. All summary statistics were prepared for analysis by determining the effect allele and using the munge_sumstats.py function of LDSC ( Supplementary Tables 23 and 26 ). Bonferroni correction was applied to determine significant correlations, resulting in an adjusted p-value threshold of p<1.56×10 −3 (0.05/32) for disease/traits; p<2.63×10 −3 (0.05/19) for brain volume and function traits. A list of all summary statistics can be found in supplementary table 24 and 27 . Both analyses were repeated with only medically confirmed endometriosis GWAS (N=8,390 cases) as sensitivity analyses ( Supplementary Table 25 and 28 ).
Liability scale SNP heritability was calculated using LDSC by using the --sample-prev (disease prevalence in the sample) and --pop-prev (disease prevalence in the population) flags in the ldsc.py function. This enables the function to convert binary trait heritability estimates from the observed scale. Population prevalence was primarily obtained from the paper presenting the sumstat. In cases where this was not possible, prevalence was taken from suitable scientific literature as seen in supplementary table 23 .
For all coding variants within the credible sets for each of the 49 distinct signals associated with endometriosis we searched for reported associations (P<5×10 −8 ) with other traits and conditions using Phenoscanner 46 ( http://www.phenoscanner.medschl.cam.ac.uk/ ) (February 2020) and GWAS catalog ( https://www.ebi.ac.u/gwas/home ) (December 2019).
The summary GWAS results from the latest European ancestry GWAS meta-analysis was requested and obtained (1) for migraine excluding 23andMe and UKBB datasets including 38,094 migraine cases and 210,211 controls 100 , (2) for multi-site chronic pain from UKBB including 240,651 individuals 101 . For endometriosis, the European ancestry GWAS meta-analysis results are utilised for this analysis.
The following SNPs were filtered out: (1) SNPS with MAF=0.2 among datasets, (5) SNPs not on the autosomes. Only SNPs that are shared between the GWAS analyses were utilised in the analysis. Z-scores (log(OR/SE)) were computed for all SNPs. After variant filtering, a total of 5,945,867 common SNPs between endometriosis, migraine and MCP were included in the MTAG analysis.
MTAG is a generalisation of standard inverse-variance weighted meta-analysis framework. It takes pre-processed GWAS summary statistics from multiple traits. Here, MTAG was utilised to run a meta-analysis including endometriosis, MCP and migraine. Birvarite LD score regression was used as part of MTAG analysis to account for possibly unknown sample ovarlap between GWAS results of different traits. In the results, MTAG outputs trait-specific effect estimated for each SNP and resulting p-value can be interpreted and used like those in single-trait GWAS 47 .
For each trait, significant lead SNPs were identified based on (1) achiving a genome-wide significant p-value (p<5×10 −8 ), (2) being 500Kb away from each other and, (3) being independent (r 2 <0.1). Then, the lead SNPs associated with respective traits (i.e. from endometriosis and MCP) that sit within 500Kb were identified and LD between them was checked. If the LD between lead SNPs of respective traits were r 2 >=0.8, they were deemed shared loci between those traits.
Summary results for the top 10,000 SNPs included in the entire Genome-wide Association Study (GWAS) meta-analysis are provided in Supplementary Table 33 . Summary statistics from the endometriosis meta-analysis excluding 23andMe is available from: EBI GWAS Catalog Study Accession GCST90205183. GWAS summary statistics from 23andMe, Inc were made available under a data use agreement that protects participant privacy. Please contact
[email protected] or visit research.23andMe.com/collaborate for more information and to apply to access the data. The UK endometrium eQTL dataset (N=163) 1MB around each 42 lead SNPs is provided in Supplementary Table 34 – 35 .
We utilised publicly available software in all the analyses. These are listed with appropriate citations in the methods .
No statistical method was used to predetermine sample size. We used the largest sample size of endometriosis cases and controls with GWAS data available to us. Within each contributing study, data were excluded on the basis of well-established individual and variant quality control (QC) procedures to remove poor quality genotypes, samples and variants. These QC procedures are described in Supplementary Table 2 for each study. We did not conduct replication since we had already brought together all study data available to us via meta-analysis. All reported association signals were checked to confirm that effects were not driven by false positives in single studies. Randomisation was not performed. Within each study, covariates were adjusted for to account for potential confounding. Covariate adjustments are reported in Supplementary Table 2 . Group allocation was not relevant to this study, so blinding was not necessary.
Results
We conducted a meta-analysis of 24 GWAS with a total effective sample size of 206,106 (60,674 cases and 701,926 controls) of European (98%: Europe, USA, Australia), and East Asian (2%: Japan) ancestry. In 12 GWAS, women had surgically confirmed endometriosis including 3,916 with rASRM stage I/II and 4,045 with rASRM stage III/IV disease; in 9 studies, 3,060 cases had known infertility in addition to endometriosis ( Supplementary Table 1 ).
Each GWAS dataset was imputed up to 1000 Genomes (1000G P3v5), Haplotype Reference Consortium (HRC r1.1 2016) or population-specific whole genome sequence data, and standardised quality control was performed ( Methods ; Supplementary Table 2 ). Meta-analysis was conducted for 10,401,531 SNPs under a fixed-effects model with inverse-variance weighting for overall, rASRM stage I/II, rASRM stage III/IV and endometriosis associated infertility ( Extended Data Fig. 1 – 2 ).
We identified 42 genome-wide significant loci (p<5×10 −8 ) for overall endometriosis, 31 of which have not been previously reported ( Fig. 1 , Supplementary Table 3 , Extended Data Figs. 1 – 2 ); 12 of the 42 loci were significant at a more stringent p<5×10 −9 threshold, suggested as more appropriate for whole genome sequence data. Following earlier observations of a greater genetic contribution to rASRM stage III/IV endometriosis 9 , we conducted sub-phenotype analyses including rASRM stage III/IV (4,045 cases/379,890 controls), rASRM stage I/II (3,916 cases/184,006 controls) and endometriosis-associated infertility (3,060 cases/242,555 controls). No additional genome-wide association signals were detected in sub-phenotype analyses; of the 42 loci, eight reached genome-wide significance for stage III/IV, one for endometriosis-associated infertility but none for stage I/II. However, lead SNPs at 38/42 genome-wide significant loci showed larger effect sizes in stage III/IV vs. stage I/II analysis, and of these, 6 with non-overlapping 95% confidence intervals between the sub-phenotypes ( KDR /4q12, SYNE1 /6q25.1, 7p15.2/7p15.2, CDKN2-BAS1 /9p21.3, SLC19A2 /1q24.2, IGF1 /12q23.2) ( Fig. 1 ; Supplementary Table 3 ; Supplementary Figs. 1 – 2 ).
Of the 42 genome-wide significant loci, we observed nominally significant between-study heterogeneity for 11 loci (p-overall heterogeneity <0.05, 2.1 loci expected) in the overall endometriosis meta-analysis; for six loci this was explained by case ascertainment and two by ancestry ( Supplementary text ; Supplementary Tables 4 – 6 ; Supplementary Figs 3 – 5 ). Sensitivity analyses to test the impact of small datasets and the inclusion of male controls in sex-combined analyses did not alter the results ( Supplementary text ; Supplementary Table 7 , Extended data Figs 1e and 3 ).
To identify the presence of multiple distinct associations at the 42 genome-wide significant loci, we conducted approximate conditional analysis based on summary meta-analysis results restricted to European ancestry GWAS. Four loci had multiple distinct associations after conditioning (p<5×10 −8 ; see Methods ) including three with two signals ( GREB1 /2p25.1, CDKN2-BAS1 / 9p21.3, IGF1 /12q23.2); and SYNE1 /6q25.1 with five signals ( Supplementary Table 3 ). In total, 49 index SNPs representing distinct associations were identified across the 42 endometriosis loci ( Supplementary Table 8 ; Supplementary Fig. 6 ).
The phenotypic variance for overall endometriosis explained by the 42 lead SNPs was 1.62% vs. 1.98% by the 49 index SNPs. Meta-analysis limited to datasets with surgically/medically ascertained cases, less prone to misclassification bias and potentially including women with more severe symptoms, showed that the 42 lead SNPs explained 3.99% of endometriosis variance, and the 49 index SNPs 4.79%. For stage III/IV disease, the 42 lead SNPs explained 4.10% and 49 index SNPs 5.01% of disease variance ( Supplementary Table 9 ).
To identify potential causal variants, we performed fine-mapping based on the European ancestry-specific meta-analysis results; 99% credible sets were constructed assuming a single causal SNP driving each of the 49 distinct association signals ( Supplementary Table 10 – 12 ; see Methods ). For seven association signals, credible SNP sets containing ≤10 SNPs were observed. Fine-mapping uncovered 6 “high-confidence” variants (posterior probability (π) >50% to be causal): rs71575922 in SYNE1 and rs73625113 in ESR1 ; rs6456259 in LNC-LBCS ; rs6970537 in HOXA10 and rs3803042 1kb downstream of HOXC10 ; and rs73241342 in LINC00629 ( Supplementary Table 13 ). All were in non-coding regions.
Genes located within ±200kb of each of the 49 index SNPs were enriched for expression in endometrium, smooth muscle (Human Protein Atlas) and uterus (GTex) (see Supplementary text ; Extended Data Figs. 4a – d ). To identify specific genes regulated by the 49 distinct endometriosis association signals, we analysed four expression quantitative trait loci (eQTL) datasets: 1) a gene expression microarray study of 229 endometrial samples 10 ; 2) a meta-analysis of RNAseq-based eQTL datasets including 368 endometrial samples ( 11 and Supplementary Tables 34 – 35 ; see Methods ); 3) RNAseq expression data from 129 uterus tissue samples from GTEx; and 4) data from 31,684 blood samples from the eQTLGen Consortium 12 . We also associated SNPs in the endometriosis risk regions with DNA methylation of nearby CpG sites in endometrium and blood using previously published mQTL datasets ( Supplementary Table 17 ). 13 , 14
Summary data-based Mendelian Randomisation (SMR) 15 was used to identify genes whose expression (eQTL) or methylation (mQTL) levels are associated with endometriosis due to the effects of a common genetic variant (either by direct causal or pleiotropic effects) rather than due to linkage disequilibrium (LD). Table 1 summarises the significant eQTL/mQTL SMR results across endometriosis GWAS loci, together with evidence from chromatin interactions (see Supplementary Text , Supplementary Tables 14 – 19 , and Extended Data Figs 5 – 6 for detailed results ).
Observations that endometriosis risk variants on chromosomes 2p25.1 and 12q22 may function through changes in expression/methylation of GREB1 , and VEZT and/or FGD6 , respectively, have been reported previously 10 , 11 , 16 . Many of the other potentially causal genes have strong biological support. Notable was the signal for GDAP1 ( GDAP1 /8q21.11), previously associated with dysmenorrhea severity and neuronal development ( Fig. 2 ) 17 , 18 . Rs4567029 regulates methylation of probes near GDAP1 and is in perfect LD with rs10283076 that was identified as the variant regulating GDAP1 expression in blood tissue ( Supplementary Table 16 ). The SRP14-AS1 /15q15.1 locus harbored multiple distinct association signals in endometrium and blood and chromatin interaction with BMF (Bcl2 modifying factor) ( Fig. 3 ; Supplementary Tables 15 , 16 , 18 ). BMF encodes for a glycoprotein associated with sex hormone binding globulin and regulating bioavailability of oestrogen and testosterone 19 . Variants in SRP14 may also affect endometriosis-associated pain genesis and maintenance through regulation of DHEA-sulfate (DHEA-S) levels 20 . DHEA-S is a neurosteroid functioning as a neurotropin, that can bind and activate nerve growth factor ( NGF ) and brain-derived neurotrophic factor ( BDNF ) 21 , 22 . BDNF has been shown to regulate the maintenance of chronic pain in various chronic disorders 23 , and its expression appears increased in the eutopic endometrium of women with endometriosis compared to controls 24 , 25 . The expression of NGF, one of our other GWAS loci, has been suggested to partly mediate local nerve density around endometriosis lesions, associated with dyspareunia 26 .
We next investigated whether the endometriosis-associated genes identified through SMR analyses ( Table 1 ) are expressed in specific cell types. We used the large GTEx single cell expression database, although this was limited to cellular components from 8 tissues not including endometrium, ovary or blood (see Methods ). Of the 23 genes, 18 were expressed in cells included in the GTEx single cell database, ( Extended Data Fig. 7 ). BMF, GDAP1, MLLT10, TRA2A, SRP14, SYNE1, and VEZT and were expressed in neuronal and neuroendocrine cells. CALCRL, MLLT10, SYNE1, TRA2A, and VEZT were expressed in adipocytes, endothelial cells and fibroblasts and a cluster of genes including ARL14EP, ESR1, SYNE1, MLLT10, TRA2A and VEZT were expressed in immune cells, most commonly B cells and T cells. Epithelial cells expressed ABO, ESR1, MLLT10, TRA2A, SRP14, SYNE1, and VEZT .
The SMR results from endometrium and blood implicated several causal genes associated with mechanisms of pain perception and maintenance ( SRP14/BMF, GDAP1, MLLT10, BSN, NGF ). We next investigated the association of the GWAS loci with sub-phenotypes of endometriosis, including pelvic pain, in more detail.
Given the generally stronger GWAS associations with rASRM stage III/IV endometriosis ( Supplementary Tables 3 and 9 ), we first explored whether certain surgical or clinical features of endometriosis were driving these associations. We compared effect sizes of the 42 lead SNPs for stage III/IV disease with endometrioma, deep lesions and superficial peritoneal lesions using eight studies (6,502 cases/57,407 controls; Supplementary Table 20 ) with surgical sub-phenotype information (see Methods ). We observed the largest correlation between effect sizes for rASRM stage III/IV disease vs. endometrioma (r=0.73), compared to deep lesions (r=0.42) and superficial peritoneal lesions (r=0.15; Fig. 5 ). None of the lead SNPs were significantly associated with a surgical sub-phenotype after Bonferroni correction (0.05/42=1.19×10 −3 ); nominal associations (p<0.05) included 7 SNPs for endometrioma, 6 for superficial peritoneal lesions, and 3 for deep lesions ( Fig. 5 , Supplementary Table 21 , Supplementary Fig. 7 ).
When we tested for differences in effect sizes between endometriosis and adenomyosis (the growth of endometrium into the myometrium) for each of the 42 endometriosis lead SNPs ( Supplementary Text: Methods/Results ), no significant differences were observed (p<1.19×10 −3 , Supplementary Table 22 ). Moderate correlations in effect sizes were observed between adenomyosis and rASRM stage I/II (r=0.52), superficial lesions (r=0.40) and deep lesions (r=0.31), but low correlations with rASRM stage III/IV (r=0.02) and endometrioma (r 2 =0.18; Extended Data Fig. 8 ).
Epidemiological evidence has consistently shown a lack of correlation between rASRM stage III/IV and pelvic pain severity 7 . Using six studies with pelvic pain sub-phenotype information ( Supplementary Table 20 ; four with World Endometriosis Research Foundation (WERF)-Endometriosis Phenome Harmonisation (EPHect) standardised phenotyping 27 ), the correlation of effect sizes of the 42 lead SNPs between rASRM stages and the number of pelvic pain symptoms as a measure of multi-site pain (see Methods ) was weak ( Fig. 5 ; r=0.29 for rASRM I/II vs. r= −0.01 for rASRM III/IV). All five pain types consistently showed stronger correlations in effect sizes with rASRM stage I/II compared to rASRM stage III/IV, particularly dysmenorrhea (rASRM I/II r=0.48 vs. rASRM III/IV r=0.19) ( Supplementary Fig. 8 ). For deep endometriosis (not well captured by rASRM staging), the strongest correlation was also observed with dysmenorrhea (r=0.36; Supplementary Fig. 8 ) while there was low correlation with acyclic pelvic pain (r=0.21), bladder pain (r=0.18), GI/IBS pain (r=0.18) and none with dyspareunia (r=−0.01).
Of the 42 lead SNPs, two were significantly associated (p<1.19×10 −3 ) with pain sub-phenotypes: SYNE1 /6q25.1 and WT1 /11p14.1 ( Supplementary Fig. 8 ; Supplementary Table 21 ). The WT1 /11p14.1 lead SNP was associated with earlier onset (age <18 years) dysmenorrhea (p=7.86×10 −4 , OR=1.57 (1.21–2.04)). WT1 encodes a transcription factor involved in the development of the urogenital system and female fertility 28 . The lead SNP at SYNE1 /6q25.1 was associated with dysmenorrhea (p=5.00×10 −5 , OR=1.49 (1.23–1.80)), dyspareunia (p=3.5×10 −4 , OR=1.48 (95% CI: 1.19–1.83); severe dyspareunia p=3.17×10 −4 , OR=2.07 (95% CI: 1.40–3.08)) and acyclical pelvic pain (p=1.14×10 −3 , OR=1.44 (1.16–1.80)). This locus includes multiple genes involved in sex-steroid hormone signalling, SYNE1 , ESR1 and CCDC170 , harboring multiple variants previously associated with female hormone-dependent diseases 29 . We identified two high-confidence variants, rs71575922 (π=0.997) intronic to SYNE1 and rs73625113 (π=0.506) intronic to ESR1 , driving two of the five association signals at the SYNE1 /6p25.1 locus ( Supplementary Table 13 ). These risk-variants are in strong LD (r 2 >0.8) with variants driving expression of ESR1 and DNA methylation near ESR1 and SYNE1, and also with rs2941739 identified as a variant regulating ESR1 expression in blood ( Table 1 ; Fig. 4 ). ESR1 expression has been correlated with expression of seven other genes across this region in endometrium 30 . Oestrogen is essential for the growth of endometriotic lesions, and affects inflammation, which likely influences endometriosis-associated pain 7 . In addition to the role of ESR1 in the sex hormone pathway, it also regulates expression of COMT which encodes for a key enzyme (catechol-O-methyltransferase) in catabolic pathways of pain-relevant neurotransmitters such as dopamine, epinephrine, shown to play a role in pain vulnerability and modulation in animal and human studies 31 – 33 .
Lead SNPs at a further 12 loci showed nominal evidence of association (p<0.05) with pain sub-phenotypes including 8 with dysmenorrhea, 6 with dyspareunia, 6 with bladder pain, 4 with acyclical pelvic pain and 3 with GI/IBS pain ( Supplementary Table 21 ). The rs12441483 variant in SRP14-AS1 /15q15.1 which functions through regulation of expression of SRP14 gene in blood ( Table 1 ; Supplementary Table 16 ) was nominally associated (p2 types of pain ( Supplementary Table 21 ). These analyses revealed many biologically interesting candidate regions for pain development and maintenance in endometriosis that require further exploration in larger datasets and other ancestry groups.
Discussion
In this study we identified 42 loci with 49 distinct signals associated with endometriosis, a 3-fold increase from previous studies 8 . Most had larger effect sizes for stage III/IV disease, and combined they explained 5.01% of stage III/IV disease variance. Our sub-phenotype analysis provided evidence that the presence of ovarian endometriosis drives the larger effect sizes for stage III/IV disease at least in part, with a genetic basis that is distinct from other disease manifestations. While we did not observe significant differences in effect sizes between endometriosis and adenomyosis (sometimes referred to as “endometriosis of the uterus”) 58 , the correlation of effects was highest with stage I/II endometriosis and superficial peritoneal lesions suggesting a shared pathogenesis of these subtypes, or their associated symptoms, with adenomyosis. Important caveats are the potential for misclassification of ICD-defined adenomyosis as controls 59 (although this will have applied similarly to endometriosis diagnoses) and the modest sample size of adenomyosis-only cases.
The SMR analyses linking variants at the identified risk loci to expression and DNA methylation differences in endometrium and blood identified numerous functional insights into the mechanisms through which endometriosis association signals are mediated. Potentially causal relationships were observed with genes expressed in endometrium ( SRP14, HOXB9, TRA2A, with previous evidence confirmed for VEZT/FGD9 and GREB1 ) 11 , 14 , 60 and in blood (including ABO, ESR1/SYNE1, GDAP1, FSHB, MLLT10, SRP14-AS1, WNT4 ). Many of these genes are expressed in different cell types relevant to endometriosis pathogenesis, including neuronal cells, immune cells, and epithelial cells; eQTL studies associating genetic variants to cell-specific expression patterns in endometrium and peritoneal fluid are needed to dissect differential regulation at a cellular level.
The genes causally related to endometriosis risk variants have functions in sex-hormone driven signalling, uterine development, oncogenesis, inflammatory adhesion molecules and angiogenic factors. Particularly notable, however, was the link of many of the genes identified, such as SRP14/BMF , GDAP1, MLLT10, BSN, NGF, as well as ‘hormone regulators’ SYNE1/ESR1 and FSHB , to mechanisms of pain perception and maintenance. Our genetic correlation analysis highlighted significant correlations between endometriosis and 11 pain conditions - including migraine, headache, dorsalgia, chronic back pain and MCP. Multi-trait analysis of GWAS summary statistics between endometriosis, MCP and migraine, highlighted many additional genome-wide significant variants, 12 mapping within 500Kb across diseases and 9 tagging the same association signals. These results provide support to the hypothesis that the presence of endometriosis can result in pain through interlinked activation of hormonal, immune, and neuronal pathways as is seen in other chronic pain conditions 61 . It also provides strong support for further investigation of the mechanisms underlying the genetically regulated comorbidity between endometriosis and other types of pain, both to aid the development of new treatments that facilitate early intervention and rationalise the repurposing of currently available therapies.
We also observed significant genetic correlations between endometriosis and two inflammatory conditions: asthma and osteoarthritis. Asthma has previously been associated with endometriosis in epidemiological studies 62 , 63 , and a shared genetic target with was recently highlighted 64 . Endometriosis has also previously been linked epidemiologically to increased risk of joint inflammation, but mainly in relation to rheumatoid arthritis, an auto-immune condition 65 , and not osteoarthritis characterised by auto-inflammation 66 . This link has not been fully explored as the onset of osteoarthritis would typically post-date an endometriosis diagnosis by several decades, and many musculoskeletal cohorts do not capture detailed women’s health data. Given that osteoarthritis also appears to be associated with fluctuations in sex hormones 67 a more detailed exploration into the genetically driven shared pathogenesis of these conditions is warranted.
Our results demonstrate genetic underpinning of shared pathogenesis of endometriosis with other chronic pain conditions, as well as other inflammatory conditions. Our identification of a range of genes with strong evidence for causal association with endometriosis, should inform new avenues of targeted research into gene-specific mechanisms of pain and pathogenesis, with the potential to identify new - or repurposing of existing - treatment targets for this debilitating, enigmatic disease.
Multi Trait
We conducted multi-trait analysis of GWAS (MTAG) between endometriosis and two of the most strongly genetically correlated pain traits with large-scale GWAS data available: multi-site chronic pain (MCP) (rg=0.43) and migraine (rg=0.29). The MTAG methodology leverages genetic correlation between traits to improve power to detect association in genome-wide analyses 47 . The MTAG analysis identified 52 genome-wide significant lead SNPs for endometriosis, 18 of which were not reported in our single trait meta-analysis ( Supplementary Table 30 ); 12 MCP and/or migraine-associated lead SNPs mapped within 500Kb of endometriosis lead SNPs, 9 of which were tagging the same signal (r 2 >0.5) ( Table 2 ). Four of the 12 were shared across all three conditions ( RAB9BP1 /5q21.2, MLLT10 /10p12.31, OLFM4 /13q14.3, NOL4L /20q11.21), 6 between endometriosis and MCP ( FAF1 /1p32.3, PAPPA2 /1q25.2, AFF3 /2q11.2, BSN /3p21.31, JADE2 /5q31.1, PTPRO /12p12.3) and 2 between endometriosis and migraine ( NGF /1p13.2, FSHB /11p14.1) ( Table 2 ).
The 9 shared lead SNPs between endometriosis and MCP or migraine are eQTLs in diverse tissues ( Supplementary Tables 31 – 32 ). Rs1352889 regulates the expression of multiple genes ( UBA7, AMT, RNF123, ARIH2 ) involved in the ubiquitin system, an important cellular mechanism that may be associated with the immune-mediated survival of endometrium implants in ectopic locations 48 . Recent studies have suggested that ubiquitin system failures are implicated in chronic, mainly neuropathic pain (pain caused by a lesion or disease of the somatosensory system) and inflammatory pain 49 . Other eQTLs include rs12580862 regulating expression of RERG (RAS-like Estrogen Regulated Growth Inhibitor), which is induced by estrogen receptors to promote proliferation and survival of endometriotic cells 50 ; and rs4071559 in the 5-prime-UTR of ARL14EP, regulating expression of ARL14EP which controls export of major histocompatibility class (MHC) II molecules involved in initiation of immune response ( Table 1 ). MHCII-restricted T-helper cells appear to play an important role in the development of clinical symptoms associated with neuropathic pain 51 and there is increasing interest in the role of MHCII gene polymorphisms in the susceptibility to chronic neuropathic pain after nerve injury. ARL14EP is located adjacent to FSHB , which encodes follicle-stimulating hormone (FSH) β polypeptide; both genes harbour variants associated with sex hormone regulation and numerous reproductive traits 52 – 55 , suggesting the potential co-regulation of hormone and immune responses at this locus. Neuroimmune mechanisms underlie many chronic pain conditions and this co-regulation could contribute to the increased prevalence of these conditions in women and the frequent observation that symptoms (including non-pelvic chronic pain) vary with hormonal state 56 , 57 .
Extended Data
Q-Q plot for genome-wide association results for a. overall endometriosis, b. rASRM stage I/II endometriosis, c. rASRM stage III/IV endometriosis, d. endometriosis-associated infertility, e. only studies with >300 cases.
Manhattan plot for genome-wide association results for a. overall endometriosis, b. rASRM stage I/II endometriosis, c. rASRM stage III/IV endometriosis, d. endometriosis associated infertility. The GWAS meta-analysis results are shown on the y-axis as −log10(P-value) and on the x-axis is the chromosomal location. The red vertical line illustrates the genome-wide significance (p<5×10 −8 ) and the blue vertical line shows the nominal genome-wide results (p<1×10 −5 ).
Results from meta-analysis contrasting GWAS studies with mixed female and male controls (N cases=5,222, N controls=44,176) vs. GWAS studies with only female controls (N cases=44,176, N controls=657,747). Significant heterogeneity: p-value<1.19×10–3 (0.05/42). The error bars represent standard error estimates for the beta co-efficient estimates. * SNPs with nominal heterogeneity p-values: rs1430787, p=0.03, rs1451383, p=0.01.
Tissue-specific gene enrichment using RNA-Seq data across 35 human tissues from the Human Protein Atlas (a and b) and 29 human tissues from GTEx (c and d). The x-axis shows each of the tissues, and the y-axis represents the tissue-specific gene enrichment −Log10(P−Value) (left) and the fold-change values of the tissue-specific gene enrichment (right).
a. Illustration of the regional association plot for the SKAP1/17q21.32 locus including the 99% credible sets. eSMR endometrium SNP: SMR SNP identified as causal for endometriosis utilizing the eQTL data from endometrium; The shaded region in the credible sets panel is further annotated in panel c. b. SMR significant endometrial eQTL for HOXB9 (SMR p-value=2×10 −6 ). The lower and upper bounds of the boxes represent the first and third quantiles, the whiskers extend 1.5 times the interquartile range from the bounds of the box and the line represents the median. c. position of the SMR-significant eQTL in the SKAP1/17q21.32 locus along with HOXB9 promoter-associated chromatin loops with anchor points containing the lead SNP and the SMR SNP. The SMR-significant SNP associated with expression of HOXB9 in endometrium (eQTL) and endometriosis is shown in black and the associated HOXB9¬ target is shown in red. Valid promoter-associated chromatin loops were generated from H3K27Ac HiChIP libraries from a normal immortalized endometrial cell line (E6E7hTERT) and three endometrial cancer cell lines (ARK1, Ishikawa and JHUEM-14) 68 .
Box plots of log scale sICAM-1 levels in a. overall endometriosis cases (N=136) vs. controls (N=54), b. rASRM stage I/II cases (N=85) vs. controls and rASRM stage III/IV (N=51) vs. controls. Reported p-values are from the adjusted logistic regression model (see Supplementary Information: Methods ) ( Supplementary Table 19 ). The lower and upper bounds of the boxes represent the first and third quantiles, the whiskers extend 1.5 times the interquartile range from the bounds of the box and the line represents the median.
Single cell expression profiles for 18/23 genes regulated by endometriosis risk variants in GTEx Multi-Gene Single Cell Viewer (5 were not included in the database). This is an aster plot where the fraction of cells in which a gene is detected is shown. The cells are categorised by cell-type.
Correlation between the effect sizes of 42 endometriosis-associated GWAS loci contrasting endometriosis surgical sub-types and adenomyosis: a. Adenomyosis vs. rASRM stage I/II, b. Adenomyosis vs. rASRM stage III/IV, c. Adenomyosis vs. endometrioma, d. Adenomyosis vs. deep lesions, e. Adenomyosis vs. superficial lesions. Minor allele frequency for each of the 42 variants is given by shade of grey: Lighter shade of grey designates a smaller MAF, darker shade of grey a larger MAF. Nominal associations (p<0.05) are annotated with locus name and larger circles. The solid black line represents the linear regression line and the dotted black line is the x=y with a slope of 1 for reference of change in ORs. Test statistics including p-values for all the associations are provided in Supplementary Table 21 .
Genetic correlation between only surgically or medically confirmed endometriosis (N=8,390 cases) and 32 immune/inflammatory, pain, reproductive, and metabolic traits/conditions using LD score regression analysis (LDSC). Heritability of each trait is noted in parenthesis on the x-axis. The significance threshold is adjusted for multiple testing using Bonferroni correction. Significant (p<1.56×10 −3 ) correlations are denoted with a red star (*), nominal correlation (p<0.05) with a green star (*). Bars present the genetic correlation (rg) for each trait in relation to endometriosis and the error bars are standard errors. The exact p-values are provided in Supplementary Table 25 .
Genetic correlation between 19 brain imaging traits in UKBB and a. endometriosis (N=58,961) and b. surgically or medically confirmed endometriosis (N=8,390 cases) using LD score regression analysis (LDSC). A total of 6 functional MRI measures (netmat_ICA_09Aug2017_001–006), structural MRI measures including 11 freesurfer derived variables (aseg_lh_volume_Left-Thalamus-Proper, aseg_lh_volume_Left-Hippocampus, aseg_lh_ volume_Left-Amygdala, aseg_lh_volume_Right-Thalamus-Proper, aseg_lh_volume_Right-Hippocampus, aseg_lh_volume_Right-Amygdala, aseg_lh_volume_CC_Posterior, aseg_lh_volume_CC_Mid_Posterior, aseg_lh_volume_CC_Central, aseg_lh_volume_CC_Mid_Anterior, aseg_lh_volume_CC_Anterior) and 2 FAST calculations for insula region (IDP_T1_FAST_ROIs_L_insular_cortex, IDP_T1_FAST_ROIs_R_insular_cortex) were analysed. Heritability of each trait is noted in parenthesis on the x-axis. Nominal correlations are denoted with a green star (*). Bars present the genetic correlation (rg) for each trait in relation to endometriosis and the error bars are standard errors. The exact p-values are provided in Supplementary Table 27 and 28 .
Text is read by the "Ask this paper" AI Q&A widget below.
Extraction quality varies by source — PMC NXML preserves structure
cleanly, OA-HTML may include some navigation residue, and OA-PDF can
have broken hyphenation. The publisher copy
(via DOI)
is the canonical version.