Intro
Endometriosis, a disease affecting 6–10% of women of reproductive age, is defined as the presence of endometrial-like tissue in sites outside of the uterus, most commonly the pelvic peritoneum, ovaries and recto-vaginal septum ( 1 ). Although symptoms vary, affected women most commonly experience chronic pelvic pain, severe dysmenorrhea and subfertility. The disease is inherited as a complex genetic trait ( 1 – 3 ), and aggregates within families in humans ( 4 , 5 ) and non-human primates ( 6 ). Genetic factors accounted for 52% of the variation in liability to endometriosis in an Australian twin study, with a relative-recurrence risk of 2.34 for sibs of endometriosis patients ( 7 ).
We previously reported significant genetic linkage (logarithm of odds (LOD) score >3) to endometriosis on chromosome 10q26 in a study of 1,176 families ( 8 ). We have now performed further analyses to refine the linkage peak and extensive fine mapping to identify regions associated with risk of developing endometriosis. We used latent class analysis to determine whether stratification by disease stage and/or symptoms was possible, and the relative contribution of these factors to linkage in the region. Subsequently, we genotyped a high-density SNP panel in 1,144 familial cases and 1,190 controls. The best association signals were detected at three independent loci although only SNPs at the 96.59 Mb region, harbouring the cytochrome P450 family subfamily C ( CYP2C19 ) gene, showed evidence of replication in independent case:control samples.
Results
Comparative fits of LCA models determined a two-class solution as the most parsimonious with a minimum BIC of −247.50 (one-class BIC −237.30; three-class BIC −220.81). The main phenotypic measure discriminating between the two classes was subfertility. Class 1 families (CL1: 51.7% of the linkage families; 663 QIMR, 138 Oxford) represented an endometriosis type typically without subfertility (91%), a slightly lower proportion with stage B disease (27%), but more common experience of pelvic pain (80.3%). Class 2 (CL2; 48.3%; 268 QIMR, 107 Oxford) families represented a form typically seen with subfertility (89%), a slightly higher proportion with stage B disease (40%) and less common experience of pelvic pain (72.3%).
The results of linkage analyses, performed using the subfertility weighting, did not change substantially by including CL2 families only. Restricting the analysis to CL1 families produced an expLOD=3.62 at approximately 98 Mb, 27 Mb closer to the centromere than the published significant linkage peak (expLOD=3.08 at 125 Mb) ( Fig. 1 ). These results were confirmed by the ordered subset analysis, with an increased expLOD=3.58 when families ranked by fertility scores were successively added to the linkage analysis. The increased evidence for linkage in the fertile subset relative to the entire sample (assessed via 10,000 permutations) was significant ( P =0.02).
Nominal signals of association ( P <1 × 10⁻³ ), supported by multiple SNPs, were seen in three regions ( Table 2 , Fig. 2 ). The SNPs with the smallest P -values were under the fertility-related linkage peak: at ~96.59 Mb (top SNP rs11592737, P =4.9 × 10⁻⁴ , OR=0.78) in intron 7 of the CYP2C19 gene, and at ~105.63 Mb (rs12573103, P =2.5 × 10⁻⁴ , OR=1.24) upstream of the SH3 and PX domain-containing adaptor ( SH3PXD2A ) gene. The next smallest P -value was for rs2250804 at 124.25 Mb ( P =9.7 × 10⁻⁴ ; OR=1.22) under the published linkage peak, in intron 3 of the HtrA serine peptidase 1 precursor ( HTRA1 ) gene. All three signals were independent, with no linkage disequilibrium between the SNPs. However, the permutation analysis showed that none of the association signals was significant at a study-wide level ( P <0.05), with corrected empirical P -values of 0.92, 0.75 and 0.99 for rs11592737, rs12573103 and rs2250804, respectively.
Exploratory analyses were performed limiting cases to those from the previously published linkage families (1,105 cases) and CL1 families only (755 cases). P -values for the SNPs at 96.59 Mb, 105.63 Mb and 124.25 Mb remained the smallest in these analyses. For rs11592737 at 96.59 Mb, a P =9.0 × 10⁻⁴ (OR=1.26) was obtained including only linkage family cases but was below background levels including only CL1 family cases ( P =5.0 × 10⁻² ; OR=1.18). For rs12573103 at 105.63 Mb, a P =1.0 × 10⁻⁴ (OR=1.25) was obtained including only linkage family cases and P =5.3 × 10⁻⁵ (OR=1.31) for CL1 cases. The signal for rs2250804 at 124.25 Mb was reduced by including only linkage family ( P =1.1 × 10⁻³ , OR=1.23) and CL1 ( P =1.2 × 10⁻² , OR=1.19) cases.
The SNPs with the smallest P -values in the 96.59 Mb, 105.63 Mb and 124.25 Mb regions were not present on the commercial Illumina arrays typed in the replication samples. However, additional SNPs that were in moderate linkage disequilibrium (LD: pairwise r² >0.5) were included in our Illumina iSelect panel, allowing a direct comparison of P -values for the discovery and replication datasets. All three SNPs in LD (r² ≥0.98) with rs11592737 at 96.59 Mb showed nominal evidence of association ( P =4.0 × 10⁻² for rs7085745 and 5.0 × 10⁻² for rs12243416 and rs11188067; ORs=1.10) in the replication dataset ( Table 2 ), with corresponding P -values between 3.1 and 4.5 × 10⁻⁴ (ORs=1.14) in the meta-analysis of the fine-mapping discovery and replication datasets ( Table 2 ).
There was no replication signal for SNPs at either 105.63 Mb or 124.25 Mb. The SNP in highest LD with rs12573103 at 105.63 Mb (rs1980653, r² =0.94) had a P =0.60 (OR=1.02) in the replication dataset. The two SNPs in highest LD with rs2250804 at 124.25 Mb (rs2300433 and rs2253755, r² =0.50) had P -values of 0.44 and 0.47 in the replication dataset ( Table 2 ).
Discussion
Analysis of endometriosis stage and pelvic pain and subfertility symptoms identified two classes of families, distinguished primarily by the presence or absence of subfertility. Separate linkage analyses with the two classes produced significantly increased evidence for linkage amongst families without subfertility, shifting the linkage peak approximately 27 Mb. We genotyped SNPs at high density across the region covered by both the published and fertility-related linkage peaks and found evidence of association at three independent loci. Although this was not significant at a study-wide level, there was evidence for replication of the signal(s) at 96.59 Mb within the CYP2C19 gene in the independent set of endometriosis cases.
The most significantly associated SNP at 96.59 Mb is in intron 7 of CYP2C19 , which participates in the metabolism of drugs and oestrogen including conversion of oestradiol (E2) to oestrone (E1), and the production of E1 and E2 2α- and 16α-hydroxylation metabolites ( 20 , 21 ). The key SNP rs11592737 is in complete LD with rs12248560 (the second best SNP in the region), a functional variant located in the CYP2C19 promoter. The rs12248560 “T” allele (CYP2C19*17; http://www.cypalleles.ki.se/cyp2c19.htm ) increases the rate of CYP2C19 transcription and was initially thought to produce an ‘ultra-rapid’ metaboliser form of the CYP2C19 protein ( 22 ), although a recent review found drug metabolic rates within the ranges seen for wild-type homozygotes ( 23 ).
Further evidence for a role for CYP2C19 in diseases influenced by oestrogen comes from a recent study showing a decreased risk of breast cancer in rs12248560 carriers, possibly through increased catabolism resulting in lower overall oestrogen levels ( 24 ). CYP2C19 has also been associated with endometriosis in a small study of 50 cases and 50 controls suggesting that affected women were significantly more likely ( P =0.023; OR=3.12) to be heterozygous carriers of rs4244285, a splice site defect that abrogates gene expression ( 25 ). However, these findings were not replicated in another small study of 46 cases and 39 controls ( 26 ). While both studies were clearly under-powered, the initial study adds further support to CYP2C19 as a plausible endometriosis candidate gene. The P -value for rs4244285 in our discovery sample was only nominally significant ( P =0.011; OR=1.23), indicating this SNP is not driving our association signal in the region. Several lines of evidence now point to a role for CYP2C19 either through the effect on transcription of the rs12248560 variant or additional rare and low frequency alleles in LD with this SNP. The association result does not account for the linkage signal. Linkage to this region could result from a combination of a common variant like the one described here in CYP2C19 and rare variants of larger effect that might be observed in only a few families.
There was no evidence for replication of the 105.63 Mb or 124.25 Mb signals. The directions of the effects were the same for both the fine-mapping and replication datasets, and both regions harbour plausible candidate genes for endometriosis: SH3PXD2A is an interaction partner of ADAM metallopeptidase domain 12 ( ADAM12 ), which has a role in uterine decidualization in mice ( 27 , 28 ). HTRA1 is upregulated in human decidual cells, suggesting a role in preparing the endometrium for embryo implantation ( 29 ). It is likely that these associations represent false positive signals, although we were unable to test the key SNPs directly in the replication study. Additionally, the discovery sample used familial cases but these were not available for the replication sample, and non-familial cases may have different underlying disease aetiology.
We detected evidence of genetic association in a region of significant linkage to endometriosis on chromosome 10. This signal does not fully account for the previously reported or fertility-related linkage peaks. However, the finding of suggestive association, and the presence of an extremely plausible candidate gene in the region of association, suggest that further investigation is warranted. Future studies should include replication in other samples, a search for rare and novel genetic variants and gene expression studies.
Materials and Methods
The linkage study included 931 affected sister pair families collected by the Queensland Institute of Medical Research (QIMR) and 245 collected by the University of Oxford ( 8 , 9 ). To refine the published linkage peak we sought to increase the genetic homogeneity of the sample by examining endometriosis subtypes using information about self-rated symptoms of pelvic pain (ever experiencing severe pelvic pain) and subfertility (failure to conceive after trying for 12 months) and physician-diagnosed disease stage (based on the revised American Fertility Society (rAFS) classification system) ( 10 ). As it can be difficult to stage disease accurately using clinical records alone, a simplified two-stage system was used ( 9 , 10 ): stage A (rAFS I–II or some ovarian disease plus a few adhesions) and stage B (rAFS III–IV). Latent class analysis (LCA), a method to find subtypes of related cases from multivariate categorical data, was used to investigate the presence and composition of endometriosis subgroups using the Bayesian Information Criterion (BIC) ( 11 ) as the index of model goodness-of-fit. The null hypothesis of a one-class (group) solution ( i.e. all individuals belong to the same class or group) is rejected if models with more parameters (groups) provide a smaller BIC value.
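The model-selection rule described above can be illustrated with a minimal sketch. It uses the BIC values reported in the Results for the one-, two- and three-class models; the function name select_by_bic is hypothetical, and the sketch only compares already-fitted models rather than fitting the LCA itself.

```python
# BIC values reported in the Results for the 1-, 2- and 3-class LCA models.
reported_bic = {1: -237.30, 2: -247.50, 3: -220.81}


def select_by_bic(bic_by_classes):
    """Return the number of classes whose model has the smallest BIC.

    Hypothetical helper: it compares fitted models, it does not fit them.
    """
    return min(bic_by_classes, key=bic_by_classes.get)


best = select_by_bic(reported_bic)
# The one-class null is rejected when a model with more classes has a smaller BIC.
rejects_one_class = reported_bic[best] < reported_bic[1]
print(best, rejects_one_class)
```

With these values the two-class model has the smallest BIC, which matches the solution reported in the Results.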
To investigate whether stratifying on subfertility provided a more phenotypically homogeneous sample, the approach of Cox et al. ( 12 ) was adapted to conduct linkage analyses with families weighted according to reported subfertility (0=subfertility, 1=no subfertility). Non-parametric, multipoint, affected-only LOD scores were calculated on the basis of the S-pairs scoring function and an exponential allele-sharing model (exponential LOD (expLOD) scores) using the ALLEGRO analysis package ( 13 ). Ordered subset analyses (OSA) ( 14 ) were performed to assess the increased evidence of linkage in subsets of families ordered by their subfertility value relative to the entire sample.
For the fine-mapping association analyses we genotyped unrelated cases mostly from the families included in the linkage study ( 8 ). Subject to DNA availability, cases were chosen to include individuals with the most severe disease, i.e. the highest disease stage or the youngest age at onset if both sisters had the same disease stage. There were 871 such cases from the 931 QIMR families and 231 from Oxford. A further 40 QIMR cases were chosen, using the same criteria, from families not included in the original linkage analysis but containing a proband plus at least two affected relatives.
QIMR controls (N=952) were chosen from female twin pairs originally recruited for a study of gynaecological health ( 15 ), including one sample from pairs where neither sister had self-reported endometriosis. Oxford controls (N=238) were unrelated women recruited in collaborating hospitals who were: 1) undergoing laparoscopy for pelvic pain, subfertility or other gynaecological complaints, hysterectomy or sterilisation; 2) free of endometriosis at surgery, and 3) without a previous surgical diagnosis of endometriosis. All study participants were volunteers, had signed written informed consent and had provided a blood sample for DNA extraction. Ethics approval was obtained from the QIMR Human Research Ethics Committee, and the UK Regional Multi-centre and local Research Ethics Committees.
The 95% confidence interval for both the published (region 112–129 Mb) and ‘fertility-related’ (region 94–107 Mb) linkage peaks extends over 36 Mb (NCBI build 36; http://www.ensembl.org ). Assays for 13,589 SNPs were manufactured and the genotyping and initial quality control performed at Illumina Inc (San Diego, CA, USA) on an Illumina Infinium iSelect custom platform. Across the entire 36 Mb region, gene-based SNPs were included in all exons and 5’ and 3’ untranslated regions for approximately 250 genes, and under the published and fertility-related linkage peaks SNPs were tagged to a minimum pair-wise r² of 0.97. In an attempt to capture information from rare SNPs (minor allele frequencies (MAFs) <1%) we did not exclude loci with MAFs of 0 in the HapMap ( www.hapmap.org ), although most variants were common in our dataset: only 6% of SNPs had MAFs <1% (range 0.0002–0.0098; Table 1 ).
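For readers unfamiliar with the pair-wise r² measure used for tagging, the following is a minimal sketch of its standard definition from haplotype and allele frequencies. The function name ld_r2 and the example frequencies are hypothetical; the sketch assumes phased haplotype frequencies are available and is not the tagging procedure used in the study.

```python
def ld_r2(p_ab, p_a, p_b):
    """Pair-wise r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2.
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Hypothetical frequencies: perfectly correlated SNPs and independent SNPs.
perfect = ld_r2(0.3, 0.3, 0.3)
independent = ld_r2(0.12, 0.3, 0.4)
print(round(perfect, 6), round(independent, 6))
```

A tag SNP is typically accepted for a target SNP when this value meets the chosen threshold.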
Additional quality control was performed on genotype data from 2,369 (1,158 cases, 1,211 controls) individuals and 12,537 polymorphic SNPs using PLINK ( 16 ). We detected and removed individuals with non-Caucasian ancestry and SNPs with >5% missing genotypes or Hardy-Weinberg P -values <1 × 10⁻⁴ in control samples. Thereafter, 1,144 cases (911 QIMR; 233 Oxford), 1,190 controls (952 QIMR; 238 Oxford) and 11,984 SNPs remained in the dataset. Cochran-Mantel-Haenszel (CMH) tests of association were performed using PLINK including QIMR and Oxford data as different strata to account for any subtle differences between populations in baseline effect ( 17 ). Breslow-Day (BD) tests were conducted to check that the assumptions of the CMH test ( i.e. similar effect size across strata) held. The significance of the association signals was assessed by permutation (10,000 replicates).
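The stratified test described above can be sketched as follows. This is a textbook Cochran-Mantel-Haenszel statistic over 2×2 allele-count tables, written without a continuity correction; it is an illustration rather than the PLINK implementation, and the function name cmh_test and all counts are hypothetical.

```python
import math


def cmh_test(strata):
    """Cochran-Mantel-Haenszel test over a list of 2x2 tables.

    Each stratum is (a, b, c, d): case minor, case major,
    control minor and control major allele counts.
    Returns (chi-square on 1 df, P-value, Mantel-Haenszel odds ratio).
    """
    diff_sum = var_sum = or_num = or_den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        expected = (a + b) * (a + c) / n
        variance = (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        diff_sum += a - expected
        var_sum += variance
        or_num += a * d / n
        or_den += b * c / n
    chi2 = diff_sum ** 2 / var_sum
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, or_num / or_den


# Hypothetical allele counts for two strata (standing in for two recruitment sites).
chi2, p_value, odds_ratio = cmh_test([(10, 20, 20, 10), (10, 20, 20, 10)])
print(round(chi2, 3), round(odds_ratio, 3))
```

Pooling the per-stratum differences before squaring is what lets each site keep its own baseline allele frequency while still testing a common odds ratio.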
We attempted to replicate the results from the ‘discovery’ sample in an independent set of 2,079 cases (1,383 QIMR; 696 Oxford), all surgically confirmed, without a family history, recruited within the QIMR and Oxford studies. Each was genotyped on Illumina Human670Quad Beadarrays for a genome-wide association (GWA) study ( 17 ). There were 7,060 population controls genotyped using: a) Human610Quad (QIMR controls: 1,870 unrelated individuals recruited within the Brisbane Adolescent Twin Study ( 18 , 19 )) or b) Human1M-Duo beadchips (Oxford controls: 5,190 UK unrelated population controls provided by The Wellcome Trust Case Control Consortium 2). Association analysis, and meta-analysis of the P -values for both datasets, were performed using PLINK ( 16 ).
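One common way to combine a discovery and a replication dataset is a fixed-effect, inverse-variance meta-analysis of the log odds ratios, sketched below. Whether this matches the exact procedure the authors ran in PLINK is an assumption; the function name fixed_effect_meta and the input odds ratios and standard errors are hypothetical.

```python
import math


def fixed_effect_meta(studies):
    """Inverse-variance fixed-effect meta-analysis.

    studies: list of (odds_ratio, standard_error_of_log_odds_ratio) pairs.
    Returns (pooled odds ratio, pooled standard error, two-sided P-value).
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    total_weight = sum(weights)
    pooled_log_or = sum(
        w * math.log(odds) for w, (odds, _) in zip(weights, studies)
    ) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_log_or / pooled_se
    # Two-sided P-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(pooled_log_or), pooled_se, p_value


# Hypothetical inputs: two studies with the same effect size and precision.
pooled_or, pooled_se, pooled_p = fixed_effect_meta([(1.2, 0.1), (1.2, 0.1)])
print(round(pooled_or, 3), round(pooled_se, 4))
```

Larger studies receive more weight because their log odds ratios have smaller standard errors.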