{"paper_id":"576d1225-626d-4784-9154-8caeda044a00","body_text":"Initially, 2,351 surgically-confirmed endometriosis cases were drawn from women recruited by The Queensland Institute of Medical Research (QIMR) study 19  and a further 1,030 cases were obtained from women recruited by the Oxford Endometriosis Gene (OXEGENE) study. Australian controls consisted of 1,870 individuals recruited by QIMR 2  and 1,244 individuals recruited by the Hunter Community Study (HCS) 7 . UK controls encompassed 6,000 individuals provided by the Wellcome Trust Case Control Consortium 2 (WTCCC2). Approval for the studies was obtained from the QIMR Human Ethics Research Committee, the University of Newcastle and Hunter New England Population Health Human Research Ethics Committees, and the Oxford regional multi-centre and local research ethics committees. Informed consent was obtained from all participants prior to testing 2 .\nAll Japanese GWA case and control samples were obtained from the BioBank Japan (BBJ) at the Institute of Medical Science, the University of Tokyo. A total of 1,423 cases were diagnosed with endometriosis by the following one or more examinations: multiple clinical symptoms, physical examinations, and laparoscopy or imaging tests. We utilized 1,318 female control samples consisting of healthy volunteers from Osaka-Midosuji Rotary Club, Osaka, Japan and women in the Biobank Japan who were registered to have no history of endometriosis. All participants provided written informed consent to this study. This study was approved by the ethical committees at the Institute of Medical Science, the University of Tokyo and Center for Genomic Medicine, RIKEN Yokohama Institute.\nQIMR and OX cases, and QIMR controls were genotyped at deCODE Genetics on Illumina 670-Quad (cases) and 610-Quad (controls) BeadChips (Illumina Inc), respectively. HCS controls were genotyped at the University of Newcastle on 610-Quad BeadChips (Illumina Inc). The WTCCC2 controls were genotyped at the Wellcome Trust Sanger Institute using Illumina HumanHap1M BeadChips. Genotypes for QIMR cases and controls were called with the Illumina BeadStudio software. Standard quality control procedures were applied as outlined previously 20 . Briefly, individuals with call rates <0.95 then SNPs with a mean BeadStudio GenCall score < 0.7, call rates < 0.95, Hardy-Weinberg equilibrium  P  < 10 −6 , and minor allele frequency (MAF) < 0.01 were excluded. Cryptic relatedness between individuals was identified through a full identity-by-state matrix. Ancestry outliers were identified using data from 11 populations of the HapMap 3 and five Northern European populations genotyped by the GenomeEUtwin consortium, using EIGENSOFT 21 , 22 . To increase the power of the Australian GWA dataset we ancestrally matched the existing QIMR cases and controls 2  to individuals from the Hunter Community Study (HCS) 7  genotyped on Illumina 610 chips. After stringent quality control, the resulting QIMRHCS GWA cohort consists of 2,262 endometriosis cases and 2,924 controls, increasing the Australian effective sample size by 24%. 2\nQuality control procedures for the OX genotype data resulted in the removal of SNPs with a genotype call rate < 0.99 and/or heterozygosity < 0.31 or > 0.33. Genome-wide IBS was estimated for each pair of individuals and one individual from each duplicate or related pair (IBS > 0.82) was removed. Genotype data were combined with CEU, CHB&JPT and YRI genotype data from HapMap 3 and individuals of non Northern European ancestry were identified using EIGENSOFT and subsequently removed. SNPs with a genotype call rate < 0.95 were removed, and this threshold was increased to 0.99 for SNPs with MAF < 0.05. In addition, SNPs showing a significant a) deviation from HWE ( P  < 1 × 10 −6 ); b) difference in call rate between 58BC and NBS control groups ( P  < 1 × 10 −4 ); c) difference in allele/genotype frequency between control groups ( P  < 1 × 10 −4 ); d) difference in call rate between cases and controls ( P  < 1 × 10 −4 ) and e) a MAF < 0.01 were removed. 2\nThe BBJ cases and controls were genotyped using the Illumina HumanHap550v3 Genotyping BeadChip. Quality control included sample call rate ≥ 0.98, identity-by-state to exclude close relatedness samples and principal component analysis to exclude non-Asian samples. We also performed SNP quality control (call rate of ≥ 0.99 in both cases and controls and Hardy-Weinberg equilibrium test  P  ≥ 1.0 × 10 −6  in controls); 460,945 SNPs on all chromosomes passed the quality control filters and were further analyzed. 1\nFor SNPs passing QC, tests of allelic association (--assoc) were performed using PLINK 23  in the separate QIMRHCS, OX and BBJ GWA datasets. The primary meta-analysis of all endometriosis cases versus controls in the QIMRHCS, OX and BBJ GWA data was performed using a fixed-effect (inverse variance-weighted) model, where the effect size estimates, or β-coefficients, are weighted by their estimated standard errors, utilizing the GWAMA software 24 .\nThe threshold of 7.2 × 10 −8  for GWA studies of dense SNPs and resequence data 25  proposed by Dudbridge and Gusnanto 26  was utilized to indicate genome-wide  significant  association, while SNPs with  P  ≤ 10 −5  were considered to show a  suggestive  association [as used in the online ‘Catalog of Published Genome-Wide Association Studies’].\nAlso, given the substantially greater genetic loading of moderate to severe (stage B) endometriosis (rAFS stage III or IV disease) compared to minimal (stage A) endometriosis (rAFS stage I or II disease) 2 , a secondary analysis was performed for suggestive SNPs ( P  ≤ 10 −5 ); where the association results from QIMRHCS and OX stage B cases versus controls, were meta-analyzed with the BBJ association results. As previously demonstrated 2 , the exclusion of minimal endometriosis cases has the potential to enrich true genetic risk effects, even taking into account the reduced sample size.\nConsistency of allelic effects across studies was examined utilizing the  Cochran’s Q  test 27 . Between-study (effect) heterogeneity was indicated by  Q statistic P  values < 0.1 28 . Meta-analysis of SNPs associated with fixed-effect  P  ≤ 10 −5  and showing evidence of effect heterogeneity were also analyzed using the recently developed Han and Eskin’s random effects model (RE2) implemented in the Metasoft software 29 . In contrast to the conventional DerSimonian-Laird random effects (RE) model 30 , the RE2 model  increases  power under heterogeneity 29 .\nIn order to assess the impact of variants not present on the Illumina BeadChips and better define the associated regions, we imputed genotypes within ±2500 kb of the most significant genotyped SNP using the full reference panel from the 1000 Genomes project Interim Phase I Haplotypes (2010-11 data freeze, 2011-06 haplotypes). Imputation was performed separately for the QIMRHCS, OX and BBJ GWA datasets with only the overlapping genotyped SNPs within ±2500kb of the most significant genotyped SNP, using the MaCH and minimac programs 31 , 32  and following the two-step approach outlined in the online ‘Minimac: 1000 Genomes Imputation Cookbook’. Analysis of imputed genotype dosage scores was performed using mach2dat 31 , 32  and PLINK. The quality of imputation was assessed by means of the Rsq statistic. Results for poorly imputed SNPs, defined as having an Rsq < 0.3, were subsequently removed. The results from association analysis of imputed data in the QIMRHCS, OX and BBJ datasets were then combined via meta-analysis of the β-coefficients weighted by their estimated standard errors using GWAMA.\nIndependent of the BBJ GWA case-control cohort, a total of 1,044 cases and 4,017 controls were obtained from the BioBank Japan and utilized for replication. We note that 653 of these 1,044 cases were also utilized in a small GWA study (Adachi et al. 2010) of 696 cases and 825 controls 9 . To utilize all available association data for rs13394619 maximally, given there is incomplete overlap between the Adachi and our replication cases and zero overlap between the controls, we worked with the published results for rs13394619 in Adachi et al (2010) and the results from comparing our non-overlapping 391 replication cases to our 4,017 replication controls.\nThe seven SNPs (rs7521902, rs13394619, rs4141819, rs7739264, rs12700667, rs1537377 and rs10859871) reaching genome-wide significance in the GWA meta-analysis were genotyped in the independent Japanese replication cohort using the multiplex PCR-based Invader assay (Third Wave Technologies), as previously described 1 .\nTests of allelic association were performed using PLINK in the independent Japanese replication cohort. Because only the associations in the same direction would be considered as replicated, one-sided  P  values were obtained by halving the standard (two-sided) PLINK  P  values. To determine the total evidence for association, the one-sided replication  P  values were meta-analyzed with the QIMRHCS, OX, BBJ [and Adachi 9  500K (290 cases and 262 controls) and 6.0 (406 cases and 563 controls) for rs13394619] GWA  P  values using METAL 33 . The  P  values observed in each case-control cohort were converted into a signed Z-score. Z-scores for each allele were combined across samples in a weighted sum, with weights proportional to the square-root of the sample size for each cohort 34 . Given that our cohorts have unequal numbers of cases and controls, we utilized the effective sample size, where  N eff  = 4 / (1 /  N cases  + 1 /  N controls ) 33 . We also performed meta-analysis of the β-coefficients weighted by their estimated standard errors using GWAMA to estimate the overall odds ratio and 95% CI for the genome-wide significant SNPs.\nThe aim of the prediction analysis was to evaluate the aggregate effects of many variants of small effect. We summarized variation across nominally associated loci into quantitative scores and related the scores to disease state in independent samples. Although variants of small effect (e.g., genotype relative risk of 1.05) are unlikely to achieve even nominal significance, increasing proportions of “true” effects will be detected at increasingly liberal  P  value thresholds, e.g.  P  < 0.1 (i.e., ~10% of all SNPs),  P  < 0.2, etc. Using such thresholds, we defined large sets of “allele specific scores” in the “discovery” sample of the Japanese BioBank (BBJ) endometriosis case-control set (1,423 cases, 1,318 controls) to generate risk scores for individuals in the “target” sample of the QIMRHCS (2,262 cases, 2,924 controls), OX (919 cases, 5,151 controls) and combined European (QIMRHCS+OX) endometriosis case-control sets (3,181 cases, 8,075 controls). The term risk score is used instead of risk, as it is impossible to differentiate the minority of true risk alleles from the non-associated variants. In the discovery sample, we selected sets of allele specific scores for SNPs with the following levels of significance;  P  < 0.01,  P  < 0.05,  P  < 0.1,  P  < 0.2,  P  < 0.3,  P  < 0.4,  P  < 0.5,  P  < 0.6,  P  < 0.7,  P  < 0.8,  P  < 0.9,  P  < 1.0. For each individual in the target sample, we calculated the number of score alleles that they possessed, each weighted by the log odds ratio from the discovery sample. To assess whether the aggregate scores reflect endometriosis risk, we tested for a higher mean score in cases compared to controls. Logistic regression was used to assess the relationship between target sample disease status and aggregate risk score. Nagelkerke’s pseudo  R 2  was used to assess the variance explained. Prediction was performed using all 407,632 SNPs overlapping the QIMRHCS, OX and BBJ GWA datasets, and after excluding the 6,163 SNPs within ±2500 kb of the seven implicated SNPs listed in  Table 1 . We also performed the predictions in reverse, using QIMRHCS+OX-derived risk scores to predict affection status in the BBJ case-control set.\nGene-based approaches can be more powerful than traditional individual-SNP-based approaches in the presence of allelic heterogeneity. Therefore, utilizing the QIMRHCS, OX and BBJ GWA data, we performed a genome-wide gene-based association study using VEGAS 14 . Briefly, for the 407,632 overlapping SNPs, the  P  values from the European GWA study (i.e., FE meta-analysis of QIMRHCS and OX GWA data) and the  P  values from the Japanese (BBJ) GWA study were analyzed separately using VEGAS. The VEGAS test incorporates evidence for association from all SNPs across a gene and accounts for gene size (number of SNPs) and LD between SNPs by using simulations from the multivariate normal distribution. The resulting European and Japanese gene-based  P  values were meta-analyzed using Stouffer’s Z-score combined p-value method 34 . A total of 17,538 genes (including 50 kb 5′ and 3′ of their transcription start and end site, respectively 14 ) contained association results for ≥1 SNP, so a Bonferroni adjusted significance threshold of  P  ≤ 2.85 × 10 −6  (0.05 / 17,538) was utilized to indicate genome-wide gene-based  significant  association.\n\nSupplementary information  One file ‘Supplementary Text and Figures.pdf’, containing: Supplementary Figures 1–19, Supplementary Tables 1–4, Supplementary Note and References.","source_license":"CC0","license_restricted":false}