Identification of MKNK1 and TOP3A as ovarian endometriosis risk-associated genes using integrative genomic analyses and functional experiments

article OA: gold CC0 ⤵ 3 in-corpus citations
AI-generated summary by claude@2026-06, 2026-06-08

Integrative genomic analyses identified MKNK1 and TOP3A as novel ovarian endometriosis risk genes, with MKNK1 knockdown inhibiting ectopic endometrial stromal cell migration and invasion, and TOP3A knockdown inhibiting proliferation and migration while promoting apoptosis.

One-sentence paraphrase of the abstract; not a substitute for reading it. No clinical advice. How this works

AI-generated deep summary by claude@2026-06, 2026-06-11 · read from full text

The paper used large-scale European GWAS summary statistics for ovarian endometriosis (245,494 participants) integrated with three independent eQTL datasets via the Sherlock Bayesian framework, alongside gene-level association (MAGMA) and additional prediction-based validation using S-PrediXcan with GTEx blood eQTL data, to identify candidate risk genes and related pathways. It reports that MKNK1 and TOP3A emerged as ovarian endometriosis risk-associated genes, supported by peripheral blood transcriptome sequencing, immunohistochemistry across ectopic, eutopic, and normal endometrium samples, and in vitro functional experiments, with significance calibrated by simulation and multiple-testing controls. A key caveat is that the functional and expression validation relied on relatively small endometriosis and control sample sizes (N=30 per group) and excluded individuals with adenomyosis, potentially limiting broader inference. This paper is centrally about endometriosis — it identifies and functionally characterizes MKNK1 and TOP3A as ovarian endometriosis risk-associated genes using integrative genomic analyses.

Read from the paper's body, not the abstract. Not a substitute for reading the paper. No clinical advice. How this works

Abstract

The risk of endometriosis (EM), which is a common complex gynaecological disease, is related to genetic predisposition. However, it is unclear how genetic variants confer the risk of EM. Here, via Sherlock integrative analysis, we combined large-scale genome-wide association studies (GWAS) summary statistics on EM (N = 245,494) with a blood-based eQTL dataset (N = 1490) to identify EM risk-related genes. For validation, we leveraged two independent eQTL datasets (N = 769) for integration with the GWAS data. Thus, we prioritised 14 genes, including GIMAP4, TOP3A, and NMNAT3, which showed significant association with susceptibility to EM. We also utilised two independent methods, Multi-marker Analysis of GenoMic Annotation and S-PrediXcan, to further validate the EM risk-associated genes. Moreover, protein-protein interaction network analysis showed the 14 genes were functionally connected. Functional enrichment analyses further demonstrated that these genes were significantly enriched in metabolic and immune-related pathways. Differential gene expression analysis showed that in peripheral blood samples from patients with ovarian EM, TOP3A, MKNK1, SIPA1L2, and NUCB1 were significantly upregulated, while HOXB2, GIMAP5, and MGMT were significantly downregulated compared with their expression levels in samples from the controls. Immunohistochemistry further confirmed the increased expression levels of MKNK1 and TOP3A in the ectopic and eutopic endometrium compared to normal endometrium, while HOBX2 was downregulated in the endometrium of women with ovarian EM. Finally, in ex vivo functional experiments, MKNK1 knockdown inhibited ectopic endometrial stromal cells (EESCs) migration and invasion. TOP3A knockdown inhibited EESCs proliferation, migration, and invasion, while promoting their apoptosis. Convergent lines of evidence suggested that MKNK1 and TOP3A are novel EM risk-related genes.
Full text 54,349 characters · extracted from pmc-nxml · 7 sections · click to expand

Credit

Yizhou Huang: Conceptualization, Methodology, Software, Data curation, Visualization, Investigation, Validation, Writing – original draft, Writing – review & editing. Jie Luo: Conceptualization, Data curation, Investigation, Resources, Writing – original draft. Yue Zhang: Methodology, Data curation, Validation, Visualization. Tao Zhang: Data curation, Resources, Investigation. Xiangwei Fei: Methodology, Funding acquisition. Liqing Chen: Data curation, Funding acquisition. Yingfan Zhu: Data curation. Songyue Li: Data curation. Caiyun Zhou: Methodology. Kaihong Xu: Funding acquisition. Yunlong Ma: Conceptualization, Software, Visualization Writing – review & editing. Jun Lin: Supervision, Writing – review & editing. Jianhong Zhou: Conceptualization, Supervision, Writing – review & editing.

Funding

This work was funded by the 10.13039/501100001809 National Natural Science Foundation of China (grant Numbers 81901454, 81801420, and 81671426), This work was also funded by Zhejiang Provincial Natural Science Foundation of China (LGF20H040009).

Results

In the current study, we leveraged a comprehensive integrative framework to prioritise novel EM susceptibility genes, based on multiple bioinformatics methods and functional experiments ( Fig. 1 ). To highlight the EM-associated risk genes, three steps were involved: 1) Using Sherlock integrative analysis, MAGMA, and S-PrediXcan methods, we combined GWAS summary statistics on EM (N = 245,494) with three independent eQTL datasets (N = 2259) to identify novel genes associated with EM risk and then used multiple bioinformatics approaches, including permutation, pathway-based enrichment, and PPI network analyses to examine the biological functions of the identified genes in silico . 2) To further validate the EM-risk genes, we collected peripheral blood samples from 30 patients with ovarian EM and 30 matched healthy controls for transcriptome sequencing and performed DGE and co-expression analyses. 3) To further explore the functional roles of the top-ranked newly identified EM-risk genes, we conducted follow-up validation studies by performing IHC and cellular functional experiments. Fig. 1 shows a detailed schematic of the proposed framework. Fig. 1 Schematic of framework. EM, endometriosis. GWAS, genome-wide association study. eQTL, expression quantitative trait loci. MAGMA, Multi-marker Analysis of GenoMic Annotation. GO, Gene Ontology. DGE, differential gene expression. IHC, immunohistochemical. EESCs, ectopic endometrial stromal cells. Fig. 1 Schematic of framework. EM, endometriosis. GWAS, genome-wide association study. eQTL, expression quantitative trait loci. MAGMA, Multi-marker Analysis of GenoMic Annotation. GO, Gene Ontology. DGE, differential gene expression. IHC, immunohistochemical. EESCs, ectopic endometrial stromal cells. The workflow of the integrative genomics analysis is shown in Supplementary Fig. S1 . At the discovery stage, we integrated the GWAS summary statistics on EM (Dataset #1, N = 245,494) with eQTL data (Dataset #3, N = 1490) using Sherlock integrative analysis to determine whether abnormal gene expression confers susceptibility to EM. We found that 715 genes were significantly associated with EM risk ( Sherlock -based permuted P  ≤ 0.05; Gene set #1, Supplementary Table S2 ). We re-performed the Sherlock analysis with the same parameter settings to validate the identified genes using two independent eQTL datasets (Datasets #4 and #5). In this regard, we found 683 significant EM risk-associated genes from Dataset #4 ( Sherlock -based permuted P  < 0.05; Gene set #2, Supplementary Table S3 ) and 330 significant EM risk-associated genes from Dataset #5 ( Sherlock -based permuted P  < 0.05, Gene set #3, Supplementary Table S4 ). Comparing the identified genes from Gene set #1 in the discovery stage with those from Gene sets #2 and #3, 14 overlapping genes across the three gene sets, including GIMAP4 (permuted P  = 1.08 × 10 −3 ), TOP3A (permuted P  = 2.19 × 10 −3 ), and NMNAT3 (permuted P  = 5.75 × 10 −3 ), were identified as EM risk-associated genes ( Fig. 2 A and Table 1 ). Fig. 2 Integrative genomics analyses identify risk genes and pathways for EM. (A) Venn diagram of three identified EM-risk gene sets: Gene set #1, Gene set #2, and Gene set #3 are based on Sherlock integrative genomics analysis by combining Zeller et al. eQTL data (Dataset #3), Dixon et al. eQTL data (Dataset #4), and GTEx blood eQTL data (Dataset #5) with GWAS summary statistics on EM, respectively. (B, C) Venn diagrams of the significantly enriched pathways (B) and GO terms (C) by three identified gene sets. (D, E) The scatter diagrams showing the 19 common significant pathways (D) and 35 common significant GO terms (E) based on Gene set #1. EM, endometriosis. GO, Gene Ontology. Fig. 2 Table 1 Comprehensive genomics analyses showing that 14 genes are implicated in EM risk. Table 1 Gene LBF in the discovery stage Permuted P- value ( Sherlock analysis of Dataset #3) Permuted P- value ( Sherlock analysis of Dataset #4) Permuted P- value ( Sherlock analysis of Dataset #5) MAGMA-based P- value (Dataset #1) MAGMA-based P- value (Dataset #2, Negative control) S-PrediXcan-based P- value (GTEx v7 whole blood) GIMAP4 3.41 0.0011 0.0023 0.0019 0.0014 0.13 0.0049 TOP3A 2.86 0.0022 0.024 0.0028 0.011 0.11 0.0032 NMNAT3 2.12 0.0058 0.006 0.020 0.12 0.52 0.23 MKNK1 1.70 0.0098 0.032 0.0039 3.88E-05 0.65 0.027 TPM2 1.62 0.011 0.034 0.012 0.025 0.57 0.0015 SIPA1L2 1.57 0.012 0.041 0.019 0.32 0.080 0.0017 METTL27 1.44 0.014 0.011 0.0087 0.0046 NA 0.048 NUCB1 1.26 0.017 0.023 0.0098 0.0079 0.35 0.0081 VAMP4 1.22 0.018 0.04 0.0082 0.013 0.38 0.0099 ENDOG 1.01 0.024 0.041 0.026 0.026 0.22 0.056 HOXB2 0.85 0.029 0.04 0.0088 0.051 NA 0.018 GIMAP5 0.57 0.044 0.0063 0.048 0.0014 0.62 0.46 RBM18 0.55 0.045 0.0078 0.040 0.12 0.80 0.84 MGMT 0.54 0.046 0.023 0.036 0.0052 0.90 0.84 EM, endometriosis; LBF, logarithm of the Bayes Factor; MAGMA, Multi-marker Analysis of GenoMic Annotation. Integrative genomics analyses identify risk genes and pathways for EM. (A) Venn diagram of three identified EM-risk gene sets: Gene set #1, Gene set #2, and Gene set #3 are based on Sherlock integrative genomics analysis by combining Zeller et al. eQTL data (Dataset #3), Dixon et al. eQTL data (Dataset #4), and GTEx blood eQTL data (Dataset #5) with GWAS summary statistics on EM, respectively. (B, C) Venn diagrams of the significantly enriched pathways (B) and GO terms (C) by three identified gene sets. (D, E) The scatter diagrams showing the 19 common significant pathways (D) and 35 common significant GO terms (E) based on Gene set #1. EM, endometriosis. GO, Gene Ontology. Comprehensive genomics analyses showing that 14 genes are implicated in EM risk. EM, endometriosis; LBF, logarithm of the Bayes Factor; MAGMA, Multi-marker Analysis of GenoMic Annotation. Furthermore, we performed functional enrichment analyses for the three gene sets identified above based on the KEGG pathway and gene ontology (GO) terms using the KOBAS web-access tool. Thus, we observed that 19 common biological pathways and 35 common GO terms were significantly enriched by gene sets #1, #2, and #3 (FDR < 0.05, Fig. 2 B–E and Supplementary Table S5 ). Based on Gene set #1, the top-ranked significantly enriched pathways included metabolism of proteins ( P  = 1.27 × 10 −21 ) and immune system ( P  = 4.66 × 10 −21 ), while the top-ranked significant GO term was membrane-bound organelles ( P  = 1.76 × 10 −26 ). To further validate the identified EM risk-associated genes, an independent method of gene-level genetic association analysis was adopted using the MAGMA tool. Thus, 1228 significant or suggestive genes associated with EM were found (MAGMA-based P  < 0.05, Gene set #4, Supplementary Table S6 ). Consistently, 10 of the 14 Sherlock -identified risk genes were validated via the MAGMA analysis ( Fig. 3 A and Supplementary Table S7 ). Furthermore, none of the three Sherlock -identified EM risk-associated gene sets overlapped with the MAGMA-identified Null trait-related genes ( Fig. 3 B, Gene set #5, as a negative control). Fig. 3 Consistent evidence supporting the identified EM-risk genes from integrative genomics analyses. (A, B) Venn diagrams showing the overlapping genes of three Sherlock-identified EM-risk gene sets with MAGMA-identified EM-risk genes (Gene set #4; A), and with MAGMA-identified Null trait-related genes (Gene set #5, as a negative control; B). (C–E) Computer-based permutation analyses of 100,000 times for comparison of genes from Gene set #1 with that from Gene set #2 (C), Gene set #3 (D), and Gene set #4 (E). EM, endometriosis. MAGMA, Multi-marker Analysis of GenoMic Annotation. Fig. 3 Consistent evidence supporting the identified EM-risk genes from integrative genomics analyses. (A, B) Venn diagrams showing the overlapping genes of three Sherlock-identified EM-risk gene sets with MAGMA-identified EM-risk genes (Gene set #4; A), and with MAGMA-identified Null trait-related genes (Gene set #5, as a negative control; B). (C–E) Computer-based permutation analyses of 100,000 times for comparison of genes from Gene set #1 with that from Gene set #2 (C), Gene set #3 (D), and Gene set #4 (E). EM, endometriosis. MAGMA, Multi-marker Analysis of GenoMic Annotation. To ensure the reliability of our findings, we performed in silico permutation analyses 100,000 times. Thus, we observed that the significant genes involved in Gene set #1 remarkably overlapped with Gene set #2 (Permuted P  = 2.0 × 10 −5 ; see Fig. 3 C), Gene set #3 (Permuted P  = 0; see Fig. 3 D), and Gene set #4 (Permuted P  = 0; see Fig. 3 E), indicating that these identified EM-associated risk genes are biologically consistent. Moreover, to further ensure the reliability of our results, we compared the three identified significant gene sets based on Sherlock analysis with the gene sets based on the MAGMA analysis of GWAS summary statistics on EM and the Null phenotype. Thus, we observed that Gene sets #1, #2, and #3 showed markedly higher overlapping rates with Gene set #4 than with Gene set #5 at three different P -value thresholds (i.e., 0.05, 0.01, and 0.001; Supplementary Fig. S4 ), indicating that the identified genes associated with EM risk were attributable to genetic components instead of random events. As referenced in previous studies [11] , [27] , [36] , using the S-PrediXcan method as an independent technique to integrate GWAS summary statistics with GTEx blood-based eQTL data, we consistently found that nine of these 14 identified genes were associated with EM ( Table 1 ). Also, we performed an integrative analysis of incorporating GWAS summary data on EM with GTEx eQTL data from uterus tissue that is more relevant tissue, and replicated six top-identified genes showing notable associations with EM ( P  < 0.05, Supplementary Table S8 ). To distinguish the causality of identified genes, we further conducted Mendelian randomization analyses of integrating GWAS summary data and two blood eQTL datasets from the eQTLGen and GTEx databases using SMR tool [13] . For GETx blood dataset, we found that nine of these 14 top-identified risk genes, including TOP3A and MKNK1 , showed notable causality with EM ( P SMR 0.05), and three genes exhibited suggestive causality with EM (0.05 <  P SMR 0.05). As for eQTLGen blood dataset, we also identified nine genes showing notable causality with EM, including TOP3A and MKNK1 ( P SMR 0.05), and one gene exhibiting suggestively causal evidence for EM (0.05 <  P SMR 0.05). Consistently, there is a high correlation of the SMR results between the eQTLGen and GTEx blood eQTL datasets (rho = 0.7, P = 0.0057, Supplementary Table S9 and Fig. S5 ). Notably, it has been reported that TPM2 , HOXB2 , and MGMT are associated with EM [37] , [38] , [39] . Based on our comprehensive integrative genomics analysis, we identified 14 genes, including 11 novel risk genes implicated in EM susceptibility ( Table 1 ). To investigate the underlying molecular links corresponding to the 14 EM risk-associated genes, we constructed PPI network analysis using the GeneMANIA tool. From Fig. 4 , it is evident that strong biological interactions existed among these identified risk genes; this is consistent with previous evidence that biologically related genes may demonstrate convergent contribution to complex disease risk [40] , [41] . Co-expression links accounted for most (71.51%) of the molecular interactions among these identified risk genes. For example, NUCB1 showed co-expression with TPM2 , HOXB2 , and MGMT . Additionally, protein domains were also shared among these genes. Notably, these protein domains accounted for 27.25% of the total network. Our PPI network analysis also demonstrated that the 14 identified EM risk-associated genes might have a synergistic contribution to the pathogenesis of EM. Fig. 4 PPI-network of 14 EM-risk genes using the GeneMANIA tool. The 14 EM-risk genes were marked with red colour, and the 20 predicted genes were marked with green colour. The underlying molecular interactions between each gene pair were attributed based on the co-expression links (account for 71.51%), shared protein domains (account for 27.25%), and co-localization (account for 1.24%). PPI, protein–protein interaction. Fig. 4 PPI-network of 14 EM-risk genes using the GeneMANIA tool. The 14 EM-risk genes were marked with red colour, and the 20 predicted genes were marked with green colour. The underlying molecular interactions between each gene pair were attributed based on the co-expression links (account for 71.51%), shared protein domains (account for 27.25%), and co-localization (account for 1.24%). PPI, protein–protein interaction. To further validate these results, we subjected peripheral blood samples from 30 ovarian EM cases and 30 healthy controls to DGE analysis via RNA sequencing. The expression levels of the 14 EM risk-associated genes in each individual were visualized as heatmaps as shown in Supplementary Fig. S6 . Further, we examined the differences in the expression levels of the 14 EM risk-associated genes identified by Sherlock analysis between the groups. As shown in Fig. 5 A–G and Table 2 , among the 14 genes, TOP3A , MKNK1 , SIPA1L2 , and NUCB1 were significantly upregulated, while HOXB2 , GIMAP5 , and MGMT were significantly downregulated in PBMC from patients with ovarian EM compared with the samples from the controls. The changes in the expression levels of the other seven genes ( GIMAP4 , NMNAT3 , TPM2 , METTL27 , VAMP4 , ENDOG , and RBM18 ) did not show any significant difference between the EM cases and the healthy controls ( Supplementary Fig. S7 ). By performing co-expression pattern analysis, we found that the co-expression patterns of the 14 important genes were prominently altered among the patients with EM compared with the control cases ( Fig. 5 H). Overall, our RNA sequencing results provided further evidence that the identified genes showed aberrant expression levels in EM compared with the healthy controls. Based on the scoring of the multiple supportive evidence corresponding to the 14 genes ( Table 2 ), we selected the top five genes, TOP3A , MKNK1 , SIPA1L2 , NUCB1 , and HOXB2 , for follow-up protein endometrial tissue expression evaluation. Fig. 5 Differential gene expression and co-expression patterns of EM-risk genes in PBMCs of women with ovarian EM and health controls based on RNA-Seq. ( A-G ) Violin plots showing significantly different expressed genes between EM and controls for TOP3A (A), MKNK1 (B), SIPA1L2 (C), NUCB1 (D), HOXB2 (E), GIMAP5 (F), and MGMT (G). ( H ) Co-expression pattern analysis of 14 EM-risk genes between controls and EM. The colour legend showing the degree of correlation coefficients, red represents − 1 and blue represents + 1. PBMC, peripheral blood mononuclear cell. EM, endometriosis. Fig. 5 Table 2 Seven significantly differentially expressed genes verified by subjecting PBMCs from women with ovarian EM and healthy controls to RNA-sequencing. Table 2 Gene names Log2 Fold Change P -value Evidence Scoring TOP3A 0.56 1.15E-08 6 MKNK1 0.94 1.91E-09 6 NUCB1 0.30 0.011 6 SIPA1L2 1.62 1.27E-10 5 HOXB2 -0.42 0.0021 5 GIMAP5 -0.46 0.0013 5 MGMT -0.42 0.003 5 Note: Evidence scores were calculated by combining all pieces of supportive evidence from the analyses performed in this study, including Sherlock , MAGMA, S-PrediXcan, and DGE analyses. A score of 1 indicates a significant result, while a score of 0 indicates a non-significant result. EM, endometriosis; PBMC, peripheral blood mononuclear cell; MAGMA, Multi-marker Analysis of GenoMic Annotation; DGE, differential gene expression. Differential gene expression and co-expression patterns of EM-risk genes in PBMCs of women with ovarian EM and health controls based on RNA-Seq. ( A-G ) Violin plots showing significantly different expressed genes between EM and controls for TOP3A (A), MKNK1 (B), SIPA1L2 (C), NUCB1 (D), HOXB2 (E), GIMAP5 (F), and MGMT (G). ( H ) Co-expression pattern analysis of 14 EM-risk genes between controls and EM. The colour legend showing the degree of correlation coefficients, red represents − 1 and blue represents + 1. PBMC, peripheral blood mononuclear cell. EM, endometriosis. Seven significantly differentially expressed genes verified by subjecting PBMCs from women with ovarian EM and healthy controls to RNA-sequencing. Note: Evidence scores were calculated by combining all pieces of supportive evidence from the analyses performed in this study, including Sherlock , MAGMA, S-PrediXcan, and DGE analyses. A score of 1 indicates a significant result, while a score of 0 indicates a non-significant result. EM, endometriosis; PBMC, peripheral blood mononuclear cell; MAGMA, Multi-marker Analysis of GenoMic Annotation; DGE, differential gene expression. To evaluate the protein expression levels and localisation of the identified EM risk-associated genes in the endometrium of the patients with EM and the healthy controls, we performed IHC staining for TOP3A , MKNK1 , SIPA1L2 , NUCB1 , and HOXB2 in ectopic and eutopic endometria samples from 30 patients with ovarian EM and normal endometria samples from the 30 control patients. Thus, we observed that MKNK1 was primarily localised in the nucleus of endometrial glandular epithelial cells, whereas its expression in the endometrial stroma was comparatively weak ( Fig. 6 A–C). Additionally, MKNK1 expression was significantly higher in eutopic endometrium than in normal endometrium and showed the highest expression level in ectopic endometrium ( P  < 0.05; Fig. 6 D). Further, TOP3A was predominantly immunolocalised in the cytomembrane and cytoplasm of endometrial glandular epithelial cells ( Fig. 6 E–G). Additionally, its expression level (IRS) in the ectopic endometrium was significantly higher than that in the eutopic and normal endometrium ( P  < 0.001 for both), and its expression in the eutopic endometrium was increased compared with that corresponding to normal endometrium ( P  < 0.01), as presented in Fig. 6 H. Endometrial glandular epithelial and stromal cells expressed HOXB2 , and the protein of this gene was primarily localised in the cytoplasm and nucleus ( Fig. 6 I–K). Additionally, ectopic endometrium showed decreased HOXB2 expression compared with eutopic and normal endometrium ( P   0.05), as can be seen in Fig. 6 L. NUCB1 was expressed in both the endometrial stroma and epithelium and mainly presented the glandular epithelial cells cytoplasmic staining in ( Fig. 6 M–O). Ectopic, eutopic, and normal endometria did not differ significantly with respect to NUCB1 expression ( P  > 0.05, Fig. 6 P). Unfortunately, due to the lack of a good antibody, SIPA1L2 expression could not be detected (data not shown). Fig. 6 Immunoreactivity of MKNK1, TOP3A, HOXB2 and NUCB1 in endometrium from women with and without ovarian endometriosis. The expression and localisation of MKNK1 (A–C), TOP3A (E–G), HOXB2 (I–K), NUCB1 (M–O) in normal, eutopic, and ectopic endometrium evaluated by IHC staining, respectively. The comparisons of IRS across three groups (D, H, L, P). Values are presented as mean± SEM. P- values were determined by Kruskal-Wallis tests followed by multiple comparisons. ∗ P  < 0.05, ∗∗ P  < 0.01. and ∗∗∗ P  < 0.001. IHC, immunohistochemical. IRS, immunoreactive score. Fig. 6 Immunoreactivity of MKNK1, TOP3A, HOXB2 and NUCB1 in endometrium from women with and without ovarian endometriosis. The expression and localisation of MKNK1 (A–C), TOP3A (E–G), HOXB2 (I–K), NUCB1 (M–O) in normal, eutopic, and ectopic endometrium evaluated by IHC staining, respectively. The comparisons of IRS across three groups (D, H, L, P). Values are presented as mean± SEM. P- values were determined by Kruskal-Wallis tests followed by multiple comparisons. ∗ P  < 0.05, ∗∗ P  < 0.01. and ∗∗∗ P  < 0.001. IHC, immunohistochemical. IRS, immunoreactive score. As MKNK1 and TOP3A were identified as novel, top-ranked EM-risk genes, we investigated the functional roles of these two genes in EM in vitro. EESCs were transfected with MKNK1 siRNA and TOP3A siRNA to knock down their expression, and their biological behaviours were assessed. The results of RT-qPCR and western blotting showed significantly lower MKNK1 and TOP3A expression in siRNA-transfected EESCs ( Fig. 7 A–C). Further, the results of the CCK-8 assay showed a significant decrease in the proliferation of EESCs in the si- TOP3A group, while no difference was observed in the si- MKNK1 group compared with the si-Ctrl group ( Fig. 7 D). Based on flow cytometry, the apoptosis rate corresponding to the si- TOP3A group was higher than that corresponding to the si-Ctrl group ( Fig. 7 E). Furthermore, based on Transwell assays, compared with the si-Ctrl group, the si-MKNK1 and si-TOP3A groups showed significantly impaired EESC migration and invasion abilities ( Fig. 7 F). Our cell function studies also indicated that MKNK1 downregulation inhibited the migration and invasion abilities of EESCs, but did not affect their proliferation and apoptosis rates. Additionally, TOP3A downregulation inhibited EESCs proliferation, migration, and invasion and promoted their apoptosis. Fig. 7 The role of MKNK1 and TOP3A in proliferation, apoptosis, migration and invasion of EESCs. (A–C) mRNA (A) and protein (B–C) expression levels of MKNK1 and TOP3A in EESCs transfected with siRNA were determined by RT-qPCR (n = 3) and western blot analysis (n = 6), respectively. (D) Proliferation of EESCs transfected with si-MKNK1, si-TOP3A, and si-Ctrl was assessed with CCK-8 assay at 0, 24, 48, 72 and 96 h, n = 6. (E) Representative images and the graphical statistics of apoptosis rate assessed by flow cytometry of EESCs transfected with si-MKNK1, si-TOP3A, and si-Ctrl. n = 6. (F) Representative fields (100 × magnification) and the graphical statistics of Transwell migration and invasion assay of EESCs transfected with si-MKNK1, si-TOP3A, and si-Ctrl. n = 5. Values are presented as mean± SEM. P -values were determined by unpaired two-tailed t test. ∗ P  < 0.05, and ∗∗∗ P  < 0.001. EESCs, ectopic endometrial stromal cells. Real-time quantitative PCR (RT-qPCR). Cell counting kit-8, CCK-8. Fig. 7 The role of MKNK1 and TOP3A in proliferation, apoptosis, migration and invasion of EESCs. (A–C) mRNA (A) and protein (B–C) expression levels of MKNK1 and TOP3A in EESCs transfected with siRNA were determined by RT-qPCR (n = 3) and western blot analysis (n = 6), respectively. (D) Proliferation of EESCs transfected with si-MKNK1, si-TOP3A, and si-Ctrl was assessed with CCK-8 assay at 0, 24, 48, 72 and 96 h, n = 6. (E) Representative images and the graphical statistics of apoptosis rate assessed by flow cytometry of EESCs transfected with si-MKNK1, si-TOP3A, and si-Ctrl. n = 6. (F) Representative fields (100 × magnification) and the graphical statistics of Transwell migration and invasion assay of EESCs transfected with si-MKNK1, si-TOP3A, and si-Ctrl. n = 5. Values are presented as mean± SEM. P -values were determined by unpaired two-tailed t test. ∗ P  < 0.05, and ∗∗∗ P  < 0.001. EESCs, ectopic endometrial stromal cells. Real-time quantitative PCR (RT-qPCR). Cell counting kit-8, CCK-8.

Materials

In the current integrative genomics analysis, we collected multi-omics datasets from several widely-established public databases as follows: 1) Dataset #1 for GWAS summary statistics on EM. We downloaded this GWAS summary dataset [15] from Gene ATLAS, an atlas of genetic associations in UK Biobank. A total of 245,494 subjects including 4252 EM patients based on European ancestry were chosen. The Affymetrix UK BiLEVE Axiom array and the Affymetrix UK Biobank Axiom array were used to genotype all samples. [16] , [17] After strict quality control, there were 13,853,045 SNPs eligible for downstream analyses. 2) Dataset #2 for GWAS data on Null phenotype: To evaluate the reliability of our findings, as referred to previous studies [18] , [19] , we constructed a GWAS summary dataset on a randomly distributed phenotype (called as Null trait) as a negative control. [21] , [20] , [21] 3) Dataset #3 for discovery eQTL dataset. This eQTL dataset [22] as a discovery dataset was used to conduct the Sherlock -based integrative genomics analysis. There were 1490 subjects with a total of 675,350 SNPs and 12,808 genes. 4) Dataset #4 for validation eQTL dataset. This eQTL dataset [23] as an independent validation dataset was leveraged to perform the Sherlock inference analysis with the same parameters. A total of 400 subjects with 408,283 SNPs and 20,599 genes. 5) Dataset #5 for validation eQTL dataset. This blood-based eQTL dataset (n = 369 blood samples), which was downloaded from the Genotype-Tissue Expression (GTEx) portal (Version 7, https://gtexportal.org/) [24] , was also adopted as an independent validation dataset for Sherlock analysis with the same parameters. For more detailed information for these datasets, please refer to Supplemental Methods . We applied the Sherlock integrative genomics analysis [25] for incorporating GWAS summary statistics with eQTL data to uncover susceptible SNPs and genes for EM. The Sherlock Bayesian algorithm was designed to identify disease-relevant SNPs that have cis- and trans -regulatory effects on gene expression. The SNPs associated with EM and gene expression simultaneously were termed as eSNPs. There existed 3 potential scenarios for the Bayesian inference: 1) A positive score would be assigned based on an eSNP demonstrating a significant association with EM; 2) A negative score would be assigned based on an eSNP demonstrating a non-significant association with EM; 3) No score would be assigned based on an SNP that was significantly associated with EM but not associated with gene expression. The logarithm of the Bayes Factor (LBF) is calculated by summarising the assigned scores of all relevant eSNPs for a given gene as a vital indicator to predict EM-risk genes. The significance of Sherlock analysis for each gene is calibrated by a simulation analysis, and simulated P -value< 0.05 was of significance. The Multi-marker Analysis of GenoMic Annotation (MAGMA) [26] was used as an independent technique to conduct a genetic association analysis of the GWAS summary dataset on gene-level. In this updated model, MAGMA carry a T statistic: T = ∑ i N Z i 2 = Z T Z where N is the SNP numbers within a specific gene. Z i = φ ( p i ) , φ represents the cumulative normal distribution function, and p i is the marginal P value for a SNP i . SNPs were assigned to a given gene depended on the location of the SNP whether located into the gene or within a genomic region spanning 20 kb window of the gene. Gene body are defined as the region from transcription start site to transcription stop site. Furthermore, the MAGMA model assumes Z ∼ MVN ( 0 , S ) , where S is the linkage disequilibrium (LD) matrix among SNPs. The LD matrix can be diagonalized and thereby written as S = QA Q T , where Q is an orthogonal matrix and A = diag ( λ 1 , λ 2 , . , λ N ) with λ j being the j th eigenvalue of S . The 1000 Genomes Project Phase 3 European Panel [17] is used as a reference for calculating LD among SNPs. D ∼ MVN ( 0 , I K ) is a random variable, where D = A - 0.5 Q T Z . Thus, the sum of squared SNP Z-statistics is calculated: T = Z T Z = ( Q A 0.5 D ) T Q A 0.5 D = D T AD = ∑ i N λ i D i 2 where D i ∼ N ( 0 , 1 ) and D i 2 ∼ χ 1 2 . T follows a mixture distribution of independent χ 1 2 random variables. To further validate the risk genes whose expression levels are linked with EM, we conducted an integrative genomics analysis by using the S-PrediXcan [27] as an independent approach to integrate meta-GWAS summary statistics with GTEx blood-based eQTL data (i.e., Dataset #5). S-PrediXcan mainly leverages two linear regression (MASHR) models to analyze the association between predicted gene expression and EM: Y = α 1 + X l β l + ε 1 Y = α 2 + G g γ g + ε 2 where α 1 and α 2 are intercepts, ε 1 and ε 2 are independent error terms, Y is the n dimensional vector for n individuals, X l is the allelic dosage for SNP l in n individuals, β l is the effect size of SNP l , G g = ∑ i ∈ g e n e ( g ) ω i g X i is the predicted expression calculated by ω lg and X l , in which ω lg is derived from the GTEx Project, and γ g is the effect size of G g . The Z-score (Wald-statistic) of the association between predicted gene expression and EM can be transformed as: Z g = γ ˆ g s e ( γ ˆ g ) ≈ ∑ i ∈ g e n e ( g ) ω i g σ ˆ i σ ˆ g β ˆ i s e ( β ˆ i ) where σ ˆ g is the standard deviation of G g and can be calculated from the 1000 Genomes Project European Phase 3 Panel [17] , β ˆ l is the effect size from GWAS on EM and σ ˆ l is the standard deviation of β ˆ l . Significant associations were adjusted using Bonferroni correction for multiple tests, and the significant P-value threshold was 4.46 × 10 −6 (0.05/11,217). To uncover the biological pathways and molecular functions of the identified EM-risk genes, we utilize the web-access tool of KOBAS [28] to performed a functional enrichment analysis (http://kobas.cbi.pku.edu.cn/kobas3). We submitted three gene sets identified by the Sherlock analysis (Gene sets #1–3) into the KOBAS website to calculate significantly enriched gene sets. There were two types of functional terms derived from multiple databases used in the enrichment analysis: 1) Biological pathways: Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, BioCyc, and PANTHER pathways; 2) Gene ontology (GO) terms: biological process, cellular component, and molecular functions. The hypergeometric test was used to calculate the significance of each functional term. The False discovery Rate (FDR) method was used for conducting multiple comparisons, and FDR< 0.05 was considered to be of significance. The study participants were recruited in Women’s Hospital, Zhejiang University School of Medicine from June 2020 to December 2020. Women who aged 18–45 years old and underwent hystero-laparoscopy for suspected ovarian EM were included in the ovarian EM group (N = 30). The peripheral blood samples were collected before surgery, and the eutopic and ectopic endometria were obtained simultaneously during surgery. For controls, the peripheral blood samples were collected from healthy women who had regular menstrual cycles (N = 30), and the normal endometria were obtained from women who underwent hysteroscopy for tubal infertility or uterine mediastinum (N = 30). All diagnosis were confirmed by surgery and final pathological examination. Subjects who had received hormonal treatment in the past 3 months, or with diseases of endocrine system, malignant tumour, major organ diseases or with pathological diagnosis of adenomyosis, polyps, fibroid, or endometrial hyperplasia were excluded. Participants’ demographic and clinical characteristics are shown in Supplementary Table S1 . All EM patients were classified as III/IV stage according to the revised American Fertility Society (r-AFS) classification [29] . Variables including age, BMI were not statistically significant between ovarian EM group and controls. The study was approved by the Human Ethics Committee of Women’s Hospital, Zhejiang University School of Medicine (No. 20190014) and all participants signed informed consents. The endometrium specimens were fixed for 24 h in 10% neutral buffered formalin then underwent routine processing of washing, dehydration, transparency, wax dipping, and embedding at the Department of Pathology of Women’s Hospital, Zhejiang University School of Medicine for histologic diagnosis and immunohistochemistry. 5 mL peripheral blood was extracted with EDTA anticoagulant tube. Peripheral blood mononuclear cells (PBMCs) were isolated using a standardised density gradient technique. Total RNA was extracted from the isolated PBMCs using the QIAzol and miRNeasy Mini Kit (Qiagen, CA, USA). After amplification, the RNA integrity was tested using the Bioanalyzer 2100 system with the RNA Nano 6000 Assay Kit (Agilent Technologies, CA, USA). The mRNA was purified from total RNA using poly-T oligo-attached magnetic beads, and used for establishing cDNA libraries for RNA sequencing. To preferentially select cDNA fragments of 370–420 bp in length, library fragments were purified using the AMPure XP system (Beckman Coulter, Beverly, USA). The quality of cDNA library was assessed on the Agilent Bioanalyzer 2100 system. Paired-end sequencing (150 bp) was performed on the Illumina Novaseq platform by Novogene (Beijing, China). First, raw sequencing data were qualified using the FastQC software (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). The human reference genome file was downloaded from the Ensembl ftp site (http://asia.ensembl.org/info/data/ftp/index.html, file name: Homo_sapiens.GRCh37.75.cdna.all.fa). Index of the reference genome was established using Hisat2 V2.0.5 [30] . We used the Hisat2 V2.0.5 tool to align paired-end clean reads to the reference genome. The featureCounts V1.5.0-p3 [31] was utilised to count the reads number mapped to each gene. The Fragments Per Kilobase of exon model per million mapped fragments (FPKM) of each gene was calculated. To validate whether abnormal expression of these identified genetics-risk genes is associated with EM, we performed differential gene expression (DGE) analysis using the edgeR [32] R package (3.22.5) in our sequenced transcriptomic data that contained 30 EM patients and 30 matched controls. The P -values were corrected using the Benjamini & Hochberg method. The Student’s t-test was applied to evaluate the significance. Based on the Pearson correlation algorithm, we performed co-expression pattern analyses for discovering the co-expression patterns among the identified genes between EM and controls. The Corrplot package in R platform was used to visualise the co-expression patterns. To prioritise the important risk genes for subsequent functional validation, we performed an evidence scoring analysis for identified genes by combining all piece of supportive evidence from the current analysis including Sherlock , MAGMA, S-PrediXcan, and DGE analysis. A significant result scores 1, and a non-significant result scores 0. The IHC staining was performed as previously described [33] . 3 µm thick sections were cut from the tissue blocks, followed by routine deparaffinisation and rehydration procedures. IHC staining was performed with antibodies specific to MKNK1 (dilution 1:500, T611286; Abmart, Shanghai, China), TOP3A (dilution 1:1200, TD3265; Abmart, Shanghai, China), NUCB1 (dilution 1:2500, ab154262; Abcam, MA, USA) and HOXB2 (dilution 1:500, orb114161; Biorbyt, Cambridge, UK) for 1 h at room temperature. After washing, the sections were incubated with horseradish peroxidase (HRP)-conjugated secondary antibodies (Envision Detection kit, GeneTech, Shanghai, China) for 1 h and reacted with DAB (GeneTech) until appropriate for microscopic examination. A negative control was performed by the same method except for replacing the primary antibodies with PBS. Slides were evaluated independently by two blinded observers and re-examined by a senior pathology physician. The IHC results were evaluated using the immunoreactive score (IRS) [34] . The percentage of positive cells was scored from 0 to 4 as: no, 80%. The intensity of reaction was scored from 0 to 3 as: no colour, mild, moderate, and intense. The final score was calculated by multiplying the percentage and intensity scores, ranging from 0 to 12. Ectopic endometrial stromal cells (EESCs) were isolated and cultured as previously described [35] . Small interfering RNA (siRNA) were produced by Genepharma Corporation (Shanghai, China). The siRNA sequences were: MKNK1, 5‘-GUGGGAUGAAACUGAACAATT-3′ (sense), 5‘-UUGUUCAGUUUCAUCCCACTT-3′ (antisense); TOP3A 5‘-GGCAGCAAGUGCAGAAAUATT-3′ (sense), 5‘-UAUUUCUGCACUUGCUGCCTT-3′ (antisense). siRNAs (20 nM) were transfected into the EESCs at 70% confluency using Lipofectamine RNAiMAX (Invitrogen, Carlsbad, CA, USA). After transfection, total RNA (for 48 h) and protein (for 72 h) were extracted, and real-time quantitative PCR (RT-qPCR) and western blot analysis were conducted to assess transfection efficiency, respectively. For further details, see Supplementary Methods . The biological behaviours of EESCs including proliferation, apoptosis, and migration and invasion were assessed via cell counting kit-8 assay, flow cytometry, and transwell assays, respectively. For detailed methods, see Supplementary Methods . For clinical data, IHC scoring and cell experiments, statistical analyses were conducted using the SPSS 24.0 (IBM, USA) and GraphPad Prism 8 (GraphPad Software, USA). The continuous data were presented as mean± SEM (or SD). Shapiro-Wilk test was used to examine the normality of data. For data variables with normal distribution, Student’s t-test or one-way ANOVA followed by Bonferroni’s post hoc tests were used for comparison between two groups or across multiple groups, respectively. For the data variables that were non-normally distributed, Mann–Whitney U or Kruskal–Wallis ANOVA followed by multiple comparison tests were carried out. The categorical data were shown as n (%) and compared by Chi-squared test. A value of P  < 0.05 was indicated as statistically significant.

Discussion

EM is a common and complex disease with genetic predisposition. Hitherto, multiple GWAS have been performed to reveal the genetic determinants underlying EM in populations worldwide. However, the genes associated with EM risk have not yet been identified. As most of the EM risk-associated risk variants are located in non-coding regions, the identified risk variants may confer the risk of EM by regulating gene expression. Therefore, identifying risk genes, from our perspective, is a crucial step in bridging GWAS findings and the aetiology of EM to the end of facilitating the development of novel therapeutics for its management. In this study, we conducted Sherlock analyses to integrate a large-scale EM GWAS dataset with independent eQTL datasets. Thus, we first identified 14 risk genes whose expression changes may contribute to the risk of EM and thereafter, performed comprehensive analyses to validate and prioritise the identified risk genes. Further DGE analysis showed that seven of these genes, including TOP3A , MKNK1 , and SIPA1L2 were dysregulated in the peripheral blood of ovarian EM cases compared with the control samples. IHC staining results also consistently showed that MKNK 1 and TOP3A were upregulated, while HOBX2 was downregulated in the endometrium of women with ovarian EM. Finally, we observed that the knockdown of MKNK1 and TOP3A affected the migration and invasion behaviours of EESCs. Taken together, these convergent lines of evidence suggested that MKNK1 and TOP3A are promising candidate genes for EM. MKNK1 (also named MNK1 ), located on chromosome 1p33, plays essential roles in many human diseases, including tumourigenesis and metabolic diseases, and is also implicated in autoimmune and inflammatory diseases as well as viral replication processes. Additionally, MKNK1 is one of the immediate downstream effectors of the activated MAPK and PI3K pathways driven by BRAF V600E and mutated PTEN . Elevated levels of MKNK protein kinases and their substrate, eIF4E (or p-eIF4E), have been detected in multiple types of solid tumours (e.g., breast, prostate, and melanoma) as well as haematological malignancies [42] , [43] , [44] . It has also been reported that MKNK1 facilitates tumour invasion and metastasis by promoting eIF4E phosphorylation [45] , [46] . In addition, numerous studies have highlighted that the MKNK/eIF4E axis contributes to promoting oncogenic mRNA translation [44] , [47] , and in recent years, MKNK1/2 has been regarded as an important molecular target in invasive and metastatic cancer, and several MKNK1/2 inhibitors have reached phase I/II clinical trials [42] . However, despite the vital role of the MAPK and PI3K pathways in EM, no study to date has focused on the association between MKNK1 with EM. In this study, we identified MKNK1 as a promising EM risk-related gene and verified that it was consistently upregulated in peripheral blood and endometrium samples from EM cases compared with controls. Consistent with its known cellular function, MKNK1 protein expression was detected in both the nucleus and cytoplasm of endometrial cells. Our in vitro experiments also suggested that MKNK1 possibly participates in the pathogenesis of EM by promoting the invasion and migration of EESCs, the mechanism of which might involve eIF4E phosphorylation or the regulation of other oncogenic cell signalling pathways. TOP3A is located on chromosome 17p11.2 and encodes Top3α (topoisomerase 3α), a type IA DNA topoisomerase that shows dual localisation, in the nucleus and mitochondria [48] . Reportedly, the nuclear isoform of Top3α functions as a decatenase, facilitating the processing of homologous recombination intermediates to maintain genomic stability [49] , [50] . Additionally, the mitochondrial isoform of Top3α is an essential component of the mtDNA replication machinery required for the decatenation and segregation process [51] . However, the function of TOP3A in human diseases, including EM, has not yet been sufficiently investigated. Although the pathophysiology of EM remains elusive, dysregulated DNA damage response (DDR) has received much attention in this regard in recent years. For example, Bane et al. [52] demonstrated that eutopic endometrium from women with EM show higher DDR and DNA repair gene expression levels, as well as higher DNA damage levels compared with the controls, suggesting the existence of stimuli that induce DNA damage in eutopic endometrium. Thus, the involvement of TOP3Α in homologous recombination repair may provide clues regarding its biological role in EM. High TOP3Α expression levels in eutopic and ectopic endometrium samples probably help counteract the high DNA damage caused by external or internal factors; notwithstanding, this still warrants further investigation. HOXB2, a member of the HOX family, is a transcription factor that is involved in embryonic development. The expression of HOXB2 is altered in a variety of solid tumours, the function of which could be distinct in different tumours. HOXB2 was identified as a tumor suppressor in breast cancer cells, whose expression could be downregulated by estrogen [53] . A previous study demonstrated that HOXB2, as a downstream target of miR-202–5, played a role in inhibiting the proliferation, invasion and migration of ovarian cancer cells [54] . However, in some other malignant tumours such as esophageal squamous cell carcinoma, nephroblastoma and bladder cancer, HOXB2 presented a tumour promotor via increasing cell proliferation, invasion and migration [55] , [56] , [57] . In EM, whether HOXB2 act as a suppressor thus the decreased expression of protein might promote diseases progression need to be further clarified. Previous GWASs have been conducted for identifying disease-associated variants for EM [58] . To give an overview of significant loci obtained through GWAS, we summarized these reported significant variants in the Supplementary Table S10 . Among the 14 identified genes, TPM2 , HOXB2 , and MGMT have been shown to be associated with EM in a few previous studies [37] , [38] , [39] . Specifically, TPM2 encodes beta-tropomyosin, which plays a role in muscle contraction and motility, and helps maintain cell shape and cell-matrix interactions. Irungu et al. [37] discovered and confirmed that TPM2 is highly expressed in the ectopic endometrium and serum of patients with EM compared with samples from the controls, suggesting it has potential as a biomarker of EM. In this study, bioinformatics analyses based on public datasets suggested that the expression level of TPM2 is associated with EM risk, but showed no expression difference in our verification samples. This phenomenon could be attribute to several reasons including the different ethnic background, relatively small verification sample size, and differential expression of genes in different tissues and different levels. Vestergaard et al. [38] observed that HOXB2 transcription was significantly reduced in EM lesions compared with endometrium samples from patients with EM and healthy controls, which was in agreement with our results. O6-methylguanine-DNA methyltransferase (MGMT), which is responsible for the direct repair of DNA, is primarily immunolocalised in the nuclei of epithelial cells in eutopic endometrial tissue and ovarian EM lesions [39] . Nevertheless, previous studies have only been focused on the investigation of the expression or localisation patterns of risk genes. Thus, further studies are needed to clarify the function of these genes in EM. Several eQTL analysis studies have focused on the association between genetic variation and gene expression in EM. In this regard, Montgomery et al. have highlighted three eQTLs that may regulate the expression of target genes, including LINC0039 and CDC42 on chromosome 1, VEZT on chromosome 12, and CDKN2A-AS1 on chromosome 9, by integrating gene expression data from whole blood (n = 862) and endometrial tissue (n = 136) with an Australian GWAS dataset (2594 cases and 4496 controls) [59] , [60] . Recently, Chou et al. conducted a GWAS involving 126 EM cases and 96 controls in a Taiwanese population, and thereafter, mapped the results obtained with the GTEx database. They identified that SNP rs13126673 on chromosome 4 is a cis-eQTL and is associated with both EM risk and INTU expression [61] . These previous studies were based on relatively small sample sizes of GWAS data, which may have reduced their power to identify more risk loci. In this current study, we leveraged GWAS data with a very large sample size (n = 245,494) from the UK Biobank database and three independent eQTL datasets for integrative genomic analysis; this enhanced the possibility of identifying more novel loci. Additionally, Sherlock integrative genomics analysis, based on the Bayesian inference method, is a vigorous tool for integrating genetic data from GWAS with existing eQTL data [25] . Compared with the usual GWAS approaches that disregard large amounts of common genetic variants with minor effects, Sherlock integrative analysis has an obvious advantage in that it involves the re-use of these disregarded common variants in GWAS. Further, based on Sherlock analysis, several novel risk genes have been implicated in the pathogenesis of various complex diseases, including schizophrenia, childhood-onset asthma, major depressive disorders, and COVID-19 [18] , [36] , [62] , [63] . Our study further provides supportive evidence that incorporating multiple layers of omics data contributes to strengthening the association signals of pinpointing risk loci for complex diseases. This study had several limitations that should be considered. First, the GWAS data and eQTL data used in the integrative analysis were based on European ancestry, whereas our RNA-sequencing data were derived from a Han Chinese population. This might have led to biases due to the differences in genetic architectures across different ethnicities. Second, even though our current integrative genomic analyses have highlighted some EM risk-associated genes, such as MKNK1 and TOP3A , there were other numerous underlying susceptible genes with suggestive evidence for EM that warrant further investigation, as documented in supplementary tables . Third, the different datasets used in the present study showed heterogeneity. To overcome this issue, we applied different statistical methods for multiple corrections of each analysed dataset, such as FDR< 0.05, for pathway enrichment analysis, permuted P- value< 0.05, for Sherlock analysis, and empirical P -value< 0.05, for in silico permutation analysis. Moreover, the study participants were only ovarian EM cases, and further studies involving superficial endometriosis and deep EM are needed. Furthermore, to examine whether using the threshold of MAF> 0.0001 affect the results of integrative genomic analyses, we re-performed the Sherlock-based analyses based on SNPs with a MAF> 0.01, and found that these 14 identified EM-risk genes remained to be significant, which is highly consistent with current findings (R2 = 0.9997–0.9999, Supplementary Fig. S8 and Table S11 ). Finally, our functional experiments on MKNK1 and TOP3A in EECSs are preliminary. Thus, further studies involving animal models and molecular mechanisms are required. In summary, based on our comprehensive analyses, MKNK1 and TOP3A were identified as EM risk-associated genes, whose genetically modulated abnormal expression may contribute to EM. By combining GWAS summary-based statistics with eQTL-derived regulatory information, this study provides a plausible mechanistic explanation of the functional effects of genetic variants on EM susceptibility. These results provide novel insights into the biological mechanisms of EM and support the promise of translating GWAS findings into new approaches for clinical diagnosis and treatment.

Introduction

Endometriosis (EM), a common gynaecological disease characterised by the presence of endometrium-like tissue outside the uterus, affects approximately 10% of women of reproductive age [1] . EM presentation is highly heterogeneous, varying from cysts in the ovaries (endometrioma) and superficial peritoneal lesions to nodules with a depth of penetration exceeding 5 mm (deep EM) [1] . EM causes symptoms, including dysmenorrhoea, dyspareunia, chronic pelvic pain, and infertility, impairing women’s quality of life while being a substantial economic burden [2] , [3] . Hormonal medication and surgical removal of lesions are the main therapeutic approaches for EM management; however, their efficacy is unsatisfactory, and EM recurrence rates are also high. A major reason for this dilemma is that the aetiology of this disease remains unclear. Multiple lines of evidence suggest that EM is influenced by both genetic and environmental factors. Twin studies have demonstrated that EM heritability is approximately 50% [4] , indicating the pivotal role of inherited risk variants in EM. From this perspective, a genome-wide association study (GWAS) is an effective approach for simultaneously examining the association of several single nucleotide polymorphisms (SNPs) with EM to the end of identifying novel risk genetic variants. The first GWAS, conducted in 2010 in the Japanese population, identified a significant association between two SNPs—rs10965235 located in CDKN2B-AS1 on chromosome 9 and rs16826658 located in WNT4 on chromosome 1—with EM [5] . Subsequently, more than 10 large-scale, population-based GWAS involving different ethnic populations have been performed, and a dozen genome-wide significant loci have been reported [6] , [7] , [8] . Nevertheless, as most reported risk variants are located in non-coding genomic regions, how these non-coding variants affect EM pathogenesis remains largely unknown. The greatest challenge in following up on GWAS is to identify genes responsible for an association with EM and to understand the functional consequences of these loci. Expression quantitative trait loci (eQTL), which offers the possibility to elucidate a fraction of the genetic variance in gene expression, have been extensively studied for post-GWAS analyses [9] , [10] , [11] , [12] . Owing to its ability to establish a link between non-coding variants and the expression of a given gene, eQTL analysis is one of the most remarkable methods for highlighting disease-associated variants [13] , [14] . In this study, we performed a Bayesian integrative analysis (Sherlock) by combining genetic associations from large-scale GWAS summary statistics on EM and three independent eQTL datasets to identify potential EM risk-related genes. We further subjected peripheral blood samples from patients with EM and healthy controls to transcriptome sequencing to study the expression levels of the identified ovarian EM risk-associated genes. Additionally, the expression levels of these gens in tissue samples were investigated via immunohistochemical (IHC) analysis using ectopic, eutopic, and normal endometria samples. Finally, the potential roles of the EM risk-related genes were explored via in vitro functional studies. Our integrative study provided novel insights that MKNK1 and TOP3A may represent promising EM risk genes.

Coi Statement

The authors declare that they have no conflicts of interest.

Text is read by the "Ask this paper" AI Q&A widget below. Extraction quality varies by source — PMC NXML preserves structure cleanly, OA-HTML may include some navigation residue, and OA-PDF can have broken hyphenation. The publisher copy (via DOI) is the canonical version.

My notes (saved in your browser only)

Ask this paper AI returns verbatim quotes from the full text · source: pmc-nxml

Answers must be backed by verbatim quotes from this paper's full text. Hallucinated quotes are dropped automatically; if no verbatim passage answers the question, we say so. How this works

Condition tags

endometriosis

Citation neighborhood

Papers in the corpus that this work cites (lower rings, blue) and that cite this one (upper rings, green). Dot size scales with the paper's in-corpus citation count — bigger dot = more influential within the endo/adeno field. Click a dot to open that paper. [ expand to 2 hops ] — adds papers reached through this work's immediate citers/citees. Heavier; up to 60 extra dots.

References (68)

Cited by (3)

Source provenance

europepmc
last seen: 2026-06-12T06:13:51.797165+00:00
openalex
last seen: 2026-06-10T17:14:06.276822+00:00
pubmed
last seen: 2026-06-12T06:13:03.365682+00:00
License: CC0 · commercial use OK