Identification of Key Genes and Pathways associated with Endometriosis by Weighted Gene Co-expression Network Analysis

Article · Open access: gold · CC0 · 9 in-corpus citations
AI-generated summary by claude@2026-06, 2026-06-07

This study utilized weighted gene co-expression network analysis and gene set enrichment analysis on microarray data to identify key genes and pathways in ectopic and eutopic endometrium associated with endometriosis.

One-sentence paraphrase of the abstract; not a substitute for reading it. No clinical advice.

AI-generated deep summary by claude@2026-06, 2026-06-07 · read from full text

This study used GEO microarray datasets to compare gene expression in ectopic endometrium (EC) versus normal controls (GSE7305) and eutopic endometrium (EU) versus controls (GSE120103), applying limma-based differential expression, GSEA, and weighted gene co-expression network analysis (WGCNA) to find EC- and EU-associated gene modules and hub genes, then validating hub genes in independent datasets (GSE7307 and GSE51981). EC-associated genes were enriched for immune response and immune cell trafficking, while EU-associated genes were enriched for stress response and steroid hormone biosynthesis, and WGCNA identified an EC blue module and an EU magenta module with inferred roles in implantation/migration via inflammatory or hormone-receptor-related processes. Sixteen EC hub genes (blue module) and twelve EU hub genes (magenta module) were defined by overlap between WGCNA and protein-protein interaction network analyses and further validated in independent datasets. The paper's stated limitation is that some patients in the validation datasets had incomplete clinical information; it also notes that there is no standard protocol for transcriptomic analysis. This paper is centrally about endometriosis, using WGCNA to identify hub genes and pathways specific to ectopic and eutopic endometrium.

Read from the paper's body, not the abstract. Not a substitute for reading the paper. No clinical advice.

Abstract

Background: Endometriosis is a common gynecological disorder associated with high rates of infertility and pelvic pain. However, its pathogenesis and diagnostic biomarkers remain unclear. This study aimed to elucidate potential hub genes and key pathways associated with endometriosis in ectopic endometrium (EC) and eutopic endometrium (EU). Materials and Methods: EC- and EU-associated microarray datasets were obtained from the Gene Expression Omnibus (GEO) database. Gene set enrichment analysis was performed to obtain further biological insight into the EU- and EC-associated genes. Weighted gene co-expression network analysis (WGCNA) was performed to find clinically significant modules of highly correlated genes. Hub genes belonging to both the weighted gene co-expression network and the protein-protein interaction (PPI) network were identified using a Venn diagram.

Full text · 18,432 characters · extracted from pmc-nxml · 5 sections

Intro

Endometriosis is an estrogen-dependent gynecological disorder characterized by the growth of endometrial tissue in ectopic locations 1 . Between 30% and 50% of women with endometriosis suffer from pain and/or unexplained infertility 2 . Although several theories (e.g., retrograde menstruation, coelomic metaplasia, Müllerian remnants) have been proposed, the pathogenesis of endometriosis remains unknown. The widely accepted retrograde menstruation hypothesis states that eutopic endometrium (EU) migrates and survives outside the uterine cavity and then establishes new endometriotic lesions. Elucidating the molecular and functional specificities of the ectopic endometrium (EC) and EU is expected to facilitate a better understanding of the complex pathophysiology of endometriosis. Previous studies have shown that the EC may behave differently from its eutopic counterpart 3 . However, these studies mainly focused on a few molecules or on gene expression differences between tissues, without considering the intrinsic relationships among these genes. In addition, the bioinformatics algorithms used to analyze these transcriptomic data leave room for improvement, and the specific biomarkers and roles of the EC and EU in endometriosis remain uncertain. Therefore, for the first time, our study explored the genomic alteration profiles of both entities of endometriosis using weighted gene co-expression network analysis (WGCNA) to identify endometriosis-associated biomarkers and pathways. Currently, there is no standard protocol for analyzing transcriptomic data. Network analysis is a promising approach that improves the ability to recognize biological themes or pathways. It combines biology and network science to study the relationships among interacting components, which may provide novel and comprehensive insights into diseases at the level of multiple genes 4 .
WGCNA is a network method for identifying modules of highly correlated genes across samples and for analyzing the correlation between each module and disease type or clinical phenotype. It has been widely used to explore biomarkers and therapeutic targets of various diseases, such as breast cancer 5 , and to identify biologically related modules. For example, Wang et al. used WGCNA to find fifteen hub genes that were highly correlated with the progression and prognosis of clear cell renal cell carcinoma 6 . Accordingly, we used WGCNA to identify modules of co-expressed genes highly associated with endometriosis, together with their key drivers, and to explore the key pathways of the EC and EU in the pathogenesis of endometriosis. By comparing the eutopic and ectopic endometrium, our study may provide a better understanding of the disorder and new insight into the molecular mechanisms underlying its pathogenesis.

Methods

To illustrate the data preprocessing, analysis and validation, a schematic flow diagram of the study is presented in Figure 1 . Expression profiles of endometriosis-associated mRNAs in GSE7305 , GSE120103 , GSE7307 and GSE51981 were downloaded from the Gene Expression Omnibus (GEO) database. The microarray datasets GSE7305 and GSE120103 , which had complete clinical information and samples from the same menstrual cycle phase, were used as training sets to identify hub differentially expressed genes (DEGs) of endometriosis; GSE7307 and GSE51981 were used as test sets to validate the results. Dataset GSE7305 7 , generated on the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array), was used to identify hub DEGs in ovarian endometrioma; it includes 10 ovarian endometriomas from women with endometriosis (EC) and 10 normal endometria (Ctrl). Dataset GSE120103 8 , generated on the GPL6480 platform (Agilent-014850 Whole Human Genome Microarray 4x44K G4112F), was used to identify hub DEGs in eutopic endometrium; it includes 9 eutopic endometria from fertile women with endometriosis (EU) and 9 Ctrl. Dataset GSE7307 , generated on the GPL570 platform, was used to validate EC-associated hub DEGs and includes 23 EC and 18 Ctrl. Dataset GSE51981 , also generated on the GPL570 platform, was used to validate EU-associated hub DEGs and includes 38 EU and 71 Ctrl. The limma (linear models for microarray data) package 9 , 10 in R was used for background correction and to identify DEGs in the EC vs Ctrl and EU vs Ctrl comparisons. Batch effects were removed with the limma function removeBatchEffect. The Benjamini-Hochberg method was used to correct for multiple testing 11 . The cut-off criteria for screening DEGs were a false discovery rate (FDR) <0.05 and |log2(fold change)| (|log2FC|) ≥1.
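The DEG screen combines a Benjamini-Hochberg adjustment with a fold-change cut-off. The paper ran this step in limma (R); the Python sketch below only mirrors the logic of the screen, assuming per-gene p-values and log2FC values have already been produced by the linear model. The function names `benjamini_hochberg` and `select_degs` and the toy inputs are hypothetical, not part of the original analysis.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)                          # ascending raw p-values
    ranked = p[order] * n / np.arange(1, n + 1)    # p_(i) * n / i
    # enforce monotonicity by taking the running minimum from the largest rank down
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)      # cap at 1 and restore input order
    return adjusted

def select_degs(genes, log2fc, pvals, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Apply the FDR < 0.05 and |log2FC| >= 1 screen; split DEGs into up/down."""
    fdr = benjamini_hochberg(pvals)
    lfc = np.asarray(log2fc, dtype=float)
    keep = (fdr < fdr_cutoff) & (np.abs(lfc) >= lfc_cutoff)
    up = [g for g, k, l in zip(genes, keep, lfc) if k and l > 0]
    down = [g for g, k, l in zip(genes, keep, lfc) if k and l < 0]
    return up, down
```

For example, `select_degs(["A", "B", "C", "D"], [2.0, -1.5, 0.5, 3.0], [0.001, 0.01, 0.0001, 0.2])` returns `(["A"], ["B"])`: gene C fails the fold-change cut-off and gene D fails the FDR cut-off (adjusted value 0.2).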
To evaluate the molecular mechanisms of endometriosis, GSEA of the gene expression profiles of GSE7305 and GSE120103 was performed using the “clusterProfiler” package in R ( http://www.bioconductor.org/packages/release/bioc/html/clusterProfiler.html ). Genes were ranked according to their expression levels and mapped to the annotated c5 (Gene Ontology (GO) gene sets) and c2 (curated gene sets) collections of the Molecular Signatures Database (MSigDB). Gene sets with P-value <0.05 and FDR <25% were considered significant 12 . The Search Tool for the Retrieval of Interacting Genes (STRING) database was used to identify interactions among DEGs, with an interaction score >0.4. The PPI network was then constructed in Cytoscape, and potential hub DEGs were identified with the Molecular Complex Detection (MCODE) plug-in (K-score >3) 13 . We used the WGCNA R package to construct co-expression networks 14 for the genes in GSE7305 and GSE120103 . Genes that did not pass quality filtering were removed, and a gene-gene similarity matrix was computed from Pearson's correlation coefficients. This matrix was raised to an appropriate soft-thresholding power (β) to obtain an adjacency matrix that approximates a scale-free co-expression network. For this purpose, we chose the lowest power (β = 14 and β = 22 for the two training datasets) at which the scale-free topology fit index curve flattens out after reaching a high value (0.8 or above). The adjacency matrix was then transformed into a topological overlap matrix (TOM). A higher TOM value between two genes indicates greater interconnectedness, that is, more shared neighbors in the network 15 , 16 . Genes were clustered hierarchically using the TOM-based dissimilarity (1-TOM), and highly correlated genes were assigned to the same module. The correlation between modules and clinical traits was investigated with the module-trait relationship analysis of WGCNA, which identified the modules most relevant to the clinical traits.
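GSEA scores each gene set by walking down the ranked gene list. clusterProfiler computes this internally; the Python sketch below only illustrates the core statistic, the weighted running-sum enrichment score of Subramanian et al., assuming the genes are already sorted by a ranking statistic in decreasing order and a weighting exponent of 1. The function name `enrichment_score` and the toy gene list are hypothetical, and the permutation-based P-values and FDR that the significance thresholds above rely on are not shown.

```python
import numpy as np

def enrichment_score(ranked_genes, ranked_stats, gene_set, weight=1.0):
    """Weighted Kolmogorov-Smirnov-like running sum over a pre-ranked list.

    ranked_genes must be sorted by ranked_stats in decreasing order.
    Returns (ES, running_sum), where ES is the running-sum value
    farthest from zero.
    """
    in_set = np.array([g in gene_set for g in ranked_genes])
    n, n_hit = len(ranked_genes), int(in_set.sum())
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must partially overlap the ranked list")
    w = np.abs(np.asarray(ranked_stats, dtype=float)) ** weight
    # hits step up in proportion to |statistic|; misses step down uniformly
    hit_step = np.where(in_set, w / w[in_set].sum(), 0.0)
    miss_step = np.where(in_set, 0.0, 1.0 / (n - n_hit))
    running = np.cumsum(hit_step - miss_step)
    idx = np.argmax(np.abs(running))
    return running[idx], running
```

With genes `["g1", "g2", "g3", "g4"]` ranked by statistics `[3, 2, -1, -2]`, the set `{"g1", "g2"}` concentrated at the top gives a running sum of 0.6, 1.0, 0.5, 0.0 and an ES of 1.0, whereas `{"g3", "g4"}` at the bottom gives an ES of -1.0.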
In this study, the endometriosis-associated blue and magenta modules were chosen for subsequent analyses. Metascape was used to explore the functional annotations (GO biological processes and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways) of these two modules. Terms with P-value … 1.5 were considered statistically significant. We calculated the gene significance (GS, the correlation between a gene and the clinical phenotype of interest) and module membership (MM, the correlation between a gene's expression profile and the module eigengene) of each gene in the clinically significant blue and magenta modules. The module eigengene is defined as the first principal component of the module's gene expression matrix. Thresholds of |MM| >0.6 and |GS| >0.8 were used to screen candidate hub genes strongly associated with EC or EU. Finally, a Venn diagram was used to identify the hub genes common to the PPI network analysis and WGCNA. GSE7307 and GSE51981 were used as validation datasets, and the “ggplot2” (Ito & Murphy, 2013) R package was used to show the relative expression of the identified hub genes in each comparison (EC vs Ctrl or EU vs Ctrl).
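The definitions of the module eigengene, MM and GS can be made concrete with a short sketch. It assumes a samples × genes expression matrix for a single module and a numeric trait coded 0 for Ctrl and 1 for EC (or EU); these codings and the function names are hypothetical. The eigengene follows the definition in the text (first principal component of the standardized module expression), with its sign oriented to agree with the average module expression; WGCNA's own `moduleEigengenes` handles further details that this simplified version skips. The third returned value is the eigengene-trait correlation used in the module-trait relationship analysis.

```python
import numpy as np

def module_eigengene(expr):
    """First principal component (sample scores) of a samples x genes module matrix."""
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)   # standardize each gene
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    me = u[:, 0]
    # SVD signs are arbitrary: orient the eigengene along the average expression
    if np.corrcoef(me, z.mean(axis=1))[0, 1] < 0:
        me = -me
    return me

def gs_mm_trait(expr, trait):
    """Return GS and MM per gene, plus the module eigengene-trait correlation."""
    expr = np.asarray(expr, dtype=float)
    trait = np.asarray(trait, dtype=float)
    me = module_eigengene(expr)
    n_genes = expr.shape[1]
    gs = np.array([np.corrcoef(expr[:, j], trait)[0, 1] for j in range(n_genes)])
    mm = np.array([np.corrcoef(expr[:, j], me)[0, 1] for j in range(n_genes)])
    module_trait_r = np.corrcoef(me, trait)[0, 1]
    return gs, mm, module_trait_r
```

In a toy module whose genes are `trait`, `2*trait + 5` and `-trait + 3`, both GS and MM come out as 1, 1 and -1, and the eigengene correlates perfectly with the trait, because every gene is an exact linear function of the trait.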

Results

With the cut-off criteria (FDR <0.05 and |log2FC| ≥1), a total of 1487 DEGs (824 upregulated and 663 downregulated) were identified between EC and Ctrl in the GSE7305 dataset, and 5794 DEGs (1974 upregulated and 3820 downregulated) were identified between EU and Ctrl in the GSE120103 dataset. Volcano plots show the distribution of DEGs in EC versus Ctrl (Figure 2 A) and EU versus Ctrl (Figure 2 B). GSEA revealed that genes in the EC were mainly enriched in immune response and immune cell trafficking, for example protein activation cascade, complement activation, regulation of humoral immune response, the complement and coagulation cascades pathway and the chemokine signaling pathway (Figure 3 A-B), whereas genes in the EU were mainly enriched in stress response and steroid hormone biosynthesis, for example the stress response to copper ion, activation of GTPase activity, the human ATP-binding cassette (ABC) transporter-dependent pathway and the steroid hormone biosynthesis pathway (Figure 3 C-D). These functional annotations, derived from the genomic alteration profiles of the two entities of endometriosis, indicate distinct roles for the EC and EU in the pathological process of endometriosis and support the reliability of our results. To identify candidate hub genes, the 362 most differentially expressed genes in EC versus Ctrl and the 1992 most differentially expressed genes in EU versus Ctrl were selected for PPI network construction. The PPI networks were analyzed with the MCODE clustering algorithm. With a threshold of K-score >3, seven clusters containing 78 candidate hub genes were selected in the EC, and 21 clusters containing 205 candidate hub genes in the EU. Figure 4 A-D depicts the top two clusters in the EC and EU. Soft-thresholding powers of β = 14 (scale-free R² = 0.80) and β = 22 (scale-free R² = 0.85) were selected with the R package WGCNA to ensure scale-free networks (Figure 5 A-B).
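The choice of β rests on a scale-free fit index: the signed R² of a linear fit of log10 p(k) against log10 k, where k is each gene's connectivity in the adjacency matrix |cor|^β. The Python sketch below is a simplified analogue of what WGCNA's `pickSoftThreshold` reports, not its exact code: it assumes genes in columns, uses 10 histogram bins with bin midpoints in place of the mean connectivity per bin, and replaces the "curve flattens out" judgement with the hypothetical rule "first power whose fit reaches 0.8". The function names are illustrative.

```python
import numpy as np

def scale_free_fit(adjacency, n_bins=10):
    """Signed R^2 of log10 p(k) vs log10 k (simplified scale-free fit index)."""
    k = adjacency.sum(axis=0) - 1.0          # connectivity, minus self-adjacency (diagonal = 1)
    counts, edges = np.histogram(k, bins=n_bins)
    mids = (edges[:-1] + edges[1:]) / 2.0    # bin midpoints stand in for mean k per bin
    mask = (counts > 0) & (mids > 0)
    if mask.sum() < 3:
        return 0.0                           # too few occupied bins to fit a line
    x = np.log10(mids[mask])
    y = np.log10(counts[mask] / k.size)      # empirical p(k)
    if y.max() == y.min():
        return 0.0                           # flat p(k): no scale-free trend
    slope = np.polyfit(x, y, 1)[0]
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return float(-np.sign(slope) * r2)       # positive when p(k) decays with k

def pick_soft_threshold(expr, powers, cutoff=0.8):
    """Return the lowest power whose fit index reaches the cutoff, else None."""
    cor = np.corrcoef(expr, rowvar=False)    # samples in rows, genes in columns
    for beta in sorted(powers):
        if scale_free_fit(np.abs(cor) ** beta) >= cutoff:
            return beta
    return None
```

Raising |cor| to a higher power suppresses weak correlations more than strong ones, which is why the fit index tends to rise with β; the trade-off is lower mean connectivity, hence the preference for the lowest power that reaches the threshold.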
Genes with similar expression patterns were clustered into co-expression modules, which are displayed in different colors. A total of 58 and 39 modules were identified (Figure 5 C-D). The relationship between each module and the clinical information is shown in the module-trait relationship plots (Figure 5 E-F). We focused on the EC- and EU-associated key modules. The blue module, containing 2036 genes, was most strongly correlated with EC (R = 0.87, p = 5×10⁻⁶), and the magenta module, containing 768 genes, was most strongly correlated with EU (R = 0.97, p = 1×10⁻¹²). The blue and magenta modules were therefore considered clinically significant and used for the subsequent analyses. To explore the functions of genes in these modules, GO and KEGG analyses were conducted. The main biological processes and pathways of the blue module were regulation of cell adhesion, autophagy, the FoxO signaling pathway and the focal adhesion pathway (Figure 6 A-B), and those of the magenta module were regulation of the mitogen-activated protein kinase (MAPK) cascade, regulation of the growth hormone receptor signaling pathway, the tumor necrosis factor (TNF) signaling pathway and the NOD-like receptor signaling pathway (Figure 6 C-D). The functional annotations for the blue and magenta modules are listed in Table 1 and Table 2 . Genes in the top EC-associated module mainly played roles in autophagy, focal adhesion and cancer, while those in the top EU-associated module were involved in creating an estrogen-rich and inflammatory microenvironment. Hub genes in a co-expression network are characterized by high intramodular connectivity, which is reflected by their GS and MM values. Figure 7 A and 7 B show scatterplots of GS (y-axis) versus MM (x-axis) in the blue (R = 0.95, p < 1×10⁻²⁰⁰) and magenta (R = 0.8, p < 4.2×10⁻¹⁷²) modules.
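Module detection goes from adjacency to topological overlap to a clustering tree. For an adjacency matrix a with zero diagonal, the standard unsigned TOM is TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij = Σ_u a_iu a_uj counts weighted shared neighbors and k_i = Σ_u a_iu. The Python sketch below computes this and clusters genes on 1 - TOM with average linkage. As a plainly labeled simplification, it cuts the tree into a fixed number of clusters with SciPy's `fcluster` instead of the dynamic tree cut WGCNA normally uses; `n_modules` and the function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def topological_overlap(adjacency):
    """Unsigned TOM from a symmetric adjacency matrix with values in [0, 1]."""
    a = np.array(adjacency, dtype=float)
    np.fill_diagonal(a, 0.0)                  # exclude self-links from l_ij and k_i
    k = a.sum(axis=1)                         # connectivity of each gene
    shared = a @ a                            # l_ij = sum over u of a_iu * a_uj
    denom = np.minimum.outer(k, k) + 1.0 - a  # always >= 1 because k_i >= a_ij
    tom = (shared + a) / denom
    np.fill_diagonal(tom, 1.0)
    return tom

def tom_modules(expr, beta, n_modules):
    """Cluster genes (columns of expr) on 1 - TOM; return one label per gene."""
    adj = np.abs(np.corrcoef(expr, rowvar=False)) ** beta
    diss = 1.0 - topological_overlap(adj)
    np.fill_diagonal(diss, 0.0)
    tree = linkage(squareform(diss, checks=False), method="average")
    return fcluster(tree, t=n_modules, criterion="maxclust")
```

On a toy matrix in which three genes follow one sample pattern and three follow an orthogonal pattern (plus small noise), `tom_modules(expr, beta=6, n_modules=2)` assigns each trio its own label. Because TOM rewards shared neighbors rather than only direct correlation, it tends to give cleaner modules than clustering on raw correlations.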
MM was highly correlated with GS in each module, indicating that the hub genes of these co-expression modules were also highly correlated with endometriosis. With thresholds of |MM| > 0.6 and |GS| > 0.8, WGCNA identified 735 candidate hub genes in the blue module and 329 in the magenta module. To identify endometriosis-associated hub genes, we compared the candidate hub genes from the co-expression and PPI networks and identified 16 overlapping hub genes in the blue module (Figure 7 C) and 12 in the magenta module (Figure 7 D). These 28 hub genes are listed in Table 3 . Independent datasets were used to validate the hub genes by comparing the expression of each hub gene between endometriosis and control samples. Fifteen hub genes were differentially expressed between EC and Ctrl in GSE7307 (Figure 8 A), and seven hub genes were differentially expressed between EU and Ctrl in GSE51981 (Figure 8 B). The validation results are shown as boxplots (Figure 8 A-B).
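The final hub-gene call is a filter followed by a set intersection, which is what the Venn diagram depicts. A minimal sketch, assuming GS and MM are available as per-gene dictionaries and the MCODE candidates as a collection of gene symbols; the function names and the placeholder genes "A" to "E" are hypothetical.

```python
def wgcna_candidates(gs, mm, gs_cut=0.8, mm_cut=0.6):
    """Genes passing |GS| > gs_cut and |MM| > mm_cut (gs, mm: dicts gene -> value)."""
    return {g for g in gs if abs(gs[g]) > gs_cut and abs(mm[g]) > mm_cut}

def overlapping_hubs(gs, mm, ppi_candidates):
    """Intersect WGCNA candidates with PPI (MCODE) candidates; sorted for stable output."""
    return sorted(wgcna_candidates(gs, mm) & set(ppi_candidates))
```

For instance, with GS `{"A": 0.9, "B": 0.85, "C": 0.5, "D": -0.95}`, MM `{"A": 0.7, "B": 0.4, "C": 0.9, "D": -0.8}` and PPI candidates `{"A", "C", "D", "E"}`, gene B fails the MM threshold, C fails the GS threshold and E is absent from the module, so the overlap is `["A", "D"]`. Absolute values are used so that strongly negatively correlated genes are retained as well.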

Discussion

Endometriosis is a non-malignant gynecological disease whose pathogenesis is still unclear. The absence of biomarkers may contribute to the long delay between disease onset and diagnosis. Hence, it is imperative to identify novel molecular biomarkers that may enable early diagnosis and personalized treatment. For the first time, our study identified endometriosis-associated hub genes using WGCNA; these genes may hold important clues to the pathogenesis of endometriosis, provide valuable resources for identifying endometriosis biomarkers, and thus improve the clinical management of this disease. WGCNA can produce more robust results than other bioinformatics methods 17 , 18 because it constructs weighted co-expression networks based on the similarity of gene expression profiles and focuses on the correlation between co-expression modules and clinical traits. Hub genes are defined as highly connected nodes that contribute to a phenotype or disease 19 . Accordingly, this method has been used to identify biologically relevant modules and biomarkers in different diseases 20 . Although endometriosis is a benign disease, it shares invasive and migratory characteristics with cancer. In our study, EC- and EU-associated hub modules were identified. Functional enrichment analyses showed that genes in the blue and magenta modules had different roles and that both were significantly associated with endometriosis, which supports our analysis. For example, genes in the EC-associated blue module played roles in autophagy, focal adhesion (the initiation step for disease progression 21 ) and cancer, all of which are involved in the pathogenesis of endometriosis 22 , 23 . Previous studies showed that S100A7 promoted the development of endometriosis by activating the NF-kappaB signaling pathway 24 .
In our study, genes in the EU-associated magenta module played roles in the growth hormone receptor signaling pathway, NF-kappaB signaling and the GnRH signaling pathway, which induce an estrogen-rich and inflammatory microenvironment involved in cell division, cell movement and survival in endometriosis 20 , 25 - 27 . Accordingly, we hypothesize that hormone receptor signaling or an inflammatory microenvironment may promote the passage of EU through the fallopian tubes and its migration to the ovarian surface, and that adhesion- and autophagy-related genes may enable successful ectopic implantation of the endometrium (EC) and the formation of endometriotic lesions. Taken together, dysregulated genes in the EU may be responsible for the increased propensity of endometrial debris for ectopic implantation and for early events that lead to the establishment of lesions, whereas dysregulated genes in the EC may contribute to lesion formation and influence disease progression. To better understand the pathogenesis of endometriosis, WGCNA and PPI analyses were used to identify EC- and EU-associated hub genes. Some hub mRNAs, such as TAS2R3, TAS2R41, SERPING1, CASR, CCKAR, GPR55, HCRTR2, CRH, HTR5A, CFTR, and ENAM, were also key enriched genes in the GSEA. For instance, SERPING1 was involved in the complement and coagulation cascades, NR4A2 and ABCC8 both played important roles in the ABC transporter pathway, and CYP2E1 was involved in the steroid hormone biosynthesis pathway. In addition, some of the identified hub genes of the EC (TAGLN, GATA6, CDH3, CLU, COL8A1, MYH11, MYOCD) and EU (CXCL13, DKK-1, KLF4, CYP2E1, CYP4B1 and PROK1) have been reported to be associated with endometriosis. For example, TAGLN may be involved in cell invasion, migration and differentiation in endometriosis 28 . GATA6 is an essential gene in the activation of estrogen synthesis and may serve as a molecular marker of endometriotic lesions 29 , 30 .
Endometrial CXCL13 expression may play an important role in the pathophysiology of endometriosis 31 . MiR-200b inhibits invasive growth in endometriosis by targeting KLF4 32 . Dysregulated endometrial PROK1 expression may be correlated with progesterone resistance in endometriosis 33 . Most importantly, we identified several novel candidate genes, including HOXC6, PROS1, SERPING1, MYLK, ACTG2 and THBS2 in the ectopic endometrium, and NR4A2, ABCC8, COL4A6, COL5A3, FSTL3 and WDR27 in the eutopic endometrium. For example, HOXC6 has been found to regulate the response to hormonal signals, and overexpression of FSTL3 significantly improved angiogenesis and neovascularization in induced pluripotent stem cells 34 . These hub genes may point to new mechanisms in endometriosis and will be investigated in future work. Our study identified multi-molecule biomarkers in endometriosis. However, some patients in the validation datasets had incomplete clinical information, which limited further data exploration. The identified genes will be further validated in clinical specimens and in vitro experiments to assess their application in endometriosis.

Conclusions

For the first time, our study used WGCNA to analyze the gene expression profiles of the eutopic and ectopic endometrium in women with endometriosis, explored the distinct functions of the two tissues, and identified co-expression modules and potential biomarkers for endometriosis. These findings may improve understanding of the pathogenesis of endometriosis and provide references for endometriosis-associated biomarkers and therapeutic targets.

Condition tags

endometriosis, infertility

MeSH descriptors

Endometriosis; Endometrium; Female; Gene Expression Profiling; Gene Regulatory Networks; Humans; Protein Interaction Maps; Signal Transduction

License: CC0 · commercial use OK