Tumor-intrinsic B4GALNT3 expression drives a protective immune microenvironment in endometriosis-associated ovarian cancer

other OA: gold CC-BY-NC-ND-4.0
AI-generated deep summary by claude@2026-06, 2026-06-09 · read from full text

The paper performed a multi-stage computational analysis of endometriosis-associated ovarian cancer (EAOC) using publicly available transcriptomic data from five GEO cohorts, identifying consensus differentially expressed genes (EAOC versus endometrioma and/or normal tissue) and narrowing them with functional enrichment and two machine-learning feature-selection methods (LASSO logistic regression and SVM-RFE). The candidate hub genes were evaluated for overall-survival prognostic value across TCGA, ICGC, and Kaplan–Meier Plotter, with validation in an additional GEO cohort; diagnostic performance was tested using leave-one-dataset-out cross-validation, and immune infiltration was assessed by CIBERSORTx deconvolution in a validation cohort. The authors then examined histotype-specific immune correlations in ovarian clear cell and endometrioid carcinomas and used an independent single-cell RNA-seq dataset (OCCC) to infer the cellular source of B4GALNT3, which they attribute mainly to malignant and epithelial cells, while noting that the single-cell dataset is a histotype-based proxy rather than a full EAOC dataset. Overall, the paper identifies tumor-intrinsic B4GALNT3 expression as associated with a protective immune microenvironment in this endometriosis-linked malignancy.

Read from the paper's body, not the abstract. Not a substitute for reading the paper. No clinical advice.

Abstract

BACKGROUND: Although endometriosis-associated ovarian cancer (EAOC) is considered a separate clinical entity, no specific prognostic biomarkers aid in its management. This gap has been among the factors hindering the development of tailored treatments. We aim to develop a robust, histotype-aware biomarker for EAOC through an integrative computational approach and to explain its association with the tumor immune microenvironment.

METHODS: A multi-stage bioinformatics approach using multiple independent Gene Expression Omnibus (GEO) cohorts was employed. We extracted consensus differentially expressed genes (DEGs) from three discovery datasets (EAOC vs. non-malignant tissue). These DEGs were further distilled into high-confidence hub genes using two machine learning algorithms. The pan-ovarian cancer prognostic potential was assessed via meta-analysis and tested for validity in an independent, EAOC-enriched cohort (GSE65986). The derived immune context was assessed using CIBERSORTx deconvolution in a pure EAOC cohort (GSE226870), while the cellular origin of our candidate was determined using an independent ovarian clear cell carcinoma (OCCC) single-cell RNA sequencing (scRNA-seq) dataset (GSE224334).

RESULTS: We identified 75 consensus DEGs, which were distilled into five hub genes. Among these, B4GALNT3 was the key candidate. While the pan-ovarian cancer meta-analysis showed a non-significant protective trend, we confirmed in our EAOC-enriched validation cohort that high B4GALNT3 expression was significantly associated with improved overall survival [hazard ratio (HR) =0.350, P=0.04]. B4GALNT3 showed robust diagnostic potential, with an overall area under the curve (AUC) of 0.962 [95% confidence interval (CI): 0.923-0.993] in leave-one-dataset-out cross-validation among discovery datasets. Immune deconvolution revealed that B4GALNT3 expression correlated with an anti-tumor microenvironment composed of increased levels of plasma B cells, memory B cells, and activated dendritic cells, with decreased regulatory T cells and M2 macrophages. Finally, scRNA-seq analysis confirmed that B4GALNT3 was intrinsically highly expressed in malignant and epithelial cells, with low expression in immune lineages.

CONCLUSIONS: B4GALNT3 is a novel, subtype-specific protective biomarker in EAOC. Our findings support a mechanism by which tumor-cell-intrinsic expression of B4GALNT3 drives a protective immune microenvironment. This work identifies B4GALNT3 as a promising prognostic factor and potential target for further mechanistic studies and protein-level validation in EAOC.

Intro

Ovarian cancer continues to be one of the most lethal gynecological malignancies worldwide, a status attributable largely to its typically late-stage diagnosis and the high incidence of recurrence following standard therapies (1). Within the heterogeneous spectrum of ovarian neoplasms, endometriosis-associated ovarian cancer (EAOC), which predominantly includes endometrioid and clear cell histotypes, has been recognized as a distinct clinical and molecular entity arising from endometriotic lesions (2, 3). The etiological linkage between endometriosis and subsequent malignant transformation highlights a unique carcinogenic pathway, yet the precise molecular drivers that govern EAOC initiation and progression remain largely elusive (4). This significant knowledge gap has hindered the development of tailored diagnostic tools and effective therapeutic interventions, underscoring the urgent necessity for robust biomarkers capable of accurately predicting patient outcomes and guiding clinical decision-making.

The advent of high-throughput sequencing technologies, coupled with the increasing availability of public databases such as the Gene Expression Omnibus (GEO), has provided an unprecedented opportunity to explore the complex transcriptomic landscapes of various cancers, including EAOC (5, 6). The integration of advanced bioinformatics and sophisticated machine learning algorithms offers a powerful framework for systematically sifting through vast amounts of genomic data to identify genes with genuine biological and clinical relevance. By moving beyond simple gene lists to construct functional interaction networks and validate findings across multiple cohorts, it is possible to identify consensus gene signatures that are robust, reproducible, and reflective of the core pathological mechanisms (7). Such computational approaches have proven invaluable in discovering novel biomarkers for diagnosis, prognosis, and therapeutic targeting across numerous malignancies.

Therefore, this study was designed to execute a comprehensive, multi-stage computational analysis aimed at identifying and validating key prognostic biomarkers for EAOC. We hypothesized that by integrating data from multiple independent cohorts and applying a stringent filtering pipeline that combines differential expression analysis, functional enrichment, and machine learning-based feature selection, we could uncover novel genes critically involved in EAOC pathogenesis. Furthermore, considering the pivotal role of the tumor immune microenvironment in cancer progression and therapeutic response, we conducted histotype-aware immune deconvolution to relate candidate biomarkers to the EAOC-specific immune composition, with secondary benchmarking in ovarian clear cell carcinoma (OCCC) and endometrioid ovarian carcinoma (ENOC) (8). The goal of this work was to identify a high-confidence prognostic marker, explore its role in the EAOC immunological context using deconvolution, and validate its cellular origin using independent single-cell transcriptomic data, thereby laying the groundwork for future functional studies and the development of new therapeutic strategies. We present this article in accordance with the STROBE reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-aw-2458/rc ).

Methods

This study utilized publicly available transcriptomic data from five independent cohorts obtained from the National Center for Biotechnology Information (NCBI) GEO database. The datasets were organized into discovery and validation cohorts. The discovery cohorts, used for the identification of differentially expressed genes (DEGs), comprised three datasets: GSE226575, GSE157153, and GSE230956. The validation cohorts comprised two independent datasets: GSE65986 was used for subtype-specific survival analysis, and GSE226870 was used for subtype-specific immune infiltration analysis. The characteristics of all datasets are summarized in Table 1. The raw data files for all datasets were downloaded and subjected to standardized preprocessing, which included background correction, log2 transformation, and quantile normalization. Probes were annotated to their corresponding gene symbols, and for genes represented by multiple probes, the average expression value was calculated. Detailed methodological descriptions are provided in Appendix 1. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Table 1 abbreviations: EAOC, endometriosis-associated ovarian cancer; EM, endometriosis; ENOC, endometrioid ovarian carcinoma; GEO, Gene Expression Omnibus; OCCC, ovarian clear cell carcinoma.

DEGs were identified for each of the three discovery datasets by comparing EAOC samples (cases) against non-malignant tissues (endometrioma and/or normal, collectively treated as controls) using the limma package in R software. DEG analyses were conducted within each dataset separately, without cross-dataset batch correction or pooling, and consensus DEGs were obtained by intersection. A stringent cutoff was applied: genes with an absolute log2 fold change (log2FC) ≥2 and a false discovery rate (FDR) <0.05 were considered significant DEGs.
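The per-dataset cutoff and consensus step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each dataset's limma results have already been exported as (gene, log2FC, FDR) rows, the function names and the toy gene labels and values are hypothetical, and it does not require a gene to change in the same direction in every dataset, since the text does not say whether that was checked.

```python
from typing import Dict, Iterable, Set, Tuple

Row = Tuple[str, float, float]  # (gene symbol, log2 fold change, FDR)


def significant_degs(rows: Iterable[Row],
                     lfc_cutoff: float = 2.0,
                     fdr_cutoff: float = 0.05) -> Set[str]:
    """Genes passing |log2FC| >= lfc_cutoff and FDR < fdr_cutoff in one dataset."""
    return {gene for gene, lfc, fdr in rows
            if abs(lfc) >= lfc_cutoff and fdr < fdr_cutoff}


def consensus_degs(datasets: Dict[str, Iterable[Row]]) -> Set[str]:
    """Intersect the per-dataset DEG sets (no pooling, no batch correction)."""
    deg_sets = [significant_degs(rows) for rows in datasets.values()]
    return set.intersection(*deg_sets) if deg_sets else set()


# Hypothetical toy results keyed by discovery dataset (not values from the paper).
toy = {
    "GSE226575": [("GENE_A", -3.1, 0.001), ("GENE_B", 2.4, 0.010), ("GENE_C", 1.2, 0.001)],
    "GSE157153": [("GENE_A", -2.6, 0.004), ("GENE_B", 2.2, 0.200), ("GENE_C", 2.5, 0.010)],
    "GSE230956": [("GENE_A", -2.0, 0.049), ("GENE_B", 2.9, 0.001), ("GENE_C", 2.1, 0.030)],
}
consensus = consensus_degs(toy)
print(sorted(consensus))
```

In the toy input, GENE_B fails the FDR cutoff in one dataset and GENE_C fails the fold-change cutoff in another, so only GENE_A survives the intersection. Intersecting per-dataset calls is conservative: a gene is dropped if it misses either threshold in any single cohort.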
Volcano plots were generated to visualize the overall distribution of these DEGs prior to intersection analysis. Subsequently, a Venn diagram analysis was performed to isolate the subset of DEGs that were commonly dysregulated across all three datasets. This consensus gene set was then subjected to a comprehensive suite of functional and pathway enrichment analyses. Gene ontology (GO), focusing on biological process (BP) and cellular component (CC) terms, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were performed. Furthermore, a protein-protein interaction (PPI) network analysis was conducted using the Metascape web portal. Gene set enrichment analysis (GSEA) was performed with fgsea (pre-ranked; 10,000 permutations), and pathways with FDR <0.05 were considered significantly enriched. Genes were ranked based on the t-statistic derived from the limma analysis, and the HALLMARK gene sets (Category H) from the Molecular Signatures Database (MSigDB) were used as the reference gene set collection (9, 10).

To further distill the list of common DEGs and identify the most impactful features, two distinct machine learning algorithms were applied using Python’s scikit-learn library. All preprocessing and model tuning were performed within stratified 5-fold cross-validation (fixed random seed) to minimize information leakage. The least absolute shrinkage and selection operator (LASSO) logistic regression model was utilized for its ability to perform simultaneous regularization and variable selection, with the optimal regularization parameter determined via cross-validation to select a final set of predictive features (11). Additionally, the support vector machine-recursive feature elimination (SVM-RFE) algorithm was implemented. A pipeline consisting of median imputation and standard scaling was constructed, and the recursive feature elimination with cross-validation (RFECV) function was then applied with a LinearSVC estimator, using 5-fold stratified cross-validation and optimizing for the F1 macro score to recursively eliminate features and identify an optimal subset of predictive genes. A final set of candidate hub genes was identified through an intersection analysis of the gene sets generated from the preceding steps (LASSO, SVM-RFE, and KEGG analysis), yielding a consensus set of five hub genes.

The prognostic significance of the candidate hub genes was evaluated across three independent sources: The Cancer Genome Atlas (TCGA) cohort, the International Cancer Genome Consortium (ICGC) cohort, and the Kaplan-Meier Plotter (KM-Plotter) cohort. The primary endpoint was overall survival (OS). For TCGA RNA sequencing (RNA-seq) data, STAR counts were normalized using upper-quartile scaling and each gene was modeled as a continuous predictor [per 1-standard deviation (SD) increase] in a separate univariable Cox proportional hazards model. For ICGC and KM-Plotter, effect sizes were obtained from binary contrasts of high vs. low expression. For each gene, study-level effects [log hazard ratio (HR) and its standard error] were combined across sources using a DerSimonian-Laird random-effects model, implemented with the metafor package in R. To validate the hypothesis of subtype-specific effects, an additional survival analysis was performed using the independent GSE65986 validation cohort.

For the final identified hub gene(s), a diagnostic receiver operating characteristic (ROC) curve analysis was performed to evaluate their ability to discriminate between tumor (EAOC) and non-malignant tissues. To strictly evaluate the robustness of the hub gene and rule out dataset-specific bias, a leave-one-dataset-out cross-validation was performed. We iteratively trained the logistic regression model on two of the three discovery datasets and validated it on the remaining independent dataset. This process was repeated for all three combinations to assess generalizability. An overall combined ROC curve was generated, and the area under the curve (AUC) with its 95% confidence interval (CI) was calculated to quantify diagnostic performance.

To investigate the role of the candidate hub gene(s) in the EAOC-specific immune microenvironment, immune cell deconvolution was performed on the EAOC samples of the GSE226870 validation cohort using the CIBERSORTx web portal (12). The relative abundance of 22 immune cell types was estimated using the LM22 leukocyte signature matrix. Following developer recommendations, the analysis was run in absolute mode, with B-mode batch correction enabled to mitigate platform effects, and quantile normalization was disabled for the RNA-seq data. Significance analysis was based on 500 permutations. For downstream analysis, only samples passing pre-specified quality control filters [deconvolution P<0.05, reconstruction correlation ≥0.80, and root-mean-square error (RMSE) ≤0.30] were retained. The association between the expression of the hub gene(s) and the resulting immune cell fractions was assessed using two-sided Spearman’s rank correlation. To account for multiple testing across the 22 cell types, the FDR was controlled using the Benjamini-Hochberg procedure, with q<0.05 considered statistically significant (13, 14). In additional histotype-aware analyses, we applied the identical CIBERSORTx pipeline and quality-control criteria to OCCC and ENOC.

To validate the cellular origin of B4GALNT3 expression, we analysed an independent, publicly available scRNA-seq dataset of OCCC (GSE224334). Given that OCCC is a predominant histotype of EAOC and our bulk-data immune correlations were directionally concordant in the OCCC subgroup, this cohort served as a relevant proxy to determine the cellular origin of B4GALNT3 within the EAOC context. The dataset was processed using the Seurat package in R (3, 15). After quality control, cells were clustered and annotated into major cell lineages (including malignant, epithelial, CD4T, CD8T, mono/macro, fibroblasts, etc.) based on canonical marker gene expression. Normalized B4GALNT3 expression levels were then visualized across all cell clusters on t-distributed stochastic neighbor embedding (t-SNE) plots (16, 17). The mean expression of B4GALNT3 within each major cell lineage was calculated and compared to determine the primary cellular source of B4GALNT3.

All statistical analyses and machine learning tasks were conducted using R software (version 4.5.2) and Python (version 3.11). Differential expression analysis was performed using the limma package based on linear models and empirical Bayes moderated t-statistics. Feature selection was conducted using LASSO regression and SVM-RFE algorithms. Survival outcomes were evaluated using Kaplan-Meier curves with log-rank tests and univariable Cox proportional hazards regression models to estimate HR and 95% CI. Meta-analysis of HRs was performed using a DerSimonian-Laird random-effects model. Correlations between gene expression and immune infiltration were assessed using Spearman’s rank correlation coefficient. Diagnostic performance was quantified using the area under the ROC curve (AUC). The Benjamini-Hochberg procedure was applied to control the FDR for multiple hypothesis testing. Unless otherwise stated, all statistical tests were two-sided, and a P value <0.05 was considered statistically significant.
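To make the DerSimonian-Laird pooling step concrete, the sketch below combines study-level log hazard ratios and their standard errors into a random-effects estimate. It is a plain-Python illustration of the standard method-of-moments formulas, not the metafor implementation the authors used; the function name and the three input effects are hypothetical, and I² is computed as max(0, (Q − df)/Q), which is assumed here to match what is reported for a DerSimonian-Laird fit.

```python
import math
from typing import Dict, Sequence


def dersimonian_laird(log_hrs: Sequence[float], ses: Sequence[float]) -> Dict[str, float]:
    """Random-effects pooling of log hazard ratios (DerSimonian-Laird estimator)."""
    k = len(log_hrs)
    w = [1.0 / se ** 2 for se in ses]                       # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_hrs)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_hrs))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0         # between-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]           # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_hrs)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0   # heterogeneity, percent
    return {"log_hr": pooled, "se": se_pooled, "tau2": tau2, "q": q, "i2": i2}


# Hypothetical study-level inputs (not the values behind Table 2).
log_hrs = [math.log(0.60), math.log(0.95), math.log(1.10)]
ses = [0.20, 0.10, 0.15]
res = dersimonian_laird(log_hrs, ses)
hr = math.exp(res["log_hr"])
lo = math.exp(res["log_hr"] - 1.96 * res["se"])
hi = math.exp(res["log_hr"] + 1.96 * res["se"])
print(f"pooled HR = {hr:.3f} (95% CI {lo:.3f}-{hi:.3f}), I2 = {res['i2']:.1f}%")
```

When the between-study variance is large, the random-effects weights become nearly equal across studies and the pooled confidence interval widens, which is one way a consistent-looking trend can remain non-significant.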

Results

The overall workflow of this study is depicted in Figure 1. Following this pipeline, the initial differential expression analysis of the three integrated GEO discovery datasets revealed a widespread alteration of the transcriptomic landscape in EAOC tissues. A volcano plot identified 7,019 DEGs (Figure 2A). To isolate the most robust and consistently dysregulated genes, an intersection analysis yielded a core signature of 75 common DEGs (Figure 2B).

Figure 1. Overall workflow of the study. The flowchart illustrates the multi-stage analytical pipeline, from data acquisition and identification of DEGs to functional enrichment, machine learning-based feature selection, and the identification of five hub genes. DEG, differentially expressed gene; EAOC, endometriosis-associated ovarian cancer; GO, Gene Ontology; GSE, Gene Expression Omnibus Series; GSEA, Gene Set Enrichment Analysis; ICGC, International Cancer Genome Consortium; KEGG, Kyoto Encyclopedia of Genes and Genomes; KM, Kaplan-Meier; LASSO, least absolute shrinkage and selection operator; PPI, protein-protein interaction; RNA-Seq, RNA sequencing; SVM, Support Vector Machine; TCGA, The Cancer Genome Atlas.

Figure 2. Identification of DEGs in EAOC. (A) Volcano plot visualizing DEGs between EAOC and control tissues from the combined GEO datasets. Up-regulated (n=2,847) and down-regulated (n=4,172) genes are shown in red and blue, respectively. (B) Venn diagram showing the overlap of DEGs from three independent GEO datasets (GSE226575, GSE157153, GSE230956), identifying 75 common DEGs. Numbers in panels indicate gene counts and overlaps between datasets. DEG, differentially expressed gene; EAOC, endometriosis-associated ovarian cancer; GEO, Gene Expression Omnibus; GSE, Gene Expression Omnibus Series.

To elucidate the collective biological significance of the 75 common DEGs, a series of functional analyses was performed. GO analysis indicated that these genes were primarily involved in BPs related to cell division (Figure 3A) and CCs such as the mitotic checkpoint complex (Figure 3B). The PPI network analysis revealed a densely interconnected functional network with its core enriched in the mitotic cell cycle (Figure 3C). KEGG pathway-gene correlations (Figure 3D) and GSEA (Figure 3E) also pointed towards cancer-related pathways, such as epithelial-mesenchymal transition (EMT) and TNFα signaling via NF-κB.

Figure 3. Functional enrichment analysis of the 75 common DEGs. (A) GO enrichment results for BP. (B) GO enrichment results for CC. (C) PPI network of the common DEGs, with the main cluster enriched in mitotic cell cycle processes. Node size: gene degree within PPI network. Edge thickness: interaction confidence score. (D) Network depicting the relationship between key KEGG pathways and associated DEGs. (E) GSEA plot showing enrichment of the HALLMARK epithelial-mesenchymal transition and TNFα signaling via NF-κB pathways in EAOC. BP, biological process; CC, cellular component; DEG, differentially expressed gene; EAOC, endometriosis-associated ovarian cancer; FDR, false discovery rate; GO, gene ontology; GSEA, Gene Set Enrichment Analysis; KEGG, Kyoto Encyclopedia of Genes and Genomes; PPI, protein-protein interaction.

To distill the most robust biomarkers from the 75 common DEGs, a multi-pronged strategy was employed. LASSO regression (Figure 4A) and SVM-RFE algorithms (Figure 4B) identified 10 and 23 optimal predictive genes, respectively. The intersection of these gene sets with genes from the KEGG analysis yielded a final consensus list of five high-confidence hub genes, i.e., B4GALNT3, CLDN4, MARVELD2, OCLN, and SGPP2 (Figure 4C).

Figure 4. Hub gene selection via machine learning and integrated analysis. (A) Feature selection using the LASSO regression model, identifying 10 optimal features. (B) Feature selection using the SVM-RFE model, identifying 23 optimal features. (C) Venn diagram illustrating the intersection of gene sets from LASSO, SVM-RFE, and KEGG analyses to identify five hub genes. λ: regularization parameter in LASSO. F1: harmonic mean of precision and recall. N: number of features retained at minimum deviance. KEGG, Kyoto Encyclopedia of Genes and Genomes; LASSO, least absolute shrinkage and selection operator; RFECV, recursive feature elimination with cross-validation; SVM-RFE, support vector machine-recursive feature elimination.

We first systematically evaluated the prognostic value of the five candidate hub genes via a meta-analysis of three large-scale pan-ovarian cancer cohorts (Table 2). The analysis showed that high CLDN4 expression and high OCLN expression were significant risk factors for poor prognosis. In contrast, the combined HR for B4GALNT3 indicated a protective trend (combined HR =0.851), but this did not reach statistical significance (P=0.26) in the presence of high inter-study heterogeneity (I² =78.3%) (Figure 5A). To test the hypothesis that this effect was subtype-specific, we performed a validation analysis in the independent GSE65986 cohort, which is highly enriched with EAOC-related subtypes. Strikingly, the protective association of B4GALNT3 with OS was independently and significantly observed in this cohort (HR =0.350, P=0.04) (Figure 5B). Taken together, B4GALNT3 showed lower expression in EAOC than in non-malignant tissue, and among EAOC cases higher expression correlated with better OS. This series of analyses ultimately established B4GALNT3 as our sole primary gene of interest.

Table 2 abbreviations: CI, confidence interval; HR, hazard ratio; OS, overall survival.

Figure 5. Prognostic association and diagnostic performance of B4GALNT3. (A) Forest plot of the meta-analysis for B4GALNT3 expression and OS in pan-ovarian cancer cohorts. Study-specific HRs include a univariable Cox model for TCGA. (B) Kaplan-Meier survival curve analysis of B4GALNT3 in the EAOC-related subtype cohort GSE65986 (HR =0.350, P=0.04). (C) Diagnostic performance of B4GALNT3 evaluated via leave-one-dataset-out cross-validation. The plot displays ROC curves for validation in each independent dataset (GSE226575, AUC =0.950; GSE157153, AUC =0.979; GSE230956, AUC =0.938) and the overall combined performance (AUC =0.962). Error bars: 95% CI of HR. P: log-rank test significance. AUC, area under the curve; CI, confidence interval; EAOC, endometriosis-associated ovarian cancer; GSE, Gene Expression Omnibus Series; HR, hazard ratio; ICGC, International Cancer Genome Consortium; KM, Kaplan-Meier; OS, overall survival; ROC, receiver operating characteristic; TCGA, The Cancer Genome Atlas.

Further investigation indicated strong discriminative performance of B4GALNT3 in distinguishing EAOC from non-malignant tissues. Through leave-one-dataset-out cross-validation, B4GALNT3 demonstrated consistent diagnostic ability across independent cohorts, with validation AUCs of 0.950 in GSE226575, 0.979 in GSE157153, and 0.938 in GSE230956. The overall combined AUC was 0.962 (95% CI: 0.923–0.993) (Figure 5C).

To interrogate the immunologic context of B4GALNT3, we performed immune cell deconvolution in the independent EAOC cohort (GSE226870). As shown in Figure 6A, B4GALNT3 expression was positively correlated with plasma B cells (rho =0.683, P=0.02), memory B cells (rho =0.595, P=0.01), CD4+ memory resting T cells (rho =0.545, P=0.04), and activated myeloid dendritic cells (rho =0.421, P=0.05). In contrast, inverse associations were observed with M0 macrophages (rho =−0.413, P=0.04), naive B cells (rho =−0.446, P=0.02), regulatory T cells (Tregs; rho =−0.479, P=0.04), and M2 macrophages (rho =−0.557, P=0.01).

Figure 6. Correlation of B4GALNT3 with the immune microenvironment by histotype. (A) Lollipop plot of Spearman correlations (rho) between B4GALNT3 expression and CIBERSORTx-inferred immune cell fractions in the combined EAOC cohort (GSE226870). (B) Lollipop plot of Spearman correlations (rho) in the OCCC subgroup of GSE226870, analysed with the identical pipeline. (C) Lollipop plot of Spearman correlations (rho) in the ENOC subgroup of GSE226870, analysed with the identical pipeline. Asterisks indicate significant correlations after FDR correction (q<0.05); bars to the right of zero denote positive correlations; bars to the left denote negative correlations. EAOC, endometriosis-associated ovarian cancer; ENOC, endometrioid ovarian carcinoma; FDR, false discovery rate; GSE, Gene Expression Omnibus Series; OCCC, ovarian clear cell carcinoma.

A directionally concordant pattern was observed in the OCCC subgroup (Figure 6B), where B4GALNT3 correlated positively with memory B cells (rho =0.608, P=0.03) and activated dendritic cells (rho =0.507, P=0.04), and inversely with M2 macrophages (rho =−0.589, P=0.04) and M0 macrophages (rho =−0.404, P=0.046). In the ENOC subgroup (Figure 6C), associations were again directionally consistent and of slightly larger magnitude than in OCCC: positive correlations with plasma B cells (rho =0.645, P=0.02), memory B cells (rho =0.511, P=0.03), activated dendritic cells (rho =0.464, P=0.02), and CD4+ memory resting T cells (rho =0.426, P=0.02); and negative correlations with Tregs (rho =−0.466, P=0.02), naive B cells (rho =−0.492, P=0.01), and M2 macrophages (rho =−0.506, P=0.01). A borderline inverse association was noted for M0 macrophages (rho =−0.335, P=0.052). Overall, across EAOC, OCCC, and ENOC, the direction of correlations was consistent, with positive correlations for B-cell and activated dendritic cell fractions and negative correlations for Treg and M2 macrophage fractions. Effect sizes were smaller in OCCC and ENOC than in EAOC; exact statistics and multiple-testing details are reported in Figure 6A-6C and legends.
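The two statistical ingredients behind these correlations, Spearman's rho and Benjamini-Hochberg adjustment across cell types, can be sketched in plain Python. This is an illustration only, not the authors' pipeline: the function names are hypothetical, the P values passed to the adjustment are made-up inputs rather than figures from the paper, and computing a P value for rho itself is left out because it needs a t or permutation distribution that the standard library does not provide.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the ranks; undefined if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """BH-adjusted q values, returned in the same order as the input P values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical P values for four cell types (not taken from Figure 6).
toy_p = [0.01, 0.04, 0.03, 0.5]
for p, qv in zip(toy_p, benjamini_hochberg(toy_p)):
    print(f"P = {p:.3f} -> q = {qv:.3f}")
```

Because each q value depends on the rank of its P value among all tests, a raw P below 0.05 does not by itself imply q below 0.05; the full set of P values across the tested cell types is needed to reproduce the adjusted significance calls.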
Our bulk transcriptomic deconvolution revealed a strong correlation between overall B4GALNT3 expression and an anti-tumor immune infiltrate. However, this analysis could not determine the cellular source of B4GALNT3. To resolve this ambiguity, we analysed an independent OCCC scRNA-seq dataset (GSE224334), which represents a major histological subtype of EAOC. We identified nine major cell lineages, including malignant cells, epithelial cells, T cells (CD4T, CD8T), myeloid cells (mono/macro), and various stromal cells (Figure 7A). Visualization of B4GALNT3 expression on the t-SNE plot demonstrated that its expression was almost exclusively confined to the malignant and epithelial cell clusters (Figure 7B).

Figure 7. Single-cell RNA-Seq analysis of B4GALNT3 cellular localization in OCCC. (A) t-SNE plot of all cells from an OCCC scRNA-seq cohort, colored by major cell lineage. (B) The same t-SNE plot colored by the normalized expression level of B4GALNT3, showing high expression localized to the malignant and epithelial clusters. (C) Bar plot quantifying the mean expression of B4GALNT3 across each major cell lineage. Color gradient (purple-yellow) represents normalized B4GALNT3 expression intensity; bar height indicates mean expression per cell lineage. OCCC, ovarian clear cell carcinoma; scRNA-seq, single-cell RNA sequencing; t-SNE, t-distributed Stochastic Neighbor Embedding.

Quantification of mean expression levels across lineages confirmed this localization. Malignant cells (mean expression =2.33) and epithelial cells (mean expression =1.48) showed markedly higher B4GALNT3 expression than all other cell types. In contrast, immune and stromal lineages, including CD4T/CD8T cells, mono/macro, and fibroblasts, showed low baseline expression (all mean expression <0.86) (Figure 7C). This result supports the interpretation that the protective B4GALNT3 signal observed in bulk tissue analysis originates from the tumor cells themselves, consistent with a tumor-cell-intrinsic mechanism.

Discussion

This study employed a systematic, integrative analytical pipeline and identified B4GALNT3 as a subtype-specific protective biomarker in EAOC, linking its expression to a more permissive immunologic tumor microenvironment. Convergent evidence from statistics, machine learning, and pathway analyses, together with immune deconvolution and single-cell findings, supports a model in which a tumor-intrinsic signal exerts cell-extrinsic immunologic effects. These observations warrant orthogonal validation at the protein level. Our multi-step filtering strategy underscores the value of combining differential expression, network biology, and feature selection. Functional enrichment, particularly the protein-protein interaction network, revealed a highly connected module centered on mitotic cell-cycle programs, consistent with the proliferative phenotype of EAOC against an endometriosis-associated inflammatory background ( 18 ). This framework provides biological context for interpreting immune variation and helps explain the stronger B-cell and dendritic-cell activity observed in B4GALNT3-high tumors. A pivotal step was prognostic validation. In the pan-ovarian meta-analysis, B4GALNT3 showed a protective trend that did not reach statistical significance, suggesting subtype dependence. This hypothesis was then tested in an EAOC-enriched cohort, where the association was independently and significantly confirmed, elevating B4GALNT3 from a candidate to a supported prognostic factor in a defined pathologic context. Across three pan-ovarian cohorts, high CLDN4 and OCLN consistently correlated with worse outcomes, while B4GALNT3 exhibited a protective trend (combined HR =0.851) that did not meet statistical significance in the presence of substantial heterogeneity (I 2 =78.3%, P=0.26) ( Figure 5A , Table 2 ). In the independent EAOC-enriched GSE65986 cohort, we reproduced the protective effect, with higher B4GALNT3 associated with longer OS (HR =0.350, P=0.04; Figure 5B ). 
These results indicate that clinical interpretation should be firmly anchored in the appropriate histologic context. We recognize that histotype mixing, residual confounding, and limited sample size may affect effect estimates and cutoff generalizability; larger, rigorously stratified external EAOC cohorts are needed for replication and calibration. At the diagnostic level, the cross-validated overall AUC of 0.962 (95% CI: 0.923–0.993) demonstrates robust discrimination. Importantly, the consistent performance observed across the leave-one-dataset-out folds confirms that the diagnostic utility of B4GALNT3 is generalizable and not driven by dataset-specific artifacts. In clinical practice, it should be considered alongside CA125/HE4 and imaging ( 19 ). While this strict cross-validation mitigates algorithmic optimism, establishing precise thresholds for real-world settings will still require prospective validation in larger, multi-center cohorts. Immune analyses indicate that higher B4GALNT3 associates with strengthened humoral and antigen-presentation axes and attenuation of suppressive programs. Specifically, proportions of plasma cells, memory B cells, and activated dendritic cells increase, whereas regulatory T cells and M2 macrophages decrease, consistent with microenvironmental remodeling in a protective direction ( 20 ). Differences across histotypes are biologically meaningful. EAOC and ENOC more clearly show increases in B cells and activated dendritic cells with declines in Tregs and M2 macrophages, aligning with a chronic inflammatory, endometriosis-derived epithelial context and epithelial-stromal crosstalk ( 21 ). OCCC follows the same direction with smaller amplitudes, a pattern that may reflect histotype-specific metabolic and stromal programs, recurrent genomic alterations, and study-level factors such as sample size, purity, and platform. 
Directional concordance across EAOC (Figure 6A), OCCC (Figure 6B), and ENOC (Figure 6C) supports a unified view: B4GALNT3 tracks with microenvironmental features permissive to antitumor immunity, with effect size contingent on histology, underscoring the need for histotype-stratified validation.

Single-cell atlases provide clinical and biological context for these bulk findings. In GSE224334, B4GALNT3 transcripts localize predominantly to malignant and adjacent epithelial cells, with relatively low signal in lymphoid and myeloid lineages (Figure 7A-7C). This distribution indicates a tumor-cell origin and suggests that upregulation shapes the immune milieu through paracrine and glyco-biological mechanisms (22). In practical terms, epithelial territories with high B4GALNT3 can exhibit enhanced B-cell maturation and dendritic-cell activation without requiring expression of B4GALNT3 in those immune cells. Anchoring B4GALNT3 to epithelial compartments also reduces interpretive ambiguity inherent to bulk analyses, which are susceptible to compositional confounding (23).

Mechanistically, the epithelial localization is consistent with the glycobiology of B4GALNT3, which encodes a β4-N-acetylgalactosaminyltransferase that synthesizes the LacdiNAc motif (24, 25). Previous glycomic profiling in ovarian cancer has directly linked B4GALNT3 and its family member B4GALNT4 to specific alterations in N-linked glycan structures, providing mass-spectrometry-based evidence for this modification in the ovarian tumor context (26, 27). Such remodeling of tumor-surface and secreted glycoproteins can alter lectin-mediated recognition on dendritic cells, influence antigen processing and major histocompatibility complex class II (MHC-II) presentation, and tune co-stimulatory thresholds that govern B-cell help and T-cell activation (28, 29).
In this setting, higher epithelial B4GALNT3 may facilitate efficient licensing of antigen-presenting cells, improve the quality and persistence of humoral responses, and make consolidation of Treg and M2 programs more difficult to maintain. The single-cell evidence of epithelial enrichment provides a cellular anchor for this model, consistent with tumor-intrinsic glycan editing that secondarily modulates antigen presentation and B-cell maturation within the local microenvironment (30).

These observations carry translational implications. Tumors with high B4GALNT3 and a corresponding immune fingerprint characterized by robust B-cell and dendritic-cell activity may be candidates for approaches that consolidate tertiary lymphoid structures or enhance B-T cooperation (31). Tumors with low B4GALNT3 dominated by suppressive Treg/M2 circuits may be better served by macrophage reprogramming or Treg-modulating strategies (32, 33). The single-cell localization points to immunohistochemistry and multiplex immunofluorescence that co-localize B4GALNT3 with epithelial cytokeratins and quantify spatial relationships with CD20/CD138-positive aggregates and mature dendritic cells, ideally complemented by spatial transcriptomics (34).

It is also important to place these findings within the broader context of the glycosyltransferase family. B4GALNT3 is part of a larger class of enzymes whose members often exhibit distinct prognostic values in gynecologic cancers. For instance, a recent study constructed a transcriptomic prognostic score validated across multiple datasets, in which other glycosylation-related genes, such as B4GALNT1, emerged as significant risk factors (35).
This suggests that while the dysregulation of glycosylation machinery is a common feature of ovarian malignancies, specific family members may drive opposing oncologic outcomes, with B4GALNT1 potentially linked to high-risk profiles and B4GALNT3, as shown here, associated with a protective, immune-permissive phenotype in EAOC.

Furthermore, integrating a pan-gynecologic perspective enhances the translational value of these findings. While broad prognostic models in gynecologic cancers, including recent sex-informed frameworks, often identify immune-evasive and fibroblast-rich tumor microenvironments as drivers of poor outcomes, our data suggest that B4GALNT3 marks a distinct, immunologically hot or permissive niche within EAOC. The strong correlation with B cells and dendritic cells observed here contrasts sharply with the stromal-dominant, exclusion signatures typical of high-risk gynecologic tumors. This distinction underscores the necessity of histotype-specific immune stratification, as the specific protective immune context driven by B4GALNT3 in EAOC might be obscured in broader sex-informed analyses that prioritize common stromal risk factors.

The role of B4GALNT3 is context dependent. Reports of oncogenic activity in gastric and colorectal cancer contrast with associations with favorable prognosis and reduced migration/invasion in neuroblastoma (36-38); the WNK1-B4GALNT3 fusion described in papillary thyroid carcinoma further illustrates function under specific genomic configurations (39, 40). These cross-cancer observations reinforce the contextual nature of B4GALNT3 biology and are consistent with our subtype-focused interpretation in EAOC.

Several limitations merit consideration. The single-cell evidence derives from a single public dataset with limited sample size, potential platform effects, and lineage annotations inherited from the source study, making ambient RNA and copy-number-related transcriptional inflation difficult to exclude.
Bulk deconvolution is sensitive to cellular composition and batch effects, and survival estimates in the EAOC-enriched cohort are constrained by histotype mixing and a limited number of events. The next steps include orthogonal, protein-level validation using multiplex immunohistochemistry to co-localize B4GALNT3 with epithelial markers and to quantify spatial proximity to B-cell and dendritic-cell niches, supported by spatial transcriptomics. Mechanistic studies employing B4GALNT3 perturbation, glycoproteomics, and lectin-binding assays, together with epithelial-antigen-presenting cell co-culture systems, will be essential to test causality and map the specific glycan edits that influence antigen processing and co-stimulatory signaling.

Conclusions

This study suggests that B4GALNT3 is a novel, subtype-specific protective factor in EAOC. Our analysis indicates strong diagnostic potential and provides the first evidence that its favorable prognostic impact is likely mediated through beneficial remodeling of the tumor immune microenvironment. These findings highlight B4GALNT3 as a promising candidate for further research and support a hypothesis that warrants protein-level validation and functional perturbation studies.

Supplementary Material

The article’s supplementary files as

Condition tags

endometriosis

Source provenance

europepmc
last seen: 2026-06-11T06:19:48.454388+00:00
pmc
last seen: 2026-05-13T20:22:03.195721+00:00
pubmed
last seen: 2026-05-19T00:30:14.586023+00:00
unpaywall
last seen: 2026-05-11T08:34:28.763810+00:00
License: CC-BY-NC-ND-4.0 · non-commercial use only · no derivatives · attribution required
Courtesy of the U.S. National Library of Medicine