Method
The overview design of our work was shown in Fig. 1 . Data sources for MR and transcriptome data are publicly available online. Fig. 1 Research flow chart
Research flow chart
Our work is based on aggregated data from the European ancestor Gwas. I. Estimates of RA effects for snp associated with RA risk were derived from a study that included 3730 cases and 333,429 controls of European ancestry(ukb-a-105). II. ii.Genetic instrumentation for cervical cancer was extracted from a large GWAS study comprising 1,889 cases and 461,044 controls of European ancestry(ukb-b-8777).
Estimates of RA effects for snp associated with RA risk were derived from a study that included 3730 cases and 333,429 controls of European ancestry(ukb-a-105).
ii.Genetic instrumentation for cervical cancer was extracted from a large GWAS study comprising 1,889 cases and 461,044 controls of European ancestry(ukb-b-8777).
All Mendelian randomization (MR) methods rely on three core assumptions to mitigate bias in MR estimation: (i) instrumental variables (IVs) are strongly correlated with RA, (ii) IVs influence cervical cancer solely through their impact on RA, and (iii) IVs are independent of any other confounding factors. Failure to uphold these assumptions can compromise the reliability of conclusions. To ensure optimal IV selection associated with RA, the following steps were implemented:Firstly, single nucleotide polymorphisms (SNPs) closely linked to RA were identified using a statistically significant threshold ( p < 5 × 10–8) from published GWAS studies. Secondly, SNPs in high linkage disequilibrium (LD) were assessed to avoid redundancy; those exceeding the threshold (window size = 10,000 kb, R2 < 0.005) were excluded through LD analysis. Based on the established screening criteria, a total of 13 single nucleotide polymorphisms (SNPs) were identified as significantly associated with rheumatoid arthritis (RA). Notably, all these genetic variants demonstrated F-values exceeding the threshold of 10, indicating strong statistical significance. Subsequently, through comprehensive comparative analysis, four common SNPs were discovered to be shared between RA and cervical cancer, suggesting potential genetic overlap between these two distinct pathological conditions.
An inverse variance weighted (IVW) meta-analysis of Wald ratio estimates was performed to explore the causal effect of RA on cervical cancer risk. Since the IVW test presents a weighted regression of exposure outcomes with an intercept constrained to zero, the estimates may be biased. In this case, we combined MR-Egger regression and weighted median test to analyze the causal relationship.
Mendelian randomized multiple validity residuals and outliers (MR-Presso) analysis, leave-one-out analysis, and Cochrane's Q-test were used to extensively assess MR results. Notably, the MR-PRESSO method corrects for the presence of horizontal pleiotropy by removing peripheral SNP. Leave-one-out analysis can analyze the effect of outliers. Based on IVW and MR-Egger estimation, Cochran's Q-test identifies the SNP leading to heterogeneity.
Microarray expression and clinical data of GSE55457 (13 RA cases and 10 controls) were downloaded from Gene expression Omnibus (GEO) database. The RNA sequencing data of 3 normal tissues and 306 cervical cancer samples with corresponding clinical information were obtained from Cancer Genome Atlas (TCGA). The GSE44001 dataset including 300 patients with cervical cancer and the GSE26712 dataset of 185 ovarian cancers were used to understand the prognosis of genes associated with RA scores. Single-cell data were collected from the National Genomics Data Center (access number: PRJCA008573), including five cases of cervical cancer.
In the RA dataset GSE55457 , differential gene analysis was conducted using calibrated criteria of p 1, and results were visualized through volcano plots created with ggplot2. For scRNA-seq datasets processed separately with Seurat v4 [ 13 ]. Genes ranging from 500 to 7500 were selected based on their expression profiles, ensuring mitochondrial gene scores were below 15%. Following gene selection, data normalization was performed using "NormalizeData", and the top 2000 highly variable genes (HVGs) were identified to stabilize UMI count variances. Subsequently, data underwent correction for batch effects using Harmony and underwent cell cycle correction. PCA cluster analysis identified seven clusters using PC 40 and a resolution of 0.01, characterized by typical cell type marker genes. Major cell type markers included GNLY and NKG7 for natural killer (NK) cells, EPCAM and KRT19 for epithelial cells, COL1A1 and ACTA2 for fibroblasts, LYZ and APOE for macrophages, IGHG1 and IGKC for plamsa cells, VWF and PLVAP for endothelial cells, and MS4A1 and CD79A for B cells. Bubble plots were used to visualize the expression of marker genes across different cell subpopulations. RA-related genes were scored using Singscore, Modulescore, AUC, and UCell methods. The average of these scores was calculated, and cells were classified into high and low RA score groups based on the median of these averages. Differential gene analysis between these groups was conducted using the FindMarkers function [ 14 ].Cellular communication between different subgroups was analyzed using the cellchat package [ 15 ] to assess the strength and frequency of interactions across cell types.
Differential analysis of RA score-associated genes in cervical cancer was initially conducted using calibrated criteria of p 1, visualized through volcano plots generated with ggplot2. Subsequently, 404 RA score-related genes were subjected to survival analysis among cervical cancer patients using univariate Cox regression ( p < 0.01), and results were illustrated with forest plots. The interaction between different genes was explored, alongside analysis of somatic mutation incidence, genetic loci, and copy number variations (CNV) affecting these genes. Finally, univariate Cox regression analysis was performed using GSE44001 cervical cancer data.
CESC patients were clustered based on RNA sequences from 11 RA score-related genes in cervical cancer tissue. The consensus clustering utilized the K-means algorithm implemented in the ‘‘ConsensusClusterPlus’’ R package [ 16 ], repeated 50 times to ensure robust subtype identification. Optimal subcluster numbers were determined using consensus cumulative distribution function (CDF), delta area, and clustering consensus (CLC) metrics.In the TCGA cohort, CESC patients were stratified into two groups based on consensus RA score-related gene clusters. Overall survival (OS) analysis between these groups was performed using Kaplan–Meier (KM) analysis, facilitated by the ‘‘survival’’ and ‘‘survminer’’ R packages.Additionally, Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were conducted for both subclusters using the ‘‘clusterProfiler’’ R package. These analyses aimed to elucidate potential biological processes and pathways associated with the identified molecular subgroups.
To enhance model accuracy, we conducted univariate Cox regression analyses on differential RA score genes in cervical cancer using GSE44001 and GSE26712 datasets, including genes with p < 0.05 in subsequent analyses. Prognostic genes identified in TCGA data and from the aforementioned datasets were intersected separately and deduplicated to derive model genes. Seven genes—SERINC2, ESD, DSG2, ITM2A, NRP1, DENND2D, and FDPS—were selected for inclusion in machine learning models. The screening criterion was p less than 0.05.The cervical cancer data from TCGA was split into training and validation sets in a 1:1 ratio. Ten machine learning methods were employed with 101 algorithm combinations to construct prognostic models simultaneously on the training, validation, and TCGA cohort datasets [ 17 ]. These methods included Random Survival Forest (RSF), Elastic Net (Enet), Lasso, Ridge, Stepwise Cox, CoxBoost, Cox Partial Least Squares Regression (plsRcox), Supervised Principal Components (SuperPC), Generalized Augmented Regression Model (GBM), and Survival Support Vector Machine (Surviva-svm).The Harrell consistency index (C-index) was calculated for each model on the training and validation sets as well as the TCGA cohort. The model with the highest average C-index across these datasets was selected as the optimal model. Additionally, the ‘‘survival’’ package was utilized for survival modeling and Kaplan–Meier (KM) analysis, while the ‘‘timeROC’’ package was employed to generate ROC curves.
Initially, we performed univariate and multivariate Cox regression analyses using the ‘‘survival’’ package to examine the relationship between the risk score and clinical characteristics. Subsequently, we integrated prognostic and clinical factors to construct a nomogram using the R package ‘‘RMS’’. The performance of both the risk score model and the nomogram was assessed using calibration curves and time-dependent ROC curves.Following this, we compared clinical characteristics, tumor stemness index, and immune scores between the high-risk and low-risk groups.
The R package ‘‘GSVA’’ was used for ssGSEA [ 18 ] and ‘‘CIBERSORT [ 19 ]’’ for immune cell infiltration calculations to examine differences in immune cell infiltration levels between the two risk groups. Additionally, the TIDE algorithm [ 20 ] was employed to assess response to immune checkpoint blockade therapy (ICB therapy) and predict neoantigen activity ( http://tide.dfci.harvard.edu ). The TIDE algorithm integrates two primary mechanisms of immune evasion: T-cell dysfunction and T-cell rejection. Higher TIDE scores indicate a greater likelihood of tumor immune escape and poorer response to immunotherapy. Immunotherapy scores for a large sample cohort from the TCGA database were obtained from TCIA ( https://tcia.at/home ), comparing treatment scores for anti-CTLA4 and anti-PD1 inhibitors across different risk groups. Somatic mutation data (MuTect2 Variant Aggregation and Masking) were downloaded from the TCGA-CESC dataset, and mutation frequencies in specific patients were calculated using the R package ‘‘maftools’’ [ 21 ]. TMB, defined as the total number of mutations per million bases of somatic cells, was analyzed using the ‘‘maftools’’ package. Kaplan–Meier (KM) analysis was conducted to reveal overall survival (OS) differences between TMB groups.
Survival differences and immunotherapy effects between different risk groups in IMvigor210 were initially compared. Survival differences between different risk groups in GSE78220 [ 22 ] and GSE135222 [ 23 ] and treatment sensitivity in the GSE91061 [ 23 ] dataset were compared. Correlations between risk scores and the IC50 values of commonly used chemotherapeutic drugs such as cisplatin, docetaxel, and paclitaxel, as well as common targeted therapy drugs, were investigated using IC50 data obtained from the Genomics of Drug Sensitivity in Cancer (GDSC).
The expression levels of the seven model genes were validated using the GEPIA 2 online database ( http://gepia2.cancer-pku.cn/ ). Additionally, gene expression in cervical cancer tissues and normal cervical tissues was confirmed using the HPA protein database ( https://www.proteinatlas.org/ ).
Cervical cancer cells (HeLa and SiHa) and normal cervical epithelial cells (ECT1/E6E) were kindly donated to Li Li's group at the Cancer Hospital of Guangxi Swell Medical University. These cells were maintained in RPMI 1640 medium (Sigma—Aldrich; Thermo Fisher Scientific) containing 10% fetal bovine serum (FBS) (FBS; Gibco; Expression Vector; Thermo Fisher Scientific), 100 IU/mL penicillin and 10 µg/mL streptomycin (Thermo Fisher Scientific). All cells were cultured at 37 °C under 5% CO2.
A total of 10 cases of fresh cervical cancer tissues and 10 cases of fresh normal cervical tissues were collected in the Nanning Second People’s Hospital from 2023 to 2024. Written informed consent was obtained from each patient and approved by the Research Ethics Committee of the Nanning Second People’s Hospital. The case selection criteria were established as follows: participants aged between 55 and 60 years with negative human papillomavirus (HPV) status were included. Furthermore, individuals were excluded if they presented with any chronic systemic conditions, including but not limited to hypertension, diabetes mellitus, or previous malignancies. Additionally, cervical tissue samples were obtained from patients undergoing hysterectomy for uterine fibroids or adenomyosis, with histological confirmation showing normal cervical morphology and clear surgical margins. Written informed consent was acquired from all participating individuals following a comprehensive explanation of the study protocol. Participants were explicitly assured regarding the absence of any potential secondary victimization or financial obligations associated with their involvement. Moreover, the research protocol underwent rigorous review and subsequently received formal approval from the Institutional Review Board (IRB) of the Second People's Hospital of Nanning, with the assigned ethical approval number Y2023019. The acquisition and use of human cervical tissue samples in this study were conducted in full compliance with the ethical principles outlined in the Declaration of Helsinki.
These RNAs were extracted by TRIzol® (Invitrogen; Thermo Fisher Scientific, Inc) and reverse transcribed by HiScript III RT SuperMix for qPCR (Vazyme Biotech). Gene primer sequences are shown in Table S2. PCR cycling parameters were: 95 °C for 5 min, 40 cycles of 30 s, and 60 °C for 1 min. each sample was performed in 3 separate reactions, and the average of each point was calculated. mRNA expression levels were normalized to GAPDH levels.
In this experiment, all statistical analyses were conducted using R 4.3. For non-parametric data, comparisons between two independent samples and multiple samples were performed using the Wilcoxon test and Kruskal–Wallis test, respectively. Parametric data were analyzed using the t-test and one-way ANOVA. A p-value < 0.05 was considered statistically significant (* p -value < 0.05; ** p < 0.01; *** p < 0.001). Relevant R packages, including ‘‘ggplot2’’, ‘‘ggpubr’’, ‘‘survival’’, ‘‘survminer’’, and other R packages, were obtained from Bioconductor or R packages. The threshold for statistical significance was set at p < 0.05 for each analysis.
Result
As depicted in Table 1 , MR analysis revealed a causal association between RA and cervical cancer in a European cohort (IVW: OR 1.15, 95% CI 1.06–1.24, p < 0.001; Weighted median: OR 1.15, 95% CI 1.11–1.20, p < 0.001; Weighted mode: OR 1.15, 95% CI 1.08–1.22, p < 0.05; Weighted mode: OR 1.15, 95% CI 1.10–1.20, p < 0.05) (Table 1 and Fig. 2 ). First, MR-Egger regression was employed to investigate horizontal pleiotropy, and the results indicated no significant impact of pleiotropy on causality (all p -values > 0.05). Second, the MR Egger test results were consistent with the IVW method without outliers, affirming the reliability of the initial findings. Third, to further explore the relationship between RA and cervical cancer in the European cohort, we conducted leave-one-out analysis and Cochrane Q-test. The leave-one-out analysis did not identify any single SNP driving the causal relationship between RA and cervical cancer (Fig. 2 ). Cochrane Q-tests showed p -values all less than 0.05 (Q-value of 10.48, p < 0.05 for IVW test; Q-value of 14.70, p < 0.05 for MR-Egger test), indicating heterogeneity among the SNPs. Table 1 MR analysis revealed a causal association between RA and cervical cancer in a European cohort id.exposure id.outcome method nsnp b se pval or or_lci95 or_uci95 ukb-a-105 ukb-b-8777 MR Egger 4 0.186889 0.070112 0.116626 1.205494 1.050715 1.383074 ukb-a-105 ukb-b-8777 Weighted median 4 0.144097 0.02021 1.00E-12 1.154996 1.11014 1.201665 ukb-a-105 ukb-b-8777 Inverse variance weighted 4 0.136623 0.040813 0.000815 1.146396 1.058264 1.241867 ukb-a-105 ukb-b-8777 Simple mode 4 0.139261 0.031871 0.022172 1.149424 1.079818 1.223516 ukb-a-105 ukb-b-8777 Weighted mode 4 0.139261 0.021726 0.007694 1.149424 1.101505 1.199427 Fig. 2 Mendelian randomization analysis of causal effects. A : Distribution of genetic variant effects (n = 4). B : Results from Mendelian randomization (MR) methods, including MR test, weighted median, MR Egger, and weighted mean, with effect estimates (β) and confidence intervals shown. Negative values indicate inverse causal relationships. Graphs represent the aggregated analysis. C : Leave-one-out sensitivity analysis (forest plot) demonstrating the robustness of causal estimates upon sequential exclusion of individual variants. D : Funnel plot assessing potential pleiotropy or bias in the causal relationship between rheumatoid arthritis (RA) and cervical cancer (CC)
MR analysis revealed a causal association between RA and cervical cancer in a European cohort
Mendelian randomization analysis of causal effects. A : Distribution of genetic variant effects (n = 4). B : Results from Mendelian randomization (MR) methods, including MR test, weighted median, MR Egger, and weighted mean, with effect estimates (β) and confidence intervals shown. Negative values indicate inverse causal relationships. Graphs represent the aggregated analysis. C : Leave-one-out sensitivity analysis (forest plot) demonstrating the robustness of causal estimates upon sequential exclusion of individual variants. D : Funnel plot assessing potential pleiotropy or bias in the causal relationship between rheumatoid arthritis (RA) and cervical cancer (CC)
Differential analysis of RA sequencing data yielded 312 differential genes, comprising 123 down-regulated and 189 up-regulated genes (Table S1, Fig. 3 A). Using single-cell analysis, differential gene enrichments were scored across high-quality transcriptome data obtained from 48,045 cells after rigorous quality assurance and filtering. De-batching of samples using harmony reduced the differences in cell distribution. Subsequent PCA clustering of cells identified seven distinct clusters with minimal inter-cluster influence (Fig. 3 B).Cellular annotation based on marker genes categorized the seven clusters into epithelial cells, endothelial cells, fibroblasts, macrophages, B-cells, plamsa cells, and NK cells (Fig. 3 C). Bubble plots illustrating marker gene expression in different cell subgroups highlighted distinct expression patterns (Fig. 3 D). RA scores derived from differential genes were computed using four scoring methods and mapped onto box plots, revealing predominant enrichment in NK cells across various algorithms (Supplementary Fig. S1). A consolidated scoring outcome based on the average score from the four algorithms demonstrated the cellular distribution of scores in UMAP (Fig. 3 E), with high expression particularly noted in NK cells as depicted in Fig. 3 F and 3G.Cells were stratified into high and low RA score subpopulations based on median values, and differential gene analysis between these subtypes was conducted. Analysis of cellular communication indicated heightened interactions in the high-score subpopulation (Fig. 3 H). Specifically, NK cells exhibited more frequent and intense communication with epithelial cells, fibroblasts, and endothelial cells, with particularly strong interactions observed between NK cells and fibroblasts (F i g. 3 I,J).These findings collectively suggest that RA scores influence NK cell dynamics in cervical cancer patients, thereby impacting fibroblasts and epithelial cells (tumor cells). Fig. 3 Single-cell analysis of rheumatoid arthritis (RA)-associated genes in cervical cancer. A : Volcano plot of differentially expressed genes (DEGs) associated with RA. B : UMAP visualization of single-cell data after batch correction. C : Cell clustering analysis identifying 7 distinct clusters. D : UMAP plot showing the distribution of different cell subpopulations. E : Bubble plot displaying marker genes for each cell subpopulation. F : UMAP visualization of RA-associated scores across cell subpopulations. G : Boxplot comparing RA-associated scores among different cell subpopulations. H : Dot plot showing RA scores calculated by multiple analytical algorithms. I – J : Cell–cell communication analysis depicting (I) the number and (II) strength of signaling interactions between cell populations
Single-cell analysis of rheumatoid arthritis (RA)-associated genes in cervical cancer. A : Volcano plot of differentially expressed genes (DEGs) associated with RA. B : UMAP visualization of single-cell data after batch correction. C : Cell clustering analysis identifying 7 distinct clusters. D : UMAP plot showing the distribution of different cell subpopulations. E : Bubble plot displaying marker genes for each cell subpopulation. F : UMAP visualization of RA-associated scores across cell subpopulations. G : Boxplot comparing RA-associated scores among different cell subpopulations. H : Dot plot showing RA scores calculated by multiple analytical algorithms. I – J : Cell–cell communication analysis depicting (I) the number and (II) strength of signaling interactions between cell populations
Of the 1526 RA score-associated genes, 404 were found to exhibit significant differences in cervical cancer, including 150 down-regulated genes and 254 up-regulated genes, as depicted in Fig. 4 A. Univariate Cox analysis conducted on these 404 genes identified 30 genes that were associated with prognosis in cervical cancer (Fig. 4 B). To illustrate the intricate relationship between RA score-related genes and their prognostic implications in cervical cancer, a network diagram was constructed (Fig. 4 C).The somatic mutation frequencies of these 30 genes in cervical cancer were investigated, with ROCK2 showing the highest prevalence of mutations (up to 15%), while the remaining genes exhibited lower mutation frequencies (Fig. 4 D). The genomic locations of these 30 genes are depicted in Fig. 4 E. Analysis further revealed prevalent copy number variations (CNVs) among these genes, with TFRC, EFNA1, and LIMD2 displaying amplifications, and ESD showing deletions (Fig. 4 F).Comparative analysis indicated that the majority of RA score-related genes were significantly up-regulated in cervical cancer tissues compared to normal tissues (Fig. 4 G). Assessment of the impact of these 30 genes on overall survival (OS) in cervical cancer patients using GEO dataset showed that the expression levels of DLL4, LAGE3, CDH5, SUV39H1, and ESD were statistically correlated with OS (Fig. 4 H). Fig. 4 Prognostic analysis of RA-associated signature genes in cervical cancer. A : Volcano plot showing differential expression of RA-associated genes in TCGA-CC dataset. B : Forest plot of univariate Cox regression analysis identifying prognostic genes. C : Protein–protein interaction network of prognostic genes, with purple nodes indicating risk factors and green nodes representing protective factors. D : Mutation waterfall plot of prognostic genes, showing ROCK2 with the highest mutation frequency (15%). E : Circos plot displaying chromosomal locations of prognostic genes. F : CNV bar plot revealing copy number variations, with TFRC showing the highest gain frequency. G : Boxplot demonstrating differential expression patterns of prognostic genes in TCGA (majority showing elevated expression). H : Validation forest plot confirming the prognostic value of identified genes in GEO datasets
Prognostic analysis of RA-associated signature genes in cervical cancer. A : Volcano plot showing differential expression of RA-associated genes in TCGA-CC dataset. B : Forest plot of univariate Cox regression analysis identifying prognostic genes. C : Protein–protein interaction network of prognostic genes, with purple nodes indicating risk factors and green nodes representing protective factors. D : Mutation waterfall plot of prognostic genes, showing ROCK2 with the highest mutation frequency (15%). E : Circos plot displaying chromosomal locations of prognostic genes. F : CNV bar plot revealing copy number variations, with TFRC showing the highest gain frequency. G : Boxplot demonstrating differential expression patterns of prognostic genes in TCGA (majority showing elevated expression). H : Validation forest plot confirming the prognostic value of identified genes in GEO datasets
Consistent cluster analysis was utilized to classify 304 cervical cancer patients into two groups, C1 and C2 (Table S2), as illustrated in Fig. 5 A–C. The expression differences of the 30 genes between the C1 and C2 clusters are shown in Fig. 5 D. Kaplan–Meier (KM) analysis revealed significant differences in survival between patients in the C1 and C2 groups ( p < 0.001) (Fig. 5 E).Regarding the differences in the tumor microenvironment (TME) between the two clusters, C2 exhibited higher abundance of various immune cell infiltrates, including regulatory T cells, CD8 T cells, activated NK cells, and dendritic cells, among others (Fig. 5 F). Evaluation of the immune score in cervical cancer tumor tissues indicated that immune score, stromal score, and estimate score were markedly elevated in the C2 subpopulation (Fig. 5 G).Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis highlighted significantly activated pathways in C2, such as BIOSYNTHESIS_OF_UNSATURATED_FATTY_ACIDS and ONE CARBON POOL BY FOLATE, among others (Fig. 5 H). Furthermore, Gene Ontology (GO) biopathway analysis demonstrated significant enrichment in functional groups such as GLUCOSE_METABOLIC_PROCESS and PEPTIDYL_PROLINE_HYDROXYLATION_TO_4_HYDROXY_L_PROLINE in C2 (F i g. 5 I). Fig. 5 Molecular subtyping of cervical cancer based on RA-associated signature genes. A – C : Consensus clustering analysis identified two distinct subtypes (C1 and C2) in TCGA cervical cancer cohort. D : Boxplot showing differential expression of prognostic genes between subtypes, with most genes showing significant differences (except LEPROT and few others). E : Kaplan–Meier survival analysis demonstrating better prognosis for C2 subtype patients (log-rank p < 0.05). F : Immune cell infiltration analysis revealing higher infiltration of most immune cell types in C2 subtype. G : Comparison of immune scores showing significantly elevated scores in C2 subtype for multiple immune signatures. H – I : Functional enrichment analysis of differentially expressed genes between subtypes: ( H ) Gene Ontology (GO) terms and ( I ) KEGG pathways
Molecular subtyping of cervical cancer based on RA-associated signature genes. A – C : Consensus clustering analysis identified two distinct subtypes (C1 and C2) in TCGA cervical cancer cohort. D : Boxplot showing differential expression of prognostic genes between subtypes, with most genes showing significant differences (except LEPROT and few others). E : Kaplan–Meier survival analysis demonstrating better prognosis for C2 subtype patients (log-rank p < 0.05). F : Immune cell infiltration analysis revealing higher infiltration of most immune cell types in C2 subtype. G : Comparison of immune scores showing significantly elevated scores in C2 subtype for multiple immune signatures. H – I : Functional enrichment analysis of differentially expressed genes between subtypes: ( H ) Gene Ontology (GO) terms and ( I ) KEGG pathways
Univariate Cox analysis was conducted using the GSE44001 and GSE26712 datasets, identifying 27 genes associated with patient prognosis in the former and 20 genes in the latter (Supplementary Fig. S2). Subsequently, the prognostic genes from these datasets were intersected with those from the TCGA dataset, resulting in the identification of seven genes that were consistently associated with prognosis across all datasets (Supplementary Fig. S2 and Table S3).The TCGA dataset was then split into a training set (N = 152) and a validation set (N = 152) at a 1:1 ratio, and the clinical characteristics of these datasets are detailed in Table S4. We employed 101 different combinations of machine learning methods using the leave-one-out cross-validation (LOOCV) framework to construct predictive models on the training set, validation set, and the entire TCGA cohort. The concordance index (c-index) was calculated for each model (Fig. 6 A). Ultimately, the random survival forest (RSF) model emerged as the best-performing model, achieving an average c-index of 0.894 (Fig. 6 A, bar on the right panel).Using the median risk score, the training set, validation set, and TCGA cohort were stratified into high- and low-risk groups (Table S5). Consistently, patients in the low-risk group exhibited significantly better overall survival (OS) compared to those in the high-risk group across all datasets ( p < 0.001, Fig. 6 B–D). In addition, we further analyzed the relationship between model genes and the prognosis of cervical cancer patients (Supplementary Figure. S3).Finally, ROC curves were analyzed for 1-, 3-, and 5 year survival predictions across different datasets. Notably, the 1-year AUCs were 0.925, 0.872, and 0.958 for the training set, validation set, and TCGA cohort, respectively; the 3 year AUCs were 0.927, 0.916, and 0.980, respectively; and the 5 year AUCs were 0.991, 0.873, and 0.987, respectively (Fig. 6 E–G). Comparing the median survival results of the patients in the two risk groups revealed that the median survival time was lower in the high-risk patients (1.48 years) compared to the patients in the low-risk group (2.08 years). Fig. 6 Construction of a prognostic model based on RA-associated signature genes in CC. A : Screening of key prognostic genes using 101 machine learning algorithms. B – D : Kaplan–Meier survival analysis of high- vs. low-risk patients in B the training set, C the validation set, and D the TCGA cohort, demonstrating significant survival differences (log-rank p 0.7) in E the training set, F the validation set, and G the TCGA cohort
Construction of a prognostic model based on RA-associated signature genes in CC. A : Screening of key prognostic genes using 101 machine learning algorithms. B – D : Kaplan–Meier survival analysis of high- vs. low-risk patients in B the training set, C the validation set, and D the TCGA cohort, demonstrating significant survival differences (log-rank p 0.7) in E the training set, F the validation set, and G the TCGA cohort
To investigate the independent prognostic role of the risk score in cervical cancer alongside other clinical characteristics such as age, grade, and stage, we performed univariate and multivariate Cox regression analyses. These analyses revealed that the risk score independently predicted overall survival (OS) in cervical cancer, with hazard ratios of 1.153 (95% CI 1.125–1.181) and 1.161 (95% CI 1.130–1.192), respectively (Fig. 7 A, B ).Nomogram plots were constructed to predict 1-, 3-, and 5 year OS using TCGA data, incorporating risk score, age, grade, and stage as predictive parameters (Fig. 7 C). The nomogram demonstrated high predictive accuracy, with survival rates estimated at 99.7%, 98.2%, and 97.4% for 1-, 3-, and 5 year periods, respectively. Calibration curves for 1-, 3-, and 5 year OS indicated good agreement between predicted and actual outcomes (Fig. 7 D).Furthermore, the risk score exhibited the highest C-index, exceeding 0.9, indicating robust prognostic performance (Fig. 7 E). Decision curve analysis (DCA) curves for 1-, 3-, and 5 year predictions reinforced the utility of the risk score as a prognostic tool (Fig. 7 F and Supplementary Figure. S4).ROC curves integrating clinical characteristics illustrated that the risk score at 1, 3, and 5 years outperformed other variables, achieving approximately 0.9 AUC (Fig. 7 G and Supplementary Fig. S4). A heatmap revealed significant clustering differences in stage, immune score, and stromal score between different risk groups (Fig. 7 H).Analyzing risk score differences across clusters with varying RA-related scores, we observed that the C2 subgroup had lower risk scores (F i g. 7 I). Additionally, a Sankey diagram depicted that C2, characterized by lower risk scores, corresponded to more favorable survival states (Fig. 7 J). Fig. 7 Independent predictive analysis of the RA-associated prognostic model in cervical cancer. A – B : Cox regression analyses demonstrating the independent prognostic value of the risk score: A univariate and B multivariate analyses (incorporating risk score, tumor stage, grade, and age). C : Nomogram integrating risk score with clinical characteristics for survival prediction. D : Calibration curves showing agreement between predicted and observed 1-, 3-, and 5 year survival probabilities. E : C-index comparison of risk score versus other clinical features over time. F : Decision curve analysis (DCA) evaluating clinical utility. G : ROC curve analysis of model performance. H : Heatmap integrating multi-omics data with risk stratification. I : Violin plot comparing risk scores across molecular subtypes. J : Sankey diagram visualizing relationships between survival status, molecular subtypes, and risk groups
Independent predictive analysis of the RA-associated prognostic model in cervical cancer. A – B : Cox regression analyses demonstrating the independent prognostic value of the risk score: A univariate and B multivariate analyses (incorporating risk score, tumor stage, grade, and age). C : Nomogram integrating risk score with clinical characteristics for survival prediction. D : Calibration curves showing agreement between predicted and observed 1-, 3-, and 5 year survival probabilities. E : C-index comparison of risk score versus other clinical features over time. F : Decision curve analysis (DCA) evaluating clinical utility. G : ROC curve analysis of model performance. H : Heatmap integrating multi-omics data with risk stratification. I : Violin plot comparing risk scores across molecular subtypes. J : Sankey diagram visualizing relationships between survival status, molecular subtypes, and risk groups
In the low-risk group, higher gene expression was observed in most of the HIL families (Fig. 8 A). Various methods, including ssGSEA, were employed to enrich and analyze immune cells and their functions. The analysis indicated significant enrichment of almost all immune functions (except type II IFN response) in the low-risk group (Fig. 8 B). Figure 8 C illustrates that nearly all immune cells were infiltrated more prominently in the low-risk group compared to the high-risk group. Immune scoring also demonstrated higher scores for immune and stromal components in the low-risk group (Fig. 8 D). Additionally, results from the CIBERSORT algorithm revealed extensive infiltration of various immune cells, particularly CD8 T cells, in the low-risk group (Fig. 8 E). Fig. 8 Immune correlation analysis of the risk score model in cervical cancer. A : Boxplot showing differential expression of HLA-related genes between high- and low-risk groups, with most genes significantly upregulated in the low-risk group. B : Functional analysis of immune activity, revealing enhanced immune functions (e.g., CCR signaling) in the low-risk group. C : ssGSEA analysis demonstrating distinct immune cell infiltration patterns between risk groups. D : Comparative analysis of immune scores (e.g., ESTIMATE, stromal, immune scores) across risk groups. E : CIBERSORT analysis quantifying differential infiltration levels of specific immune cell subsets between risk groups
Immune correlation analysis of the risk score model in cervical cancer. A : Boxplot showing differential expression of HLA-related genes between high- and low-risk groups, with most genes significantly upregulated in the low-risk group. B : Functional analysis of immune activity, revealing enhanced immune functions (e.g., CCR signaling) in the low-risk group. C : ssGSEA analysis demonstrating distinct immune cell infiltration patterns between risk groups. D : Comparative analysis of immune scores (e.g., ESTIMATE, stromal, immune scores) across risk groups. E : CIBERSORT analysis quantifying differential infiltration levels of specific immune cell subsets between risk groups
Immunotherapy is emerging as a treatment modality for advanced cervical cancer patients. Our analysis of TIDE results revealed a higher immune response rate in the low-risk group (41%) compared to the high-risk group (33%) (Fig. 9 A–B). In addition, the low-risk group showed lower tumor immune dysfunction and exclusion scores, whereas dysfunction scores and interferon gamma scores exhibited reverse trends (Fig. 9 C).Given the potential efficacy of immune checkpoint inhibitors (ICIs) in blocking CTLA4/PD-1 interactions to treat certain tumors, IPS scores were used to assess therapeutic potential based on IFNG expression levels. Notably, patients in the low-risk group demonstrated higher IPS scores across multiple assessments (Fig. 9 D).Fig. 9 E highlights significant differences in expression of checkpoint genes between the two groups, with higher expression observed in the low-risk group. This includes several well-known immunotherapeutic targets such as PDCD1 (PD-1), CD274 (programmed death ligand 1, PD-L1), and CTLA4. Fig. 9 Evaluation of immunotherapy response in patients with different risk scores. A – C : Analysis of Tumor Immune Dysfunction and Exclusion (TIDE) scores: A Bar plot comparing immunotherapy response rates between high- and low-risk groups; B Boxplot showing risk score distribution in responders vs. non-responders; C Violin plot comparing TIDE and related functional scores (e.g., dysfunction, exclusion) between risk groups. D : Violin plot of Immunophenoscore (IPS) analysis, revealing significantly higher scores in low-risk patients (p < 0.01). E : Boxplot of immune checkpoint inhibitor-related genes (e.g., PD-1, CTLA-4, LAG-3), with most genes upregulated in the low-risk group
Evaluation of immunotherapy response in patients with different risk scores. A – C : Analysis of Tumor Immune Dysfunction and Exclusion (TIDE) scores: A Bar plot comparing immunotherapy response rates between high- and low-risk groups; B Boxplot showing risk score distribution in responders vs. non-responders; C Violin plot comparing TIDE and related functional scores (e.g., dysfunction, exclusion) between risk groups. D : Violin plot of Immunophenoscore (IPS) analysis, revealing significantly higher scores in low-risk patients (p < 0.01). E : Boxplot of immune checkpoint inhibitor-related genes (e.g., PD-1, CTLA-4, LAG-3), with most genes upregulated in the low-risk group
Tumor mutations play a critical role in affecting patient prognosis. As depicted in Fig. 10 A and B , the mutation frequency was higher in the low-risk group (89.36%) compared to the high-risk group (79.72%). Analysis of mutated genes between these subgroups revealed consistency in the top five mutated genes across both groups.Comparing tumor mutation burden (TMB) values between different risk scores, we observed that TMB values tended to be higher in patients with low-risk scores, although this difference did not reach statistical significance (Supplementary Fig. S5). Additionally, Fig. 10 C illustrates that patients in the high TMB group exhibited better survival rates compared to those in the low TMB group.Furthermore, Fig. 10 D indicates that patients in the high-risk and low-TMB groups had the poorest prognosis. Fig. 10 Tumor mutation burden (TMB) analysis in cervical cancer risk groups. A – B : Waterfall plots depicting mutation profiles of high-risk A and low-risk B patients, with low-risk patients showing higher overall mutation rates. C – D : Kaplan–Meier survival curves stratified by TMB levels (high vs low) and risk groups, demonstrating superior survival in high-TMB/low-risk patients
Tumor mutation burden (TMB) analysis in cervical cancer risk groups. A – B : Waterfall plots depicting mutation profiles of high-risk A and low-risk B patients, with low-risk patients showing higher overall mutation rates. C – D : Kaplan–Meier survival curves stratified by TMB levels (high vs low) and risk groups, demonstrating superior survival in high-TMB/low-risk patients
In the IMvigor-210 cohort, we compared the restricted mean survival (RMS) at 6 and 12 months between the two groups to account for the delayed clinical effect of immunotherapy, and assessed the difference in long-term survival after 3 months of treatment ( p < 0.05; Fig. 11 A–B). The results indicated that patients in the low-risk group had better prognosis, suggesting a greater benefit from immunotherapy. Furthermore, the distribution of risk scores among patients with different levels of treatment response showed that the risk scores were significantly lower in the response group (complete remission [ 24 ]/partial remission [PR]) compared to the nonresponse group (progressive disease [PD]/stable disease [SD]) ( p < 0.05; Fig. 11 C).Subsequently, we validated these findings across multiple immunotherapy validation cohorts with prognostic information. Consistently, the low-risk groups demonstrated better prognostic outcomes in post-immunotherapy populations in GSE78220 ( p < 0.001) (Fig. 11 D), GSE135222 ( p < 0.001) (Fig. 11 E), and tended to have better immunotherapy outcomes in GSE91061 ( p = 0.018; Fig. 11 F). Finally, we compared the IC50 values of drugs such as cisplatin, paclitaxel, and niraparib, and consistently found lower IC50 values in the low-risk group, suggesting increased sensitivity to these drugs (Fig. 11 G–K). Fig. 11 Multi-dataset validation of risk score associations with immunotherapy and chemotherapy response. A – B : Immunotherapy survival analysis. C : Boxplot of risk scores in patients with differential immune response status (responders vs. non-responders, p < 0.001). D – E : Prognostic relevance of risk scores. Figure 11F: Phase diagram comparing risk scores between immunotherapy-sensitive and -resistant patients (p < 0.001). G – K : Chemotherapeutic drug sensitivity analysis: IC50 values for G cisplatin, H paclitaxel, I 5-FU, J doxorubicin, and K gemcitabine, with low-risk patients showing significantly lower IC 50 values (p < 0.05 for all)
Multi-dataset validation of risk score associations with immunotherapy and chemotherapy response. A – B : Immunotherapy survival analysis. C : Boxplot of risk scores in patients with differential immune response status (responders vs. non-responders, p < 0.001). D – E : Prognostic relevance of risk scores. Figure 11F: Phase diagram comparing risk scores between immunotherapy-sensitive and -resistant patients (p < 0.001). G – K : Chemotherapeutic drug sensitivity analysis: IC50 values for G cisplatin, H paclitaxel, I 5-FU, J doxorubicin, and K gemcitabine, with low-risk patients showing significantly lower IC 50 values (p < 0.05 for all)
Figure 12 A results showed high expression of SERINC2, DSG2, DENND2D, and FDPS in TCGA cervical cancer data, whereas the expression levels of ESD, NRP1, and ITM2A were relatively low. Then the same trend was validated for cervical cancer cell lines (Fig. 12 B). In addition, in clinical tissues, we found that the differences in other genes were consistent with the above results, except for FDPS, which showed no significant differences (Fig. 12 C). Validation using the GEPIA 2 online database confirmed high expression of SERINC2, DSG2, DENND2D, and FDPS in cervical cancer, whereas ESD, NRP1 and ITM2A showed contrasting lower expression levels (Fig. 12 D). Consistent results were further validated using the HPA protein database (Fig. S6). Fig. 12 Experimental and bioinformatic validation of model genes. A : Differential expression analysis of model genes in TCGA cervical cancer cohort. B : qPCR validation in cervical cancer cell lines (HeLa, SiHa) versus normal cervical epithelial cells, showing consistent overexpression of model genes. C : Tissue-level validation via qRT-PCR in 10 paired clinical samples. D : Cross-platform validation using an independent online database to demonstrate abnormal expression of cervical cancer model genes
Experimental and bioinformatic validation of model genes. A : Differential expression analysis of model genes in TCGA cervical cancer cohort. B : qPCR validation in cervical cancer cell lines (HeLa, SiHa) versus normal cervical epithelial cells, showing consistent overexpression of model genes. C : Tissue-level validation via qRT-PCR in 10 paired clinical samples. D : Cross-platform validation using an independent online database to demonstrate abnormal expression of cervical cancer model genes
Discussion
Cervical cancer ranks fourth in cancer-related mortality worldwide and is the fourth most common cancer among women globally [ 25 ]. It is predominantly caused by persistent Human Papilloma Virus (HPV) infections [ 26 ]. In recent years, increased risks of HPV infection and cervical dysplasia have been reported in patients with systemic inflammatory diseases, including rheumatoid arthritis (RA) [ 27 ]. However, the genetic covariation between RA and cervical cancer remains unclear. Our study aims to investigate and uncover potential genetic connections between cervical cancer and RA, shedding light on their potential associations at the genetic level.
MR provides an alternative method to probe causality in epidemiological research by utilizing genetic variants presumed to satisfy instrumental variable (IV) assumptions [ 28 ]. After conducting sensitivity analyses such as the Multiple Validity Test [ 29 ] and the Cochrane Q-test [ 30 ], MR analysis suggested an increased risk of cervical cancer in patients with RA. However, further studies are needed to explore the relationship between gene-level changes and cervical cancer in RA patients.
Therefore, we conducted further functional enrichment analysis on RA-related differential genes to explore their potential biological functions. A total of 43 RA-related genes were found to be up-regulated, while 30 RA-related genes were down-regulated. Using multiple scoring modalities for RA-associated differential genes in single-cell scoring of cervical cancer, we observed that RA-associated scores were predominantly enriched in NK cells. Cellular communication analysis suggested that NK cells exhibit strong signaling interactions with fibroblasts and epithelial cells. Some studies have indicated that NK cells play a crucial role in recognizing and eliminating cervical tumor cells as well as cells infected by HPV virus [ 24 ]. NK cells can also influence the expression of E6 and E7 and regulate interferon production via IL-8 [ 31 , 32 ]. Therefore, we speculate that the increased risk of cervical carcinogenesis in RA patients may be attributed to alterations in NK cells. The genetic interplay between autoimmune diseases and cervical cancer extends beyond RA. Systemic lupus erythematosus (SLE), another chronic autoimmune condition, has been associated with altered HPV clearance mechanisms and increased cervical cancer risk [ 33 ]. Unlike RA, which primarily involves Th1-mediated responses, SLE is characterized by dysregulated B-cell activity and type I interferon signatures [ 34 ]. This distinction suggests that RA-specific mechanisms might involve unique genetic variants affecting NK cell function and T-cell regulation, rather than the humoral immunity alterations observed in SLE [ 35 ].
We further conducted a prognostic analysis based on the differential genes associated with RA-related scores, dividing cervical cancer patients into two subgroups. The C2 subgroup exhibited higher immune scores, stromal scores, and more abundant immune cell infiltration. Importantly, patients in the C2 subgroup showed better prognosis. It is well-established that immune cell infiltration in the tumor microenvironment closely correlates with patient prognosis [ 36 ]. Collectively, these findings suggest that the heightened susceptibility to cervical cancer in RA patients might be linked to alterations in their immune cell profiles.
Next, we conducted prognostic modeling of RA score-related genes using 101 combinations of 10 machine learning algorithms. Cervical cancer patients were stratified into high-risk and low-risk groups based on their risk scores. Analysis of patients in different risk groups revealed that those in the low-risk group exhibited better prognosis and richer immune cell infiltration. Furthermore, HLA-related genes and most immune checkpoint inhibitor genes were expressed at higher levels in the low-risk group, which also showed lower TIDE scores. Higher TME scores in the low-risk group suggested a greater potential for triggering an immune response by the tumor [ 37 ]. The discovery of immune checkpoints such as CTLA-4 and PD-1 has revolutionized cancer immunotherapy, particularly in solid tumors [ 38 ]. Encouragingly, anti-CTLA-4 and anti-PD-1 immune checkpoint inhibitors have demonstrated significant efficacy in cervical cancer [ 39 ]. Regulatory T cells' immunosuppressive role contributes to immune evasion in cervical cancer [ 40 ]. Recent studies have implicated γδ T cells producing IL-17a in increased angiogenesis during the development of HPV-induced squamous cell carcinoma [ 41 ]. In the context of RA, overexpression of IL-18 and a relative deficiency of IL-18 binding protein levels are associated with disease pathogenesis [ 42 ]. IL-18, produced by various immune and non-immune cells, aberrantly activates CD4 + T cells, primarily by inducing IFN-γ expression in CD4 + and CD8 + T cells [ 43 , 44 ]. Elevated tumor mutation burden (TMB) can activate immune cells, potentially enhancing responses to immunotherapy [ 45 ]. Combining TMB with risk scores revealed higher TMB levels in the low-risk group. Comparative analysis of overall survival across TMB subgroups indicated better prognosis in patients with high TMB and low-risk scores, consistent with previous findings [ 46 ]. Therefore, we propose that the heightened risk of cervical cancer and poorer prognosis in RA patients primarily occur through immune pathways, influencing microenvironmental changes that promote immune evasion and related mechanisms.
The therapeutic landscape of RA, particularly the widespread use of TNF-α inhibitors, may influence cervical cancer risk through immunomodulatory mechanisms. TNF-α plays a dual role in carcinogenesis, exhibiting both tumor-suppressive and tumor-promoting effects depending on the microenvironment [ 47 ]. While TNF-α inhibitors effectively control RA disease activity, their impact on cervical cancer risk remains controversial. Some studies suggest that long-term TNF-α inhibition might impair immune surveillance against HPV-infected cells, potentially increasing cervical cancer risk [ 48 ]. Conversely, other reports indicate that proper control of systemic inflammation might reduce cancer-promoting cytokine cascades [ 49 ]. This dichotomy underscores the need for careful monitoring of RA patients receiving biologic therapies, particularly those with persistent HPV infection.
There are several limitations to this study. First, the database utilized in this research originated from the European region, there may be racial specificity leading to errors in the results, necessitating further clinical validation to ascertain the generalizability of the findings to other populations and ethnicities. Second, the source of single-cell data we applied was only five cases, in which there may be errors in the results due to the small amount of sample data. Finally, due to limited funding, our study could not validate findings on cellular or tissue levels, nor did it validate immune cell markers. It has to be acknowledged that we should follow up with studies targeting the mechanism of action of rheumatoid arthritis affecting cervical carcinogenesis through NK cells. While our study identifies potential genetic connections between RA and cervical cancer and highlights the role of immune pathways, particularly NK cells, in mediating this association, it is important to interpret these findings with caution. The modest effect sizes observed in our MR analyses (e.g., an odds ratio of 1.15) suggest that the genetic contribution to cervical cancer risk in RA patients, while statistically significant, may be small in magnitude. Such modest effects warrant careful consideration, as they could reflect statistical artifacts or confounding factors rather than biologically meaningful associations. Additionally, the gene signatures identified in this study, though informative, require further prospective and functional validation to confirm their relevance in clinical settings.