Establishing a Prognostic Model in Prostate Adenocarcinoma through Comprehensive scRNA-Seq and Bulk RNA-Seq Analysis and Validation | Research Square window.SnipcartSettings = { analytics: { enabled: false } }; (function() { var accessVector = localStorage.getItem('access_vector') || ''; window.dataLayer = window.dataLayer || []; if (accessVector) { window.dataLayer.push({ user: { profile: { profileInfo: { snid: accessVector } } } }); } })(); (function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src='https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer','GTM-K279D39R'); Browse Preprints In Review Journals COVID-19 Preprints AJE Video Bytes Research Tools Research Promotion AJE Professional Editing AJE Rubriq About Preprint Platform In Review Editorial Policies Our Team Advisory Board Help Center Sign In Submit a Preprint Cite Share Download PDF Research Article Establishing a Prognostic Model in Prostate Adenocarcinoma through Comprehensive scRNA-Seq and Bulk RNA-Seq Analysis and Validation Lin Hao, Xiangqiu Chen, Qingchun Zhou, Tao Wu, Zhiqiang Wen, Ziliang Ji, and 3 more This is a preprint; it has not been peer reviewed by a journal. https://doi.org/ 10.21203/rs.3.rs-3912322/v1 This work is licensed under a CC BY 4.0 License Status: Posted Version 1 posted You are reading this latest preprint version Abstract Background The prognostic management of Prostate adenocarcinoma (PRAD) presents a considerable challenge to healthcare professionals. However, it fails to accurately capture the fundamental cellular and molecular functions within tumor cells. Methods The data for PRAD scRNA-seq were retrieved from the Gene Expression Omnibus (GEO) database. The limma program was utilized to identify differentially expressed genes (DEGs) in PRAD patients that exert an influence on overall survival (OS). For the identification of key modules associated with PRAD, Weighted Gene Correlation Network Analysis (WGCNA) was employed. The intersection of core cell marker genes, PRAD key module genes, and DEGs was utilized to build a predictive model using univariate Cox and Least Absolute Shrinkage and Selection Operator (LASSO) analyses. Furthermore, we conducted experimental validation by collecting patient samples. Results Analysis of 162,897 scRNA-seq datasets and identified 7 central cell types. From the scRNA-seq dataset, 1805 marker genes were identified, while the bulk RNA-seq dataset yielded 1086 DEGs. Additionally, 2545 genes were linked to a key module identified through WGCNA. A predictive model was derived from the expression levels of 21 signature genes following intersection, univariate Cox, and LASSO analyses. And we confirmed the accuracy of our analysis through the patient samples we collected. Conclusion This study developed a unique prognosis-predictive model to predict the survival condition of individuals with PRAD through the integration of scRNA-seq and bulk RNA-seq data. The risk score emerges as a potential independent predictive indicator, demonstrating a strong relationship with the immunological microenvironment. Prostate adenocarcinoma Prognosis Immune landscape Bulk RNA-seq scRNA-seq Figures Figure 1 Figure 2 Figure 3 Figure 4 Figure 5 Figure 6 Introduction Prostate adenocarcinoma (PRAD) is the predominant malignancy found in males, with its incidence strongly correlated with advancing age[ 1 ]. Radical prostatectomy or radiation therapy are the principal curative treatment options for individuals with prostate cancer. Nevertheless, it is estimated that around 20–30% of individuals would experience biochemical recurrence after radical prostatectomy, which represents the early occurrence of recurrence and metastasis[ 2 , 3 ]. Currently, the effectiveness of predicting patient survival risk has been demonstrated by using many factors, including serum prostate-specific antigen levels[ 4 ], tumor pathological stage, and Gleason score[ 5 ]. The survival rate of individuals with PARD is higher in the presence of tumors, and the progression of the tumor is related to the quality of life of patients. Hence, the prioritization lies in accurately predicting the potential progression of the tumor. Because of the rapid advancement of cancer genetics, Bulk transcriptome sequencing (bulk RNA-seq) has been a prominent tool in transcriptomics in recent decades[ 6 ]. Genetic changes are increasingly being found as effective therapy targets for PRAD. Wu et al. , for example, discovered that targeting the HIC1/TGF-axis-shaped prostate cancer microenvironment slows its growth[ 7 ]. In addition, according to Hao et al. , the deletion of HIC1 has been found to facilitate the spread of prostate cancer through the initiation of epithelial-mesenchymal transition[ 8 ]. Bulk RNA-seq enhances the comprehension of disease. However, bulk RNA-seq studies only examine the average gene expression in cell groups. Single-cell RNA-seq (scRNA-seq) now provides access to underlying gene expression distributions and clarifies information on cellular transcriptome heterogeneity[ 9 ]. With scRNA-seq, it becomes possible to develop potentially beneficial personalized therapeutic approaches for diagnosing cancer throughout its progression. In the study conducted by Yu et al. , single-cell omics traces the heterogeneity among prostate cancer cells and the tumor microenvironment, improving our understanding of tumor progression mechanisms[ 10 ]. Chen et al. also utilized scRNA-seq to determine the correlation between IDH1 expression and the degree of immune cell infiltration, particularly involving immunosuppressive cells like CD8 + T-cells, CD4 + T-cells, and macrophages[ 11 ]. In light of this advantage, multiple studies have prioritized the identification of possible biological markers for PRAD by integrating bulk RNA-seq and scRNA-seq analyses to effectively stratify and discern patients. This study employed scRNA-seq and bulk RNA-seq data to perform comprehensive bioinformatics analysis in order to develop a predictive model for patients with PRAD. Simultaneously, the study delineated the immune infiltration landscape and ascertained its role in the pathogenesis of PRAD. Furthermore, the relationship between risk models and infiltrating immune cells was investigated to acquire a deeper comprehension of the underlying molecular immunological mechanisms involved in the evolution of PRAD. In conclusion, this research offers original findings that have the potential to enhance the clinical approach to managing PRAD. Materials and methods The collection and analysis of data Data from bulk RNA-seq, clinical records, and SNP mutation sites of TCGA-PRAD were acquired from the TCGA database ( https://portal.gdc.cancer.gov/ ). The dataset encompassed 52 samples of normal tissue and 492 samples of PRAD, all of which contain detailed survival and clinical data. The scRNA sequencing datasets PRAD_GSE137829, PRAD_GSE141445, and PRAD_GSE172301 were utilized in the study. The study employed scRNA-seq to examine cancer tissues obtained from 32 patients with PRAD. These tissues were sourced from the TISCH database ( http://tisch.comp-genomics.org/gallery/ ). Detailed patient information and sequencing statistics are available in Table 1. The integration of samples was performed using the anchors method, a technique implemented in the R "Seurat"[ 12 ]. Core cells were subsequently created through a filtering step applied to the scRNA-seq data. Unqualified cells refer to genes that exhibit detectability exclusively in three or fewer cells, as well as in less than 200 cells of low quality. Genes identified under these criteria were omitted from subsequent analysis. The analysis of gene expression in core cells involved utilizing a standardized linear regression model, followed by the application of ANOVA to identify the top 2000 genes exhibiting highly variable fingerprints. Principal component analysis (PCA) was conducted on individual cell samples, identifying the first 20 principal components (PCs). These components were then chosen for further study. The UMAP algorithm was employed to comprehensively analyze dimensionality reduction on the initial 20 principal component pairs of the samples[ 13 ]. R "singleR" was employed to utilize reference datasets such as HumanPrimaryCellAtlasData, BlueprintEncodeData, and ImmuneCellExpressionData for auxiliary annotation[ 14 ]. Additionally, the CellMarker database and prior research were consulted to identify marker genes for the manual annotation of distinct clusters[ 15 ]. The visualization process was facilitated by utilizing the R "SCP" ( https://github.com/zhanghao-njmu/SCP ). Analysis process and experimental design are shown in the flowchart. Gene enrichment analysis based on single-cell marker genes The Seurat package's FindAllMarkers function was utilized to locate marker genes for each cluster, with parameters set as min.pct = 0.2 and only. The Wilcoxon rank sum test was performed to identify differentially expressed genes (DEGs) and screen marker genes when pos = TRUE. Using "clusterProfler" in R software[ 16 ], core cells of marker genes underwent enrichment analysis for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functions, respectively. 'Monocle 2' was utilized to infer the developmental trajectory of cells to elucidate the underlying molecular causes of PRAD progression [ 17 ]. The "cellchat" algorithm was employed to explore potential cell-to-cell interactions, while "nichenet" was utilized to derive the regulatory links between ligands and immune cell and stromal cell target genes[ 18 , 19 ]. DEG identification and functional enrichment analysis in the TCGA-PRAD A total of 52 control and 492 PRAD patient data were subjected to a difference analysis utilizing the limma package. The criteria of P 1 were utilized to identify genes as DEG. The visualization of the heat maps and volcano plots of DEGs was carried out utilizing the ggplot2 and pheatmap packages, respectively. The DEGs were subsequently analyzed by KEGG and GO using the online instrument Metascope[ 20 ]. WGCNA analysis The "goodSamplesGenes" function, a component of the R "WGCNA", was employed to assess the necessity of gene filtering within a sample and to determine an optimal soft threshold[ 21 ]. The analysis involved the establishment of a co-expression network to examine the link between the genes associated with the module (ME) and PRAD. The development of a prognosis-predictive model The identification of candidate genes involved the identification of pairwise intersecting genes from three sources: marker genes of core cells, DEGs associated with PRAD, and module genes. These candidate genes were subsequently used in the least absolute shrinkage and selection operator (LASSO) regression analysis, which was executed using the R "glmnet". The aim of the LASSO regression was to minimize the number of genes in the final risk model, thereby screening for characteristic genes associated with prognosis [ 22 ]. In order to decrease the number of genes in the final risk model, variables with P -values 0.05 were subjected to the LASSO regression analysis, which was executed utilizing the "glmnet" R package. The prognosis-predictive model was developed employing the subsequent formula: risk score = gene exp11 + gene exp22+...+gene expression (where "gene expression" represents the gene expression value and represents the corresponding LASSO regression coefficient). The R programs "survminer" and "ggrisk" were employed to visualize the survival curves and risk maps of patients. Additionally, the "survROC" program was utilized to plot ROC curves, assessing the performance of risk scores in predicting overall survival (OS) at 1, 3, and 5 years among individuals with PRAD. Independent prognostication Univariate analysis was performed to examine the predictive significance of risk models and clinical parameters (age, T stage, M stage, N stage, riskScore). In contrast, multivariable Cox analysis of OS was utilized to determine independent risk factors and predict new events in PRAD. Utilizing the "cph" function in R, a nomogram model was plotted to visualize this predictive model and predict potential new events for patients at 1, 3, and 5 years. The validity of the bar chart was validated through regression analysis. GSEA and GSVA enrichment analysis To investigate functional variations and related pathways between the high-risk and low-risk groups, a GSEA was performed utilizing the clusterProfler package on all genes in the TCGA samples from both groups [ 23 ]. A collection of 50 human cancer marker pathway genes was obtained from the Molecular Signature Database (MSigDB) ( http://www.gsea-msigdb.org/gsea/index.js ). Subsequently, GSVA was performed on all genes from samples in both groups, and the variations in GSVA scores between both risk samples were assessed utilizing the "limma" [ 24 , 25 ]. Analysis of immune microenvironment The ssGSEA, implemented through the R "gsva", was performed for all samples within the TCGA-PRAD dataset to derive profiles of 28 immune cell infiltrations. This was done to facilitate the comparison of immune cell characteristics between both risk groups. Pearson's correlation coefficient was employed to examine the link between risk score and immune infiltrating cells. The R "maftools" was utilized to analyze the mutational variations between both risk groups[ 26 ]. Patient sample collection A total of 12 kidney tissue samples were collected, including 6 samples from normal individuals and 6 samples from PRAD patients. All patient samples were obtained from the Department of Urology, Shenzhen Hospital, Southern Medical University, Shenzhen, and approved by the Medical Ethics Committee of Southern Medical University. RNAiso Plus (TaKaRa), was employed to extract total RNA from lung tissues. Reverse transcription was conducted using an RNA inversion kit (TaKaRa). Actin was utilized as an internal control, and the relative expression of genes was assessed using ΔΔCt quantification, expressed as fold-changes compared to the normal controls. Normal kidney tissue and PRAD patient kidney tissue were stained and analyzed via immunofluorescence microscopy utilizing a Zeiss microscope. Results Identification of PRAD cell subtypes The fundamental framework of the current research is depicted in Figure S1 . Initially, a collection of scRNA-seq samples was obtained from a database consisting of 32 patients diagnosed with PRAD. Subsequently, a rigorous screening process was conducted to eliminate cells that did not meet the predetermined criteria. As a result, a total of 162,897 cells were successfully retrieved and deemed suitable for subsequent investigations. PCA clustering was carried out using single-cell samples. Concurrently, in the context of PCA, 20 PCs with a P -value of 0.05 were subjected to further study. The UMAP approach was employed to partition the cells, identifying 34 distinct cell clusters (Figure. 1. a). The annotation of unique cell clusters was executed using the "singleR" package marker genes, the CellMarker database, and relevant references, generating seven cell clusters. Bubble charts (Fig. 1 . b) were utilized to visually represent the expression of critical marker genes for each cell type. These charts were employed to examine the expression patterns of specific cells. The cumulative expression levels of each identified marker gene were aggregated to illustrate the precision of cell type identification. The cell types present include fibroblasts, endothelial cells, epithelial cells, monocyte macrophages, B cells, T cells, and mast cells (Fig. 1 . c). Additionally, the UMAP graph (Fig. 1 . d) illustrates the clustering of several samples. Functional enrichment study of marker genes The gene markers associated with specific cell types exhibited a remarkable enrichment for GO and KEGG activities. It was shown that marker genes pertaining to all seven cell types had associations with cell adhesion and migration (Figure. 1. e-g). Moreover, the expression of B cell marker genes was linked to the DNA replication process. Most marker genes found in other cells were associated with traits like resistance to infection, tumor development, and cellular adhesion (Figure. 1.h). Each annotated cell was subjected to a pseudo-time analysis, and the Monocle 2 technique was used to evaluate their differentiation track. The findings of the study revealed that PRAD cells underwent a gradual differentiation process with three paths. Immune cells, in particular, preferentially differentiated in the later phases, whereas non-immune cells primarily differentiated in the earlier stages (Figure. 1. i). Furthermore, the usage of ligand-receptors is expected to facilitate intercellular communication. The data showed that fibroblasts, endothelial cells, and epithelial cells communicated more often (Figure. 1. j). Furthermore, our analysis of the regulatory network for ligands and target genes indicated that 28 ligands were selectively activated in immune cells (Figure. S1). A heat map was generated to visualize the distribution of 28 ligands on immune cells and their impact on the regulation of target genes in non-immune cells. (Figure S1 ). Moreover, it was observed that the MIF cell signaling pathway exhibited significant activity in facilitating intercellular communication. The major ligand pairs involved in the MIF signaling pathway are CD74-CXCR4 and CD74-CD44 (Figure. 1. k). Non-immune cells are primarily responsible for generating MIF signals, while immune cells consist of signal receptors and influences (Figure. 1. l, m). The MIF signaling pathway is recognized for its significant role in the development and advancement of various tumor types, underscoring its potential significance in the context of prostate cancer. Identification and functional enrichment analysis of DEGs in bulk RNA-seq data In total, 1086 significant DEGs were identified. The distribution of DEGs was visually represented using volcano plots and heatmaps (Fig. 2 . a, b). The GO analysis highlighted the primary enrichment of DEGs in NABA MATRISOME ASSOCIATED and MITOTIC CELL CYCLE PROCESS (Fig. 2 . c, d). Additionally, the analysis of transcription factor enrichment predominantly emphasized HSD17B (Figure. 2. e) Identification of key modules related to PRAD WGCNA was utilized to identify participants in the progression and development of PRAD. While constructing the co-expression network, the power was set to 5 when the fit index of the soft-threshold scale-free topology attained 0.80 (Figure. 3. a, b). In order to integrate a similar module tree algorithm through dynamic shear analysis, MEDissTres was set to 0.2. Following the merging process, ten modules were obtained (Figure. 3. c), and a correlation heatmap depicting the relationships between these modules was generated (Fig. 3 .d). As per the correlation coefficient and P -value, MEbrown and MEblue (consisting of 2545 genes) were chosen as the key modules (Figure. 3. e). A scatter plot was generated to illustrate the clinical relevance of the brown and blue modules (Figure. 3. f). Construction of a three-characteristic gene-based model Using Ven plots, the intersection of single-cell sequence marker genes, PRAD module genes, and DEGs of cell subtypes was demonstrated. This analysis resulted in the identification of a total of 12 intersecting genes that were designated as candidate genes. To ensure the precision of model prediction, consideration was given to the collective set of intersecting genes, amounting to 425 genes (Figure. 4. a). Subsequently, the genes underwent screening for the purpose of model development utilizing the LASSO algorithm. The findings are depicted in (Figure. 4. b, c). The Kaplan-Meier indicated a statistically significant difference in disease-free survival (DFS) between individuals with high-risk and low-risk scores (Figure. 4. d, e). To enhance the credibility of the risk model, an assessment was conducted by calculating the Receiver Operating Characteristic (ROC) curve for OS. The Area Under the Curve (AUC) values at 1, 3, and 5 years were found to be larger than 0.80, suggesting a higher level of effectiveness of the risk model (Figure. 4. f). In conclusion, our prognostic model demonstrated outstanding predictive accuracy for PRAD. Evaluation of independent prognosis-predictive factors and nomogram development The study employed univariate and multivariate Cox analyses to determine independent predictive factors to assess clinical features and risk scores. RiskScore demonstrated superior predictive capability compared to the T, M, and N stages. (Figure. 4.g, h). The nomogram model (Figure. 4. i) incorporated the independent prognostic variables. Furthermore, the calibration curve demonstrated the substantial predictive efficacy of the model (Figure. 4. j). The accuracy of 1-, 3-, and 5-year predictions for was assessed using the Net benefit strategy (Figure. 4. k-m). Consequently, the findings of the study revealed that risk score served as an independent prognosis-predictive factor and that the nomogram exhibited a high predictive capability for DFS in individuals with PRAD. GSEA between high‑ and low-risk groups To explore the influence of both risk groups on cancer advancement, GSEA was conducted to identify the most significant enrichment pathways distinguishing between the two. The findings indicated that the high-risk group exhibited significant enrichment in immunological processes, including TNF-α and Interferon-α pathways, whereas the low-risk group exhibited enrichment primarily in androgen and acid metabolism pathways (Figure. 5. a-c). Furthermore, GSVA was performed on all genes in both risk groups. The findings indicated that the high-risk group exhibited significant expression in processes related to cell communication, cyclase regulation, and cyclic nucleotide phosphodiesterase. Moreover, the low-risk group exhibited activation in the areas of tyrosine kinase, biosynthesis, and metabolism (Figure. 5. d). Immune infiltration analysis The ssGSEA methodology was utilized to calculate infiltration scores for 28 distinct immune cell populations across various risk groups. The findings unveiled statistically significant differences in infiltration levels among 17 immune cell species (Figure. 5. e). The heatmap and Pearson correlation analysis demonstrated a significant association between prognosis-predictive genes and the risk score with infiltrating immune cells (Figure. 5. f, g). It was discovered that missense mutations and SNPs dominated the PRAD patients (Figure. 5. h). The mutation analysis outcomes between both risk groups revealed that the majority of mutation types in both groups were missense mutations. The high-risk group exhibited a larger percentage of mutations in comparison to the low-risk group (Figure. 5. i, j). Overall, the findings of this study highlighted that predicted genes, immune cell infiltration, and site mutations are relevant in PRAD. Experimental validation of the analysis results In order to validate the accuracy of our analysis, we collected kidney samples from normal individuals and patients with PRAD. We found that genes with high expression in PRAD patients with high riskscore included NDUFA4L2, GDPD3, ITGAX, and FAM167B (Figure. 6. a), which is consistent with our previous analysis results. Based on the clinical information of the patients, we divided the PRAD patients into stage I and stage III. By quantifying the expression levels of these genes through immunohistochemistry and immunofluorescence(Figure. 6. b-d), we found that the prognostic model can well reflect the different stages of tumor patients and has a high degree of accuracy. Discussion PRAD is one of the most frequent cancers globally, and its prevalence is increasing in many nations[ 27 ]. Despite recent efforts to improve PRAD care, the varied and aggressive aspects of PRAD remain limited for prognostic evaluation[ 28 ]. Thus, the screening of novel biomarkers to aid in the development of patient-specific therapies and enhance prognosis remains crucial and imperative. Unlike bulk RNA-seq, which provides insights into the average gene expression levels across cell populations, scRNA-seq has emerged as a valuable technique for transcriptional stratification. By delineating cell subpopulations and identifying specific biomarkers, scRNA-seq enables the characterization of heterogeneity among distinct cell types in multiple cancer types, including PRAD. Consequently, this investigation executed a comprehensive analysis of both bulk RNA-seq and scRNA-seq data, culminating in the development of a risk model that exhibits outstanding prognosis-predictive value in the context of PRAD. Initially, an in-depth investigation was conducted using scRNA-seq data comprising 162,897 cells to identify seven crucial cell types present in the PRAD tumor. These cell types included epithelial cells, fibroblasts, endothelial cells, T cells, monocyte macrophages, B cells, and mast cells. Remarkably, a high cellular communication frequency was observed among these cell types. Notably, our analysis detected the substantial activation of the MIF signaling pathway during the progression of PRAD. Interestingly, similar findings have been reported in colorectal cancer (CRC), where the MIF/CD74 pathway has been implicated in regulating disease progression[ 29 ]. In PRAD, it was further revealed that the CD74-CXCR4 axis plays a pivotal role in this abnormally active ligand-receptor interaction within the MIF pathway. These findings suggested a potential parallel between CRC and PRAD, whereby CD74 may be critically involved in promoting the onset and advancement of PRAD. The functional analysis of the DEGs identified in the TCGA dataset revealed significant enrichment in the NABA MATRISOME ASSOCIATED signaling pathway and the MITOTIC CELL CYCLE PROCESS signaling pathway[ 30 ]. These pathways are believed to contribute to the proliferation and advancement of PRAD. Prior research has consistently reported alterations in cyclins, TP53, and Rb genes, which are known to be involved in PRAD[ 31 ]. Additionally, the expression of NABA MATRISOME ASSOCIATED has been observed in various diseases, particularly in epithelial cells[ 32 ]. This observation may be associated with the more infiltrative nature of PRAD epithelial cells. Moreover, through the application of WGCNA analysis, the brown and blue modules, comprising 2,120 genes, were identified as key modules. By intersecting the three aforementioned gene sets, a total of 425 candidate genes were collected to enhance the stability of the gene signature. Subsequently, a gene prognosis-predictive model was constructed utilizing univariate Cox regression analysis and the LASSO algorithm[ 33 ]. The ROC curve demonstrated that the model exhibited promising predictive efficacy for prognosis in PRAD. Furthermore, the results of this study indicated that this gene prognostic model functioned as an independent predictive factor for DFS among PRAD patients. Distinguished from other models, our signatures were derived through the integration of diverse datasets and utilizing various algorithms. These signatures underwent rigorous validation using an independent training set to ensure robustness. Remarkably, our model exhibited impressive AUC values ranging from 0.800 to 0.823, further highlighting its superior performance and reliability. Additionally, all samples were categorized into low- and high-risk groups based on the calculated risk score. The observations in this research unveiled that the high-risk group exhibited significant enrichment in immune processes and immune-related pathways. This led to the hypothesis that the risk score could function as a potential prognostic marker for individuals with PRAD undergoing immunotherapy. To further explore this hypothesis, multiple aspects were examined, including tumor mutational burden (TMB), immune infiltration, and associated risk genes. The comprehensive analysis revealed that the high-risk group exhibited higher TMB and remarkably increased levels of immune cell infiltration. These outcomes collectively indicate that the risk score holds promise as a valuable tool for predicting the response to immunotherapy in individuals with PRAD. With utmost consideration given to tumor heterogeneity, interactions among different cell populations, immune infiltration, TMB, and clinical characteristics, the strength of this study lies in the identification and development of a novel predictive model. This model demonstrates exceptional accuracy in distinguishing survival outcomes. The findings obtained from this study provide compelling evidence for the stratification of PRAD patients based on these factors. Nevertheless, this study has a number of unavoidable limitations. Specifically, future research based on these findings should continue to investigate the unclear regulation mechanisms of hallmark genes in PRAD. Conclusions Through the integration of scRNA-seq and bulk RNA-seq data, multiple machine-learning techniques were employed to develop a novel prognosis-predictive model for predicting DFS among individuals with PRAD. This model enables the accurate prediction of DFS probability in PRAD patients. Furthermore, this study highlights the potential of the risk score as an independent prognosis-predictive factor that exhibits a strong correlation with the immune microenvironment. The comprehensive findings of this study establish it as a reliable predictor of efficacy in PRAD, paving the way for future targeted therapeutic approaches for this disease. Declarations Acknowledgments We thank the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) Database for sharing a large amount of data. Author contributions Hao Lin, Xiangqiu Chen, and Qingchun Zhou contributed equally to the study’s inception and design. Hao Lin performed bioinformatics analyses and assisted with the analysis of other data. Xiangqiu Chen and Qingchun Zhou collect samples and analyze experimental data. Tao Wu, Zhiqiang Wen, and Ziliang Ji provided assistance. Donglin Sun, Qingyou Zheng and Xichun Zheng helped to revise the manuscript. All authors have read and approved the final manuscript. Funding This work was supported by Research Foundation of Shenzhen Hospital of Southern Medical University (project number 22H3AGZR07). Availability of data and materials The analyzed datasets generated during the study are available from the corresponding author upon reasonable request. Ethics approval All clinical trials were authorized by the ethics committees at Shenzhen Hospital, Southern Medical University. Consent for publication Not applicable. Competing interests The authors declare that they have no competing interests. References Sung H, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin. 2021;71(3):209–49. Lalonde E, et al. Tumour genomic and microenvironmental heterogeneity for integrated prediction of 5-year biochemical recurrence of prostate cancer: a retrospective cohort study. Lancet Oncol. 2014;15(13):1521–32. Shao N, et al. Immunotherapy and endothelin receptor antagonists for treatment of castration-resistant prostate cancer. Int J Cancer. 2013;133(7):1743–50. Fujita K, et al. Serum core-type fucosylated prostate-specific antigen index for the detection of high-risk prostate cancer. Int J Cancer. 2021;148(12):3111–8. Jayaram A, Attard G. Diagnostic Gleason score and castration-resistant prostate cancer. Ann Oncol. 2016;27(6):962–4. Hong M, et al. RNA sequencing: new technologies and applications in cancer research. J Hematol Oncol. 2020;13(1):166. Wu T, et al. Targeting HIC1/TGF-β axis-shaped prostate cancer microenvironment restrains its progression. Cell Death Dis. 2022;13(7):624. Hao M, et al. HIC1 loss promotes prostate cancer metastasis by triggering epithelial-mesenchymal transition. J Pathol. 2017;242(4):409–20. Slovin S, et al. Single-Cell RNA Sequencing Analysis: A Step-by-Step Overview. Methods Mol Biol. 2021;2284:343–65. Yu X, et al. Single-cell omics traces the heterogeneity of prostate cancer cells and the tumor microenvironment. Cell Mol Biol Lett. 2023;28(1):38. Chen Z, et al. Dissecting the single-cell transcriptome network underlying esophagus non-malignant tissues and esophageal squamous cell carcinoma. EBioMedicine. 2021;69:103459. Butler A, et al. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol. 2018;36(5):411–20. Becht E, et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol; 2018. Aran D, et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat Immunol. 2019;20(2):163–72. Zhang X, et al. CellMarker: a manually curated resource of cell markers in human and mouse. Nucleic Acids Res. 2019;47(D1):D721–d728. Yu G, et al. clusterProfiler: an R package for comparing biological themes among gene clusters. Omics. 2012;16(5):284–7. Qiu X, et al. Reversed graph embedding resolves complex single-cell trajectories. Nat Methods. 2017;14(10):979–82. Jin S, et al. Inference and analysis of cell-cell communication using CellChat. Nat Commun. 2021;12(1):1088. Browaeys R, Saelens W, Saeys Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods. 2020;17(2):159–62. Zhou Y, et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat Commun. 2019;10(1):1523. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559. Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw. 2010;33(1):1–22. Subramanian A, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50. Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics. 2013;14:7. Ritchie ME, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47. Mayakonda A, et al. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 2018;28(11):1747–56. Zhang X, et al. Pan-Cancer Analysis of PARP1 Alterations as Biomarkers in the Prediction of Immunotherapeutic Effects and the Association of Its Expression Levels and Immunotherapy Signatures. Front Immunol. 2021;12:721030. Zheng X, et al. Tumor-antigens and immune landscapes identification for prostate adenocarcinoma mRNA vaccine. Mol Cancer. 2021;20(1):160. Klemke L, et al. Hsp90-stabilized MIF supports tumor progression via macrophage recruitment and angiogenesis in colorectal cancer. Cell Death Dis. 2021;12(2):155. Cai Y et al. Heparin-Binding Protein: A Novel Biomarker Linking Four Different Cardiovascular Diseases. Cardiol Res Pract, 2020. 2020: p. 9575373. Olshan AF, et al. Alterations of the p16 gene in head and neck cancer: frequency and association with p53, PRAD-1 and HPV. Oncogene. 1997;14(7):811–8. Socovich AM, Naba A. The cancer matrisome: From comprehensive characterization to biomarker discovery. Semin Cell Dev Biol. 2019;89:157–66. Suchting R, et al. Using Elastic Net Penalized Cox Proportional Hazards Regression to Identify Predictors of Imminent Smoking Lapse. Nicotine Tob Res. 2019;21(2):173–9. Tables Table 1 is available in the Supplementary Files section. Additional Declarations No competing interests reported. Supplementary Files table1.tif figs1.png Figure. S1 Regulatory network for ligands and target genes. Cite Share Download PDF Status: Posted Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-3912322","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":270242756,"identity":"119c4293-9ac9-4f9d-be33-7df2b7305a64","order_by":0,"name":"Lin Hao","email":"","orcid":"","institution":"Department of Urology, Shenzhen Hospital, Southern Medical University","correspondingAuthor":false,"prefix":"","firstName":"Lin","middleName":"","lastName":"Hao","suffix":""},{"id":270242757,"identity":"29fba514-4014-46da-96ed-b5057846df73","order_by":1,"name":"Xiangqiu Chen","email":"","orcid":"","institution":"Department of Urology, Shenzhen Hospital, Southern Medical University","correspondingAuthor":false,"prefix":"","firstName":"Xiangqiu","middleName":"","lastName":"Chen","suffix":""},{"id":270242759,"identity":"7b95145c-b697-42d3-b87a-9e8e9e6b5025","order_by":2,"name":"Qingchun Zhou","email":"","orcid":"","institution":"Department of Urology, Shenzhen Hospital, Southern Medical University","correspondingAuthor":false,"prefix":"","firstName":"Qingchun","middleName":"","lastName":"Zhou","suffix":""},{"id":270242760,"identity":"90470506-fbd0-4ac0-b190-ac9e28878639","order_by":3,"name":"Tao Wu","email":"","orcid":"","institution":"Department of Urology, Shenzhen Hospital, Southern Medical University","correspondingAuthor":false,"prefix":"","firstName":"Tao","middleName":"","lastName":"Wu","suffix":""},{"id":270242761,"identity":"2e7404b4-3319-4897-9a69-e69c6f57120b","order_by":4,"name":"Zhiqiang Wen","email":"","orcid":"","institution":"Department of Urology, Shenzhen Hospital, Southern Medical University","correspondingAuthor":false,"prefix":"","firstName":"Zhiqiang","middleName":"","lastName":"Wen","suffix":""},{"id":270242763,"identity":"cf772fd8-3316-43e4-a528-6960d6f80ba7","order_by":5,"name":"Ziliang Ji","email":"","orcid":"","institution":"Department of Urology, Shenzhen Hospital, Southern Medical University","correspondingAuthor":false,"prefix":"","firstName":"Ziliang","middleName":"","lastName":"Ji","suffix":""},{"id":270242765,"identity":"e75ac51f-ae5d-49f9-bdb0-7fa244742895","order_by":6,"name":"Xichun Zheng","email":"","orcid":"","institution":"Department of Urology, Shenzhen Hospital, Southern Medical University","correspondingAuthor":false,"prefix":"","firstName":"Xichun","middleName":"","lastName":"Zheng","suffix":""},{"id":270242766,"identity":"844a669f-f1ab-491b-aa4c-c090b3490ed3","order_by":7,"name":"Qingyou Zheng","email":"","orcid":"","institution":"Department of Urology, Shenzhen Hospital, Southern Medical University","correspondingAuthor":false,"prefix":"","firstName":"Qingyou","middleName":"","lastName":"Zheng","suffix":""},{"id":270242767,"identity":"0d8a86ab-71ba-48af-bdfa-d694e9dcb020","order_by":8,"name":"Donglin Sun","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABF0lEQVRIiWNgGAWjYDCCA0D8wADEYj4AEZEAk8z4tSSAtbAlgDlEagGzeAyI08J3vPfwi4SCO/YGt3s+f/7YxmA3f3Z3mgRDhXViA/vZA9i0SJ45l2aRYPAsccOds9skDrYxJDfOATIYzqQnNvDkJWDTYnAjx8wgweBwgsGN3G0MIC3MErnbJBjbDic2SPAY4NNiD2Q8/gDSwgbW8g+vFuMHQC2MG27kMIAcZscD1tKAW4vkmTNmwEA+nDjzRpqZxJlzEgkSErmbLRKOpRu38eRg1cJ3vMf4w4c/h+35biQ//lBRZmMvPyN3440PNday/exnsGoBAjYJJI5EYgOIAgUVGw71QMD8AZlnj1vhKBgFo2AUjFQAAJGJZ8QuQZg2AAAAAElFTkSuQmCC","orcid":"","institution":"Department of Urology, Shenzhen Hospital, Southern Medical University","correspondingAuthor":true,"prefix":"","firstName":"Donglin","middleName":"","lastName":"Sun","suffix":""}],"badges":[],"createdAt":"2024-01-31 02:59:13","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-3912322/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-3912322/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":50575894,"identity":"a9493536-bc0f-4f8c-91e7-b930ef8fd272","added_by":"auto","created_at":"2024-02-02 17:36:14","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":3992841,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSingle-cell RNA-seq data identified seven distinct cell clusters with varied annotations indicating substantial cellular heterogeneity in PRAD tumors.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e(a)\u003c/strong\u003e \u003cstrong\u003e.\u003c/strong\u003e Employing UMAP, dimensionality reduction was performed on the top 20 PCs, resulting in the classification of a total of 34 cell clusters. \u003cstrong\u003e(b). \u003c/strong\u003eEach cell cluster's marker gene expression. \u003cstrong\u003e(c).\u003c/strong\u003e \"SingleR\" and \"CellMarker\" were used for annotating all seven cell clusters in PRAD based on gene composition. \u003cstrong\u003e(d).\u003c/strong\u003e UMAP mapping different sourced samples. \u003cstrong\u003e(e-h).\u003c/strong\u003e ClusterProfiler was employed for GO and KEGG functional enrichment analyses using marker genes derived from differentially expressed cells. \u003cstrong\u003e(i). \u003c/strong\u003eAnalysis of the trajectory of PRAD cells indicated two primary branches: one dominated by epithelial cells, endothelial cells, and fibroblasts, while the other was dominated by immune cells \u003cstrong\u003e(j). \u003c/strong\u003eNumber and strength of interactions among different cell types \u003cstrong\u003e(k). \u003c/strong\u003eBubble diagram illustrating cell communication signaling route strength. \u003cstrong\u003e(l, m). \u003c/strong\u003eMIF signaling pathway network.\u003c/p\u003e","description":"","filename":"OnlineFig1.png","url":"https://assets-eu.researchsquare.com/files/rs-3912322/v1/7f3bbf06cf68ba14bc15bcb9.png"},{"id":50575893,"identity":"13d0d284-7a0a-423b-b2de-e9de12976029","added_by":"auto","created_at":"2024-02-02 17:36:13","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":2978397,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eDEGs identification and functional enrichment in PRAD patients and controls.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e(a). \u003c/strong\u003eDEGs volcano diagram comparing PRAD and control in TCGA. Significant DEGs were identified with \u003cem\u003eP\u003c/em\u003e\u0026lt;0.05 and |log2FoldChange|\u0026gt;1. Red dots show elevated genes, and blue dots show genes with reduced expression. \u003cstrong\u003e(b).\u003c/strong\u003e Heatmap of DEGs. (\u003cstrong\u003ec–e)\u003c/strong\u003eNetwork and bar plots of the GO and TF analysis of DEGs.\u003c/p\u003e","description":"","filename":"OnlineFig2.png","url":"https://assets-eu.researchsquare.com/files/rs-3912322/v1/c42959fd8382d1a66d357987.png"},{"id":50575899,"identity":"6500d45f-0769-4c66-8501-7d9fe98766b9","added_by":"auto","created_at":"2024-02-02 17:36:17","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":694632,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eScreening of PRAD-related genes conducted using WGCNA.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(a)\u003cstrong\u003e.\u003c/strong\u003e Evaluation of the scale-free index for different soft-threshold powers (β). \u003cstrong\u003e(b\u003c/strong\u003e). The minimum number of genes per module is 300, and 23 modules are obtained when MEDissThres is equal to 0.18. \u003cstrong\u003e(c).\u003c/strong\u003e The cluster dendrogram representing co-expression network modules (1-TOM). \u003cstrong\u003e(d).\u003c/strong\u003e A heatmap depicting the relationships between the identified modules. \u003cstrong\u003e(e).\u003c/strong\u003e Analysis of correlations between modules along with corresponding \u003cem\u003eP\u003c/em\u003e-values. \u003cstrong\u003e(f).\u003c/strong\u003e Scatter plot analysis focusing on the brown and blue modules\u003c/p\u003e","description":"","filename":"OnlineFig3.png","url":"https://assets-eu.researchsquare.com/files/rs-3912322/v1/385baf8ba599a454c075139f.png"},{"id":50575898,"identity":"86aaa38c-a1fe-41d7-8565-b85446a97cf1","added_by":"auto","created_at":"2024-02-02 17:36:16","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":1365893,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eDevelopment of risk signature in the TCGA cohort.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e(a).\u003c/strong\u003eOverlap of PRAD-related genes, DEGs in single cells, and DEGs in PRAD and controls. \u003cstrong\u003e(b, c). \u003c/strong\u003eLASSO regression of DFS-related genes. \u003cstrong\u003e(d).\u003c/strong\u003eKaplan–Meier curve results. \u003cstrong\u003e(e). \u003c/strong\u003eRisk survival status plot. \u003cstrong\u003e(f). \u003c/strong\u003eAssessment of the AUC for predicting 1, 3, and 5-year survival rates among PRAD individuals. \u003cstrong\u003e(g). \u003c/strong\u003eA Univariate Cox analysis involving risk scores and clinical characteristics.\u003cstrong\u003e (h). \u003c/strong\u003eMultifactorial Cox analysis. \u003cstrong\u003e(i).\u003c/strong\u003e Development of the nomogram model. \u003cstrong\u003e(j). \u003c/strong\u003eRepresentation of the calibration curve of the nomogram. \u003cstrong\u003e(k-m).\u003c/strong\u003e Decision curve analyses assessing the performance of nomogram for 1-, 3- and 5-year risk predictions.\u003c/p\u003e","description":"","filename":"OnlineFig4.png","url":"https://assets-eu.researchsquare.com/files/rs-3912322/v1/41bc53b943606c5e61e71e6c.png"},{"id":50575895,"identity":"8f7a2893-06ff-4266-a133-aff9173454c7","added_by":"auto","created_at":"2024-02-02 17:36:15","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":3863491,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eComparison of biological characteristics between high- and low-risk groups.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e(a, b).\u003c/strong\u003e GSEA of GO and KEGG between high- and low-risk groups\u003cstrong\u003e. (c).\u003c/strong\u003e Visualization of GSEA results in a bubble plot. \u003cstrong\u003e(d).\u003c/strong\u003e GSVA performed on all genes within the high- and low-risk groups. \u003cstrong\u003e(e). \u003c/strong\u003eThe violin plot illustrating the comparison of ssGSEA scores for 28 immune cells between high and low-risk categories. \u003cstrong\u003e(f, g). \u003c/strong\u003eThe link between immune cells and 21 model genes. \u003cstrong\u003e(h). \u003c/strong\u003eAn overview of the mutation landscape for TCGA-PRAD patients. \u003cstrong\u003e(i). \u003c/strong\u003ePrediction of the tumor mutational burden (TMB) in the high‐risk groups using the risk model.\u003cstrong\u003e (j). \u003c/strong\u003ePrediction of the tumor mutational burden (TMB) in the low‐risk groups using the risk model.\u003c/p\u003e","description":"","filename":"OnlineFig5.png","url":"https://assets-eu.researchsquare.com/files/rs-3912322/v1/5674384001cf9e9491c207e3.png"},{"id":50575896,"identity":"02fc78ab-1a8e-4bcf-9153-040f845815ee","added_by":"auto","created_at":"2024-02-02 17:36:15","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":10572578,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eExperimental validation of the analysis results.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e(a). \u003c/strong\u003eQuantitative analysis of mRNA expression of Riskscore genes (n=6).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e(b). \u003c/strong\u003eImmunohistochemistry of normal and tumor groups. \u003cstrong\u003e(c, d). \u003c/strong\u003eImmunofluorescence analysis and quantification of normal and tumor groups.\u003c/p\u003e","description":"","filename":"OnlineFig6.png","url":"https://assets-eu.researchsquare.com/files/rs-3912322/v1/51b7f88e2b1b3584905de48b.png"},{"id":50839579,"identity":"3b39fe43-63a8-4cc9-8d3e-3ba961e8a574","added_by":"auto","created_at":"2024-02-08 07:11:13","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2962772,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-3912322/v1/288959af-cd45-42dd-8c33-81ad803ca8b0.pdf"},{"id":50576475,"identity":"ab3044d6-79e8-4674-8ca0-4d9d9597d743","added_by":"auto","created_at":"2024-02-02 17:44:12","extension":"tif","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":924436,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cbr\u003e\u003c/p\u003e","description":"","filename":"table1.tif","url":"https://assets-eu.researchsquare.com/files/rs-3912322/v1/72c258be469224ace7407fde.tif"},{"id":50575891,"identity":"10588c14-c775-4434-8c15-a4915982dce3","added_by":"auto","created_at":"2024-02-02 17:36:12","extension":"png","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":354219,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFigure. S1 Regulatory network for ligands and target genes.\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"figs1.png","url":"https://assets-eu.researchsquare.com/files/rs-3912322/v1/bf268c4ce7b4a87a9614b7fb.png"}],"financialInterests":"No competing interests reported.","formattedTitle":"Establishing a Prognostic Model in Prostate Adenocarcinoma through Comprehensive scRNA-Seq and Bulk RNA-Seq Analysis and Validation","fulltext":[{"header":"Introduction","content":"\u003cp\u003eProstate adenocarcinoma (PRAD) is the predominant malignancy found in males, with its incidence strongly correlated with advancing age[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. Radical prostatectomy or radiation therapy are the principal curative treatment options for individuals with prostate cancer. Nevertheless, it is estimated that around 20\u0026ndash;30% of individuals would experience biochemical recurrence after radical prostatectomy, which represents the early occurrence of recurrence and metastasis[\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Currently, the effectiveness of predicting patient survival risk has been demonstrated by using many factors, including serum prostate-specific antigen levels[\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e], tumor pathological stage, and Gleason score[\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. The survival rate of individuals with PARD is higher in the presence of tumors, and the progression of the tumor is related to the quality of life of patients. Hence, the prioritization lies in accurately predicting the potential progression of the tumor.\u003c/p\u003e \u003cp\u003eBecause of the rapid advancement of cancer genetics, Bulk transcriptome sequencing (bulk RNA-seq) has been a prominent tool in transcriptomics in recent decades[\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. Genetic changes are increasingly being found as effective therapy targets for PRAD. Wu \u003cem\u003eet al.\u003c/em\u003e, for example, discovered that targeting the HIC1/TGF-axis-shaped prostate cancer microenvironment slows its growth[\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. In addition, according to Hao \u003cem\u003eet al.\u003c/em\u003e, the deletion of HIC1 has been found to facilitate the spread of prostate cancer through the initiation of epithelial-mesenchymal transition[\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. Bulk RNA-seq enhances the comprehension of disease. However, bulk RNA-seq studies only examine the average gene expression in cell groups.\u003c/p\u003e \u003cp\u003eSingle-cell RNA-seq (scRNA-seq) now provides access to underlying gene expression distributions and clarifies information on cellular transcriptome heterogeneity[\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. With scRNA-seq, it becomes possible to develop potentially beneficial personalized therapeutic approaches for diagnosing cancer throughout its progression. In the study conducted by Yu \u003cem\u003eet al.\u003c/em\u003e, single-cell omics traces the heterogeneity among prostate cancer cells and the tumor microenvironment, improving our understanding of tumor progression mechanisms[\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. Chen \u003cem\u003eet al.\u003c/em\u003e also utilized scRNA-seq to determine the correlation between IDH1 expression and the degree of immune cell infiltration, particularly involving immunosuppressive cells like CD8\u0026thinsp;+\u0026thinsp;T-cells, CD4\u0026thinsp;+\u0026thinsp;T-cells, and macrophages[\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. In light of this advantage, multiple studies have prioritized the identification of possible biological markers for PRAD by integrating bulk RNA-seq and scRNA-seq analyses to effectively stratify and discern patients.\u003c/p\u003e \u003cp\u003eThis study employed scRNA-seq and bulk RNA-seq data to perform comprehensive bioinformatics analysis in order to develop a predictive model for patients with PRAD. Simultaneously, the study delineated the immune infiltration landscape and ascertained its role in the pathogenesis of PRAD. Furthermore, the relationship between risk models and infiltrating immune cells was investigated to acquire a deeper comprehension of the underlying molecular immunological mechanisms involved in the evolution of PRAD. In conclusion, this research offers original findings that have the potential to enhance the clinical approach to managing PRAD.\u003c/p\u003e"},{"header":"Materials and methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eThe collection and analysis of data\u003c/h2\u003e \u003cp\u003eData from bulk RNA-seq, clinical records, and SNP mutation sites of TCGA-PRAD were acquired from the TCGA database (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://portal.gdc.cancer.gov/\u003c/span\u003e\u003cspan address=\"https://portal.gdc.cancer.gov/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). The dataset encompassed 52 samples of normal tissue and 492 samples of PRAD, all of which contain detailed survival and clinical data. The scRNA sequencing datasets PRAD_GSE137829, PRAD_GSE141445, and PRAD_GSE172301 were utilized in the study. The study employed scRNA-seq to examine cancer tissues obtained from 32 patients with PRAD. These tissues were sourced from the TISCH database (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://tisch.comp-genomics.org/gallery/\u003c/span\u003e\u003cspan address=\"http://tisch.comp-genomics.org/gallery/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). Detailed patient information and sequencing statistics are available in Table\u0026nbsp;1. The integration of samples was performed using the anchors method, a technique implemented in the R \"Seurat\"[\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eCore cells were subsequently created through a filtering step applied to the scRNA-seq data. Unqualified cells refer to genes that exhibit detectability exclusively in three or fewer cells, as well as in less than 200 cells of low quality. Genes identified under these criteria were omitted from subsequent analysis. The analysis of gene expression in core cells involved utilizing a standardized linear regression model, followed by the application of ANOVA to identify the top 2000 genes exhibiting highly variable fingerprints. Principal component analysis (PCA) was conducted on individual cell samples, identifying the first 20 principal components (PCs). These components were then chosen for further study. The UMAP algorithm was employed to comprehensively analyze dimensionality reduction on the initial 20 principal component pairs of the samples[\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. R \"singleR\" was employed to utilize reference datasets such as HumanPrimaryCellAtlasData, BlueprintEncodeData, and ImmuneCellExpressionData for auxiliary annotation[\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. Additionally, the CellMarker database and prior research were consulted to identify marker genes for the manual annotation of distinct clusters[\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. The visualization process was facilitated by utilizing the R \"SCP\" ( \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/zhanghao-njmu/SCP\u003c/span\u003e\u003cspan address=\"https://github.com/zhanghao-njmu/SCP\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). Analysis process and experimental design are shown in the flowchart.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003eGene enrichment analysis based on single-cell marker genes\u003c/h2\u003e \u003cp\u003eThe Seurat package's FindAllMarkers function was utilized to locate marker genes for each cluster, with parameters set as min.pct\u0026thinsp;=\u0026thinsp;0.2 and only. The Wilcoxon rank sum test was performed to identify differentially expressed genes (DEGs) and screen marker genes when pos\u0026thinsp;=\u0026thinsp;TRUE. Using \"clusterProfler\" in R software[\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e], core cells of marker genes underwent enrichment analysis for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functions, respectively. 'Monocle 2' was utilized to infer the developmental trajectory of cells to elucidate the underlying molecular causes of PRAD progression [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. The \"cellchat\" algorithm was employed to explore potential cell-to-cell interactions, while \"nichenet\" was utilized to derive the regulatory links between ligands and immune cell and stromal cell target genes[\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e, \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003eDEG identification and functional enrichment analysis in the TCGA-PRAD\u003c/h2\u003e \u003cp\u003eA total of 52 control and 492 PRAD patient data were subjected to a difference analysis utilizing the limma package. The criteria of \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05 and Log2FC\u0026thinsp;\u0026gt;\u0026thinsp;1 were utilized to identify genes as DEG. The visualization of the heat maps and volcano plots of DEGs was carried out utilizing the ggplot2 and pheatmap packages, respectively. The DEGs were subsequently analyzed by KEGG and GO using the online instrument Metascope[\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003eWGCNA analysis\u003c/h2\u003e \u003cp\u003eThe \"goodSamplesGenes\" function, a component of the R \"WGCNA\", was employed to assess the necessity of gene filtering within a sample and to determine an optimal soft threshold[\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. The analysis involved the establishment of a co-expression network to examine the link between the genes associated with the module (ME) and PRAD.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003eThe development of a prognosis-predictive model\u003c/h2\u003e \u003cp\u003eThe identification of candidate genes involved the identification of pairwise intersecting genes from three sources: marker genes of core cells, DEGs associated with PRAD, and module genes. These candidate genes were subsequently used in the least absolute shrinkage and selection operator (LASSO) regression analysis, which was executed using the R \"glmnet\". The aim of the LASSO regression was to minimize the number of genes in the final risk model, thereby screening for characteristic genes associated with prognosis [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]. In order to decrease the number of genes in the final risk model, variables with \u003cem\u003eP\u003c/em\u003e-values 0.05 were subjected to the LASSO regression analysis, which was executed utilizing the \"glmnet\" R package. The prognosis-predictive model was developed employing the subsequent formula: risk score\u0026thinsp;=\u0026thinsp;gene exp11\u0026thinsp;+\u0026thinsp;gene exp22+...+gene expression (where \"gene expression\" represents the gene expression value and represents the corresponding LASSO regression coefficient). The R programs \"survminer\" and \"ggrisk\" were employed to visualize the survival curves and risk maps of patients. Additionally, the \"survROC\" program was utilized to plot ROC curves, assessing the performance of risk scores in predicting overall survival (OS) at 1, 3, and 5 years among individuals with PRAD.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eIndependent prognostication\u003c/h2\u003e \u003cp\u003eUnivariate analysis was performed to examine the predictive significance of risk models and clinical parameters (age, T stage, M stage, N stage, riskScore). In contrast, multivariable Cox analysis of OS was utilized to determine independent risk factors and predict new events in PRAD. Utilizing the \"cph\" function in R, a nomogram model was plotted to visualize this predictive model and predict potential new events for patients at 1, 3, and 5 years. The validity of the bar chart was validated through regression analysis.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003eGSEA and GSVA enrichment analysis\u003c/h2\u003e \u003cp\u003eTo investigate functional variations and related pathways between the high-risk and low-risk groups, a GSEA was performed utilizing the clusterProfler package on all genes in the TCGA samples from both groups [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]. A collection of 50 human cancer marker pathway genes was obtained from the Molecular Signature Database (MSigDB) (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.gsea-msigdb.org/gsea/index.js\u003c/span\u003e\u003cspan address=\"http://www.gsea-msigdb.org/gsea/index.js\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). Subsequently, GSVA was performed on all genes from samples in both groups, and the variations in GSVA scores between both risk samples were assessed utilizing the \"limma\" [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003eAnalysis of immune microenvironment\u003c/h2\u003e \u003cp\u003eThe ssGSEA, implemented through the R \"gsva\", was performed for all samples within the TCGA-PRAD dataset to derive profiles of 28 immune cell infiltrations. This was done to facilitate the comparison of immune cell characteristics between both risk groups. Pearson's correlation coefficient was employed to examine the link between risk score and immune infiltrating cells. The R \"maftools\" was utilized to analyze the mutational variations between both risk groups[\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003ePatient sample collection\u003c/h2\u003e \u003cp\u003eA total of 12 kidney tissue samples were collected, including 6 samples from normal individuals and 6 samples from PRAD patients. All patient samples were obtained from the Department of Urology, Shenzhen Hospital, Southern Medical University, Shenzhen, and approved by the Medical Ethics Committee of Southern Medical University. RNAiso Plus (TaKaRa), was employed to extract total RNA from lung tissues. Reverse transcription was conducted using an RNA inversion kit (TaKaRa). Actin was utilized as an internal control, and the relative expression of genes was assessed using ΔΔCt quantification, expressed as fold-changes compared to the normal controls. Normal kidney tissue and PRAD patient kidney tissue were stained and analyzed via immunofluorescence microscopy utilizing a Zeiss microscope.\u003c/p\u003e \u003c/div\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eIdentification of PRAD cell subtypes\u003c/h2\u003e \u003cp\u003eThe fundamental framework of the current research is depicted in Figure \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e. Initially, a collection of scRNA-seq samples was obtained from a database consisting of 32 patients diagnosed with PRAD. Subsequently, a rigorous screening process was conducted to eliminate cells that did not meet the predetermined criteria. As a result, a total of 162,897 cells were successfully retrieved and deemed suitable for subsequent investigations. PCA clustering was carried out using single-cell samples. Concurrently, in the context of PCA, 20 PCs with a \u003cem\u003eP\u003c/em\u003e-value of 0.05 were subjected to further study. The UMAP approach was employed to partition the cells, identifying 34 distinct cell clusters (Figure. 1. a). The annotation of unique cell clusters was executed using the \"singleR\" package marker genes, the CellMarker database, and relevant references, generating seven cell clusters. Bubble charts (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. b) were utilized to visually represent the expression of critical marker genes for each cell type. These charts were employed to examine the expression patterns of specific cells. The cumulative expression levels of each identified marker gene were aggregated to illustrate the precision of cell type identification. The cell types present include fibroblasts, endothelial cells, epithelial cells, monocyte macrophages, B cells, T cells, and mast cells (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. c). Additionally, the UMAP graph (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. d) illustrates the clustering of several samples.\u003c/p\u003e\u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eFunctional enrichment study of marker genes\u003c/h2\u003e \u003cp\u003eThe gene markers associated with specific cell types exhibited a remarkable enrichment for GO and KEGG activities. It was shown that marker genes pertaining to all seven cell types had associations with cell adhesion and migration (Figure. 1. e-g). Moreover, the expression of B cell marker genes was linked to the DNA replication process. Most marker genes found in other cells were associated with traits like resistance to infection, tumor development, and cellular adhesion (Figure. 1.h). Each annotated cell was subjected to a pseudo-time analysis, and the Monocle 2 technique was used to evaluate their differentiation track. The findings of the study revealed that PRAD cells underwent a gradual differentiation process with three paths. Immune cells, in particular, preferentially differentiated in the later phases, whereas non-immune cells primarily differentiated in the earlier stages (Figure. 1. i).\u003c/p\u003e \u003cp\u003eFurthermore, the usage of ligand-receptors is expected to facilitate intercellular communication. The data showed that fibroblasts, endothelial cells, and epithelial cells communicated more often (Figure. 1. j). Furthermore, our analysis of the regulatory network for ligands and target genes indicated that 28 ligands were selectively activated in immune cells (Figure. S1). A heat map was generated to visualize the distribution of 28 ligands on immune cells and their impact on the regulation of target genes in non-immune cells. (Figure \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eMoreover, it was observed that the MIF cell signaling pathway exhibited significant activity in facilitating intercellular communication. The major ligand pairs involved in the MIF signaling pathway are CD74-CXCR4 and CD74-CD44 (Figure. 1. k). Non-immune cells are primarily responsible for generating MIF signals, while immune cells consist of signal receptors and influences (Figure. 1. l, m). The MIF signaling pathway is recognized for its significant role in the development and advancement of various tumor types, underscoring its potential significance in the context of prostate cancer.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eIdentification and functional enrichment analysis of DEGs in bulk RNA-seq data\u003c/h2\u003e \u003cp\u003eIn total, 1086 significant DEGs were identified. The distribution of DEGs was visually represented using volcano plots and heatmaps (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e2\u003c/span\u003e. a, b). The GO analysis highlighted the primary enrichment of DEGs in NABA MATRISOME ASSOCIATED and MITOTIC CELL CYCLE PROCESS (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e2\u003c/span\u003e. c, d). Additionally, the analysis of transcription factor enrichment predominantly emphasized HSD17B (Figure. 2. e)\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eIdentification of key modules related to PRAD\u003c/h2\u003e \u003cp\u003eWGCNA was utilized to identify participants in the progression and development of PRAD. While constructing the co-expression network, the power was set to 5 when the fit index of the soft-threshold scale-free topology attained 0.80 (Figure. 3. a, b). In order to integrate a similar module tree algorithm through dynamic shear analysis, MEDissTres was set to 0.2. Following the merging process, ten modules were obtained (Figure. 3. c), and a correlation heatmap depicting the relationships between these modules was generated (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e3\u003c/span\u003e.d). As per the correlation coefficient and \u003cem\u003eP\u003c/em\u003e-value, MEbrown and MEblue (consisting of 2545 genes) were chosen as the key modules (Figure. 3. e). A scatter plot was generated to illustrate the clinical relevance of the brown and blue modules (Figure. 3. f).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eConstruction of a three-characteristic gene-based model\u003c/h2\u003e \u003cp\u003eUsing Ven plots, the intersection of single-cell sequence marker genes, PRAD module genes, and DEGs of cell subtypes was demonstrated. This analysis resulted in the identification of a total of 12 intersecting genes that were designated as candidate genes. To ensure the precision of model prediction, consideration was given to the collective set of intersecting genes, amounting to 425 genes (Figure. 4. a). Subsequently, the genes underwent screening for the purpose of model development utilizing the LASSO algorithm. The findings are depicted in (Figure. 4. b, c). The Kaplan-Meier indicated a statistically significant difference in disease-free survival (DFS) between individuals with high-risk and low-risk scores (Figure. 4. d, e). To enhance the credibility of the risk model, an assessment was conducted by calculating the Receiver Operating Characteristic (ROC) curve for OS. The Area Under the Curve (AUC) values at 1, 3, and 5 years were found to be larger than 0.80, suggesting a higher level of effectiveness of the risk model (Figure. 4. f). In conclusion, our prognostic model demonstrated outstanding predictive accuracy for PRAD.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003eEvaluation of independent prognosis-predictive factors and nomogram development\u003c/h2\u003e \u003cp\u003eThe study employed univariate and multivariate Cox analyses to determine independent predictive factors to assess clinical features and risk scores. RiskScore demonstrated superior predictive capability compared to the T, M, and N stages. (Figure. 4.g, h). The nomogram model (Figure. 4. i) incorporated the independent prognostic variables. Furthermore, the calibration curve demonstrated the substantial predictive efficacy of the model (Figure. 4. j). The accuracy of 1-, 3-, and 5-year predictions for was assessed using the Net benefit strategy (Figure. 4. k-m). Consequently, the findings of the study revealed that risk score served as an independent prognosis-predictive factor and that the nomogram exhibited a high predictive capability for DFS in individuals with PRAD.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003eGSEA between high‑ and low-risk groups\u003c/h2\u003e \u003cp\u003eTo explore the influence of both risk groups on cancer advancement, GSEA was conducted to identify the most significant enrichment pathways distinguishing between the two. The findings indicated that the high-risk group exhibited significant enrichment in immunological processes, including TNF-α and Interferon-α pathways, whereas the low-risk group exhibited enrichment primarily in androgen and acid metabolism pathways (Figure. 5. a-c). Furthermore, GSVA was performed on all genes in both risk groups. The findings indicated that the high-risk group exhibited significant expression in processes related to cell communication, cyclase regulation, and cyclic nucleotide phosphodiesterase. Moreover, the low-risk group exhibited activation in the areas of tyrosine kinase, biosynthesis, and metabolism (Figure. 5. d).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003eImmune infiltration analysis\u003c/h2\u003e \u003cp\u003eThe ssGSEA methodology was utilized to calculate infiltration scores for 28 distinct immune cell populations across various risk groups. The findings unveiled statistically significant differences in infiltration levels among 17 immune cell species (Figure. 5. e). The heatmap and Pearson correlation analysis demonstrated a significant association between prognosis-predictive genes and the risk score with infiltrating immune cells (Figure. 5. f, g). It was discovered that missense mutations and SNPs dominated the PRAD patients (Figure. 5. h). The mutation analysis outcomes between both risk groups revealed that the majority of mutation types in both groups were missense mutations. The high-risk group exhibited a larger percentage of mutations in comparison to the low-risk group (Figure. 5. i, j). Overall, the findings of this study highlighted that predicted genes, immune cell infiltration, and site mutations are relevant in PRAD.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003eExperimental validation of the analysis results\u003c/h2\u003e \u003cp\u003eIn order to validate the accuracy of our analysis, we collected kidney samples from normal individuals and patients with PRAD. We found that genes with high expression in PRAD patients with high riskscore included NDUFA4L2, GDPD3, ITGAX, and FAM167B (Figure. 6. a), which is consistent with our previous analysis results. Based on the clinical information of the patients, we divided the PRAD patients into stage I and stage III. By quantifying the expression levels of these genes through immunohistochemistry and immunofluorescence(Figure. 6. b-d), we found that the prognostic model can well reflect the different stages of tumor patients and has a high degree of accuracy.\u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003ePRAD is one of the most frequent cancers globally, and its prevalence is increasing in many nations[\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. Despite recent efforts to improve PRAD care, the varied and aggressive aspects of PRAD remain limited for prognostic evaluation[\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]. Thus, the screening of novel biomarkers to aid in the development of patient-specific therapies and enhance prognosis remains crucial and imperative. Unlike bulk RNA-seq, which provides insights into the average gene expression levels across cell populations, scRNA-seq has emerged as a valuable technique for transcriptional stratification. By delineating cell subpopulations and identifying specific biomarkers, scRNA-seq enables the characterization of heterogeneity among distinct cell types in multiple cancer types, including PRAD. Consequently, this investigation executed a comprehensive analysis of both bulk RNA-seq and scRNA-seq data, culminating in the development of a risk model that exhibits outstanding prognosis-predictive value in the context of PRAD.\u003c/p\u003e \u003cp\u003eInitially, an in-depth investigation was conducted using scRNA-seq data comprising 162,897 cells to identify seven crucial cell types present in the PRAD tumor. These cell types included epithelial cells, fibroblasts, endothelial cells, T cells, monocyte macrophages, B cells, and mast cells. Remarkably, a high cellular communication frequency was observed among these cell types. Notably, our analysis detected the substantial activation of the MIF signaling pathway during the progression of PRAD. Interestingly, similar findings have been reported in colorectal cancer (CRC), where the MIF/CD74 pathway has been implicated in regulating disease progression[\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]. In PRAD, it was further revealed that the CD74-CXCR4 axis plays a pivotal role in this abnormally active ligand-receptor interaction within the MIF pathway. These findings suggested a potential parallel between CRC and PRAD, whereby CD74 may be critically involved in promoting the onset and advancement of PRAD. The functional analysis of the DEGs identified in the TCGA dataset revealed significant enrichment in the NABA MATRISOME ASSOCIATED signaling pathway and the MITOTIC CELL CYCLE PROCESS signaling pathway[\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]. These pathways are believed to contribute to the proliferation and advancement of PRAD. Prior research has consistently reported alterations in cyclins, TP53, and Rb genes, which are known to be involved in PRAD[\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eAdditionally, the expression of NABA MATRISOME ASSOCIATED has been observed in various diseases, particularly in epithelial cells[\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e]. This observation may be associated with the more infiltrative nature of PRAD epithelial cells. Moreover, through the application of WGCNA analysis, the brown and blue modules, comprising 2,120 genes, were identified as key modules. By intersecting the three aforementioned gene sets, a total of 425 candidate genes were collected to enhance the stability of the gene signature.\u003c/p\u003e \u003cp\u003eSubsequently, a gene prognosis-predictive model was constructed utilizing univariate Cox regression analysis and the LASSO algorithm[\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. The ROC curve demonstrated that the model exhibited promising predictive efficacy for prognosis in PRAD. Furthermore, the results of this study indicated that this gene prognostic model functioned as an independent predictive factor for DFS among PRAD patients. Distinguished from other models, our signatures were derived through the integration of diverse datasets and utilizing various algorithms. These signatures underwent rigorous validation using an independent training set to ensure robustness. Remarkably, our model exhibited impressive AUC values ranging from 0.800 to 0.823, further highlighting its superior performance and reliability.\u003c/p\u003e \u003cp\u003eAdditionally, all samples were categorized into low- and high-risk groups based on the calculated risk score. The observations in this research unveiled that the high-risk group exhibited significant enrichment in immune processes and immune-related pathways. This led to the hypothesis that the risk score could function as a potential prognostic marker for individuals with PRAD undergoing immunotherapy. To further explore this hypothesis, multiple aspects were examined, including tumor mutational burden (TMB), immune infiltration, and associated risk genes. The comprehensive analysis revealed that the high-risk group exhibited higher TMB and remarkably increased levels of immune cell infiltration. These outcomes collectively indicate that the risk score holds promise as a valuable tool for predicting the response to immunotherapy in individuals with PRAD.\u003c/p\u003e \u003cp\u003eWith utmost consideration given to tumor heterogeneity, interactions among different cell populations, immune infiltration, TMB, and clinical characteristics, the strength of this study lies in the identification and development of a novel predictive model. This model demonstrates exceptional accuracy in distinguishing survival outcomes. The findings obtained from this study provide compelling evidence for the stratification of PRAD patients based on these factors. Nevertheless, this study has a number of unavoidable limitations. Specifically, future research based on these findings should continue to investigate the unclear regulation mechanisms of hallmark genes in PRAD.\u003c/p\u003e"},{"header":"Conclusions","content":"\u003cp\u003eThrough the integration of scRNA-seq and bulk RNA-seq data, multiple machine-learning techniques were employed to develop a novel prognosis-predictive model for predicting DFS among individuals with PRAD. This model enables the accurate prediction of DFS probability in PRAD patients. Furthermore, this study highlights the potential of the risk score as an independent prognosis-predictive factor that exhibits a strong correlation with the immune microenvironment. The comprehensive findings of this study establish it as a reliable predictor of efficacy in PRAD, paving the way for future targeted therapeutic approaches for this disease.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgments\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe thank the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) Database for sharing a large amount of data.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eHao Lin, Xiangqiu Chen, and Qingchun Zhou contributed equally to the study\u0026rsquo;s inception and design. Hao Lin performed bioinformatics analyses and assisted with the analysis of other data. Xiangqiu Chen and Qingchun Zhou collect samples and analyze experimental data. Tao Wu, Zhiqiang Wen, and Ziliang Ji provided assistance. Donglin Sun, Qingyou Zheng and Xichun Zheng helped to revise the manuscript. All authors have read and approved the final manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work was supported by Research Foundation of Shenzhen Hospital of Southern Medical University (project number 22H3AGZR07).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability of data and materials\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe analyzed datasets generated during the study are available from the corresponding author upon reasonable request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll clinical trials were authorized by the ethics committees at Shenzhen Hospital, Southern Medical University.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eSung H, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin. 2021;71(3):209\u0026ndash;49.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLalonde E, et al. Tumour genomic and microenvironmental heterogeneity for integrated prediction of 5-year biochemical recurrence of prostate cancer: a retrospective cohort study. Lancet Oncol. 2014;15(13):1521\u0026ndash;32.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShao N, et al. Immunotherapy and endothelin receptor antagonists for treatment of castration-resistant prostate cancer. Int J Cancer. 2013;133(7):1743\u0026ndash;50.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFujita K, et al. Serum core-type fucosylated prostate-specific antigen index for the detection of high-risk prostate cancer. Int J Cancer. 2021;148(12):3111\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJayaram A, Attard G. Diagnostic Gleason score and castration-resistant prostate cancer. Ann Oncol. 2016;27(6):962\u0026ndash;4.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHong M, et al. RNA sequencing: new technologies and applications in cancer research. J Hematol Oncol. 2020;13(1):166.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWu T, et al. Targeting HIC1/TGF-β axis-shaped prostate cancer microenvironment restrains its progression. Cell Death Dis. 2022;13(7):624.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHao M, et al. HIC1 loss promotes prostate cancer metastasis by triggering epithelial-mesenchymal transition. J Pathol. 2017;242(4):409\u0026ndash;20.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSlovin S, et al. Single-Cell RNA Sequencing Analysis: A Step-by-Step Overview. Methods Mol Biol. 2021;2284:343\u0026ndash;65.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYu X, et al. Single-cell omics traces the heterogeneity of prostate cancer cells and the tumor microenvironment. Cell Mol Biol Lett. 2023;28(1):38.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen Z, et al. Dissecting the single-cell transcriptome network underlying esophagus non-malignant tissues and esophageal squamous cell carcinoma. EBioMedicine. 2021;69:103459.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eButler A, et al. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol. 2018;36(5):411\u0026ndash;20.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBecht E, et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol; 2018.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAran D, et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat Immunol. 2019;20(2):163\u0026ndash;72.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang X, et al. CellMarker: a manually curated resource of cell markers in human and mouse. Nucleic Acids Res. 2019;47(D1):D721\u0026ndash;d728.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYu G, et al. clusterProfiler: an R package for comparing biological themes among gene clusters. Omics. 2012;16(5):284\u0026ndash;7.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQiu X, et al. Reversed graph embedding resolves complex single-cell trajectories. Nat Methods. 2017;14(10):979\u0026ndash;82.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJin S, et al. Inference and analysis of cell-cell communication using CellChat. Nat Commun. 2021;12(1):1088.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBrowaeys R, Saelens W, Saeys Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods. 2020;17(2):159\u0026ndash;62.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhou Y, et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat Commun. 2019;10(1):1523.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLangfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFriedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw. 2010;33(1):1\u0026ndash;22.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSubramanian A, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545\u0026ndash;50.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eH\u0026auml;nzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics. 2013;14:7.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRitchie ME, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMayakonda A, et al. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 2018;28(11):1747\u0026ndash;56.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang X, et al. Pan-Cancer Analysis of PARP1 Alterations as Biomarkers in the Prediction of Immunotherapeutic Effects and the Association of Its Expression Levels and Immunotherapy Signatures. Front Immunol. 2021;12:721030.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZheng X, et al. Tumor-antigens and immune landscapes identification for prostate adenocarcinoma mRNA vaccine. Mol Cancer. 2021;20(1):160.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKlemke L, et al. Hsp90-stabilized MIF supports tumor progression via macrophage recruitment and angiogenesis in colorectal cancer. Cell Death Dis. 2021;12(2):155.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCai Y et al. \u003cem\u003eHeparin-Binding Protein: A Novel Biomarker Linking Four Different Cardiovascular Diseases.\u003c/em\u003e Cardiol Res Pract, 2020. 2020: p. 9575373.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOlshan AF, et al. Alterations of the p16 gene in head and neck cancer: frequency and association with p53, PRAD-1 and HPV. Oncogene. 1997;14(7):811\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSocovich AM, Naba A. The cancer matrisome: From comprehensive characterization to biomarker discovery. Semin Cell Dev Biol. 2019;89:157\u0026ndash;66.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSuchting R, et al. Using Elastic Net Penalized Cox Proportional Hazards Regression to Identify Predictors of Imminent Smoking Lapse. Nicotine Tob Res. 2019;21(2):173\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"},{"header":"Tables","content":"\u003cp\u003eTable 1 is available in the Supplementary Files section.\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Prostate adenocarcinoma, Prognosis, Immune landscape, Bulk RNA-seq, scRNA-seq","lastPublishedDoi":"10.21203/rs.3.rs-3912322/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-3912322/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e \u003cp\u003eThe prognostic management of Prostate adenocarcinoma (PRAD) presents a considerable challenge to healthcare professionals. However, it fails to accurately capture the fundamental cellular and molecular functions within tumor cells.\u003c/p\u003e\u003ch2\u003eMethods\u003c/h2\u003e \u003cp\u003eThe data for PRAD scRNA-seq were retrieved from the Gene Expression Omnibus (GEO) database. The limma program was utilized to identify differentially expressed genes (DEGs) in PRAD patients that exert an influence on overall survival (OS). For the identification of key modules associated with PRAD, Weighted Gene Correlation Network Analysis (WGCNA) was employed. The intersection of core cell marker genes, PRAD key module genes, and DEGs was utilized to build a predictive model using univariate Cox and Least Absolute Shrinkage and Selection Operator (LASSO) analyses. Furthermore, we conducted experimental validation by collecting patient samples.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e \u003cp\u003eAnalysis of 162,897 scRNA-seq datasets and identified 7 central cell types. From the scRNA-seq dataset, 1805 marker genes were identified, while the bulk RNA-seq dataset yielded 1086 DEGs. Additionally, 2545 genes were linked to a key module identified through WGCNA. A predictive model was derived from the expression levels of 21 signature genes following intersection, univariate Cox, and LASSO analyses. And we confirmed the accuracy of our analysis through the patient samples we collected.\u003c/p\u003e\u003ch2\u003eConclusion\u003c/h2\u003e \u003cp\u003eThis study developed a unique prognosis-predictive model to predict the survival condition of individuals with PRAD through the integration of scRNA-seq and bulk RNA-seq data. The risk score emerges as a potential independent predictive indicator, demonstrating a strong relationship with the immunological microenvironment.\u003c/p\u003e","manuscriptTitle":"Establishing a Prognostic Model in Prostate Adenocarcinoma through Comprehensive scRNA-Seq and Bulk RNA-Seq Analysis and Validation","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-02-02 17:36:05","doi":"10.21203/rs.3.rs-3912322/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"ee738158-c3e4-44df-bce3-65e690696634","owner":[],"postedDate":"February 2nd, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2024-02-08T07:03:01+00:00","versionOfRecord":[],"versionCreatedAt":"2024-02-02 17:36:05","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-3912322","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-3912322","identity":"rs-3912322","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
Text is read by the "Ask this paper" AI Q&A widget below.
Extraction quality varies by source — PMC NXML preserves structure
cleanly, OA-HTML may include some navigation residue, and OA-PDF can
have broken hyphenation. The publisher copy
(via DOI)
is the canonical version.