Bioinformatics analysis and validation of CD44 and associated biomarkers in prostate cancer

Research Square
Article

Yangyang Xu, Junhao Lin, Fulin Wu, Jianbo Liang, Wei Li

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7019627/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review
Version 1 posted 10

Abstract

Prostate cancer (PCa) development is linked to CD44, but its pathogenesis is unclear. This study explored the mechanisms of PCa linked to CD44 through bioinformatics and experimental approaches. Transcriptomic data analysis revealed CD44 localization on chromosome 11 and enrichment in the ribosome pathway. Differential expression analysis identified candidate genes, and two biomarkers, PLA2G4D and SERPINB5, were selected through protein-protein interaction (PPI) analysis, machine learning, and expression validation.
These biomarkers showed lower expression in the low-CD44 and knockout groups than in controls. The artificial neural network (ANN) model demonstrated high predictive accuracy (area under the curve (AUC) = 0.825). Functional analysis showed PLA2G4D enrichment in the protein export pathway and SERPINB5 enrichment in dorsoventral axis formation. Twenty-two differential immune cells were identified, and CD44, PLA2G4D, and SERPINB5 each correlated positively with natural killer (NK) cells (p < 0.05). Estradiol was identified as a targeted drug for all three genes. Reverse transcription-quantitative polymerase chain reaction (RT-qPCR) confirmed the downregulation of CD44, PLA2G4D, and SERPINB5 in tumor samples (p < 0.01). This study identified two CD44-related biomarkers, offering new therapeutic avenues and insights for PCa treatment.

Subject areas: Health sciences/Biomarkers; Biological sciences/Cancer; Biological sciences/Computational biology and bioinformatics; Health sciences/Oncology
Keywords: Prostate cancer; CD44; Biomarkers; Transcriptome sequencing analysis

1. Introduction

Prostate cancer (PCa) has become the second most common carcinoma in males worldwide: new cases exceed one million annually, and nearly 400,000 people die from it each year [1]. Demographic changes worldwide, leading to a gradual increase in the aging population, are expected to result in nearly 3 million new cases of prostate cancer annually by 2020 [2]. Maximal androgen blockade (MAB), which combines androgen deprivation therapy (ADT) with an anti-androgen (AA) and is used to treat advanced PCa, has shown more benefit than monotherapy [3]. Treatment methods for prostate cancer are now relatively mature and diversified, and mainly include surgical treatment, chemotherapy, endocrine therapy, cryotherapy, radiotherapy, and immunotherapy [4].
However, the mortality rate of prostate cancer remains high, and treatment remains challenging. A wealth of research supports the role of cancer stem cells (CSCs) and their associated markers, such as cluster of differentiation 44 (CD44), in the malignant behavior of tumors [5]. CD44 is a non-kinase cell surface transmembrane glycoprotein. When it is overexpressed in cancer stem cells, alternative splicing may occur and promote cancer progression, which can significantly compromise treatment efficacy [6]. The CD44 gene is located on human chromosome 11 [7] and mouse chromosome 2 [8], and its product primarily functions as a single polypeptide chain encoded by a conserved gene. Hyaluronic acid (HA), a major component of the extracellular matrix (ECM) expressed by stromal cells and cancer cells, serves as a primary ligand for CD44 [9]. Once HA binds to CD44, it induces a conformational change that allows adapter proteins or cytoskeletal components to bind to the intracellular domain of CD44. This in turn activates various signaling pathways, resulting in cellular phenomena such as proliferation, adhesion, migration, and invasion [10, 11]. When tumor cells undergo epithelial-mesenchymal transition (EMT), they acquire stem-like properties and CD44 expression increases [12]. In addition, tumor cells with an EMT phenotype exhibit higher invasiveness and greater resistance to chemotherapy [13]. Studies have shown that the CD44 gene is associated with the occurrence and development of prostate cancer. Specifically, loss of CD44 staining in both primary and metastatic lesions is a predictor of biochemical recurrence after radical prostatectomy [14]. The CD44 gene thus plays a crucial role in the occurrence and development of prostate cancer.
Its expression and function are associated with the aggressive phenotype of prostate cancer cells, including proliferation, migration, and invasion [15]. Therefore, CD44 may serve as a diagnostic and prognostic marker, as well as a potential therapeutic target, in prostate cancer.

Our study identified CD44-related biomarkers in PCa from transcriptome data and analyzed CD44 and these biomarkers with bioinformatics methods, including functional enrichment analysis, immune infiltration analysis, and drug prediction analysis, followed by experimental validation. The findings provide a reference for further exploration of the possible pathogenesis of prostate cancer, for investigation of targeted therapies centered on new targets, and for optimization of treatment methods.

2. Results

2.1 Localization, molecular regulation, gene expression, and functional enrichment analysis of CD44

The specific distribution of genes on chromosomes facilitates a deeper understanding of their functions. Chromosomal mapping indicated that CD44 was located on chromosome 11 (Fig. 1a). Subcellular localization analysis indicated that the CD44 protein was most likely located extracellularly, which had the highest score (Extracellular = 3.010) (Fig. 1b). Molecular regulatory networks play a crucial role in elucidating the principles of gene regulation within cells and in understanding the pathogenesis of related diseases. CD44 was associated with 2,099 microRNAs (miRNAs) in the miRWalk database, 239 miRNAs in the miRDB database, and 911 miRNAs in the TargetScan database. The three databases shared 133 miRNAs (for example, hsa-miR-18a-3p, hsa-miR-548c-5p, and hsa-miR-520c-3p) (Fig. 1c). Subsequently, the StarBase database predicted a total of 111 lncRNAs (for example, SNHG3, XIST, and NORAD).
Afterwards, the lncRNA-miRNA-mRNA regulatory network was constructed (Fig. 1d and Supplementary Table S1). The gene expression data indicated a significant reduction in CD44 expression in the tumor group relative to the normal group (Fig. 1e). Moreover, the top 5 pathways significantly enriched for the CD44 gene were ribosome, oxidative phosphorylation, glycosphingolipid biosynthesis globo series, Parkinson's disease, and proteasome (Fig. 1f).

2.2 Identification and functional analysis of candidate genes

Initially, a total of 208 differentially expressed genes (DEGs1) were obtained between the high-CD44 and low-CD44 groups, including 87 up-regulated and 121 down-regulated genes (Fig. 2a-b). Differential expression analysis of tumor samples versus the normal group then revealed 3,006 DEGs (DEGs2), with 1,273 up-regulated and 1,733 down-regulated in tumor samples (Fig. 2c-d). Intersecting DEGs1 and DEGs2 yielded 124 candidate genes (Fig. 2e). The candidate genes were markedly enriched in 364 GO terms, comprising 279 biological processes (BPs) (for example, cell fate determination), 28 cellular components (CCs) (for example, blood microparticles), and 57 molecular functions (MFs) (for example, serine-type endopeptidase inhibitor activity), as well as in 25 KEGG signaling pathways (for example, renin secretion) (Fig. 2f-g). Moreover, PPI network analysis of the candidate genes yielded 69 nodes (for example, AGT, FOXG1, and FGB) and 116 edges (Fig. 2h). These 69 genes were defined as candidate feature genes for subsequent analyses.

2.3 Identification of key genes and biomarkers

First, the optimal LASSO model was obtained at the smallest error (lambda.min = -4.5207). Consequently, 19 LASSO genes related to PCa were identified through the LASSO algorithm (Fig. 3a, b). Second, 29 Boruta genes were obtained using the Boruta algorithm (Fig. 3c).
Additionally, the XGBoost algorithm identified 34 XGBoost genes (Fig. 3d). Intersecting the genes from the 3 algorithms yielded 9 key feature genes (Fig. 3e). The Wilcoxon test revealed that CLCA2, CRISP3, PLA2G4D, SERPINB5, and SPINK1 exhibited significant and consistent expression differences in The Cancer Genome Atlas Prostate Adenocarcinoma (TCGA-PRAD) dataset and validation set 1, and they were thus designated as key genes. Specifically, in tumor samples, CRISP3 and SPINK1 were significantly upregulated, while CLCA2, PLA2G4D, and SERPINB5 were significantly downregulated (Fig. 3f-g). Subsequently, principal component analysis (PCA) of validation set 2 delineated a clear separation between the knockout and control samples, with the former demonstrating reduced expression levels relative to the latter (Fig. 3h-i). Notably, PLA2G4D and SERPINB5 exhibited congruent expression profiles across both TCGA-PRAD and validation set 2, thereby qualifying them as biomarkers (Fig. 3j).

2.4 ANN model of CD44, PLA2G4D, and SERPINB5 in PCa

The model developed to assess the accuracy of CD44, PLA2G4D, and SERPINB5 in predicting PCa effectively distinguished between the tumor and normal groups (Fig. 4a). The model achieved an AUC of 0.825 on TCGA-PRAD (AUC > 0.7) (Fig. 4b), supporting its predictive performance.

2.5 Functional assessment and immune infiltration analysis

Gene set enrichment analysis (GSEA) revealed that PLA2G4D was associated with 60 pathways and SERPINB5 with 69 pathways. Notably, PLA2G4D was primarily enriched in pathways such as protein export, hematopoietic cell lineage, and asthma (Fig. 5a). For SERPINB5, the leading pathways included dorsoventral axis formation, glycosphingolipid biosynthesis globo series, and glyoxylate and dicarboxylate metabolism (Fig. 5b).
Subsequently, the estimated proportions of 28 distinct immune cell types across samples in the training set were depicted in Fig. 5c. Comparison of immune cell infiltration levels between the tumor and normal groups showed that 22 immune cell types (for example, activated B cell, activated CD4 T cell, and eosinophil) differed significantly between the 2 groups (p < 0.05) (Fig. 5d). Correlation analysis revealed positive associations of natural killer (NK) cells with CD44 (cor = 0.35, p < 0.05), PLA2G4D (cor = 0.41), and SERPINB5 (cor = 0.59, p < 0.05) (Fig. 5e).

2.6 Network construction of CD44, PLA2G4D, and SERPINB5

Subsequently, genes functionally related to CD44, PLA2G4D, and SERPINB5, such as SLC9A1, HYAL2, and ABCC5, were explored. These genes were mainly enriched in terms including hyaluronan metabolic process, phospholipase A2 activity, mucopolysaccharide metabolic process, apical part of cell, phospholipid catabolic process, and glycosaminoglycan catabolic process (Fig. 6a).

2.7 Compounds related to CD44, PLA2G4D, and SERPINB5 and molecular docking

To explore compounds targeting CD44, PLA2G4D, and SERPINB5, candidate compounds were predicted through the DSigDB database. The drug Tetradioxin, which had the highest Combined Score, was selected for molecular docking (Supplementary Table S2 and Fig. 6b). Molecular docking gave a binding energy of −6.4 kcal/mol between CD44 and Tetradioxin, with a hydrogen bond to amino acid residue GLU-67. The binding energy between PLA2G4D and Tetradioxin was −6.5 kcal/mol, with a hydrogen bond to residue THR-423. The binding energy between SERPINB5 and Tetradioxin was −5.5 kcal/mol, with a hydrogen bond to residue ASN-266. CD44, PLA2G4D, and SERPINB5 therefore all showed good binding ability with Tetradioxin (Table 1 and Figs.
6c-e).

Table 1. Results of molecular docking of Tetradioxin with CD44, PLA2G4D, and SERPINB5.

Symbol      Target_drug    Binding energy (kcal/mol)
CD44        Tetradioxin    -6.4
PLA2G4D     Tetradioxin    -6.5
SERPINB5    Tetradioxin    -5.5

2.8 The expression levels of CD44, PLA2G4D, and SERPINB5 were validated in clinical samples

RT-qPCR showed that the expression of CD44, PLA2G4D, and SERPINB5 was significantly downregulated in tumor samples (p < 0.01) (Fig. 7a-c). This was consistent with the bioinformatics analysis and supports the reliability of its results.

3. Discussion

This study is based on transcriptome data of prostate cancer and aimed to identify CD44-related biomarkers in this disease. We obtained several biomarkers, including CD44, PLA2G4D, and SERPINB5, through a series of methods such as PPI analysis, machine learning algorithms, and expression validation. The predictive capability of these biomarkers was assessed using an ANN model. Subsequently, functional analysis, immune analysis, and drug prediction were conducted on these biomarkers. Overall, this provides a reference for exploring the potential pathogenesis of PCa, identifying new therapeutic targets for the development of targeted therapeutics, and optimizing treatment methods.

The protein encoded by the CD44 gene is a cell surface glycoprotein that is closely related to cell adhesion, cell-cell interactions, and cell migration [16–18]. It serves as a receptor for hyaluronic acid and can also interact with other ligands, such as osteopontin, collagen, and matrix metalloproteinases (MMPs) [19, 20]. This protein is involved in multiple cellular functions, such as lymphocyte activation, recirculation, and homing, hematopoiesis, and tumor metastasis.
Transcripts of this gene undergo alternative splicing to produce functionally distinct isoforms. Splicing variability is the foundation for the structural and functional diversity of proteins, and this characteristic may be closely related to tumor progression and metastasis, as well as to pathways such as glycosaminoglycan metabolism and the innate immune system. CD44 serves as a biomarker and therapeutic target for stem cells and tumors [5, 21, 22]. CD44 is notably enriched in the ribosome pathway, indicating a potential role in protein synthesis. In two datasets, CD44 expression in both the low-CD44 and knockout groups was lower than in the high-CD44 controls. This suggests that CD44 is significant in the progression of prostate cancer. Moreover, the high predictive accuracy of the ANN model that includes CD44 highlights its importance as a potential biomarker for prostate cancer.

The biomarker PLA2G4D was identified through a set of analyses. The PLA2G4D gene encodes an enzyme of the phospholipase A2 (PLA2) family, which primarily hydrolyzes the ester bond at the sn-2 position of glycerophospholipids, thereby releasing free fatty acids and lysophospholipids that are important in biological processes such as inflammatory responses, cell signaling, and lipid metabolism [23]. PLA2G4D encodes a cytosolic phospholipase A2 (cPLA2), which plays a key role in the inflammatory process [24] and has been associated with schizophrenia [25] and studied in relation to psoriasis [26]. In addition, high expression of PLA2G4D is associated with poor prognosis in prostate cancer and cervical squamous cell carcinoma [27]. In our study, GSEA showed that PLA2G4D is mainly enriched in the protein export pathway. This finding implies that PLA2G4D might play a role in regulating protein trafficking, which could affect cellular processes in prostate cancer.
The lower expression of PLA2G4D in the low-CD44 and knockout groups compared with the high-CD44 controls also highlights its potential connection with disease progression.

Another biomarker, SERPINB5, is mainly associated with the dorsoventral axis formation pathway. SERPINB5 (Serpin Family B Member 5) is a protein-coding gene. Its related pathways include apoptosis, autophagy, and angiogenesis. Published studies show that SERPINB5 is involved in the development of various cancers, such as liver cancer, nasopharyngeal cancer, gallbladder cancer, and colorectal cancer [28–30]. Both SERPINB5 and SERPINB1 are members of the serpin family, but they have different functions and mechanisms of action. SERPINB5 is primarily associated with tumor suppression, whereas SERPINB1 is mainly involved in anti-inflammation and in protecting tissues from excessive protease degradation. Studies have shown that epigenetic silencing of SERPINB1 promotes inflammation-mediated progression of prostate cancer [31]. The lower expression of SERPINB5 in the low-CD44 and knockout groups indicates its potential involvement in the development of prostate cancer.

The positive correlations of CD44, PLA2G4D, and SERPINB5 with NK cells suggest that these genes may influence the immune response in prostate cancer. The results of this study showed that 22 immune infiltrating cell types differed significantly between the disease and control groups. Of the three genes CD44, PLA2G4D, and SERPINB5, the first two were most positively correlated with natural killer cells, and PLA2G4D was most positively correlated with central memory CD4 T cells. Natural killer (NK) cells are a main source of cytokines and cytoplasmic granules [32]. The research findings of Amirhossein et al.
confirmed that Gal-9 interacts with CD44 and that this interaction plays a crucial role in activating NK cells. Gal-9 enhances NK cell activity by interacting with CD44 [33], suggesting that Gal-9 is a potential new therapeutic avenue for modulating NK cell effector functions. Furthermore, adoptive immunotherapy utilizing NK cells has shown significant efficacy in the treatment of hematological malignancies [34]. The high correlation between these three genes and immune activation suggests a close relationship with tumor cell killing, which in turn can influence the occurrence and progression of tumors.

Cancer stem cells have been confirmed to be associated with cancer progression and metastasis [35]. MiRNAs regulate both normal stem cells and cancer stem cells [36], and miRNA dysregulation is closely related to the initiation and progression of tumors [37]. Can Liu et al. demonstrated that miR-34a is a target gene of p53 and that its expression is downregulated in CD44(+) prostate cancer cells purified from xenograft and primary tumors. Enforced expression of miR-34a in CD44(+) prostate cancer cells inhibits the expansion, regeneration, and metastasis of tumor cells. Conversely, expression of miR-34a antagonists in CD44(-) prostate cancer cells leads to tumor progression and deterioration [38]. Furthermore, studies have demonstrated that NUMB is directly targeted by microRNA-9-5p (miR-9-5p), an oncogenic microRNA associated with poor prognosis in various malignancies. The levels of miR-9-5p are inversely correlated with the expression of NUMB in CD44(+) prostate cancer stem cells. miR-9-5p reduces the expression of NUMB and inhibits multiple properties of prostate cancer stem cells, including proliferation, migration, invasion, and self-renewal [39].
Based on our research findings, we predicted that the common drug estradiol (E2) may play a role in the treatment of prostate cancer. Research has confirmed that the status of cyclin D1 is closely associated with cell cycle disorders and that its effects lead to a reduction in cell number. Low concentrations (0.1 nM) of E2 can effectively trigger these mechanisms, significantly improving treatment outcomes for patients with both early and late-stage prostate cancer while avoiding the side effects associated with high-dose drug treatment [40]. Meanwhile, a study by Mark Stein et al. showed that transdermal estradiol treatment is safe, has biochemical activity, and remains of value in heavily pre-treated patients with advanced castration-resistant and chemotherapy-refractory metastatic prostate cancer [41].

Overall, the identification and characterization of CD44, PLA2G4D, and SERPINB5 provide guidance for a deeper understanding of the molecular mechanisms underlying prostate cancer and for the search for potential therapeutic targets. In our study, the expression levels of these three genes were significantly lower in both the training and validation sets than in the control samples. The PCR results are consistent with those of the bioinformatics analysis, which supports the reliability of the results. Through functional analysis of CD44, we identified two biomarkers and significantly enriched pathways, as well as associations of the biomarkers with the immune microenvironment and targeted drugs. However, because the sample size in our study is relatively small, the overall results depend heavily on the quality of the data, the accuracy of the algorithms, and the assumptions underlying the analysis.
Moreover, due to the complexity and diversity of biological systems, the results may not fully reflect actual biological processes, and experimental validation is needed to confirm their authenticity and reliability. Further studies are needed to clarify the exact role of these genes and their potential as biomarkers.

4. Materials and methods

4.1 Data collection and transcriptome sequencing

The transcriptome data were retrieved from the UCSC Xena and GEO databases. The training set, which included 498 prostate adenocarcinoma (PRAD) tumor tissue samples and 52 normal tissue samples, was designated as The Cancer Genome Atlas (TCGA)-PRAD [42]. All the above data were downloaded on July 1, 2024. Validation set 1 was obtained from GSE21034 (GPL10264 platform) in the GEO database and contained 179 PCa samples [43]. Validation set 2 included 3 PCa cell samples and 3 PCa cell samples with the CD44 gene knocked out, and differences in the expression of downstream genes were tested by sequencing. The group in which the CD44 gene was knocked out by a standard method was labeled the SI group. The other group received no gene-level processing and was labeled the NC control group. RNA was extracted from the NC and SI groups by a standard method, and the extracted RNA was then assessed for integrity, purity, and precision to obtain qualified RNA. From the qualified RNA, messenger RNA (mRNA) was obtained by enriching for poly(A) tails and removing ribosomal RNA, and DNA was amplified after reverse transcription with the mRNA as a template. After the DNA libraries were constructed and passed inspection, the pooled libraries were sequenced on the Illumina HiSeq platform.
Sequencing followed the stipulated effective concentration and the target amount of data to be generated, in order to acquire the sequence details of the fragments.

4.2 Chromosome localization, subcellular localization, and molecular regulation network of CD44

To understand the distribution of CD44 on chromosomes and thereby its function, the location of CD44 was visualized with the RCircos package (v 1.2.2) [44]. In addition, the subCELlular LOcalization predictor (CELLO) tool was employed to analyze the subcellular localization of the protein encoded by CD44. To further investigate the potential molecular regulatory mechanisms of CD44, its upstream miRNAs were predicted using the miRWalk, miRDB, and TargetScan databases. The intersection of the miRNAs predicted by the 3 databases was taken to identify key miRNAs. Using the StarBase database (https://rnasysu.com/encori/), lncRNAs were predicted from the key miRNAs. The lncRNA-miRNA-mRNA regulatory network was visualized with Cytoscape software (v 3.6.1) [45].

4.3 Gene expression analysis and enrichment analysis of CD44

To investigate differences in CD44 expression between tumor samples and normal samples in TCGA-PRAD, the Wilcoxon test was employed (p < 0.05). Additionally, the expression correlation coefficients (cor) between CD44 and all other genes were calculated using Spearman correlation, the genes were ranked by these coefficients, and GSEA was then performed. The reference gene set, c2.cp.kegg.v2023.1.Hs.symbols.gmt, was sourced from MSigDB. GSEA was run with the clusterProfiler package (v 4.7.1.003) [46] (adj.p 1).
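The correlation-based ranking that feeds GSEA in Section 4.3 can be illustrated with a minimal Python sketch. This is not the authors' R code: the function names (`average_ranks`, `spearman`, `rank_genes_by_correlation`), the placeholder gene names, and the toy expression values are hypothetical, and the enrichment step itself (performed in the study with clusterProfiler) is not reproduced. The sketch only shows how genes might be ordered by their Spearman correlation with CD44 before a preranked enrichment analysis.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # assumption of this sketch: a constant gene gets score 0
    return cov / (sx * sy)


def rank_genes_by_correlation(expression, target="CD44"):
    """Sort all genes except the target by correlation with it, highest first."""
    reference = expression[target]
    scores = {
        gene: spearman(reference, values)
        for gene, values in expression.items()
        if gene != target
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical expression values for five samples (not data from the study).
toy_expression = {
    "CD44": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_A": [2.0, 4.0, 6.0, 8.0, 10.0],
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],
    "GENE_C": [1.0, 3.0, 2.0, 5.0, 4.0],
}
ranked = rank_genes_by_correlation(toy_expression)
for gene, rho in ranked:
    print(gene, round(rho, 3))
```

The resulting ordered list of (gene, coefficient) pairs is the kind of input a preranked GSEA expects; in practice the ranking would cover every gene in the expression matrix rather than three placeholders.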
4.4 Identification of candidate genes

First, using the median CD44 expression level as a threshold, PCa samples from the TCGA-PRAD dataset were classified into high-CD44 and low-CD44 groups. The DESeq2 package (v 1.38.0) [47] was applied to identify DEGs between the 2 groups (adj.p 1). The identified DEGs were designated as DEGs1. Subsequently, the ggplot2 package (v 3.5.1) was employed to generate volcano plots of DEGs1, highlighting the top 10 upregulated and downregulated genes. Heatmaps illustrating the expression patterns of the top 10 up-regulated and down-regulated genes were generated using ComplexHeatmap (v 2.14.0) [48]. DEGs between PCa samples and normal samples were selected using the same criteria as for the high-CD44 and low-CD44 groups and were designated as DEGs2. Second, candidate genes were obtained by intersecting DEGs1 and DEGs2. The results were visualized with the ggvenn package (v 0.1.9) [49].

4.5 Enrichment analysis and protein-protein interaction (PPI) network of candidate genes

To investigate the roles and pathways of the candidate genes, GO and KEGG enrichment analyses were performed with the clusterProfiler package (p < 0.05). The results were visualized with the ggplot2 package (v 3.5.1). Subsequently, to explore protein interactions among the candidate genes, a PPI network was constructed using the STRING database with a confidence score ≥ 0.4. The results were visualized using Cytoscape software (v 3.6.1). The genes that exhibited interactions in the PPI network were defined as candidate feature genes.

4.6 Identification of key genes

The LASSO method was used to obtain feature genes from the candidate genes with the glmnet package (v 4.1-4) [50].
The most strongly associated feature genes were selected at the lambda value that yielded the minimal error rate, and these were designated LASSO genes. Additionally, the candidate genes were evaluated with the Boruta algorithm, which uses random forest to determine the significance of features by comparing their importance scores with those of randomly generated shadow features, thereby identifying key predictors (Boruta genes) amid noise. An extreme gradient boosting (XGBoost) model was also constructed with the xgboost package (v 1.7.7.1) [51] to evaluate the impact of each variable on the predictive outcome. Genes that influenced the outcome were denoted as XGBoost genes. The LASSO genes, Boruta genes, and XGBoost genes were intersected to obtain key feature genes for subsequent analyses. To assess the expression patterns of the key feature genes in both the TCGA-PRAD and GSE21034 datasets, we used the rstatix package (v 0.7.2) to perform the Wilcoxon test on all samples in the 2 datasets. Genes that differed markedly between the tumor and normal groups and showed consistent expression trends across both datasets were considered key genes for subsequent analyses (p < 0.05).

4.7 Identification of biomarkers

To evaluate whether PCa samples and PCa-CD44 knockout samples from validation set 2 differed significantly, PCA was conducted on the transcriptome sequencing data using the fast.prcomp function from the FactoMineR package (v 2.7) [52]. Then, to examine CD44 expression in PCa samples and PCa-CD44 knockout samples within validation set 2 and to assess the effect of CD44 knockout, the expression levels of CD44 were analyzed using the Wilcoxon test (p < 0.05).
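To illustrate how a Wilcoxon rank-sum comparison behaves for very small groups such as the three control and three knockout samples of validation set 2, the following minimal Python sketch computes the exact two-sided test by enumerating rank assignments. It is not the study's code (the analyses were run in R); the function name `rank_sum_exact` and the expression values are hypothetical, and the sketch assumes there are no tied values.

```python
from itertools import combinations


def rank_sum_exact(group_a, group_b):
    """Exact two-sided Wilcoxon rank-sum test for small, untied samples.

    Returns the rank sum of group_a and the exact two-sided p-value.
    """
    pooled = sorted(list(group_a) + list(group_b))
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch assumes there are no tied values")
    rank_of = {value: rank for rank, value in enumerate(pooled, start=1)}
    n_a, n_total = len(group_a), len(pooled)
    observed = sum(rank_of[value] for value in group_a)
    expected = n_a * (n_total + 1) / 2  # mean rank sum under the null
    # Under the null hypothesis every choice of n_a ranks is equally likely.
    all_sums = [sum(c) for c in combinations(range(1, n_total + 1), n_a)]
    as_extreme = sum(
        1 for s in all_sums if abs(s - expected) >= abs(observed - expected)
    )
    return observed, as_extreme / len(all_sums)


# Hypothetical normalized CD44 expression values (not data from the study).
control = [8.1, 7.9, 8.4]
knockout = [2.2, 2.5, 1.9]
stat, p_value = rank_sum_exact(knockout, control)
print(stat, p_value)
```

With three samples per group there are only 20 possible rank assignments, so the smallest exact two-sided p-value this test can return is 2/20 = 0.1, even when the groups are completely separated as in the toy data. Larger groups, or a different test, are needed for a rank-sum comparison to reach p < 0.05.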
Expression analysis was conducted using the Wilcoxon test on the high-CD44 and low-CD44 groups of PCa samples from the TCGA-PRAD dataset and on validation set 2 to further identify CD44-related biomarkers in PCa. Genes that differed markedly between the tumor and normal groups and between the high-CD44 and low-CD44 groups, and that showed consistent expression trends across both datasets, were considered biomarkers (p < 0.05).

4.8 Artificial Neural Networks (ANN)

To assess the predictive accuracy of the biomarkers for PCa, their expression data in TCGA-PRAD were transformed into gene scores by min-max normalization. Subsequently, the ANN was developed using the NeuralNetTools (v 1.5.3) [53] and neuralnet (v 1.44.2) [54] packages. To evaluate the ANN model, the pROC package (v 1.18.0) [55] was used to generate the ROC curve.

4.9 GSEA and immune infiltration analysis of biomarkers

To elucidate the pathways associated with the biomarkers in PRAD, GSEA was performed on the biomarkers in PRAD samples from TCGA-PRAD, using the same method as the GSEA for CD44 described above. Immune infiltration analysis offers insights into the tumor microenvironment, identifies immune cell populations, predicts responses to treatments, and aids the development of immunotherapeutic strategies by revealing the complex interactions between immune cells and cancer cells. Initially, the ssGSEA algorithm was employed to determine the enrichment scores for 28 types of immune infiltrating cells [56] across all samples in TCGA-PRAD, using the GSVA package (v 1.46.0) [57]. To assess the differences in the relative abundance of immune cells between tumor and normal groups, the Wilcoxon test was applied (p 0.3, p < 0.05).

4.10 GeneMANIA

The network for the biomarkers and CD44 was constructed using GeneMANIA (https://genemania.org/).
Genes functionally related to the biomarkers and CD44, together with their enriched pathways, were explored through this network.

4.11 Drug prediction and molecular docking

Compounds that might act on CD44 and the biomarkers were predicted using the DSigDB database (http://dsigdb.tanlab.org/DSigDBv1.0/). A Sankey diagram of the compound-biomarker network for the top 25 small-molecule compounds, ranked by Combined Score, was drawn using the ggalluvial package (v 0.12.5) [58]. Molecular docking was then performed between the compound with the highest Combined Score and CD44 as well as the biomarkers. The 3D structure of the compound was downloaded from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/), and the 3D structures of the proteins encoded by the genes were downloaded from the Protein Data Bank (https://www.rcsb.org/). Docking of the small-molecule ligand (compound) with the macromolecular proteins (encoded by the genes) was carried out online using CBdock (https://cadd.labshare.cn/cb-dock/php/blinddock.php). Finally, the CBdock results were visualized using PyMOL (v 3.1.0) [59].

4.12 RT-qPCR validation

To further validate the expression levels of CD44 and the biomarkers in PCa samples and control samples, RT-qPCR experiments were conducted. Five tumor tissue samples from PCa patients and five adjacent non-tumor tissue samples were collected from The People's Hospital of Guangxi Zhuang Autonomous Region for RT-qPCR. This study adhered to the Declaration of Helsinki and was approved by the Ethics Committee of Guangxi Zhuang Autonomous Region People's Hospital (approval no. 2014-010). Informed consent was obtained from all patients. Total RNA was extracted from the samples using TRIZOL (Vazyme, Nanjing, China) according to the manufacturer's instructions.
First-strand complementary DNA (cDNA) was synthesized from 2 µg of total RNA using Hifair® Ⅲ 1st Strand cDNA Synthesis SuperMix for qPCR (Yeasen, Shanghai, China) following the manufacturer's guidelines. RT-qPCR was performed using 2× Universal Blue SYBR Green qPCR Master Mix (Servicebio, Wuhan, China). The primer sequences and reaction procedures are provided in Supplementary Table S1. GAPDH was used as the internal reference gene. Gene expression levels were calculated using the 2^(-ΔΔCt) method (PMID: 11846609). The results were visualized using GraphPad Prism 10.

4.13 Statistical analysis

Statistical analysis was conducted in R (v 4.2.2). Differences between groups were assessed with the Wilcoxon test, and p < 0.05 was considered statistically significant.

Declarations

Consent to participate

Informed consent was obtained from all individual participants included in the study.

Data availability statement

The datasets analysed during the current study are available in the GEO repository (https://www.ncbi.nlm.nih.gov/gds/); the starBase repository (https://rnasysu.com/encori/); the GeneMANIA repository (https://genemania.org/); the DSigDB repository (http://dsigdb.tanlab.org/DSigDBv1.0/); the PubChem repository (https://pubchem.ncbi.nlm.nih.gov/); the Protein Data Bank repository (https://www.rcsb.org/); and the CBdock repository (https://cadd.labshare.cn/cb-dock/php/blinddock.php).

Acknowledgements

We would like to express our sincere gratitude to all individuals and organizations who supported and assisted us throughout this research. Special thanks to the following authors: Wei Li. We also extend our thanks to everyone who has supported and assisted us along the way. Without your support, this research would not have been possible.

Authors' contributions

WL contributed to the experiment conception and design, data analysis, and manuscript draft.
YX and FW conducted the experiments. WL, YX, and FW contributed to the manuscript draft and data analysis. YX and JL contributed to the interpretation of data, the manuscript draft, and manuscript revision. YX and JL are responsible for confirming the authenticity of all the raw data. All authors read and approved the manuscript.

Competing interests

The authors declare that they have no competing interests.

Funding

This work was funded by the National Natural Science Foundation of China (grant no. 81460387) and supported by the Key Special Project of China's Key R&D Program "Active Health and Scientific Response to Aging" (nos. 2021YFC2009300 and 2021YFC2009301).

References

1. Bray, F. et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 74, 229–263 (2024).
2. James, N. D. et al. The Lancet Commission on prostate cancer: planning for the surge in cases. Lancet 403, 1683–1722 (2024).
3. Wilkerson, M. L., Lin, F., Liu, H. & Cheng, L. The application of immunohistochemical biomarkers in urologic surgical pathology. Arch. Pathol. Lab. Med. 138, 1643–1665 (2014).
4. Sekhoacha, M. et al. Prostate Cancer Review: Genetics, Diagnosis, Treatment Options, and Alternative Approaches. Molecules 27, 5730 (2022).
5. Thapa, R. & Wilson, G. D. The Importance of CD44 as a Stem Cell Biomarker and Therapeutic Target in Cancer. Stem Cells Int. 2087204 (2016).
6. Stamenkovic, I., Amiot, M., Pesando, J. M. & Seed, B. A lymphocyte molecule implicated in lymph node homing is a member of the cartilage link protein family. Cell 56, 1057–1062 (1989).
7. Goodfellow, P. N. et al. The gene, MIC4, which controls expression of the antigen defined by monoclonal antibody F10.44.2, is on human chromosome 11. Eur. J. Immunol. 12, 659–663 (1982).
8. Colombatti, A., Hughes, E. N., Taylor, B. A. & August, J. T.
Gene for a major cell surface glycoprotein of mouse macrophages and other phagocytic cells is on chromosome 2. Proc. Natl. Acad. Sci. USA 79, 1926–1929 (1982).
9. Banerjee, S. et al. Impaired Synthesis of Stromal Components in Response to Minnelide Improves Vascular Function, Drug Delivery, and Survival in Pancreatic Cancer. Clin. Cancer Res. 22, 415–425 (2016).
10. Ponta, H., Sherman, L. & Herrlich, P. A. CD44: from adhesion molecules to signalling regulators. Nat. Rev. Mol. Cell Biol. 4, 33–45 (2003).
11. Zöller, M. CD44: can a cancer-initiating cell profit from an abundantly expressed molecule? Nat. Rev. Cancer 11, 254–267 (2011).
12. Mani, S. A. et al. The epithelial-mesenchymal transition generates cells with properties of stem cells. Cell 133, 704–715 (2008).
13. Zhao, S. et al. CD44 Expression Level and Isoform Contributes to Pancreatic Cancer Cell Plasticity, Invasiveness, and Response to Therapy. Clin. Cancer Res. 22, 5592–5604 (2016).
14. Hao, J. L., Cozzi, P. J., Khatri, A., Power, C. A. & Li, Y. CD147/EMMPRIN and CD44 are potential therapeutic targets for metastatic prostate cancer. Curr. Cancer Drug Targets 10, 287–306 (2010).
15. Lai, C. J. et al. CD44 Promotes Migration and Invasion of Docetaxel-Resistant Prostate Cancer Cells Likely via Induction of Hippo-Yap Signaling. Cells 8, 295 (2019).
16. Vikesaa, J. et al. RNA-binding IMPs promote cell adhesion and invadopodia formation. EMBO J. 25, 1456–1468 (2006).
17. Crosby, H. A., Lalor, P. F., Ross, E., Newsome, P. N. & Adams, D. H. Adhesion of human haematopoietic (CD34+) stem cells to human liver compartments is integrin and CD44 dependent and modulated by CXCR3 and CXCR4. J. Hepatol. 51, 734–749 (2009).
18. Yoshida, T., Matsuda, Y., Naito, Z. & Ishiwata, T. CD44 in human glioma correlates with histopathological grade and cell migration. Pathol. Int. 62, 463–470 (2012).
19. Casalino-Matsuda, S. M., Monzon, M. E., Day, A. J. & Forteza, R. M.
Hyaluronan fragments/CD44 mediate oxidative stress-induced MUC5B up-regulation in airway epithelium. Am. J. Respir. Cell Mol. Biol. 40, 277–285 (2009).
20. Midgley, A. C. et al. Transforming growth factor-β1 (TGF-β1)-stimulated fibroblast to myofibroblast differentiation is mediated by hyaluronan (HA)-facilitated epidermal growth factor receptor (EGFR) and CD44 co-localization in lipid rafts. J. Biol. Chem. 288, 14824–14838 (2013).
21. Xu, H., Niu, M., Yuan, X., Wu, K. & Liu, A. CD44 as a tumor biomarker and therapeutic target. Exp. Hematol. Oncol. 9, 36 (2020).
22. Freitas, R. et al. A multivalent CD44 glycoconjugate vaccine candidate for cancer immunotherapy. J. Control. Release 367, 540–556 (2024).
23. Breithofer, J. et al. Phospholipase A2 group IVD mediates the transacylation of glycerophospholipids and acylglycerols. J. Lipid Res. 65, 100685 (2024).
24. Ohto, T., Uozumi, N., Hirabayashi, T. & Shimizu, T. Identification of novel cytosolic phospholipase A(2)s, murine cPLA(2){delta}, {epsilon}, and {zeta}, which form a gene cluster with cPLA(2){beta}. J. Biol. Chem. 280, 24576–24583 (2005).
25. Tao, R. et al. A family based study of the genetic association between the PLA2G4D gene and schizophrenia. Prostaglandins Leukot. Essent. Fat. Acids 73, 419–422 (2005).
26. Cheung, K. L. et al. Psoriatic T cells recognize neolipid antigens generated by mast cell phospholipase delivered by exosomes and presented by CD1a. J. Exp. Med. 213, 2399–2412 (2016).
27. Liu, H. et al. Metabolic Molecule PLA2G2D Is a Potential Prognostic Biomarker Correlating With Immune Cell Infiltration and the Expression of Immune Checkpoint Genes in Cervical Squamous Cell Carcinoma. Front. Oncol. 11, 755668 (2021).
28. Liu, B. X. et al. SERPINB5 promotes colorectal cancer invasion and migration by promoting EMT and angiogenesis via the TNF-α/NF-κB pathway. Int. Immunopharmacol. 131, 111759 (2024).
29. Zhang, P. et al.
TRIM21-SERPINB5 aids GMPS repression to protect nasopharyngeal carcinoma cells from radiation-induced apoptosis. J. Biomed. Sci. 27, 30 (2020).
30. Yang, S. F., Yeh, C. B., Chou, Y. E., Lee, H. L. & Liu, Y. F. Serpin peptidase inhibitor (SERPINB5) haplotypes are associated with susceptibility to hepatocellular carcinoma. Sci. Rep. 6, 26605 (2016).
31. Lerman, I. et al. Epigenetic Suppression of SERPINB1 Promotes Inflammation-Mediated Prostate Cancer Progression. Mol. Cancer Res. 17, 845–859 (2019).
32. Vivier, E., Tomasello, E., Baratin, M., Walzer, T. & Ugolini, S. Functions of natural killer cells. Nat. Immunol. 9, 503–510 (2008).
33. Rahmati, A., Bigam, S. & Elahi, S. Galectin-9 promotes natural killer cells activity via interaction with CD44. Front. Immunol. 14, 1131379 (2023).
34. Kim, S. et al. Surface Engineering of Natural Killer Cells with CD44-targeting Ligands for Augmented Cancer Immunotherapy. Small 20, e2306738 (2024).
35. Visvader, J. E. & Lindeman, G. J. Cancer stem cells in solid tumours: accumulating evidence and unresolved questions. Nat. Rev. Cancer 8, 755–768 (2008).
36. Melton, C., Judson, R. L. & Blelloch, R. Opposing microRNA families regulate self-renewal in mouse embryonic stem cells. Nature 463, 621–626 (2010).
37. Esquela-Kerscher, A. & Slack, F. J. Oncomirs - microRNAs with a role in cancer. Nat. Rev. Cancer 6, 259–269 (2006).
38. Liu, C. et al. The microRNA miR-34a inhibits prostate cancer stem cells and metastasis by directly repressing CD44. Nat. Med. 17, 211–215 (2011).
39. Wang, X. et al. NUMB suppression by miR-9-5P enhances CD44(+) prostate cancer stem cell growth and metastasis. Sci. Rep. 11, 11210 (2021).
40. Koong, L. Y. & Watson, C. S. Direct estradiol and diethylstilbestrol actions on early- versus late-stage prostate cancer cells. Prostate 74, 1589–1603 (2014).
41. Stein, M. et al. Transdermal estradiol in castrate and chemotherapy resistant prostate cancer. Med. Sci. Monit. 18, CR260–264 (2012).
42. Zhao, Q., Cheng, Y. & Xiong, Y.
LTF Regulates the Immune Microenvironment of Prostate Cancer Through JAK/STAT3 Pathway. Front. Oncol. 11, 692117 (2021).
43. Taylor, B. S. et al. Integrative genomic profiling of human prostate cancer. Cancer Cell 18, 11–22 (2010).
44. Zhang, H., Meltzer, P. & Davis, S. RCircos: an R package for Circos 2D track plots. BMC Bioinform. 14, 244 (2013).
45. Liu, P., Xu, H., Shi, Y., Deng, L. & Chen, X. Potential Molecular Mechanisms of Plantain in the Treatment of Gout and Hyperuricemia Based on Network Pharmacology. Evid. Based Complement. Alternat. Med. 3023127 (2020).
46. Yu, G., Wang, L. G., Han, Y. & He, Q. Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284–287 (2012).
47. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
48. Gu, Z. & Hübschmann, D. Make Interactive Complex Heatmaps in R. Bioinformatics 38, 1460–1462 (2022).
49. Zheng, Y. et al. Ferroptosis and Autophagy-Related Genes in the Pathogenesis of Ischemic Cardiomyopathy. Front. Cardiovasc. Med. 9, 906753 (2022).
50. Li, Y., Lu, F. & Yin, Y. Applying logistic LASSO regression for the diagnosis of atypical Crohn's disease. Sci. Rep. 12, 11340 (2022).
51. Hou, N. et al. Predicting 30-days mortality for MIMIC-III patients with sepsis-3: a machine learning approach using XGboost. J. Transl. Med. 18, 462 (2020).
52. Piotrowska-Niczyporuk, A., Bajguz, A., Kotowska, U., Zambrzycka-Szelewa, E. & Sienkiewicz, A. Auxins and Cytokinins Regulate Phytohormone Homeostasis and Thiol-Mediated Detoxification in the Green Alga Acutodesmus obliquus Exposed to Lead Stress. Sci. Rep. 10, 10193 (2020).
53. Beck, M. W. NeuralNetTools: Visualization and Analysis Tools for Neural Networks. J. Stat. Softw. 85, 1–20 (2018).
54. Li, S. et al. Construction of Osteosarcoma Diagnosis Model by Random Forest and Artificial Neural Network. J. Pers. Med. 13, 447 (2023).
55. Robin, X. et al.
pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform. 12, 77 (2011).
56. Gao, X., Guo, Z., Wang, P., Liu, Z. & Wang, Z. Transcriptomic analysis reveals the potential crosstalk genes and immune relationship between IgA nephropathy and periodontitis. Front. Immunol. 14, 1062590 (2023).
57. Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinform. 14, 7 (2013).
58. Brunson, J. C. ggalluvial: Layered Grammar for Alluvial Plots. J. Open Source Softw. 5, 2017 (2020).
59. Seeliger, D. & de Groot, B. L. Ligand docking and binding site analysis with PyMOL and Autodock/Vina. J. Comput. Aided Mol. Des. 24, 417–422 (2010).

Supplementary Files

SupplementaryInformation.pdf
SupplementaryTable1.xlsx
SupplementaryTable2.xlsx

Editorial history

Editorial decision: Revision requested (24 Oct, 2025)
Reviews received at journal (22 Oct, 2025)
Reviewers agreed at journal (19 Oct, 2025)
Reviews received at journal (06 Oct, 2025)
Reviewers agreed at journal (26 Sep, 2025)
Reviewers invited by journal (07 Jul, 2025)
Editor assigned by journal (07 Jul, 2025)
Editor invited by journal (04 Jul, 2025)
Submission checks completed at journal (04 Jul, 2025)
First submitted to journal (01 Jul, 2025)
Author affiliations

Yangyang Xu, Junhao Lin, Fulin Wu, Jianbo Liang, and Wei Li (corresponding author): The People's Hospital of Guangxi Zhuang Autonomous Region.

Figure legends

Figure 1. Diagram of the result of CD44-Related Analyses and Regulatory Networks in Prostate Cancer. (a) Map of chromosome distribution of CD44. (b) Diagram of subcellular localization analysis of the corresponding protein of CD44. * indicates p<0.05. (c) Venn diagram of key miRNAs from the miRwalk, miRDB, and TargetScan databases. (d) Diagram of the lncRNA-miRNA-mRNA regulatory network. Red represents CD44, purple represents lncRNA, and green represents miRNA. (e) Diagram of expression of CD44 in PCa samples and control samples in the training set TCGA-PRAD. Blue represents the normal group and yellow represents the tumor group; ** indicates p<0.01. (f) Diagram of the result of single sample gene set enrichment analysis. The figure is divided into three sections. The upper section illustrates the process of calculating the enrichment score (ES): from left to right, an ES value is calculated for each gene and plotted as a line, and a particularly prominent peak on the far left or right represents the ES value of the gene set phenotype. In the middle section, each line represents a gene in the gene set and its position in the gene list. The lower section shows the distribution of rank values for all genes, with the vertical axis representing the ranked list metric, which indicates the ranking of the gene.

Figure 2. Diagram of differential expression gene analysis and functional enrichment analysis of candidate genes. (a) Volcano map of differentially expressed genes. The horizontal coordinate is log2FC, and the vertical coordinate is -log10(p value). Each point represents a gene. The genes in the upper right corner are up-regulated differentially expressed genes (red), the genes in the upper left corner are down-regulated differentially expressed genes (blue), and the rest of the genes are not statistically significant (gray). (b) Heatmap of gene expression level. In the middle annotation bar, blue represents samples with low CD44 expression, and red represents those with high CD44 expression. The color of the density heat map above indicates the gene expression density of each sample, with redder colors indicating higher density. In the lower heat map, the vertical axis represents genes, with red indicating high-expression genes and blue indicating low-expression genes. (c) Volcano map of differentially expressed genes; colors as in (a). (d) Heatmap of gene expression density; annotation and colors as in (b). (e) Venn diagram of the intersection of genes in the DEGs1 and DEGs2 sets. (f) Bubble diagram of GO analysis of candidate genes. The size of each point represents the number of enriched genes, and the larger the point, the more genes enriched; the x-coordinate represents -log10(p), and the larger the value, the more significant the enrichment; the y-coordinate gives the name of the enriched term. (g) Bubble diagram of KEGG analysis of candidate genes. The size of each point represents the number of enriched genes, and the larger the point, the more genes enriched; the horizontal axis represents -log10(p), and the larger the value, the more significant the enriched pathway; the vertical axis gives the name of the enriched pathway. (h) Diagram of the PPI network analysis of candidate genes. Each node represents a gene, and the lines represent interactions between genes.

Figure 3. Figures of identification and analysis of key genes and biomarkers. (a) Graph of LASSO cross-validation. The x-coordinate is log(Lambda) and the y-coordinate represents the local likelihood deviation. (b) Graph of the LASSO regression path. Each curve represents the trajectory of one gene's coefficient; the x-coordinate is log(Lambda) and the vertical axis is the coefficient value. (c) Graph of the results of the Boruta algorithm. (d) Graph of the results of the XGBoost algorithm. The vertical axis is the variable and the horizontal axis is the importance. (e) Venn diagram of the intersection of the LASSO, Boruta, and XGBoost algorithms. (f) Box plot of expression levels of characteristic genes in the training set TCGA-PRAD. * indicates p<0.05, ** indicates p<0.01, *** indicates p<0.001, **** indicates p<0.0001. (g) Box plot of expression levels of characteristic genes in validation set 1 (GSE21034). * indicates p<0.05, ** indicates p<0.01, **** indicates p<0.0001. (h) Graph of PCA analysis of PCa samples and PCa-CD44 knockout samples. Orange represents the con group and purple represents the knockout group. (i) Box plot of CD44 expression in PCa samples and PCa-CD44 knockout samples. **** indicates p<0.0001. (j-1) Box plot of expression levels of characteristic genes in the training set (left) and validation set 2 (right). ** indicates p<0.01, **** indicates p<0.0001.

Figure 4. Graph of artificial neural network (ANN) analysis. (a) ANN diagram constructed by CD44 and biomarkers. (b) Diagram of the ROC curve of the ANN model.

Figure 5. Graph of GSEA enrichment analysis and immune infiltration analysis. (a) Figure of the GSEA enriched pathway of PLA2G4D. (b) Figure of the GSEA enriched pathway of SERPINB5. Each GSEA figure is divided into three sections. The upper section illustrates the calculation of the enrichment score (ES): an ES value is calculated for each gene from left to right, forming a line, and a prominent peak on the far left or right represents the ES value of the gene set phenotype. In the middle section, each line represents a gene in the gene set and its position in the gene list. The lower section shows the distribution of rank values for all genes, with the vertical axis representing the ranked list metric, which indicates the ranking of the gene. (c) Heat map of the enrichment scores of immune infiltrating cells between tumor and normal samples. The horizontal axis is the sample, and the vertical axis is the immune cell type. Different colors represent the infiltration fraction of immune cells. Yellow represents tumor samples, and blue represents normal samples. (d) Box plot of immune cell infiltration differences between tumor and normal group samples. The x-axis represents the immune cell types, and the y-axis represents the infiltration fraction of immune cells. Yellow represents tumor samples, and blue represents normal samples. * indicates p<0.05, ** indicates p<0.01, *** indicates p<0.001, **** indicates p<0.0001. (e) Map of the correlation between genes and differential immune cells. The colors indicate correlation, with yellow for positive correlation and blue for negative correlation; * indicates the level of significance.

Figure 6. Diagrams of Functional Analysis, Compound Prediction, and Molecular Docking of CD44, PLA2G4D, and SERPINB5. (a) Figure of functional genes related to CD44, PLA2G4D and SERPINB5 and related functional enrichment analysis. (b) Figure of compounds predicted to be associated with the genes CD44, PLA2G4D and SERPINB5. (c) Diagram of the molecular docking result of CD44 and Tetradioxin. (d) Diagram of the molecular docking result of PLA2G4D and Tetradioxin. (e) Diagram of the molecular docking result of SERPINB5 and Tetradioxin.

Figure 7. Diagrams of the results of RT-qPCR expression level of CD44, PLA2G4D and SERPINB5. (a) Diagram of the results of RT-qPCR expression level of CD44. *** indicates p<0.001. (b) Diagram of the results of RT-qPCR expression level of PLA2G4D. ** indicates p<0.01. (c) Diagram of the results of RT-qPCR expression level of SERPINB5. *** indicates p<0.001.
biomarkers in prostate cancer\u003c/p\u003e","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eProstate cancer (PCa) is the second most common carcinoma in males worldwide: new cases exceed one million annually, and nearly 400,000 people die from it each year\u003csup\u003e[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]\u003c/sup\u003e. Demographic changes worldwide, leading to a gradual increase in the aging population, will directly result in nearly 3\u0026nbsp;million new cases of prostate cancer annually by 2020\u003csup\u003e[\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003eMaximal androgen blockade (MAB) with androgen deprivation therapy (ADT) and anti-androgen (AA), which is used for advanced PCa therapy, has shown more benefits than monotherapy\u003csup\u003e[\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]\u003c/sup\u003e. Currently, the treatment methods for prostate cancer have become relatively mature and diversified, mainly including surgical treatment, chemotherapy, endocrine therapy, cryotherapy, radiotherapy, and immunotherapy\u003csup\u003e[\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]\u003c/sup\u003e. However, the mortality rate of prostate cancer remains high, and treating prostate cancer is still a challenge.\u003c/p\u003e\u003cp\u003eA wealth of research supports the role of cancer stem cells (CSCs) and their associated markers, such as cluster of differentiation 44 (CD44), in the malignant behavior of tumors\u003csup\u003e[\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]\u003c/sup\u003e. CD44 is a non-kinase cell surface transmembrane glycoprotein. 
Once its overexpression is detected in cancer stem cells, alternative splicing may occur to promote cancer progression, which can significantly compromise treatment efficacy\u003csup\u003e[\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]\u003c/sup\u003e. CD44 has been confirmed to be located on human chromosome 11\u003csup\u003e[\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]\u003c/sup\u003e or mouse chromosome 2\u003csup\u003e[\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]\u003c/sup\u003e, and it primarily functions as a single polypeptide chain encoded by a conserved gene.\u003c/p\u003e\u003cp\u003eHyaluronic acid (HA) is a major component of the extracellular matrix (ECM), expressed by stromal cells and cancer cells, and serves as a primary ligand for CD44\u003csup\u003e[\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]\u003c/sup\u003e. Once HA binds to CD44, it directly induces a conformational change, which in turn causes adapter proteins or cytoskeletal components to bind to the intracellular domain of CD44. This subsequently activates various signaling pathways, resulting in cellular phenomena such as proliferation, adhesion, migration, and invasion\u003csup\u003e[\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]\u003c/sup\u003e. When tumor cells undergo epithelial-mesenchymal transition (EMT), they acquire stem-like properties, and simultaneously, the expression of CD44 increases\u003csup\u003e[\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]\u003c/sup\u003e. 
In addition, tumor cells with an epithelial-mesenchymal transition phenotype also exhibit higher invasiveness and greater resistance to chemotherapy\u003csup\u003e[\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]\u003c/sup\u003e. Studies have shown that the CD44 gene is associated with the occurrence and development of prostate cancer. Specifically, loss of CD44 staining in both primary and metastatic lesions is a predictor of biochemical recurrence after radical prostatectomy\u003csup\u003e[\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]\u003c/sup\u003e. The CD44 gene plays a crucial role in the occurrence and development of prostate cancer. Its expression and function are associated with the aggressive phenotype of prostate cancer cells, including proliferation, migration, and invasion\u003csup\u003e[\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]\u003c/sup\u003e. Therefore, CD44 may serve as a diagnostic and prognostic marker, as well as a potential therapeutic target, in prostate cancer.\u003c/p\u003e\u003cp\u003eOur study identified CD44-related biomarkers in PCa based on transcriptome data, and performed a series of analyses on CD44 and these biomarkers using bioinformatics methods, including functional enrichment analysis, immune infiltration analysis, drug prediction analysis, and experimental validation. Our findings provide a reference for subsequent exploration of the possible pathogenesis of prostate cancer, investigation of targeted therapies centered on new targets, and optimization of treatment methods.\u003c/p\u003e"},{"header":"2. 
Results","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003e2.1 Localization, molecular regulatory, gene expression, and functional enrichment analysis of CD44\u003c/h2\u003e\u003cp\u003eThe specific distribution of genes on chromosomes facilitates a deeper understanding of their functions. Chromosomal mapping indicated that CD44 was located on chromosome 11 \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003ea\u003cb\u003e)\u003c/b\u003e. The subcellular localization results indicated that the CD44 protein was most likely located extracellularly with the highest score (Extracellular\u0026thinsp;=\u0026thinsp;3.010) \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eb\u003cb\u003e)\u003c/b\u003e. The molecular regulatory networks play a crucial role in elucidating the principles of gene regulation within biological cells and understanding the pathogenesis of related diseases. In the miRWalk database, CD44 was associated with 2,099 microRNAs (miRNAs). In the miRDB database, CD44 was connected with 239 miRNAs. The TargetScan database revealed CD44 was linked to 911 miRNAs. The intersection of miRNAs identified for CD44 across 3 databases totals 133 (like hsa-miR-18a-3p, hsa-miR-548c-5p, and hsa-miR-520c-3p) \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003ec\u003cb\u003e)\u003c/b\u003e. Subsequently, the StarBase database predicted a total of 111 lncRNAs (like SNHG3, XIST, and NORAD). Afterwards, the lncRNA-miRNA-mRNA regulatory network was constructed \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003ed, \u003cb\u003eand Supplementary Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e)\u003c/b\u003e. 
The gene expression data indicated a significant reduction in CD44 expression in the tumor group relative to the normal group \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003ee\u003cb\u003e)\u003c/b\u003e. Moreover, the top 5 pathways significantly enriched for the CD44 gene were ribosome, oxidative phosphorylation, glycosphingolipid biosynthesis globo series, parkinsons disease, and proteasome \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003ef\u003cb\u003e)\u003c/b\u003e.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\u003ch2\u003e2.2 Identification and functional analysis of candidate genes\u003c/h2\u003e\u003cp\u003eInitially, a total of 208 differentially expressed genes(DEGs)1 were obtained, including 87 up-regulated genes and 121 down-regulated genes in high-CD44 and low-CD44 groups \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea-b\u003cb\u003e)\u003c/b\u003e. Subsequently, the results of differential expression analysis revealed 3,006 DEGs2, with 1,273 up-regulated and 1,733 down-regulated in tumor samples compared with the normal group \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ec-d\u003cb\u003e)\u003c/b\u003e. Subsequently, a sum of 124 candidate genes was yielded by taking the intersection of DEGs1 and DEGs2 \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ee\u003cb\u003e)\u003c/b\u003e. 
Afterwards, the candidate genes exhibited marked enrichment across 364 GO terms, comprising 279 biological processes (BPs) (such as cell fate determination), 28 cellular components (CCs) (such as blood microparticles), and 57 molecular functions (MFs) (such as serine-type endopeptidase inhibitor activity), as well as 25 KEGG signaling pathways (such as renin secretion) \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ef-g\u003cb\u003e)\u003c/b\u003e. Moreover, PPI network analysis of candidate genes yielded 69 nodes (such as AGT, FOXG1, and FGB) and 116 edges \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eh\u003cb\u003e)\u003c/b\u003e. The 69 genes were defined as candidate feature genes for subsequent analyses.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\u003ch2\u003e2.3 Identification of key genes and biomarkers\u003c/h2\u003e\u003cp\u003eFirstly, the optimal LASSO model was obtained with the smallest error (lambda.min = -4.5207). Consequently, 19 LASSO genes related to PCa were identified through the LASSO algorithm \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ea, b\u003cb\u003e)\u003c/b\u003e. Secondly, a total of 29 Boruta genes were obtained using the Boruta algorithm \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ec\u003cb\u003e)\u003c/b\u003e. Additionally, the XGBoost algorithm identified 34 XGBoost genes \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ed\u003cb\u003e)\u003c/b\u003e. 
The intersection of genes from the 3 algorithms was taken to obtain 9 key feature genes \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ee\u003cb\u003e)\u003c/b\u003e. The Wilcoxon test revealed that CLCA2, CRISP3, PLA2G4D, SERPINB5, and SPINK1 exhibited significant and consistent expression differences between The Cancer Genome Atlas Prostate Adenocarcinoma (TCGA-PRAD) and validation set 1, and were thus designated as key genes. Specifically, in tumor samples, CRISP3 and SPINK1 were significantly upregulated, while CLCA2, PLA2G4D, and SERPINB5 were significantly downregulated \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ef-g\u003cb\u003e)\u003c/b\u003e. Subsequently, principal component analysis (PCA) of validation set 2 delineated a clear separation between the knockout and control samples, with the former demonstrating reduced expression levels relative to the latter \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eh-i\u003cb\u003e)\u003c/b\u003e. Notably, the genes PLA2G4D and SERPINB5 exhibited congruent expression profiles across both the TCGA-PRAD and the validation set 2 groups, thereby qualifying them as biomarkers \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ej\u003cb\u003e)\u003c/b\u003e.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e\u003ch2\u003e2.4 ANN model of CD44, PLA2G4D, and SERPINB5 in PCa\u003c/h2\u003e\u003cp\u003eThe model developed for assessing the accuracy of CD44, PLA2G4D, and SERPINB5 in predicting PCa was capable of effectively distinguishing between tumor and normal groups \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ea\u003cb\u003e)\u003c/b\u003e. 
The model achieved an AUC value of 0.825 on TCGA-PRAD (AUC\u0026thinsp;\u0026gt;\u0026thinsp;0.7) \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eb\u003cb\u003e)\u003c/b\u003e. This evidence substantiated the model's robust predictive performance, demonstrating its effectiveness in distinguishing between tumor and the normal groups.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\u003ch2\u003e2.5 Functional assessment and immune infiltration analysis\u003c/h2\u003e\u003cp\u003eGene set enrichment analysis(GSEA) revealed that PLA2G4D was associated with 60 pathways, SERPINB5 with 69 pathways. Notably, PLA2G4D was primarily enriched in the protein export, hematopoietic cell lineage, and asthma, etc. \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ea\u003cb\u003e)\u003c/b\u003e. For SERPINB5, the leading pathways were dorsoventral axis formation, glycosphingolipid biosynthesis globo series, glyoxylate and dicarboxylate metabolism, etc. \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eb\u003cb\u003e)\u003c/b\u003e. Subsequently, the estimated proportion of 28 distinct immune cell types across samples in the training set was depicted in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ec. Upon contrasting the infiltration levels of immune cells between tumor and normal groups, it was determined that a total of 22 immune cells (like activated B cell, activated CD4 T cell, eosinophil) exhibited significant disparities between the 2 groups (p\u0026thinsp;\u0026lt;\u0026thinsp;0.05) (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ed). 
Correlation analysis revealed positive associations between CD44 (cor\u0026thinsp;=\u0026thinsp;0.35, p\u0026thinsp;\u0026lt;\u0026thinsp;0.05), PLA2G4D (cor\u0026thinsp;=\u0026thinsp;0.41), and SERPINB5 (cor\u0026thinsp;=\u0026thinsp;0.59, p\u0026thinsp;\u0026lt;\u0026thinsp;0.05) with natural killer (NK) cell \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ee\u003cb\u003e)\u003c/b\u003e.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003e2.6 Network construction of CD44, PLA2G4D, and SERPINB5\u003c/h2\u003e\u003cp\u003eSubsequently, the functional genes related to CD44, PLA2G4D, and SERPINB5, such as SLC9A1, HYAL2, ABCC5, and other genes, were explored. These genes were mainly enriched in metabolic processes including hyaluronan metabolic process, phospholipase A2 activity, mucopolysaccharide metabolic process, apical part of cell, phospholipid catabolic process, and glycosaminoglycan catabolic process \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ea\u003cb\u003e)\u003c/b\u003e.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\u003ch2\u003e2.7 Compounds related to CD44, PLA2G4D, and SERPINB5 and molecular docking\u003c/h2\u003e\u003cp\u003eIn order to explore the compounds targeting CD44, PLA2G4D, and SERPINB5, compounds targeting these genes were predicted through the DsigDB database. The drug Tetradioxin, which had the highest Combined Score, was selected for molecular docking \u003cb\u003e(Supplementary Table \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e and\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eb). 
Through molecular docking, it was found that the binding energy between CD44 and Tetradioxin was \u0026minus;\u0026thinsp;6.4 kcal/mol, and the amino acid residue GLU-67 was connected by a hydrogen bond. The binding energy between PLA2G4D and Tetradioxin was \u0026minus;\u0026thinsp;6.5 kcal/mol, with the amino acid residue THR-423 connected by a hydrogen bond. The binding energy between SERPINB5 and Tetradioxin was \u0026minus;\u0026thinsp;5.5 kcal/mol, and the amino acid residue ASN-266 was connected by a hydrogen bond. It could be seen that CD44, PLA2G4D, and SERPINB5 had good binding abilities with Tetradioxin \u003cb\u003e(\u003c/b\u003eTable\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e \u003cb\u003eand\u003c/b\u003e Figs.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ec-e\u003cb\u003e)\u003c/b\u003e.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eResults of molecular docking of Tetradioxin related to CD44, PLA2G4D, and SERPINB5.\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"3\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSymbol\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eTarget_drug\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eBinding energy 
(kcal/mol)\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCD44\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eTetradioxin\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e-6.4\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003ePLA2G4D\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eTetradioxin\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e-6.5\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSERPINB5\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eTetradioxin\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e-5.5\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\u003ch2\u003e2.8 The expression levels of CD44, PLA2G4D, and SERPINB5 were validated in clinical samples\u003c/h2\u003e\u003cp\u003eThe results of Reverse transcription-quantitative polymerase chain reaction (RT-qPCR) showed that the expressions of CD44, PLA2G4D, and SERPINB5 were significantly downregulated in tumor samples (P\u0026thinsp;\u0026lt;\u0026thinsp;0.01) \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003ea-c\u003cb\u003e)\u003c/b\u003e. This was consistent with the results of our bioinformatics analysis, indicating that the results of the bioinformatics analysis were reliable.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e"},{"header":"3. 
Discussion","content":"\u003cp\u003eThis study is based on transcriptome data of prostate cancer, aiming to identify CD44-related biomarkers in this disease. We obtained several biomarkers, including CD44, PLA2G4D, and SERPINB5. These were acquired through a series of methods, such as PPI analysis, machine learning algorithms, and expression validation. The predictive capabilities of these biomarkers were assessed using an ANN model. Subsequently, functional analysis, immune analysis, and drug prediction were conducted on these biomarkers. Overall, this provides a reference for exploring the potential pathogenesis of PCa, identifying new therapeutic targets for the development of targeted therapeutics, and optimizing treatment methods.\u003c/p\u003e\u003cp\u003eThe protein encoded by the CD44 gene is a cell surface glycoprotein that is closely related to cell adhesion, cell-cell interactions, and cell migration\u003csup\u003e[\u003cspan additionalcitationids=\"CR17\" citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]\u003c/sup\u003e. It serves as a receptor for hyaluronic acid and can also interact with other ligands, such as osteopontin, collagen, and matrix metalloproteinases (MMPs)\u003csup\u003e[\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]\u003c/sup\u003e. This protein is involved in multiple cellular functions, such as lymphocyte activation, recirculation and homing, hematopoiesis, and tumor metastasis. The transcripts of this gene undergo alternative splicing to produce functionally distinct isoforms. 
Gene variability in splicing is the foundation for the generation of structural and functional diversity of proteins, and this characteristic may be closely related to tumor progression and metastasis, as well as pathways such as glycosaminoglycan metabolism and the innate immune system. CD44 serves as a biomarker and therapeutic target for stem cells and tumors\u003csup\u003e[\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e, \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]\u003c/sup\u003e. CD44 is notably enriched in the ribosome pathway, indicating its potential role in protein synthesis. In two datasets, the expression levels of CD44 in both the low CD44 and knockout groups were lower than those in the high CD44 controls. This suggests the significance of CD44 in the progression of prostate cancer. Moreover, the high predictive accuracy of the ANN model targeting CD44 highlights its importance as a potential biomarker for prostate cancer.\u003c/p\u003e\u003cp\u003eThe biomarker PLA2G4D was identified through a set of analyses. The PLA2G4D gene encodes an enzyme belonging to the phospholipase A2 (PLA2) family, which primarily functions to hydrolyze the ester bond at the sn-2 position of glycerophospholipids, thereby releasing free fatty acids and lysophospholipids, which are important in biological processes such as inflammatory responses, cell signaling and lipid metabolism\u003csup\u003e[\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]\u003c/sup\u003e. 
PLA2G4D encodes a cytosolic phospholipase A2 (cPLA2), which plays a key role in the inflammatory process\u003csup\u003e[\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]\u003c/sup\u003e and has been associated with schizophrenia\u003csup\u003e[\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]\u003c/sup\u003e and studied in relation to psoriasis\u003csup\u003e[\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]\u003c/sup\u003e. Moreover, high expression of PLA2G4D is associated with poor prognosis in prostate cancer and cervical squamous cell carcinoma\u003csup\u003e[\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]\u003c/sup\u003e. In our study, GSEA showed that PLA2G4D is mainly enriched in the protein export pathway. This finding implies that PLA2G4D might play a role in regulating protein trafficking, which could affect cellular processes in prostate cancer. The lower expression of PLA2G4D in the low CD44 and knockout groups compared to the high CD44 controls also highlights its potential connection with the disease's progression.\u003c/p\u003e\u003cp\u003eAnother biomarker, SERPINB5, is mainly associated with the dorsoventral axis formation pathway. The SERPINB5 gene (Serpin Family B Member 5) is a protein-coding gene. Its related pathways include apoptosis, autophagy, and angiogenesis. A search for \"gene\u0026thinsp;+\u0026thinsp;function/pathway\" shows that the SERPINB5 gene is involved in the development of various cancers, such as liver cancer, nasopharyngeal cancer, gallbladder cancer, and colorectal cancer\u003csup\u003e[\u003cspan additionalcitationids=\"CR29\" citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]\u003c/sup\u003e. 
Both SERPINB5 and SERPINB1 are members of the serpin family, but they have different functions and mechanisms of action. SERPINB5 is primarily associated with tumor suppression, whereas SERPINB1 is mainly involved in anti-inflammation and protecting tissues from excessive protease degradation. Studies have shown that epigenetic silencing of SERPINB1 promotes inflammation-mediated progression of prostate cancer\u003csup\u003e[\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e]\u003c/sup\u003e. The lower expression of SERPINB5 in the low CD44 and knockout groups indicates its potential involvement in the development of prostate cancer. The positive correlations between CD44, PLA2G4D, and SERPINB5 with NK cells suggest that these genes may influence the immune response in prostate cancer.\u003c/p\u003e\u003cp\u003eThe results of this study showed that 22 infiltrating immune cell types differed significantly between the disease and the control groups, and of the three genes, CD44, PLA2G4D and SERPINB5, the first two were most positively correlated with natural killer cells, and PLA2G4D was most positively correlated with central memory CD4 T cells.\u003c/p\u003e\u003cp\u003eThe main sources of cytokines and cytoplasmic granules are natural killer (NK) cells\u003csup\u003e[\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e]\u003c/sup\u003e. The research findings by Amirhossein et al. confirmed that there is an interaction between Gal-9 and CD44, and this interaction plays a crucial role in the process of activating NK cells. Gal-9 enhances natural killer cell activity by interacting with CD44\u003csup\u003e[\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]\u003c/sup\u003e, suggesting that Gal-9 is a potential new therapeutic avenue for modulating NK cell effector functions. 
Furthermore, adoptive immunotherapy utilizing NK cells has shown significant efficacy in the treatment of hematological malignancies\u003csup\u003e[\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e]\u003c/sup\u003e. The high correlation between these three genes and immune activation suggests a close relationship with tumor cell killing, which in turn can influence the occurrence and progression of tumors.\u003c/p\u003e\u003cp\u003eCancer stem cells have been confirmed to be associated with cancer progression and metastasis\u003csup\u003e[\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e]\u003c/sup\u003e. MiRNAs regulate both normal stem cells and cancer stem cells\u003csup\u003e[\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e]\u003c/sup\u003e, and it has been confirmed that microRNA dysregulation is closely related to the initiation and progression of tumors\u003csup\u003e[\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003eThe research by Can Liu et al. has demonstrated that miR-34a is a target gene of p53, and its expression is downregulated in CD44 (+) prostate cancer cells purified from xenograft and primary tumors. If miR-34a is forcibly expressed in CD44 (+) prostate cancer cells, it can inhibit the expansion, regeneration, and metastasis of tumor cells. Conversely, if miR-34a antagonists are expressed in CD44 (-) prostate cancer cells, this leads to the progression and deterioration of the tumor\u003csup\u003e[\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e]\u003c/sup\u003e. Furthermore, studies have also demonstrated that NUMB is directly targeted by microRNA-9-5p (miR-9-5p), an oncogenic microRNA associated with poor prognosis in various malignancies. 
The levels of miR-9-5p are inversely correlated with the expression of NUMB in CD44 (+) prostate cancer stem cells. miR-9-5p reduces the expression of NUMB and inhibits multiple properties of prostate cancer stem cells, including proliferation, migration, invasion, and self-renewal\u003csup\u003e[\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003eBased on our research findings, we have predicted that the common drug estradiol (E2) may play a role in the treatment of prostate cancer. Research has confirmed that the status of cyclin D1 is closely associated with cell cycle disorders, and its effects lead to a reduction in cell number. Low concentrations (0.1 nM) of estradiol (E2) can effectively trigger these mechanisms, significantly improving treatment outcomes for patients with both early and late-stage prostate cancer, while avoiding the side effects associated with high-dose drug treatment\u003csup\u003e[\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e]\u003c/sup\u003e. Meanwhile, a study by Mark Stein et al. has also shown that transdermal estradiol treatment is safe, has biochemical activity, and remains of value in heavily pre-treated patients with advanced castration-resistant and chemotherapy-refractory metastatic prostate cancer\u003csup\u003e[\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003eOverall, the identification and characterization of CD44, PLA2G4D, and SERPINB5 provide guidance for a deeper understanding of the molecular mechanisms underlying prostate cancer and for the search for potential therapeutic targets. In our study, we found that the expression levels of the above three genes were significantly lower in both the training and validation sets compared to the control samples. 
The results of PCR verification are consistent with those of bioinformatics analysis, which also indicates the reliability of the results. Through functional analysis of CD44, we identified two biomarkers and significantly enriched pathways, as well as the association of biomarkers with the immune microenvironment, targeted drugs, etc. However, due to the relatively small sample size in our study, the overall results are highly dependent on the quality of the data, the accuracy of the algorithms, and the assumptions underlying the analysis. Moreover, due to the complexity and diversity of biological systems, the results may not fully reflect actual biological processes, and experimental validation is needed to confirm their authenticity and reliability. Further studies are needed to clarify the exact role of these genes and their potential as biomarkers.\u003c/p\u003e"},{"header":"4. Material and methods","content":"\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\u003ch2\u003e4.1 Data collection and transcriptome sequencing\u003c/h2\u003e\u003cp\u003eThe transcriptome data were retrieved from UCSC Xena and GEO databases. The training set, including 498 prostate adenocarcinoma (PRAD) tumor tissue samples and 52 normal tissue samples, was designated as The Cancer Genome Atlas (TCGA)-PRAD\u003csup\u003e[\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e]\u003c/sup\u003e. All the above data were downloaded on July 1, 2024. The validation set 1 was obtained from GSE21034 (GPL10264 platform) in the GEO database, and contained 179 PCa samples\u003csup\u003e[\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e]\u003c/sup\u003e. The validation set 2 included 3 PCa cell samples and 3 samples of PCa with the CD44 gene knocked out, and the differential expression of downstream genes was tested by gene sequencing. The group in which the CD44 gene was knocked out by a standard method was labeled as the SI group. 
The other group received no gene-level treatment and was labeled as the NC (control) group. RNA was extracted from the NC group and the SI group by a standard method, and the extracted RNA underwent strict quality assessment of integrity, purity, and precision to obtain qualified RNA. Messenger RNA (mRNA) was then obtained by enriching polyA-tailed transcripts and removing ribosomal RNA, and DNA was obtained by reverse transcription with the mRNA as a template, followed by amplification. After the DNA library was constructed and passed quality inspection, the distinct libraries were pooled and sequenced on the Illumina HiSeq platform. Sequencing followed the stipulated effective concentration and the target data volume in order to obtain the sequences of the fragments.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\u003ch2\u003e4.2 Chromosome localization, subcellular localization, and molecular regulation network of CD44\u003c/h2\u003e\u003cp\u003eTo determine the chromosomal location of CD44 and thereby better understand its function, the location of CD44 was visualized using the RCircos package (v 1.2.2)\u003csup\u003e[\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e]\u003c/sup\u003e. Besides, the subcellular LOcalization predictor (CELLO) tool was employed to analyze the subcellular localization of the protein encoded by CD44. To further investigate the potential molecular regulatory mechanisms of CD44, the upstream miRNAs of CD44 were predicted using the miRWalk, miRDB, and TargetScan databases. The miRNAs predicted by all 3 databases were intersected to identify key miRNAs. 
Utilizing the Starbase (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://rnasysu.com/encori/\u003c/span\u003e\u003cspan address=\"https://rnasysu.com/encori/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) database, lncRNAs were predicted using key miRNAs. The lncRNA-miRNA-mRNA regulatory network was visualized using Cytoscape software (v 3.6.1)\u003csup\u003e[\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\u003ch2\u003e4.3 Gene expression analysis and enrichment analysis of CD44\u003c/h2\u003e\u003cp\u003eTo investigate the expression differences of CD44 between tumor samples and normal samples in TCGA-PRAD, the Wilcoxon test was employed to analyze the expression levels of CD44 (p\u0026thinsp;\u0026lt;\u0026thinsp;0.05). Additionally, the expression correlation coefficients (cor) between CD44 and all other genes were calculated using Spearman correlation, and the genes were ranked by these coefficients for GSEA. The reference gene set, c2.cp.kegg.v2023.1.Hs.symbols.gmt, was sourced from MSigDB. GSEA was performed with the clusterProfiler package (v 4.7.1.003)\u003csup\u003e[\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e]\u003c/sup\u003e (adj.p\u0026thinsp;\u0026lt;\u0026thinsp;0.05, NES\u0026thinsp;\u0026gt;\u0026thinsp;1).\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\u003ch2\u003e4.4 Identification of candidate genes\u003c/h2\u003e\u003cp\u003eFirst, using the median CD44 expression level as a threshold, PCa samples from the TCGA-PRAD dataset were classified into high-CD44 and low-CD44 groups. 
The DESeq2 package (v 1.38.0)\u003csup\u003e[\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e]\u003c/sup\u003e was applied to discern DEGs between the 2 groups (adj.p\u0026thinsp;\u0026lt;\u0026thinsp;0.05 \u0026amp; |log\u003csub\u003e2\u003c/sub\u003e FC| \u0026gt;1). The identified DEGs were designated as DEGs1. Subsequently, the ggplot2 package (v 3.5.1) was employed to generate volcano plots that graphically represented DEGs1, highlighting the top 10 upregulated and downregulated genes within the plots. Heatmaps illustrating the expression patterns of the top 10 up-regulated and down-regulated genes were generated using ComplexHeatmap (v 2.14.0)\u003csup\u003e[\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e]\u003c/sup\u003e. The selection of DEGs from PCa samples compared to normal group samples aligned with the criteria established for the aforementioned high-CD44 and low-CD44 groups. The DEGs identified through this selection process were designated as DEGs2. Secondly, candidate genes were obtained by intersecting DEGs1 and DEGs2. The results were visualised utilizing the ggvenn package (v 0.1.9)\u003csup\u003e[\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e\u003ch2\u003e4.5 Enrichment analysis and protein-protein interaction (PPI) network of candidate genes\u003c/h2\u003e\u003cp\u003eTo investigate the roles and pathways of the candidate genes, analyses leveraging GO and KEGG analyses were employed with the clusterProfiler package (p\u0026thinsp;\u0026lt;\u0026thinsp;0.05). The results were visualised utilizing the ggplot2 package (v 3.5.1). Subsequently, to explore protein interactions among the candidate genes, a PPI network was constructed utilizing the STRING database with a confidence score\u0026thinsp;\u0026ge;\u0026thinsp;0.4. 
The results were visualised using Cytoscape software (v 3.6.1). The genes that exhibited interaction in the PPI network were defined as candidate feature genes.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec18\" class=\"Section2\"\u003e\u003ch2\u003e4.6 Identification of key genes\u003c/h2\u003e\u003cp\u003eThe LASSO method was utilized to obtain feature genes from candidate genes with the glmnet package (v 4.1-4)\u003csup\u003e[\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e]\u003c/sup\u003e. The most strongly associated feature genes were selected when lambda reached its minimum value, yielding the minimal error rate, identifying LASSO genes. Additionally, the candidate genes underwent a rigorous evaluation through the Boruta algorithm, which employed random forest to determine the significance of features by comparing their importance scores to randomly generated shadow features, effectively identifying key predictors (Boruta genes) amidst noise. Besides, the extreme gradient boosting (XGboost) model was constructed to evaluate the degree of impact that variables had on the predictive outcomes using the xgboost package (v 1.7.7.1)\u003csup\u003e[\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e]\u003c/sup\u003e. Genes that influenced the outcomes were denoted as XGboost genes. The feature genes, originating from the 3 machine learning algorithms, LASSO genes, Boruta genes, and XGBoost genes, were intersected as key feature genes for subsequent analyses. To assess the expression patterns of key feature genes in both the TCGA-PRAD and GSE21034 dataset, we utilized the rstatix package (v 0.7.2) to perform Wilcoxon test on all samples in 2 datasets. 
Genes that revealed marked differences between tumor and normal groups and exhibited consistent expression trends across both datasets were considered as key genes for subsequent analyses (p\u0026thinsp;\u0026lt;\u0026thinsp;0.05).\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec19\" class=\"Section2\"\u003e\u003ch2\u003e4.7 Identification of biomarkers\u003c/h2\u003e\u003cp\u003eTo evaluate the presence of significant differences between PCa samples and PCa-CD44 knockout samples from the validation set 2, PCA was conducted on the transcriptome sequencing data using the fast.prcomp function from the FactoMineR package (v 2.7)\u003csup\u003e[\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e]\u003c/sup\u003e. Then, to explore the expression of CD44 in PCa samples and PCa-CD44 knockout samples within the validation set 2 and to assess the effect of CD44 knockout, the expression levels of CD44 were analyzed using the Wilcoxon test (p\u0026thinsp;\u0026lt;\u0026thinsp;0.05). Expression analysis was conducted using Wilcoxon test on the high-CD44 and low-CD44 groups of PCa samples from the TCGA-PRAD dataset and the validation set 2 to further identify CD44-related biomarkers in PCa. Genes that revealed marked differences between tumor and normal groups and high-CD44 and low-CD44 groups and exhibited consistent expression trends across both datasets were considered as biomarkers (p\u0026thinsp;\u0026lt;\u0026thinsp;0.05).\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec20\" class=\"Section2\"\u003e\u003ch2\u003e4.8 Artificial Neural Networks (ANN)\u003c/h2\u003e\u003cp\u003eTo assess the predictive accuracy of biomarkers for PCa, the expression data of these biomarkers were transformed into gene scores through the application of the min-max normalization method in TCGA-PRAD. 
Subsequently, the ANN was developed utilizing the NeuralNetTools (v 1.5.3)\u003csup\u003e[\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e]\u003c/sup\u003e and neuralnet (v 1.44.2)\u003csup\u003e[\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e]\u003c/sup\u003e packages. To evaluate the performance of the ANN model, the pROC package (v 1.18.0)\u003csup\u003e[\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e]\u003c/sup\u003e was utilized to generate the ROC curve.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec21\" class=\"Section2\"\u003e\u003ch2\u003e4.9 GSEA and immune infiltration analysis of biomarkers\u003c/h2\u003e\u003cp\u003eTo elucidate the pathways of biomarkers in PRAD, GSEA was performed on the biomarkers in PRAD samples from TCGA-PRAD. The method was consistent with the aforementioned GSEA method for CD44. Immune infiltration analysis was then performed, because it offers insights into the tumor microenvironment, identifies immune cell populations, predicts responses to treatments, and aids the development of immunotherapeutic strategies by revealing the complex interactions between immune cells and cancer cells. Initially, the ssGSEA algorithm was employed to determine the enrichment scores for 28 types of immune infiltrating cells\u003csup\u003e[\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e]\u003c/sup\u003e across all samples in TCGA-PRAD, utilizing the GSVA package (v 1.46.0)\u003csup\u003e[\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e]\u003c/sup\u003e. To assess the differences in the relative abundance of immune cells between tumor and normal groups, the Wilcoxon test was applied (p\u0026thinsp;\u0026lt;\u0026thinsp;0.05). 
Additionally, the correlations among CD44, the biomarkers, and immune cells were examined through Spearman analysis across all samples in TCGA-PRAD, utilizing the psych package (|cor| \u0026gt;0.3, p\u0026thinsp;\u0026lt;\u0026thinsp;0.05).\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec22\" class=\"Section2\"\u003e\u003ch2\u003e4.10 GeneMANIA\u003c/h2\u003e\u003cp\u003eThe network for the biomarkers and CD44 was constructed using GeneMANIA (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://genemania.org/\u003c/span\u003e\u003cspan address=\"https://genemania.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). Genes functionally related to the biomarkers and CD44, as well as enriched pathways, were explored through this process.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec23\" class=\"Section2\"\u003e\u003ch2\u003e4.11 Drug prediction and molecular docking\u003c/h2\u003e\u003cp\u003eCompounds that might act on CD44 and the biomarkers were predicted using the DSigDB database (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://dsigdb.tanlab.org/DSigDBv1.0/\u003c/span\u003e\u003cspan address=\"http://dsigdb.tanlab.org/DSigDBv1.0/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). A Sankey diagram of the top 25 small-molecule compound-biomarker network was drawn based on the ranking of the Combined Score using the ggalluvial package (v 0.12.5)\u003csup\u003e[\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e]\u003c/sup\u003e. Subsequently, molecular docking was performed between the compound with the highest Combined Score and CD44 as well as the biomarkers. 
The 3D structure of the compound was downloaded from the PubChem database (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://pubchem.ncbi.nlm.nih.gov/\u003c/span\u003e\u003cspan address=\"https://pubchem.ncbi.nlm.nih.gov/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e), and the 3D structures of the proteins encoded by the genes were downloaded from the Protein Data Bank (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.rcsb.org/\u003c/span\u003e\u003cspan address=\"https://www.rcsb.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). Molecular docking of the small molecule ligands (compounds) and macromolecular proteins (encoded by the genes) was carried out online using CBdock (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://cadd.labshare.cn/cb-dock/php/blinddock.php\u003c/span\u003e\u003cspan address=\"https://cadd.labshare.cn/cb-dock/php/blinddock.php\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). Finally, the results of CBdock were visualized using PyMOL (v 3.1.0)\u003csup\u003e[\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec24\" class=\"Section2\"\u003e\u003ch2\u003e4.12 RT-qPCR validation\u003c/h2\u003e\u003cp\u003eTo further validate the expression levels of CD44 and biomarkers in PCa samples and control samples, RT-qPCR experiments were conducted. Five tumor tissue samples from PCa patients and five adjacent non-tumor tissue samples were collected from The People's Hospital of Guangxi Zhuang Autonomous Region for RT-qPCR. This study adhered to the Declaration of Helsinki and received approval from the Ethics Committee of The People's Hospital of Guangxi Zhuang Autonomous Region (approval no. 2014-010). 
Informed consent was obtained from all patients.\u003c/p\u003e\u003cp\u003eTo verify the expression of CD44 and biomarkers, total RNA was extracted from the samples using TRIZOL (Vazyme, Nanjing, China) according to the manufacturer's instructions. The first strand of complementary DNA (cDNA) was synthesized from 2 \u0026micro;g of total RNA using Hifair\u0026reg; Ⅲ 1st Strand cDNA Synthesis SuperMix for qPCR (Yeasen, Shanghai, China) following the provided guidelines. RT-qPCR was performed using 2× Universal Blue SYBR Green qPCR Master Mix (Servicebio, Wuhan, China). The detailed primer sequences and specific reaction procedures are provided in \u003cb\u003eSupplementary Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e\u003c/b\u003e. GAPDH was used as the internal reference gene. The gene expression levels were calculated using the 2\u003csup\u003e-ΔΔCt\u003c/sup\u003e method (PMID: 11846609). The results were visualized using GraphPad Prism 10.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec25\" class=\"Section2\"\u003e\u003ch2\u003e4.13 Statistical analysis\u003c/h2\u003e\u003cp\u003eThe R programming language (v 4.2.2) was utilized to conduct statistical analysis. Differences between cohorts were assessed using the Wilcoxon test. 
A p\u0026thinsp;\u0026lt;\u0026thinsp;0.05 was considered statistically significant.\u003c/p\u003e\u003c/div\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eConsent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eInformed consent was obtained from all individual participants included in the study.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe datasets analysed during the current study are available in the [GEO] repository [https://www.ncbi.nlm.nih.gov/gds/]; the [Starbase] repository [https://rnasysu.com/encori/]; the [GeneMANIA] repository [https://genemania.org/]; the [DSigDB] repository [http://dsigdb.tanlab.org/DSigDBv1.0/]; the [PubChem] repository [https://pubchem.ncbi.nlm.nih.gov/]; the [Protein Data Bank] repository [https://www.rcsb.org/]; the [CBdock] repository [https://cadd.labshare.cn/cb-dock/php/blinddock.php].\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe would like to express our sincere gratitude to all individuals and organizations who supported and assisted us throughout this research. Special thanks to the following authors: Wei Li. In conclusion, we extend our thanks to everyone who has supported and assisted us along the way. Without your support, this research would not have been possible.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026apos; contributions\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWL contributed to the experiment conception and design, data analysis, and manuscript draft. YX and FW conducted the experiments. WL, YX, and FW contributed to manuscript draft and data analysis. YX and JL contributed to interpretation of data, manuscript draft and manuscript revision. YX and JL are responsible for confirming the authenticity of all the raw data. 
All authors read and approved the manuscript.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work was funded by the National Natural Science Foundation of China (grant no. 81460387) and supported by the Key Special Project of China\u0026apos;s Key R\u0026amp;D Program \u0026quot;Active Health and Scientific Response to Aging\u0026quot; (nos. 2021YFC2009300, 2021YFC2009301).\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eBray, F. et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. \u003cem\u003eCA Cancer J. Clin.\u003c/em\u003e \u003cb\u003e74\u003c/b\u003e, 229\u0026ndash;263 (2024).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJames, N. D. et al. The Lancet Commission on prostate cancer: planning for the surge in cases. \u003cem\u003eLancet\u003c/em\u003e \u003cb\u003e403\u003c/b\u003e, 1683\u0026ndash;1722 (2024).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWilkerson, M. L., Lin, F., Liu, H. \u0026amp; Cheng, L. The application of immunohistochemical biomarkers in urologic surgical pathology. \u003cem\u003eArch. Pathol. Lab. Med.\u003c/em\u003e \u003cb\u003e138\u003c/b\u003e, 1643\u0026ndash;1665 (2014).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSekhoacha, M. et al. Prostate Cancer Review: Genetics, Diagnosis, Treatment Options, and Alternative Approaches. \u003cem\u003eMolecules\u003c/em\u003e \u003cb\u003e27\u003c/b\u003e, 5730 (2022).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eThapa, R. \u0026amp; Wilson, G. D. 
The Importance of CD44 as a Stem Cell Biomarker and Therapeutic Target in Cancer. \u003cem\u003eStem Cells Int.\u003c/em\u003e \u003cb\u003e2016\u003c/b\u003e, 2087204 (2016).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eStamenkovic, I., Amiot, M., Pesando, J. M. \u0026amp; Seed, B. A lymphocyte molecule implicated in lymph node homing is a member of the cartilage link protein family. \u003cem\u003eCell\u003c/em\u003e \u003cb\u003e56\u003c/b\u003e, 1057\u0026ndash;1062 (1989).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGoodfellow, P. N. et al. The gene, MIC4, which controls expression of the antigen defined by monoclonal antibody F10.44.2, is on human chromosome 11. \u003cem\u003eEur. J. Immunol.\u003c/em\u003e \u003cb\u003e12\u003c/b\u003e, 659\u0026ndash;663 (1982).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eColombatti, A., Hughes, E. N., Taylor, B. A. \u0026amp; August, J. T. Gene for a major cell surface glycoprotein of mouse macrophages and other phagocytic cells is on chromosome 2. \u003cem\u003eProc. Natl. Acad. Sci. U S A\u003c/em\u003e. \u003cb\u003e79\u003c/b\u003e, 1926\u0026ndash;1929 (1982).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBanerjee, S. et al. Impaired Synthesis of Stromal Components in Response to Minnelide Improves Vascular Function, Drug Delivery, and Survival in Pancreatic Cancer. \u003cem\u003eClin. Cancer Res.\u003c/em\u003e \u003cb\u003e22\u003c/b\u003e, 415\u0026ndash;425 (2016).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePonta, H., Sherman, L. \u0026amp; Herrlich, P. A. CD44: from adhesion molecules to signalling regulators. \u003cem\u003eNat. Rev. Mol. Cell. Biol.\u003c/em\u003e \u003cb\u003e4\u003c/b\u003e, 33\u0026ndash;45 (2003).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZ\u0026ouml;ller, M. CD44: can a cancer-initiating cell profit from an abundantly expressed molecule? \u003cem\u003eNat. Rev. Cancer\u003c/em\u003e. 
\u003cb\u003e11\u003c/b\u003e, 254\u0026ndash;267 (2011).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMani, S. A. et al. The epithelial-mesenchymal transition generates cells with properties of stem cells. \u003cem\u003eCell\u003c/em\u003e \u003cb\u003e133\u003c/b\u003e, 704\u0026ndash;715 (2008).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhao, S. et al. CD44 Expression Level and Isoform Contributes to Pancreatic Cancer Cell Plasticity, Invasiveness, and Response to Therapy. \u003cem\u003eClin. Cancer Res.\u003c/em\u003e \u003cb\u003e22\u003c/b\u003e, 5592\u0026ndash;5604 (2016).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHao, J. L., Cozzi, P. J., Khatri, A., Power, C. A. \u0026amp; Li, Y. CD147/EMMPRIN and CD44 are potential therapeutic targets for metastatic prostate cancer. \u003cem\u003eCurr. Cancer Drug Targets\u003c/em\u003e. \u003cb\u003e10\u003c/b\u003e, 287\u0026ndash;306 (2010).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLai, C. J. et al. CD44 Promotes Migration and Invasion of Docetaxel-Resistant Prostate Cancer Cells Likely via Induction of Hippo-Yap Signaling. \u003cem\u003eCells\u003c/em\u003e \u003cb\u003e8\u003c/b\u003e, 295 (2019).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVikesaa, J. et al. RNA-binding IMPs promote cell adhesion and invadopodia formation. \u003cem\u003eEmbo j.\u003c/em\u003e \u003cb\u003e25\u003c/b\u003e, 1456\u0026ndash;1468 (2006).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCrosby, H. A., Lalor, P. F., Ross, E., Newsome, P. N. \u0026amp; Adams, D. H. Adhesion of human haematopoietic (CD34+) stem cells to human liver compartments is integrin and CD44 dependent and modulated by CXCR3 and CXCR4. \u003cem\u003eJ. Hepatol.\u003c/em\u003e \u003cb\u003e51\u003c/b\u003e, 734\u0026ndash;749 (2009).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYoshida, T., Matsuda, Y., Naito, Z. \u0026amp; Ishiwata, T. 
CD44 in human glioma correlates with histopathological grade and cell migration. \u003cem\u003ePathol. Int.\u003c/em\u003e \u003cb\u003e62\u003c/b\u003e, 463\u0026ndash;470 (2012).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCasalino-Matsuda, S. M., Monzon, M. E., Day, A. J. \u0026amp; Forteza, R. M. Hyaluronan fragments/CD44 mediate oxidative stress-induced MUC5B up-regulation in airway epithelium. \u003cem\u003eAm. J. Respir Cell. Mol. Biol.\u003c/em\u003e \u003cb\u003e40\u003c/b\u003e, 277\u0026ndash;285 (2009).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMidgley, A. C. et al. Transforming growth factor-β1 (TGF-β1)-stimulated fibroblast to myofibroblast differentiation is mediated by hyaluronan (HA)-facilitated epidermal growth factor receptor (EGFR) and CD44 co-localization in lipid rafts. \u003cem\u003eJ. Biol. Chem.\u003c/em\u003e \u003cb\u003e288\u003c/b\u003e, 14824\u0026ndash;14838 (2013).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eXu, H., Niu, M., Yuan, X., Wu, K. \u0026amp; Liu, A. CD44 as a tumor biomarker and therapeutic target. \u003cem\u003eExp. Hematol. Oncol.\u003c/em\u003e \u003cb\u003e9\u003c/b\u003e, 36 (2020).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eFreitas, R. et al. A multivalent CD44 glycoconjugate vaccine candidate for cancer immunotherapy. \u003cem\u003eJ. Control Release\u003c/em\u003e. \u003cb\u003e367\u003c/b\u003e, 540\u0026ndash;556 (2024).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBreithofer, J. et al. Phospholipase A2 group IVD mediates the transacylation of glycerophospholipids and acylglycerols. \u003cem\u003eJ. Lipid Res.\u003c/em\u003e \u003cb\u003e65\u003c/b\u003e, 100685 (2024).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eOhto, T., Uozumi, N., Hirabayashi, T. \u0026amp; Shimizu, T. 
Identification of novel cytosolic phospholipase A(2)s, murine cPLA(2)δ, ε, and ζ, which form a gene cluster with cPLA(2)β. \u003cem\u003eJ. Biol. Chem.\u003c/em\u003e \u003cb\u003e280\u003c/b\u003e, 24576\u0026ndash;24583 (2005).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTao, R. et al. A family based study of the genetic association between the PLA2G4D gene and schizophrenia. \u003cem\u003eProstaglandins Leukot. Essent. Fat. Acids\u003c/em\u003e. \u003cb\u003e73\u003c/b\u003e, 419\u0026ndash;422 (2005).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCheung, K. L. et al. Psoriatic T cells recognize neolipid antigens generated by mast cell phospholipase delivered by exosomes and presented by CD1a. \u003cem\u003eJ. Exp. Med.\u003c/em\u003e \u003cb\u003e213\u003c/b\u003e, 2399\u0026ndash;2412 (2016).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiu, H. et al. Metabolic Molecule PLA2G2D Is a Potential Prognostic Biomarker Correlating With Immune Cell Infiltration and the Expression of Immune Checkpoint Genes in Cervical Squamous Cell Carcinoma. \u003cem\u003eFront. Oncol.\u003c/em\u003e \u003cb\u003e11\u003c/b\u003e, 755668 (2021).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiu, B. X. et al. SERPINB5 promotes colorectal cancer invasion and migration by promoting EMT and angiogenesis via the TNF-α/NF-κB pathway. \u003cem\u003eInt. Immunopharmacol.\u003c/em\u003e \u003cb\u003e131\u003c/b\u003e, 111759 (2024).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhang, P. et al. TRIM21-SERPINB5 aids GMPS repression to protect nasopharyngeal carcinoma cells from radiation-induced apoptosis. \u003cem\u003eJ. Biomed. Sci.\u003c/em\u003e \u003cb\u003e27\u003c/b\u003e, 30 (2020).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYang, S. F., Yeh, C. B., Chou, Y. E., Lee, H. L. \u0026amp; Liu, Y. F. 
Serpin peptidase inhibitor (SERPINB5) haplotypes are associated with susceptibility to hepatocellular carcinoma. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e6\u003c/b\u003e, 26605 (2016).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLerman, I. et al. Epigenetic Suppression of SERPINB1 Promotes Inflammation-Mediated Prostate Cancer Progression. \u003cem\u003eMol. Cancer Res.\u003c/em\u003e \u003cb\u003e17\u003c/b\u003e, 845\u0026ndash;859 (2019).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVivier, E., Tomasello, E., Baratin, M., Walzer, T. \u0026amp; Ugolini, S. Functions of natural killer cells. \u003cem\u003eNat. Immunol.\u003c/em\u003e \u003cb\u003e9\u003c/b\u003e, 503\u0026ndash;510 (2008).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRahmati, A., Bigam, S. \u0026amp; Elahi, S. Galectin-9 promotes natural killer cells activity via interaction with CD44. \u003cem\u003eFront. Immunol.\u003c/em\u003e \u003cb\u003e14\u003c/b\u003e, 1131379 (2023).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKim, S. et al. Surface Engineering of Natural Killer Cells with CD44-targeting Ligands for Augmented Cancer Immunotherapy. \u003cem\u003eSmall\u003c/em\u003e \u003cb\u003e20\u003c/b\u003e, e2306738 (2024).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVisvader, J. E. \u0026amp; Lindeman, G. J. Cancer stem cells in solid tumours: accumulating evidence and unresolved questions. \u003cem\u003eNat. Rev. Cancer\u003c/em\u003e. \u003cb\u003e8\u003c/b\u003e, 755\u0026ndash;768 (2008).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMelton, C., Judson, R. L. \u0026amp; Blelloch, R. Opposing microRNA families regulate self-renewal in mouse embryonic stem cells. \u003cem\u003eNature\u003c/em\u003e \u003cb\u003e463\u003c/b\u003e, 621\u0026ndash;626 (2010).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eEsquela-Kerscher, A. \u0026amp; Slack, F. J. 
Oncomirs - microRNAs with a role in cancer. \u003cem\u003eNat. Rev. Cancer\u003c/em\u003e. \u003cb\u003e6\u003c/b\u003e, 259\u0026ndash;269 (2006).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiu, C. et al. The microRNA miR-34a inhibits prostate cancer stem cells and metastasis by directly repressing CD44. \u003cem\u003eNat. Med.\u003c/em\u003e \u003cb\u003e17\u003c/b\u003e, 211\u0026ndash;215 (2011).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWang, X. et al. NUMB suppression by miR-9-5P enhances CD44(+) prostate cancer stem cell growth and metastasis. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e11\u003c/b\u003e, 11210 (2021).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKoong, L. Y. \u0026amp; Watson, C. S. Direct estradiol and diethylstilbestrol actions on early- versus late-stage prostate cancer cells. \u003cem\u003eProstate\u003c/em\u003e \u003cb\u003e74\u003c/b\u003e, 1589\u0026ndash;1603 (2014).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eStein, M. et al. Transdermal estradiol in castrate and chemotherapy resistant prostate cancer. \u003cem\u003eMed. Sci. Monit.\u003c/em\u003e \u003cb\u003e18\u003c/b\u003e, Cr260\u0026ndash;264 (2012).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhao, Q., Cheng, Y. \u0026amp; Xiong, Y. LTF Regulates the Immune Microenvironment of Prostate Cancer Through JAK/STAT3 Pathway. \u003cem\u003eFront. Oncol.\u003c/em\u003e \u003cb\u003e11\u003c/b\u003e, 692117 (2021).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTaylor, B. S. et al. Integrative genomic profiling of human prostate cancer. \u003cem\u003eCancer Cell.\u003c/em\u003e \u003cb\u003e18\u003c/b\u003e, 11\u0026ndash;22 (2010).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhang, H., Meltzer, P. \u0026amp; Davis, S. RCircos: an R package for Circos 2D track plots. 
\u003cem\u003eBMC Bioinform.\u003c/em\u003e \u003cb\u003e14\u003c/b\u003e, 244 (2013).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiu, P., Xu, H., Shi, Y., Deng, L. \u0026amp; Chen, X. Potential Molecular Mechanisms of Plantain in the Treatment of Gout and Hyperuricemia Based on Network Pharmacology. \u003cem\u003eEvid Based Complement Alternat Med.\u003c/em\u003e \u003cb\u003e2020\u003c/b\u003e, 3023127 (2020).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYu, G., Wang, L. G., Han, Y. \u0026amp; He, Q. Y. clusterProfiler: an R package for comparing biological themes among gene clusters. \u003cem\u003eOmics\u003c/em\u003e \u003cb\u003e16\u003c/b\u003e, 284\u0026ndash;287 (2012).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRitchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cb\u003e43\u003c/b\u003e, e47 (2015).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGu, Z. \u0026amp; H\u0026uuml;bschmann, D. Make Interactive Complex Heatmaps in R. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cb\u003e38\u003c/b\u003e, 1460\u0026ndash;1462 (2022).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZheng, Y. et al. Ferroptosis and Autophagy-Related Genes in the Pathogenesis of Ischemic Cardiomyopathy. \u003cem\u003eFront. Cardiovasc. Med.\u003c/em\u003e \u003cb\u003e9\u003c/b\u003e, 906753 (2022).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLi, Y., Lu, F. \u0026amp; Yin, Y. Applying logistic LASSO regression for the diagnosis of atypical Crohn's disease. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e12\u003c/b\u003e, 11340 (2022).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHou, N. et al. Predicting 30-days mortality for MIMIC-III patients with sepsis-3: a machine learning approach using XGboost. \u003cem\u003eJ. 
Transl Med.\u003c/em\u003e \u003cb\u003e18\u003c/b\u003e, 462 (2020).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePiotrowska-Niczyporuk, A., Bajguz, A., Kotowska, U., Zambrzycka-Szelewa, E. \u0026amp; Sienkiewicz, A. Auxins and Cytokinins Regulate Phytohormone Homeostasis and Thiol-Mediated Detoxification in the Green Alga Acutodesmus obliquus Exposed to Lead Stress. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e10\u003c/b\u003e, 10193 (2020).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBeck, M. W. \u0026amp; NeuralNetTools Visualization and Analysis Tools for Neural Networks. \u003cem\u003eJ. Stat. Softw.\u003c/em\u003e \u003cb\u003e85\u003c/b\u003e, 1\u0026ndash;20 (2018).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLi, S. et al. Construction of Osteosarcoma Diagnosis Model by Random Forest and Artificial Neural Network. \u003cem\u003eJ. Pers. Med.\u003c/em\u003e \u003cb\u003e13\u003c/b\u003e, 447 (2023).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRobin, X. et al. pROC: an open-source package for R and S\u0026thinsp;+\u0026thinsp;to analyze and compare ROC curves. \u003cem\u003eBMC Bioinform.\u003c/em\u003e \u003cb\u003e12\u003c/b\u003e, 77 (2011).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGao, X., Guo, Z., Wang, P., Liu, Z. \u0026amp; Wang, Z. Transcriptomic analysis reveals the potential crosstalk genes and immune relationship between IgA nephropathy and periodontitis. \u003cem\u003eFront. Immunol.\u003c/em\u003e \u003cb\u003e14\u003c/b\u003e, 1062590 (2023).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eH\u0026auml;nzelmann, S., Castelo, R. \u0026amp; Guinney, J. GSVA: gene set variation analysis for microarray and RNA-seq data. \u003cem\u003eBMC Bioinform.\u003c/em\u003e \u003cb\u003e14\u003c/b\u003e, 7 (2013).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBrunson, J. C. ggalluvial: Layered Grammar for Alluvial Plots. \u003cem\u003eJ. 
Open. Source Softw.\u003c/em\u003e \u003cb\u003e5\u003c/b\u003e, 2017 (2020).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSeeliger, D. \u0026amp; de Groot, B. L. Ligand docking and binding site analysis with PyMOL and Autodock/Vina. \u003cem\u003eJ. Comput. Aided Mol. Des.\u003c/em\u003e \u003cb\u003e24\u003c/b\u003e, 417\u0026ndash;422 (2010).\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Prostate cancer, CD44, Biomarkers, Transcriptome sequencing analysis","lastPublishedDoi":"10.21203/rs.3.rs-7019627/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7019627/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eProstate cancer (PCa) development is linked to CD44, but its pathogenesis is unclear. This study explored the mechanisms of PCa linked to CD44 through bioinformatics and experimental approaches. Transcriptomic data analysis revealed CD44 localization on chromosome 11 and enrichment in the ribosome pathway. 
Differential expression analysis identified candidate genes, and two biomarkers—PLA2G4D and SERPINB5—were selected through PPI analysis, machine learning, and expression validation. These biomarkers showed lower expression in low CD44 and knockout groups compared to controls. The ANN model demonstrated high predictive accuracy (Area Under the Curve (AUC) = 0.825). Functional analysis showed PLA2G4D enrichment in the protein export pathway and SERPINB5 in dorsoventral axis formation. Twenty-two differential immune cells were identified, with positive correlations between CD44, PLA2G4D, SERPINB5, and NK cells (p \u0026lt; 0.05). Estradiol was identified as a targeted drug for all three genes. Reverse transcription-quantitative polymerase chain reaction (RT-qPCR) confirmed the downregulation of CD44, PLA2G4D, and SERPINB5 in tumor samples (p \u0026lt; 0.01). This study identified two CD44-related biomarkers, offering new therapeutic avenues and insights for PCa treatment.\u003c/p\u003e","manuscriptTitle":"Bioinformatics analysis and validation of CD44 and associated biomarkers in prostate cancer","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-07-11 13:34:43","doi":"10.21203/rs.3.rs-7019627/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision 
requested","date":"2025-10-24T09:54:10+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-10-23T01:03:52+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"118908199033653131909136484702845461647","date":"2025-10-20T00:11:11+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-10-06T19:23:40+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"255419754712762767469448609514384430023","date":"2025-09-26T15:56:59+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-07-07T18:48:06+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-07-07T18:03:02+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-07-04T08:46:22+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-07-04T05:04:21+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific Reports","date":"2025-07-01T11:05:02+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"0bf8d716-6d53-4852-b207-e7e5be6eaf24","owner":[],"postedDate":"July 11th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":51402159,"name":"Health sciences/Biomarkers"},{"id":51402160,"name":"Biological sciences/Cancer"},{"id":51402161,"name":"Biological 
sciences/Computational biology and bioinformatics"},{"id":51402162,"name":"Health sciences/Oncology"}],"tags":[],"updatedAt":"2025-12-10T13:23:13+00:00","versionOfRecord":[],"versionCreatedAt":"2025-07-11 13:34:43","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-7019627","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7019627","identity":"rs-7019627","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
