Machine Learning-Driven Discovery of Biomarkers in Gastric Cancer: A Focus on DPT, FBP2, ADH7, INHBA, and GPR155 | Research Square

Research Article

Jianbo Zhao, Damu Agu, Xiongfeng Li, Youge Su, Haidong Cheng, and Mingxing Hou

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7523494/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review
Version 1 posted 8

Abstract

Gastric cancer remains a severe threat to human health and often carries a poor prognosis, so effective biomarkers for early detection and targeted treatment are urgently needed. This study applied a comprehensive bioinformatics and machine learning approach to identify key protein biomarkers for gastric cancer and elucidate their potential functions. Gastric cancer-related datasets were obtained from the NCBI Gene Expression Omnibus database.
Differential expression analysis identified 171 genes that were differentially expressed between control and tumor samples. Using the LASSO, SVM-RFE, and RF algorithms, five genes (DPT, FBP2, ADH7, INHBA, and GPR155) were identified as potential biomarkers. Among ten machine learning models constructed from these five genes, a support vector machine (SVM) model performed best. SHapley Additive exPlanations (SHAP) were used to illustrate the contribution of each pivotal gene to the SVM model. Gene set enrichment analysis and gene set variation analysis were then used to explore the functional roles of these genes in gastric cancer. Finally, we examined the associations of the signature genes with immune cell infiltration and patient prognosis. In conclusion, the identified proteins have the potential to serve as diagnostic biomarkers and provide prognostic value for gastric cancer. This study offers a comprehensive, data-driven approach to uncovering critical molecular targets for improved detection and management of this deadly disease.

Keywords: Gastric cancer, DPT, FBP2, ADH7, INHBA, GPR155, Biomarkers, Machine learning, SHapley additive exPlanations

1. INTRODUCTION

According to GLOBOCAN 2022 data, gastric cancer (GC) accounted for approximately 968,000 new cases and 659,000 deaths in 2022, ranking fifth in both incidence and mortality [1]. Although overall incidence and mortality have declined, the incidence of GC among young individuals is increasing year by year, presenting a persistent global health challenge [2]. Adenocarcinoma accounts for over 95% of GC cases. High-risk factors for GC include consumption of nitrite-rich and moldy foods, Helicobacter pylori infection [3], excessive alcohol intake, smoking, and other unhealthy behaviors [4, 5].
Early-stage GC often presents with nonspecific symptoms such as upper abdominal discomfort, acid reflux, indigestion, and belching, and is therefore frequently misdiagnosed as a common digestive ailment such as gastritis or gastric ulcer [6, 7]. Consequently, patients are frequently diagnosed at advanced or even late stages. Early detection rates are notably lower in economically underdeveloped regions. Despite advances in surgical resection, and although treatment modalities including chemotherapy, radiotherapy, targeted therapy, and immunotherapy have shown some efficacy in improving patient prognosis, overall outcomes remain suboptimal [8]. The management of GC remains a formidable challenge, characterized by limited treatment options and unfavorable long-term results. Although current biomarkers offer some utility, their limitations underscore the urgent need for novel, comprehensive biomarkers.

Bioinformatics and high-throughput omics technologies have significantly advanced the investigation of tumor mechanisms. The rapid evolution of whole-genome sequencing, including next-generation sequencing, enables the acquisition of cancer genome profiles [9]. These technologies furnish researchers with extensive expression datasets, facilitating the analysis of individual patient genomes for precisely targeted therapies and the discovery of specific cancer characteristics. Machine learning (ML) algorithms offer significant benefits in bioinformatics and molecular biology. By effectively handling complex biological data, ML can cope with noise and redundant information, thereby improving the accuracy and reliability of results [10]. It can help clarify gene-disease relationships and identify potential therapeutic targets, and it supports the discovery of disease associations, early diagnosis, and drug development.
Model interpretability is a key issue in machine learning and deep learning. Complex models can achieve excellent predictive performance, but they are often perceived as "black boxes" because their internal decision-making process is difficult to explain. SHapley Additive exPlanations (SHAP) addresses this challenge by assigning importance values to features, explaining the output of the model and enabling more informed decisions in medicine than blind reliance on the output of an algorithm, which could have severe consequences for patients [11]. Combining bioinformatics analysis with machine learning is a promising strategy for biomarker discovery, with the potential to improve the diagnosis, prognosis, and treatment of gastric cancer [12].

This research aims to build a machine learning model for identifying gastric cancer using data from the Gene Expression Omnibus (GEO) database. After algorithmic screening of crucial genes, various machine learning models were developed, and the best-performing model, an SVM, was chosen. The contributions of DPT, FBP2, ADH7, INHBA, and GPR155 to gastric cancer identification in this model were analyzed using SHAP. Applying these genes in clinical settings and exploring their downstream targets and pathways will be important for early disease detection and the development of novel therapeutic strategies.

2. MATERIALS AND METHODS

2.1 Data acquisition

Gene expression data were acquired from five microarray datasets (GSE26942 [13], GSE27342 [14], GSE30727, GSE63089 [15], and GSE65801 [16]) available in GEO at https://www.ncbi.nlm.nih.gov/geo/. These datasets collectively comprise 392 tumor samples and 199 normal tissue samples. The probe expression matrix was then converted into a gene expression matrix using the platform annotation file.
Further details of the datasets are provided in Table 1. Additionally, standardized RNA sequencing data for 412 gastric cancer tissues and 36 normal tissues, along with corresponding clinicopathological information, were downloaded from The Cancer Genome Atlas (TCGA) database at https://portal.gdc.cancer.gov/.

2.2 Gene expression normalization and identification of differentially expressed genes

The raw GEO data were normalized using the "normalizeBetweenArrays" function of the "limma" package in R. To address heterogeneity within the training dataset arising from differences in experimental platforms and batches, the "sva" package (version 3.54.0) was employed. Differential expression analysis between tumor and normal samples was carried out using the "limma" package (version 3.62.2) in R [17]. The differentially expressed genes (DEGs) were visualized with a volcano plot drawn with "ggplot2" (version 3.5.2) and a heatmap drawn with "pheatmap" (version 1.0.12) in R.

2.3 Biological function and pathway enrichment analyses

To identify biological pathways associated with the expression levels of the DEGs, we used the clusterProfiler package (version 4.14.4) in R [18] to perform Gene Set Enrichment Analysis (GSEA). GSEA is a robust computational approach for identifying biologically significant pathways and processes linked to particular gene expression patterns. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway collection served as the reference gene set for assessing enrichment. Gene Set Variation Analysis (GSVA), a technique for estimating gene set variability across samples, was performed with the GSVA package (version 2.0.5) in R [19]. The KEGG pathway collection was also used as the background gene set for GSVA.
For the initial 171 DEGs, we used the clusterProfiler package to carry out Gene Ontology (GO) enrichment analysis in three categories: biological process (BP), molecular function (MF), and cellular component (CC). KEGG enrichment analysis was also conducted on these DEGs.

2.4 Machine learning algorithms

The Least Absolute Shrinkage and Selection Operator (LASSO) and Random Forest (RF) algorithms are effective tools for identifying hub genes because of their strong predictive capabilities. Using the "glmnet" package (version 4.1.8) in R [20], a LASSO model was developed to pinpoint genes closely linked to gastric cancer. For the RF analysis, the "randomForest" package (version 4.7.1.2) was used to rank the DEGs and select those with an importance score above 3. Support Vector Machine Recursive Feature Elimination (SVM-RFE) was implemented using the "e1071" package (version 1.7.16) in R [21], and its performance was assessed by the average misclassification rate under 10-fold cross-validation. Genes common to the sets identified by these three machine learning approaches were designated as hub genes for gastric cancer.

2.5 Evaluation of diagnostic value

Receiver operating characteristic (ROC) curves were computed to evaluate the diagnostic ability of the candidate gene biomarkers, and the area under the curve (AUC) was measured. Genes with an AUC greater than 0.9 were considered to have good diagnostic performance. The diagnostic ability of the five key genes was evaluated using data from the TCGA database. The pROC package (version 1.18.5) was used for ROC analysis [22], and the results were visualized using "ggplot2".
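As a minimal sketch of the AUC used in Section 2.5: the area under the ROC curve equals the probability that a randomly chosen tumor sample receives a higher score than a randomly chosen normal sample, with ties counted as one half. The function name `auc_from_scores` and the toy expression values are hypothetical and only illustrate the calculation; the study itself relied on the pROC package in R.

```python
def auc_from_scores(tumor_scores, normal_scores):
    """AUC as the fraction of (tumor, normal) pairs ranked correctly.

    Ties contribute 0.5, which matches the trapezoidal area under the ROC curve.
    """
    if not tumor_scores or not normal_scores:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for t in tumor_scores:
        for n in normal_scores:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor_scores) * len(normal_scores))


# Hypothetical expression values of a gene that is higher in tumors.
tumor = [7.9, 8.4, 6.1, 9.0]
normal = [5.2, 6.1, 4.8]
auc = auc_from_scores(tumor, normal)
```

For a gene that is lower in tumors, this value falls below 0.5. One common convention is then to report the AUC for the reversed direction, that is, 1 minus the value computed here.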
Various machine learning models were developed to assess diagnostic performance: random forest (RF), support vector machine (SVM), partial least squares (PLS), decision tree (DTS), K-nearest neighbors (KNN), logistic regression, eXtreme Gradient Boosting (XGBoost), gradient boosting machine (GBM), generalized linear model boosting (glmBoost), and neural network (NeuralNet). The diagnostic ability of each model was judged from its ROC curve.

2.6 SHapley Additive exPlanations

SHAP applies the Shapley value from cooperative game theory to the interpretation of machine learning models. The Shapley value allocates a payoff fairly among participants, and SHAP uses it to quantify each feature's contribution to a model prediction. This provides a common framework for assessing variable contributions across different algorithms. SHAP values make the evaluation of model behavior transparent and interpretable, helping clinicians select the most suitable diagnostic tool for clinical use.

2.7 Immune infiltration analysis

CIBERSORT is a computational tool that estimates the cellular composition of complex tissues from gene expression profiles. In this study, immune cell infiltration levels were assessed with the CIBERSORT algorithm using a tailored R script derived from the original method [23]. The analysis was performed with 1000 permutations, and only samples with P < 0.05 were retained. Spearman correlation analysis was used to examine the association between target genes and levels of immune cell infiltration, and the results are presented as heat maps.
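The Spearman correlation used in Section 2.7 can be sketched as the Pearson correlation of rank-transformed values, with tied values sharing their average rank. The helper names and the toy data below are hypothetical, chosen only to illustrate the calculation for one gene and one immune cell fraction.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical data: expression of one gene and one immune cell fraction.
gene_expression = [2.1, 3.5, 1.0, 4.2, 2.9]
cell_fraction = [0.10, 0.22, 0.05, 0.18, 0.30]
rho = spearman_rho(gene_expression, cell_fraction)
```

Repeating this for every gene and immune cell type yields the matrix of coefficients that a correlation heat map displays.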
2.8 Survival analysis

The Kaplan-Meier Plotter website (https://kmplot.com/analysis/), which incorporates sequencing data from multiple cancers and microarray data from The Cancer Genome Atlas and the Gene Expression Omnibus databases, was used for prognostic correlation analysis [24]. The "Start KM Plotter for gastric cancer" section was used to explore the relationship between the model genes and the overall survival (OS) of patients with gastric cancer. The cutoff value was set to "Auto select best cut-off", and the probe was selected automatically as the best probe set recommended by the Jetset algorithm via "only JetSet best probe set".

2.9 Statistical analysis

Statistical analyses were conducted in the R software environment (version 4.2.3, https://www.r-project.org). Differential gene expression was evaluated with a paired-sample t-test, with a P-value below 0.05 considered statistically significant. ROC curves and AUC metrics were used to assess the diagnostic performance of biomarkers and machine learning models. The association between gene biomarker expression and infiltrating immune cell populations was evaluated using Spearman's rank correlation.

3. RESULTS

3.1 Identification of key DEGs

Five GEO datasets were used to investigate the genetic underpinnings of gastric cancer. The datasets were merged to increase the sample size and then normalized to mitigate heterogeneity between them. Principal component analysis (PCA) confirmed that normalization reduced technical variability across the integrated dataset (Fig. 1A and Fig. 1B). Differential expression analysis of the stomach tumor and normal samples was conducted using the "limma" R package. Genes with an adj.P.Val 1 were considered as DEGs. The merged dataset yielded a total of 171 DEGs, comprising 64 up-regulated and 107 down-regulated transcripts (Fig. 1C and Fig. 1D).
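The `adj.P.Val` column reported by limma is, by default, the Benjamini-Hochberg adjusted p-value. A minimal sketch of that adjustment and of the subsequent up/down labelling follows. The cutoffs `alpha` and `min_abs_logfc` are hypothetical parameters rather than the study's thresholds, and the gene names and numbers are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def label_degs(genes, log_fc, p_values, alpha, min_abs_logfc):
    """Label each gene 'up', 'down', or 'ns' using hypothetical cutoffs."""
    adj = benjamini_hochberg(p_values)
    labels = {}
    for gene, lfc, q in zip(genes, log_fc, adj):
        if q < alpha and abs(lfc) > min_abs_logfc:
            labels[gene] = "up" if lfc > 0 else "down"
        else:
            labels[gene] = "ns"
    return labels


# Invented toy input; a positive log fold change means higher in tumor.
genes = ["geneA", "geneB", "geneC", "geneD"]
log_fc = [1.8, -2.3, 0.4, -1.2]
p_values = [0.001, 0.004, 0.03, 0.20]
adjusted = benjamini_hochberg(p_values)
labels = label_degs(genes, log_fc, p_values, alpha=0.05, min_abs_logfc=1.0)
```

Counting the "up" and "down" labels over all genes gives the kind of split reported in Section 3.1.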
3.2 GO and KEGG analysis of the 171 DEGs

The biological functions of the 171 genes were investigated through Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses. GO biological process (BP) analysis indicated significant enrichment in response to xenobiotic stimuli, digestion, hormone metabolic processes, tissue homeostasis, and anatomical structure homeostasis (Fig. 2A). GO cellular component (CC) analysis demonstrated significant enrichment in the apical part of the cell, collagen-containing extracellular matrix, and apical plasma membrane (Fig. 2A). GO molecular function (MF) analysis revealed significant enrichment in extracellular matrix structural constituents, serine hydrolase activity, oxidase activity, activities acting on CH-OH group donors, and alcohol dehydrogenase [NAD(P)+] activity (Fig. 2A). KEGG analysis highlighted Gastric acid secretion, Metabolism of xenobiotics by cytochrome P450, Virion - Hepatitis viruses, and Drug metabolism - cytochrome P450 as the most significant pathways (Fig. 2B).

3.3 Machine learning identification of hub GC-related DEGs

To identify diagnostic biomarkers for gastric cancer, we employed the LASSO, SVM-RFE, and Random Forest techniques. LASSO was used to identify key features while addressing multicollinearity (Fig. 3A and Fig. 3B). It associated gastric cancer with the expression of 33 genes, including ADH7, FBP2, INHBA, FLJ42875, COL1A1, CCKBR, MFAP2, HOXB9, PLA2G7, GPR155, and DPT. SVM-RFE selected 26 genes strongly linked to gastric cancer (Fig. 3C and Fig. 3D). The Random Forest algorithm highlighted SLC5A5, ADH7, AQP4, INHBA, MYOC, DPT, APOBEC2, TRIM50, SCNN1G, DNER, FBP2, KIAA1199, GC, CPA2, COL1A1, GPR155, CCKAR, and CLDN1, each with an importance score above 3 (Fig. 3E and Fig. 3F).
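Section 2.4 defines hub genes as those selected by all three methods. A minimal sketch of that intersection follows. The LASSO set holds only the 11 genes named above out of the 33 selected, the RF set holds the 18 genes listed above, and the SVM-RFE set is a hypothetical stand-in, because its 26 genes are not enumerated in the text.

```python
# Genes named in the text for LASSO (11 of the 33 selected genes).
lasso_named = {
    "ADH7", "FBP2", "INHBA", "FLJ42875", "COL1A1", "CCKBR",
    "MFAP2", "HOXB9", "PLA2G7", "GPR155", "DPT",
}

# Genes listed in the text for Random Forest (importance score above 3).
rf_listed = {
    "SLC5A5", "ADH7", "AQP4", "INHBA", "MYOC", "DPT", "APOBEC2", "TRIM50",
    "SCNN1G", "DNER", "FBP2", "KIAA1199", "GC", "CPA2", "COL1A1", "GPR155",
    "CCKAR", "CLDN1",
}

# HYPOTHETICAL stand-in: the 26 SVM-RFE genes are not listed in the text.
svm_rfe_assumed = {"DPT", "FBP2", "ADH7", "INHBA", "GPR155", "AQP4", "MFAP2"}


def hub_genes(*gene_sets):
    """Return the genes present in every selection, sorted by name."""
    if not gene_sets:
        return []
    return sorted(set.intersection(*map(set, gene_sets)))


two_way = hub_genes(lasso_named, rf_listed)
three_way = hub_genes(lasso_named, rf_listed, svm_rfe_assumed)
```

With the lists as printed, the named LASSO genes and the RF genes share six members: the five hub genes plus COL1A1. If the lists are accurate, COL1A1 cannot have been among the SVM-RFE genes, since it is absent from the final five.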
The overlap of these analyses identified five hub DEGs: DPT, FBP2, ADH7, INHBA, and GPR155 (Fig. 4A). Gene expression levels in gastric cancer and control samples were visualized with a box-and-whisker plot (Fig. 4B) and a volcano plot (Fig. 4C), and differential expression of the five genes in the TCGA-STAD dataset was also explored (Fig. 4D and Fig. 4E). Chromosome locations are shown in Fig. 4F.

3.4 Diagnostic value evaluation

Using data from the TCGA database, AUC values were calculated for the five core genes: DPT (AUC = 0.923), FBP2 (AUC = 0.602), ADH7 (AUC = 0.855), INHBA (AUC = 0.961), and GPR155 (AUC = 0.739) (Fig. 5A). Of these, only DPT and INHBA exceeded the AUC threshold of 0.9 set in Section 2.5. Furthermore, the diagnostic performance of ten machine learning algorithms was evaluated on the training datasets by generating ROC curves. The support vector machine (SVM) model exhibited the highest AUC (0.938), followed by random forest (RF) (AUC = 0.936), k-nearest neighbors (KNN) (AUC = 0.930), gradient boosting machine (GBM) (AUC = 0.920), XGBoost (AUC = 0.915), neural network (NeuralNet) (AUC = 0.908), decision tree (DTS) (AUC = 0.871), logistic regression (Logistic) (AUC = 0.867), generalized linear model boosting (glmBoost) (AUC = 0.866), and partial least squares (PLS) (AUC = 0.857) (Fig. 5B).

3.5 SHAP analysis reveals the contributions of key DEGs

To evaluate the influence of DPT, FBP2, ADH7, INHBA, and GPR155 on the predictions of the SVM model, we conducted SHAP analysis. Visualization of SHAP values clarified the roles of these genes, highlighting DPT and FBP2 as the largest contributors (Fig. 6A). The Y-axis order of the swarm plot ranks the genes by their contribution to the model, and the plot shows the distribution of SHAP values for each feature.
The swarm plot indicated that reduced expression levels of DPT, FBP2, ADH7, and GPR155 were linked to tumor prediction, while reduced expression of INHBA was associated with normal prediction (Fig. 6B). The SHAP dependence plot (Fig. 6C) provided a detailed view of interactions and contributions within the predictive model of gastric cancer. Each plot showed the relationship between a pair of genes using SHAP values on both axes, revealing positive and negative correlation patterns. For example, the DPT versus FBP2 plot demonstrated a positive correlation, indicating that increased expression of these genes jointly pushed the model toward predicting normal samples. The color gradient in each plot represented the range of SHAP values, with high values shown in orange and low values in purple. In contrast, GPR155 displayed a narrow range of SHAP values, suggesting a limited impact on the model. The SHAP analysis thus clarified both the magnitude of the contributions and the interactions among the genes in the model, offering insights into their implications for gastric cancer biology.

3.6 The working principle of the model: interpretation based on SHAP

As shown in Fig. 7A and Fig. 7B, the force and waterfall plots present the prediction of the five-gene model (DPT, ADH7, INHBA, FBP2, and GPR155) for a single sample, the first sample of the training dataset (GSE26942-GSM662387). The vertical axis represents the expression of the five molecules in the sample, and the horizontal axis represents the predicted value. DPT has the greatest impact on the result (-0.274), followed by ADH7 (-0.188), INHBA (-0.123), and FBP2 (-0.116), all of which push the classification toward normal. Notably, GPR155 pushes the prediction toward tumor.
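SHAP values are additive: the model output for a sample equals the baseline E[f(x)] plus the sum of the per-feature SHAP values. As a rough check under that property, the sketch below takes the four contributions reported for sample GSM662387, together with the output f(x) = 0.0112 and baseline E[f(x)] = 0.68 reported for the same sample in this section, and backs out the contribution of GPR155, which is not stated numerically. All inputs are rounded figures from the text, so the result is approximate, and the function name is hypothetical.

```python
# SHAP values reported in the text for sample GSE26942-GSM662387.
reported_shap = {
    "DPT": -0.274,
    "ADH7": -0.188,
    "INHBA": -0.123,
    "FBP2": -0.116,
}
base_value = 0.68      # E[f(x)], rounded in the text
model_output = 0.0112  # f(x) for this sample


def implied_contribution(output, base, known_shap):
    """Solve output = base + sum(all SHAP values) for the one missing value."""
    return output - base - sum(known_shap.values())


gpr155_shap = implied_contribution(model_output, base_value, reported_shap)

# Rebuild f(x) from the baseline and all five contributions as a sanity check.
all_shap = dict(reported_shap, GPR155=gpr155_shap)
reconstructed = base_value + sum(all_shap.values())
```

The implied value is about +0.03. It is small and positive, which agrees with the statement that GPR155 alone pushes this sample toward the tumor class.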
The predicted value f(x) was 0.0112, whereas the baseline expected model output E[f(x)] was 0.68, so the sample was correctly classified as normal gastric mucosa tissue, which supports the reliability of the model for this sample. Taken together, DPT, ADH7, and INHBA were identified in this analysis as the key drivers of the model's prediction, suggesting an important role in the underlying biological processes.

3.7 Functional enrichment analyses of the five hub DEGs

To elucidate the functional implications and potential molecular pathways associated with these genes, we employed Gene Set Enrichment Analysis (GSEA) and Gene Set Variation Analysis (GSVA). Samples were stratified into high and low expression groups according to the median expression level of each gene. GSEA was then carried out for these groups, retaining significantly enriched pathways with a p-value less than 0.05. The three most significant pathways associated with high and low expression were visualized (Fig. 8A). GSVA, a nonparametric and unsupervised method, was used to estimate the variation in gene set enrichment across samples, and differences in KEGG pathway activity between the high and low expression groups were assessed (Fig. 8B). Our findings indicated significant activation of the calcium signaling pathway, vascular smooth muscle contraction, dilated cardiomyopathy, and hypertrophic cardiomyopathy pathways in the high-expression DPT group. Conversely, the activities of aminoacyl-tRNA biosynthesis, homologous recombination, DNA replication, and the cell cycle pathway were decreased. In the FBP2 group, upregulation was observed in retinol metabolism and cytochrome P450-mediated xenobiotic metabolism pathways, while ECM-receptor interaction, pathways in cancer, the NOD-like receptor signaling pathway, the Wnt signaling pathway, and dorsoventral axis formation were downregulated.
Similarly, ADH7 showed upregulation in pathways similar to those of FBP2, with downregulation primarily in progesterone-mediated oocyte maturation, RNA polymerase activity, and purine metabolism. For the INHBA group, upregulation was noted in ECM-receptor interaction, the TGF-β signaling pathway, focal adhesion, and systemic lupus erythematosus pathways, while downregulation was observed in nitrogen metabolism, malonic acid, pyruvate, and butyrate metabolism, and valine, leucine, and isoleucine degradation. GPR155 showed upregulation in the calcium signaling pathway, taste transduction, and neuroactive ligand-receptor interaction, and downregulation in pyrimidine metabolism, the cell cycle, and the p53 signaling pathway. The dysregulation of these pathways in association with the five model genes underscores the complexity of the molecular mechanisms underlying gastric cancer and offers leads for further investigation.

3.8 Immune infiltration analysis

Immune desert tumors lack immune cell infiltration in the tumor microenvironment (TME), which can render tumors unresponsive to immunotherapy and worsen survival. Figure 9 shows the differences in immune cell infiltration between the gastric cancer and normal groups and the associations between the signature genes and immune cells, as identified by the CIBERSORT algorithm. These signature genes were significantly associated with most types of immune cells, suggesting that they may influence the immune microenvironment.

3.9 Clinical prognostic correlation analysis

To examine the role of the model genes in GC patient survival, we analyzed their association with overall survival (OS) using the Kaplan-Meier Plotter website.
High expression of four of the model genes was associated with significantly worse OS than low expression: FBP2 (HR = 1.46; P = 1e-04), ADH7 (HR = 1.36; P = 0.00079), INHBA (HR = 1.3; P = 0.0075), and GPR155 (HR = 1.8; P = 2.1e-07). For DPT, the association did not reach statistical significance (HR = 1.15; P = 0.12) (Fig. 10).

4. DISCUSSION

The clinical management of gastric cancer remains challenging, as most patients present with advanced-stage disease at diagnosis [25]. According to current guidelines and expert consensus, advanced gastric cancer refers to cases in which the cancerous tissue invades the muscularis propria of the stomach wall or penetrates the muscle layer to reach the serosa. Advanced-stage gastric cancer mainly refers to tumors that have infiltrated, spread, and metastasized to distant organs, as in liver metastasis and peritoneal metastasis. Peritoneal metastasis (PM) is a prevalent manifestation of advanced gastric cancer and the primary mode of recurrence after gastric cancer surgery, and it significantly worsens prognosis [26]. Moreover, gene mutations in tumor cells are complex: tumors exhibit substantial genomic heterogeneity, with divergent mutation profiles across individuals [27]. The need for precise treatment methods is therefore increasingly urgent.

Accurate biomarkers play a pivotal role in modern medical research by enabling early disease diagnosis, treatment selection, and prognosis assessment. The advancement of omics technologies and machine learning has greatly enhanced the study of tumor markers. Bioinformatics encompasses a suite of analytical techniques, including gene expression profiling, survival analysis, protein-protein interaction network reconstruction, and functional enrichment analysis [28].
Machine learning, a subfield of artificial intelligence (AI), enables computer systems to improve their performance by learning from data and experience rather than through explicit programming. ML algorithms extract patterns from extensive datasets to perform tasks such as prediction, classification, and decision-making. In contrast to costly and time-consuming experimental approaches, ML algorithms offer a cost-effective and efficient means of analyzing complex biological data with improved precision and effectiveness [29]. By building predictive models, ML can identify potential biomarkers and therapeutic targets in large biological datasets. Disease-associated genes identified through large-scale genomic analyses may serve as potential biomarkers for diagnosis or molecular targets for therapeutic intervention [30].

Interpretability tools provide transparency through visualization or numerical analysis, so that users need not blindly trust the output of an algorithm, which could cause serious harm to patients. The essence of model interpretation is to translate technical logic into a basis for decisions that humans can understand, thereby meeting the requirements of technical reliability, social ethics, and legal compliance. Local interpretation of machine learning models helps meet transparency requirements, promotes human-machine collaboration, and contributes to model development, debugging, and monitoring. SHAP has become one of the gold standards in interpretable AI owing to its rigorous mathematical foundation and flexible practical tools. It offers a comprehensive interpretation of a model by quantifying the marginal contribution of individual features from both global and local perspectives.
This enhanced transparency helps users understand and trust the model's predictions and decision-making process. It also helps identify new diagnostic and prognostic biomarkers that play an essential role in the development, progression, and metastasis of cancer.

In this study, we included five gastric cancer mRNA expression datasets (GSE26942, GSE27342, GSE30727, GSE63089, and GSE65801). Differential gene expression analysis was performed between gastric cancer tissues and matched normal gastric mucosa samples. By screening genes with three machine learning methods and taking their intersection, we pinpointed several genes that are closely associated with the biology of gastric cancer. Notably, DPT, FBP2, ADH7, INHBA, and GPR155 emerged as significant factors in the pathogenesis of gastric cancer, and all have previously been implicated in various cancer types.

Dermatopontin (DPT), an extracellular matrix protein first obtained during the purification of dermatan sulfate proteoglycan, is believed to influence cell-matrix interactions and matrix assembly. Research indicates that DPT expression is diminished in several malignancies, such as hepatocellular carcinoma and colon, oral, ovarian, breast, and papillary thyroid cancers. This reduction in DPT expression has been linked to the promotion of tumor initiation, progression, and metastasis through the inactivation of signaling pathways such as Wnt and Hippo/YAP [31–33].

Fructose-1,6-bisphosphatase 2 (FBP2), a moonlighting protein, is extensively expressed in all non-gluconeogenic tissues. Beyond its canonical metabolic role in glycogen synthesis, it is also involved in cell cycle-dependent processes, facilitates synaptic plasticity, and modulates the activity of transcription factors. FBP2 can interact directly with c-MYC in oral squamous cell carcinoma cells and sarcoma cells. Reduced levels of FBP2 have been associated with enhanced tumor growth and invasion [34].
Duda et al. found that FBP2 inhibits the transcriptional activity of HIF-α in lung cancer cells [35]. Additionally, Co²⁺ enhances Camk2α activity through structural remodeling of Fbp2, modulating its mitochondrial binding affinity, which is a potential mechanism for inducing epilepsy [36].

Alcohol dehydrogenase 7 (ADH7) is predominantly expressed in the proximal gastrointestinal tract, where it takes part in the cytochrome P450-mediated metabolism of xenobiotics. Specifically, ADH7 catalyzes the oxidation of ethanol within the gastroesophageal mucosa through its dehydrogenase activity, a process that precedes systemic absorption [37]. Single-nucleotide polymorphisms in ADH7 have been reported as susceptibility factors for tumors and drug dependence. One study using Mendelian analysis suggested that ADH7 (OR = 1.3568, 95% CI = 1.1044–1.6670) may be a marker for gastric cancer [38].

The INHBA gene encodes the inhibin βA subunit, an important member of the transforming growth factor-β superfamily, which participates in the production of inhibins and activins. INHBA has been confirmed as an oncogene and is overexpressed in various malignant solid tumors, such as ovarian cancer, rectal cancer, and head-and-neck squamous cell carcinoma [39–41].

G protein-coupled receptor 155 (GPR155), also known as the lysosomal cholesterol sensor (LYCHOS) protein [42], acts as a lysosomal cholesterol sensor and regulates the lysosomal pathway that converts cholesterol levels into activation of mTORC1 signaling, thereby participating in the regulation of tumor metabolism [43]. Dai Shimizu et al. found low expression of GPR155 in gastric cancer cells. Next-generation sequencing analysis of liver-metastatic GC tissues revealed GPR155 as a promising diagnostic biomarker for hematogenous metastasis, with significant differential expression compared with primary lesions [44].
To assess the importance of these genes in the gastric cancer prediction model, we interpreted the SVM model using SHAP analysis, which identified DPT and FBP2 as the most critical features. Visualization of SHAP values revealed the relative contributions of individual genes and highlighted their significance in the model's decision-making process. Using the CIBERSORT algorithm, we characterized differences in the immune microenvironment between gastric cancer (GC) tissues and control samples. Using the TCGA database, we further validated the expression levels and diagnostic potential of the model genes. Additionally, we assessed the prognostic impact of these five genes on the survival of gastric cancer patients using the Kaplan-Meier plotter online tool.

5. Limitations

The findings of this study are derived from analyses of public databases. Although we have endeavored to validate the results by integrating TCGA-STAD and multiple GEO datasets, the conclusions still require confirmation through fundamental biological experiments. Only a limited number of published studies have addressed the roles of these genes in gastric cancer. In future work, we will prioritize experimental validation and investigation of the specific molecular mechanisms through which these genes influence the development and progression of gastric cancer.

6. Conclusion

In conclusion, using a combination of bioinformatics analysis and machine learning techniques, we identified five genes with significant implications for the prediction and diagnosis of gastric malignancies. We also analyzed the resulting model using SHAP interpretation. Notably, DPT and FBP2 were the most prominent of these genes. We further examined the potential pathways associated with these genes, their prognostic relevance, and their influence on immune cell infiltration.
This study has identified novel candidate biomarkers for gastric cancer, offering promising prospects for the advancement of diagnostic and therapeutic approaches. Subsequent research should prioritize validating the functional roles of the above-mentioned genes and elucidating their interactions with the tumor immune microenvironment, thereby paving the way for innovative strategies in the diagnosis and treatment of gastric cancer.

Declarations

AUTHOR CONTRIBUTIONS
Jianbo Zhao: Data curation (equal); formal analysis (equal); investigation; writing – original draft (lead). Damu Agu: Formal analysis (equal); writing – original draft (equal). Xiongfeng Li: Methodology and visualization. Youge Su: Data curation (equal). Haidong Cheng: Formal analysis (equal); project administration (equal); writing – review and editing (equal). Mingxing Hou: Project administration (equal); writing – review and editing (equal).

ACKNOWLEDGMENTS
We sincerely appreciate the valuable contribution of the TCGA and GEO databases in making data available to the public. The authors also thank all participants who participated in this research.

CONFLICT OF INTEREST STATEMENT
The authors have no conflict of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from The Cancer Genome Atlas Stomach Adenocarcinoma (TCGA-STAD) cohort (https://portal.gdc.cancer.gov/analysis_page?app=Projects/TCGA-STAD) and the Gene Expression Omnibus (GEO) repository (https://www.ncbi.nlm.nih.gov/geo/). The accession numbers for the five datasets are GSE26942, GSE27342, GSE30727, GSE63089, and GSE65801. The corresponding author can supply the R code used in this research upon reasonable request.

FUNDING INFORMATION
This work was supported by the Inner Mongolia Science and Technology Plan Project (Grant No. 2022YFSH0081) and the Youth Project of Inner Mongolia Medical University (Grant No.
YKD2023QN011), with additional support from the Youth Exploration Project of Inner Mongolia Medical University Affiliated Hospital (Grant No. 2022NYFYTS015).

Ethical approval
The data used in this study were sourced exclusively from public databases; no separate ethical approval was required for this secondary analysis.

Consent to participate
Not Applicable

Consent to publish
Separate consent for publication in this context was not required.

CLINICAL TRIAL NUMBER
Not Applicable

References
1. Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74(3):229–263. http://doi.org/10.3322/caac.21834
2. Morgan E, Arnold M, Camargo MC, Gini A, Kunzmann AT, Matsuda T, et al. The current and future incidence and mortality of gastric cancer in 185 countries, 2020-40: A population-based modelling study. EClinicalMedicine. 2022;47:101404. https://doi.org/10.1016/j.eclinm.2022.101404
3. Duan Y, Xu Y, Dou Y, Xu D. Helicobacter pylori and gastric cancer: mechanisms and new perspectives. J Hematol Oncol. 2025;18(1):10. https://doi.org/10.1186/s13045-024-01654-2
4. Guan WL, He Y, Xu RH. Gastric cancer treatment: recent progress and future perspectives. J Hematol Oncol. 2023;16(1):57. https://doi.org/10.1186/s13045-023-01451-3
5. Lu L, Mullins CS, Schafmayer C, Zeißig S, Linnebacher M. A global assessment of recent trends in gastrointestinal cancer and lifestyle-associated risk factors. Cancer Commun (Lond). 2021;41(11):1137–1151. https://doi.org/10.1002/cac2.12220
6. Lordick F, Carneiro F, Cascinu S, Fleitas T, Haustermans K, Piessen G, et al. Gastric cancer: ESMO Clinical Practice Guideline for diagnosis, treatment and follow-up. Ann Oncol. 2022;33(10):1005–20. https://doi.org/10.1016/j.annonc.2022.07.004
7. Smyth EC, Nilsson M, Grabsch HI, van Grieken NC, Lordick F. Gastric cancer. Lancet. 2020;396(10251):635–648. https://doi.org/10.1016/s0140-6736(20)31288-5
8. Ajani JA, D'Amico TA, Bentrem DJ, Chao J, Cooke D, Corvera C, et al. Gastric Cancer, Version 2.2022, NCCN Clinical Practice Guidelines in Oncology. J Natl Compr Canc Netw. 2022;20(2):167–92. https://doi.org/10.6004/jnccn.2022.0008
9. Selvakumar SC, Preethi KA, Ross K, Tusubira D, Khan M, Mani P, et al. CRISPR/Cas9 and next generation sequencing in the personalized treatment of Cancer. Mol Cancer. 2022;21(1):83. https://doi.org/10.1186/s12943-022-01565-1
10. Black JE, Kueper JK, Williamson TS. An introduction to machine learning for classification and prediction. Fam Pract. 2023;40(1):200–204. https://doi.org/10.1093/fampra/cmac104
11. Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, et al. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat Mach Intell. 2020;2(1):56–67. https://doi.org/10.1038/s42256-019-0138-9
12. Matsuoka T, Yashiro M. Bioinformatics Analysis and Validation of Potential Markers Associated with Prediction and Prognosis of Gastric Cancer. Int J Mol Sci. 2024;25(11):5880. https://doi.org/10.3390/ijms25115880
13. Oh SC, Sohn BH, Cheong JH, Kim SB, Lee JE, Park KC, et al. Clinical and genomic landscape of gastric cancer with a mesenchymal phenotype. Nat Commun. 2018;9(1):1777. https://doi.org/10.1038/s41467-018-04179-8
14. Cui J, Chen Y, Chou WC, Sun L, Chen L, Suo J, et al. An integrated transcriptomic and computational analysis for biomarker identification in gastric cancer. Nucleic Acids Res. 2011;39(4):1197–207. https://doi.org/10.1093/nar/gkq960
15. Zhang X, Ni Z, Duan Z, Xin Z, Wang H, Tan J, et al. Overexpression of E2F mRNAs associated with gastric cancer progression identified by the transcription factor and miRNA co-regulatory network analysis. PLoS ONE. 2015;10(2):e0116979. https://doi.org/10.1371/journal.pone.0116979
16. Li H, Yu B, Li J, Su L, Yan M, Zhang J, et al. Characterization of differentially expressed genes involved in pathways associated with gastric cancer. PLoS ONE. 2015;10(4):e0125013. https://doi.org/10.1371/journal.pone.0125013
17. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47. https://doi.org/10.1093/nar/gkv007
18. Xu S, Hu E, Cai Y, Xie Z, Luo X, Zhan L, et al. Using clusterProfiler to characterize multiomics data. Nat Protoc. 2024;19(11):3292–3320. https://doi.org/10.1038/s41596-024-01020-z
19. Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics. 2013;14:7. https://doi.org/10.1186/1471-2105-14-7
20. Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw. 2010;33(1):1–22.
21. Sanz H, Valim C, Vegas E, Oller JM, Reverter F. SVM-RFE: selection and visualization of the most relevant features through non-linear kernels. BMC Bioinformatics. 2018;19(1):432. https://doi.org/10.1186/s12859-018-2451-4
22. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77. https://doi.org/10.1186/1471-2105-12-77
23. Newman AM, Liu CL, Green MR, Gentles AJ, Feng W, Xu Y, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods. 2015;12(5):453–7. https://doi.org/10.1038/nmeth.3337
24. Győrffy B. Integrated analysis of public datasets for the discovery and validation of survival-associated genes in solid tumors. Innovation (Camb). 2024;5(3):100625. https://doi.org/10.1016/j.xinn.2024.100625
25. Yasuda T, Wang YA. Gastric cancer immunosuppressive microenvironment heterogeneity: implications for therapy development. Trends Cancer. 2024;10(7):627–642. https://doi.org/10.1016/j.trecan.2024.03.008
26. Li GZ, Doherty GM, Wang J. Surgical Management of Gastric Cancer: A Review. JAMA Surg. 2022;157(5):446–454. https://doi.org/10.1001/jamasurg.2022.0182
27. Körfer J, Lordick F, Hacker UT. Molecular Targets for Gastric Cancer Treatment and Future Perspectives from a Clinical and Translational Point of View. Cancers (Basel). 2021;13(20):5216. https://doi.org/10.3390/cancers13205216
28. Huang J, Mao L, Lei Q, Guo AY. Bioinformatics tools and resources for cancer and application. Chin Med J (Engl). 2024;137(17):2052–2064. https://doi.org/10.1097/cm9.0000000000003254
29. Aromolaran O, Aromolaran D, Isewon I, Oyelade J. Machine learning approach to gene essentiality prediction: a review. Brief Bioinform. 2021;22(5):bbab128. https://doi.org/10.1093/bib/bbab128
30. Guan S, Xu Z, Yang T, Zhang Y, Zheng Y, Chen T, et al. Identifying potential targets for preventing cancer progression through the PLA2G1B recombinant protein using bioinformatics and machine learning methods. Int J Biol Macromol. 2024;276(Pt 1):133918. https://doi.org/10.1016/j.ijbiomac.2024.133918
31. Huang S, Ma L, Lan B, Liu N, Nong W, Huang Z. Comprehensive analysis of prognostic genes in gastric cancer. Aging (Albany NY). 2021;13(20):23637–23651. https://doi.org/10.18632/aging.203638
32. Ye D, Wang Y, Deng X, Zhou X, Liu D, Zhou B, et al. DNMT3a-dermatopontin axis suppresses breast cancer malignancy via inactivating YAP. Cell Death Dis. 2023;14(2):106. https://doi.org/10.1038/s41419-023-05657-8
33. Catalán V, Domench P, Gómez-Ambrosi J, Ramírez B, Becerril S, Mentxaka A, et al. Dermatopontin Influences the Development of Obesity-Associated Colon Cancer by Changes in the Expression of Extracellular Matrix Proteins. Int J Mol Sci. 2022;23(16):9222. https://doi.org/10.3390/ijms23169222
34. Gizak A, Budziak B, Domaradzka A, Pietras Ł, Rakus D. Fructose 1,6-bisphosphatase as a promising target of anticancer treatment. Adv Biol Regul. 2025;95:101057. https://doi.org/10.1016/j.jbior.2024.101057
35. Duda P, Janczara J, McCubrey JA, Gizak A, Rakus D. The Reverse Warburg Effect is Associated with Fbp2-Dependent Hif1α Regulation in Cancer Cells Stimulated by Fibroblasts. Cells. 2020;9(1):205. https://doi.org/10.3390/cells9010205
36. Duda P, Budziak B, Rakus D. Cobalt Regulates Activation of Camk2α in Neurons by Influencing Fructose 1,6-bisphosphatase 2 Quaternary Structure and Subcellular Localization. Int J Mol Sci. 2021;22(9):4800. https://doi.org/10.3390/ijms22094800
37. Zhao L, Lei H, Shen L, Tang J, Wang Z, Bai W, et al. Prognosis genes in gastric adenocarcinoma identified by cross talk genes in disease–related pathways. Mol Med Rep. 2017;16(2):1232–1240. https://doi.org/10.3892/mmr.2017.6699
38. Duell EJ, Sala N, Travier N, Muñoz X, Boutron-Ruault MC, Clavel-Chapelon F, et al. Genetic variation in alcohol dehydrogenase (ADH1A, ADH1B, ADH1C, ADH7) and aldehyde dehydrogenase (ALDH2), alcohol consumption and gastric cancer risk in the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Carcinogenesis. 2012;33(2):361–7. https://doi.org/10.1093/carcin/bgr285
39. Hu Y, Recouvreux MS, Haro M, Taylan E, Taylor-Harding B, Walts AE, et al. INHBA(+) cancer-associated fibroblasts generate an immunosuppressive tumor microenvironment in ovarian cancer. NPJ Precis Oncol. 2024;8(1):35. https://doi.org/10.1038/s41698-024-00523-y
40. Wang JJ, Chen DX, Zhang Y, Xu X, Cai Y, Wei WQ, et al. Elevated expression of the RNA-binding protein IGF2BP1 enhances the mRNA stability of INHBA to promote the invasion and migration of esophageal squamous cancer cells. Exp Hematol Oncol. 2023;12(1):75. https://doi.org/10.1186/s40164-023-00429-8
41. Li FL, Gu LH, Tong YL, Chen RQ, Chen SY, Yu XL, et al. INHBA promotes tumor growth and induces resistance to PD-L1 blockade by suppressing IFN-γ signaling. Acta Pharmacol Sin. 2025;46(2):448–461. https://doi.org/10.1038/s41401-024-01381-x
42. Bayly-Jones C, Lupton CJ, Keen AC, Dong S, Mastos C, Luo W, et al. LYCHOS is a human hybrid of a plant-like PIN transporter and a GPCR. Nature. 2024;634(8036):1238–1244. https://doi.org/10.1038/s41586-024-08012-9
43. Shin HR, Citron YR, Wang L, Tribouillard L, Goul CS, Stipp R, et al. Lysosomal GPCR-like protein LYCHOS signals cholesterol sufficiency to mTORC1. Science. 2022;377(6612):1290–1298. https://doi.org/10.1126/science.abg6621
44. Shimizu D, Kanda M, Tanaka H, Kobayashi D, Tanaka C, Hayashi M, et al. GPR155 Serves as a Predictive Biomarker for Hematogenous Metastasis in Patients with Gastric Cancer. Sci Rep. 2017;7:42089. https://doi.org/10.1038/srep42089

Additional Declarations
No competing interests reported.
Status: Under Review. Reviews received at journal 04 Oct, 2025; Reviewers agreed at journal 30 Sep, 2025; Reviewers agreed at journal 25 Sep, 2025; Reviewers invited by journal 24 Sep, 2025; Editor assigned by journal 23 Sep, 2025; Editor invited by journal 23 Sep, 2025; Submission checks completed at journal 16 Sep, 2025; First submitted to journal 16 Sep, 2025.
{"props":{"pageProps":{"initialData":{"identity":"rs-7523494","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":524661973,"identity":"50081933-4ddb-4fd0-972f-24646e389c55","order_by":0,"name":"Jianbo Zhao","email":"","orcid":"","institution":"Inner Mongolia Medical University","correspondingAuthor":false,"prefix":"","firstName":"Jianbo","middleName":"","lastName":"Zhao","suffix":""},{"id":524661974,"identity":"388ebc2e-09c9-4a14-8c1a-fe9e8e8d6689","order_by":1,"name":"Damu Agu","email":"","orcid":"","institution":"Affiliated Hospital of Inner Mongolia Medical
University","correspondingAuthor":false,"prefix":"","firstName":"Damu","middleName":"","lastName":"Agu","suffix":""},{"id":524661975,"identity":"9a9fa487-d7d4-414a-8ce6-506bf71edf2a","order_by":2,"name":"Xiongfeng Li","email":"","orcid":"","institution":"Affiliated Hospital of Inner Mongolia Medical University","correspondingAuthor":false,"prefix":"","firstName":"Xiongfeng","middleName":"","lastName":"Li","suffix":""},{"id":524661976,"identity":"d0633340-5d29-4c96-9f05-a27718fa9983","order_by":3,"name":"Youge Su","email":"","orcid":"","institution":"Inner Mongolia Medical University","correspondingAuthor":false,"prefix":"","firstName":"Youge","middleName":"","lastName":"Su","suffix":""},{"id":524661977,"identity":"d04b9f59-3ca6-4c8a-aa79-4f1452fd8209","order_by":4,"name":"Haidong Cheng","email":"","orcid":"","institution":"Affiliated Hospital of Inner Mongolia Medical University","correspondingAuthor":false,"prefix":"","firstName":"Haidong","middleName":"","lastName":"Cheng","suffix":""},{"id":524661978,"identity":"9f4b7d8c-9e6e-424d-aab7-729093aff1ac","order_by":5,"name":"Mingxing Hou","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAt0lEQVRIiWNgGAWjYBACAwYGxgcSBjZybOztB4jWwmxgUZBmzMdzJoFoLWwCFR8OJ86TcDAgTou52BkzhhsGzOltEgwJDD8qthHWYjk7Le3hDAO23DbpxgOMPWduE+Gw28nHjSUMeHLbZA4kMDO2EaUlsU36j4FEOptEggGxWpKPSUgYGCSQoiUt2UDCIMGwDRjIB4n0S47hA4k//+Xl29sPPvhRQYQWFHCARPWjYBSMglEwCnABAKR5OxUT8pEMAAAAAElFTkSuQmCC","orcid":"","institution":"Affiliated Hospital of Inner Mongolia Medical University","correspondingAuthor":true,"prefix":"","firstName":"Mingxing","middleName":"","lastName":"Hou","suffix":""}],"badges":[],"createdAt":"2025-09-03 
06:38:37","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7523494/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7523494/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":93013524,"identity":"9d24d042-068e-43e1-be49-adcd44d43832","added_by":"auto","created_at":"2025-10-08 07:25:34","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":9285362,"visible":true,"origin":"","legend":"","description":"","filename":"MachineLearningDrivenDiscoveryofBiomarkersinGastricCancerAFocusonDPTFBP2ADH7INHBAandGPR155.docx","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/23f97d8018aa37fcb29b327f.docx"},{"id":93011243,"identity":"e32bc6b3-88ba-4a68-ba30-e2a4e9d2e5a7","added_by":"auto","created_at":"2025-10-08 07:17:34","extension":"json","order_by":1,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":8022,"visible":true,"origin":"","legend":"","description":"","filename":"ab507f3bf44a48b385b5a190ee21fd49.json","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/e5b0051e17c899b5a8f08440.json"},{"id":93011250,"identity":"bcda297a-20d0-498e-954b-8826e321c7c8","added_by":"auto","created_at":"2025-10-08 07:17:35","extension":"xml","order_by":2,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":119872,"visible":true,"origin":"","legend":"","description":"","filename":"ab507f3bf44a48b385b5a190ee21fd491enriched.xml","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/66e01f5d5a41729745099d4a.xml"},{"id":93011246,"identity":"9c743f93-786d-4baf-9654-3f9690e1e696","added_by":"auto","created_at":"2025-10-08 
07:17:35","extension":"jpeg","order_by":3,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":5541132,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/90d9c33466cde4a1196c94ff.jpeg"},{"id":93013933,"identity":"02272f0b-a9ff-4059-9d8c-976fd9e84b08","added_by":"auto","created_at":"2025-10-08 07:33:35","extension":"jpeg","order_by":4,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":2857964,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage10.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/c1cf0fc1f49392cf7b8ec4fc.jpeg"},{"id":93013526,"identity":"04a667c4-4ef2-487e-9ae8-6d2b0ebdfbbf","added_by":"auto","created_at":"2025-10-08 07:25:35","extension":"jpeg","order_by":5,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1351884,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage2.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/d884ff155d49ab07ac7453cf.jpeg"},{"id":93011245,"identity":"7d45c395-2a41-4ba4-ac46-9d644661128c","added_by":"auto","created_at":"2025-10-08 07:17:34","extension":"jpeg","order_by":6,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1627792,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage3.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/c745b75b36e20aedc01dbd48.jpeg"},{"id":93011263,"identity":"312c9c4e-4c86-4699-82a8-678d227d2fb9","added_by":"auto","created_at":"2025-10-08 
07:17:35","extension":"jpeg","order_by":7,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1323280,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage4.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/6441780b6b217b6595ae4f3e.jpeg"},{"id":93011249,"identity":"c15b68ed-d155-4d73-87ae-c94f0a2f38d0","added_by":"auto","created_at":"2025-10-08 07:17:35","extension":"jpeg","order_by":8,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":530803,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage5.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/e54afb868b2e4e6cfd797a86.jpeg"},{"id":93011253,"identity":"969f7f00-ed67-4291-b5f1-fb73e8ac0690","added_by":"auto","created_at":"2025-10-08 07:17:35","extension":"jpeg","order_by":9,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":2444756,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage6.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/84316199a3c24b32efca5bd4.jpeg"},{"id":93011260,"identity":"81064cac-d472-41f9-9922-255d5ed10e3e","added_by":"auto","created_at":"2025-10-08 07:17:35","extension":"jpeg","order_by":10,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":779956,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage7.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/f7371ebf23cd49a4dc57b3fb.jpeg"},{"id":93011258,"identity":"5a787884-10b3-45a5-8bfc-1183d58e7c4f","added_by":"auto","created_at":"2025-10-08 
07:17:35","extension":"jpeg","order_by":11,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":440700,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage8.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/cd996d36efee09d616dd6c28.jpeg"},{"id":93011261,"identity":"4a55ba53-2c03-457e-bc32-920cd1ce6503","added_by":"auto","created_at":"2025-10-08 07:17:35","extension":"jpeg","order_by":12,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":630092,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage9.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/fd83c93fec395482212c52a6.jpeg"},{"id":93011268,"identity":"745685ce-3411-4b57-aa78-236967e8065b","added_by":"auto","created_at":"2025-10-08 07:17:35","extension":"png","order_by":13,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1453013,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/1bfa6073e7072308c96eeb9f.png"},{"id":93011269,"identity":"40cf841e-2f47-4135-bde5-c57fecd6c68e","added_by":"auto","created_at":"2025-10-08 07:17:36","extension":"png","order_by":14,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":164751,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage10.png","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/93a18ef18ae9fc18572cd546.png"},{"id":93011265,"identity":"af8117fd-8122-46c2-9bd9-affbb75dc7fd","added_by":"auto","created_at":"2025-10-08 
07:17:35","extension":"png","order_by":15,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":97788,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/9ba19cb7389abd5366a32e67.png"},{"id":93013531,"identity":"6d00ccca-a789-40a1-9d8a-4f13feb073be","added_by":"auto","created_at":"2025-10-08 07:25:36","extension":"png","order_by":16,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":116612,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/97abc9b1140fcb69bd7a09f0.png"},{"id":93011264,"identity":"f6b07e0c-9520-4142-b32a-d233110c102b","added_by":"auto","created_at":"2025-10-08 07:17:35","extension":"png","order_by":17,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":105173,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/d9fe57bfd5b5168f3be93493.png"},{"id":93011266,"identity":"cefe73ea-03a2-49c4-99dc-c0b0fa5a41e4","added_by":"auto","created_at":"2025-10-08 07:17:35","extension":"png","order_by":18,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":94545,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/b855c0bf7885c174baf757be.png"},{"id":93011275,"identity":"19940ca6-003b-4682-bf41-5ab8c6b0aed0","added_by":"auto","created_at":"2025-10-08 
07:17:36","extension":"png","order_by":19,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":931265,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/c915ec6502a4aef25cafd99d.png"},{"id":93011274,"identity":"5301afd0-100c-4c46-b58f-9c001b845c71","added_by":"auto","created_at":"2025-10-08 07:17:36","extension":"png","order_by":20,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":212139,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/d50da34098e7eea297da18ee.png"},{"id":93011271,"identity":"44e0828f-6614-4b67-b0b2-615779964c9b","added_by":"auto","created_at":"2025-10-08 07:17:36","extension":"png","order_by":21,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":79802,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage8.png","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/86d107c8b9c880a086db0ea1.png"},{"id":93011273,"identity":"21dadc9d-e3b5-40b1-ac29-5b656dd2bcf8","added_by":"auto","created_at":"2025-10-08 07:17:36","extension":"png","order_by":22,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":127948,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage9.png","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/0da04b25a4c7f6053203e8c3.png"},{"id":93011257,"identity":"9a349d02-fe63-4e33-9796-5b7330e8b8b0","added_by":"auto","created_at":"2025-10-08 
07:17:35","extension":"xml","order_by":23,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":119096,"visible":true,"origin":"","legend":"","description":"","filename":"ab507f3bf44a48b385b5a190ee21fd491structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/8c35f8b19cd2502e248c360f.xml"},{"id":93013530,"identity":"9a62f8a2-0b62-4215-a85e-ebd8a9567d72","added_by":"auto","created_at":"2025-10-08 07:25:35","extension":"html","order_by":24,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":129502,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/afb89f3cf6c0ead80426a0b4.html"},{"id":93011242,"identity":"89bb3220-183e-44e7-bf08-633787b06270","added_by":"auto","created_at":"2025-10-08 07:17:34","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":827212,"visible":true,"origin":"","legend":"\u003cp\u003ePrincipal component analysis was conducted to illustrate gene expression patterns across datasets and perform a differential gene expression analysis. (A) Depicts the distribution of the five datasets prior to the removal of batch effects. (B) Demonstrates the elimination of all confounding factors from the corrected samples. (C) Displays volcanic plots representing differentially expressed genes, with red and blue dots indicating significantly upregulated and downregulated genes, respectively, and black dots representing non-significant genes. 
(D) Shows a heatmap depicting the expression patterns of DEGs across the samples.\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/dcf9b0c66b5a6014c33931e7.png"},{"id":93011248,"identity":"a0b0edde-e909-4804-80b4-3714ca1f5c8d","added_by":"auto","created_at":"2025-10-08 07:17:35","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":515450,"visible":true,"origin":"","legend":"\u003cp\u003eFunctional enrichment of 171 DEGs. (A)GO and (B)KEGG enrichment analyse.\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/102a48afe1c77226592a9ac6.png"},{"id":93011277,"identity":"4237a58a-b803-4e8e-91c5-2de43a54061d","added_by":"auto","created_at":"2025-10-08 07:17:37","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":623801,"visible":true,"origin":"","legend":"\u003cp\u003eIdentification of key genes. (A and B) The LASSO logistic regression algorithm was utilized, leading to the selection of 33 genes associated with gastric cancer. (C and D) The SVM-RFE algorithm was applied to determine the optimal combination of feature genes and ultimately identifying 26 genes as the optimal feature set. 
(E and F) The RF algorithm selected genes with an importance score greater than 3 as the best feature genes.\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/a302b8e3b28b16d3058abfae.png"},{"id":93013525,"identity":"cd36ff3e-3165-4037-91c3-71766c59a8df","added_by":"auto","created_at":"2025-10-08 07:25:35","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":473247,"visible":true,"origin":"","legend":"\u003cp\u003eThe expression of the 5 hub genes in datasets.\u003cstrong\u003e \u003c/strong\u003e(A) Venn diagram showing the 5 signature genes shared by LASSO, SVM-RFE and RF. (B and C) The expression levels of the 5 signature genes in GEO datasets. (D and E) The expression levels of the 5 signature genes in TCGA datasets. (F) Chromosome location map of the 5 hub genes.\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/57b84e69d016b697af44ff86.png"},{"id":93014772,"identity":"043d7192-31df-432d-8e2e-0bbf5283d489","added_by":"auto","created_at":"2025-10-08 07:41:35","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":409218,"visible":true,"origin":"","legend":"\u003cp\u003eDiagnostic value evaluation. (A) ROC curves showing the diagnostic value of each biomarker (DPT, FBP2, ADH7, INHBA, and GPR155) for GC. (B) ROC analysis of the different machine learning models.\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/3e2b6a3323a572ed794c21ef.png"},{"id":93011252,"identity":"33dde843-771d-489e-a34e-703fdadc1bdf","added_by":"auto","created_at":"2025-10-08 07:17:35","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":689260,"visible":true,"origin":"","legend":"\u003cp\u003eSHAP model. 
(A) SHAP value distribution highlights each gene's significance within the predictive model. (B) Correlation between feature values and SHAP values. (C) Multiple scatter plots depict the relationship between SHAP values for DPT, FBP2, ADH7, INHBA, and GPR155.\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/d3963161165efa3457017be1.png"},{"id":93011262,"identity":"58207b93-7bd0-4161-95c0-2dd4ded95899","added_by":"auto","created_at":"2025-10-08 07:17:35","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":346954,"visible":true,"origin":"","legend":"\u003cp\u003eKey Features and Genes’ Contributions in unique sample\u003cstrong\u003e.\u003c/strong\u003eThe waterfall plot(A)and force plot(B).\u003c/p\u003e","description":"","filename":"floatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/6e74fceb474652a38c41d79e.png"},{"id":93011276,"identity":"a5756937-2c6d-4ce7-8814-a7cb02d61735","added_by":"auto","created_at":"2025-10-08 07:17:36","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":365225,"visible":true,"origin":"","legend":"\u003cp\u003eRelative pathways of five key biomarkers.(A) GSEA and (B) GSVA\u003c/p\u003e","description":"","filename":"floatimage8.png","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/a20fdb5ebc61b8eca2e66768.png"},{"id":93013935,"identity":"c408a4fa-cf26-4390-8a65-12ac5d687a59","added_by":"auto","created_at":"2025-10-08 07:33:36","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":500768,"visible":true,"origin":"","legend":"\u003cp\u003eImmune cell infiltration analysis in gastric cancer.\u003cstrong\u003e \u003c/strong\u003e(A) Comparative analysis using a box-and-whisker plot shows the percentage of 22 distinct immune cell types in the tumor and normal groups. 
(B) Correlation analysis of 5 genes with various immune cells. *p\u0026lt;0.05; **p\u0026lt;0.01; ***p\u0026lt;0.001.\u003c/p\u003e","description":"","filename":"floatimage9.png","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/df9800a2c4a6a20f1bf16b65.png"},{"id":93013529,"identity":"2a2c0ec0-3845-4fc6-bd5b-8aa17a8c0985","added_by":"auto","created_at":"2025-10-08 07:25:35","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":1272822,"visible":true,"origin":"","legend":"\u003cp\u003eSurvival analysis for individual genes: Kaplan-Meier curves. (A) ADH7; (B) DPT; (C) FBP2; (D) GPR155; (E) INHBA.\u003c/p\u003e","description":"","filename":"floatimage10.png","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/47559f3a2aa114235bc6dad5.png"},{"id":93014973,"identity":"0dfdaa83-4ee7-4a5a-ac66-813eb840b5f7","added_by":"auto","created_at":"2025-10-08 07:49:37","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":6871435,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7523494/v1/3e9238f6-ab47-4805-b7f2-f3c05fad701a.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Machine Learning-Driven Discovery of Biomarkers in Gastric Cancer: A Focus on DPT, FBP2, ADH7, INHBA, and GPR155","fulltext":[{"header":"1. INTRODUCTION","content":"\u003cp\u003eBased on GLOBOCAN 2022 data, gastric cancer (GC) accounted for approximately 968,000 new cases and 659,000 deaths in 2022, ranking fifth in both incidence and mortality\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e1\u003c/span\u003e]\u003c/sup\u003e. 
Despite an overall decline in incidence and mortality, the incidence of GC among young individuals is steadily increasing, presenting a persistent global health challenge\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e2\u003c/span\u003e]\u003c/sup\u003e. Adenocarcinoma accounts for over 95% of GC cases. High-risk factors for GC include consumption of nitrite-rich foods and moldy foods, Helicobacter pylori infection\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e3\u003c/span\u003e]\u003c/sup\u003e, excessive alcohol intake, smoking and other unhealthy behaviors\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e5\u003c/span\u003e]\u003c/sup\u003e. Early-stage GC often presents with nonspecific symptoms such as upper abdominal discomfort, acid reflux, indigestion and belching, leading to frequent misdiagnosis as common digestive ailments such as gastritis and gastric ulcers\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e7\u003c/span\u003e]\u003c/sup\u003e. Consequently, patients are frequently diagnosed at advanced or even late stages. Early detection rates are notably lower in economically underdeveloped regions. Although surgical resection has advanced and various treatment modalities, including chemotherapy, radiotherapy, targeted therapy, and immunotherapy, have shown some efficacy in improving patient prognosis, overall outcomes remain suboptimal\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e8\u003c/span\u003e]\u003c/sup\u003e. The management of GC remains a formidable challenge, characterized by limited treatment options and unfavorable long-term results. 
Although current biomarkers offer some utility, their limitations underscore the urgent need for novel, comprehensive biomarkers.\u003c/p\u003e\n\u003cp\u003eBioinformatics and high-throughput omics technologies have significantly advanced the investigation of tumor mechanisms. The rapid evolution of whole-genome sequencing, including next-generation sequencing, enables the acquisition of cancer genome profiles\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e9\u003c/span\u003e]\u003c/sup\u003e. These technologies furnish researchers with extensive expression datasets, facilitating the analysis of individual patient genomes for precise targeted therapies and the potential to uncover specific cancer characteristics. Machine learning (ML) algorithms offer significant benefits in bioinformatics and molecular biology. By effectively handling complex biological data, ML can cope with noise and redundant information, thereby improving the accuracy and reliability of results\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e10\u003c/span\u003e]\u003c/sup\u003e. It can help clarify gene-disease relationships and identify potential therapeutic targets, and it supports the discovery of disease associations, early diagnosis and drug development. Model interpretability is a key issue in machine learning and deep learning. Complex models can achieve excellent predictive performance, but they are often perceived as \u0026quot;black boxes\u0026quot; because their internal decision-making processes are difficult to explain. 
SHapley additive exPlanations (SHAP) addresses this challenge by assigning importance values to features, thereby explaining the output of the model and enabling more informed decisions in the medical field than blind reliance on the output of an algorithm, which could lead to severe consequences for patients\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e11\u003c/span\u003e]\u003c/sup\u003e. The combination of bioinformatics analysis and machine learning techniques is a promising strategy for biomarker discovery, with the potential to improve the diagnosis, prognosis, and treatment of gastric cancer\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e12\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eThis research aims to build a machine learning model for identifying gastric cancer using data from the Gene Expression Omnibus (GEO) database. After algorithmic screening of candidate genes, various machine learning models were developed, and the best-performing model, an SVM, was chosen. The contributions of DPT, FBP2, ADH7, INHBA, and GPR155 to gastric cancer identification in this model were analyzed using SHapley Additive exPlanations. Clinical application of these genes and exploration of their downstream targets and pathways may support early disease detection and the development of novel therapeutic strategies.\u003c/p\u003e"},{"header":"2. 
MATERIALS AND METHODS","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\n \u003ch2\u003e2.1 Data acquisition\u003c/h2\u003e\n \u003cp\u003eGene expression data were acquired from five microarray datasets (GSE26942\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e13\u003c/span\u003e]\u003c/sup\u003e, GSE27342\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e14\u003c/span\u003e]\u003c/sup\u003e, GSE30727, GSE63089\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e15\u003c/span\u003e]\u003c/sup\u003e, and GSE65801\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e16\u003c/span\u003e]\u003c/sup\u003e) accessible in the GEO at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.ncbi.nlm.nih.gov/geo/\u003c/span\u003e\u003c/span\u003e. These datasets collectively comprise 392 tumor samples and 199 normal tissue samples. The probe expression matrix of each dataset was converted into a gene expression matrix using the platform annotation file. Further details are provided in Table\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e. Additionally, standardized RNA sequencing data of 412 gastric cancer tissues and 36 normal tissues, along with corresponding clinicopathological information, were downloaded from The Cancer Genome Atlas (TCGA) database at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://portal.gdc.cancer.gov/\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\n \u003ch2\u003e2.2 Gene expression normalization and differentially expressed genes identification\u003c/h2\u003e\n \u003cp\u003eThe raw GEO data were normalized using the \u0026quot;normalizeBetweenArrays\u0026quot; function in R. To address heterogeneity within the training dataset arising from differences in experimental platforms and batches, the \u0026quot;sva\u0026quot; package (version 3.54.0) was employed. 
Differential expression analysis between tumor and normal samples was carried out using the \u0026quot;limma\u0026quot; package (version 3.62.2) in R\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e17\u003c/span\u003e]\u003c/sup\u003e. The differentially expressed genes (DEGs) were visualized with a volcano plot generated using the \u0026quot;ggplot2\u0026quot; package (version 3.5.2) and a heatmap generated using the \u0026quot;pheatmap\u0026quot; package (version 1.0.12) in R.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\n \u003ch2\u003e2.3 Biological function and pathway enrichment analyses\u003c/h2\u003e\n \u003cp\u003eTo identify biological pathways associated with the expression levels of DEGs, we used the clusterProfiler package (version 4.14.4) in R\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e18\u003c/span\u003e]\u003c/sup\u003e to perform Gene Set Enrichment Analysis (GSEA). GSEA is a computational approach used to identify biologically significant pathways and processes linked to particular gene expression patterns. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway collection served as the predefined reference gene set for assessing enrichment. Gene Set Variation Analysis (GSVA), a technique for estimating gene set variation across samples, was performed using the GSVA package (version 2.0.5) in R\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e19\u003c/span\u003e]\u003c/sup\u003e. The KEGG pathway collection was also used as the background gene set for GSVA. For the initial 171 DEGs, we used the clusterProfiler package to carry out Gene Ontology (GO) enrichment analysis in three categories: biological process (BP), molecular function (MF) and cellular component (CC). 
In parallel, KEGG analysis was conducted on these differentially expressed genes.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e\n \u003ch2\u003e2.4 Machine Learning Algorithms\u003c/h2\u003e\n \u003cp\u003eThe Least Absolute Shrinkage and Selection Operator (LASSO) and Random Forest (RF) algorithms are effective tools for identifying hub genes due to their strong predictive capabilities. Using the \u0026quot;glmnet\u0026quot; package (version 4.1.8) in R\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e20\u003c/span\u003e]\u003c/sup\u003e, a LASSO model was developed to pinpoint genes closely linked to gastric cancer. For RF analysis, the \u0026quot;randomForest\u0026quot; package (version 4.7.1.2) was used to rank the differentially expressed genes and select those with an importance score above 3. Support Vector Machine Recursive Feature Elimination (SVM-RFE) was implemented using the \u0026quot;e1071\u0026quot; package (version 1.7.16) in R\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e21\u003c/span\u003e]\u003c/sup\u003e. The performance of the SVM-RFE model was assessed by the average misclassification rate under 10-fold cross-validation. Genes common to the sets identified by these three machine learning approaches were designated as hub genes for gastric cancer.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\n \u003ch2\u003e2.5 Diagnostic value estimation\u003c/h2\u003e\n \u003cp\u003eReceiver operating characteristic (ROC) curves were computed to evaluate the diagnostic ability of the candidate gene biomarkers, and the area under the curve (AUC) was measured. Genes with an AUC greater than 0.9 were considered to have good diagnostic performance. Using data from the TCGA database, we evaluated the diagnostic ability of the 5 key genes. 
The pROC package (version 1.18.5) was used for ROC analysis\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e22\u003c/span\u003e]\u003c/sup\u003e, and the results were visualized using \u0026quot;ggplot2\u0026quot;. Ten machine learning models were developed to assess diagnostic performance: random forest (RF), support vector machine (SVM), partial least squares (PLS), decision tree (DTS), K-nearest neighbors (KNN), logistic regression, eXtreme Gradient Boosting (XGBoost), gradient boosting machine (GBM), generalized linear model boosting (glmBoost) and neural network (NeuralNet). The diagnostic performance of each model was judged from its ROC curve.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\n \u003ch2\u003e2.6 SHapley additive exPlanation\u003c/h2\u003e\n \u003cp\u003eSHAP is based on the Shapley value from cooperative game theory and applies it to the interpretation of machine learning models. The Shapley value allocates a payoff fairly among participants, and SHAP uses it to quantify each feature\u0026apos;s contribution to a model prediction. This provides a common framework for assessing variable contributions across different algorithms. SHAP values make model behavior transparent and interpretable, helping clinicians select the most suitable diagnostic tool for clinical applications.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\n \u003ch2\u003e2.7 Immune Infiltration Analysis\u003c/h2\u003e\n \u003cp\u003eCIBERSORT is a computational tool that estimates the cellular composition of complex tissues from gene expression profiles. In this study, immune cell infiltration levels were assessed with the CIBERSORT algorithm using a customized R script derived from the original method\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e23\u003c/span\u003e]\u003c/sup\u003e. 
The analysis was performed with 1000 permutations, and only samples with a CIBERSORT P\u0026thinsp;\u0026lt;\u0026thinsp;0.05 were retained. Spearman correlation analysis was employed to examine the association between target genes and levels of immune cell infiltration, and the results were presented as heat maps.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\n \u003ch2\u003e2.8 Survival analysis\u003c/h2\u003e\n \u003cp\u003eThe Kaplan-Meier Plotter website (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://kmplot.com/analysis/\u003c/span\u003e\u003c/span\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e)\u003c/span\u003e, which incorporates sequencing data from multiple cancers and microarray information from The Cancer Genome Atlas and the Gene Expression Omnibus databases, was utilized for prognostic correlation analysis\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e24\u003c/span\u003e]\u003c/sup\u003e. The \u0026quot;Start KM Plotter for gastric cancer\u0026quot; section was used to explore the relationship between the model genes and the overall survival (OS) of patients with gastric cancer. The cutoff value was set to \u0026quot;Auto select best cutoff\u0026quot;, and the probe was restricted to the best probe set recommended by the JetSet algorithm via \u0026quot;only JetSet best probe set\u0026quot;.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\n \u003ch2\u003e2.9 Statistical analysis\u003c/h2\u003e\n \u003cp\u003eStatistical analyses were conducted using the R software environment (version 4.2.3, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.r-project.org\u003c/span\u003e\u003c/span\u003e). Differential gene expression was evaluated with a paired-sample t-test, considering a P-value below 0.05 as statistically significant. 
ROC curves and AUC metrics were employed to assess the diagnostic performance of biomarkers and machine learning models. The association between gene biomarker expression and infiltrating immune cell populations was evaluated using Spearman\u0026apos;s rank correlation.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"3. RESULTS","content":"\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\u003ch2\u003e3.1 Identification of key DEGs\u003c/h2\u003e\u003cp\u003eThe present study used five GEO datasets to investigate the genetic underpinnings of gastric cancer. The datasets were merged to increase the sample size, and the merged data were normalized and corrected for batch effects to reduce heterogeneity. Principal component analysis (PCA) confirmed that this correction reduced technical variability across the integrated dataset (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eA and Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eB). Differential expression analysis of the stomach tumor and normal samples was conducted using the \"limma\" R package. Genes with an adj.P.Val\u0026thinsp;\u0026lt;\u0026thinsp;0.05 and |log2FoldChange|\u0026gt;1 were considered as DEGs. The merged dataset yielded a total of 171 differentially expressed genes, comprising 64 up-regulated and 107 down-regulated transcripts (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eC and Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eD).\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\u003ch2\u003e3.2 GO and KEGG Analysis of 171 DEGs\u003c/h2\u003e\u003cp\u003eThe biological functions of the 171 genes were investigated through Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses. 
GO biological process (BP) analysis indicated significant enrichment in response to xenobiotic stimuli, digestion, hormone metabolic processes, tissue homeostasis, and anatomical structure homeostasis (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA). GO cell composition (CC) analysis demonstrated significant enrichment in the apical part of the cell, collagen-containing extracellular matrix, and apical plasma membrane (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA). GO molecular function (MF) analysis revealed significant enrichment in extracellular matrix structural constituents, serine hydrolase activity, oxidase activity, activities acting on CH-OH group donors, and alcohol dehydrogenase [NAD(P)+] activity (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA). KEGG analysis highlighted Gastric acid secretion, Metabolism of xenobiotics by cytochrome P450, Virion - Hepatitis viruses and Drug metabolism - cytochrome P450 as the most significant pathways (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eB).\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\u003ch2\u003e3.3 Machine learning identification of hub GC related DEGs\u003c/h2\u003e\u003cp\u003eTo identify diagnostic biomarkers for gastric cancer, we employed the Least Absolute Shrinkage and Selection Operator, Support Vector Machine Recursive Feature Elimination and Random Forest techniques. LASSO was utilized to identify key features while addressing multicollinearity(Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eA and Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eB). Our analysis associated gastric cancer with the expression of 33 genes, including ADH7, FBP2, INHBA, FLJ42875, COL1A1, CCKBR, MFAP2, HOXB9, PLA2G7, GPR155 and DPT. 
SVM was used to generate feature vectors, revealing 26 genes strongly linked to gastric cancer(Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eC and Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eD). The Random Forest algorithm highlighted genes such as SLC5A5, ADH7, AQP4, INHBA, MYOC, DPT, APOBEC2, TRIM50, SCNN1G, DNER, FBP2, KIAA1199, GC, CPA2, COL1A1, GPR155, CCKAR, and CLDN1, each with importance scores above 3(Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eE and Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eF). The overlap of these analyses identified five hub DEGs: DPT, FBP2, ADH7, INHBA, and GPR155(Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eA). Gene expression levels in gastric cancer and control samples were visualized by box-and-whisker plot(Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eB) and volcano plot(Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eC), and differential expression of five genes in the TCGA-STAD dataset was also explored(Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eD and Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eE). 
Chromosome locations are shown in the Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eF.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\u003ch2\u003e3.4 Diagnostic value evaluation\u003c/h2\u003e\u003cp\u003eUtilizing data from the TCGA database, the area under the ROC curve (AUC) values were calculated for five core genes: DPT (AUC\u0026thinsp;=\u0026thinsp;0.923), FBP2 (AUC\u0026thinsp;=\u0026thinsp;0.602), ADH7 (AUC\u0026thinsp;=\u0026thinsp;0.855), INHBA (AUC\u0026thinsp;=\u0026thinsp;0.961), and GPR155 (AUC\u0026thinsp;=\u0026thinsp;0.739)(Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eA). Furthermore, the diagnostic performance of ten machine learning algorithms was evaluated on the training datasets by generating ROC curves. Notably, the support vector machine (SVM) models exhibited the highest AUC values, with SVM (AUC\u0026thinsp;=\u0026thinsp;0.938), random forest (RF) (AUC\u0026thinsp;=\u0026thinsp;0.936), k-nearest neighbors (KNN) (AUC\u0026thinsp;=\u0026thinsp;0.930), gradient boosting machine (GBM) (AUC\u0026thinsp;=\u0026thinsp;0.920), XGBoost (AUC\u0026thinsp;=\u0026thinsp;0.915), neural network (NeuralNet) (AUC\u0026thinsp;=\u0026thinsp;0.908), decision tree (DTS) (AUC\u0026thinsp;=\u0026thinsp;0.871), logistic regression (Logistic) (AUC\u0026thinsp;=\u0026thinsp;0.867), generalized linear model boosting (glmBoost) (AUC\u0026thinsp;=\u0026thinsp;0.866), and partial least squares (PLS) (AUC\u0026thinsp;=\u0026thinsp;0.857) (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eB).\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e\u003ch2\u003e3.5 SHAP analysis reveals the contributions of key DEGs\u003c/h2\u003e\u003cp\u003eTo evaluate the influence of DPT, FBP2, ADH7, INHBA, and GPR155 on the predictive capacity in the SVM, we conducted Shapley Additive Explanation (SHAP) analysis. 
Visualization of SHAP values showed the specific roles of these genes, highlighting DPT and FBP2 as the most significant contributors (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eA). The ordering of genes along the Y-axis of the swarm plot reflected the ranking of their contributions to the model, and the plot showed the distribution of SHAP values for each feature. It indicated that reduced expression of DPT, FBP2, ADH7, and GPR155 was linked to tumor prediction, while reduced expression of INHBA was associated with normal prediction (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eB). The SHAP dependence plots (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eC) provided a more detailed view of interactions and contributions within the predictive model of gastric cancer. Each plot illustrated the correlation between a gene pair using SHAP values on both axes, revealing positive and negative correlation patterns. For example, the DPT versus FBP2 plot showed a positive correlation, indicating that increased expression of both genes jointly pushed the model toward predicting normal samples. The color gradient in each plot represented the range of SHAP values, with high values shown in orange and low values in purple. 
In contrast, GPR155 displayed a narrow range of SHAP values, suggesting a limited impact on the model. The SHAP analysis thus quantified the magnitude of each gene's contribution and the interactions among features in the model, offering insights into their potential implications for gastric cancer biology.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec18\" class=\"Section2\"\u003e\u003ch2\u003e3.6 The working principle of the model: interpretation based on the SHAP\u003c/h2\u003e\u003cp\u003eAs shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eA and Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eB, the waterfall and force plots show the prediction of the DPT, ADH7, INHBA, FBP2, and GPR155 model for a single sample, the first sample of the training dataset (GSE26942-GSM662387). The vertical axis represents the expression of the five molecules in the sample, and the horizontal axis represents the predicted value. DPT again has the greatest impact on the result (-0.274), followed by ADH7 (-0.188), INHBA (-0.123) and FBP2 (-0.116), all of which push the prediction toward the normal class. It is worth noting that GPR155 pushes the prediction toward the tumor class. Because SHAP values are additive, the predicted value f(x) equals the expected value E[f(x)] plus the sum of the five gene contributions. The predicted value (f(x)) was 0.0112, compared with an expected value (E[f(x)]) of 0.68, so this sample was correctly classified as normal gastric mucosa tissue, illustrating how the individual gene contributions combine to produce the prediction. 
Taken together, DPT, ADH7, and INHBA were identified in this analysis as key drivers of the model's prediction, suggesting that they may play important roles in the underlying biological processes.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec19\" class=\"Section2\"\u003e\u003ch2\u003e3.7 Functional enrichment analyses of five hub DEGs\u003c/h2\u003e\u003cp\u003eTo elucidate the functional implications of the aforementioned genes and the molecular pathways with which they are associated, we employed Gene Set Enrichment Analysis and Gene Set Variation Analysis. For each gene, samples were stratified into high and low expression groups according to the median expression level of that gene. GSEA was then carried out for these groups, retaining significantly enriched pathways with a p-value less than 0.05. The three most significant pathways related to high and low expression were visualized (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003eA). Gene Set Variation Analysis, a nonparametric and unsupervised method, was used to estimate the variation in gene set enrichment across samples. Differences in KEGG pathway activity between the high and low expression groups were assessed using GSVA (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003eB). Our findings indicated significant activation of the calcium signaling pathway, vascular smooth muscle contraction, dilated cardiomyopathy, and hypertrophic cardiomyopathy pathways in the high expression DPT group. Conversely, activities of aminoacyl-tRNA biosynthesis, homologous recombination, DNA replication, and the cell cycle pathway were decreased. 
In the FBP2 group, upregulation was observed in retinol metabolism and cytochrome P450-mediated xenobiotic metabolism pathways, while ECM-receptor interaction, pathways in cancer, NOD-like receptor signaling pathway, Wnt signaling pathway, and dorsoventral axis formation were downregulated. ADH7 exhibited upregulation in pathways similar to those of FBP2, with downregulation primarily in progesterone-mediated oocyte maturation, RNA polymerase activity, and purine metabolism. For the INHBA group, upregulation was noted in ECM-receptor interaction, TGF-β signaling pathway, focal adhesion, and systemic lupus erythematosus pathways, while downregulation was observed in nitrogen metabolism, malonic acid, pyruvate, and butyrate metabolism, and valine, leucine, and isoleucine degradation. GPR155 showed upregulation in the calcium signaling pathway, taste transduction, and neuroactive ligand-receptor interaction, and downregulation in pyrimidine metabolism, cell cycle, and the p53 signaling pathway. The dysregulation of these pathways in association with the five model genes underscores the complexity of the molecular mechanisms underlying gastric cancer and offers directions for further investigation.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec20\" class=\"Section2\"\u003e\u003ch2\u003e3.8 Immune infiltration analysis\u003c/h2\u003e\u003cp\u003eImmune desert tumors are characterized by a lack of immune cell infiltration in the tumor microenvironment (TME), which can render tumors unresponsive to immunotherapy and worsen survival. Figure\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003e shows the differences in immune cell infiltration between the gastric cancer and normal groups, and the associations between the signature genes and immune cells, as estimated by the CIBERSORT algorithm. 
As shown, these characteristic genes are significantly associated with most types of immune cells, suggesting that they may influence the immune microenvironment.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec21\" class=\"Section2\"\u003e\u003ch2\u003e3.9 Clinical prognostic correlation analysis\u003c/h2\u003e\u003cp\u003eTo assess the role of the model genes in GC patient survival, we analyzed their association with overall survival (OS) using the Kaplan-Meier Plotter website. High expression of FBP2 (HR\u0026thinsp;=\u0026thinsp;1.46; P\u0026thinsp;=\u0026thinsp;1e-04), ADH7 (HR\u0026thinsp;=\u0026thinsp;1.36; P\u0026thinsp;=\u0026thinsp;0.00079), INHBA (HR\u0026thinsp;=\u0026thinsp;1.3; P\u0026thinsp;=\u0026thinsp;0.0075), and GPR155 (HR\u0026thinsp;=\u0026thinsp;1.8; P\u0026thinsp;=\u0026thinsp;2.1e-07) was associated with significantly worse OS than low expression, whereas the association for DPT (HR\u0026thinsp;=\u0026thinsp;1.15; P\u0026thinsp;=\u0026thinsp;0.12) pointed in the same direction but did not reach statistical significance (Fig.\u0026nbsp;\u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e10\u003c/span\u003e).\u003c/p\u003e\u003c/div\u003e"},{"header":"4. DISCUSSION","content":"\u003cp\u003eThe clinical management of gastric cancer remains challenging, as most patients present with advanced-stage disease at diagnosis\u003csup\u003e[\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]\u003c/sup\u003e. According to current guidelines and expert consensus, advanced gastric cancer refers to cases where the cancerous tissue invades the muscularis propria of the stomach wall or penetrates the muscle layer to reach the serosa. Advanced-stage gastric cancer mainly refers to tumors that have infiltrated, spread, and metastasized to distant sites, such as the liver and peritoneum. 
Peritoneal metastasis (PM) is a prevalent manifestation of advanced gastric cancer and represents the primary mode of recurrence following gastric cancer surgery, significantly impacting the prognosis of patients with this disease\u003csup\u003e[\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]\u003c/sup\u003e. Moreover, the gene mutations in tumor cells are complex. Tumors exhibit substantial genomic heterogeneity, with divergent mutation profiles observed across individuals\u003csup\u003e[\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]\u003c/sup\u003e. Therefore, the need to find precise treatment methods is becoming increasingly urgent.\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eAccurate biomarkers play a pivotal role in modern medical research by enabling early disease diagnosis, treatment selection, and prognosis assessment. The advancement of omics technologies and machine learning has greatly enhanced the study of tumor markers. Bioinformatics encompasses a suite of analytical techniques, including gene expression profiling, survival analysis, protein-protein interaction network reconstruction, and functional enrichment analysis, among others\u003csup\u003e[\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]\u003c/sup\u003e. Machine learning, a subfield of artificial intelligence (AI), empowers computer systems to enhance performance autonomously by leveraging data and experience rather than explicit programming. ML algorithms extract patterns from extensive datasets to execute tasks like prediction, classification, and decision-making. In contrast to costly and time-consuming experimental approaches, ML algorithms offer a cost-effective and efficient means of analyzing intricate biological data to enhance precision and effectiveness\u003csup\u003e[\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]\u003c/sup\u003e. 
By building predictive models, ML can identify potential biomarkers and therapeutic targets in large biological datasets. Disease-associated genes identified through large-scale genomic analyses may serve as potential biomarkers for diagnosis or molecular targets for therapeutic intervention\u003csup\u003e[\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]\u003c/sup\u003e. Interpretability tools provide transparency through visualization or numerical analysis, so that users need not blindly trust the output of an algorithm, which could otherwise cause serious harm to patients. The essence of model interpretation is to transform technical logic into a human-understandable basis for decision-making, thus meeting the requirements of technical reliability, social ethics, and legal compliance. Local interpretation of machine learning models helps meet transparency requirements, promotes human-machine collaboration, and contributes to model development, debugging, and monitoring. SHAP has become one of the most widely adopted methods in the interpretable AI field owing to its rigorous mathematical foundation and flexible practical tools. It offers a comprehensive interpretation of a model by quantifying the marginal contribution of individual features from both global and local perspectives. This enhanced transparency helps users to better understand and trust the model's predictions and decision-making process. It can also help to identify new diagnostic and prognostic biomarkers that play an essential role in the development, progression, and metastasis of cancer.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eIn this study, we included five gastric cancer mRNA expression datasets (GSE26942, GSE27342, GSE30727, GSE63089, and GSE65801). 
Differential gene expression analysis was performed between gastric cancer tissues and matched normal gastric mucosa samples. By screening genes with three machine learning methods and taking the intersection of their results, we pinpointed five genes that are closely associated with the biology of gastric cancer. DPT, FBP2, ADH7, INHBA, and GPR155 emerged as candidate factors in the pathogenesis of gastric cancer, and each has previously been implicated in other cancer types. Dermatopontin (DPT), an extracellular matrix protein initially isolated during the purification of dermatan sulfate proteoglycans, is believed to influence cell-matrix interactions and matrix assembly. Research indicates that DPT expression is diminished in several malignancies, such as hepatocellular carcinoma and colon, oral, ovarian, breast, and papillary thyroid cancers. This reduction in DPT expression has been linked to the promotion of tumor initiation, progression, and metastasis through the dysregulation of signaling pathways such as Wnt and Hippo/YAP\u003csup\u003e[\u003cspan additionalcitationids=\"CR32\" citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]\u003c/sup\u003e. Fructose-1,6-bisphosphatase 2 (FBP2), a moonlighting protein, is extensively expressed in all non-gluconeogenic tissues. Beyond its canonical metabolic role in glycogen synthesis, it is also involved in cell cycle-dependent processes, facilitates synaptic plasticity, and modulates the activity of transcription factors. FBP2 can interact directly with c-MYC in oral squamous cell carcinoma cells and sarcoma cells. Reduced levels of FBP2 have been associated with enhanced tumor growth and invasion\u003csup\u003e[\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e]\u003c/sup\u003e. Duda et al. 
found that FBP2 inhibits the transcriptional activity of HIF-α in lung cancer cells\u003csup\u003e[\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e]\u003c/sup\u003e. Additionally, Co\u0026sup2;⁺ enhances Camk2α activity through structural remodeling of Fbp2, modulating its mitochondrial binding affinity, which is a potential mechanism for inducing epilepsy\u003csup\u003e[\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e]\u003c/sup\u003e. Alcohol dehydrogenase 7 (ADH7) is predominantly expressed in the proximal gastrointestinal tract, where it contributes to the cytochrome P450-mediated metabolism of xenobiotics. Specifically, ADH7 catalyzes the oxidative conversion of ethanol within the gastroesophageal mucosa through its dehydrogenase activity, a process that precedes systemic absorption\u003csup\u003e[\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e]\u003c/sup\u003e. Single-nucleotide polymorphisms in ADH7 have been reported as susceptibility factors for tumors and drug dependence. A study suggested through Mendelian analysis that ADH7 (OR\u0026thinsp;=\u0026thinsp;1.3568, 95% CI\u0026thinsp;=\u0026thinsp;1.1044\u0026ndash;1.6670) may be a marker for gastric cancer\u003csup\u003e[\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e]\u003c/sup\u003e. The INHBA gene encodes the inhibin βA subunit, a member of the transforming growth factor-β superfamily that participates in the formation of inhibins and activins. INHBA has been reported to act as an oncogene and is overexpressed in various malignant solid tumors, such as ovarian cancer, rectal cancer, and head and neck squamous cell carcinoma\u003csup\u003e[\u003cspan additionalcitationids=\"CR40\" citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e]\u003c/sup\u003e. 
G protein-coupled receptor 155 (GPR155), also known as the lysosomal cholesterol sensor (LYCHOS) protein\u003csup\u003e[\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e]\u003c/sup\u003e, senses lysosomal cholesterol and translates cholesterol levels into the activation of mTORC1 signaling, thereby participating in the regulation of tumor metabolism\u003csup\u003e[\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e]\u003c/sup\u003e. Shimizu et al. found low expression of GPR155 in gastric cancer cells. Next-generation sequencing analysis of liver-metastatic GC tissues revealed GPR155 as a promising diagnostic biomarker for hematogenous metastasis, demonstrating significant differential expression patterns compared to primary lesions\u003csup\u003e[\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e]\u003c/sup\u003e. To assess the importance of these genes in the gastric cancer prediction model, we interpreted the SVM model using SHAP analysis, which ranked DPT and FBP2 as the most critical. Visualization of SHAP values revealed the relative contributions of individual genes, highlighting their significance in the model's decision-making process. Applying the CIBERSORT algorithm, we characterized differences in the immune microenvironment between gastric cancer (GC) tissues and control samples. Leveraging the TCGA database, we further validated the expression levels and diagnostic potential of the model genes. Additionally, we assessed the prognostic impact of these five genes on the survival of gastric cancer patients using the Kaplan-Meier Plotter online tool.\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e"},{"header":"5. Limitations","content":"\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eThe findings of this study are derived from analyses of public databases. 
Although we have endeavored to validate the results by integrating TCGA-STAD and multiple GEO datasets, the conclusions still require further confirmation through fundamental biological experiments. Only a limited number of published studies have addressed the roles of these genes in gastric cancer. In future work, we will prioritize experimental validation and investigation of the specific molecular mechanisms through which these genes influence the development and progression of gastric cancer.\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e"},{"header":"6. Conclusion","content":"\u003cp\u003eIn conclusion, using a combination of bioinformatics analysis and machine learning techniques, we identified five genes with significant implications for the prediction and diagnosis of gastric malignancies. Furthermore, the developed model was analyzed comprehensively using SHAP interpretation. Notably, DPT and FBP2 emerged as the most prominent of these genes. We also examined the potential pathways associated with these genes, their prognostic relevance, and their influence on immune cell infiltration. This study has identified novel candidate biomarkers for gastric cancer, offering promising prospects for the advancement of diagnostic and therapeutic approaches. Subsequent research should prioritize validating the functional roles of the above-mentioned genes and elucidating their interactions with the tumor immune microenvironment, thereby paving the way for innovative strategies in the diagnosis and treatment of gastric cancer.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAUTHOR CONTRIBUTIONS\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eJianbo Zhao: Data curation (equal); formal analysis (equal); investigation; writing \u0026ndash; original draft (lead). Damu Agu: Formal analysis (equal); writing \u0026ndash; original draft (equal). 
Xiongfeng Li: Methodology and visualization. Youge Su: Data curation (equal).\u0026nbsp;Haidong Cheng: Formal analysis (equal); project administration (equal);\u0026nbsp;writing \u0026ndash; review and editing (equal). Mingxing Hou: Project administration (equal);\u0026nbsp;writing \u0026ndash; review and editing (equal)\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eACKNOWLEDGMENTS\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe sincerely appreciate the valuable contribution of the TCGA and GEO databases in making data available to the public. The authors also thank all participants who participated in this research.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCONFLICT OF INTEREST STATEMENT\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors have no conflict of interest.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDATA AVAILABILITY STATEMENT\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe data that support the findings of this study are available from The Cancer Genome Atlas Stomach Adenocarcinoma (TCGA-STAD) cohort: https://portal.gdc.cancer.gov/analysis_page?app=Projects/TCGA-STAD and the Gene Expression Omnibus (GEO) repository: https://www.ncbi.nlm.nih.gov/geo/. The specific accession numbers for the five datasets are [GSE26942, GSE27342, GSE30727, GSE63089, and GSE65801]. The corresponding author can supply the R code used in this research upon reasonable request.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFUNDING INFORMATION\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work was supported by the Inner Mongolia Science and Technology Plan Project (Grant No. 2022YFSH0081), the Youth Project of Inner Mongolia Medical University (Grant No. YKD2023QN011), and the Youth Exploration Project of Inner Mongolia Medical University Affiliated Hospital under Grant No. 
2022NYFYTS015.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthical approval\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe data utilized in this study were exclusively sourced from public databases; therefore, no separate ethical approval was required for this secondary analysis.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent to participate\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot Applicable\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent to publish\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSeparate consent for publication in this context was not required.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCLINICAL TRIAL NUMBER\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot Applicable\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eBray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 74(3). United States:Wiley-Blackwell,2024. 229\u0026ndash;263. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://doi.org/10.3322/caac.21834\u003c/span\u003e\u003cspan address=\"10.3322/caac.21834\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMorgan E, Arnold M, Camargo MC, Gini A, Kunzmann AT, Matsuda T et al. The current and future incidence and mortality of gastric cancer in 185 countries, 2020-40: A population-based modelling study. EClinicalMedicine. 47. England:Elsevier,2022. 101404. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.eclinm.2022.101404\u003c/span\u003e\u003cspan address=\"10.1016/j.eclinm.2022.101404\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDuan Y, Xu Y, Dou Y, Xu D. Helicobacter pylori and gastric cancer: mechanisms and new perspectives. J Hematol Oncol. 18(1). England:BioMed Central,2025. 10. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s13045-024-01654-2\u003c/span\u003e\u003cspan address=\"10.1186/s13045-024-01654-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGuan WL, He Y, Xu RH. Gastric cancer treatment: recent progress and future perspectives. J Hematol Oncol. 16(1). England:BioMed Central,2023. 57. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s13045-023-01451-3\u003c/span\u003e\u003cspan address=\"10.1186/s13045-023-01451-3\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLu L, Mullins CS, Schafmayer C, Zei\u0026szlig;ig S, Linnebacher M. A global assessment of recent trends in gastrointestinal cancer and lifestyle-associated risk factors. Cancer Commun (Lond). 41(11). United States:other,2021. 1137\u0026ndash;1151. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/cac2.12220\u003c/span\u003e\u003cspan address=\"10.1002/cac2.12220\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLordick F, Carneiro F, Cascinu S, Fleitas T, Haustermans K, Piessen G et al. Gastric cancer: ESMO Clinical Practice Guideline for diagnosis, treatment and follow-up. 
Ann Oncol. 33(10). England:Oxford University Press,2022. 1005\u0026ndash;20. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.annonc.2022.07.004\u003c/span\u003e\u003cspan address=\"10.1016/j.annonc.2022.07.004\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSmyth EC, Nilsson M, Grabsch HI, van Grieken NC, Lordick F. Gastric cancer. Lancet. 396(10251). England:other,2020. 635\u0026ndash;648. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/s0140-6736(20)31288-5\u003c/span\u003e\u003cspan address=\"10.1016/s0140-6736(20)31288-5\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAjani JA, D'Amico TA, Bentrem DJ, Chao J, Cooke D, Corvera C et al. Gastric Cancer, Version 2.2022, NCCN Clinical Practice Guidelines in Oncology. J Natl Compr Canc Netw 20(2). United States:Cold Spring Publishing LLC,2022. 167\u0026ndash;92. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.6004/jnccn.2022.0008\u003c/span\u003e\u003cspan address=\"10.6004/jnccn.2022.0008\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSelvakumar SC, Preethi KA, Ross K, Tusubira D, Khan M, Mani P et al. CRISPR/Cas9 and next generation sequencing in the personalized treatment of Cancer. Mol Cancer. 21(1). England:BioMed Central,2022. 83. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s12943-022-01565-1\u003c/span\u003e\u003cspan address=\"10.1186/s12943-022-01565-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBlack JE, Kueper JK, Williamson TS. 
An introduction to machine learning for classification and prediction. Fam Pract. 40(1). England:Oxford University Press,2023. 200\u0026ndash;204. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/fampra/cmac104\u003c/span\u003e\u003cspan address=\"10.1093/fampra/cmac104\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B et al. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat Mach Intell. 2(1). England:SPRINGERNATURE,2020. 56\u0026ndash;67. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s42256-019-0138-9\u003c/span\u003e\u003cspan address=\"10.1038/s42256-019-0138-9\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMatsuoka T, Yashiro M. Bioinformatics Analysis and Validation of Potential Markers Associated with Prediction and Prognosis of Gastric Cancer. Int J Mol Sci. 25(11). Switzerland:MDPI (Basel, Switzerland),2024. 5880. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/ijms25115880\u003c/span\u003e\u003cspan address=\"10.3390/ijms25115880\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eOh SC, Sohn BH, Cheong JH, Kim SB, Lee JE, Park KC et al. Clinical and genomic landscape of gastric cancer with a mesenchymal phenotype. Nat Commun 9(1). England:Springer Nature,2018. 1777. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41467-018-04179-8\u003c/span\u003e\u003cspan address=\"10.1038/s41467-018-04179-8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCui J, Chen Y, Chou WC, Sun L, Chen L, Suo J et al. An integrated transcriptomic and computational analysis for biomarker identification in gastric cancer. Nucleic Acids Res. 39(4). England:Oxford University Press,2011. 1197\u0026thinsp;\u0026ndash;\u0026thinsp;207. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/nar/gkq960\u003c/span\u003e\u003cspan address=\"10.1093/nar/gkq960\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhang X, Ni Z, Duan Z, Xin Z, Wang H, Tan J et al. Overexpression of E2F mRNAs associated with gastric cancer progression identified by the transcription factor and miRNA co-regulatory network analysis. PLoS ONE 10(2). United States:Public Library of Science,2015. e0116979. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1371/journal.pone.0116979\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0116979\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLi H, Yu B, Li J, Su L, Yan M, Zhang J et al. Characterization of differentially expressed genes involved in pathways associated with gastric cancer. PLoS ONE 10(4). United States:Public Library of Science,2015. e0125013. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1371/journal.pone.0125013\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0125013\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRitchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43(7). England:Oxford University Press,2015. e47. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/nar/gkv007\u003c/span\u003e\u003cspan address=\"10.1093/nar/gkv007\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eXu S, Hu E, Cai Y, Xie Z, Luo X, Zhan L et al. Using clusterProfiler to characterize multiomics data. Nat Protoc. 19(11). England:Springer Nature,2024. 3292\u0026ndash;3320. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41596-024-01020-z\u003c/span\u003e\u003cspan address=\"10.1038/s41596-024-01020-z\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eH\u0026auml;nzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinf 14England:BioMed Cent 2013. 7. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/1471-2105-14-7\u003c/span\u003e\u003cspan address=\"10.1186/1471-2105-14-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eFriedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw. 33(1). 
United States:University of California at Los Angeles,2010. 1\u0026ndash;22.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSanz H, Valim C, Vegas E, Oller JM, Reverter F. SVM-RFE: selection and visualization of the most relevant features through non-linear kernels. BMC Bioinformatics. 19(1). England:BioMed Central,2018. 432. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s12859-018-2451-4\u003c/span\u003e\u003cspan address=\"10.1186/s12859-018-2451-4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRobin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, et al. pROC: an open-source package for R and S\u0026thinsp;+\u0026thinsp;to analyze and compare ROC curves. BMC Bioinf 12England:BioMed Cent. 2011;77. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/1471-2105-12-77\u003c/span\u003e\u003cspan address=\"10.1186/1471-2105-12-77\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eNewman AM, Liu CL, Green MR, Gentles AJ, Feng W, Xu Y et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods. 12(5). United States:Springer Nature,2015. 453-7. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/nmeth.3337\u003c/span\u003e\u003cspan address=\"10.1038/nmeth.3337\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGyőrffy B. Integrated analysis of public datasets for the discovery and validation of survival-associated genes in solid tumors. Innovation (Camb). 5(3). United States:other,2024. 100625. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.xinn.2024.100625\u003c/span\u003e\u003cspan address=\"10.1016/j.xinn.2024.100625\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYasuda T, Wang YA. Gastric cancer immunosuppressive microenvironment heterogeneity: implications for therapy development. Trends Cancer. 10(7). United States:Elsevier,2024. 627\u0026ndash;642. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.trecan.2024.03.008\u003c/span\u003e\u003cspan address=\"10.1016/j.trecan.2024.03.008\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLi GZ, Doherty GM, Wang J. Surgical Management of Gastric Cancer: A Review. JAMA Surg. 157(5). United States:American Medical Association,2022. 446\u0026ndash;454. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1001/jamasurg.2022.0182\u003c/span\u003e\u003cspan address=\"10.1001/jamasurg.2022.0182\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eK\u0026ouml;rfer J, Lordick F, Hacker UT. Molecular Targets for Gastric Cancer Treatment and Future Perspectives from a Clinical and Translational Point of View. Cancers (Basel). 13(20). Switzerland:MDPI (Basel, Switzerland),2021. 5216. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/cancers13205216\u003c/span\u003e\u003cspan address=\"10.3390/cancers13205216\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHuang J, Mao L, Lei Q, Guo AY. Bioinformatics tools and resources for cancer and application. Chin Med J (Engl). 
137(17). China:Wolters Kluwer Medknow Publications,2024. 2052\u0026ndash;2064. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1097/cm9.0000000000003254\u003c/span\u003e\u003cspan address=\"10.1097/cm9.0000000000003254\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAromolaran O, Aromolaran D, Isewon I, Oyelade J. Machine learning approach to gene essentiality prediction: a review. Brief Bioinform. 22(5). England:Oxford University Press,2021. bbab128 [pii]. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/bib/bbab128\u003c/span\u003e\u003cspan address=\"10.1093/bib/bbab128\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGuan S, Xu Z, Yang T, Zhang Y, Zheng Y, Chen T et al. Identifying potential targets for preventing cancer progression through the PLA2G1B recombinant protein using bioinformatics and machine learning methods. Int J Biol Macromol. 276(Pt 1). Netherlands:Elsevier,2024. 133918. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ijbiomac.2024.133918\u003c/span\u003e\u003cspan address=\"10.1016/j.ijbiomac.2024.133918\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHuang S, Ma L, Lan B, Liu N, Nong W, Huang Z. Comprehensive analysis of prognostic genes in gastric cancer. Aging (Albany NY). 13(20). United States:other,2021. 23637\u0026ndash;23651. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.18632/aging.203638\u003c/span\u003e\u003cspan address=\"10.18632/aging.203638\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYe D, Wang Y, Deng X, Zhou X, Liu D, Zhou B et al. DNMT3a-dermatopontin axis suppresses breast cancer malignancy via inactivating YAP. Cell Death Dis. 14(2). England:Springer Nature,2023. 106. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41419-023-05657-8\u003c/span\u003e\u003cspan address=\"10.1038/s41419-023-05657-8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCatal\u0026aacute;n V, Domench P, G\u0026oacute;mez-Ambrosi J, Ram\u0026iacute;rez B, Becerril S, Mentxaka A et al. Dermatopontin Influences the Development of Obesity-Associated Colon Cancer by Changes in the Expression of Extracellular Matrix Proteins. Int J Mol Sci. 23(16). Switzerland:MDPI (Basel, Switzerland),2022. 9222. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/ijms23169222\u003c/span\u003e\u003cspan address=\"10.3390/ijms23169222\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGizak A, Budziak B, Domaradzka A, Pietras Ł, Rakus D. Fructose 1,6-bisphosphatase as a promising target of anticancer treatment. Adv Biol Regul. 95. England:other,2025. 101057. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.jbior.2024.101057\u003c/span\u003e\u003cspan address=\"10.1016/j.jbior.2024.101057\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDuda P, Janczara J, McCubrey JA, Gizak A, Rakus D. The Reverse Warburg Effect is Associated with Fbp2-Dependent Hif1α Regulation in Cancer Cells Stimulated by Fibroblasts. Cells. 9(1). Switzerland:MDPI (Basel, Switzerland),2020. 205. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/cells9010205\u003c/span\u003e\u003cspan address=\"10.3390/cells9010205\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDuda P, Budziak B, Rakus D. Cobalt Regulates Activation of Camk2α in Neurons by Influencing Fructose 1,6-bisphosphatase 2 Quaternary Structure and Subcellular Localization. Int J Mol Sci. 22(9). Switzerland:MDPI (Basel, Switzerland),2021. 4800. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/ijms22094800\u003c/span\u003e\u003cspan address=\"10.3390/ijms22094800\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhao L, Lei H, Shen L, Tang J, Wang Z, Bai W et al. Prognosis genes in gastric adenocarcinoma identified by cross talk genes in disease\u0026ndash;related pathways. Mol Med Rep. 16(2). Greece:Spandidos Publications,2017. 1232\u0026ndash;1240. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3892/mmr.2017.6699\u003c/span\u003e\u003cspan address=\"10.3892/mmr.2017.6699\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDuell EJ, Sala N, Travier N, Mu\u0026ntilde;oz X, Boutron-Ruault MC, Clavel-Chapelon F et al. Genetic variation in alcohol dehydrogenase (ADH1A, ADH1B, ADH1C, ADH7) and aldehyde dehydrogenase (ALDH2), alcohol consumption and gastric cancer risk in the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Carcinogenesis. 33(2). England:Oxford University Press,2012. 361-7. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/carcin/bgr285\u003c/span\u003e\u003cspan address=\"10.1093/carcin/bgr285\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHu Y, Recouvreux MS, Haro M, Taylan E, Taylor-Harding B, Walts AE et al. INHBA(+) cancer-associated fibroblasts generate an immunosuppressive tumor microenvironment in ovarian cancer. NPJ Precis Oncol. 8(1). England:Springer Nature,2024. 35. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41698-024-00523-y\u003c/span\u003e\u003cspan address=\"10.1038/s41698-024-00523-y\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWang JJ, Chen DX, Zhang Y, Xu X, Cai Y, Wei WQ et al. Elevated expression of the RNA-binding protein IGF2BP1 enhances the mRNA stability of INHBA to promote the invasion and migration of esophageal squamous cancer cells. Exp Hematol Oncol. 12(1). England:BioMed Central,2023. 75. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s40164-023-00429-8\u003c/span\u003e\u003cspan address=\"10.1186/s40164-023-00429-8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLi FL, Gu LH, Tong YL, Chen RQ, Chen SY, Yu XL et al. INHBA promotes tumor growth and induces resistance to PD-L1 blockade by suppressing IFN-γ signaling. Acta Pharmacol Sin. 46(2). United States:Nature Publishing Group,2025. 448\u0026ndash;461. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41401-024-01381-x\u003c/span\u003e\u003cspan address=\"10.1038/s41401-024-01381-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBayly-Jones C, Lupton CJ, Keen AC, Dong S, Mastos C, Luo W et al. LYCHOS is a human hybrid of a plant-like PIN transporter and a GPCR. Nature. 634(8036). England:Springer Nature,2024. 1238\u0026ndash;1244. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41586-024-08012-9\u003c/span\u003e\u003cspan address=\"10.1038/s41586-024-08012-9\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eShin HR, Citron YR, Wang L, Tribouillard L, Goul CS, Stipp R et al. Lysosomal GPCR-like protein LYCHOS signals cholesterol sufficiency to mTORC1. Science. 377(6612). United States:other,2022. 1290\u0026ndash;1298. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1126/science.abg6621\u003c/span\u003e\u003cspan address=\"10.1126/science.abg6621\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eShimizu D, Kanda M, Tanaka H, Kobayashi D, Tanaka C, Hayashi M et al. GPR155 Serves as a Predictive Biomarker for Hematogenous Metastasis in Patients with Gastric Cancer. Sci Rep. 7. England:Springer Nature,2017. 42089. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/srep42089\u003c/span\u003e\u003cspan address=\"10.1038/srep42089\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"discover-oncology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"dion","sideBox":"Learn more about [Discover Oncology](https://www.springer.com/12672)","snPcode":"","submissionUrl":"","title":"Discover Oncology","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Discover Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Gastric cancer, DPT, FBP2, ADH7, INHBA, GPR155, Biomarkers, Machine learning, SHapley additive exPlanations","lastPublishedDoi":"10.21203/rs.3.rs-7523494/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7523494/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eGastric cancer still is a severe threat to human health, often presenting with a poor prognosis, effective biomarkers for early detection and targeted treatment are urgently needed. This study performed a comprehensive bioinformatics and machine learning approach to identify key protein biomarkers for gastric cancer and elucidate their potential functions. Gastric cancer-related datasets were obtained from the NCBI Gene Expression Omnibus database. Differential expression analysis identified 171 genes with noticeable differences between control and tumor samples. Utilizing LASSO, SVM-RFE, and RF algorithms, five genes\u0026mdash;DPT, FBP2, ADH7, INHBA, and GPR155\u0026mdash;were identified as potential biomarkers. A support vector machine (SVM) model demonstrated the highest performance among ten machine learning models constructed using these five genes. Shapley additive explanations (SHAP) were employed to illustrate the detailed contribution of the pivotal genes to the SVM model. Gene set enrichment analysis and gene set variation analysis were then used to find out the functional roles of these genes in gastric cancer cells. 
At length, we revealed the distinctive effects of signature genes on immune cell infiltration and patient prognosis. In conclusion, the identified proteins have the potential to serve as diagnostic biomarkers and provide prognostic value for gastric cancer. This study offers a comprehensive, data-driven approach to uncover critical molecular targets for improved detection and management of this deadly disease.\u003c/p\u003e","manuscriptTitle":"Machine Learning-Driven Discovery of Biomarkers in Gastric Cancer: A Focus on DPT, FBP2, ADH7, INHBA, and GPR155","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-10-08 07:17:30","doi":"10.21203/rs.3.rs-7523494/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"editorInvitedReview","content":"","date":"2025-10-04T08:58:22+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"90928337102021945326400139838688419237","date":"2025-09-30T14:17:10+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"275777986759286461148707844558731809165","date":"2025-09-25T10:28:38+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-09-24T13:09:06+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-09-23T15:00:10+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-09-23T14:22:41+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-09-16T17:44:10+00:00","index":"","fulltext":""},{"type":"submitted","content":"Discover Oncology","date":"2025-09-16T17:40:00+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"discover-oncology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"dion","sideBox":"Learn more about [Discover Oncology](https://www.springer.com/12672)","snPcode":"","submissionUrl":"","title":"Discover Oncology","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Discover Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"462484c0-aca0-4f72-82fe-6f47586b8836","owner":[],"postedDate":"October 8th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2025-11-27T07:53:55+00:00","versionOfRecord":[],"versionCreatedAt":"2025-10-08 07:17:30","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-7523494","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7523494","identity":"rs-7523494","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}