Construction of a Feature Gene and Machine Prediction Model for Inflammatory Bowel Disease Based on Multi-Chip Joint Analysis

Research Article, Research Square

Yan Chaosheng, Rao jingjing, Dai yuanyuan, Duan wenhui, Sun haowen, and 2 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6286485/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 19 Aug, 2025 in Journal of Translational Medicine. Version 1.

Abstract

Background

Inflammatory bowel disease (IBD) is a chronic non-specific inflammatory disorder triggered by immune responses and genetic factors. Currently, there is no cure for IBD, and its etiology remains unclear. As a result, early detection and diagnosis of IBD pose significant challenges.
Therefore, investigating biomarkers in peripheral blood is of utmost importance, as it can assist doctors in the early identification and management of IBD.

Methods

We employed a multi-chip joint analysis approach to thoroughly explore the database. Using artificial neural networks (ANN), machine learning techniques, and the SHAP model, we developed a diagnostic model for IBD. To select genetic features, we applied three machine learning algorithms to the differentially expressed genes: the Least Absolute Shrinkage and Selection Operator (LASSO), Support Vector Machine (SVM), and Random Forest (RF). Additionally, we analyzed the molecular pathways enriched among these differentially expressed genes through Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. Moreover, we used the SHAP model to interpret the results of the machine learning process. Finally, we examined the relationship between differentially expressed genes and immune cells.

Results

Through machine learning, we identified four crucial biomarkers for IBD, namely LOC389023, DUOX2, LCN2, and DEFA6. The SHAP model was used to elucidate the contribution of differentially expressed genes in the diagnostic model. These genes are primarily associated with immune system modulation and microbial alterations. GO and KEGG pathway enrichment analyses indicated that the differentially expressed genes were significantly enriched in molecular pathways such as the antimicrobial and IL-17 signaling pathways. By performing correlation and differential analyses between differentially expressed genes and immune cells, we found that M1 macrophages exhibited stable differential changes across all four differentially expressed genes. M2 macrophages, resting mast cells, neutrophils, and activated CD4 memory T cells all showed significant differences for three of the differentially expressed genes.

Conclusion

We have identified differentially expressed genes (LOC389023, DUOX2, LCN2, and DEFA6) with significant immune-related effects in IBD. Our findings suggest that machine learning algorithms outperform ANN in the diagnosis of IBD. This research provides a theoretical foundation for the clinical diagnosis, targeted therapy, and prognosis evaluation of IBD.

Keywords: Inflammatory bowel disease, Machine learning, Artificial neural network, Diagnostic model, Immune differences

Introduction

Inflammatory bowel disease (IBD) is predominantly classified into ulcerative colitis (UC) and Crohn's disease (CD). Since IBD emerged in the 20th century, its incidence and prevalence have increased markedly over the past few decades. Similar to other immune-related disorders, the incidence of IBD has grown in tandem with industrialization and urbanization. Based on the 2019 Global Burden of Disease (GBD) findings, IBD affects an estimated 5 million individuals, with approximately 400,000 new cases reported each year. Moreover, age and gender are crucial factors influencing the incidence and prevalence of IBD [2]. For instance, 25% of cases manifest during childhood, and the incidence rate keeps climbing. Although the incidence of IBD appears to have stabilized in the Western world, its prevalence continues to surge. This is because IBD commonly affects young individuals with relatively low mortality rates, and currently, there is no curative treatment available [3, 4].

The rising prevalence of IBD emphasizes the need to gain a deeper understanding of its molecular mechanisms in order to develop targeted therapies and diagnostic tools. Currently, in clinical practice, there are no biomarkers capable of accurately predicting the disease course or treatment response.
This is mainly attributed to the complex molecular basis of IBD and variations in immune responses. In practice, C-reactive protein (CRP), erythrocyte sedimentation rate (ESR), fecal biomarkers, and calprotectin are often regarded as essential diagnostic tools for IBD. However, these biomarkers have limitations in practical applications. For example, fecal biomarkers have poor accuracy, ESR is easily affected by multiple factors, and CRP production shows marked heterogeneity.

In recent years, with the data collected from genomic research, the relationship between specific genes and the etiology of IBD has been widely explored. For example, the association between the NOD2 gene and Crohn's disease (CD) has been well established. NOD2 has become a key predictive factor and is associated with an increased risk of disease complications [5]. Additionally, individuals with genetic variability in the PRDM1 and NDP52 genes are more susceptible to CD. Genes such as KIF9-AS1, LINC01272, and DIO3OS have proven useful in differentiating and detecting various types of IBD. Regrettably, no research has yet been able to clearly explain the action pathways of pathogenic genes and the immune cells affected by them.

Previous studies have indicated that IBD results from the interaction of immune responses, genetic factors, and microbiota [6]. Although the exact causes of IBD remain unclear, multiple interrelated factors from genetics, the immune system, microbiota, and the environment all play a role in its development [3]. Immune dysfunction can trigger persistent inflammation, a characteristic feature of IBD. This leads to the reduction or destruction of intestinal crypts, along with a series of severe clinical manifestations and complications, significantly reducing patients' quality of life [7, 8].

Our study revealed that, compared to the normal control group, patients with IBD exhibited differential expression of several genes. These genes were predominantly enriched in pathways associated with inflammatory and immune responses, which is consistent with previous research findings [9]. Besides the differences in gene expression, we also found that the diagnosis of IBD could be reflected by changes in immune cells.

We propose that differential gene expression serves as the initiating factor of IBD. Through several potential mechanisms, it acts on immune cells, leading to significant variations in their expression levels and quantities. Firstly, differential gene expression might trigger excessive activation or inhibition of pathogenic molecular pathways in IBD patients, causing the release of an excessive amount of inflammatory factors. Secondly, the overexpression of inflammatory factors disrupts the immune response in patients. Eventually, the disordered immune response gives rise to altered immune cell levels in the patient's body.

In conclusion, differential gene expression plays a pivotal role in the initiation and progression of IBD and can be regarded as a key element in the development of novel diagnostic and therapeutic strategies for IBD. To explore this further, we employed multi-chip joint analysis, artificial neural networks (ANN), and machine learning algorithms to identify significantly differentially expressed genes. We then used the SHAP model to illustrate the contribution of these differentially expressed genes to the diagnosis. Finally, we performed correlation and differential analyses on immune cells to identify the differences in gene expression among various immune cells.

Materials and Methods

Data Source, Preprocessing, and Analysis:

In this study, we conducted a comprehensive analysis of multiple databases, including GEO, IBDDMB, and UKB.
Specifically, we retrieved disease datasets (GSE87466, GSE179285, and GSE87473) from the GEO database. These datasets encompassed data from 438 patients with inflammatory bowel disease and 51 healthy individuals. The patient biopsy data were sourced from the sigmoid colon, ascending/descending colon, and terminal ileum. The patient data included cases of moderate to severe active ulcerative colitis.

We used R software (version 4.3.3) for data preparation. During the preprocessing stage, we removed probes corresponding to multiple genes and converted probe IDs into gene symbols using the annotation file of the platform. When dealing with multiple probes for the same gene, we retained only the probe with the highest signal value. To ensure data consistency and reliability, we took measures to reduce the potential impact of batch effects, which are commonly introduced during data integration. Batch effects can occur due to variations in experimental conditions, instruments, or sample processing over time or across different datasets, and they may severely confound the interpretation of results. Thus, we employed the "limma", "pheatmap", and "ggplot2" packages to calibrate the data for the different groups (diseased and non-diseased). Additionally, the "pheatmap" and "ggplot2" packages were used to generate heatmaps and volcano plots of differentially expressed genes (DEGs), respectively. Expression data were log2-transformed before analysis.

Transcriptome Data Refinement and Analysis Process:

We used supplementary probe annotation files to convert the expression matrix from the probe level to the gene level. For genes associated with multiple probes, the arithmetic mean of the corresponding probe values was used to represent gene expression. Following this conversion, we standardized the dataset and then applied the "sva" package for batch-effect correction.
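The probe-to-gene conversion by arithmetic mean described above can be illustrated with a minimal Python sketch. This is not the authors' R code: the function name `collapse_probes_to_genes`, the probe IDs, and the expression values are all hypothetical and serve only to show the averaging step.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes_to_genes(probe_expr, probe_to_gene):
    """Average probe-level rows into one row per gene symbol.

    probe_expr:    dict mapping probe ID -> list of values (one per sample)
    probe_to_gene: dict mapping probe ID -> gene symbol; probes without an
                   annotation are dropped.
    """
    grouped = defaultdict(list)
    for probe_id, values in probe_expr.items():
        gene = probe_to_gene.get(probe_id)
        if gene is not None:
            grouped[gene].append(values)
    # zip(*rows) walks sample by sample across all probes of one gene
    return {
        gene: [mean(column) for column in zip(*rows)]
        for gene, rows in grouped.items()
    }


# Hypothetical probe IDs and log2 expression values for two samples
probe_expr = {
    "probe_1": [2.0, 4.0],
    "probe_2": [4.0, 8.0],
    "probe_3": [1.0, 1.5],
    "probe_4": [9.0, 9.0],  # no annotation, so it is discarded
}
probe_to_gene = {"probe_1": "LCN2", "probe_2": "LCN2", "probe_3": "DUOX2"}

gene_expr = collapse_probes_to_genes(probe_expr, probe_to_gene)
print(gene_expr)
```

In this toy input the two LCN2 probes are averaged per sample, while DUOX2 keeps its single probe unchanged.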
Principal component analysis (PCA) was used to assess the success of the standardization process.

To identify the differentially expressed genes between IBD and control samples, we used the limma package (linear models for microarray data). Differentially expressed genes were defined as those with an absolute logarithmic fold change (|log FC|) greater than 2 and an adjusted p-value less than 0.05. Particular emphasis was placed on genes that might be related to immune infiltration in IBD patients and normal individuals.

Enrichment Analysis:

To clarify the biological significance and pathway associations of differentially expressed genes (DEGs), we carried out comprehensive Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses. In the R programming environment, we systematically explored the impacts of differentially expressed protein-modifying genes (PMGs) on key biological processes (BP), molecular functions (MF), and cellular components (CC).

We employed R software for this analysis, using the “clusterProfiler”, “org.Hs.eg.db”, “enrichplot”, and “ggplot2” packages. We enriched our interpretation with a focus on KEGG pathway data. This integrated approach captures the complexity of the molecular landscape in IBD relative to the control group and offers a framework for further exploration of the underlying mechanisms.

Artificial Neural Network:

Artificial neural network (ANN) models are playing an increasingly crucial role in predictive modeling because they can capture nonlinear relationships within high-dimensional datasets [10]. ANN models can represent complex variable relationships that other models, such as logistic regression models, are unable to capture.

The working principle of an ANN model is inspired by biological neural networks.
In an ANN, each neuron is interconnected with other neurons. Neurons have two main components: dendrites and axons. Dendrites function as receivers of information, while axons serve as transmitters. The nucleus of a neuron holds the information to be transmitted.

An ANN typically consists of an input layer, one or more hidden layers, and an output layer. Information enters the model through the input layer, undergoes processing in the hidden layer(s), and is then output via the output layer [11].

In our study, we used R software for correlation analysis. To make the data more intuitive, we employed packages such as “neuralnet” and “NeuralNetTools” to generate the relevant graphics.

Machine Learning Algorithms:

To identify candidate genes, we employed the “VennDiagram” package to visualize the intersection of key genes among differentially expressed genes (DEGs). Three machine learning algorithms, namely the Least Absolute Shrinkage and Selection Operator (LASSO), Support Vector Machine (SVM), and Random Forest (RF), were used to identify potential biomarkers.

For the LASSO analysis, we used the “glmnet” package with a penalty parameter and 10-fold cross-validation. This approach was applied to select important variables from high-dimensional data.

RF is an ensemble estimator that consists of multiple decision trees as its basic estimators. In the classification process of RF, each tree votes for a category, and the category receiving the highest number of votes is designated as the final output.

SVM treats each predictor as a dimension in a high-dimensional space. It aims to find the optimal hyperplane for classifying samples, and it performs well on highly complex data.

Finally, the genes at the intersection of the results from the LASSO, RF, and SVM algorithms were identified as potential biomarkers for IBD.
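The final intersection step can be sketched in a few lines of Python. This is an illustration only, not the authors' R workflow: the helper `consensus_features` is a hypothetical name, and every gene labelled `PLACEHOLDER_*` is invented padding; only the four intersecting genes are taken from the paper.

```python
def consensus_features(*selections):
    """Return the features chosen by every selector, sorted by name."""
    sets = [set(selection) for selection in selections]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


# Hypothetical selector outputs. The PLACEHOLDER_* entries are not real
# results; they stand in for genes picked by only some of the algorithms.
lasso_genes = ["LOC389023", "DUOX2", "LCN2", "DEFA6", "PLACEHOLDER_A", "PLACEHOLDER_B"]
svm_genes = ["LOC389023", "DUOX2", "LCN2", "DEFA6", "PLACEHOLDER_B"]
rf_genes = ["LOC389023", "DUOX2", "LCN2", "DEFA6", "PLACEHOLDER_A", "PLACEHOLDER_C"]

biomarkers = consensus_features(lasso_genes, svm_genes, rf_genes)
print(biomarkers)
```

Requiring agreement among all three selectors is a conservative choice: a gene favored by a single algorithm, such as the placeholders here, is dropped.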

Explanation of Machine Learning Models:

Ten machine learning models were constructed using the selected predictive factors. These models included Ridge Least Squares (RLS), RF, Decision Tree (DTS), SVM, Logistic Regression, K-Nearest Neighbors (KNN), Extreme Gradient Boosting (XGBoost), Gradient Boosting Machine (GBM), Neural Network, and Generalized Linear Model Boosting (GlmBoost).

Some of these models are highly interpretable and well suited to analyzing linear relationships. Others can capture non-linear relationships and interactions, making them suitable for high-dimensional data. For instance, XGBoost is an optimized version of the gradient boosting framework. It supports parallel computing and can automatically handle missing values, and it has shown excellent performance in clinical prediction tasks.

We evaluated these models on several metrics, including the receiver operating characteristic (ROC) curve, specificity, sensitivity, and accuracy, with the ROC curve serving as the primary evaluation indicator. The model that exhibited the best predictive performance was chosen as the main model for this study.

To further understand the final prediction model, we employed the SHapley Additive exPlanations (SHAP) method [12]. This method explains how each feature contributes to the model's prediction, providing insight into how the model reaches its output.

Gene Set Enrichment Analysis and Gene Set Variation Analysis:

Gene Set Enrichment Analysis (GSEA) is a powerful and widely used approach in genomics research. It aims to uncover biological pathways, functions, or molecular features associated with distinct phenotypes or experimental conditions.
In this study, we downloaded the immunological signature gene set (c7: immunological signature gene sets) and conducted GSEA on all genes within the immune cell cluster (version 1.64.0).

In contrast, Gene Set Variation Analysis (GSVA) represents gene set variation within a sample. It achieves this by converting gene expression data into gene set activity scores, without the need for data ranking. To perform GSVA, we calculated the average expression value of genes in each cell cluster using the immune-related gene set (h: hallmark gene set) (version 1.50.0).

Finally, we visualized the results of the GSVA analysis. This visualization helps to show the differences and trends in gene set activities related to the immune cell clusters.

Immune Cell Infiltration and Correlation Analysis:

The CIBERSORT algorithm was employed to assess the relative abundance of immune cells in each normal and IBD sample.

Subsequently, Spearman correlation analysis was carried out to determine the relationship between potential biomarkers and immune cells. Additionally, a differential analysis of immune cells was conducted for each gene individually.

Data Analysis:

Each experiment was performed at least three times. The ROC curve was established, and the area under the curve (AUC) and 95% confidence interval (CI) values were calculated and validated using SPSS software. The statistical significance between the two groups was determined via Student's t-test. The results were further analyzed using GraphPad Prism version 7 software. A P-value less than 0.05 was considered statistically significant.

Results

1. PCA data correction

Figure 1 shows the flowchart of this study. Before batch calibration, samples from different experiments were separated, indicating a batch effect between them.
After batch correction, PCA showed that samples from different experiments were intermixed, indicating that the batch effect had been eliminated (Fig. 2A, 2B).

2. Limma DEGs

The limma method was used to identify differentially expressed genes (DEGs) in the GSE87466, GSE179285, and GSE87473 datasets, yielding 12 upregulated and 5 downregulated genes (Fig. 2C, 2D).

3. Enrichment Analysis Results

The enrichment analysis of 17 candidate diagnostic genes revealed that they are predominantly associated with immune response, microbial factors, and inflammatory response, underscoring their crucial role in the pathogenesis and progression of IBD.

KEGG analysis indicated that these genes were significantly enriched in categories such as Adipocytokine Secretion, Bile Acid-related Pathways, FoxO Signaling Pathway, IL-17 Signaling Pathway, Alcohol-related Cancer-associated Receptor-Cytochrome Interactions, and AMPK-Glucagon-Leukocyte-Staphylococcus aureus-related Pathways (Fig. 3A-D).

In terms of biological processes, GO analysis emphasized the involvement of these candidate genes in areas such as Antimicrobial Activity and Cellular Responses to Toxic cAMP-related Substances (Fig. 3E-H). Moreover, molecular functional analysis showed that Response to Lipolysis was a significant category among these genes.

4. ANN Prediction Results

The ANN employed in this study consists of three layers: the input layer, the hidden layer, and the output layer. In the input layer, disease features and genes are assigned scores. Based on these scores and the weights associated with disease features and genes, the hidden layer is generated. There are five nodes in the hidden layer, and the output layer is derived from these nodes and their respective weights. The output layer represents the properties of the sample.

Regrettably, the ANN we designed achieved an accuracy rate of only 21% when predicting the control group, while it reached 95.2% for predicting the experimental group.
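To make the per-group accuracy figures easier to read, the following Python sketch shows how sensitivity, specificity, and overall accuracy follow from a confusion matrix. It assumes IBD is the positive class, so that accuracy on the experimental group corresponds to sensitivity and accuracy on the control group to specificity. The counts are hypothetical, chosen only so that the two per-group rates match the percentages quoted above; they are not the study's sample counts, and `per_class_metrics` is an illustrative name.

```python
def per_class_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, and accuracy from raw counts.

    tp/fn: IBD samples predicted correctly / incorrectly
    tn/fp: control samples predicted correctly / incorrectly
    """
    sensitivity = tp / (tp + fn)  # accuracy within the IBD group
    specificity = tn / (tn + fp)  # accuracy within the control group
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


# Hypothetical counts: 250 IBD samples and 100 controls (not from the paper)
metrics = per_class_metrics(tp=238, fn=12, tn=21, fp=79)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Overall accuracy is a mix of the two per-group rates weighted by group size, so it depends on how many cases and controls are evaluated. This is one reason sensitivity and specificity are usually reported alongside it.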
To evaluate the overall diagnostic performance, we constructed receiver operating characteristic (ROC) curves. The results indicated that the accuracy of sample prediction using the ANN was 93.7% (Fig. 4A, 4E).

5. Selection Results of Characteristic Genes

We used LASSO regression to generate cross-validation graphs, which identified 11 disease characteristic genes (Fig. 4B, 4F). Using the SVM method, we obtained accuracy graphs and cross-validation error graphs, through which 8 disease characteristic genes were screened out (Fig. 4C, 4G). Using the RF method, plots of the forest trees and gene importance scores were obtained, and 10 disease characteristic genes were selected (Fig. 4D, 4H).

A Venn diagram was used to display the intersection of genes selected by the three machine learning algorithms. As a result, a total of four intersecting genes were identified (Fig. 4I).

6. Visualization results

Box plots of the differentially expressed genes, combined with the volcano plot, showed that DUOX2, LCN2, and DEFA6 were upregulated in the experimental group, whereas LOC389023 was downregulated. The circular chromosome plot shows the intersecting characteristic genes and their distribution on the chromosomes (Fig. 4J-L).

7. Machine learning model results

Using ten machine learning models (RLS, RF, DTS, SVM, Logistic, KNN, XGBoost, GBM, Neural Net, and GlmBoost), the diagnostic efficiency of the four IBD differentially expressed genes was evaluated with 10-fold cross-validation as the training control setting. After the optimal diagnostic model was found, the Type column of the training set was converted into binary labels, and the model was retrained using the optimal method. ROC curves were constructed to evaluate overall diagnostic performance (Fig. 5A).

8. SHAP analysis results

SHAP analysis of the machine learning predictions produced bar charts, beeswarm plots, scatter plots, waterfall plots, and force plots. In the bar chart, the larger the value, the greater the impact of the gene on the prediction results. The beeswarm plot shows the SHAP values of each gene, which represent its contribution to the machine learning model. The scatter plot is a three-dimensional display that allows observation of the interaction between genes and SHAP values. The waterfall plot displays the predicted result for a single sample; the larger the absolute value, the greater the impact of the gene on the predicted result. The force plot also displays the predicted result for a single sample: it starts from the base value and then shows how each gene shifts the prediction (Fig. 5B-F).

In summary, through SHAP analysis, we explained the machine learning model and calculated the size of each gene's contribution. Within each sample, patients can also be distinguished by gene expression. The model can be interpreted by analyzing the results for these genes together.

9. GSEA and GSVA analysis results

GSEA and GSVA were used to determine which functions or pathways are enriched in the high-expression or low-expression group of each target gene, and the five most significantly enriched pathways were visualized (Fig. 6A-L).

10. Results of immune cell infiltration analysis

The immune cell content of each sample was obtained through immune cell infiltration analysis and visualized as a bar chart. Note that the content is relative: the fractions of all immune cells in a sample sum to one. In the box plot of differences, an asterisk above an immune cell type indicates that it differs between the control group and the experimental group.
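The Spearman correlation used to relate biomarker expression to immune cell fractions can be sketched as a Pearson correlation computed on ranks. The Python example below is a minimal illustration with tie handling by average ranks. The function names and the five-sample values are hypothetical, not data from this study, and the sketch assumes neither input is constant.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation between two equal-length sequences."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for five samples
gene_expression = [1.2, 3.4, 2.2, 5.0, 4.1]
cell_fraction = [0.05, 0.20, 0.10, 0.40, 0.30]

rho = spearman(gene_expression, cell_fraction)
print(round(rho, 6))
```

Because only ranks enter the calculation, any strictly increasing relationship gives a coefficient of 1 even when it is not linear, which suits relative cell fractions bounded between zero and one.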
The horizontal and vertical axes of the correlation graph represent the names of immune cells, and the values inside represent the correlation coefficients (Fig. 7A-D). To investigate the relationship between immune cells and LOC389023, DUOX2, LCN2, and DEFA6, scatter plots were drawn for immune cells with significant correlations. The correlation results were then visualized as lollipop plots (Fig. 8A-D).

Discussion

Our research offers a comprehensive gene-based interpretation of the differences in genes, action pathways, and immune responses in IBD. We integrated multi-chip analysis, ANN, and ten machine learning methods. Additionally, we performed interpretability analysis on the machine learning model to explore the contribution of differential genes to the diagnosis of IBD.

Using differential genes as a benchmark, we investigated the changes of each gene in the immune response and examined the alterations and correlations of immune cells. This allowed us to characterize IBD patients from both genetic and immune-response perspectives. Based on these findings, we developed a predictive model and validated it multiple times using data from the GEO database.

IBD results from the interaction between immune responses and genetic factors, and the impacts of factors such as microorganisms, the environment, and diet cannot be ignored [13]. Currently, the clinical diagnosis of IBD is still subject to various limitations, and a simple and convenient diagnostic method remains to be developed. The biomarkers identified through our screening can be detected in peripheral blood, enabling an easy assessment of the likelihood of IBD in subjects. In our study, we identified four biomarkers for the diagnosis of IBD, namely LOC389023, DUOX2, LCN2, and DEFA6.
These biomarkers play a crucial role in immune function.

The ANN we designed exhibited an accuracy rate of merely 21% when predicting the control group, yet it achieved a 95.2% accuracy rate for predicting the experimental group. When ROC curves were constructed to assess the overall diagnostic performance, the accuracy of sample prediction using the ANN reached 93.7%.

We used these biomarkers to construct machine learning models. After selecting the optimal model through 10-fold cross-validation, we transformed the raw data for secondary validation. Among the machine learning algorithms, the Gradient Boosting Machine (GBM) and K-Nearest Neighbors (KNN) demonstrated the highest diagnostic performance, reaching 95.2%. They were able to effectively distinguish between the experimental and control groups.

However, there are numerous uncertainties regarding the diagnostic performance of the ANN. We made multiple attempts to modify the number of hidden layers and conducted small-scale validations on the data. Unfortunately, the results did not meet our expectations. After consulting relevant literature and experts, we hypothesize that gene scoring might have led to overfitting in the ANN.

Nonetheless, we employed various types of machine learning methods to analyze the data from multiple dimensions and perspectives, which effectively circumvented these issues. We compared the results of the ANN and the machine learning models, and the comparison indicates that the machine learning models are more reliable. Nevertheless, large-scale validation is still required to confirm their generalizability.

Previous research indicates that LOC389023 plays a vital role in regulating gene expression and chromatin status. LOC389023 is a long non-coding RNA located on chromosome 2q14.1 within the DPP10 gene. In the nuclei of neurons, LOC389023 contains GC-rich stem-loop motifs.
These motifs can bind to the SUZ12 protein of the chromatin and the polycomb repressive chromatin-modifying complex 2 (PRC2) complex. By doing so, it recruits chromatin-remodeling inhibitors, leading to a reduction in DPP10 gene expression. This regulatory effect exhibits cell-type specificity. The study also uncovered the relationships among LOC389023, gene expression, histone methylation, and voltage-gated K(+) channels during neural development [14].

DNA methylation and non-coding RNAs have been extensively investigated in patients with IBD. DNA methylation is dependent on dietary cofactors such as substrates and nutrients (folate, vitamin B12/D, etc.). It is associated with inflammation, microbiota composition, and microRNAs, which can influence IBD by interfering with T-cell differentiation. Marangoni et al. conducted a comprehensive analysis of the role of DNA methylation in IBD and its impact on the inflammatory process [15].

Furthermore, studies have revealed that voltage-gated K(+) channels, such as KV1.3 and KCa3.1, are involved in K(+) conductance in T lymphocytes. They play a crucial role in cell proliferation, differentiation, apoptosis, and infiltration. In chemically induced IBD model mice, the activity and expression of KCa3.1 in CD4(+) T lymphocytes of mesenteric lymph nodes were increased. Moreover, its regulatory factor NDPK-B showed positive expression. When the KCa3.1 K(+) channel was blocked with TRAM-34 and/or ICA17043, the severity of IBD was significantly reduced. Symptoms such as diarrhea, fecal blood, inflammation, and colonic crypt injury were alleviated. Simultaneously, the expression levels of KCa3.1 and Th1 cytokines in CD4(+) T lymphocytes were restored.
This indicates that the abnormality of the KCa3.1 channel is related to the development of IBD, and intervention targeting this channel can improve the disease condition [16, 17].

In addition, the upregulation of K2P5.1 in T lymphocytes is associated with the pathogenesis of autoimmune diseases. Given that IBD also belongs to this category of diseases and that ion channel pre-mRNA splicing is related to the disease, the mRNA splicing mechanism regulating K2P5.1 K(+) channel transcription may help guide the treatment of IBD and other diseases [17].

Lipocalin 2 (LCN2), a member of the adipokine protein family, is predominantly involved in processes such as cell growth, differentiation, metabolism, and immune response. During inflammation, various cell types, including macrophages, epithelial cells, and neutrophils, secrete LCN2. It then acts on other cells via the bloodstream or the local tissue microenvironment, thereby influencing inflammatory responses, immune regulation, and other physiological processes.

LCN2 is implicated in both acute and chronic inflammation and plays a crucial pathogenic role in diseases such as cancer, diabetes, obesity, and multiple sclerosis [18, 19]. Through a series of cell- and tissue-based studies, Xia et al. discovered that LCN2 is key in regulating iron homeostasis and the inflammatory response. LCN2 can interact with iron and iron carriers in diverse cell types, including immune cells and epithelial cells. By binding to bacterial iron carriers, it inhibits bacterial growth. Moreover, it can regulate cell survival, apoptosis, and other processes by modulating intracellular iron levels. Under inflammatory conditions, LCN2 stabilizes the iron pool and reduces iron-related toxicity. For instance, in intestinal inflammation, LCN2 restricts the availability of iron in the intestine, safeguarding the mucosa from damage.
In kidney disease, it protects against acute kidney injury, yet in chronic kidney disease it may exacerbate disease progression. LCN2 also has a complex role in tumor cells: it can promote tumor cell growth and metastasis, but it may also inhibit tumor development [20–22].

Huang et al. infected mice with Mycobacterium bovis and found that LCN2 is involved in microbial invasion, the inflammatory response, and tissue damage. It also helps regulate the balance of the gut microbiota and its metabolites. When the LCN2 gene is absent, the operational taxonomic units (OTUs) and the alpha- and beta-diversity of the gut microbiota change significantly in infected mice. Intestinal metabolites are affected as well, with increased levels of metabolites such as taurodeoxycholic acid and undecylenic acid. This indicates that LCN2 may modulate the intestinal environment by regulating the composition of the gut microbiota and its metabolite levels, thus influencing the host's health and disease resistance [23].

There are two dual oxidase (DUOX) enzymes, DUOX1 and DUOX2. Their primary function is to generate reactive oxygen species (ROS) in tissues such as the thyroid, colon, respiratory tract, and lymphatic system. DUOX contributes significantly to the synthesis of hydrogen peroxide (H₂O₂), which plays a pivotal role in host defense. H₂O₂ is involved in signal transduction, cell differentiation, cell death programs, immune defense, regulation of microbial composition, and hormone synthesis (specifically thyroid hormone) [24, 25].

Grasberger et al. carried out a multi-omics phenome-wide association study (PheWAS) of 2872 participants to analyze the relationship between DUOX2 gene variants and the IBD phenotype. The study revealed that rare variants in the DUOX2 gene were associated with an elevated risk of IBD.
Through multi-omics PheWAS and rare-variant association analysis, the link between DUOX2 variants and the pathogenesis of IBD was elucidated [25].

In respiratory disease, taking respiratory viral infection as an example, Kasumba et al. demonstrated that DUOX2 selectively regulates the cytokines and chemokines secreted by epithelial cells, which in turn affects the recruitment, adhesion, and degranulation of neutrophils. DUOX2 thus plays a crucial role in modulating the interaction between epithelial cells and immune cells [26, 27].

DUOX2 mutations have also been extensively explored in congenital hypothyroidism (CH). In a critical CH cohort, 50% of patients carry DUOX2 mutations (38%). These mutations are associated with the patients' biochemical characteristics and influence the diagnosis and treatment of the disease. Moreover, current screening thresholds may result in missed diagnoses. At the cellular level, the expression of DUOX2 on the cell membrane and its mediation of H₂O₂ production are of utmost importance. When DUOX2-mediated H₂O₂ production is completely lost, thyroid hormone synthesis is impaired, ultimately leading to CH [28].

From the perspective of signaling pathways, Burgueño et al. showed that DUOX2 plays a significant role in the development of IBD-related tumors. When the TLR4 signaling pathway is activated in epithelial cells, DUOX2 expression is upregulated. DUOX2 then catalyzes the production of H₂O₂, which is closely associated with tumor initiation and progression. DUOX2 also interacts with the microbiota; in this process, the H₂O₂ generated promotes tumor development and affects the transition from IBD to tumor [29].
DEFA6 is secreted by Paneth cells, specialized epithelial cells located at the base of the small-intestinal crypts, and is the most abundant antibacterial agent these cells produce in the small intestine. Its release into the crypt lumen is thought to protect the crypt microenvironment against microbial invasion [30]. As an antimicrobial peptide, DEFA6 plays a crucial part in intestinal immune defense by combating pathogens.

Angriman et al. studied mucosal samples from 88 CD patients who underwent ileocolonic resection. DEFA6 expression in the healthy and diseased ileal mucosa did not differ significantly between early- and late-stage CD patients, but it showed an upward trend in the external validation cohort of late-stage CD patients. As CD progresses, the expression of antimicrobial-peptide-related genes such as DEFB4A increases in the affected mucosa. It is hypothesized that persistent mucosal damage may allow intestinal bacteria to interact with epithelial cells, thereby stimulating the expression of antimicrobial peptides. This evidence suggests that alterations in DEFA6 expression are associated with the CD disease course [31].

Gaudino et al. reported that DEFA6, a specific marker of Paneth cells in the small intestine, contributes to intestinal immune defense and the maintenance of gut microbiota balance [32, 33]. Eng et al. examined the role of DEFA6 in maintaining intestinal immune homeostasis in the Paneth cells of human small-intestine organoids. Paneth cells can produce antibacterial substances such as DEFA6 to preserve gut microbiota balance. However, the expression level of DEFA6 in human small-intestine organoids is extremely low and differs significantly from that in the source tissue.
In contrast, mouse small-intestine organoids mimic the tissue expression of α-defensins more faithfully. The authors also found that WNT signal stimulation fails to restore DEFA6 expression in human small-intestine organoids. After treatment with FOXO inhibitors, however, DEFA6 mRNA expression increased by over 100,000-fold, nearly reaching the level of human tissue. This finding indicates that the FOXO signaling pathway is essential for regulating DEFA6 expression in human Paneth cells. Inhibiting FOXO signaling can effectively restore DEFA6 expression, which is of great significance for the study of intestinal diseases and the enhancement of intestinal immunity [34].

GO and KEGG analyses revealed that the differentially expressed genes were predominantly enriched in the antibacterial functions of the human immune system and in the IL-17 signaling pathway.

Nicholas conducted a study of 136 IBD patients. Patients with low IgG/G1 levels had poorer clinical survival data than those with normal levels, which suggests that humoral immunity plays a pivotal role in the survival of IBD patients. When humoral immunity is compromised, the likelihood that an IBD patient will require surgery increases, and low IgG/G1 levels affect surgical requirements differently across IBD subtypes. Humoral immunity may therefore serve as a predictor of survival in IBD patients, which is of great significance for their clinical management.

IL-17 is closely associated with IBD and is essential to its pathological progression. IL-17 is a key cytokine secreted by Th17 cells and plays a substantial role in the development of intestinal inflammation in IBD patients [35].
In CD patients, IL-17-producing cells accumulate in large numbers in the submucosa and muscularis propria. Moreover, the number of IL-17-producing T cells is significantly higher in CD patients than in healthy individuals. Ample evidence points to a strong link between IL-17 and IBD [36].

Nanki et al. have likewise shown that IL-17 is closely related to IBD and plays a crucial role in its pathogenesis [37, 38]. Whole-exome sequencing of colon organoids from UC patients and healthy controls showed that the inflamed UC epithelium accumulates somatic mutations in multiple genes associated with the IL-17 signaling pathway, such as NFKBIZ, ZC3H12A, and PIGR. These genes are rarely affected in colon cancer but are mutated in the inflammatory environment of UC. This indicates that the IL-17 signaling pathway is significantly perturbed in the pathological process of UC. Mutations in genes related to IL-17 signaling may also be implicated in the occurrence and development of UC in humans: by interfering with the IL-17 signaling pathway, they may disrupt the intestinal immune balance and thus promote the progression of UC [38].

Our results showed differences in plasma cells, follicular helper T cells, activated natural killer (NK) cells, M1 macrophages, resting mast cells, activated mast cells, M0 macrophages, and neutrophils between the control group and the experimental group. We then investigated the relationships between genes and immune cells. For LOC389023, there were significant differences in M1 macrophages, M2 macrophages, activated mast cells, resting mast cells, and neutrophils. For LCN2, significant differential changes were observed in eosinophils, M0 macrophages, M1 macrophages, M2 macrophages, resting mast cells, neutrophils, plasma cells, activated CD4 memory T cells, and CD8 T cells.
For DUOX2, stronger associations were detected in M0 macrophages, M1 macrophages, M2 macrophages, activated mast cells, resting mast cells, neutrophils, plasma cells, activated CD4 memory T cells, and CD8 T cells. For DEFA6, significant changes were noted in M1 macrophages, activated CD4 memory T cells, and regulatory T cells (Tregs). M1 macrophages showed consistent differential changes across all four differentially expressed genes. M2 macrophages, resting mast cells, neutrophils, and activated CD4 memory T cells each showed clear differences for three of the four genes.

Chen et al. reported that macrophages can polarize into two phenotypes, M1 and M2. M1 macrophages are mainly involved in pro-inflammatory responses and can secrete pro-inflammatory factors such as IL-6, IL-12, and TNF. M2 macrophages are mainly involved in anti-inflammatory responses and contribute to tissue repair; they characteristically express arginase-1 (Arg-1), the mannose receptor (CD206), and the anti-inflammatory factor IL-10. IBD is an intestinal inflammatory disorder, and the inflammatory microenvironment in the intestine is closely related to macrophage polarization [39, 40]. Based on these findings, we speculate that over-activation of M1 macrophages may exacerbate the intestinal inflammatory response in IBD patients, leading to tissue damage. In contrast, the anti-inflammatory and tissue-repair functions of M2 macrophages may help alleviate the inflammatory symptoms of IBD and promote the repair of intestinal tissue. Regrettably, none of these hypotheses has been experimentally verified, so it cannot simply be stated that M1/M2 macrophages are closely associated with IBD. However, our comprehensive bioinformatics analysis of GEO data suggests that macrophage polarization plays an important role in IBD [40].
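The gene-to-immune-cell association step discussed above can be illustrated with a rank correlation between a gene's expression and an estimated immune cell fraction across samples. The following is a minimal sketch only: the choice of Spearman correlation, the function names, and all numbers are assumptions made for illustration and are not taken from this study's data or pipeline.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up values for five hypothetical samples (illustration only).
lcn2_expression = [2.1, 3.4, 5.0, 6.2, 7.9]
neutrophil_fraction = [0.01, 0.03, 0.04, 0.08, 0.12]
m2_fraction = [0.20, 0.15, 0.16, 0.09, 0.05]

rho_neutrophils = spearman(lcn2_expression, neutrophil_fraction)
rho_m2 = spearman(lcn2_expression, m2_fraction)
print(f"LCN2 vs neutrophils: rho = {rho_neutrophils:.2f}")
print(f"LCN2 vs M2 macrophages: rho = {rho_m2:.2f}")
```

A rank-based measure is used in this sketch because estimated cell fractions are often skewed; in practice a significance test and multiple-testing correction would accompany each coefficient.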
Xie et al. reported that, in colorectal cancer patients, the density of mast cells is lower than in normal tissue and that their phenotype changes substantially, shifting from a quiescent state to an activated one. In the tumor microenvironment, activated mast cells release a variety of bioactive substances, including histamine, cytokines, proteases, and lipid mediators, which can trigger inflammatory responses [41]. Based on this, we hypothesize that abnormal expression of the differentially expressed genes leads to over-activation of immune cells. These activated immune cells then release active substances, thereby contributing to the onset and progression of IBD. Our findings indicate that mast cells differ significantly across several of the differentially expressed genes [42].

Neutrophils play a key role in the innate immunity of the intestine [43]. Danne et al. described neutrophil infiltration as one of the markers of disease activity in IBD patients. Neutrophils can release various inflammatory mediators, such as ROS, cytotoxic granule contents, and neutrophil extracellular traps (NETs), thereby triggering inflammatory responses and causing tissue damage [44, 45]. At the same time, neutrophils are essential for maintaining the intestinal barrier, defending the host, and reducing inflammation. For instance, CD177+ neutrophils have a protective effect. Neutrophil infiltration is also associated with disease severity, and excessive neutrophil activation may lead to treatment failure. In CD, dysfunction and reduced recruitment of neutrophils result in delayed bacterial clearance and the persistent presence of antigens, which trigger adaptive immune responses and the formation of granulomas.
Additionally, the gut microbiota can regulate the production, function, and maturation of neutrophils, while neutrophils also influence the composition and function of the microbiota. The two interact and jointly affect the development of IBD [45].

CD4+ T cells display distinct characteristics in CD and UC. In CD patients, there is a significant expansion of CD4+ tissue-resident memory T cells (Trm) in the gut. Notably, CD4+ Trm subsets expressing CD161 and CCR5 in CD patients exhibit stronger cytotoxicity and are associated with disease activity. In contrast, the intestine of UC patients is rich in CD45RA+ CCR7+ naive CD4+ T cells and CXCR5+ T follicular helper cells (Tfh).

Moreover, CD-specific CD4+ Trm (CDtrm) have innate-like properties. They can rapidly secrete inflammatory cytokines such as IFN-γ upon cytokine stimulation, without the need for T-cell receptor (TCR) activation. This ability can damage intestinal epithelial cells. In UC, the increase in Tfh cells may be linked to the elevation of pathological IgG+ plasma cells [46].

In current clinical practice, accurate diagnosis of IBD remains a challenge. Diagnosis relies mainly on medical history, clinical symptoms, laboratory tests, imaging, endoscopy, and histological examination. Colonoscopy is widely regarded as the "gold standard" for diagnosing IBD. However, colonoscopy is time-consuming and depends heavily on the diagnostic experience of the physician. In addition, the intestinal wall of patients with severe IBD is extremely fragile, and improper operation during colonoscopy can lead to complications such as bleeding and perforation.

Our objective is therefore to identify more reliable and sensitive biomarkers that directly reflect the different stages of IBD and its underlying disease mechanisms. To this end, we carried out rigorous screening in databases such as the GEO, IBDDMB, and the UKB.
We imposed strict restrictions on the number of patients included in the study. Only cohorts with more than 100 patients were considered, and inclusion covered all stages of IBD onset and the entire intestinal tissue.

Regrettably, although we obtained a substantial amount of data from public databases and made every effort to include data that were as complete as possible, we did not conduct real-world verification. Nevertheless, to our knowledge, our comprehensive multi-method analysis of existing IBD data and our gene-based study of the specific immune cells that differ in immune responses are unprecedented. This approach provides novel perspectives for the study of IBD.

ANN, machine learning, interpretable analysis, and multi-chip joint analysis are gaining popularity in medical research. Given the complex genetic basis of IBD and the limitations of existing diagnostic tools, the diagnosis of IBD often poses significant challenges.

By analyzing extensive genomic data, ANNs and machine learning hold great promise for enhancing diagnostic efficiency and accuracy. They can also identify complex multi-allele patterns that may be associated with specific diseases. Interpretable analysis provides detailed insight into the contribution of each differentially expressed gene within a diagnostic model, thereby explaining the machine-learning model [47].

The application of artificial neural networks and computational techniques such as machine learning and deep learning to assist clinicians in disease diagnosis and treatment has become increasingly prevalent. Continuously evolving artificial intelligence enables more precise and scientific diagnosis and treatment, ultimately benefiting patients in need of medical care.
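To make the idea of a per-gene contribution concrete, the sketch below computes exact Shapley values, the quantity that SHAP approximates, for a toy logistic classifier over the four genes named in this study. It is an illustration under stated assumptions: the coefficients, the bias, the baseline, and the sample values are hypothetical and are not fitted to or taken from the study's data, and the exhaustive enumeration used here is practical only for a handful of features.

```python
from itertools import combinations
from math import exp, factorial

GENES = ["LOC389023", "DUOX2", "LCN2", "DEFA6"]

# Hypothetical coefficients of a toy logistic model (NOT fitted to any data).
WEIGHTS = {"LOC389023": -0.8, "DUOX2": 1.1, "LCN2": 0.9, "DEFA6": 0.6}
BIAS = -0.5


def predict(expr):
    """Toy disease probability from a dict mapping gene -> expression value."""
    z = BIAS + sum(WEIGHTS[g] * expr[g] for g in GENES)
    return 1.0 / (1.0 + exp(-z))


def shapley_values(sample, baseline):
    """Exact Shapley attribution of predict(sample) - predict(baseline).

    Genes outside a coalition are held at their baseline value.
    """
    n = len(GENES)
    phi = {}
    for g in GENES:
        others = [h for h in GENES if h != g]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude g.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                without_g = {
                    h: (sample[h] if h in coalition else baseline[h])
                    for h in GENES
                }
                with_g = dict(without_g)
                with_g[g] = sample[g]
                total += weight * (predict(with_g) - predict(without_g))
        phi[g] = total
    return phi


# Hypothetical standardized expression values for one sample.
baseline = {g: 0.0 for g in GENES}
sample = {"LOC389023": -1.2, "DUOX2": 2.0, "LCN2": 1.5, "DEFA6": 0.7}

phi = shapley_values(sample, baseline)
for gene, value in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{gene}: {value:+.4f}")
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the property that lets such values be read as a decomposition of a single prediction.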
Conclusion

In this study, we developed a diagnostic model for predicting IBD using ANN, machine learning, interpretable analysis, and multi-chip joint analysis. ANN, LASSO, SVM, and Random Forest algorithms were used for genetic feature selection. We identified four key biomarkers (LOC389023, DUOX2, LCN2, and DEFA6) and used ten machine learning methods and SHAP models to assist in IBD diagnosis, clarifying the genetic characteristics, molecular pathways, and differential immune cells of IBD. The results showed that these genes are related to immune system function. In addition, we conducted multiple validations using the dataset. Our findings indicate that machine learning algorithms can facilitate accurate diagnostic decisions for IBD, enabling clinicians to explore new treatment pathways and diagnostic methods.

Declarations

• Ethics approval and consent to participate: Not required.
• Consent for publication: Not required.
• Competing interests: The authors declare that there is no conflict of interest.
• Funding: National Natural Science Foundation of China (32372302, 82405210).
• Authors' contributions: Yan Chaosheng: Data curation, Methodology, Writing - Original Draft, Formal analysis. Rao jingjing: Resources, Writing - Original Draft. Dai yuanyuan: Writing - Review & Editing. Duan wenhui: Investigation. Sun haowen: Investigation. Sheng yinyue: Methodology, Writing - Review & Editing, Project administration, Funding acquisition. Xue yuzheng: Methodology, Writing - Review & Editing, Project administration, Funding acquisition.
• Acknowledgements: Throughout the entire research process, all the authors not only provided professional academic guidance, helping me to solve the challenges in experimental design and data analysis, but also patiently offered numerous revision suggestions during the paper-writing stage.
This has significantly enhanced the quality of the paper. Additionally, they provided financial support.
• Availability of data and material: The GSE87466, GSE179285, and GSE87473 microarray datasets used in this study were downloaded from the Gene Expression Omnibus database.

References

1. SEYEDIAN S S, NOKHOSTIN F, et al. A review of the diagnosis, prevention, and treatment methods of inflammatory bowel disease[J/OL]. Journal of Medicine and Life, 2019, 12(2): 113-122. DOI:10.25122/jml-2018-0075.
2. HODSON R. Inflammatory bowel disease[J/OL]. Nature, 2016, 540(7634): S97-S97. DOI:10.1038/540S97a.
3. DIEZ-MARTIN E, HERNANDEZ-SUAREZ L, MUÑOZ-VILLAFRANCA C, et al. Inflammatory Bowel Disease: A Comprehensive Analysis of Molecular Bases, Predictive Biomarkers, Diagnostic Methods, and Therapeutic Options[J/OL]. International Journal of Molecular Sciences, 2024, 25(13): 7062. DOI:10.3390/ijms25137062.
4. BISGAARD T H, ALLIN K H, KEEFER L, et al. Depression and anxiety in inflammatory bowel disease: epidemiology, mechanisms and treatment[J/OL]. Nature Reviews Gastroenterology & Hepatology, 2022, 19(11): 717-726. DOI:10.1038/s41575-022-00634-6.
5. HE X, ZHOU H. Decoding the IBD paradox: A triadic interplay between REG3, enterococci, and NOD2[J/OL]. Cell Host & Microbe, 2023, 31(9): 1425-1427. DOI:10.1016/j.chom.2023.08.008.
6. GILLILAND A, CHAN J J, DE WOLFE T J, et al. Pathobionts in Inflammatory Bowel Disease: Origins, Underlying Mechanisms, and Implications for Clinical Care[J/OL]. Gastroenterology, 2024, 166(1): 44-58. DOI:10.1053/j.gastro.2023.09.019.
7. COSÍN-ROGER J. Inflammatory Bowel Disease: Immune Function, Tissue Fibrosis and Current Therapies[J/OL]. International Journal of Molecular Sciences, 2024, 25(12): 6416. DOI:10.3390/ijms25126416.
8. AGRAWAL M, ALLIN K H, PETRALIA F, et al. Multiomics to elucidate inflammatory bowel disease risk factors and pathways[J/OL]. Nature Reviews Gastroenterology & Hepatology, 2022, 19(6): 399-409. DOI:10.1038/s41575-022-00593-y.
9. AGRAWAL M, SPENCER E A, COLOMBEL J F, et al. Approach to the Management of Recently Diagnosed Inflammatory Bowel Disease Patients: A User’s Guide for Adult and Pediatric Gastroenterologists[J/OL]. Gastroenterology, 2021, 161(1): 47-65. DOI:10.1053/j.gastro.2021.04.063.
10. STIDHAM R W, TAKENAKA K. Artificial Intelligence for Disease Assessment in Inflammatory Bowel Disease: How Will it Change Our Practice?[J/OL]. Gastroenterology, 2022, 162(5): 1493-1506. DOI:10.1053/j.gastro.2021.12.238.
11. RENGANATHAN V. Overview of artificial neural network models in the biomedical domain[J/OL]. Bratislava Medical Journal, 2019, 120(07): 536-540. DOI:10.4149/BLL_2019_087.
12. SHEN M, ZHANG Y, ZHAN R, et al. Predicting the risk of cardiovascular disease in adults exposed to heavy metals: Interpretable machine learning[J/OL]. Ecotoxicology and Environmental Safety, 2025, 290: 117570. DOI:10.1016/j.ecoenv.2024.117570.
13. QIU P, ISHIMOTO T, FU L, et al. The Gut Microbiota in Inflammatory Bowel Disease[J/OL]. Frontiers in Cellular and Infection Microbiology, 2022, 12: 733992. DOI:10.3389/fcimb.2022.733992.
14. TUSHIR J S, AKBARIAN S. Chromatin-bound RNA and the neurobiology of psychiatric disease[J/OL]. Neuroscience, 2014, 264: 131-141. DOI:10.1016/j.neuroscience.2013.06.051.
15. MAGRO D O, SASSAKI L Y, CHEBLI J M F. Interaction between diet and genetics in patients with inflammatory bowel disease[J/OL]. World Journal of Gastroenterology, 2024, 30(12): 1644-1650. DOI:10.3748/wjg.v30.i12.1644.
16. ZENG B, HUANG Y, CHEN S, et al. Dextran sodium sulfate potentiates NLRP3 inflammasome activation by modulating the KCa3.1 potassium channel in a mouse model of colitis[J/OL]. Cellular & Molecular Immunology, 2022, 19(8): 925-943. DOI:10.1038/s41423-022-00891-0.
17. OHYA S. Physiological Role of K⁺ Channels in the Regulation of T Cell Function[J/OL]. YAKUGAKU ZASSHI, 2016, 136(3): 479-483. DOI:10.1248/yakushi.15-00246-4.
18. GUPTA U, GHOSH S, WALLACE C T, et al. Increased LCN2 (lipocalin 2) in the RPE decreases autophagy and activates inflammasome-ferroptosis processes in a mouse model of dry AMD[J/OL]. Autophagy, 2023, 19(1): 92-111. DOI:10.1080/15548627.2022.2062887.
19. LI J, SIMMONS A J, HAWKINS C V, et al. Identification and multimodal characterization of a specialized epithelial cell type associated with Crohn’s disease[J/OL]. Nature Communications, 2024, 15(1): 7204. DOI:10.1038/s41467-024-51580-7.
20. XIAO X, YEOH B S, VIJAY-KUMAR M. Lipocalin 2: An Emerging Player in Iron Homeostasis and Inflammation[J/OL]. Annual Review of Nutrition, 2017, 37(1): 103-130. DOI:10.1146/annurev-nutr-071816-064559.
21. WU D, WANG X, HAN Y, et al. The effect of lipocalin-2 (LCN2) on apoptosis: a proteomics analysis study in an LCN2 deficient mouse model[J/OL]. BMC Genomics, 2021, 22(1): 892. DOI:10.1186/s12864-021-08211-y.
22. YANG Y, LI S, LIU K, et al. Lipocalin-2-mediated intestinal epithelial cells pyroptosis via NF-κB/NLRP3/GSDMD signaling axis adversely affects inflammation in colitis[J/OL]. Biochimica et Biophysica Acta (BBA) - Molecular Basis of Disease, 2024, 1870(7): 167279. DOI:10.1016/j.bbadis.2024.167279.
23. HUANG Q, XING J, LI G, et al. LCN2 regulates the gut microbiota and metabolic profile in mice infected with Mycobacterium bovis[J/OL]. mSystems, 2024, 9(8): e00501-24. DOI:10.1128/msystems.00501-24.
24. THEN A, GOENAWAN H, LESMANA R, et al. Exploring the potential regulation of DUOX in thyroid hormone‑autophagy signaling via IGF‑1 in the skeletal muscle (Review)[J/OL]. Biomedical Reports, 2024, 22(3): 39. DOI:10.3892/br.2024.1917.
25. GRASBERGER H, MAGIS A T, SHENG E, et al. DUOX2 variants associate with preclinical disturbances in microbiota-immune homeostasis and increased inflammatory bowel disease risk[J/OL]. Journal of Clinical Investigation, 2021, 131(9): e141676. DOI:10.1172/JCI141676.
26. KASUMBA D M, HUOT S, CARON E, et al. DUOX2 regulates secreted factors in virus‐infected respiratory epithelial cells that contribute to neutrophil attraction and activation[J/OL]. The FASEB Journal, 2023, 37(2): e22765. DOI:10.1096/fj.202201205R.
27. GEERDINK R J, PILLAY J, MEYAARD L, et al. Neutrophils in respiratory syncytial virus infection: A target for asthma prevention[J/OL]. Journal of Allergy and Clinical Immunology, 2015, 136(4): 838-847. DOI:10.1016/j.jaci.2015.06.034.
28. PETERS C, NICHOLAS A K, SCHOENMAKERS E, et al. DUOX2/DUOXA2 Mutations Frequently Cause Congenital Hypothyroidism that Evades Detection on Newborn Screening in the United Kingdom[J/OL]. Thyroid, 2019, 29(6): 790-801. DOI:10.1089/thy.2018.0587.
29. BURGUEÑO J F, FRITSCH J, GONZÁLEZ E E, et al. Epithelial TLR4 Signaling Activates DUOX2 to Induce Microbiota-Driven Tumorigenesis[J/OL]. Gastroenterology, 2021, 160(3): 797-808.e6. DOI:10.1053/j.gastro.2020.10.031.
30. SIMMS L A, DOECKE J D, WALSH M D, et al. Reduced α-defensin expression is associated with inflammation and not NOD2 mutation status in ileal Crohn’s disease[J/OL]. Gut, 2008, 57(7): 903-910. DOI:10.1136/gut.2007.142588.
31. ANGRIMAN I, BORDIGNON G, KOTSAFTI A, et al. Innate Immunity Activation in Newly Diagnosed Ileocolonic Crohn’s Disease: A Cohort Study[J/OL]. Diseases of the Colon & Rectum, 2024[2025-03-11]. https://journals.lww.com/10.1097/DCR.0000000000003145. DOI:10.1097/DCR.0000000000003145.
32. GAUDINO S J, BEAUPRE M, LIN X, et al. IL-22 receptor signaling in Paneth cells is critical for their maturation, microbiota colonization, Th17-related immune responses, and anti-Salmonella immunity[J/OL]. Mucosal Immunology, 2021, 14(2): 389-401. DOI:10.1038/s41385-020-00348-5.
33. DEGRUTTOLA A K, LOW D, MIZOGUCHI A, et al. Current Understanding of Dysbiosis in Disease in Human and Animal Models[J/OL]. Inflammatory Bowel Diseases, 2016, 22(5): 1137-1150. DOI:10.1097/MIB.0000000000000750.
34. ENG S J, NONNECKE E B, DE LORIMIER A J, et al. FOXO inhibition rescues α-defensin expression in human intestinal organoids[J/OL]. Proceedings of the National Academy of Sciences, 2023, 120(47): e2312453120. DOI:10.1073/pnas.2312453120.
35. MOSCHEN A R, TILG H, RAINE T. IL-12, IL-23 and IL-17 in IBD: immunobiology and therapeutic targeting[J/OL]. Nature Reviews Gastroenterology & Hepatology, 2019, 16(3): 185-196. DOI:10.1038/s41575-018-0084-8.
36. SCHMITT H, NEURATH M F, ATREYA R. Role of the IL23/IL17 Pathway in Crohn’s Disease[J/OL]. Frontiers in Immunology, 2021, 12: 622934. DOI:10.3389/fimmu.2021.622934.
37. DENG Z, WANG S, WU C, et al. IL-17 inhibitor-associated inflammatory bowel disease: A study based on literature and database analysis[J/OL]. Frontiers in Pharmacology, 2023, 14: 1124628. DOI:10.3389/fphar.2023.1124628.
38. NANKI K, FUJII M, SHIMOKAWA M, et al. Somatic inflammatory gene mutations in human ulcerative colitis epithelium[J/OL]. Nature, 2020, 577(7789): 254-259. DOI:10.1038/s41586-019-1844-5.
39. SHAPOURI‐MOGHADDAM A, MOHAMMADIAN S, VAZINI H, et al. Macrophage plasticity, polarization, and function in health and disease[J/OL]. Journal of Cellular Physiology, 2018, 233(9): 6425-6440. DOI:10.1002/jcp.26429.
40. YUNNA C, MENGRU H, LEI W, et al. Macrophage M1/M2 polarization[J/OL]. European Journal of Pharmacology, 2020, 877: 173090. DOI:10.1016/j.ejphar.2020.173090.
41. APONTE-LÓPEZ A, MUÑOZ-CRUZ S. Mast Cells in the Tumor Microenvironment[M/OL]//BIRBRAIR A. Tumor Microenvironment: vol. 1273. Cham: Springer International Publishing, 2020: 159-173[2025-03-19]. http://link.springer.com/10.1007/978-3-030-49270-0_9. DOI:10.1007/978-3-030-49270-0_9.
42. XIE Z, NIU L, ZHENG G, et al. Single-cell analysis unveils activation of mast cells in colorectal cancer microenvironment[J/OL]. Cell & Bioscience, 2023, 13(1): 217. DOI:10.1186/s13578-023-01144-x.
43. LIEW P X, KUBES P. The Neutrophil’s Role During Health and Disease[J/OL]. Physiological Reviews, 2019, 99(2): 1223-1248. DOI:10.1152/physrev.00012.2018.
44. PAPAYANNOPOULOS V. Neutrophils Stepping Through (to the Other Side)[J/OL]. Immunity, 2018, 49(6): 992-994. DOI:10.1016/j.immuni.2018.12.006.
45. DANNE C, SKERNISKYTE J, MARTEYN B, et al. Neutrophils: from IBD to the gut microbiota[J/OL]. Nature Reviews Gastroenterology & Hepatology, 2024, 21(3): 184-197. DOI:10.1038/s41575-023-00871-3.
46. YOKOI T, MURAKAMI M, KIHARA T, et al. Identification of a unique subset of tissue-resident memory CD4+ T cells in Crohn’s disease[J/OL]. Proceedings of the National Academy of Sciences, 2023, 120(1): e2204269120. DOI:10.1073/pnas.2204269120.
47. ROMAN-NARANJO P, PARRA-PEREZ A M, LOPEZ-ESCAMEZ J A. A systematic review on machine learning approaches in the diagnosis and prognosis of rare genetic diseases[J/OL]. Journal of Biomedical Informatics, 2023, 143: 104429. DOI:10.1016/j.jbi.2023.104429.
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-6286485","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":436882388,"identity":"9da4f231-b8e5-4c90-9074-0e02f87f8bb6","order_by":0,"name":"Yan Chaosheng","email":"","orcid":"","institution":"Jiangnan University Wuxi Medical College: Jiangnan University Wuxi School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Yan","middleName":"","lastName":"Chaosheng","suffix":""},{"id":436882389,"identity":"402c4a28-8543-4c84-9acd-ab305c37eaf7","order_by":1,"name":"Rao jingjing","email":"","orcid":"","institution":"Jiangnan University Wuxi Medical College: Jiangnan University Wuxi School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Rao","middleName":"","lastName":"jingjing","suffix":""},{"id":436882390,"identity":"a18b31c3-c8f4-4780-903e-e875c19a3cbb","order_by":2,"name":"Dai yuanyuan","email":"","orcid":"","institution":"Affiliated Hospital of Jiangnan University","correspondingAuthor":false,"prefix":"","firstName":"Dai","middleName":"","lastName":"yuanyuan","suffix":""},{"id":436882391,"identity":"c3b72a90-9a22-4639-86a7-4e19029144e4","order_by":3,"name":"Duan wenhui","email":"","orcid":"","institution":"Affiliated Hospital of Jiangnan University","correspondingAuthor":false,"prefix":"","firstName":"Duan","middleName":"","lastName":"wenhui","suffix":""},{"id":436882392,"identity":"f8940b61-f0f3-4a6b-8f54-d6ca35d7cde9","order_by":4,"name":"Sun haowen","email":"","orcid":"","institution":"Affiliated Hospital of Jiangnan 
University","correspondingAuthor":false,"prefix":"","firstName":"Sun","middleName":"","lastName":"haowen","suffix":""},{"id":436882393,"identity":"b8d4d86e-a6fc-4ba4-9f24-046c8086317f","order_by":5,"name":"Sheng yingyue","email":"","orcid":"","institution":"Affiliated Hospital of Jiangnan University","correspondingAuthor":false,"prefix":"","firstName":"Sheng","middleName":"","lastName":"yingyue","suffix":""},{"id":436882394,"identity":"1233579a-7634-4bfb-9566-19e5074589c2","order_by":6,"name":"xue yuzheng","email":"","orcid":"","institution":"Affiliated Hospital of Jiangnan University","correspondingAuthor":true,"prefix":"","firstName":"xue","middleName":"","lastName":"yuzheng","suffix":""}],"badges":[],"createdAt":"2025-03-23 04:53:20","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6286485/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6286485/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1186/s12967-025-06838-z","type":"published","date":"2025-08-19T16:29:36+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":81120386,"identity":"fd97790a-8a2c-4e99-9aee-0dbb2d7d17c2","added_by":"auto","created_at":"2025-04-22 12:41:58","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":270750,"visible":true,"origin":"","legend":"\u003cp\u003eFlowchart of this
study\u003c/p\u003e","description":"","filename":"floatimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6286485/v1/2901856657678772e1749046.jpeg"},{"id":81120619,"identity":"a2cdbcb0-7751-4bb2-893f-6f63aa6280a2","added_by":"auto","created_at":"2025-04-22 12:49:59","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":4063568,"visible":true,"origin":"","legend":"\u003cp\u003eDifferences in gene expression between datasets and the effectiveness of batch effect correction. (A) Distribution of samples from different experiments before batch correction; (B) PCA after batch correction, in which samples from different experiments are intermixed; (C) Heat map of DEGs in the IBD dataset. Red and blue represent upregulated and downregulated DEGs, respectively; (D) Volcano plot of DEGs in the IBD dataset (|log2FC| \u0026gt; 2). Red indicates upregulation, blue indicates downregulation\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-6286485/v1/9c5439c3149ca7303edf73be.png"},{"id":81119456,"identity":"5de33960-d870-49fc-9a9e-cbb846742f0f","added_by":"auto","created_at":"2025-04-22 12:33:58","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":2213698,"visible":true,"origin":"","legend":"\u003cp\u003eEnrichment analysis of the differentially expressed genes, showing the enriched pathways and functional classifications. (A-B) KEGG pathway analysis; the color becomes redder as the P value decreases. (E-F) GO analysis, including biological processes, cellular components, and molecular functions; the color becomes redder as the P value decreases. (C, D, G, H) Pathways enriched with differentially expressed genes; the size of the dots indicates the number of differentially expressed genes contained in the corresponding pathways.
The larger the number, the larger the dots.\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-6286485/v1/7db31fa7555ce0477256823f.png"},{"id":81120392,"identity":"689eedb3-f193-418d-a9e9-33d502990ede","added_by":"auto","created_at":"2025-04-22 12:41:59","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":3309306,"visible":true,"origin":"","legend":"\u003cp\u003eScreening of differential genes by several methods and visualization of the results. (A, E) ANN used to distinguish the experimental group from the control group, with an ROC curve constructed to evaluate overall diagnostic performance. (B, F) Genes screened using the LASSO algorithm, with 10-fold cross-validation used to obtain the optimal model; the lowest point of the curve corresponds to n = 11 genes. (C, G) Genes screened using the SVM algorithm; the accuracy and cross-validation error plots identified 8 disease characteristic genes. (D, H) Genes screened using the random forest algorithm, which identified 10 important genes; IncNodePurity ranks genes by their relative importance. (I) The intersection of the three algorithms yields four intersecting genes. (J) Volcano plot of differentially expressed genes, with red representing upregulation and green representing downregulation. (K) Box plot of differentially expressed genes, with the horizontal axis representing the names of the intersecting characteristic genes and the vertical axis representing gene expression levels. Blue marks the control group samples, and red marks the experimental group samples. (L) The outermost circle of the chromosome diagram represents the chromosome number, and the second circle represents the shape of the chromosome.
The names of the intersecting genes are labeled at their corresponding positions on the chromosome.\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-6286485/v1/0b2d81d1e08a7ac1b20a1605.png"},{"id":81119465,"identity":"9c396447-bdba-45a4-ac56-d7fd1e37549e","added_by":"auto","created_at":"2025-04-22 12:33:59","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":2643660,"visible":true,"origin":"","legend":"\u003cp\u003eSample prediction using machine learning models and SHAP analysis of those models. (A) ROC curves of ten machine learning models, used to evaluate overall diagnostic performance. (B) Bar chart, with the vertical axis representing the gene name and the horizontal axis representing the mean absolute SHAP value. The larger the value, the greater the gene's impact on the predicted results. (C) Force plot presenting the predicted result for a single sample: starting from the base value, the contribution of each gene leads to the predicted result. (D) Beeswarm plot, with the vertical axis representing gene names and the horizontal axis representing SHAP values, from which the mean SHAP value for each gene can be obtained. The larger the value, the greater the contribution of that gene. Each dot represents a sample, and the color of the dot represents the gene expression level: purple represents low expression and orange represents high expression. (E) Waterfall chart, displaying the predicted result of a single sample. In this graph, the vertical axis represents the gene expression level, and the horizontal axis represents the predicted value. The larger the absolute value, the greater the impact of this gene on the predicted results.
(F) Scatter plot, where the horizontal axis represents the expression level of one gene and the vertical axis represents the SHAP value. The color of the dots represents the expression level of another gene, with purple representing low expression and orange representing high expression. The interaction between these two genes and the SHAP values can be observed.\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-6286485/v1/6f95818f68dbe04ef2d30230.png"},{"id":81120390,"identity":"510d7c3d-dcfe-42d8-a430-43aee47996c6","added_by":"auto","created_at":"2025-04-22 12:41:59","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":5048254,"visible":true,"origin":"","legend":"\u003cp\u003eResults of target gene analysis using GSEA/GSVA. (A, B, D, E, G, H, J, K) GSEA plots, with the horizontal axis representing the ranked genes and the vertical axis representing enrichment scores, showing the top five most significantly enriched pathways. (C, F, I, L) GSVA bar charts: the vertical axis represents pathways and the horizontal axis represents t-test values; red represents upregulation in the target gene, green represents downregulation in the target gene, and gray represents no difference in the target gene. (A-C) LOC389023, (D-F) DUOX2, (G-I) LCN2, (J-L) DEFA6.\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-6286485/v1/46aa4ce0036ee52f01fee482.png"},{"id":81119472,"identity":"d1746c56-73f4-434f-84d2-85df49a3fbe8","added_by":"auto","created_at":"2025-04-22 12:33:59","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":3465988,"visible":true,"origin":"","legend":"\u003cp\u003eImmune related analysis results.
(A) The relationship between immune cells and target genes. *: P<0.05, **: P<0.01, ***: P<0.001. (B) The content of immune cells in each sample, obtained through immune cell infiltration analysis and shown as a bar chart in which the horizontal axis represents the sample and the vertical axis represents the content of immune cells. The sum of all immune cells is one. Different colors represent different immune cells. (C) The horizontal and vertical axes of the graph represent the names of immune cells. The values inside represent the correlation coefficient, with red representing positive correlation and green representing negative correlation. (D) Box plot of differences, with the horizontal axis representing the names of immune cells and the vertical axis representing the content of immune cells. Green represents the samples of the control group, and red represents the samples of the experimental group. *: P<0.05, **: P<0.01, ***: P<0.001.\u003c/p\u003e","description":"","filename":"floatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-6286485/v1/98394b9f4cc8811a4c76f507.png"},{"id":81120391,"identity":"98ba9527-0212-41a9-b57d-5707b9d11eed","added_by":"auto","created_at":"2025-04-22 12:41:59","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":3262113,"visible":true,"origin":"","legend":"\u003cp\u003eCorrelation analysis between differentially expressed genes and immune cells. (A) LOC389023, (B) DUOX2, (C) LCN2, (D) DEFA6. Correlation lollipop chart, where the vertical axis represents the names of immune cells, the horizontal axis represents the correlation coefficient, the size of the circle represents the absolute value of the correlation coefficient, and the color of the circle represents the P-value of the correlation test.
Scatter plot, where the horizontal axis represents the expression level of the target gene, the vertical axis represents the content of immune cells, the R value represents the correlation coefficient, and the P value represents statistical significance.\u003c/p\u003e","description":"","filename":"floatimage8.png","url":"https://assets-eu.researchsquare.com/files/rs-6286485/v1/c707de194c8478d07800f0f3.png"},{"id":89847346,"identity":"1977f773-bec2-43bf-a5c3-ab49188de872","added_by":"auto","created_at":"2025-08-25 16:43:16","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":5824489,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6286485/v1/2c8e67c8-9d58-4655-b6e9-4fd5d3937ffa.pdf"}],"financialInterests":"","formattedTitle":"Construction of a Feature Gene and Machine Prediction Model for Inflammatory Bowel Disease Based on Multi - Chip Joint Analysis","fulltext":[{"header":"Introduction","content":"\u003cp\u003eInflammatory bowel disease (IBD) is predominantly classified into ulcerative colitis (UC) and Crohn's disease (CD). Since IBD emerged in the 20th century, its incidence and prevalence have increased markedly, particularly over the past few decades. Similar to other immune-related disorders, the incidence of IBD has grown in tandem with industrialization and urbanization. According to the 2019 Global Burden of Disease (GBD) findings, IBD affects an estimated 5\u0026nbsp;million individuals, with approximately 400,000 new cases reported each year. Moreover, age and gender are crucial factors influencing the incidence and prevalence of IBD [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. For instance, 25% of cases manifest during childhood, and the incidence rate keeps climbing.
Although the incidence of IBD appears to have stabilized in the Western world, its prevalence continues to rise. This is because IBD commonly affects young individuals, carries relatively low mortality, and currently has no curative treatment [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe rising prevalence of IBD emphasizes the need to gain a deeper understanding of its molecular mechanisms in order to develop targeted therapies and diagnostic tools. Currently, in clinical practice, there are no biomarkers capable of accurately predicting the disease course or treatment response. This is mainly attributed to the complex molecular basis of IBD and variations in immune responses.\u003c/p\u003e \u003cp\u003eIn practice, C-reactive protein (CRP), erythrocyte sedimentation rate (ESR), and fecal biomarkers such as calprotectin are often regarded as essential diagnostic tools for IBD. However, these so-called biomarkers have certain limitations. For example, fecal biomarkers have poor accuracy, ESR is easily affected by multiple factors, and CRP production shows considerable heterogeneity.\u003c/p\u003e \u003cp\u003eIn recent years, with the data collected from genomic research, the relationship between specific genes and the etiology of IBD has been widely explored. For example, the association between the NOD2 gene and Crohn's disease (CD) is well established. NOD2 has become a key predictive factor and is associated with an increased risk of disease complications [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. Additionally, individuals with genetic variability in the PRDM1 and NDP52 genes are more susceptible to CD.
Genes such as KIF9-AS1, LINC01272, and DIO3OS have proven useful in differentiating and detecting various types of IBD. Regrettably, no research has yet been able to clearly explain the action pathways of pathogenic genes and the immune cells affected by them.\u003c/p\u003e \u003cp\u003ePrevious studies have indicated that IBD results from the interaction of immune responses, genetic factors, and microbiota [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. Although the exact causes of IBD remain unclear, multiple interrelated factors from genetics, the immune system, microbiota, and the environment all play a role in its development [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Immune dysfunction can trigger persistent inflammation, a characteristic feature of IBD. This leads to the reduction or destruction of intestinal crypts, along with a series of severe clinical manifestations and complications, significantly deteriorating the quality of life of patients [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eOur study revealed that, compared to the normal control group, patients with IBD exhibited differential expression of several genes. These genes were predominantly enriched in pathways associated with inflammatory and immune responses, which is consistent with previous research findings [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. Besides the differences in gene expression, we also discovered that the diagnosis of IBD could be reflected by changes in immune cells. We propose that differential gene expression serves as the initiating factor of IBD. Through several potential mechanisms, it acts on immune cells, leading to significant variations in their expression levels and quantities.
Firstly, differential gene expression might trigger excessive activation or inhibition of pathogenic molecular pathways in IBD patients, causing the release of an excessive amount of inflammatory factors. Secondly, the overexpression of inflammatory factors disrupts the immune response in patients. Eventually, the disordered immune response gives rise to differential expression of immune cell levels in the patient's body. In conclusion, differential gene expression plays a pivotal role in the initiation and progression of IBD and can be regarded as a key element in the development of novel diagnostic and therapeutic strategies for IBD. To explore this further, we employed multi-chip joint analysis, artificial neural networks (ANN), and machine learning algorithms to identify significantly differentially expressed genes. We then utilized the SHAP model to illustrate the contribution of these differentially expressed genes to the diagnosis. Finally, we performed correlation and differential analyses on immune cells to identify the differences in gene expression among various immune cells.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cp\u003eData Source, Preprocessing, and Analysis:\u003c/p\u003e \u003cp\u003eIn this study, we conducted a comprehensive analysis of multiple databases, including GEO, IBDDMB, and UKB. Specifically, we retrieved disease datasets (GSE87466, GSE179285, and GSE87473) from the GEO database. These datasets encompassed data from 438 patients with inflammatory bowel disease and 51 healthy individuals. The patient biopsy data were sourced from the sigmoid colon, ascending/descending colon, and terminal ileum. The patient data included cases of moderate to severe active ulcerative colitis. We utilized R software (version 4.3.3) for data preparation. During the preprocessing stage, we removed probes corresponding to multiple genes and converted probe IDs into gene symbols using the annotation file of the platform.
When dealing with multiple probes for the same gene, we retained only the probe with the highest signal value. To ensure data consistency and reliability, we took measures to reduce the potential impact of batch effects, which are commonly introduced during the data integration process. Batch effects can occur due to variations in experimental conditions, instruments, or sample processing over time or across different datasets, and they may severely confound the interpretation of results. Thus, we employed the \"limma\", \"pheatmap\", and \"ggplot2\" packages to calibrate the data for different groups (diseased and non-diseased). Additionally, the \"pheatmap\" and \"ggplot2\" packages were used to generate heatmaps and volcano plots of differentially expressed genes (DEGs), respectively. The data were log2-transformed before analysis.\u003c/p\u003e \u003cp\u003eTranscriptome Data Refinement and Analysis Process:\u003c/p\u003e \u003cp\u003eWe utilized supplementary probe annotation files to convert the expression matrix from the probe level to the gene level. For genes associated with multiple probes, the arithmetic mean of the corresponding probe values was employed to represent gene expression. Following this conversion, we standardized the dataset and then applied the SVA package for batch-effect correction. Principal component analysis (PCA) was utilized to assess the success of the standardization process. To identify the differentially expressed genes between IBD and control samples, we made use of the limma package (linear models for microarray data). Differentially expressed genes were defined as those with an absolute log2 fold change (|log2 FC|) greater than 2 and an adjusted p-value less than 0.05.
Particular emphasis was placed on genes that might be related to immune infiltration in IBD patients and normal individuals.\u003c/p\u003e \u003cp\u003eEnrichment Analysis:\u003c/p\u003e \u003cp\u003eTo clarify the biological significance and pathway associations of differentially expressed genes (DEGs), we carried out comprehensive Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses. In the R programming environment, we systematically explored the impacts of differentially expressed protein-modifying genes (PMGs) on key biological processes (BP), molecular functions (MF), and cellular components (CC). We employed R software for this analysis, utilizing the “clusterProfiler”, “org.Hs.eg.db”, “enrichplot”, and “ggplot2” packages. KEGG pathway data were used to enrich our interpretation. This integrated approach allows us to grasp the complexity of the molecular landscape that distinguishes IBD from the control group, offering a comprehensive framework for further exploration of the underlying mechanisms.\u003c/p\u003e \u003cp\u003eArtificial Neural Network:\u003c/p\u003e \u003cp\u003eArtificial neural network (ANN) models are playing an increasingly crucial role in the realm of predictive modeling. This is because they are capable of capturing nonlinear relationships within high-dimensional datasets [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. ANN models can capture complex variable relationships that other models, such as logistic regression models, cannot. The working principle of an ANN model is inspired by biological neural networks. In an ANN, each neuron is interconnected with other neurons. Neurons have two main components: dendrites and axons. Dendrites function as receivers of information, while axons serve as transmitters.
The nucleus of a neuron holds the information to be transmitted. An ANN typically consists of an input layer, one or more hidden layers, and an output layer. Information enters the model through the input layer, undergoes processing in the hidden layer(s), and is then output via the output layer [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. In our study, we utilized R software for correlation analysis. To make the data easier to interpret, we employed packages like “neuralnet” and “NeuralNetTools” to generate relevant graphics.\u003c/p\u003e \u003cp\u003eMachine Learning Algorithms:\u003c/p\u003e \u003cp\u003eTo identify candidate genes, we employed the “VennDiagram” package to visualize the intersection of key genes among differentially expressed genes (DEGs). Three machine learning algorithms, namely the Least Absolute Shrinkage and Selection Operator (LASSO), Support Vector Machine (SVM), and Random Forest (RF), were utilized to identify potential biomarkers. For the LASSO analysis, we used the “glmnet” package with a penalty parameter and 10-fold cross-validation. This approach was applied to select important variables from high-dimensional data. RF is an ensemble estimator that consists of multiple decision trees as its basic estimators. In the classification process of RF, each tree determines a category, and the category receiving the highest number of votes is designated as the final output. SVM treats each predictor as a dimension in a high-dimensional space.
SVM aims to find the optimal hyperplane for classifying samples, and it demonstrates excellent performance when dealing with highly complex data. Finally, the genes at the intersection of the results from the LASSO, RF, and SVM algorithms were identified as potential biomarkers for IBD.\u003c/p\u003e \u003cp\u003eExplanation of Machine Learning Models:\u003c/p\u003e \u003cp\u003eTen machine learning models were constructed using the selected predictive factors. These models included Ridge Least Squares (RLS), RF, Decision Tree (DTS), SVM, Logistic Regression, K-Nearest Neighbors (KNN), Extreme Gradient Boosting (XGBoost), Gradient Boosting Machine (GBM), Neural Network, and Generalized Linear Model Boosting (GlmBoost). Some of these models possess high interpretability and are well-suited for analyzing linear relationships. Others are capable of capturing non-linear relationships and interactions, making them suitable for handling high-dimensional data. For instance, XGBoost is an optimized version of the gradient boosting framework. It supports parallel computing and can automatically handle missing values, showing excellent performance in clinical prediction tasks. We evaluated these models based on several metrics, such as the receiver operating characteristic (ROC) curve, specificity, sensitivity, and accuracy, with the ROC curve serving as the primary evaluation indicator. The model that exhibited the best predictive performance was chosen as the main model for this study. To further understand the interpretability of the final prediction model, we employed the SHapley Additive exPlanations (SHAP) method [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e].
This method helps to explain how each feature contributes to the model's prediction, providing valuable insights into the underlying mechanisms of the model.\u003c/p\u003e \u003cp\u003eGene Set Enrichment Analysis and Gene Set Variation Analysis:\u003c/p\u003e \u003cp\u003eGene Set Enrichment Analysis (GSEA) is a powerful and widely used approach in genomics research. It aims to uncover biological pathways, functions, or molecular features associated with distinct phenotypes or experimental conditions. In this study, we downloaded the immunological signature gene set (c7: immunological signature gene sets) and conducted GSEA analysis on all genes within the immune cell cluster (version 1.64.0). Conversely, Gene Set Variation Analysis (GSVA) offers a unique way to represent gene set variation within a sample. It achieves this by converting gene expression data into gene set activity scores, without the need for data ranking. To perform GSVA, we calculated the average expression value of genes in each cell cluster using the immune-related gene set (h: hallmark gene set) (version 1.50.0). Finally, we visually presented the results of the GSVA analysis. This visualization helps to better understand the differences and trends in gene set activities, providing valuable insights into the underlying biological mechanisms related to the immune cell clusters.\u003c/p\u003e \u003cp\u003eImmune Cell Infiltration and Correlation Analysis:\u003c/p\u003e \u003cp\u003eThe CIBERSORT algorithm was employed to assess the relative abundance of immune cells in each normal and IBD sample. Subsequently, Spearman correlation analysis was carried out to determine the relationship between potential biomarkers and immune cells. Additionally, a differential analysis of immune cells was conducted for each gene individually.\u003c/p\u003e \u003cp\u003eData Analysis:\u003c/p\u003e \u003cp\u003eEach experiment was performed at least three times.
The ROC curve was established, and the area under the curve (AUC) and 95% confidence interval (CI) values were calculated and validated using SPSS software. The statistical significance between the two groups was determined via the Student's t-test. The results were further analyzed using GraphPad Prism version 7 software. A P-value less than 0.05 was considered statistically significant.\u003c/p\u003e "},{"header":"Results","content":"\u003cp\u003e1. PCA data correction\u003c/p\u003e\u003cp\u003eFigure \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e shows the flowchart of this study. Before batch correction, samples from different experiments were separated, indicating a batch effect between them. After batch correction, PCA showed that samples from different experiments were intermixed, indicating that the batch effect had been eliminated (Fig.\u0026nbsp;2A, 2B).\u003c/p\u003e\u003cp\u003e2. Limma DEGs\u003c/p\u003e\u003cp\u003eThe Limma method was used to identify differentially expressed genes (DEGs) in the GSE87466, GSE179285, and GSE87473 datasets, yielding 12 upregulated and 5 downregulated genes (Fig.\u0026nbsp;2C, 2D).\u003c/p\u003e\u003cp\u003e3.
Enrichment Analysis Results\u003c/p\u003e\u003cp\u003eThe enrichment analysis of 17 candidate diagnostic genes revealed that they are predominantly associated with immune response, microbial factors, and inflammatory response, underscoring their crucial role in the pathogenesis and progression of IBD. KEGG analysis indicated that these genes were significantly enriched in categories such as Adipocytokine Secretion, Bile Acid-related Pathways, FoxO Signaling Pathway, IL-17 Signaling Pathway, Alcohol-related Cancer-associated Receptor-Cytochrome Interactions, and AMPK-Glucagon-Leukocyte-Staphylococcus aureus-related Pathways (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e3\u003c/span\u003eA-D). In terms of biological processes, GO analysis emphasized the involvement of these candidate genes in areas like Antimicrobial Activity, Cellular Responses to Toxic cAMP-related Substances, etc. (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e3\u003c/span\u003eE-H). Moreover, molecular functional analysis showed that Response to Lipolysis was a significant category among these genes.\u003c/p\u003e\u003cp\u003e4. ANN Prediction Results\u003c/p\u003e\u003cp\u003eThe ANN employed in this study consists of three layers: the input layer, the hidden layer, and the output layer. In the input layer, disease features and genes are assigned scores. Subsequently, based on these scores, as well as the weights associated with disease features and genes, the hidden layer is generated. There are five nodes in the hidden layer, and using these nodes and their respective weights, the output layer is derived. The output layer represents the properties of the sample. Regrettably, the ANN we designed achieved an accuracy rate of only 21% when predicting the control group, while it reached 95.2% for predicting the experimental group.
To evaluate the overall diagnostic performance, we constructed receiver operating characteristic (ROC) curves. The results indicated that the accuracy of sample prediction using the ANN was 93.7% (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003eA, Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003eE).\u003c/p\u003e\u003cp\u003e5. Selection Results of Characteristic Genes\u003c/p\u003e\u003cp\u003eWe utilized LASSO regression to generate cross-validation graphs, which identified 11 disease characteristic genes (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003eB, Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003eF). Employing the SVM method, we obtained accuracy graphs and cross-validation error graphs, through which 8 disease characteristic genes were screened out (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003eC, Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003eG). By means of the RF method, a plot of the forest trees and a gene importance ranking were obtained, and 10 disease characteristic genes were selected (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003eD, Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003eH). A Venn diagram was used to display the intersection of genes selected by the three machine learning algorithms. As a result, a total of four intersecting genes were identified (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003eI).\u003c/p\u003e\u003cp\u003e6. Visualization results\u003c/p\u003e\u003cp\u003eThe box plot of differentially expressed genes, combined with the volcano plot of differentially expressed genes, showed that DUOX2, LCN2, and DEFA6 were upregulated in the experimental group.
LOC389023 was downregulated in the experimental group. The chromosome circle plot shows the intersecting characteristic genes and their distribution on the chromosomes (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003eJ-L).\u003c/p\u003e\u003cp\u003e7. Machine learning model results\u003c/p\u003e\u003cp\u003eTen machine learning models (RLS, RF, DTS, SVM, Logistic, KNN, XGBoost, GBM, Neural Net, and GlmBoost) were used to evaluate the diagnostic efficiency of the four IBD differentially expressed genes, with 10-fold cross-validation as the training control. After the optimal diagnostic model was identified, the Type column of the training set was converted into binary labels, and the model was retrained using the optimal method. ROC curves were constructed to evaluate the overall diagnostic performance (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e5\u003c/span\u003eA).\u003c/p\u003e\u003cp\u003e8. SHAP analysis results\u003c/p\u003e\u003cp\u003eSHAP analysis of the machine learning predictions produced bar charts, beeswarm plots, scatter plots, waterfall plots, and force plots. In the bar chart, the larger the value, the greater the impact of the gene on the prediction results. The beeswarm plot gives the mean SHAP value of each gene, which represents its contribution to the machine learning model. The scatter plot is a three-dimensional graph that shows the interaction between genes and SHAP values. The waterfall plot displays the predicted result for a single sample; the larger the absolute SHAP value of a gene, the greater its impact on the predicted result. The force plot also displays the predicted result for a single sample: starting from the base value, the contribution of each gene is added to arrive at the predicted result (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e5\u003c/span\u003eB-F). 
In summary, SHAP analysis allowed us to explain the machine learning model and quantify the contribution of each gene. Within each sample, patients can also be distinguished by gene expression, and the model can be interpreted by jointly analyzing the results for these genes.\u003c/p\u003e\u003cp\u003e9. GSEA and GSVA analysis results\u003c/p\u003e\u003cp\u003eGSEA and GSVA were used to determine which functions or pathways were enriched in the high-expression or low-expression group of each target gene, and the five most significantly enriched pathways were visualized (Fig.\u0026nbsp;6A-L).\u003c/p\u003e\u003cp\u003e10. Results of immune cell infiltration analysis\u003c/p\u003e\u003cp\u003eImmune cell infiltration analysis was used to estimate the immune cell content of each sample, and the results were visualized as a bar chart. The content is relative, so the proportions of all immune cells in a sample sum to one. In the box plot of differences, an asterisk above an immune cell type indicates that this cell type differs between the control group and the experimental group. In the correlation plot, the horizontal and vertical axes list the immune cell types, and the values inside are the correlation coefficients (Fig.\u0026nbsp;7A-D). To investigate the relationship between immune cells and LOC389023, DUOX2, LCN2, and DEFA6, scatter plots were drawn for immune cells with significant correlations, and the correlation results were then visualized as lollipop plots (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e8\u003c/span\u003eA-D).\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eOur research offers a comprehensive gene-based interpretation of the differences in genes, action pathways, and immune responses in IBD. We integrated multi-chip analysis, ANN, and ten machine-learning methods. 
Additionally, we performed interpretability analysis on the machine-learning model to explore the contribution of differential genes to the diagnosis of IBD. Using the differential genes as a reference, we investigated the changes associated with each gene in the immune response and examined the alterations and correlations of immune cells. This allowed us to describe the genetic factors of IBD patients from both genetic and immune-response perspectives. Based on these findings, we developed a predictive model and validated it multiple times using data from the GEO database.\u003c/p\u003e \u003cp\u003eIBD results from the interaction between immune responses and genetic factors, and the impacts of factors such as microorganisms, the environment, and diet cannot be ignored [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. The clinical diagnosis of IBD is still subject to various limitations, and a simple and convenient diagnostic method remains to be developed. The biomarkers identified through our screening can be detected in peripheral blood, enabling an easy assessment of the likelihood of IBD in subjects.\u003c/p\u003e \u003cp\u003eIn our study, we identified four biomarkers for the diagnosis of IBD, namely LOC389023, DUOX2, LCN2, and DEFA6. These biomarkers play a crucial role in immune function. The ANN we designed exhibited an accuracy of merely 21% when predicting the control group, yet it achieved 95.2% accuracy for the experimental group. When ROC curves were constructed to assess the overall diagnostic performance, the accuracy of sample prediction using the ANN reached 93.7%. We used these biomarkers to construct machine-learning models. After selecting the optimal model through 10-fold cross-validation, we transformed the raw data for secondary validation. 
Among the machine-learning algorithms, the Gradient Boosting Machine (GBM) and K-Nearest Neighbors (KNN) demonstrated the highest diagnostic performance, reaching 95.2%, and effectively distinguished between the experimental and control groups. However, there are numerous uncertainties regarding the diagnostic performance of the ANN. We made multiple attempts to modify the number of hidden layers and conducted small-scale validations on the data, but the results did not meet our expectations. After consulting the relevant literature and experts, we hypothesize that gene scoring might have led to overfitting in the ANN. Nonetheless, we employed various types of machine-learning methods to analyze the data from multiple dimensions and perspectives, which circumvented these issues. We therefore compared the results of the ANN and the machine-learning models, and the comparison suggests that the machine-learning models are more reliable. Nevertheless, large-scale validation is still required to confirm their generalizability.\u003c/p\u003e \u003cp\u003eA previous study indicates that LOC389023 plays a vital role in regulating gene expression and chromatin status. LOC389023 is a long non-coding RNA located at chromosome 2q14.1 within the DPP10 gene. In the nuclei of neurons, LOC389023 contains GC-rich stem-loop motifs. These motifs can bind to the SUZ12 protein of the chromatin-modifying polycomb repressive complex 2 (PRC2). In doing so, it recruits chromatin-remodeling inhibitors, leading to a reduction in DPP10 gene expression. This regulatory effect exhibits cell-type specificity. 
The study also uncovered the relationships among LOC389023, gene expression, histone methylation, and voltage-gated K(+) channels during neural development [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. DNA methylation and non-coding RNAs have been extensively investigated in patients with IBD. DNA methylation depends on dietary substrates and cofactors (folate, vitamin B12, vitamin D, etc.). It is associated with inflammation, microbiota composition, and microRNAs, which can influence IBD by interfering with T-cell differentiation. Marangoni et al. conducted a comprehensive analysis of the role of DNA methylation in IBD and its impact on the inflammatory process [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. Furthermore, studies have revealed that voltage-gated K(+) channels, such as KV1.3 and KCa3.1, are involved in K(+) conductance in T lymphocytes. They play a crucial role in cell proliferation, differentiation, apoptosis, and infiltration. In chemically induced IBD model mice, the activity and expression of KCa3.1 in CD4(+) T lymphocytes of mesenteric lymph nodes were increased. Moreover, its regulatory factor NDPK-B showed positive expression. When the KCa3.1 K(+) channel was blocked with TRAM-34 and/or ICA17043, the severity of IBD was significantly reduced. Symptoms such as diarrhea, fecal blood, inflammation, and colonic crypt injury were alleviated. Simultaneously, the expression levels of KCa3.1 and Th1 cytokines in CD4(+) T lymphocytes were restored. 
This indicates that abnormal KCa3.1 channel activity is related to the development of IBD, and that intervention targeting this channel can improve the disease condition [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. In addition, the upregulation of K2P5.1 in T lymphocytes is associated with the pathogenesis of autoimmune diseases. Given that IBD also belongs to this category of diseases and that ion channel pre-mRNA splicing is related to the disease, the mRNA splicing mechanism involved in K2P5.1 K(+) channel transcription may offer guidance for the treatment of IBD and other diseases [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eLipocalin 2 (LCN2), a member of the adipokine protein family, is predominantly involved in processes such as cell growth, differentiation, metabolism, and immune response. During inflammation, various cell types, including macrophages, epithelial cells, and neutrophils, secrete LCN2. It then acts on other cells via the bloodstream or the local tissue microenvironment, thereby influencing inflammatory responses, immune regulation, and other physiological processes. LCN2 is implicated in both acute and chronic inflammation and plays a crucial pathogenic role in diseases such as cancer, diabetes, obesity, and multiple sclerosis [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e, \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. Through a series of cell-based and tissue-based studies, Xia et al. discovered that LCN2 is key in regulating iron homeostasis and the inflammatory response. LCN2 can interact with iron and siderophores (iron carriers) in diverse cell types, including immune cells and epithelial cells. By binding to bacterial siderophores, it inhibits bacterial growth. 
Moreover, it can regulate cell survival, apoptosis, and other processes by modulating intracellular iron levels. Under inflammatory conditions, LCN2 stabilizes the iron pool and reduces iron-related toxicity. For instance, in intestinal inflammation, LCN2 restricts the availability of iron in the intestine, safeguarding the mucosa from damage. In kidney diseases, it protects against acute kidney injury, yet in chronic kidney disease, it may exacerbate disease progression. Additionally, LCN2 has a complex role in tumor cells. It can both promote tumor cell growth and metastasis and potentially inhibit tumor development [\u003cspan additionalcitationids=\"CR21\" citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]. Qun et al. conducted experiments infecting mice with harmful bacteria and found that LCN2 is involved in microbial invasion, the inflammatory response, and tissue damage. It also participates in regulating the balance of the gut microbiota and its metabolites. When the LCN2 gene is absent, significant changes occur in the operational taxonomic units (OTUs) and in the alpha and beta diversity of the gut microbiota in mice infected with harmful bacteria. Simultaneously, intestinal metabolites are affected, with increased levels of metabolites such as taurodeoxycholic acid and undecylenic acid. This indicates that LCN2 may modulate the intestinal environment by regulating the composition and metabolite levels of the gut microbiota, thus influencing the host's health and disease resistance [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThere are two types of dual oxidase (DUOX) enzymes, namely DUOX1 and DUOX2. Their primary function is to generate reactive oxygen species (ROS) in tissues such as the thyroid, colon, respiratory tract, and lymphatic system. 
DUOX significantly contributes to the synthesis of hydrogen peroxide (H₂O₂), a substance that plays a pivotal role in the host defense system. H₂O₂ is involved in processes such as signal transduction, cell differentiation, cell death programs, immune defense, microbial composition regulation, and hormone synthesis (specifically thyroid hormone) [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]. Helmut carried out a multi-omics phenome-wide association study (PheWAS) on 2872 participants to analyze the relationship between DUOX2 gene variants and the IBD phenotype. The study revealed that rare variants in the DUOX2 gene were associated with an elevated risk of IBD. Through multi-omics PheWAS and rare-variant association analysis, the link between DUOX2 variants and the pathogenesis of IBD was elucidated [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]. In the context of respiratory diseases, taking respiratory viral infections as an example, Ducquin's research demonstrated that DUOX2 selectively regulates the cytokines and chemokines secreted by epithelial cells. This, in turn, affects the recruitment, adhesion, and degranulation of neutrophils. DUOX2 was thus found to play a crucial role in modulating the interaction between epithelial cells and immune cells [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e, \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. In the study of congenital hypothyroidism (CH), DUOX2 mutations have been extensively explored. In a critical CH cohort, 50% of patients carry DUOX2 mutations (38%). These mutations are associated with the patients' biochemical characteristics, influencing the diagnosis and treatment of the disease. 
Moreover, current screening thresholds may result in missed diagnoses. At the cellular level, the expression of DUOX2 on the cell membrane and its mediation of H₂O₂ production are of utmost importance. When DUOX2-mediated H₂O₂ production is completely lost, the synthesis of thyroid hormones is impaired, ultimately leading to the development of CH. From the perspective of signaling pathways, Juan's research indicated that DUOX2 plays a significant role in the development of IBD-related tumors. When the TLR4 signaling pathway is activated in epithelial cells, the expression of DUOX2 is upregulated. DUOX2 then catalyzes the production of H₂O₂, which is closely associated with the initiation and progression of tumors. Additionally, DUOX2 interacts with the microbiota. In this process, the generated H₂O₂ promotes tumor development, affecting the transition from IBD to tumor [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eDEFA6 is secreted by Paneth cells, specialized epithelial cells located at the base of the small intestine crypts, and it is the most abundant antibacterial agent produced by Paneth cells in the small intestine. Its release into the crypt lumen is thought to safeguard against microbial invasion into the crypt microenvironment [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]. As an antimicrobial peptide, DEFA6 plays a crucial part in the intestinal immune defense system by combating pathogens. One study examined mucosal samples obtained from 88 CD patients who underwent ileocolonic resection. The results showed that while the expression of DEFA6 in the healthy and diseased ileal mucosa of early-stage and late-stage CD patients did not differ significantly, there was an upward trend in the expression of DEFA6 in the external validation cohort of late-stage CD patients. 
As the CD disease course progresses, the expression of antimicrobial-peptide-related genes, such as DEFB4A, increases in the affected mucosa. It is hypothesized that persistent mucosal damage may enable intestinal bacteria to interact with epithelial cells, thereby stimulating the expression of antimicrobial peptides. Abundant evidence indicates that alterations in DEFA6 expression are associated with the CD disease course [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e]. Stephen's research revealed that DEFA6, serving as a specific marker for Paneth cells in the small intestine, contributes to intestinal immune defense and the maintenance of the gut microbiota balance [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. Serena's work confirmed that DEFA6 functions to maintain intestinal immune homeostasis in the Paneth cells of human small-intestine organoids. Studies have shown that Paneth cells can produce antibacterial substances like DEFA6 to preserve gut microbiota balance. However, the expression level of DEFA6 in human small-intestine organoids is extremely low and differs significantly from that in the source tissue. In contrast, mouse small-intestine organoids can more effectively mimic the expression of α-defensins in tissues. Moreover, the authors found that WNT signal stimulation fails to restore the expression of DEFA6 in human small-intestine organoids. Nevertheless, after treatment with FOXO inhibitors, the mRNA expression of DEFA6 increased by over 100,000-fold, nearly reaching the level of human tissue. This finding indicates that the FOXO signaling pathway is essential for regulating the expression of DEFA6 in human Paneth cells. 
Inhibiting the FOXO signaling pathway can effectively restore the expression of DEFA6, which holds great significance for the study of intestinal diseases and the enhancement of intestinal immunity [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eGO and KEGG analyses revealed that the differentially expressed genes were predominantly enriched in the antibacterial aspects of the human immune system and the IL-17 signaling pathway. Nicholas conducted a study on 136 IBD patients. The findings indicated that patients with low IgG/G1 levels had poorer clinical survival data compared to those with normal levels. This suggests that humoral immunity plays a pivotal role in the survival of IBD patients. When IBD patients experience compromised humoral immunity, their likelihood of requiring surgery increases. This implies that low IgG/G1 levels have differential impacts on the surgical requirements of different subtypes of IBD patients. It also indicates that humoral immunity can serve as a predictor of IBD patients' survival, which is of great significance for their clinical management. IL-17 is closely associated with IBD and is essential in the pathological progression of IBD. Research has demonstrated that IL-17 is a key cytokine secreted by Th17 cells and plays a substantial role in the development of intestinal inflammation in IBD patients [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e]. In CD patients, IL-17-producing cells accumulate in large quantities in the submucosal and muscularis propria layers. Moreover, compared to healthy individuals, the number of IL-17-producing T cells in CD patients is significantly elevated. 
Ample evidence points to a strong link between IL-17 and IBD [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e]. Furthermore, Kosaku's research has also shown that IL-17 is closely related to IBD and plays a crucial role in its pathogenesis [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e, \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e]. Through whole-exome sequencing analysis of colon organoids from UC patients and healthy controls, it was discovered that the UC inflammatory epithelium accumulates somatic mutations in multiple genes associated with the IL-17 signaling pathway, such as NFKBIZ, ZC3H12A, and PIGR. These genes are rarely affected in colon cancer but undergo mutations within the inflammatory environment of UC. This indicates that the IL-17 signaling pathway is significantly perturbed in the pathological process of UC. Additionally, gene mutations related to the IL-17 signaling pathway may also be implicated in the occurrence and development of UC in humans. They may disrupt the intestinal immune balance by interfering with the IL-17 signaling pathway, thus promoting the progression of UC [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eOur research results demonstrated differences in plasma cells, follicular helper T cells, activated natural killer (NK) cells, M1 macrophages, resting mast cells, activated mast cells, M0 macrophages, and neutrophils between the control group and the experimental group. Subsequently, through an investigation of the relationships between genes and immune cells, we found that in LOC389023, there were significant differences in M1 macrophages, M2 macrophages, activated mast cells, resting mast cells, and neutrophils. 
In LCN2, significant differential changes were observed in eosinophils, M0 macrophages, M1 macrophages, M2 macrophages, resting mast cells, neutrophils, plasma cells, activated CD4 memory T cells, and CD8 T cells. In DUOX2, stronger associations were detected in M0 macrophages, M1 macrophages, M2 macrophages, activated mast cells, resting mast cells, neutrophils, plasma cells, activated CD4 memory T cells, and CD8 T cells. In DEFA6, significant changes were noted in M1 macrophages, activated CD4 memory T cells, and regulatory T cells (Tregs). Moreover, M1 macrophages exhibited consistent differential changes across all four differentially expressed genes. M2 macrophages, resting mast cells, neutrophils, and activated CD4 memory T cells all showed obvious differences in three of the differentially expressed genes. Chen's research revealed that macrophages can polarize into two phenotypes, M1 and M2. M1 macrophages are mainly involved in pro-inflammatory responses and can secrete pro-inflammatory factors such as IL-6, IL-12, and TNF. M2 macrophages are mainly involved in anti-inflammatory responses and contribute to tissue repair, and they characteristically express arginase-1 (Arg-1), mannose receptor (CD206), and the anti-inflammatory factor IL-10. IBD is an intestinal inflammatory disorder, and the inflammatory microenvironment in the intestine is closely related to macrophage polarization [\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e, \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e]. Based on these findings, we speculate that the over-activation of M1 macrophages may exacerbate the intestinal inflammatory response in IBD patients, leading to tissue damage. 
In contrast, the anti-inflammatory and tissue-repair functions of M2 macrophages may help alleviate the inflammatory symptoms of IBD and promote the repair of intestinal tissue. Regrettably, none of these hypotheses has been verified experimentally, so it cannot simply be stated that M1/M2 macrophages are closely associated with IBD. However, our bioinformatics analysis of GEO data suggests that macrophage polarization plays an important role in IBD [\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eZhen's research revealed that, in colorectal cancer patients, the density of mast cells is lower than in normal tissues, and their phenotype undergoes substantial changes, shifting from a quiescent state to an activated one. In the tumor microenvironment, activated mast cells release a variety of bioactive substances, including histamine, cytokines, proteases, and lipid mediators, which are capable of triggering inflammatory responses [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e]. Based on this, we hypothesize that the abnormal expression of differentially expressed genes leads to the over-activation of immune cells. These activated immune cells then release active substances, thereby contributing to the onset and progression of IBD. Our research findings indicate that mast cells exhibit significant differences across numerous differentially expressed genes [\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eNeutrophils play a key role in the innate immunity of the intestine [\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e]. Camille's study demonstrated that neutrophil infiltration serves as one of the markers of disease activity in IBD patients. 
Neutrophils can release various inflammatory mediators, such as ROS, cytotoxic granule contents, and neutrophil extracellular traps (NETs), thereby triggering inflammatory responses and causing tissue damage [\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e, \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e]. At the same time, neutrophils are also essential for maintaining the intestinal barrier, host defense, and reducing inflammation. For instance, CD177+ neutrophils have a protective effect. The study also found that neutrophil infiltration is associated with disease severity, and that excessive neutrophil activation may lead to treatment failure. In CD, the dysfunction and reduced recruitment of neutrophils result in delayed bacterial clearance and the persistent presence of antigens, which trigger adaptive immune responses and the formation of granulomas. Additionally, the gut microbiota can regulate the production, function, and maturation of neutrophils, while neutrophils also influence the composition and function of the microbiota. The two interact and jointly affect the development of IBD [\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eCD4+ T cells display distinct characteristics in CD and UC. In CD patients, there is a significant expansion of CD4+ tissue-resident memory T cells (Trm) in the gut. Notably, CD4+ Trm subsets expressing CD161 and CCR5 in CD patients exhibit stronger cytotoxicity and are associated with disease activity. 
In contrast, the intestine of UC patients is abundant in CD45RA+ CCR7+ naive CD4+ T cells and CXCR5+ T follicular helper cells (Tfh). Moreover, CD-specific CD4+ Trm (CDtrm) possess innate-like cell properties. They can rapidly secrete inflammatory cytokines such as IFN-γ upon cytokine stimulation, without the need for T-cell receptor (TCR) activation. This ability can cause damage to intestinal epithelial cells. In UC, the increase in Tfh cells may be linked to the elevation of pathological IgG+ plasma cells [\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eIn current clinical practice, achieving an accurate diagnosis of IBD remains a challenge. The diagnosis mainly relies on medical history, clinical symptoms, laboratory tests, imaging examinations, endoscopy, and histological examinations. Colonoscopy is widely regarded as the \"gold standard\" for diagnosing IBD. However, colonoscopy is time-consuming and highly dependent on the diagnostic experience of the physician. Additionally, the intestinal wall of IBD patients in severe stages is extremely fragile, and improper operation during colonoscopy can lead to complications such as bleeding and perforation. Therefore, our objective is to identify more reliable and sensitive biomarkers that can directly reflect the different stages of IBD and its underlying disease mechanisms. To this end, we carried out rigorous screening in databases such as the GEO, IBDMDB, and the UKB. We imposed strict restrictions on the number of patients included in the study. 
Only patients from cohorts with a population size greater than 100 were considered, and the scope of inclusion covered all stages of IBD onset and the entire intestinal tissue. Regrettably, although we obtained a substantial amount of data from public databases and made efforts to include data that were as complete as possible, we did not conduct real-world verification. Nevertheless, to our knowledge, a comprehensive analysis of existing IBD data using multiple methods, combined with a gene-based study of the specific immune cells that differ in immune responses, has not been reported previously. This approach provides novel research perspectives for the study of IBD.\u003c/p\u003e \u003cp\u003eANN, machine learning, interpretable analysis, and multi-chip joint analysis are gaining increasing popularity in medical research. Given the complex genetic basis of IBD and the limitations of existing diagnostic tools, the diagnosis of IBD often poses significant challenges. By analyzing extensive genomic data, ANNs and machine learning hold great promise for enhancing diagnostic efficiency and accuracy. They can also identify complex multi-allele patterns that may be associated with specific diseases. Interpretable analysis can provide detailed insights into the contribution of differentially expressed genes within diagnostic models, thereby offering explanations for machine-learning models [\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e]. The application of artificial neural networks and computer technologies such as machine learning and deep learning to assist clinicians in disease diagnosis and treatment has become increasingly prevalent. 
Continuously evolving artificial intelligence enables more precise and scientific diagnosis and treatment, ultimately bringing benefits to patients in need of medical care.\u003c/p\u003e"},{"header":"conclusion","content":"\u003cp\u003eIn this study, we developed a diagnostic model for predicting IBD by combining ANN, machine learning, interpretable analysis, and multi-chip joint analysis methods. ANN, LASSO, SVM, and Random Forest algorithms were used for genetic feature selection. We identified four key biomarkers (LOC389023, DUOX2, LCN2, and DEFA6) and used ten machine learning methods and SHAP models to assist in IBD diagnosis, clarifying the genetic characteristics, molecular pathways, and differential immune cells of IBD. The results showed that these genes are related to immune system function. In addition, we conducted multiple validations using the dataset. Our findings indicate that machine learning algorithms can facilitate accurate diagnostic decisions for IBD, enabling clinicians to explore new treatment pathways and diagnostic methods.\u003c/p\u003e "},{"header":"Declarations","content":"\u003cp\u003e\u0026bull; Ethics approval and consent to participate\u003c/p\u003e\n\u003cp\u003eNot required.\u003c/p\u003e\n\u003cp\u003e\u0026bull; Consent for publication\u003c/p\u003e\n\u003cp\u003eNot required.\u003c/p\u003e\n\u003cp\u003e\u0026bull; Competing interests\u003c/p\u003e\n\u003cp\u003eThe authors declare that there is no conflict of interest.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u0026bull; Funding\u003c/p\u003e\n\u003cp\u003eNational Natural Science Foundation of China (32372302).\u003c/p\u003e\n\u003cp\u003eNational Natural Science Foundation of China (82405210).\u003c/p\u003e\n\u003cp\u003e\u0026bull; Authors\u0026apos; contributions\u003c/p\u003e\n\u003cp\u003eYan chaosheng: Data curation, Methodology, Writing - Original Draft, Formal analysis.\u003c/p\u003e\n\u003cp\u003eRao 
jingjing: Resources, Writing - Original Draft.\u003c/p\u003e\n\u003cp\u003eDai yuanyuan: Writing - Reviewing and Editing.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eDuan wenhui:\u0026nbsp;Investigation.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eSun haowen:\u0026nbsp;Investigation.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eSheng yinyue: Methodology, Writing - Review \u0026amp; Editing, Project administration, Funding acquisition.\u003c/p\u003e\n\u003cp\u003eXue yuzheng: Methodology, Writing - Review \u0026amp; Editing, Project administration, Funding acquisition.\u003c/p\u003e\n\u003cp\u003e\u0026bull; Acknowledgements\u003c/p\u003e\n\u003cp\u003eThroughout the entire research process, all the authors not only provided professional academic guidance, helping me to solve the challenges in experimental design and data analysis, but also patiently offered numerous revision suggestions during the paper-writing stage. This has significantly enhanced the quality of the paper. Additionally, they provided financial support.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u0026bull; Availability of data and material\u003c/p\u003e\n\u003cp\u003eThe GSE87466, GSE179285, and GSE87473 microarray datasets used in this study were downloaded from the Gene Expression Omnibus database.\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eSEYEDIAN S S, NOKHOSTIN F, et al. A review of the diagnosis, prevention, and treatment methods of inflammatory bowel disease[J/OL]. Journal of Medicine and Life, 2019, 12(2): 113-122. DOI:10.25122/jml-2018-0075.\u003c/li\u003e\n\u003cli\u003eHODSON R. Inflammatory bowel disease[J/OL]. Nature, 2016, 540(7634): S97-S97. DOI:10.1038/540S97a.\u003c/li\u003e\n\u003cli\u003eDIEZ-MARTIN E, HERNANDEZ-SUAREZ L, MU\u0026Ntilde;OZ-VILLAFRANCA C, et al. 
Inflammatory Bowel Disease: A Comprehensive Analysis of Molecular Bases, Predictive Biomarkers, Diagnostic Methods, and Therapeutic Options[J/OL]. International Journal of Molecular Sciences, 2024, 25(13): 7062. DOI:10.3390/ijms25137062.\u003c/li\u003e\n\u003cli\u003eBISGAARD T H, ALLIN K H, KEEFER L, et al. Depression and anxiety in inflammatory bowel disease: epidemiology, mechanisms and treatment[J/OL]. Nature Reviews Gastroenterology \u0026amp; Hepatology, 2022, 19(11): 717-726. DOI:10.1038/s41575-022-00634-6.\u003c/li\u003e\n\u003cli\u003eHE X, ZHOU H. Decoding the IBD paradox: A triadic interplay between REG3, enterococci, and NOD2[J/OL]. Cell Host \u0026amp; Microbe, 2023, 31(9): 1425-1427. DOI:10.1016/j.chom.2023.08.008.\u003c/li\u003e\n\u003cli\u003eGILLILAND A, CHAN J J, DE WOLFE T J, et al. Pathobionts in Inflammatory Bowel Disease: Origins, Underlying Mechanisms, and Implications for Clinical Care[J/OL]. Gastroenterology, 2024, 166(1): 44-58. DOI:10.1053/j.gastro.2023.09.019.\u003c/li\u003e\n\u003cli\u003eCOS\u0026Iacute;N-ROGER J. Inflammatory Bowel Disease: Immune Function, Tissue Fibrosis and Current Therapies[J/OL]. International Journal of Molecular Sciences, 2024, 25(12): 6416. DOI:10.3390/ijms25126416.\u003c/li\u003e\n\u003cli\u003eAGRAWAL M, ALLIN K H, PETRALIA F, et al. Multiomics to elucidate inflammatory bowel disease risk factors and pathways[J/OL]. Nature Reviews Gastroenterology \u0026amp; Hepatology, 2022, 19(6): 399-409. DOI:10.1038/s41575-022-00593-y.\u003c/li\u003e\n\u003cli\u003eAGRAWAL M, SPENCER E A, COLOMBEL J F, et al. Approach to the Management of Recently Diagnosed Inflammatory Bowel Disease Patients: A User\u0026rsquo;s Guide for Adult and Pediatric Gastroenterologists[J/OL]. Gastroenterology, 2021, 161(1): 47-65. DOI:10.1053/j.gastro.2021.04.063.\u003c/li\u003e\n\u003cli\u003eSTIDHAM R W, TAKENAKA K. Artificial Intelligence for Disease Assessment in Inflammatory Bowel Disease: How Will it Change Our Practice?[J/OL]. 
Gastroenterology, 2022, 162(5): 1493-1506. DOI:10.1053/j.gastro.2021.12.238.\u003c/li\u003e\n\u003cli\u003eRENGANATHAN V. Overview of artificial neural network models in the biomedical domain[J/OL]. Bratislava Medical Journal, 2019, 120(07): 536-540. DOI:10.4149/BLL_2019_087.\u003c/li\u003e\n\u003cli\u003eSHEN M, ZHANG Y, ZHAN R, et al. Predicting the risk of cardiovascular disease in adults exposed to heavy metals: Interpretable machine learning[J/OL]. Ecotoxicology and Environmental Safety, 2025, 290: 117570. DOI:10.1016/j.ecoenv.2024.117570.\u003c/li\u003e\n\u003cli\u003eQIU P, ISHIMOTO T, FU L, et al. The Gut Microbiota in Inflammatory Bowel Disease[J/OL]. Frontiers in Cellular and Infection Microbiology, 2022, 12: 733992. DOI:10.3389/fcimb.2022.733992.\u003c/li\u003e\n\u003cli\u003eTUSHIR J S, AKBARIAN S. Chromatin-bound RNA and the neurobiology of psychiatric disease[J/OL]. Neuroscience, 2014, 264: 131-141. DOI:10.1016/j.neuroscience.2013.06.051.\u003c/li\u003e\n\u003cli\u003eMAGRO D O, SASSAKI L Y, CHEBLI J M F. Interaction between diet and genetics in patients with inflammatory bowel disease[J/OL]. World Journal of Gastroenterology, 2024, 30(12): 1644-1650. DOI:10.3748/wjg.v30.i12.1644.\u003c/li\u003e\n\u003cli\u003eZENG B, HUANG Y, CHEN S, et al. Dextran sodium sulfate potentiates NLRP3 inflammasome activation by modulating the KCa3.1 potassium channel in a mouse model of colitis[J/OL]. Cellular \u0026amp; Molecular Immunology, 2022, 19(8): 925-943. DOI:10.1038/s41423-022-00891-0.\u003c/li\u003e\n\u003cli\u003eOHYA S. Physiological Role of K\u003csup\u003e+\u003c/sup\u003e Channels in the Regulation of T Cell Function[J/OL]. YAKUGAKU ZASSHI, 2016, 136(3): 479-483. DOI:10.1248/yakushi.15-00246-4.\u003c/li\u003e\n\u003cli\u003eGUPTA U, GHOSH S, WALLACE C T, et al. Increased LCN2 (lipocalin 2) in the RPE decreases autophagy and activates inflammasome-ferroptosis processes in a mouse model of dry AMD[J/OL]. Autophagy, 2023, 19(1): 92-111. 
DOI:10.1080/15548627.2022.2062887.\u003c/li\u003e\n\u003cli\u003eLI J, SIMMONS A J, HAWKINS C V, et al. Identification and multimodal characterization of a specialized epithelial cell type associated with Crohn\u0026rsquo;s disease[J/OL]. Nature Communications, 2024, 15(1): 7204. DOI:10.1038/s41467-024-51580-7.\u003c/li\u003e\n\u003cli\u003eXIAO X, YEOH B S, VIJAY-KUMAR M. Lipocalin 2: An Emerging Player in Iron Homeostasis and Inflammation[J/OL]. Annual Review of Nutrition, 2017, 37(1): 103-130. DOI:10.1146/annurev-nutr-071816-064559.\u003c/li\u003e\n\u003cli\u003eWU D, WANG X, HAN Y, et al. The effect of lipocalin-2 (LCN2) on apoptosis: a proteomics analysis study in an LCN2 deficient mouse model[J/OL]. BMC Genomics, 2021, 22(1): 892. DOI:10.1186/s12864-021-08211-y.\u003c/li\u003e\n\u003cli\u003eYANG Y, LI S, LIU K, et al. Lipocalin-2-mediated intestinal epithelial cells pyroptosis via NF-\u0026kappa;B/NLRP3/GSDMD signaling axis adversely affects inflammation in colitis[J/OL]. Biochimica et Biophysica Acta (BBA) - Molecular Basis of Disease, 2024, 1870(7): 167279. DOI:10.1016/j.bbadis.2024.167279.\u003c/li\u003e\n\u003cli\u003eHUANG Q, XING J, LI G, et al. LCN2 regulates the gut microbiota and metabolic profile in mice infected with \u003cem\u003eMycobacterium bovis\u003c/em\u003e[J/OL]. mSystems, 2024, 9(8): e00501-24. DOI:10.1128/msystems.00501-24.\u003c/li\u003e\n\u003cli\u003eTHEN A, GOENAWAN H, LESMANA R, et al. Exploring the potential regulation of DUOX in thyroid hormone‑autophagy signaling via IGF‑1 in the skeletal muscle (Review)[J/OL]. Biomedical Reports, 2024, 22(3): 39. DOI:10.3892/br.2024.1917.\u003c/li\u003e\n\u003cli\u003eGRASBERGER H, MAGIS A T, SHENG E, et al. DUOX2 variants associate with preclinical disturbances in microbiota-immune homeostasis and increased inflammatory bowel disease risk[J/OL]. Journal of Clinical Investigation, 2021, 131(9): e141676. DOI:10.1172/JCI141676.\u003c/li\u003e\n\u003cli\u003eKASUMBA D M, HUOT S, CARON E, et al. 
DUOX2 regulates secreted factors in virus‐infected respiratory epithelial cells that contribute to neutrophil attraction and activation[J/OL]. The FASEB Journal, 2023, 37(2): e22765. DOI:10.1096/fj.202201205R.\u003c/li\u003e\n\u003cli\u003eGEERDINK R J, PILLAY J, MEYAARD L, et al. Neutrophils in respiratory syncytial virus infection: A target for asthma prevention[J/OL]. Journal of Allergy and Clinical Immunology, 2015, 136(4): 838-847. DOI:10.1016/j.jaci.2015.06.034.\u003c/li\u003e\n\u003cli\u003ePETERS C, NICHOLAS A K, SCHOENMAKERS E, et al. \u003cem\u003eDUOX2\u003c/em\u003e / \u003cem\u003eDUOXA2\u003c/em\u003e Mutations Frequently Cause Congenital Hypothyroidism that Evades Detection on Newborn Screening in the United Kingdom[J/OL]. Thyroid, 2019, 29(6): 790-801. DOI:10.1089/thy.2018.0587.\u003c/li\u003e\n\u003cli\u003eBURGUE\u0026Ntilde;O J F, FRITSCH J, GONZ\u0026Aacute;LEZ E E, et al. Epithelial TLR4 Signaling Activates DUOX2 to Induce Microbiota-Driven Tumorigenesis[J/OL]. Gastroenterology, 2021, 160(3): 797-808.e6. DOI:10.1053/j.gastro.2020.10.031.\u003c/li\u003e\n\u003cli\u003eSIMMS L A, DOECKE J D, WALSH M D, et al. Reduced \u0026alpha;-defensin expression is associated with inflammation and not NOD2 mutation status in ileal Crohn\u0026rsquo;s disease[J/OL]. Gut, 2008, 57(7): 903-910. DOI:10.1136/gut.2007.142588.\u003c/li\u003e\n\u003cli\u003eANGRIMAN I, BORDIGNON G, KOTSAFTI A, et al. Innate Immunity Activation in Newly Diagnosed Ileocolonic Crohn\u0026rsquo;s Disease: A Cohort Study[J/OL]. Diseases of the Colon \u0026amp; Rectum, 2024[2025-03-11]. https://journals.lww.com/10.1097/DCR.0000000000003145. DOI:10.1097/DCR.0000000000003145.\u003c/li\u003e\n\u003cli\u003eGAUDINO S J, BEAUPRE M, LIN X, et al. IL-22 receptor signaling in Paneth cells is critical for their maturation, microbiota colonization, Th17-related immune responses, and anti-Salmonella immunity[J/OL]. Mucosal Immunology, 2021, 14(2): 389-401. 
DOI:10.1038/s41385-020-00348-5.\u003c/li\u003e\n\u003cli\u003eDEGRUTTOLA A K, LOW D, MIZOGUCHI A, et al. Current Understanding of Dysbiosis in Disease in Human and Animal Models[J/OL]. Inflammatory Bowel Diseases, 2016, 22(5): 1137-1150. DOI:10.1097/MIB.0000000000000750.\u003c/li\u003e\n\u003cli\u003eENG S J, NONNECKE E B, DE LORIMIER A J, et al. FOXO inhibition rescues \u0026alpha;-defensin expression in human intestinal organoids[J/OL]. Proceedings of the National Academy of Sciences, 2023, 120(47): e2312453120. DOI:10.1073/pnas.2312453120.\u003c/li\u003e\n\u003cli\u003eMOSCHEN A R, TILG H, RAINE T. IL-12, IL-23 and IL-17 in IBD: immunobiology and therapeutic targeting[J/OL]. Nature Reviews Gastroenterology \u0026amp; Hepatology, 2019, 16(3): 185-196. DOI:10.1038/s41575-018-0084-8.\u003c/li\u003e\n\u003cli\u003eSCHMITT H, NEURATH M F, ATREYA R. Role of the IL23/IL17 Pathway in Crohn\u0026rsquo;s Disease[J/OL]. Frontiers in Immunology, 2021, 12: 622934. DOI:10.3389/fimmu.2021.622934.\u003c/li\u003e\n\u003cli\u003eDENG Z, WANG S, WU C, et al. IL-17 inhibitor-associated inflammatory bowel disease: A study based on literature and database analysis[J/OL]. Frontiers in Pharmacology, 2023, 14: 1124628. DOI:10.3389/fphar.2023.1124628.\u003c/li\u003e\n\u003cli\u003eNANKI K, FUJII M, SHIMOKAWA M, et al. Somatic inflammatory gene mutations in human ulcerative colitis epithelium[J/OL]. Nature, 2020, 577(7789): 254-259. DOI:10.1038/s41586-019-1844-5.\u003c/li\u003e\n\u003cli\u003eSHAPOURI‐MOGHADDAM A, MOHAMMADIAN S, VAZINI H, et al. Macrophage plasticity, polarization, and function in health and disease[J/OL]. Journal of Cellular Physiology, 2018, 233(9): 6425-6440. DOI:10.1002/jcp.26429.\u003c/li\u003e\n\u003cli\u003eYUNNA C, MENGRU H, LEI W, et al. Macrophage M1/M2 polarization[J/OL]. European Journal of Pharmacology, 2020, 877: 173090. DOI:10.1016/j.ejphar.2020.173090.\u003c/li\u003e\n\u003cli\u003eAPONTE-L\u0026Oacute;PEZ A, MU\u0026Ntilde;OZ-CRUZ S. 
Mast Cells in the Tumor Microenvironment[M/OL]//BIRBRAIR A. Tumor Microenvironment: page 1273. Cham: Springer International Publishing, 2020: 159-173[2025-03-19]. http://link.springer.com/10.1007/978-3-030-49270-0_9. DOI:10.1007/978-3-030-49270-0_9.\u003c/li\u003e\n\u003cli\u003eXIE Z, NIU L, ZHENG G, et al. Single-cell analysis unveils activation of mast cells in colorectal cancer microenvironment[J/OL]. Cell \u0026amp; Bioscience, 2023, 13(1): 217. DOI:10.1186/s13578-023-01144-x.\u003c/li\u003e\n\u003cli\u003eLIEW P X, KUBES P. The Neutrophil\u0026rsquo;s Role During Health and Disease[J/OL]. Physiological Reviews, 2019, 99(2): 1223-1248. DOI:10.1152/physrev.00012.2018.\u003c/li\u003e\n\u003cli\u003ePAPAYANNOPOULOS V. Neutrophils Stepping Through (to the Other Side)[J/OL]. Immunity, 2018, 49(6): 992-994. DOI:10.1016/j.immuni.2018.12.006.\u003c/li\u003e\n\u003cli\u003eDANNE C, SKERNISKYTE J, MARTEYN B, et al. Neutrophils: from IBD to the gut microbiota[J/OL]. Nature Reviews Gastroenterology \u0026amp; Hepatology, 2024, 21(3): 184-197. DOI:10.1038/s41575-023-00871-3.\u003c/li\u003e\n\u003cli\u003eYOKOI T, MURAKAMI M, KIHARA T, et al. Identification of a unique subset of tissue-resident memory CD4\u003csup\u003e+\u003c/sup\u003e T cells in Crohn\u0026rsquo;s disease[J/OL]. Proceedings of the National Academy of Sciences, 2023, 120(1): e2204269120. DOI:10.1073/pnas.2204269120.\u003c/li\u003e\n\u003cli\u003eROMAN-NARANJO P, PARRA-PEREZ A M, LOPEZ-ESCAMEZ J A. A systematic review on machine learning approaches in the diagnosis and prognosis of rare genetic diseases[J/OL]. Journal of Biomedical Informatics, 2023, 143: 104429. 
DOI:10.1016/j.jbi.2023.104429.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":true,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Inflammatory bowel disease, Machine learning, Artificial neural network, Diagnostic model, Immune differences","lastPublishedDoi":"10.21203/rs.3.rs-6286485/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6286485/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e \u003cp\u003eInflammatory bowel disease (IBD) is a chronic non - specific inflammatory disorder triggered by immune responses and genetic factors. Currently, there is no cure for IBD, and its etiology remains unclear. As a result, early detection and diagnosis of IBD pose significant challenges. Therefore, investigating biomarkers in peripheral blood is of utmost importance, as it can assist doctors in the early identification and management of IBD.\u003c/p\u003e\u003ch2\u003eMethods\u003c/h2\u003e \u003cp\u003eWe employed the multi - chip joint analysis approach to thoroughly explore the database. Based on methods such as artificial neural networks (ANN), machine learning techniques, and the SHAP model, we developed a diagnostic model for IBD. To select genetic features, we utilized three machine learning algorithms: the Least Absolute Shrinkage and Selection Operator (LASSO), Support Vector Machine (SVM), and Random Forest (RF) to screen for differentially expressed genes. Additionally, we conducted an in - depth analysis of the enriched molecular pathways of these differentially expressed genes through Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. 
Moreover, we used the SHAP model to interpret the results of the machine learning process. Finally, we examined the relationship between differentially expressed genes and immune cells.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e \u003cp\u003eThrough machine learning, we identified four crucial biomarkers for IBD, namely LOC389023, DUOX2, LCN2, and DEFA6. The SHAP model was used to elucidate the contribution of differentially expressed genes in the diagnostic model. These genes are primarily associated with immune system modulation and microbial alterations. GO and KEGG pathway enrichment analyses indicated that the differentially expressed genes demonstrated excellent performance in molecular pathways such as the Antimicrobial and IL \u0026minus;\u0026thinsp;17 signaling pathways. By performing correlation and differential analyses between differentially expressed genes and immune cells, we found that M1 macrophages exhibited stable differential changes across all four differentially expressed genes. M2 macrophages, resting mast cells, neutrophils, and activated CD4 memory T cells all showed significant differences among three of the differentially expressed genes.\u003c/p\u003e\u003ch2\u003eConclusion\u003c/h2\u003e \u003cp\u003eWe have identified differentially expressed genes (LOC389023, DUOX2, LCN2, and DEFA6) with significant immune - related effects in IBD. Our findings suggest that machine learning algorithms outperform ANN in the diagnosis of IBD. 
This research provides a theoretical foundation for the clinical diagnosis, targeted therapy, and prognosis evaluation of IBD.\u003c/p\u003e","manuscriptTitle":"Construction of a Feature Gene and Machine Prediction Model for Inflammatory Bowel Disease Based on Multi - Chip Joint Analysis","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-04-22 12:33:54","doi":"10.21203/rs.3.rs-6286485/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"ca800ff8-80c0-49d0-947f-baa8e4df9db8","owner":[],"postedDate":"April 22nd, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2025-08-25T16:36:52+00:00","versionOfRecord":{"articleIdentity":"rs-6286485","link":"https://doi.org/10.1186/s12967-025-06838-z","journal":{"identity":"journal-of-translational-medicine","isVorOnly":false,"title":"Journal of Translational Medicine"},"publishedOn":"2025-08-19 16:29:36","publishedOnDateReadable":"August 19th, 2025"},"versionCreatedAt":"2025-04-22 12:33:54","video":"","vorDoi":"10.1186/s12967-025-06838-z","vorDoiUrl":"https://doi.org/10.1186/s12967-025-06838-z","workflowStages":[]},"version":"v1","identity":"rs-6286485","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6286485","identity":"rs-6286485","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}