Screening of Potential Biomarkers and Immune Analysis for Osteoarthritis Based on Machine Learning and WGCNA

Research Article (Research Square preprint)
Li Zheng, Hongquan Heng, Jian Li, Feng Zhou, Zhenghui Hu
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4299353/v1
This work is licensed under a CC BY 4.0 License. Status: Posted (Version 1).

Abstract
The exact pathogenesis of osteoarthritis (OA) remains unknown, and the disease imposes a substantial social and medical burden. This study therefore aimed to identify potential novel biomarkers of OA. The OA dataset GSE55235 was obtained from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) were identified, and weighted gene co-expression network analysis (WGCNA) of the DEGs was used to identify OA-related gene modules.
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed to explore biological functions and pathways. Candidate biomarkers were then selected using three machine learning methods: least absolute shrinkage and selection operator (LASSO) regression, support vector machine (SVM), and random forest (RF). Receiver operating characteristic (ROC) curves were drawn to evaluate the diagnostic value of the potential biomarkers. Immune cell infiltration in OA was estimated using CIBERSORT, and the associations between the potential biomarkers and infiltrating immune cells were analyzed. Finally, the correlations and expression differences of the potential biomarkers were investigated. In total, 803 DEGs were identified between OA and control samples. Intersecting the DEGs with the genes of two WGCNA key modules yielded 137 genes. LTC4S, XIST, CXCL8, and PIM1 were identified after screening by machine learning and validation by ROC analysis. Immune infiltration analysis suggested that T cells and mast cells are linked to the pathogenesis of OA. These findings may help in understanding the etiology of OA.

Keywords: osteoarthritis; machine learning; WGCNA; potential biomarkers; immune analysis

Introduction
Osteoarthritis (OA) is a complex joint disease characterized by progressive destruction of articular cartilage together with lesions of the synovium, subchondral bone, and surrounding tissues [1]. OA is the most common type of arthritis and affects more than 240 million people worldwide [2]. In particular, owing to the continuing effects of the COVID-19 pandemic, the worldwide incidence of osteoarthritis is expected to rise in the coming years, and associated sarcopenia to worsen [3]. Although joint imaging is used to help identify osteoarthritis, it is often informative only at later stages.
The gene expression changes and regulatory mechanisms underlying early OA are not fully elucidated [4]. Furthermore, although total joint replacement can improve quality of life, many patients continue to experience severe emotional and financial burdens [5, 6]. It is therefore essential to investigate novel biomarkers for the early recognition of osteoarthritis.

Bioinformatics analysis is a promising approach to biomarker discovery [7, 8]. Data mining of public datasets with bioinformatics tools has aided the screening of novel biomarkers for non-neoplastic disorders [9]. Combining weighted gene co-expression network analysis (WGCNA) with machine learning approaches greatly improves the efficiency of disease biomarker identification [10-12]. For example, WGCNA combined with the least absolute shrinkage and selection operator (LASSO), logistic regression, and support vector machine recursive feature elimination (SVM-RFE) identified SIRPB2, AQP9, SLC16A3, HSPA6, and LILRB3 as hub genes in lumbar disc herniation [13]. Similarly, WGCNA, LASSO, and SVM-RFE were used to identify RAC1-correlated M0 macrophages and a risk score predicting the survival of patients with hepatocellular carcinoma [14]. However, many bioinformatics studies of osteoarthritis have not combined WGCNA with three machine learning techniques.

In this study, a gene expression dataset was used to identify potential biomarkers of OA by integrating WGCNA with LASSO, SVM, and RF. First, we identified differentially expressed genes (DEGs) between OA and control samples in the GSE55235 dataset. Next, WGCNA was performed on the DEG expression matrix to screen key genes, and potential biomarkers were then identified using machine learning techniques. Finally, we explored the functions of the potential biomarkers and the immune mechanisms associated with osteoarthritis.
This study may provide novel diagnostic biomarkers and aid investigation of the pathophysiology of OA.

Materials and Methods

2.1. Data Collection and Identification of DEGs
The GSE55235 and GSE12021 datasets were obtained from the GEO database [15]. GSE55235 contains 10 OA samples and 10 control samples, and GSE12021 contains 10 OA samples and 9 control samples; both were profiled on the GPL96 platform. The “limma” package [16] was used to screen differentially expressed genes (DEGs) between OA and healthy samples; DEGs with adjusted p-value … 1 were considered statistically significant. The “ggplot2” and “pheatmap” packages [17] were then used to draw the volcano plot and the heatmap.

2.2. Selection of Key Module Genes with WGCNA
The DEG expression matrix was analyzed with the “WGCNA” package [18] to construct a weighted gene co-expression network and identify OA-related genes. To ensure the reliability of the network, all samples were first clustered to check for outliers. The expression similarity of genes was then assessed by computing the Pearson correlation coefficient between each pair of genes to obtain a correlation matrix, which a soft-thresholding power function transformed into a weighted adjacency matrix. The optimal soft-thresholding power was chosen so that gene connectivity best approximated a scale-free distribution. Next, a topological overlap matrix (TOM) was calculated from the adjacency matrix, and co-expression modules containing at least 100 genes each were identified by dynamic tree cutting. Finally, gene significance (GS) and module membership (MM) were calculated to relate modules to clinical traits and to display the signature gene network.

2.3. Correlation Analyses of Functions and Pathways
A Venn diagram was drawn in R (version 4.2.0) to intersect the DEGs with the WGCNA key-module genes, and the overlapping key genes were used for functional enrichment. The “clusterProfiler” package [19] was used to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses to investigate biological functions and pathways; P < 0.05 was considered statistically significant. In addition, protein-protein interaction (PPI) networks of the key genes were constructed and visualized using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, version 11.5, https://www.string-protein.org) database and Cytoscape software (version 3.9.1, http://cytoscape.org/).

2.4. Machine Learning Algorithms Identify Potential Biomarkers
First, key genes were screened by LASSO logistic regression using the “glmnet” package [20, 21]. The support vector machine recursive feature elimination (SVM-RFE) algorithm was then applied to search for candidate genes using the “e1071” package [22]. Moreover, the random forest (RF) algorithm [23] was used to evaluate the key genes with the “randomForest” package [24]. Finally, the genes selected by all three algorithms (LASSO, SVM, and RF) were considered potential biomarkers of OA in synovial tissue.

2.5. Diagnostic Efficacy and Expression Levels of Potential Biomarkers
To evaluate their diagnostic effectiveness, ROC analysis was performed with the “pROC” package [25] to determine whether the potential biomarkers could discriminate OA samples from control samples in GSE55235 and GSE12021. The “limma”, “ggsci”, and “ggpubr” packages were used to plot the expression levels of the potential biomarkers and to validate them in the GSE12021 dataset.

2.6. GSEA and Immune Infiltration Analysis of Potential Biomarkers
First, gene set enrichment analysis (GSEA) was conducted with the “clusterProfiler” [19], “patchwork”, and “org.Hs.eg.db” packages to determine the biological functions of the potential biomarkers.
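As a rough illustration of the statistic behind GSEA, the sketch below computes the weighted running-sum enrichment score for one gene set on a pre-ranked gene list. It is a minimal pure-Python sketch, not the clusterProfiler implementation: the ranking metric (for a single biomarker, for example each gene's correlation with that biomarker) is an assumption because the text does not state how the ranked list was built, and the permutation-based significance and score normalisation that GSEA tools add are omitted. Gene names and scores in the example are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Weighted running-sum enrichment score in the style of GSEA.

    ranked_genes: gene IDs sorted from highest to lowest ranking metric
    scores:       ranking metric for each gene, in the same order
    gene_set:     set of gene IDs belonging to one pathway
    weight:       exponent applied to |score| for hits (1.0 is the usual GSEA choice)
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    # Sum of weighted |scores| over hits normalises the upward steps to 1 in total
    hit_norm = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering all of it")

    running, es = 0.0, 0.0
    for s, is_hit in zip(scores, hits):
        # Step up for pathway genes, step down by a constant for all other genes
        running += abs(s) ** weight / hit_norm if is_hit else -1.0 / n_miss
        # Keep the signed maximum deviation from zero
        if abs(running) > abs(es):
            es = running
    return es


genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
print(enrichment_score(genes, metric, {"G1", "G2"}))  # pathway at the top: positive ES
print(enrichment_score(genes, metric, {"G5", "G6"}))  # pathway at the bottom: negative ES
```

A positive score means the pathway's genes cluster at the top of the ranking (the high-expression side), a negative score that they cluster at the bottom.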
Second, the CIBERSORT algorithm was applied to the gene expression data to estimate the proportions of 22 immune cell types [26]. The “reshape2” [27] and “tidyverse” packages were then used to examine the relationship between osteoarthritis and the 22 types of infiltrating immune cells. Finally, correlations between the potential biomarkers and the 22 immune cell types were calculated and displayed as lollipop plots. P < 0.05 was considered statistically significant.

Results

3.1. Identification of DEGs
Boxplots of the normalized GSE55235 data are shown in Supplementary Figure S1. DEGs in GSE55235 were identified with the “limma” package: a total of 803 DEGs were found, comprising 429 upregulated and 374 downregulated genes. Heatmaps and volcano plots of the significantly changed genes were generated with “pheatmap” and “ggplot2” (Figure 1A, B).

3.2. Screening for Key Module Genes with WGCNA
To further screen OA-related genes, WGCNA was performed on the 803 DEGs. Samples were first checked for outliers (Figure S2A). Soft-threshold analysis selected a power of 7, at which the scale-free fit index R² reached 0.85, and the connectivity (k) plot confirmed that a power of 7 ensures a scale-free R² ≥ 0.85 (Figure S2B); at this power, gene connectivity best matched a scale-free distribution (Figure 2A). After merging similar modules (threshold 0.5) with a minimum module size of 100, multiple co-expression modules were identified (Figure 2B). The purple and yellow modules were selected as key modules because they were more strongly correlated with OA than the other modules (Figure 2C). These two modules each contained 2133 genes.

3.3. Functional Enrichment Analysis of Key Genes in OA
The Venn diagram yielded 137 OA key genes (Figure 3A). GO analysis showed that these key genes were mainly involved in response to lipopolysaccharide, response to molecule of bacterial origin, leukocyte activation, and response to radiation.
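Over-representation results such as these GO terms and KEGG pathways rest on a one-sided hypergeometric test, which is the test clusterProfiler documents for its enrichment functions. The pure-Python sketch below computes that tail probability for a single term; the counts in the example are hypothetical, and the multiple-testing adjustment applied in practice is omitted.

```python
from math import comb


def hypergeom_upper_tail(k, n, big_k, big_n):
    """P(X >= k) for X ~ Hypergeometric(population big_n, successes big_k, draws n).

    k:     key genes annotated to the term (the observed overlap)
    n:     key genes with any annotation (the query list)
    big_k: background genes annotated to the term
    big_n: background genes with any annotation
    """
    total = comb(big_n, n)
    upper = min(n, big_k)
    # Sum the probabilities of every overlap at least as large as the observed one
    tail = sum(comb(big_k, i) * comb(big_n - big_k, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 12 of 137 query genes hit a term covering 200 of 18,000 background genes
p = hypergeom_upper_tail(12, 137, 200, 18000)
print(f"{p:.3g}")
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the only rounding happens in the final division.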
These key genes were also mainly associated with collagen-containing extracellular matrix and antigen binding (Figure 3B). KEGG analysis showed that the key genes were mainly enriched in the IL-17, NF-kappa B, and TNF signaling pathways and in human T-cell leukemia virus 1 infection (Figure 3C). A PPI network was constructed to investigate interactions among the proteins encoded by the key genes. As shown in Figure 3D, the network comprised 137 nodes and 279 edges, and MYC, ATF3, VEGFA, MMP9, CXCL8, EGR1, PTGS2, and NFKBI interacted with the most proteins.

3.4. Machine Learning Algorithms Identify Potential Biomarkers
The 137 key genes were subjected to LASSO, SVM-RFE, and RF analyses to identify potential biomarkers. First, LASSO regression (optimal penalty parameter λ = 0.004) selected 11 of the 137 genes: ABCC3, ANGPTL7, CX3CR1, CXCL8, FOSL2, GADD45A, GAP43, LTC4S, MYOC, PIM1, and XIST (Figure 4A). SVM-RFE with 5-fold cross-validation identified 20 genes (Figure 4B), and RF identified the top 20 genes (Figure 4C). Intersecting the genes selected by the three algorithms yielded eight candidate biomarkers: ABCC3, ANGPTL7, CXCL8, GADD45A, LTC4S, MYOC, PIM1, and XIST (Figure 4D). Finally, residual boxplots and ROC curves for the SVM and RF models indicated excellent diagnostic performance (Figure S3A, B).

3.5. Diagnostic Effectiveness and Expression Levels of Potential Biomarkers
To assess the diagnostic efficacy of the eight candidate biomarkers, we evaluated them in the GSE55235 and GSE12021 datasets and excluded four genes: ABCC3, ANGPTL7, GADD45A, and MYOC (Figure S4A, B).
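AUC values like those used in this validation can be read as the probability that a randomly chosen OA sample scores higher than a randomly chosen control (the Mann-Whitney formulation of the area under the empirical ROC curve). The sketch below uses that identity in plain Python; it is not the pROC implementation. Because pROC's roc() chooses the comparison direction automatically by default, a gene that is lower in OA (as reported here for CXCL8 and PIM1) would need its scores negated in this sketch to obtain the comparable AUC. The expression values in the example are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the empirical ROC curve via pairwise comparisons.

    scores: expression values of one candidate biomarker
    labels: 1 for OA samples, 0 for controls
    Ties between an OA and a control sample count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one OA and one control sample")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical values for a gene reported as higher in OA
expr = [8.1, 7.9, 8.4, 6.2, 6.8, 7.0]
group = [1, 1, 1, 0, 0, 0]
print(roc_auc(expr, group))

# For a gene reported as lower in OA, negate the scores so that "higher" means OA
low_expr = [5.0, 5.5, 4.8, 7.1, 6.9, 7.4]
print(roc_auc([-v for v in low_expr], group))
```

The pairwise loop is quadratic, which is negligible for cohorts of 19 or 20 samples such as these.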
In GSE55235, the ROC analysis gave LTC4S (AUC = 1.000), XIST (AUC = 0.850), CXCL8 (AUC = 1.000), and PIM1 (AUC = 0.990) (Figure 5A). In GSE12021, it gave LTC4S (AUC = 0.878), XIST (AUC = 0.778), CXCL8 (AUC = 0.778), and PIM1 (AUC = 0.872) (Figure 5B). We then compared the expression levels of the four biomarkers in GSE55235 and GSE12021: LTC4S and XIST were upregulated in OA, whereas CXCL8 and PIM1 were downregulated (Figure 6A, B).

3.6. GSEA of Potential Biomarkers
We next investigated the biological functions and regulatory mechanisms of LTC4S, XIST, CXCL8, and PIM1 in greater detail. GSEA revealed that LTC4S and XIST were mainly involved in alpha-linolenic acid metabolism, fat digestion and absorption, ECM-receptor interaction, and ribosome (Figure 7A, B). High expression of CXCL8 and PIM1 was associated with mineral absorption, ECM-receptor interaction, and protein digestion and absorption, whereas low CXCL8 expression was associated with thiamine, beta-alanine, histidine, and tryptophan metabolism, and low PIM1 expression with linoleic acid, ether lipid, and alpha-linolenic acid metabolism (Figure 8A, B).

3.7. Correlation Analyses of Potential Biomarkers with Immune Infiltrating Cells
First, we examined the distribution of the 22 types of infiltrating immune cells in OA with a bar chart, which indicated that OA was primarily associated with mast cells, dendritic cells, M1 macrophages, and activated memory CD4 T cells (Figure 9A). Next, box plots were used to compare immune cell proportions in OA. Activated memory CD4 T cells, resting dendritic cells, and activated dendritic cells were markedly increased in OA, whereas resting memory CD4 T cells, monocytes, M2 macrophages, resting mast cells, and eosinophils were markedly decreased (Figure 9B).
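The biomarker and immune-cell correlations reported below pair each biomarker's expression with each cell type's CIBERSORT fraction across samples. The text does not state which correlation coefficient or test was used, so the sketch below assumes Spearman's rank correlation and, as a swapped-in alternative to a parametric test, a two-sided permutation p-value; its output would not necessarily reproduce the R and p values reported here. The data in the example are hypothetical.

```python
import random


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p(x, y, n_perm=10000, seed=0):
    """Two-sided p-value: share of shuffles with |rho| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # A tiny tolerance keeps equal |rho| values from being missed through rounding
        if abs(spearman(x, shuffled)) >= observed - 1e-12:
            extreme += 1
    # Adding 1 to both counts treats the observed pairing as one of the permutations
    return (extreme + 1) / (n_perm + 1)


# Hypothetical biomarker expression and CIBERSORT fractions for eight samples
gene = [5.1, 6.3, 4.8, 7.2, 6.9, 5.5, 7.8, 6.0]
cells = [0.02, 0.05, 0.01, 0.09, 0.07, 0.03, 0.11, 0.04]
rho = spearman(gene, cells)
p = permutation_p(gene, cells)
print(f"rho = {rho:.2f}, permutation p = {p:.4f}")
```

Rank-based correlation suits CIBERSORT output because many cell fractions are zero or near zero and far from normally distributed.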
Finally, we evaluated the associations between the four potential biomarkers and the 22 infiltrating immune cell types. LTC4S was positively correlated with activated memory CD4 T cells (R = 0.66, p = 0.010), activated mast cells (R = 0.7, p = 0.007), and activated dendritic cells (R = 0.77, p = 0.001), and negatively correlated with eosinophils (R = −0.58, p = 0.028), monocytes (R = −0.72, p = 0.004), resting memory CD4 T cells (R = −0.8, p < 0.001), and resting mast cells (R = −0.7, p = 0.0053) (Figure 10A). XIST was positively correlated with activated memory CD4 T cells (R = 0.59, p = 0.025) and negatively correlated with M2 macrophages (R = −0.58, p = 0.032), eosinophils (R = −0.65, p = 0.011), and monocytes (R = −0.58, p = 0.032) (Figure 10B). CXCL8 was positively correlated with resting memory CD4 T cells (R = 0.8, p < 0.001), naive B cells (R = 0.61, p = 0.021), monocytes (R = 0.59, p = 0.027), and resting mast cells (R = 0.57, p = 0.033), and negatively correlated with activated memory CD4 T cells (R = −0.59, p = 0.027), resting dendritic cells (R = −0.7, p = 0.007), and activated dendritic cells (R = −0.84, p < 0.001). PIM1 was positively correlated with resting memory CD4 T cells (R = 0.84, p < 0.001) and monocytes (R = 0.62, p = 0.018), and negatively correlated with activated dendritic cells (R = −0.67, p < 0.009) and activated memory CD4 T cells (R = −0.83, p < 0.001) (Figure 10A-D).

Discussion
Osteoarthritis is a degenerative joint disease, sometimes described as a “whole-organ disease” because it can cause dysfunction of the joint as an organ and, ultimately, joint failure [28, 29]. Owing to a lack of early diagnostic markers, patients with OA frequently miss the optimal treatment window, resulting in a poor prognosis. Furthermore, a growing number of studies indicate that immune cell infiltration is crucial to the pathogenesis of OA [30].
Consequently, identifying possible diagnostic markers and evaluating the pattern of immune cell infiltration in patients with OA are crucial to understanding the etiology of OA. Bioinformatics has developed rapidly and provides an effective strategy for screening molecular markers. Over the past five years, artificial intelligence (AI)-based diagnostic and prognostic algorithms have made significant strides [31], and AI is increasingly used in orthopedics. In this study, machine learning was used to screen key genes, from which we identified potential OA biomarkers and further investigated the role of immune cell infiltration in OA.

Here, intersecting the DEGs with the WGCNA key-module genes yielded 137 key genes. Interestingly, GO enrichment analysis showed that the key genes were enriched in leukocyte activation, response to lipopolysaccharide, and antigen binding, and KEGG analysis showed enrichment mainly in the IL-17, NF-kappa B, and TNF signaling pathways. These findings suggest that the immune response is critical in the development of OA. Recent studies have found that enhanced production of mature IL-1β by lipopolysaccharide (LPS)-primed macrophages may increase synovial inflammation and exacerbate OA progression [32, 33]. In addition, there is substantial evidence that IL-17 can induce IL-1β and TNF production to perpetuate the OA disease process [34]. Thus, this study may aid understanding of the molecular processes that drive OA progression.

LASSO logistic regression is a machine learning approach that selects variables by shrinking the coefficients of uninformative features to exactly zero while minimizing classification error [35]. The support vector machine (SVM) is a powerful and versatile supervised learning model that analyzes data for classification and regression [36].
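The LASSO logistic regression step can be sketched as an L1-penalised logistic loss minimised by proximal gradient descent (ISTA). This is a swapped-in solver for illustration only: glmnet, which the methods cite, uses cyclical coordinate descent, standardises predictors by default, and is usually paired with cv.glmnet to choose λ by cross-validation, none of which is reproduced here. The toy expression matrix and gene names are hypothetical; the point is that the soft-thresholding step sets uninformative coefficients exactly to zero, which is how LASSO performs variable selection.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of threshold * |value|: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lasso_logistic(X, y, lam, lr=0.1, n_iter=5000):
    """L1-penalised logistic regression fitted by ISTA.

    Minimises -(1/n) * log-likelihood + lam * sum(|w_j|); the intercept is unpenalised.
    X: one row per sample, one column per gene (ideally standardised)
    y: 1 for OA, 0 for control
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Residuals: predicted probability of OA minus the observed label
        r = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
             for row, yi in zip(X, y)]
        grad_w = [sum(r[i] * X[i][j] for i in range(n)) / n for j in range(p)]
        grad_b = sum(r) / n
        # Gradient step on the smooth loss, then soft-threshold the gene coefficients
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical standardised expression of three genes in 3 OA and 3 control samples
genes = ["GENE_A", "GENE_B", "GENE_C"]
X = [[2.0, 0.1, -0.3], [1.5, -0.2, 0.2], [1.8, 0.3, 0.1],
     [-2.0, 0.2, -0.1], [-1.6, -0.1, 0.3], [-1.7, -0.3, -0.2]]
y = [1, 1, 1, 0, 0, 0]
w, b = lasso_logistic(X, y, lam=0.1)
selected = [g for g, wj in zip(genes, w) if wj != 0.0]
print(selected, [round(wj, 3) for wj in w])
```

Only the gene that separates the two groups keeps a non-zero coefficient; the two noise genes are zeroed, mirroring how LASSO reduced 137 key genes to a short list.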
The random forest (RF) algorithm is a widely used tree-based ensemble method that adapts well to diverse data and can account for correlations and interactions among features [37]. In our research, LTC4S, XIST, CXCL8, and PIM1 were identified as diagnostic markers of OA by combining the LASSO, SVM-RFE, and RF algorithms. Notably, the potential biomarkers selected by the three machine learning methods were supported by further validation in this study.

LTC4S (leukotriene C4 synthase) is a protein-coding gene whose main associated pathways are the biosynthesis of DHA-derived sulfido conjugates and arachidonic acid metabolism. LTC4S is overexpressed in bronchial biopsies from patients with aspirin-intolerant asthma and aspirin-tolerant asthma [38, 39]. More than 80% of bronchial mast cells highly express LTC4S, whereas eosinophils show low LTC4S expression [39]. In addition, corticosteroids act synergistically with IL-4 to stimulate LTC4S expression in monocytes [40]. To our knowledge, the relationship between LTC4S and osteoarthritis has not yet been reported.

XIST (X inactive specific transcript) is an RNA gene belonging to the lncRNA class. One study showed that XIST, acting as a competing endogenous RNA (ceRNA) for miR-211, positively regulates expression of the miR-211 target CXCR4, activates the MAPK signaling pathway through the miR-211/CXCR4 axis, and modulates the proliferation and apoptosis of OA chondrocytes [41]. XIST may therefore be a potential therapeutic target in patients with OA.

CXCL8 is a member of the CXC chemokine family and a significant modulator of the inflammatory response; the protein it encodes is commonly known as interleukin 8 (IL-8). One study reported that an increase in IL-8 after 6 weeks of surgical knee joint distraction was associated with a substantial improvement in the Knee injury and Osteoarthritis Outcome Score 4 (KOOS4) at 12 months [42].
Another study reported that CXCL8 induces chondrocyte apoptosis and decreases chondrocyte proliferation by affecting the NF-κB and JNK MAPK signaling pathways, while also increasing the production of other proinflammatory cytokines [43]. Therefore, CXCL8 may aggravate OA progression and could also serve as a new therapeutic target for OA.

PIM1 (Pim-1 proto-oncogene, serine/threonine kinase) is also a protein-coding gene; it encodes a member of the Ser/Thr protein kinase family. Its main associated pathways are apoptosis and autophagy, and IL3-mediated signaling events. PIM1 is abundantly expressed in RA synovium [44], as well as in T cells, macrophages, and fibroblast-like synoviocytes (FLSs), consistent with our study. PIM1 also reduces proinflammatory cytokines (interferon-γ and interleukin-17) and increases the proportion of CD25high FoxP3+ Treg cells. Further evidence shows that PIM1 inhibitors effectively halt the progression of collagen-induced arthritis and cartilage damage [45]. Because osteoarthritis and rheumatoid arthritis share some common mechanisms, the role of PIM1 in the pathology of osteoarthritis warrants further investigation.

Our study showed that mast cells were significantly upregulated in OA samples (Figure 9B). Recent studies have shown that mast cells are positively correlated with the severity of knee osteoarthritis and that mast cell markers (CD117, CD203c, TPSB2) and bFGF are expressed at higher levels [46]. Animal studies have demonstrated that mast cells are involved in cartilage loss, synovitis, and osteophyte formation in osteoarthritis, and that mast cell deficiency has a significant protective effect [47]. Moreover, mast cells are widely present in osteophyte samples, are abundantly distributed between the endosteal layer and the subchondral cancellous trabeculae, and participate in ECM and bone remodeling in osteophytes [48].
T cells are the main infiltrating cells in synovial inflammation in patients with OA and play an important role in the pathological process of OA. In the synovial fluid of patients with knee osteoarthritis, T cells, mainly CD4+, CD8+, and regulatory T (Treg) cells, are the main cellular component, with CD4+ T cells predominating, and cartilage degeneration is slower in CD8+ T cell-knockout mice than in wild-type mice [49]. Treg cell levels in synovial fluid are highly variable [50], and a decrease in Treg cells may reduce T cell tolerance, thereby exacerbating OA.

In conclusion, we combined LASSO logistic regression, SVM-RFE, and RF to identify diagnostic biomarkers of OA. Although we identified potential biomarkers with machine learning algorithms and verified their diagnostic efficacy in an external dataset, our study has certain limitations. It is based on limited gene expression data and is a secondary analysis of public datasets. Future experiments will be conducted to understand the mechanisms underlying the potential biomarkers.

Conclusion
Our findings identified four potential biomarkers (LTC4S, XIST, CXCL8, and PIM1) that could aid the diagnosis of osteoarthritis. These biomarkers may help reveal new aspects of the pathophysiology of osteoarthritis and identify novel therapeutic targets.

Abbreviations
OA: Osteoarthritis
GEO: Gene Expression Omnibus
WGCNA: Weighted gene co-expression network analysis
TOM: Topological overlap matrix
GS: Gene significance
MM: Module membership
DEGs: Differentially expressed genes
GO: Gene Ontology
KEGG: Kyoto Encyclopedia of Genes and Genomes
LASSO: Least absolute shrinkage and selection operator
SVM: Support vector machine
RF: Random forest
ROC: Receiver operating characteristic

Declarations
Supplementary Materials:
Author Contributions: HQ.H., J.L., and L.Z. developed the main research plan. L.Z. and HQ.H. analyzed the data, drew the charts, and wrote the manuscript. F.Z. and ZH.H.
collected data and references. All authors have read and agreed to the published version of the manuscript.
Funding: Not applicable.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Publicly available datasets (GSE55235 and GSE12021) were analyzed in this study. These data can be found at https://www.ncbi.nlm.nih.gov/geo/.
Acknowledgments: We thank Dr. Liu (Nucleobase translocation of bioinformatics) and all members of his bioinformatics team for generously sharing their experience.
Conflicts of Interest: The authors declare no conflict of interest.
Ethics approval and informed consent: This study was approved by the ethics committee of Chengdu Medical College.

References
1. Grandi, F.C.; Bhutani, N. Epigenetic Therapies for Osteoarthritis. Trends Pharmacol Sci 2020, 41, 557-569, doi:10.1016/j.tips.2020.05.008.
2. Yue, L.; Berman, J. What Is Osteoarthritis? JAMA 2022, 327, 1300, doi:10.1001/jama.2022.1980.
3. Castro da Rocha, F.A.; Melo, L.d.P.; Berenbaum, F. Tackling osteoarthritis during COVID-19 pandemic. Ann Rheum Dis 2021, 80, 151-153, doi:10.1136/annrheumdis-2020-218372.
4. Batshon, G.; Elayyan, J.; Qiq, O.; Reich, E.; Ben-Aderet, L.; Kandel, L.; Haze, A.; Steinmeyer, J.; Lefebvre, V.; Zhang, H.; et al. Serum NT/CT SIRT1 ratio reflects early osteoarthritis and chondrosenescence. Ann Rheum Dis 2020, 79, 1370-1380, doi:10.1136/annrheumdis-2020-217072.
5. Amanatullah, D.F.; Murasko, M.J.; Chona, D.V.; Crijns, T.J.; Ring, D.; Kamal, R.N. Financial Distress and Discussing the Cost of Total Joint Arthroplasty. J Arthroplasty 2018, 33, 3394-3397, doi:10.1016/j.arth.2018.07.010.
6. Gay, C.; Guiguet-Auclair, C.; Coste, N.; Boisseau, N.; Gerbaud, L.; Pereira, B.; Coudeyre, E. Limited effect of a self-management exercise program added to spa therapy for increasing physical activity in patients with knee osteoarthritis: A quasi-randomized controlled trial. Ann Phys Rehabil Med 2020, 63, 181-188, doi:10.1016/j.rehab.2019.10.006.
7. Gauthier, J.; Vincent, A.T.; Charette, S.J.; Derome, N. A brief history of bioinformatics. Brief Bioinform 2019, 20, 1981-1996, doi:10.1093/bib/bby063.
8. Wang, T.; Zheng, X.; Li, R.; Liu, X.; Wu, J.; Zhong, X.; Zhang, W.; Liu, Y.; He, X.; Liu, W.; et al. Integrated bioinformatic analysis reveals YWHAB as a novel diagnostic biomarker for idiopathic pulmonary arterial hypertension. J Cell Physiol 2019, 234, 6449-6462, doi:10.1002/jcp.27381.
9. Zhao, X.; Zhang, L.; Wang, J.; Zhang, M.; Song, Z.; Ni, B.; You, Y. Identification of key biomarkers and immune infiltration in systemic lupus erythematosus by integrated bioinformatics analysis. J Transl Med 2021, 19, 35, doi:10.1186/s12967-020-02698-x.
10. Chen, Y.; Liao, R.; Yao, Y.; Wang, Q.; Fu, L. Machine learning to identify immune-related biomarkers of rheumatoid arthritis based on WGCNA network. Clin Rheumatol 2022, 41, 1057-1068, doi:10.1007/s10067-021-05960-9.
11. Sun, R.; Li, S.; Zhao, K.; Diao, M.; Li, L. Identification of Ten Core Hub Genes as Potential Biomarkers and Treatment Target for Hepatoblastoma. Front Oncol 2021, 11, 591507, doi:10.3389/fonc.2021.591507.
12. Yang, Z.; Yan, G.; Zheng, L.; Gu, W.; Liu, F.; Chen, W.; Cui, X.; Wang, Y.; Yang, Y.; Chen, X.; et al. ..., as a potential predictor of prognosis and immunotherapy response for oral squamous cell carcinoma, is related to cell invasion, metastasis, and CD8+ T cell infiltration. Oncoimmunology 2021, 10, 1938890, doi:10.1080/2162402X.2021.1938890.
13. Li, K.; Li, S.; Zhang, H.; Lei, D.; Lo, W.L.A.; Ding, M. Computational Analysis of the Immune Infiltration Pattern and Candidate Diagnostic Biomarkers in Lumbar Disc Herniation. Front Mol Neurosci 2022, 15, 846554, doi:10.3389/fnmol.2022.846554.
14. You, J.-A.; Gong, Y.; Wu, Y.; Jin, L.; Chi, Q.; Sun, D. WGCNA, LASSO and SVM Algorithm Revealed RAC1 Correlated M0 Macrophage and the Risk Score to Predict the Survival of Hepatocellular Carcinoma Patients. Front Genet 2021, 12, 730920, doi:10.3389/fgene.2021.730920.
15. Edgar, R.; Domrachev, M.; Lash, A.E. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 2002, 30, 207-210, doi:10.1093/nar/30.1.207.
16. Ritchie, M.E.; Phipson, B.; Wu, D.; Hu, Y.; Law, C.W.; Shi, W.; Smyth, G.K. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 2015, 43, e47, doi:10.1093/nar/gkv007.
17. Wickham, H. ggplot2: Elegant Graphics for Data Analysis; Springer: New York, 2009.
18. Langfelder, P.; Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008, 9, 559, doi:10.1186/1471-2105-9-559.
19. Wu, T.; Hu, E.; Xu, S.; Chen, M.; Guo, P.; Dai, Z.; Feng, T.; Zhou, L.; Tang, W.; Zhan, L.; et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation (Camb) 2021, 2, 100141, doi:10.1016/j.xinn.2021.100141.
20. Tibshirani, R. Regression shrinkage and selection via the lasso. J R Stat Soc Series B 1996, 58, 267-288.
21. Friedman, J.; Hastie, T.; Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw 2010, 33, 1-22.
22. Huang, M.-L.; Hung, Y.-H.; Lee, W.M.; Li, R.K.; Jiang, B.-R. SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier. ScientificWorldJournal 2014, 2014, 795624, doi:10.1155/2014/795624.
23. Breiman, L. Random forests. Mach Learn 2001, 45, 5-32.
24. Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18-22.
25. Robin, X.; Turck, N.; Hainard, A.; Tiberti, N.; Lisacek, F.; Sanchez, J.-C.; Müller, M. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011, 12, 77, doi:10.1186/1471-2105-12-77.
26. Xue, G.; Hua, L.; Zhou, N.; Li, J. Characteristics of immune cell infiltration and associated diagnostic biomarkers in ulcerative colitis: results from bioinformatics analysis. Bioengineered 2021, 12, 252-265, doi:10.1080/21655979.2020.1863016.
27. Wickham, H. Reshaping data with the reshape package. J Stat Softw 2007, 21, 1-20.
28. Conaghan, P.G.; Cook, A.D.; Hamilton, J.A.; Tak, P.P. Therapeutic options for targeting inflammatory osteoarthritis pain. Nat Rev Rheumatol 2019, 15, 355-363, doi:10.1038/s41584-019-0221-y.
29. Blanco, F.J.; Rego-Pérez, I. Editorial: Is it time for epigenetics in osteoarthritis? Arthritis Rheumatol 2014, 66, 2324-2327, doi:10.1002/art.38710.
30. Rosshirt, N.; Hagmann, S.; Tripel, E.; Gotterbarm, T.; Kirsch, J.; Zeifang, F.; Lorenz, H.M.; Tretter, T.; Moradi, B. A predominant Th1 polarization is present in synovial fluid of end-stage osteoarthritic knee joints: analysis of peripheral blood, synovial fluid and synovial membrane. Clin Exp Immunol 2019, 195, 395-406, doi:10.1111/cei.13230.
31. Karhade, A.V.; Schwab, J.H. Introduction to The Spine Journal special issue on artificial intelligence and machine learning. Spine J 2021, 21, 1601-1603, doi:10.1016/j.spinee.2021.03.028.
32. Ni, Z.; Kuang, L.; Chen, H.; Xie, Y.; Zhang, B.; Ouyang, J.; Wu, J.; Zhou, S.; Chen, L.; Su, N.; et al. The exosome-like vesicles from osteoarthritic chondrocyte enhanced mature IL-1β production of macrophages and aggravated synovitis in osteoarthritis. Cell Death Dis 2019, 10, 522, doi:10.1038/s41419-019-1739-2.
33. Zhang, H.; Ge, J.; Lu, X. CircFADS2 is downregulated in osteoarthritis and suppresses LPS-induced apoptosis of chondrocytes by regulating miR-195-5p methylation. Arch Gerontol Geriatr 2021, 96, 104477, doi:10.1016/j.archger.2021.104477.
34. Na, H.S.; Park, J.-S.; Cho, K.-H.; Kwon, J.Y.; Choi, J.; Jhun, J.; Kim, S.J.; Park, S.-H.; Cho, M.-L. Interleukin-1-Interleukin-17 Signaling Axis Induces Cartilage Destruction and Promotes Experimental Osteoarthritis. Front Immunol 2020, 11, 730, doi:10.3389/fimmu.2020.00730.
35. Yuan, Q.; Ren, J.; Wang, Z.; Ji, L.; Deng, D.; Shang, D. Identification of the Real Hub Gene and Construction of a Novel Prognostic Signature for Pancreatic Adenocarcinoma Based on the Weighted Gene Co-expression Network Analysis and Least Absolute Shrinkage and Selection Operator Algorithms. Front Genet 2021, 12, 692953, doi:10.3389/fgene.2021.692953.
36. Ding, C.; Bao, T.-Y.; Huang, H.-L. Quantum-Inspired Support Vector Machine. IEEE Trans Neural Netw Learn Syst 2021, PP, doi:10.1109/TNNLS.2021.3084467.
37. Chen, X.; Ishwaran, H. Random forests for genomic data analysis. Genomics 2012, 99, 323-329, doi:10.1016/j.ygeno.2012.04.003.
38. Cowburn, A.S.; Sladek, K.; Soja, J.; Adamek, L.; Nizankowska, E.; Szczeklik, A.; Lam, B.K.; Penrose, J.F.; Austen, F.K.; Holgate, S.T.; et al. Overexpression of leukotriene C4 synthase in bronchial biopsies from patients with aspirin-intolerant asthma. J Clin Invest 1998, 101, 834-846, doi:10.1172/jci620.
39. Cai, Y.; Bjermer, L.; Halstensen, T.S. Bronchial mast cells are the dominating LTC4S-expressing cells in aspirin-tolerant asthma. Am J Respir Cell Mol Biol 2003, 29, 683-693, doi:10.1165/rcmb.2002-0174OC.
40. Negri, J.; Early, S.B.; Steinke, J.W.; Borish, L. Corticosteroids as inhibitors of cysteinyl leukotriene metabolic and signaling pathways. J Allergy Clin Immunol 2008, 121, 1232-1237, doi:10.1016/j.jaci.2008.02.007.
41. Li, L.; Lv, G.; Wang, B.; Kuang, L. The role of lncRNA XIST/miR-211 axis in modulating the proliferation and apoptosis of osteoarthritis chondrocytes through CXCR4 and MAPK signaling. Biochem Biophys Res Commun 2018, 503, 2555-2562, doi:10.1016/j.bbrc.2018.07.015.
42. Watt, F.E.; Hamid, B.; Garriga, C.; Judge, A.; Hrusecka, R.; Custers, R.J.H.; Jansen, M.P.; Lafeber, F.P.; Mastbergen, S.C.; Vincent, T.L. The molecular profile of synovial fluid changes upon joint distraction and is associated with clinical response in knee osteoarthritis. Osteoarthritis Cartilage 2020, 28, 324-333, doi:10.1016/j.joca.2019.12.005.
43. Yang, P.; Tan, J.; Yuan, Z.; Meng, G.; Bi, L.; Liu, J. Expression profile of cytokines and chemokines in osteoarthritis patients: Proinflammatory roles for CXCL8 and CXCL11 to chondrocytes. Int Immunopharmacol 2016, 40, 16-23, doi:10.1016/j.intimp.2016.08.005.
44. Ha, Y.J.; Choi, Y.S.; Han, D.W.; Kang, E.H.; Yoo, I.S.; Kim, J.H.; Kang, S.W.; Lee, E.Y.; Song, Y.W.; Lee, Y.J. PIM-1 kinase is a novel regulator of proinflammatory cytokine-mediated responses in rheumatoid arthritis fibroblast-like synoviocytes. Rheumatology (Oxford) 2019, 58, 154-164, doi:10.1093/rheumatology/key261.
45. Maney, N.J.; Lemos, H.; Barron-Millar, B.; Carey, C.; Herron, I.; Anderson, A.E.; Mellor, A.L.; Isaacs, J.D.; Pratt, A.G. Pim Kinases as Therapeutic Targets in Early Rheumatoid Arthritis. Arthritis Rheumatol 2021, 73, 1820-1830, doi:10.1002/art.41744.
46. Takata, K.; Uchida, K.; Takano, S.; Mukai, M.; Inoue, G.; Sekiguchi, H.; Aikawa, J.; Miyagi, M.; Iwase, D.; Takaso, M. Possible Regulation of bFGF Expression by Mast Cells in Osteoarthritis Patients with Obesity: A Cross-Sectional Study. Diabetes Metab Syndr Obes 2021, 14, 3291-3297, doi:10.2147/dmso.S319537.
47. Wang, Q.; Lepus, C.M.; Raghu, H.; Reber, L.L.; Tsai, M.M.; Wong, H.H.; von Kaeppler, E.; Lingampalli, N.; Bloom, M.S.; Hu, N.; et al. IgE-mediated mast cell activation promotes inflammation and cartilage destruction in osteoarthritis. Elife 2019, 8, doi:10.7554/eLife.39905.
48. Kulkarni, P.; Harsulkar, A.; Märtson, A.-G.; Suutre, S.; Märtson, A.; Koks, S. Mast Cells Differentiated in Synovial Fluid and Resident in Osteophytes Exalt the Inflammatory Pathology of Osteoarthritis. Int J Mol Sci 2022, 23, doi:10.3390/ijms23010541.
49. Hsieh, J.-L.; Shiau, A.-L.; Lee, C.-H.; Yang, S.-J.; Lee, B.-O.; Jou, I.M.; Wu, C.-L.; Chen, S.-H.; Shen, P.-C. CD8+ T cell-induced expression of tissue inhibitor of metalloproteinases-1 exacerbated osteoarthritis. Int J Mol Sci 2013, 14, 19951-19970, doi:10.3390/ijms141019951.
Zhu, W.; Zhang, X.; Jiang, Y.; Liu, X.; Huang, L.; Wei, Q.; Huang, Y.; Wu, W.; Gu, J. Alterations in peripheral T cell and B cell subsets in patients with osteoarthritis. Clin Rheumatol 2020, 39, 523-532, doi:10.1007/s10067-019-04768-y.
Additional Declarations
No competing interests reported.
Supplementary Files
supplementfigures.pdf
© Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-4299353","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":294853000,"identity":"83899d25-2d0c-47f8-8a7e-72d6a2ccbd13","order_by":0,"name":"Li
Zheng","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAzklEQVRIiWNgGAWjYDACCRBRIMHA2N7Y+PAD8VoMgFp6DjcbS5CgBcRIbxPgIUaH/OzmYw+/GFjIM8982AbUbyen20BAC+OcY+nGMgYSho2zE9seFDAkG5sdIKCFWSLHTFrCQCKBcXZiO9BLBxK3EdLCJpH/DaJl5sE2CR5itPBI5LBJfgBpmcFIpBYJiTQzaQaQX3oSgYFsQIRf5GckP5P8UVEnb9h+/OHDDxV2cgS1gAAzKDoMG0BMAyKUgwDjD5B1RCoeBaNgFIyCEQgAEDc7b8pXQVMAAAAASUVORK5CYII=","orcid":"","institution":"Chengdu Medical College","correspondingAuthor":true,"prefix":"","firstName":"Li","middleName":"","lastName":"Zheng","suffix":""},{"id":294853001,"identity":"52ee4f99-1c6d-405d-8226-0def25dcea7c","order_by":1,"name":"Hongquan Heng","email":"","orcid":"","institution":"Southwest Hospital Affiliated to Army Military Medical University","correspondingAuthor":false,"prefix":"","firstName":"Hongquan","middleName":"","lastName":"Heng","suffix":""},{"id":294853003,"identity":"971f6ac9-40b1-40d6-b169-b1fe926161d2","order_by":2,"name":"Jian Li","email":"","orcid":"","institution":"The Second Affiliated Hospital of Soochow University","correspondingAuthor":false,"prefix":"","firstName":"Jian","middleName":"","lastName":"Li","suffix":""},{"id":294853004,"identity":"d7fc5c2b-90c2-4fca-a8b3-846633304032","order_by":3,"name":"Feng Zhou","email":"","orcid":"","institution":"The Second Affiliated Hospital of Soochow University","correspondingAuthor":false,"prefix":"","firstName":"Feng","middleName":"","lastName":"Zhou","suffix":""},{"id":294853005,"identity":"34ec0239-b1b4-4be3-b42b-4fec46518341","order_by":4,"name":"Zhenghui Hu","email":"","orcid":"","institution":"The Second Affiliated Hospital of Soochow University","correspondingAuthor":false,"prefix":"","firstName":"Zhenghui","middleName":"","lastName":"Hu","suffix":""}],"badges":[],"createdAt":"2024-04-21 
04:39:07","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4299353/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4299353/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":55501844,"identity":"886edc39-119a-49e4-aae1-944a54dd8748","added_by":"auto","created_at":"2024-04-29 10:25:07","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":83913,"visible":true,"origin":"","legend":"\u003cp\u003eDEGs between OA patients and control samples. (\u003cstrong\u003eA\u003c/strong\u003e) Volcano plot of the DEGs. Red dots represent genes upregulated in OA patients, blue dots represent genes downregulated in OA patients, and gray dots represent genes that do not differ significantly between OA patients and controls. (\u003cstrong\u003eB\u003c/strong\u003e) Heatmap of the expression levels of the top 30 DEGs. Red denotes high expression and blue denotes low expression.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-4299353/v1/6dc79d2e3e48c0f7cb420a8d.png"},{"id":55501982,"identity":"835510b4-b492-4ee5-9f23-c1a171308abf","added_by":"auto","created_at":"2024-04-29 10:33:07","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":119146,"visible":true,"origin":"","legend":"\u003cp\u003eScreening for key module genes with WGCNA. (\u003cstrong\u003eA\u003c/strong\u003e) Selection of the soft-threshold power in WGCNA. Scale-free fit indices and mean connectivity were analyzed for various soft-threshold powers (β).\u003cstrong\u003e \u003c/strong\u003e(\u003cstrong\u003eB\u003c/strong\u003e) Hierarchical clustering was used to identify gene co-expression modules. Each branch of the dendrogram represents a gene, and genes assigned to the same module share the same color.
(\u003cstrong\u003eC\u003c/strong\u003e) Correlation between modules and OA. Modules were identified by merging modules with a feature factor greater than 0.5 and setting the minimum module size to 50 genes.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-4299353/v1/38cd89be613e67f02ba21f44.png"},{"id":55501845,"identity":"6ef7aecc-11af-4a13-9a70-f6c4b5ce4333","added_by":"auto","created_at":"2024-04-29 10:25:07","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":162393,"visible":true,"origin":"","legend":"\u003cp\u003eFunctional enrichment analysis of key genes. (\u003cstrong\u003eA\u003c/strong\u003e) Overlap between the DEGs and the WGCNA module genes. (\u003cstrong\u003eB, C\u003c/strong\u003e)\u003cstrong\u003e \u003c/strong\u003eGO and KEGG enrichment analyses of the key genes. (\u003cstrong\u003eD\u003c/strong\u003e) PPI network of the 137 OA-related key genes, constructed with the STRING database and visualized with Cytoscape software.\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-4299353/v1/459aaa3ecc3bdb78ba9eca2f.png"},{"id":55501983,"identity":"b8f0877b-a908-41a0-a430-6d0c8dea88f6","added_by":"auto","created_at":"2024-04-29 10:33:08","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":69397,"visible":true,"origin":"","legend":"\u003cp\u003eSelection of potential biomarkers by machine learning. (\u003cstrong\u003eA\u003c/strong\u003e) Tuning parameter selection in the LASSO logistic regression model, which was used to retain the most predictive features. (\u003cstrong\u003eB\u003c/strong\u003e) SVM-RFE, which recursively eliminates the least informative features, was combined with 5-fold cross-validation to identify the most relevant features. Plotting cross-validation accuracy against the number of features showed that accuracy was highest with 2 features. (\u003cstrong\u003eC\u003c/strong\u003e) Importance of the top 20 genes in the random forest model. (\u003cstrong\u003eD\u003c/strong\u003e) Intersection of the genes selected by the three machine learning algorithms, yielding eight potential biomarkers.\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-4299353/v1/b5c969ef2917b1668d5f679a.png"},{"id":55501984,"identity":"00481efc-9486-459c-a315-57c9462a7994","added_by":"auto","created_at":"2024-04-29 10:33:08","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":59384,"visible":true,"origin":"","legend":"\u003cp\u003eEvaluation of the diagnostic value of the potential biomarkers. \u003cstrong\u003e(A) \u003c/strong\u003eROC curves of LTC4S, XIST, CXCL8, and PIM1 in dataset GSE55235. \u003cstrong\u003e(B)\u003c/strong\u003e ROC curves of LTC4S, XIST, CXCL8, and PIM1 in dataset GSE12021.\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-4299353/v1/fc477a6cabcbed34045bec56.png"},{"id":55501848,"identity":"ea2f16b0-1cb2-48ae-a9e7-251de6b013da","added_by":"auto","created_at":"2024-04-29 10:25:08","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":54558,"visible":true,"origin":"","legend":"\u003cp\u003eExpression levels of the potential biomarkers. (\u003cstrong\u003eA\u003c/strong\u003e) Expression levels of LTC4S, XIST, CXCL8, and PIM1 in dataset GSE55235. \u003cstrong\u003e(B)\u003c/strong\u003e Expression levels of LTC4S, XIST, CXCL8, and PIM1 in dataset GSE12021.\u003c/p\u003e","description":"","filename":"6.png","url":"https://assets-eu.researchsquare.com/files/rs-4299353/v1/a9a400ddf033354287d28fda.png"},{"id":55501851,"identity":"f27c1e34-13dc-4c9c-af51-a9369dc3bc7f","added_by":"auto","created_at":"2024-04-29 10:25:08","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":109939,"visible":true,"origin":"","legend":"\u003cp\u003eGSEA of potential biomarkers. (\u003cstrong\u003eA, B\u003c/strong\u003e)\u003cstrong\u003e \u003c/strong\u003eGSEA results of LTC4S and XIST.\u003c/p\u003e","description":"","filename":"7.png","url":"https://assets-eu.researchsquare.com/files/rs-4299353/v1/0f4c0ea07dd063b132810ba2.png"},{"id":55501849,"identity":"bf0a4603-4e09-41c4-99c7-492607233ea5","added_by":"auto","created_at":"2024-04-29 10:25:08","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":400627,"visible":true,"origin":"","legend":"\u003cp\u003eGSEA of potential biomarkers. (\u003cstrong\u003eA, B\u003c/strong\u003e)\u003cstrong\u003e \u003c/strong\u003eGSEA results of CXCL8 and PIM1.\u003c/p\u003e","description":"","filename":"8.png","url":"https://assets-eu.researchsquare.com/files/rs-4299353/v1/81d312fd274ed11fa8fec580.png"},{"id":55501985,"identity":"e55c9e6b-8970-42d6-a405-2ab496e82fd6","added_by":"auto","created_at":"2024-04-29 10:33:08","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":86585,"visible":true,"origin":"","legend":"\u003cp\u003eImmune cell infiltration analysis. \u003cstrong\u003e(A)\u003c/strong\u003e Infiltration patterns of 22 immune cell types in control and osteoarthritis groups.
\u003cstrong\u003e(B) \u003c/strong\u003eBox plots comparing the proportions of the 22 infiltrating immune cell types between the OA and control groups.\u003c/p\u003e","description":"","filename":"9.png","url":"https://assets-eu.researchsquare.com/files/rs-4299353/v1/ae012873bd75cf297762ba18.png"},{"id":55501854,"identity":"5901e2bd-88c3-422d-8d74-5c9cd702107f","added_by":"auto","created_at":"2024-04-29 10:25:08","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":106659,"visible":true,"origin":"","legend":"\u003cp\u003eCorrelation analysis between the potential biomarkers and immune infiltrating cells. (A-D)\u003cstrong\u003e \u003c/strong\u003eCorrelation of LTC4S, XIST, CXCL8, and PIM1 with the 22 types of immune infiltrating cells. Larger circles indicate stronger correlations, and greener colors indicate smaller p values. Immune infiltrating cells with p \u0026lt;0.05 are marked in red.\u003c/p\u003e","description":"","filename":"10.png","url":"https://assets-eu.researchsquare.com/files/rs-4299353/v1/1d7f6782f8573287b389dd7a.png"},{"id":55622458,"identity":"7032ea66-f0a8-482e-a845-cff5b80b792d","added_by":"auto","created_at":"2024-04-30 17:07:09","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2101577,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4299353/v1/9df0f5bf-5cf8-4053-99e6-dc6b116c32e9.pdf"},{"id":55501855,"identity":"d4a4e8b0-041c-423e-addf-3886651ac990","added_by":"auto","created_at":"2024-04-29 10:25:08","extension":"pdf","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":309620,"visible":true,"origin":"","legend":"","description":"","filename":"supplementfigures.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4299353/v1/449ac18c5b5f21370cd68892.pdf"}],"financialInterests":"No competing interests
reported.","formattedTitle":"Screening of Potential Biomarkers and Immune Analysis for Osteoarthritis Based on Machine Learning and WGCNA","fulltext":[{"header":"Introduction","content":"\u003cp\u003eOsteoarthritis (OA) is a complex joint disease characterized by progressive destruction of the articular cartilage together with lesions of the synovium, subchondral bone, and surrounding tissues [1]. OA is the most common type of arthritis, affecting more than 240 million people worldwide [2]. Owing to the ongoing impact of the COVID-19 pandemic, the worldwide incidence of OA is expected to rise in the coming years, and the associated sarcopenia is expected to worsen [3]. Although joint imaging is used to help diagnose OA, it is often reserved for later stages, and the mechanisms of gene expression and disease regulation in early OA are not fully elucidated [4]. Furthermore, although total joint replacement can improve quality of life, many patients continue to bear severe emotional and financial burdens [5, 6]. It is therefore essential to identify novel biomarkers for the early recognition of OA.\u003c/p\u003e\n\u003cp\u003eBioinformatics analysis is an innovative and promising approach to biomarker discovery [7, 8]. Data mining of public datasets with bioinformatics tools has aided the screening of novel biomarkers for nonneoplastic disorders [9]. Combining weighted gene co-expression network analysis (WGCNA) with machine learning approaches greatly increases the effectiveness of biomarker identification for genetic disorders [10-12]. For example, WGCNA combined with the least absolute shrinkage and selection operator (LASSO), logistic regression, and support vector machine recursive feature elimination (SVM-RFE) identified SIRPB2, AQP9, SLC16A3, HSPA6, and LILRB3 as hub genes for lumbar disc herniation [13]. Similarly, WGCNA, LASSO, and SVM-RFE were used to identify RAC1 as a gene correlated with M0 macrophages and to construct a risk score that predicts survival in patients with hepatocellular carcinoma [14]. However, many bioinformatics studies of OA have not combined WGCNA with three machine learning techniques.\u003c/p\u003e\n\u003cp\u003eIn this study, we combined WGCNA with LASSO, SVM, and random forest (RF) to identify potential biomarkers of OA from gene expression data. First, we identified differentially expressed genes (DEGs) between OA and control samples in the GSE55235 dataset. Next, WGCNA was applied to the expression matrix of the DEGs to screen for key genes, and machine learning techniques were used to identify potential biomarkers among them. Finally, we explored the functions of the potential biomarkers and the immune mechanisms associated with OA. This study may provide novel diagnostic biomarkers and aid investigation of the pathophysiology of OA.\u003c/p\u003e\n"},{"header":"Materials And Methods","content":"\u003cp\u003e\u003cem\u003e2.1. Data Collection and Identification of DEGs\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eThe GSE55235 and GSE12021 datasets were downloaded from the GEO database [15]. GSE55235 consists of 10 OA samples and 10 control samples, and GSE12021 contains 10 OA samples and 9 control samples; both were profiled on the GPL96 microarray platform. The \u0026ldquo;limma\u0026rdquo; package [16] was used to screen differentially expressed genes (DEGs) between OA and healthy samples. Genes with an adjusted p-value \u0026lt;0.05 and |log2FC| \u0026gt;1 were considered statistically significant DEGs. The \u0026ldquo;ggplot2\u0026rdquo; [17] and \u0026ldquo;pheatmap\u0026rdquo; packages were then used to produce the volcano plot and the heatmap.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e2.2.
Selection of Key Module Genes with WGCNA\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eThe \u0026ldquo;WGCNA\u0026rdquo; package [18] was applied to the expression matrix of the DEGs to construct a weighted gene co-expression network and identify genes related to OA. To ensure the reliability of the network, all samples were first clustered to check for outliers. The expression similarity between genes was then assessed by computing the Pearson correlation coefficient for each pair of genes, yielding a correlation matrix. A soft-threshold power function was used to transform the correlation matrix into a weighted adjacency matrix, and the optimal soft-threshold power was chosen so that the network best fit a scale-free topology. Next, a topological overlap matrix (TOM) was computed from the adjacency matrix. Co-expression modules containing at least 100 genes were then identified by dynamic tree cutting. Finally, gene significance (GS) and module membership (MM) were calculated to relate modules to clinical traits and to display the signature gene network.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e2.3. Correlation Analyses of Functions and Pathways\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eThe genes shared between the DEGs and the key WGCNA modules were identified with a Venn diagram in R (version 4.2.0), and these overlapping key genes were used for functional enrichment analysis. The \u0026ldquo;clusterProfiler\u0026rdquo; package [19] was used to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of their biological functions and pathways. P \u0026lt;0.05 was considered statistically significant. Additionally, a protein-protein interaction (PPI) network of the key genes was constructed with the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, version 11.5, https://string-db.org) database and visualized with Cytoscape software (version 3.9.1, https://cytoscape.org/).\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e2.4.\u003c/em\u003e \u003cem\u003eMachine Learning Algorithms Identify Potential Biomarkers\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eFirst, key genes were screened by LASSO logistic regression using the \u0026ldquo;glmnet\u0026rdquo; package [20, 21]. The SVM-RFE algorithm was then applied with the \u0026ldquo;e1071\u0026rdquo; package to search for candidate genes [22]. In addition, the random forest (RF) algorithm [23] was used to rank the key genes with the \u0026ldquo;randomForest\u0026rdquo; package [24]. Finally, the key genes selected by all three algorithms (LASSO, SVM-RFE, and RF) were regarded as potential biomarkers of OA in synovial tissue.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e2.5.\u003c/em\u003e \u003cem\u003eDiagnostic Efficacy and Expression Levels of Potential Biomarkers\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eTo evaluate the diagnostic effectiveness of the potential biomarkers, receiver operating characteristic (ROC) curve analysis was performed with the \u0026ldquo;pROC\u0026rdquo; package [25] to test whether each biomarker could discriminate OA samples from control samples in GSE55235 and GSE12021. The \u0026ldquo;limma\u0026rdquo;, \u0026ldquo;ggsci\u0026rdquo;, and \u0026ldquo;ggpubr\u0026rdquo; packages were used to plot the expression levels of the potential biomarkers and to validate them in the GSE12021 dataset.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e2.6.\u003c/em\u003e \u003cem\u003eGSEA and Immune Infiltration Analysis of Potential Biomarkers\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eFirst, gene set enrichment analysis (GSEA) was conducted with the \u0026ldquo;clusterProfiler\u0026rdquo; [19], \u0026ldquo;patchwork\u0026rdquo;, and \u0026ldquo;org.Hs.eg.db\u0026rdquo; packages to determine the biological functions of the potential biomarkers. Second, the CIBERSORT algorithm was applied to the integrated gene expression data to estimate the proportions of 22 immune cell types [26]. Third, the \u0026ldquo;reshape2\u0026rdquo; [27] and \u0026ldquo;tidyverse\u0026rdquo; packages were used to investigate the relationship between OA and the 22 types of immune infiltrating cells. Finally, correlations between the potential biomarkers and the 22 types of immune infiltrating cells were displayed as lollipop plots. P \u0026lt;0.05 was considered statistically significant.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e3.1. Identification of DEGs\u003c/p\u003e\n\u003cp\u003eBoxplots of the normalized GSE55235 data are presented in Supplementary Figure S1. DEGs in GSE55235 were identified with the \u0026ldquo;limma\u0026rdquo; package. A total of 803 DEGs were found, comprising 429 upregulated and 374 downregulated genes. A volcano plot and a heatmap of the significantly changed genes were generated with \u0026ldquo;ggplot2\u0026rdquo; and \u0026ldquo;pheatmap\u0026rdquo; (Figure 1A, B).\u003c/p\u003e\n\u003cp\u003e3.2. Screening for Key Module Genes with WGCNA\u003c/p\u003e\n\u003cp\u003eTo further screen genes related to OA, WGCNA was performed on the 803 DEGs.
We first checked the samples for outliers (Figure S2A). Soft-threshold analysis showed that a power of 7 yielded a scale-free fit index (R\u003csup\u003e2\u003c/sup\u003e) of 0.85, and the connectivity (k) plot confirmed that a soft power of 7 ensured a scale-free R\u003csup\u003e2\u003c/sup\u003e \u0026ge;0.85 (Figure S2B). At this power, gene connectivity was most consistent with a scale-free distribution (Figure 2A). Merging modules with feature factors greater than 0.5 and setting the minimum module size to 100 identified multiple co-expression modules (Figure 2B). The purple and yellow modules were selected as key modules because they were more strongly correlated with OA than the other modules (Figure 2C). These two modules each contain 2133 genes.\u003c/p\u003e\n\u003cp\u003e3.3. Functional Enrichment Analysis of Key Genes in OA\u003c/p\u003e\n\u003cp\u003eThe Venn diagram identified 137 OA-related key genes (Figure 3A). GO analysis showed that these key genes were mainly involved in response to lipopolysaccharide, response to molecule of bacterial origin, leukocyte activation, and response to radiation. They were also mainly associated with the collagen-containing extracellular matrix and antigen binding (Figure 3B). KEGG analysis showed that the key genes were enriched in the IL-17, NF-kappa B, and TNF signaling pathways and in human T-cell leukemia virus 1 infection (Figure 3C). We used a PPI network to investigate interactions among the proteins encoded by the key genes. As shown in Figure 3D, the network contained 137 nodes and 279 edges, and MYC, ATF3, VEGFA, MMP9, CXCL8, EGR1, PTGS2, and NFKBI interacted with more proteins than the other nodes.\u003c/p\u003e\n\u003cp\u003e3.4. Machine Learning Algorithms Identify Potential Biomarkers\u003c/p\u003e\n\u003cp\u003eThe 137 key genes were analyzed with LASSO, RF, and SVM-RFE to identify potential biomarkers. First, LASSO regression (optimal penalty parameter \u0026lambda; = 0.004) selected 11 of the 137 key genes: ABCC3, ANGPTL7, CX3CR1, CXCL8, FOSL2, GADD45A, GAP43, LTC4S, MYOC, PIM1, and XIST (Figure 4A). SVM-RFE with 5-fold cross-validation identified 20 genes (Figure 4B), and RF ranked the top 20 genes by importance (Figure 4C). Intersecting the genes selected by the three machine learning algorithms identified ABCC3, ANGPTL7, CXCL8, GADD45A, LTC4S, MYOC, PIM1, and XIST as candidate biomarkers (Figure 4D). Finally, residual boxplots and receiver operating characteristic (ROC) curves for the SVM and RF models demonstrated the excellent diagnostic performance of both methods (Figure S3A, B).\u003c/p\u003e\n\u003cp\u003e3.5. Potential Biomarkers\u0026rsquo; Diagnostic Effectiveness and Expression Levels\u003c/p\u003e\n\u003cp\u003eTo validate the diagnostic efficacy of the 8 potential biomarkers, we evaluated them in the GSE55235 and GSE12021 datasets and excluded 4 genes: ABCC3, ANGPTL7, GADD45A, and MYOC (Figure S4A, B). In GSE55235, the areas under the ROC curve (AUCs) were 1.000 for LTC4S, 0.850 for XIST, 1.000 for CXCL8, and 0.990 for PIM1 (Figure 5A). In GSE12021, the AUCs were 0.878 for LTC4S, 0.778 for XIST, 0.778 for CXCL8, and 0.872 for PIM1 (Figure 5B). We then evaluated the expression levels of the four potential biomarkers in GSE55235 and GSE12021 and found that LTC4S and XIST were upregulated in OA, whereas CXCL8 and PIM1 were downregulated (Figure 6A, B).\u003c/p\u003e\n\u003cp\u003e3.6. GSEA Analyses of Potential Biomarkers\u003c/p\u003e\n\u003cp\u003eWe investigated the biological functions and regulatory mechanisms of LTC4S, XIST, CXCL8, and PIM1 in greater detail.
GSEA revealed that LTC4S and XIST were mainly associated with alpha-linolenic acid metabolism, fat digestion and absorption, ECM-receptor interaction, and ribosome (Figure 7A, B). High expression of CXCL8 and PIM1 was associated with mineral absorption, ECM-receptor interaction, and protein digestion and absorption, whereas low CXCL8 expression was associated with thiamine, beta-alanine, histidine, and tryptophan metabolism, and low PIM1 expression with linoleic acid, ether lipid, and alpha-linolenic acid metabolism (Figure 8A, B).\u003c/p\u003e\n\u003cp\u003e3.7. Correlation Analyses of Potential Biomarkers with Immune Infiltrating Cells\u003c/p\u003e\n\u003cp\u003eFirst, we analyzed the distribution of the 22 types of immune infiltrating cells in OA and control samples. The bar chart indicated that OA was primarily associated with mast cells, dendritic cells, M1 macrophages, and activated memory CD4 T cells (Figure 9A). Second, box plots were used to compare the proportions of immune infiltrating cells between OA and control samples. Activated memory CD4 T cells, resting dendritic cells, and activated dendritic cells were markedly increased in OA, whereas resting memory CD4 T cells, monocytes, M2 macrophages, resting mast cells, and eosinophils were markedly decreased (Figure 9B).\u003c/p\u003e\n\u003cp\u003eFinally, we evaluated the correlations between the four potential biomarkers and the 22 types of immune infiltrating cells. LTC4S was positively correlated with activated memory CD4 T cells (R = 0.66, p = 0.010), activated mast cells (R = 0.7, p = 0.007), and activated dendritic cells (R = 0.77, p = 0.001), and negatively correlated with eosinophils (R = \u0026minus;0.58, p = 0.028), monocytes (R = \u0026minus;0.72, p = 0.004), resting memory CD4 T cells (R = \u0026minus;0.8, p \u0026lt; 0.001), and resting mast cells (R = \u0026minus;0.7, p = 0.0053) (Figure 10A). XIST was positively correlated with activated memory CD4 T cells (R = 0.59, p = 0.025) and negatively correlated with M2 macrophages (R = \u0026minus;0.58, p = 0.032), eosinophils (R = \u0026minus;0.65, p = 0.011), and monocytes (R = \u0026minus;0.58, p = 0.032) (Figure 10B). CXCL8 was positively correlated with resting memory CD4 T cells (R = 0.8, p \u0026lt; 0.001), naive B cells (R = 0.61, p = 0.021), monocytes (R = 0.59, p = 0.027), and resting mast cells (R = 0.57, p = 0.033), and negatively correlated with activated memory CD4 T cells (R = \u0026minus;0.59, p = 0.027), resting dendritic cells (R = \u0026minus;0.7, p = 0.007), and activated dendritic cells (R = \u0026minus;0.84, p \u0026lt; 0.001) (Figure 10C). PIM1 was positively correlated with resting memory CD4 T cells (R = 0.84, p \u0026lt; 0.001) and monocytes (R = 0.62, p = 0.018), and negatively correlated with activated dendritic cells (R = \u0026minus;0.67, p \u0026lt; 0.009) and activated memory CD4 T cells (R = \u0026minus;0.83, p \u0026lt; 0.001) (Figure 10D).\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eOsteoarthritis is a degenerative joint disease, sometimes described as a \u0026ldquo;whole-organ disease\u0026rdquo; because it can lead to dysfunction of the entire joint organ and ultimately joint failure [28, 29]. Because early diagnostic markers are lacking, patients with OA often miss the optimal window for treatment, resulting in a poor prognosis. Furthermore, a growing number of studies indicate that immune cell infiltration is crucial to the pathogenesis of OA [30]. Consequently, identifying possible diagnostic markers and evaluating the pattern of immune cell infiltration in patients with OA are crucial to understanding the etiology of OA. With advances in science and technology, bioinformatics has developed rapidly and now offers an effective strategy for screening molecular markers.
Over the past five years, artificial intelligence (AI)-based diagnostic and prognostic algorithms have made significant strides [31], and AI is increasingly being used in orthopedics. Here, we used machine learning to screen key genes, identified potential OA biomarkers, and investigated the role of immune cell infiltration in OA in greater detail.

In this study, 137 key genes were obtained by intersecting the DEGs with the genes of two WGCNA modules. GO enrichment analysis showed that these key genes were enriched in terms related to leukocyte activation, lipopolysaccharide, and antigen binding, and KEGG analysis showed enrichment mainly in the IL-17, NF-kappa B, and TNF signaling pathways. These findings suggest that the immune response is critical in the development of OA. Recent studies have found that exosome-like vesicles from osteoarthritic chondrocytes can enhance the production of mature IL-1β by lipopolysaccharide (LPS)-primed macrophages, which may aggravate synovial inflammation and accelerate OA progression [32, 33]. In addition, there is substantial evidence that IL-17 can induce IL-1β and TNF production, perpetuating the OA disease process [34]. These findings may therefore help clarify the molecular processes that drive OA progression.

LASSO logistic regression is a machine learning approach that selects variables by shrinking the coefficients of uninformative predictors to exactly zero, with the strength of the penalty typically chosen to minimize classification error [35]. The support vector machine (SVM) is a powerful and versatile supervised learning model used for classification and regression [36]. The random forest (RF) algorithm is a widely used tree-based ensemble method that adapts well to diverse data and can account for correlations and interactions among features [37]. In our research, LTC4S, XIST, CXCL8, and PIM1 were identified as potential diagnostic markers of OA by combining the LASSO, SVM-RFE, and RF algorithms.
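To make the LASSO idea above concrete, here is a minimal Python sketch of L1-penalized logistic regression fitted by proximal gradient descent (ISTA). It is an illustration under stated assumptions, not the authors' pipeline: tools such as glmnet [21] use coordinate descent and usually pick the penalty by cross-validation (for example with `cv.glmnet`), whereas the penalty `lam`, the step size, the iteration count, the function names, and the toy data here are all hypothetical. The key step is soft-thresholding, which sets a coefficient exactly to zero when its gradient signal is weaker than the penalty; that is what removes uninformative features.

```python
import math


def soft_threshold(z, t):
    """Proximal operator of t*|w|: shrink z toward 0, returning exactly 0 if |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lasso_logistic(X, y, lam, step=0.1, n_iter=5000):
    """L1-penalized logistic regression by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * sum(|w_j|); the intercept b is not penalized.
    X is a list of samples (lists of floats); y holds 0/1 labels.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed label.
        resid = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
                 for xi, yi in zip(X, y)]
        # Gradient of the mean log-loss with respect to each weight and the intercept.
        grad_w = [sum(r * xi[j] for r, xi in zip(resid, X)) / n for j in range(p)]
        grad_b = sum(resid) / n
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


# Hypothetical toy data: feature 0 tracks the label; feature 1 is symmetric noise.
X = [[1.2, 0.5], [1.2, -0.5], [0.7, 0.8], [0.7, -0.8], [-0.3, 0.4], [-0.3, -0.4],
     [-1.1, 0.6], [-1.1, -0.6], [-0.6, 0.2], [-0.6, -0.2], [0.4, 0.9], [0.4, -0.9]]
y = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

w, b = lasso_logistic(X, y, lam=0.05)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("coefficients:", w)
print("selected features:", selected)  # prints: selected features: [0]
```

Raising `lam` drives more coefficients to exactly zero; with `lam=10.0` on the same toy data, both coefficients stay at zero. In a setting like the study's, the columns would be expression values of candidate genes, and only genes with non-zero coefficients would be carried forward.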
Notably, the potential biomarkers selected by the three machine learning algorithms proved reliable in the subsequent validation in this study.

LTC4S (leukotriene C4 synthase) is a protein-coding gene whose main associated pathways are the biosynthesis of DHA-derived sulfido conjugates and arachidonic acid metabolism. LTC4S is overexpressed in bronchial biopsies from patients with aspirin-intolerant asthma and aspirin-tolerant asthma [38, 39]. More than 80% of bronchial mast cells express high levels of LTC4S, whereas eosinophils express low levels [39]. Moreover, corticosteroids interact synergistically with IL-4 to stimulate LTC4S expression in monocytes [40]. To our knowledge, no studies of LTC4S in osteoarthritis have been reported. XIST (X inactive specific transcript) is an RNA gene belonging to the lncRNA class. One study showed that XIST acts as a competing endogenous RNA (ceRNA) for miR-211, positively regulating the expression of the miR-211 target CXCR4, activating the MAPK signaling pathway through the miR-211/CXCR4 axis, and thereby modulating the proliferation and apoptosis of OA chondrocytes [41]. We therefore speculate that XIST could be a potential therapeutic target for OA. CXCL8, a member of the CXC chemokine family, is a significant modulator of the inflammatory response; the protein it encodes is commonly known as interleukin 8 (IL-8). One study found that an increase in synovial fluid IL-8 after 6 weeks of surgical knee joint distraction was associated with substantial improvement in the Knee injury and Osteoarthritis Outcome Score (KOOS4) at 12 months [42]. Another study reported that CXCL8 induces apoptosis and decreases chondrocyte proliferation by affecting the NF-κB and JNK MAPK signaling pathways, while also increasing the production of other proinflammatory cytokines [43]. Therefore, CXCL8 may aggravate OA progression and may also serve as a new therapeutic target for OA.
PIM1 (Pim-1 proto-oncogene, serine/threonine kinase) is also a protein-coding gene; it encodes a member of the Ser/Thr protein kinase family. Its main associated pathways include apoptosis and autophagy, and IL3-mediated signaling events. PIM1 is abundantly expressed in RA synovium [44], as well as in T cells, macrophages, and fibroblast-like synoviocytes (FLSs), which is consistent with our findings. PIM1 also reduces proinflammatory cytokines (interferon-γ and interleukin-17) and increases the proportion of CD25high FoxP3+ Treg cells. Further evidence shows that PIM1 inhibitors effectively prevent the progression of collagen-induced arthritis and cartilage damage [45]. Because osteoarthritis and rheumatoid arthritis share some common mechanisms, the role of PIM1 in the pathology of osteoarthritis warrants further investigation.

Our study showed that mast cells were significantly upregulated in OA samples (Figure 9B). Recent studies have shown that mast cells are positively correlated with the severity of knee osteoarthritis and that mast cell markers (CD117, CD203c, and TPSB2) and bFGF are expressed at higher levels [46]. Animal studies have demonstrated that mast cells are involved in cartilage loss, synovitis, and osteophyte formation in osteoarthritis, and that mast cell deficiency has a significant protective effect [47]. Moreover, mast cells are widely present in osteophyte samples, where they are abundantly distributed between the endosteal layer and the subchondral cancellous trabeculae and participate in ECM and bone remodeling [48]. T cells are the main infiltrating cells in the inflamed synovium of patients with OA and play an important role in the pathological process of OA.
In patients with knee OA, T cells are the main cellular component of synovial fluid; they consist mainly of CD4+, CD8+, and regulatory T (Treg) cells, with CD4+ T cells predominating, and cartilage degeneration is slower in CD8+ T cell-knockout mice than in wild-type mice [49]. Treg cell levels in synovial fluid vary considerably [50], and a decrease in Treg cells may impair immune tolerance, thus exacerbating OA.

In summary, we combined machine learning methods, namely LASSO logistic regression, SVM-RFE, and RF, to identify and verify diagnostic biomarkers for OA. Although we identified potential biomarkers using machine learning algorithms and verified their diagnostic efficacy in external datasets, our study has certain limitations. It is based on limited gene expression data and is a secondary analysis of existing datasets. Experiments will therefore be needed to elucidate the mechanisms underlying these potential biomarkers.

Conclusion

Our findings identified four potential biomarkers (LTC4S, XIST, CXCL8, and PIM1) that could aid in the diagnosis of osteoarthritis.
These biomarkers may help to uncover new aspects of the pathophysiology of osteoarthritis and to identify novel therapeutic targets.

Abbreviations

OA Osteoarthritis
GEO Gene Expression Omnibus
WGCNA Weighted gene co-expression network analysis
TOM Topological overlap matrix
GS Gene significance
MM Module membership
DEGs Differentially expressed genes
GO Gene ontology
KEGG Kyoto Encyclopedia of Genes and Genomes
LASSO Least absolute shrinkage and selection operator
SVM Support vector machine
RF Random forest
ROC Receiver operating characteristic

Declarations

Supplementary Materials:
Author Contributions: HQ.H., J.L., and L.Z. developed the research plan. L.Z. and HQ.H. analyzed the data, drew the charts, and wrote the manuscript. F.Z. and ZH.H. collected data and references. All authors have read and agreed to the published version of the manuscript.
Funding: Not applicable.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Publicly available datasets (GSE55235) were analyzed in this study.
These data can be found at https://www.ncbi.nlm.nih.gov/geo/.
Acknowledgments: We thank Dr. Liu (Nucleobase translocation of bioinformatics) and all members of his bioinformatics team for generously sharing their experience.
Conflicts of Interest: The authors declare no conflict of interest.
Ethics approval and informed consent: This study was approved by the ethics committee of Chengdu Medical College.

References

1. Grandi, F.C.; Bhutani, N. Epigenetic Therapies for Osteoarthritis. Trends Pharmacol Sci 2020, 41, 557-569, doi:10.1016/j.tips.2020.05.008.
2. Yue, L.; Berman, J. What Is Osteoarthritis? JAMA 2022, 327, 1300, doi:10.1001/jama.2022.1980.
3. Castro da Rocha, F.A.; Melo, L.d.P.; Berenbaum, F. Tackling osteoarthritis during COVID-19 pandemic. Ann Rheum Dis 2021, 80, 151-153, doi:10.1136/annrheumdis-2020-218372.
4. Batshon, G.; Elayyan, J.; Qiq, O.; Reich, E.; Ben-Aderet, L.; Kandel, L.; Haze, A.; Steinmeyer, J.; Lefebvre, V.; Zhang, H.; et al. Serum NT/CT SIRT1 ratio reflects early osteoarthritis and chondrosenescence. Ann Rheum Dis 2020, 79, 1370-1380, doi:10.1136/annrheumdis-2020-217072.
5. Amanatullah, D.F.; Murasko, M.J.; Chona, D.V.; Crijns, T.J.; Ring, D.; Kamal, R.N. Financial Distress and Discussing the Cost of Total Joint Arthroplasty. J Arthroplasty 2018, 33, 3394-3397, doi:10.1016/j.arth.2018.07.010.
6. Gay, C.; Guiguet-Auclair, C.; Coste, N.; Boisseau, N.; Gerbaud, L.; Pereira, B.; Coudeyre, E. Limited effect of a self-management exercise program added to spa therapy for increasing physical activity in patients with knee osteoarthritis: A quasi-randomized controlled trial. Ann Phys Rehabil Med 2020, 63, 181-188, doi:10.1016/j.rehab.2019.10.006.
7. Gauthier, J.; Vincent, A.T.; Charette, S.J.; Derome, N. A brief history of bioinformatics. Brief Bioinform 2019, 20, 1981-1996, doi:10.1093/bib/bby063.
8. Wang, T.; Zheng, X.; Li, R.; Liu, X.; Wu, J.; Zhong, X.; Zhang, W.; Liu, Y.; He, X.; Liu, W.; et al. Integrated bioinformatic analysis reveals YWHAB as a novel diagnostic biomarker for idiopathic pulmonary arterial hypertension. J Cell Physiol 2019, 234, 6449-6462, doi:10.1002/jcp.27381.
9. Zhao, X.; Zhang, L.; Wang, J.; Zhang, M.; Song, Z.; Ni, B.; You, Y. Identification of key biomarkers and immune infiltration in systemic lupus erythematosus by integrated bioinformatics analysis. J Transl Med 2021, 19, 35, doi:10.1186/s12967-020-02698-x.
10. Chen, Y.; Liao, R.; Yao, Y.; Wang, Q.; Fu, L. Machine learning to identify immune-related biomarkers of rheumatoid arthritis based on WGCNA network. Clin Rheumatol 2022, 41, 1057-1068, doi:10.1007/s10067-021-05960-9.
11. Sun, R.; Li, S.; Zhao, K.; Diao, M.; Li, L. Identification of Ten Core Hub Genes as Potential Biomarkers and Treatment Target for Hepatoblastoma. Front Oncol 2021, 11, 591507, doi:10.3389/fonc.2021.591507.
12. Yang, Z.; Yan, G.; Zheng, L.; Gu, W.; Liu, F.; Chen, W.; Cui, X.; Wang, Y.; Yang, Y.; Chen, X.; et al., as a potential predictor of prognosis and immunotherapy response for oral squamous cell carcinoma, is related to cell invasion, metastasis, and CD8+ T cell infiltration. Oncoimmunology 2021, 10, 1938890, doi:10.1080/2162402X.2021.1938890.
13. Li, K.; Li, S.; Zhang, H.; Lei, D.; Lo, W.L.A.; Ding, M. Computational Analysis of the Immune Infiltration Pattern and Candidate Diagnostic Biomarkers in Lumbar Disc Herniation. Front Mol Neurosci 2022, 15, 846554, doi:10.3389/fnmol.2022.846554.
14. You, J.-A.; Gong, Y.; Wu, Y.; Jin, L.; Chi, Q.; Sun, D. WGCNA, LASSO and SVM Algorithm Revealed RAC1 Correlated M0 Macrophage and the Risk Score to Predict the Survival of Hepatocellular Carcinoma Patients. Front Genet 2021, 12, 730920, doi:10.3389/fgene.2021.730920.
15. Edgar, R.; Domrachev, M.; Lash, A.E. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 2002, 30, 207-210, doi:10.1093/nar/30.1.207.
16. Ritchie, M.E.; Phipson, B.; Wu, D.; Hu, Y.; Law, C.W.; Shi, W.; Smyth, G.K. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 2015, 43, e47, doi:10.1093/nar/gkv007.
17. Wickham, H. ggplot2: Elegant Graphics for Data Analysis; 2009.
18. Langfelder, P.; Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008, 9, 559, doi:10.1186/1471-2105-9-559.
19. Wu, T.; Hu, E.; Xu, S.; Chen, M.; Guo, P.; Dai, Z.; Feng, T.; Zhou, L.; Tang, W.; Zhan, L.; et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation (Camb) 2021, 2, 100141, doi:10.1016/j.xinn.2021.100141.
20. Tibshirani, R. Regression shrinkage and selection via the lasso. J R Stat Soc Series B 1996, 58.
21. Friedman, J.; Hastie, T.; Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw 2010, 33.
22. Huang, M.-L.; Hung, Y.-H.; Lee, W.M.; Li, R.K.; Jiang, B.-R. SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier. ScientificWorldJournal 2014, 2014, 795624, doi:10.1155/2014/795624.
23. Breiman, L. Random forests. Mach Learn 2001, 45, 5-32.
24. Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18-22.
25. Robin, X.; Turck, N.; Hainard, A.; Tiberti, N.; Lisacek, F.; Sanchez, J.-C.; Müller, M. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011, 12, 77, doi:10.1186/1471-2105-12-77.
26. Xue, G.; Hua, L.; Zhou, N.; Li, J. Characteristics of immune cell infiltration and associated diagnostic biomarkers in ulcerative colitis: results from bioinformatics analysis. Bioengineered 2021, 12, 252-265, doi:10.1080/21655979.2020.1863016.
27. Wickham, H. Reshaping data with the reshape package. J Stat Softw 2007, 21, 1-20.
28. Conaghan, P.G.; Cook, A.D.; Hamilton, J.A.; Tak, P.P. Therapeutic options for targeting inflammatory osteoarthritis pain. Nat Rev Rheumatol 2019, 15, 355-363, doi:10.1038/s41584-019-0221-y.
29. Blanco, F.J.; Rego-Pérez, I. Editorial: Is it time for epigenetics in osteoarthritis? Arthritis Rheumatol 2014, 66, 2324-2327, doi:10.1002/art.38710.
30. Rosshirt, N.; Hagmann, S.; Tripel, E.; Gotterbarm, T.; Kirsch, J.; Zeifang, F.; Lorenz, H.M.; Tretter, T.; Moradi, B. A predominant Th1 polarization is present in synovial fluid of end-stage osteoarthritic knee joints: analysis of peripheral blood, synovial fluid and synovial membrane. Clin Exp Immunol 2019, 195, 395-406, doi:10.1111/cei.13230.
31. Karhade, A.V.; Schwab, J.H. Introduction to The Spine Journal special issue on artificial intelligence and machine learning. Spine J 2021, 21, 1601-1603, doi:10.1016/j.spinee.2021.03.028.
32. Ni, Z.; Kuang, L.; Chen, H.; Xie, Y.; Zhang, B.; Ouyang, J.; Wu, J.; Zhou, S.; Chen, L.; Su, N.; et al. The exosome-like vesicles from osteoarthritic chondrocyte enhanced mature IL-1β production of macrophages and aggravated synovitis in osteoarthritis. Cell Death Dis 2019, 10, 522, doi:10.1038/s41419-019-1739-2.
33. Zhang, H.; Ge, J.; Lu, X. CircFADS2 is downregulated in osteoarthritis and suppresses LPS-induced apoptosis of chondrocytes by regulating miR-195-5p methylation. Arch Gerontol Geriatr 2021, 96, 104477, doi:10.1016/j.archger.2021.104477.
34. Na, H.S.; Park, J.-S.; Cho, K.-H.; Kwon, J.Y.; Choi, J.; Jhun, J.; Kim, S.J.; Park, S.-H.; Cho, M.-L. Interleukin-1-Interleukin-17 Signaling Axis Induces Cartilage Destruction and Promotes Experimental Osteoarthritis. Front Immunol 2020, 11, 730, doi:10.3389/fimmu.2020.00730.
35. Yuan, Q.; Ren, J.; Wang, Z.; Ji, L.; Deng, D.; Shang, D. Identification of the Real Hub Gene and Construction of a Novel Prognostic Signature for Pancreatic Adenocarcinoma Based on the Weighted Gene Co-expression Network Analysis and Least Absolute Shrinkage and Selection Operator Algorithms. Front Genet 2021, 12, 692953, doi:10.3389/fgene.2021.692953.
36. Ding, C.; Bao, T.-Y.; Huang, H.-L. Quantum-Inspired Support Vector Machine. IEEE Trans Neural Netw Learn Syst 2021, PP, doi:10.1109/TNNLS.2021.3084467.
37. Chen, X.; Ishwaran, H. Random forests for genomic data analysis. Genomics 2012, 99, 323-329, doi:10.1016/j.ygeno.2012.04.003.
38. Cowburn, A.S.; Sladek, K.; Soja, J.; Adamek, L.; Nizankowska, E.; Szczeklik, A.; Lam, B.K.; Penrose, J.F.; Austen, F.K.; Holgate, S.T.; et al. Overexpression of leukotriene C4 synthase in bronchial biopsies from patients with aspirin-intolerant asthma. J Clin Invest 1998, 101, 834-846, doi:10.1172/jci620.
39. Cai, Y.; Bjermer, L.; Halstensen, T.S. Bronchial mast cells are the dominating LTC4S-expressing cells in aspirin-tolerant asthma. Am J Respir Cell Mol Biol 2003, 29, 683-693, doi:10.1165/rcmb.2002-0174OC.
40. Negri, J.; Early, S.B.; Steinke, J.W.; Borish, L. Corticosteroids as inhibitors of cysteinyl leukotriene metabolic and signaling pathways. J Allergy Clin Immunol 2008, 121, 1232-1237, doi:10.1016/j.jaci.2008.02.007.
41. Li, L.; Lv, G.; Wang, B.; Kuang, L. The role of lncRNA XIST/miR-211 axis in modulating the proliferation and apoptosis of osteoarthritis chondrocytes through CXCR4 and MAPK signaling. Biochem Biophys Res Commun 2018, 503, 2555-2562, doi:10.1016/j.bbrc.2018.07.015.
42. Watt, F.E.; Hamid, B.; Garriga, C.; Judge, A.; Hrusecka, R.; Custers, R.J.H.; Jansen, M.P.; Lafeber, F.P.; Mastbergen, S.C.; Vincent, T.L. The molecular profile of synovial fluid changes upon joint distraction and is associated with clinical response in knee osteoarthritis. Osteoarthritis Cartilage 2020, 28, 324-333, doi:10.1016/j.joca.2019.12.005.
43. Yang, P.; Tan, J.; Yuan, Z.; Meng, G.; Bi, L.; Liu, J. Expression profile of cytokines and chemokines in osteoarthritis patients: Proinflammatory roles for CXCL8 and CXCL11 to chondrocytes. Int Immunopharmacol 2016, 40, 16-23, doi:10.1016/j.intimp.2016.08.005.
44. Ha, Y.J.; Choi, Y.S.; Han, D.W.; Kang, E.H.; Yoo, I.S.; Kim, J.H.; Kang, S.W.; Lee, E.Y.; Song, Y.W.; Lee, Y.J. PIM-1 kinase is a novel regulator of proinflammatory cytokine-mediated responses in rheumatoid arthritis fibroblast-like synoviocytes. Rheumatology (Oxford) 2019, 58, 154-164, doi:10.1093/rheumatology/key261.
45. Maney, N.J.; Lemos, H.; Barron-Millar, B.; Carey, C.; Herron, I.; Anderson, A.E.; Mellor, A.L.; Isaacs, J.D.; Pratt, A.G. Pim Kinases as Therapeutic Targets in Early Rheumatoid Arthritis. Arthritis Rheumatol 2021, 73, 1820-1830, doi:10.1002/art.41744.
46. Takata, K.; Uchida, K.; Takano, S.; Mukai, M.; Inoue, G.; Sekiguchi, H.; Aikawa, J.; Miyagi, M.; Iwase, D.; Takaso, M. Possible Regulation of bFGF Expression by Mast Cells in Osteoarthritis Patients with Obesity: A Cross-Sectional Study. Diabetes Metab Syndr Obes 2021, 14, 3291-3297, doi:10.2147/dmso.S319537.
47. Wang, Q.; Lepus, C.M.; Raghu, H.; Reber, L.L.; Tsai, M.M.; Wong, H.H.; von Kaeppler, E.; Lingampalli, N.; Bloom, M.S.; Hu, N.; et al. IgE-mediated mast cell activation promotes inflammation and cartilage destruction in osteoarthritis. Elife 2019, 8, doi:10.7554/eLife.39905.
48. Kulkarni, P.; Harsulkar, A.; Märtson, A.-G.; Suutre, S.; Märtson, A.; Koks, S. Mast Cells Differentiated in Synovial Fluid and Resident in Osteophytes Exalt the Inflammatory Pathology of Osteoarthritis. Int J Mol Sci 2022, 23, doi:10.3390/ijms23010541.
49. Hsieh, J.-L.; Shiau, A.-L.; Lee, C.-H.; Yang, S.-J.; Lee, B.-O.; Jou, I.M.; Wu, C.-L.; Chen, S.-H.; Shen, P.-C. CD8+ T cell-induced expression of tissue inhibitor of metalloproteinses-1 exacerbated osteoarthritis. Int J Mol Sci 2013, 14, 19951-19970, doi:10.3390/ijms141019951.
50. Zhu, W.; Zhang, X.; Jiang, Y.; Liu, X.; Huang, L.; Wei, Q.; Huang, Y.; Wu, W.; Gu, J. Alterations in peripheral T cell and B cell subsets in patients with osteoarthritis. Clin Rheumatol 2020, 39, 523-532, doi:10.1007/s10067-019-04768-y.

Keywords: osteoarthritis, machine learning, WGCNA, potential biomarkers, immune analysis
Posted on Research Square: April 29th, 2024 (Version 1)
