Identification and validation of novel characteristic genes based on multi-tissue osteoarthritis | Research Square

Guihao Zheng, Yulong Ouyang, Shuilin Chen, Bei Hu, Shuai Xu, Guicai Sun

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-4706641/v1 This work is licensed under a CC BY 4.0 License. Status: Posted. Version 1 posted.

Abstract

Background Osteoarthritis (OA) is characterized by synovial inflammation, articular cartilage degradation, and subchondral bone changes. Currently, there are no reliable biomarkers for the diagnosis and treatment of OA. Therefore, exploring OA biomarkers is crucial for its prevention, diagnosis, and treatment. Materials and Methods The GSE51588, GSE12021, GSE55457, GSE56409, GSE114007, GSE168505, GSE169077, GSE55235, GSE129147, and GSE48556 datasets of patients with OA and normal control samples were obtained from the GEO database.
Differentially expressed genes (DEGs) in OA and normal controls were identified using R language. Protein-protein interaction (PPI) network and module analysis were performed to screen and filter key genes. Enrichment analyses were conducted to determine the biological functions and pathways of key DEGs and predict potential transcription factors. Machine learning models (XGBoost, LASSO regression, and SVM) were used to identify the best characteristic genes, and the intersection of hub genes was used as the final diagnostic genes. ROC analysis and nomogram were used to evaluate the diagnostic value of candidate genes. The expression levels of characteristic genes were validated in external GEO datasets containing cartilage, synovial membrane, and blood samples from patients. The expression levels of the key gene IRS2 in chondrocytes were further confirmed through in vitro experiments. Results Fifteen OA characteristic genes (IRS2, ADM, SIK1, PTN, CX3CR1, WNT5A, IL21R, APOD, CRLF1, FKBP5, PNMAL1, NPR3, RARRES1, ASPN, POSTN) were identified using three machine learning algorithms. Enrichment analysis indicated that abnormal expression of DEGs and hub genes may be mediated by extracellular matrix organization, extracellular structure organization, Relaxin signaling pathway, IL-17 signaling pathway, AGE-RAGE signaling pathway in diabetic complications, and PI3K-Akt signaling pathway, which are involved in OA occurrence. Four diagnostic genes (IRS2, WNT5A, PTN, POSTN) were highly correlated with OA. Validation data set analysis showed that IRS2 was down-regulated, while WNT5A, PTN, and POSTN were up-regulated in the experimental group compared to the normal group. qRT-PCR and WB results verified that the expression level of diagnostic gene IRS2 was consistent with bioinformatics analysis results. Conclusion This study integrates bioinformatics analysis and machine learning algorithms to identify and validate four promising biomarkers: IRS2, WNT5A, PTN, and POSTN. 
POSTN can be used as a biomarker for OA cartilage, and the potential of PTN for early OA diagnosis deserves attention. WNT5A and IRS2 offer new diagnostic perspectives for OA.

Keywords: Osteoarthritis, Biomarkers, Machine Learning, Bioinformatics, Tissue-specific expressed genes

Introduction

Osteoarthritis (OA) remains a global health challenge 1 , affecting an estimated 600 million people worldwide, or 7.6% of the world's population 2 , with a higher prevalence in women 3 . The incidence and disability rate of OA have been rising since 1990, increasing by as much as 8% to 10% 4 . Its prevalence is expected to nearly double in the next decade 5 . Additionally, the socioeconomic costs of OA are substantial, estimated at $80 billion in 2016 alone and growing at a rate of 5.7% per year 4 . As the population ages, the prevalence of OA increases year by year, and in the elderly, the risk of mobility impairment and disability caused by OA surpasses that of any other disease 6,7 . OA is characterized by the degradation of the extracellular matrix (ECM), decreased synthesis of type II collagen, increased production of matrix metalloproteinase 13 (MMP13), and activation of Cyclooxygenase-2 (COX2) by destructive intracellular cytokines. These cytokines promote the conversion of arachidonic acid into inflammatory prostaglandins, resulting in an inflammatory response 8 . During this process, the synovial membrane, cartilage, and subchondral bone interact with each other, jointly promoting disease development 9-11 . Although the etiology of OA is multi-factorial, the exact cause remains unknown. Therefore, developing new OA-related targets for better diagnosis and treatment of patients with OA is urgently needed and remains a focus and challenge of current research. With the rapid development of high-throughput sequencing technology, an increasing number of OA-related datasets have been generated and utilized 12-15 .
Although some studies have focused on OA biomarkers, most are based on single tissues or individual datasets 16 . The occurrence and development of OA are closely related to the interactions among cartilage, subchondral bone, and synovitis. Despite some controversy regarding the order of pathological events, it is undeniable that these components are all involved in disease progression 17,18 , which may lead to the unreliability of current research results 19-22 . The application of machine learning (ML) in OA biomarker development has garnered growing research interest. For instance, Chen et al. 22 applied synovial tissue microarray datasets and identified ZBTB16, TNFSF11, SCRG1, and KDELR3 as diagnostic biomarkers for OA. A study by Han et al.[14] also reported that TLR7, CSF1R, APOE, C1QA, and CCL5 are key genes with high diagnostic value for OA after applying weighted gene co-expression network analysis (WGCNA) and the cytoHubba algorithm. Although these findings may contribute to a better understanding of OA and provide new insights into its diagnosis and treatment, the limited sample sizes and number of algorithms used may undermine the reliability and robustness of these studies to some extent. In this study, we aim to integrate public OA datasets from subchondral bone, cartilage, and synovium tissues to discover and validate meaningful OA biomarkers. We employed Robust Rank Aggregation (RRA) and Surrogate Variable Analysis (SVA) to jointly screen for differentially expressed genes. Additionally, we used machine learning simulations to identify the best method for screening disease biomarkers and validated these biomarkers in OA cartilage, synovium, and blood samples. Furthermore, we investigated the biological processes and predicted transcription factors of disease hub genes. 
Materials and Methods

2.1 Data Collection

The datasets GSE51588, GSE12021, GSE55457, GSE56409, GSE114007, GSE168505, GSE169077, GSE55235, GSE129147, and GSE48556 were obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). GSE51588 is a whole-genome expression profile of subchondral bone. GSE12021, GSE55457, GSE56409, and GSE55235 contain data measured in synovial samples from OA patients and normal controls. GSE114007, GSE168505, GSE169077, and GSE129147 are RNA sequencing data from joint cartilage tissues, while GSE48556 contains data from blood mononuclear cells of OA patients and normal controls. All these data are derived from human samples.

2.2 Identification of Differentially Expressed Genes

2.2.1 Identification of DEGs Using the RRA Method

The seven raw datasets were normalized and log-transformed using the limma package in Bioconductor. DEGs were then analyzed with limma, and ranked lists of upregulated and downregulated DEGs were generated for each dataset based on their fold changes. Finally, we integrated the results of these seven datasets using the R package "RobustRankAggreg", which implements the robust rank aggregation (RRA) method, to identify the most significant DEGs. In the RRA analysis, genes with an absolute log fold change (|logFC|) >1 and an adjusted p-value <0.05 were selected as significant DEGs.

2.2.2 Identification of Differentially Expressed Genes (DEGs) Using the SVA Method

Before merging the seven microarray datasets, we used the ComBat function in the Surrogate Variable Analysis (SVA) package to correct for batch effects, aiming to minimize experimental variance before subsequent analyses. Box plots and Principal Component Analysis (PCA) were used to assess the data before and after correction. The final merged dataset consisted of 167 samples, including 66 normal control samples and 101 OA samples.
In the SVA analysis, genes with an absolute log fold change (|logFC|) >1 and an adjusted p-value <0.05 were selected as significant DEGs. Finally, the common genes identified by both the RRA and SVA methods were considered as key genes for OA.

2.3 Gene Ontology (GO) Functional Annotation, Kyoto Encyclopedia of Genes and Genomes (KEGG) Analysis, and Disease Ontology (DO) Enrichment Analysis

GO functional annotation, KEGG analysis, and DO enrichment analysis were performed using R packages (enrichplot, RColorBrewer, Heatmap, etc.). A p-value <0.05 was considered statistically significant for enrichment.

2.4 Identification of Gene Clusters and Construction of Protein-Protein Interaction Networks

The STRING database (https://string-db.org/) was used to construct the protein-protein interaction (PPI) network, which was then imported into Cytoscape v3.9.1 software to visualize the key nodes in the molecular interaction network. The cytoHubba algorithms were employed to analyze the PPI network. Only genes identified as hub genes by all cytoHubba calculation methods ("MCC", "DMNC", "MNC", "Degree", "EPC", "BottleNeck", "EcCentricity", "Closeness", "Radiality", "Betweenness", "Stress", "ClusteringCoefficient") were included in the final selection. The interaction genes and functions of these hub genes were predicted, and relevant PPI networks were generated using the GeneMANIA database (https://genemania.org/). Enrichment analysis of the hub genes was further conducted using GO functional terms and KEGG pathway enrichment analysis, with a p-value <0.05 considered statistically significant.

2.5 Construction of Transcriptional Regulatory Networks

To identify substantial transcriptional changes and gain deeper insights into the regulatory roles of key osteoarthritis (OA) genes, we used TRRUST (https://www.grnpedia.org/trrust/), a database built by sentence-based text mining, to uncover transcriptional regulatory relationships.
Subsequently, we assessed the enrichment of hub genes to identify the corresponding transcription factors (TFs). Finally, we constructed the TF regulatory network using Cytoscape software and validated the TFs that regulate multiple hub genes.

2.6 Machine Learning

We used the `train` function from the caret package to compare several machine learning methods and identify the best-performing models: Least Absolute Shrinkage and Selection Operator (LASSO), Support Vector Machines (SVM), Decision Trees (DT), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Generalized Linear Model (GLM). LASSO regression is a linear regression technique that uses L1 regularization to constrain the complexity of the model, thus achieving feature selection and preventing overfitting 23 . SVM is a powerful supervised learning model that can be used for classification and regression tasks. It has shown wide applicability and superior performance in high-dimensional data, complex data structures, and small sample learning 24 . DT is a supervised learning algorithm that splits nodes by selecting the features that best divide the data 25 . RF is an ensemble learning method that improves model performance and stability by voting (classification) or averaging (regression) the results of multiple decision trees. It is very powerful in identifying subtle patterns in complex datasets 26 . XGBoost is a gradient boosting framework that builds ensembles of tree models efficiently and flexibly. It is widely praised for its performance and efficiency and is widely used in machine learning competitions and real-world problems 27 . GLM is an extension of linear models that can be applied to a wide range of data types, with strong interpretability and good applicability. After training and testing these models, we selected the best three to analyze the differentially expressed genes and identify the optimal feature genes.
The results obtained were intersected with the hub genes to serve as diagnostic genes for the disease.

2.7 Validation of Feature Genes with Receiver Operating Characteristic (ROC) Curve and Construction of Clinical Diagnostic Model

To validate the importance of candidate genes in diagnosis, we used the "rms" R package to construct a nomogram and to predict and interpret the constructed model. Box plots and violin plots were used to visualize the differential expression of genes between the control and OA groups. We further evaluated the diagnostic value of the candidate genes through ROC analysis, obtaining the area under the ROC curve (AUC) and the 95% confidence interval (CI). An AUC value >0.7 was considered to indicate good diagnostic performance.

2.8 Validation of Feature Genes by Western Blot (WB) and Quantitative Real-Time PCR (qRT-PCR)

Human SW1353 chondrocytes, commonly used for OA modeling 8,28 , were cultured in an incubator under standard conditions (37 °C, 5% CO2) using specialized SW1353 culture medium (DMEM, Pricella, CN), with the medium changed every 2-3 days. When SW1353 cells reached 50% confluence, they were stimulated with 10 ng/ml IL-1β (Peprotech, USA) for 24 hours to model OA 29-31 . The treated chondrocytes were lysed in RIPA lysis buffer (Solarbio, CN) containing 1% protease and phosphatase inhibitors (GlpBio, USA) for 30 minutes. The lysates were collected by scraping the adherent cells thoroughly, then incubated on a shaker at 4 °C for 1 hour before centrifugation to collect the supernatant for subsequent experiments. The protein concentration in the supernatant was measured using a BCA kit (Solarbio, CN) on a microplate reader (Thermo Fisher Scientific, USA) at a wavelength of 562 nm. Subsequently, equal amounts of sample protein and protein markers were separated by 10% sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) and transferred to PVDF membranes (Millipore, USA).
After blocking with 5% BSA (Servicebio, Wuhan, CN) for 1 hour, the membranes were incubated with primary antibodies overnight at 4 °C. After washing three times with Tris-buffered saline containing Tween (TBST), the membranes were incubated with secondary antibodies at room temperature for 2 hours. Details of the primary and secondary antibodies are listed in Table 1. Purified protein bands were visualized using enhanced chemiluminescence reagent (YEASEN, CN) and imaged with a Bio-Rad scanner (Bio-Rad, USA). GAPDH was used as the internal reference (housekeeping) control. Band analysis was performed using ImageJ software.

Table 1 Antibody Information

Name                  Type                Species  Source
COL2A1                Primary antibody    Rabbit   Proteintech/China
IRS2                  Primary antibody    Mouse    Proteintech/China
MMP13                 Primary antibody    Rabbit   Proteintech/China
SOX9                  Primary antibody    Rabbit   Zenbio/China
COX2                  Primary antibody    Rabbit   Zenbio/China
GAPDH                 Primary antibody    Mouse    Proteintech/China
Goat anti-Rabbit IgG  Secondary antibody  -        Proteintech/China
Goat anti-Mouse IgG   Secondary antibody  -        Proteintech/China

Similarly, total RNA was extracted from the treated chondrocytes using Trizol reagent (TaKaRa, JPN) and purified with an RNA extraction kit (TransGen, CN). The total RNA concentration of the samples was measured using a micro-UV spectrophotometer (Thermo Fisher Scientific, USA). The RNA samples were then reverse transcribed into cDNA using a reverse transcription kit (TransGen, CN). Finally, the target mRNA was amplified on a real-time fluorescence quantitative PCR instrument (Bio-Rad, USA). The final quantitative results were normalized to GAPDH and calculated using the 2^(-ΔΔCt) method. The primer sequences for the target genes are shown in Table 2.
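The 2^(-ΔΔCt) calculation mentioned above can be illustrated with a minimal sketch. The function name and the Ct values are hypothetical and chosen only to make the arithmetic easy to follow; they are not measurements from this study, and the reference gene is assumed to be GAPDH as stated in the text.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the 2^(-ΔΔCt) method.

    Each argument is a cycle-threshold (Ct) value. The reference gene
    (assumed here to be GAPDH) normalizes for the amount of input cDNA.
    """
    # ΔCt: target Ct minus reference Ct, within each condition
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    # ΔΔCt: treated ΔCt minus control ΔCt
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values (not data from the study):
# treated ΔCt = 26 - 18 = 8, control ΔCt = 25 - 18 = 7, so ΔΔCt = 1
fold_change = relative_expression(26.0, 18.0, 25.0, 18.0)
print(fold_change)  # 0.5, i.e. the target is halved relative to control
```

A value below 1 indicates lower expression in the treated group than in the control group, and a value above 1 indicates higher expression.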
Table 2 Primer Information

Primer Name  Direction  Primer Sequence (5'-3')  Species
COL2A1       Forward    TGGACGATCAGGCGAAACC      Human
COL2A1       Reverse    GCTGCGGATGCTCTCAATCT     Human
IRS2         Forward    CGGTGAGTTCTACGGGTACAT    Human
IRS2         Reverse    TCAGGGTGTATTCATCCAGCG    Human
MMP13        Forward    ACTGAGAGGCTCCGAGAAATG    Human
MMP13        Reverse    GAACCCCGCATCTTGGCTT      Human
SOX9         Forward    AGCGAACGCACATCAAGAC      Human
SOX9         Reverse    CTGTAGGCGATCTGTTGGGG     Human
COX2         Forward    GCACCCCGACATAGAGAGC      Human
COX2         Reverse    CTGCGGAGTGCAGTGTTCT      Human
GAPDH        Forward    TGTGGGCATCAATGGATTTGG    Human
GAPDH        Reverse    ACACCATGTATTCCGGGTCAAT   Human

2.9 Statistical Analysis

Data were analyzed using R 4.3.0 (https://www.rproject.org/), Cytoscape 3.9.1 (https://cytoscape.org/index.html), Perl 5.32.1 (https://www.perl.org), and GraphPad Prism 9.7 (https://www.graphpad.com/) software. Continuous data were described as mean ± standard deviation. For comparisons between two groups of continuous data, a t-test was used if the data followed a normal distribution and had equal variances; if the data did not follow a normal distribution, the non-parametric Mann-Whitney U test was employed. All experiments were independently repeated three times, and a P value less than 0.05 was considered statistically significant.

Results

3.1 Identification of DEGs using RRA Method

Table 3 describes the sample characteristics of the datasets included in our study, including dataset ID, platform ID, total number of samples, number of normal samples, number of OA samples, and grouping information. Differential expression was analyzed for each dataset, and based on the DEG results from each dataset, the RRA method identified a total of 202 significantly up-regulated and 49 significantly down-regulated DEGs. A heatmap displays the top 50 up-regulated and top 10 down-regulated DEGs (Figure 1A).
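The per-dataset selection and ranking step that feeds the RRA integration (Section 2.2.1) can be sketched as follows. This is an illustration only: it is not the limma or RobustRankAggreg code used in the study, the function name and the gene statistics are hypothetical, and it assumes that the stated logFC threshold applies to the absolute value so that down-regulated genes can be selected.

```python
from typing import Dict, List, Tuple


def split_and_rank_degs(
    stats: Dict[str, Tuple[float, float]],
    logfc_cutoff: float = 1.0,
    padj_cutoff: float = 0.05,
) -> Tuple[List[str], List[str]]:
    """Return (up, down) gene lists ranked by fold change.

    `stats` maps a gene symbol to (logFC, adjusted p-value).
    Assumption: the cutoff is applied to |logFC|.
    """
    up = [(gene, lfc) for gene, (lfc, padj) in stats.items()
          if padj < padj_cutoff and lfc > logfc_cutoff]
    down = [(gene, lfc) for gene, (lfc, padj) in stats.items()
            if padj < padj_cutoff and lfc < -logfc_cutoff]
    # Strongest up-regulation first; strongest down-regulation first
    up.sort(key=lambda item: item[1], reverse=True)
    down.sort(key=lambda item: item[1])
    return [gene for gene, _ in up], [gene for gene, _ in down]


# Hypothetical statistics for one dataset (placeholder gene names)
example_stats = {
    "GENE_A": (2.3, 0.001),
    "GENE_B": (1.4, 0.010),
    "GENE_C": (-1.8, 0.002),
    "GENE_D": (0.4, 0.0001),   # significant but below the fold-change cutoff
    "GENE_E": (3.0, 0.200),    # large change but not significant
    "GENE_F": (-2.5, 0.010),
}
up_ranked, down_ranked = split_and_rank_degs(example_stats)
print(up_ranked)    # ['GENE_A', 'GENE_B']
print(down_ranked)  # ['GENE_F', 'GENE_C']
```

One ranked list of this kind per dataset is the input that a rank aggregation method would then combine; the aggregation itself is not reproduced here.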
Table 3 Sample Characteristics of Included Datasets

GEO dataset  Platform           Tissue            Total  Normal  OA   Group
GSE51588     GPL13497           Subchondral Bone  50     10      40   Train
GSE12021     GPL96              Synovium          19     9       10   Train
GSE55457     GPL96              Synovium          20     10      10   Train
GSE56409     GPL570             Synovium          22     11      11   Train
GSE114007    GPL11154/GPL18573  Cartilage         38     18      20   Train
GSE168505    GPL16791           Cartilage         7      3       4    Train
GSE169077    GPL96              Cartilage         11     5       6    Train
GSE55235     GPL96              Synovium          20     10      10   Test
GSE129147    GPL6947            Cartilage         40     7       33   Test
GSE48556     GPL6947            Blood             139    33      106  Test

3.2 Identification of DEGs using SVA Method

Box plots (Figures S1A-S1B) and PCA cluster plots (Figures S1C-S1D) present the gene expression matrix of the seven combined datasets before and after removing batch effects. The results indicate that, after normalization and batch effect adjustment, the batch effects between different datasets are not obvious, suggesting that the final combined dataset is suitable for subsequent analysis. The volcano plot and heatmap in Figures 1B-1C show the genetic differences between OA and normal controls in the combined dataset. Among these, 30 genes were found to be down-regulated and 64 genes were up-regulated. A total of 83 genes, identified by both RRA and SVA methods, were considered key genes for OA and retained for further analysis (Figure 1D).

3.3 Functional Enrichment and Pathway Analysis of Key Genes

We analyzed the GO annotations, KEGG pathways, and DO enrichment for these key genes to explore their potential biological information. GO analysis results indicate that, in the biological process (BP) category, the genes are involved in extracellular matrix organization, extracellular structure organization, and external encapsulating structure organization. In the cellular component (CC) category, they are associated with the collagen-containing extracellular matrix, complex of collagen trimers, and fibrillar collagen trimer.
The enriched molecular function (MF) terms include extracellular matrix structural constituent, extracellular matrix structural constituent conferring tensile strength, and platelet-derived growth factor binding (Figures 1E, 1H). KEGG enrichment results show that key genes are primarily enriched in the Relaxin signaling pathway, Protein digestion and absorption, AGE-RAGE signaling pathway in diabetic complications, and Rheumatoid arthritis (Figures 1F, 1I). DO enrichment analysis results (Figures 1G, 1J) indicate that the key genes are enriched in diseases including degenerative disc disease, connective tissue cancer, cell type benign neoplasm, Osteoarthritis, and musculoskeletal system cancer.

3.4 Construction of PPI Network for Key Genes

We used the key genes to construct a PPI network in STRING to understand the potential connections between proteins, with a minimum required interaction score of 0.4, which is STRING's medium-confidence threshold. The PPI enrichment p-value was <1.0e-16, indicating that the proteins are more interconnected than expected by chance. Only interactions with a score greater than 0.4 were retained in the network. We used Cytoscape software to visualize the PPI network constructed in STRING (Figure 2A). To understand the interactions between hub genes, we used the 'cytoHubba' algorithm to identify hub genes. We applied all twelve computational methods in cytoHubba ("MCC", "DMNC", "MNC", "Degree", "EPC", "BottleNeck", "EcCentricity", "Closeness", "Radiality", "Betweenness", "Stress", "ClusteringCoefficient"), and only genes identified by all methods were included as final hub genes. In the end, we identified 25 hub genes (Figure 2B): COL1A2, COL3A1, POSTN, COL11A1, CDH11, SULF1, FAP, MMP13, MMP1, OGN, THY1, JUN, CDH2, WNT5A, ATF3, CDKN1A, GAP43, PTN, DDIT4, STMN2, TOP2A, IRS2, IGFBP1, TUBB3, and SPOCK1.
The interactions of these genes were visualized, with yellow representing up-regulated genes and pink representing down-regulated genes (Figure 2C). We also plotted the volcano plot and heatmap of the hub genes (Figures 2D-2E). GeneMANIA was used alongside PPI to evaluate the 25 hub genes and 20 interacting genes to predict relationships in terms of co-expression, shared protein domains, co-localization, and pathways (Figure 2F). The outer circle represents predicted genes, and the inner circle represents hub genes. Network analysis indicated that these genes are related to fibroblast proliferation, regulation of fibroblast proliferation, sensory organ morphogenesis, axonogenesis, extracellular matrix organization, banded collagen fibril, and collagen-containing extracellular matrix. To further understand the function of hub genes in OA, we performed enrichment analysis on the hub genes. According to GO analysis, the biological processes (BP) enriched include extracellular matrix organization, extracellular structure organization, external encapsulating structure organization, axon development, and regeneration. The cellular components (CC) enriched include collagen-containing extracellular matrix, fibrillar collagen trimer, banded collagen fibril, endoplasmic reticulum lumen, and complex of collagen trimers. The molecular functions (MF) enriched include extracellular matrix structural constituent, extracellular matrix structural constituent conferring tensile strength, integrin binding, platelet-derived growth factor binding, and SMAD binding (Figures 2G-2H). KEGG analysis indicated that the hub genes are enriched in the Relaxin signaling pathway, IL-17 signaling pathway, AGE-RAGE signaling pathway in diabetic complications, Protein digestion and absorption, and PI3K-Akt signaling pathway (Figures 2I-2J).

3.5 Association between Hub Genes and TFs

Transcription factors (TFs) are involved in gene regulation.
To explore the role of TFs, we used TRRUST to predict key TFs influencing OA through hub genes. The analysis of interactions between TFs and hub genes indicated that 49 TFs coordinated the regulation of 18 common DEGs (JUN is both a differentially expressed gene and a transcription factor), indicating a complex regulatory relationship (Figure 3A). The top-ranked TFs, based on the complexity of gene regulation, include JUN, NFKB1, SP1, ETS1, AR, RELA, HDAC4, ATF2, ATF4, ESR1, EGR1, SP3, and TP53. Additionally, we validated the expression of several of these top-ranked TFs in OA (Figures 3B-3F). These findings reveal significant relationships between OA hub genes and TFs.

3.6 Identifying Diagnostic Biomarkers through Machine Learning

To identify key diagnostic biomarkers for osteoarthritis (OA), we conducted a thorough feature selection process. Initially, we employed the `train` function from the caret package to compare machine learning algorithms and select the best-performing ones. The dataset was divided into training and validation sets, with training set proportions of either 0.7 or 0.8. Among the six machine learning methods evaluated, XGBoost, SVM, and LASSO regression consistently ranked high in terms of residuals and root mean square error (Figures 4A, 4C), with areas under the ROC curve exceeding 0.9 (Figures 4B, 4D), indicating superior diagnostic efficacy and learning performance. Consequently, we selected XGBoost, SVM, and LASSO regression to screen for key diagnostic biomarkers for OA. Using the XGBoost algorithm, we identified 23 candidate genes (Figure 4E). The SVM algorithm identified 23 key genes (Figures 4F-4G), and LASSO regression selected 26 candidate genes (Figures 4H-4I). By intersecting the feature genes identified by these three machine learning methods, we pinpointed 15 candidate genes: IRS2, ADM, SIK1, PTN, CX3CR1, WNT5A, IL21R, APOD, CRLF1, FKBP5, PNMAL1, NPR3, RARRES1, ASPN, and POSTN.
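The intersection step used to narrow these candidates (Section 2.6) can be reproduced from the gene lists reported in Sections 3.4 and 3.6. The sketch below is plain set logic, not the authors' R code, and the helper name is hypothetical.

```python
# 15 candidate genes shared by XGBoost, SVM, and LASSO (Section 3.6)
candidate_genes = [
    "IRS2", "ADM", "SIK1", "PTN", "CX3CR1", "WNT5A", "IL21R", "APOD",
    "CRLF1", "FKBP5", "PNMAL1", "NPR3", "RARRES1", "ASPN", "POSTN",
]

# 25 hub genes identified by all cytoHubba methods (Section 3.4)
hub_genes = [
    "COL1A2", "COL3A1", "POSTN", "COL11A1", "CDH11", "SULF1", "FAP",
    "MMP13", "MMP1", "OGN", "THY1", "JUN", "CDH2", "WNT5A", "ATF3",
    "CDKN1A", "GAP43", "PTN", "DDIT4", "STMN2", "TOP2A", "IRS2",
    "IGFBP1", "TUBB3", "SPOCK1",
]


def intersect_preserving_order(first, second):
    """Genes present in both lists, in the order they appear in `first`."""
    second_set = set(second)
    return [gene for gene in first if gene in second_set]


shared_genes = intersect_preserving_order(candidate_genes, hub_genes)
print(shared_genes)  # ['IRS2', 'PTN', 'WNT5A', 'POSTN']
```

Preserving the order of the first list is a presentation choice only; the resulting set of genes does not depend on it.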
Further intersecting these candidate genes with the hub genes, we identified IRS2, PTN, POSTN, and WNT5A as diagnostic biomarkers for OA (Figure 4J). Next, we constructed box plots for these candidate genes, which showed differential expression in the dataset. We used ROC curves to evaluate the specificity and sensitivity of these four features in distinguishing OA from normal tissues in the test set. The diagnostic values of the four genes were as follows: IRS2 (AUC = 0.879, 95% CI: 0.826-0.927), PTN (AUC = 0.784, 95% CI: 0.707-0.852), POSTN (AUC = 0.668, 95% CI: 0.583-0.748), and WNT5A (AUC = 0.783, 95% CI: 0.706-0.857) (Figure 5A). Since the AUC of the POSTN gene in the test dataset was 0.668, which is less than 0.7, we ultimately selected IRS2, PTN, and WNT5A as the diagnostic biomarkers for OA. We then constructed a diagnostic nomogram for OA based on the training dataset to develop a clinically applicable diagnostic model for OA (Figure 5B). The clinical calibration curve (Figure 5C) and clinical decision curve (Figure 5D) of the model clearly demonstrated its high predictive ability for OA (AUC = 0.913, 95% CI: 0.866-0.953, Figure 5E).

3.7 Validation of Relevant Screening Genes

To better validate the clinical diagnostic capability of the candidate genes, we used the cartilage tissue dataset GSE129147, the synovial tissue dataset GSE55235, and the blood sample dataset from OA patients GSE48556 for verification, with sample sizes of 40, 20, and 139, respectively. We constructed violin plots of the candidate genes, which showed differential expression in the datasets (Figures 6A-6C). Additionally, we plotted ROC curves to evaluate the diagnostic value of each gene in different tissue samples (Figures 6D-6F).
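For readers less familiar with the AUC values quoted for single genes, the following minimal sketch shows one standard way to obtain an AUC from expression values: the fraction of OA/control sample pairs in which the OA sample has the higher value, with ties counted as one half. The source does not name the software used for its ROC analysis, so this is an assumption-laden illustration rather than the authors' procedure; the function name and expression values are hypothetical.

```python
def roc_auc(scores_positive, scores_negative):
    """AUC as the probability that a positive sample scores higher.

    Equivalent to the area under the empirical ROC curve; ties between a
    positive and a negative sample contribute 0.5.
    """
    if not scores_positive or not scores_negative:
        raise ValueError("both groups must contain at least one sample")
    wins = 0.0
    for pos in scores_positive:
        for neg in scores_negative:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical expression values for an up-regulated gene (not study data)
oa_samples = [3.1, 2.4, 2.9, 1.8]
control_samples = [1.2, 2.0, 2.4]
auc = roc_auc(oa_samples, control_samples)
print(round(auc, 3))  # 0.792 (9.5 favourable pairs out of 12)
```

For a gene that is down-regulated in OA, the same function can be applied with the two groups swapped, so that the reported AUC reflects discrimination in the direction of lower expression.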
Among these, PTN had lower diagnostic efficiency in cartilage samples, POSTN had poor diagnostic efficiency in synovial samples, and only IRS2 exhibited significant differential expression (p<0.01) in all samples with good diagnostic value, indicating a high clinical value.

3.8 Experimental Validation of IRS2 Gene

To test the reliability of the candidate gene IRS2 in OA, we simulated OA by adding IL-1β to human SW1353 chondrocytes. We measured the protein expression and relative mRNA expression of the IRS2 gene in these cells through WB and qRT-PCR. Additionally, we checked OA-related phenotypes to ensure the accuracy of the OA simulation. The study data indicated that, compared to the control group, OA-related markers COL2A1 and SOX9 decreased, while MMP13 and COX2 significantly increased, consistent with OA characteristics. In the OA group, the protein and mRNA expression levels of the IRS2 gene were significantly reduced, showing statistical differences (Figures 7A-7C).

Discussion

Osteoarthritis is a common degenerative joint disease that primarily affects weight-bearing joints such as the knees and hips. It is characterized by synovial inflammation, joint cartilage degradation, and subchondral bone alterations. Synovial inflammation leads to joint swelling and pain, while cartilage degradation results in direct bone contact, causing pain and joint stiffness. Subchondral bone sclerosis and osteophyte formation further exacerbate the progression of the disease 9,32-34 . These pathological changes interact with each other, leading to joint dysfunction and a decrease in quality of life 35 . With the global population aging, the number of elderly people suffering from osteoarthritis is increasing. Meanwhile, the underlying mechanisms of OA are not yet fully understood 36,37 . Therefore, it is urgent to identify new targets related to the pathogenesis of OA to prevent its progression at an early stage.
In recent years, advances in sequencing technology and bioinformatics have made it possible to reanalyze previous osteoarthritis datasets 38,39 , thereby identifying relevant mechanisms and pathogenic targets. The application of machine learning algorithms to biomedicine has greatly facilitated a better understanding and interpretation of high-dimensional sequence information. Although some studies have made progress in decoding the heterogeneity of OA, a comprehensive understanding of the disease remains limited. Outdated algorithms have limited the reliability of results for clinical practice, so comprehensive analysis of OA across multiple cohorts using advanced machine learning algorithms is urgently needed.

In this study, we explored and validated OA across multiple tissues using bioinformatics and machine learning algorithms, aiming to provide new insights into the mechanisms of OA from multiple perspectives. First, we identified 83 DEGs from multiple tissue datasets using RRA and SVA methods. Functional enrichment, pathway analysis, and disease ontology enrichment analysis of these DEGs revealed their involvement in extracellular matrix organization, extracellular structure organization, Relaxin signaling pathway, PI3K-Akt signaling pathway, and diseases such as connective tissue cancer and musculoskeletal system cancer. Next, we constructed a PPI network and identified hub genes from the identified DEGs. Our methods identified 25 hub genes associated with OA. Functional enrichment and pathway analysis of these hub genes revealed their involvement in extracellular matrix organization, extracellular structure organization, Relaxin signaling pathway, IL-17 signaling pathway, AGE-RAGE signaling pathway in diabetic complications, and PI3K-Akt signaling pathway. These pathways and functions were highly consistent with those of the DEGs, demonstrating the representative value of the identified hub genes.
Additionally, these pathways and enrichments are in line with previous studies 19,40-42.

To better identify OA characteristic biomarkers, we employed machine learning algorithms for screening. Initially, we divided the dataset into training and test groups and evaluated the performance of six commonly used machine learning methods. We found that regardless of the proportion of the training set, XGBoost, SVM, and LASSO regression demonstrated superior performance in screening for OA biomarker features and exhibited higher diagnostic efficiency, significantly outperforming the other machine learning methods. Consequently, we selected these three algorithms for screening OA characteristic biomarkers, identifying 15 candidates: IRS2, ADM, SIK1, PTN, CX3CR1, WNT5A, IL21R, APOD, CRLF1, FKBP5, PNMAL1, NPR3, RARRES1, ASPN, and POSTN. By intersecting these genes with the identified hub genes, we pinpointed IRS2, WNT5A, PTN, and POSTN as OA diagnostic genes. Although POSTN showed statistically significant differences in OA (p<0.01), the area under its ROC curve (AUC) was below 0.7, indicating poor diagnostic performance. Thus, we constructed an OA risk prediction model based on the remaining three genes. The constructed and tested risk score nomogram could distinguish OA from normal tissues, with an AUC value of 0.913, indicating high accuracy.

We validated these diagnostic genes using external datasets of OA cartilage samples, synovial samples, and blood samples. In the cartilage dataset, PTN did not show significant expression differences and had an AUC below 0.7. In the synovial tissue dataset, POSTN did not exhibit differential expression compared to the control group. In contrast, IRS2 showed statistically significant expression differences in OA blood samples and, more broadly, across all three sample types (p<0.01).
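The 0.7 cut-off applied to the ROC curves above is a threshold on the area under the curve (AUC). As a minimal sketch, assuming one gene's expression values and binary OA/control labels, the AUC can be computed as the probability that a randomly chosen OA sample has a higher value than a randomly chosen control (the Mann-Whitney formulation). The expression values below are invented for illustration and are not data from this study; the function names `auc` and `diagnostic_auc` are hypothetical, and the text does not state which software produced the study's ROC curves.

```python
def auc(scores, labels):
    """AUC as P(score of OA sample > score of control); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # OA samples
    neg = [s for s, y in zip(scores, labels) if y == 0]  # control samples
    if not pos or not neg:
        raise ValueError("need at least one OA and one control sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def diagnostic_auc(scores, labels):
    """Direction-independent AUC, so a down-regulated gene is not penalised."""
    a = auc(scores, labels)
    return max(a, 1.0 - a)


# Toy data (assumption): 4 OA samples (label 1) and 4 controls (label 0).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
up_gene = [8.1, 7.9, 8.4, 6.9, 7.0, 6.5, 6.8, 7.2]    # higher in OA
down_gene = [5.0, 5.2, 4.8, 5.9, 6.1, 6.3, 5.8, 6.0]  # lower in OA

for name, values in [("up_gene", up_gene), ("down_gene", down_gene)]:
    value = diagnostic_auc(values, labels)
    verdict = "acceptable" if value >= 0.7 else "poor"
    print(name, value, verdict)
```

Taking the larger of the AUC and one minus the AUC is a design choice for this sketch: it lets a down-regulated marker be judged by the same 0.7 threshold as an up-regulated one.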
The AUC of IRS2 was consistently above 0.7, demonstrating good diagnostic performance. Thus, we have reason to believe that IRS2 might be involved in the majority of OA progression processes, highlighting its potential as a powerful diagnostic biomarker for OA.

Periostin (POSTN) is a 90 kDa matricellular protein discovered in 1993 43, involved in the pathogenesis of various diseases including tumors, pulmonary fibrosis, and allergic diseases, with expression levels increasing as the disease progresses in most cases 43-45. POSTN interacts with extracellular matrix (ECM) proteins to regulate cell-matrix organization, leading to remodeling and fibrosis. The unique characteristics of POSTN can be attributed to highly complex signaling pathways that lead to increased POSTN production 43. In recent years, its association with orthopedic diseases has also become apparent 46. Numerous studies have confirmed that POSTN expression increases in OA cartilage 47-50, but its expression in OA synovial tissue is controversial. Tajika et al. 51 found that it is highly expressed in OA synovial cells, while Attur et al. 52 found no statistically significant difference in its expression in degenerative OA. In OA blood, Rousseau et al. 53 found that serum POSTN is associated with the development of OA in women, but Honsawek et al. 54 found no statistically significant difference in serum POSTN between OA and control groups. Recently, Tan et al. 55 also showed that serum periostin levels are insufficient as clinical biomarkers for osteoarthritis. In our study, as shown in Figure 1A, POSTN was highly expressed in OA chondrocytes but appeared to be lowly expressed in synovial tissue. Validation with external OA datasets (Figure 6) showed that POSTN was consistently highly expressed in cartilage but trended towards low expression in synovial tissue, without statistical significance, which is inconsistent with published findings.
This discrepancy requires further investigation in additional studies.

Pleiotrophin (PTN), a member of the midkine family 56, is a secreted heparin-binding peptide expressed during development in mesodermal and neuroectodermal cells but rarely in adult tissues 57. Pufe et al. 58 found that PTN is almost undetectable in normal cartilage and synovial cells but is highly expressed in OA, with significant expression in early and middle stages of OA but not in late stages. Furthermore, Pufe et al. 59 suggested that PTN might participate in the early onset and development of OA by stimulating the activation of the AP-1 (activator protein-1) transcription factor and altering gene expression. In blood, Fadda et al. 60 found no significant difference in average PTN levels between OA patients and healthy controls, suggesting that while PTN may play an important role in OA, its potential as a disease biomarker requires larger-scale exploration and further research. In our study, as seen in Figure 6, PTN did not show statistical significance in the external cartilage tissue dataset, possibly because that dataset represented late-stage OA. However, its high expression in the OA synovial tissue dataset is consistent with previous findings. Combining these results, we believe that PTN might play a role in diagnosing early synovitis in OA.

WNT5A is a member of the Wnt protein family, which comprises a group of highly conserved signaling proteins that play crucial roles in embryonic development and various cellular processes such as cell migration, polarity, and differentiation. WNT5A operates in the non-canonical Wnt signaling pathway, particularly influencing cell movement and polarity rather than directly affecting cell proliferation 61.
The high expression of WNT5A in OA cartilage has been confirmed in numerous studies, and its mechanisms of action in OA have been widely explored 62-66. However, the impact of WNT5A in OA synovium is less understood. Lambert et al. 67 found high expression of WNT5A in OA synovium, regulated via the Wnt signaling pathway. This finding aligns with our conclusions. However, the expression of WNT5A in OA blood remains unstudied, necessitating further investigation.

Insulin Receptor Substrate 2 (IRS2) is a crucial member of the insulin receptor substrate family, playing a key role in insulin signaling and metabolic regulation 68. It is expressed in various tissues, including liver, muscle, and adipose tissue, and is involved in multiple physiological processes 69. The role of IRS2 in OA, however, has not been studied. In our research, using external cartilage, synovial, and OA blood sample datasets, we found that IRS2 consistently showed low expression. Given the absence of in vivo and in vitro studies on IRS2 in OA, we conducted in vitro experiments for validation. Through WB and qRT-PCR experiments, we confirmed that IRS2 is significantly downregulated in OA chondrocytes (p<0.001), consistent with our dataset validation results.

Conclusion

In summary, this study integrates bioinformatics analysis and machine learning algorithms to identify and validate four promising biomarkers: IRS2, WNT5A, PTN, and POSTN. POSTN can serve as a biomarker for OA cartilage, PTN shows potential for early diagnosis of OA, and WNT5A and IRS2 introduce new diagnostic perspectives for OA.

Declarations

Data availability statement

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Author contributions

Guihao Zheng and Yulong Ouyang contributed equally to this work. GHZ, YLOY, SLC, SX and BH were responsible for study concept and writing the article. GCS was responsible for reviewing and writing the article.
All authors read and approved the final manuscript.

Competing interests

There were no competing interests.

References

1. Vos, T. et al. Years lived with disability (YLDs) for 1160 sequelae of 289 diseases and injuries 1990-2010: a systematic analysis for the Global Burden of Disease Study 2010. Lancet (London, England) 380, 2163-2196, doi:10.1016/s0140-6736(12)61729-2 (2012).
2. Global, regional, and national burden of osteoarthritis, 1990-2020 and projections to 2050: a systematic analysis for the Global Burden of Disease Study 2021. The Lancet. Rheumatology 5, e508-e522, doi:10.1016/s2665-9913(23)00163-7 (2023).
3. Puig-Junoy, J. & Ruiz Zamora, A. Socio-economic costs of osteoarthritis: a systematic review of cost-of-illness studies. Seminars in arthritis and rheumatism 44, 531-541, doi:10.1016/j.semarthrit.2014.10.012 (2015).
4. Peat, G. & Thomas, M. J. Osteoarthritis year in review 2020: epidemiology & therapy. Osteoarthritis and cartilage 29, 180-189, doi:10.1016/j.joca.2020.10.007 (2021).
5. Holt, H. L. et al. Forecasting the burden of advanced knee osteoarthritis over a 10-year period in a cohort of 60-64 year-old US adults. Osteoarthritis and cartilage 19, 44-50, doi:10.1016/j.joca.2010.10.009 (2011).
6. Guccione, A. A. et al. The effects of specific medical conditions on the functional limitations of elders in the Framingham Study. American journal of public health 84, 351-358, doi:10.2105/ajph.84.3.351 (1994).
7. Chen, D. et al. Osteoarthritis: toward a comprehensive understanding of pathological mechanism. Bone research 5, 16044, doi:10.1038/boneres.2016.44 (2017).
8. Xiong, H., Huang, T. Y., Chang, Y. L. & Su, W. T. Achyranthes bidentate extracts protect the IL-1β-induced osteoarthritis of SW1353 chondrocytes. Journal of bioscience and bioengineering 136, 462-470, doi:10.1016/j.jbiosc.2023.09.008 (2023).
9. Li, X. et al. Pathological progression of osteoarthritis: a perspective on subchondral bone. Frontiers of medicine 18, 237-257, doi:10.1007/s11684-024-1061-y (2024).
10. Yang, D., Xu, K., Xu, X. & Xu, P. Revisiting prostaglandin E2: A promising therapeutic target for osteoarthritis. Clinical immunology (Orlando, Fla.) 260, 109904, doi:10.1016/j.clim.2024.109904 (2024).
11. Wen, Z. et al. Endoplasmic Reticulum Stress in Osteoarthritis: A Novel Perspective on the Pathogenesis and Treatment. Aging and disease 14, 283-286, doi:10.14336/ad.2022.0725 (2023).
12. Wang, Q. et al. Identification of a central role for complement in osteoarthritis. Nature medicine 17, 1674-1679, doi:10.1038/nm.2543 (2011).
13. Woetzel, D. et al. Identification of rheumatoid arthritis and osteoarthritis patients by transcriptome-based rule set generation. Arthritis research & therapy 16, R84, doi:10.1186/ar4526 (2014).
14. Filer, A. et al. Stromal transcriptional profiles reveal hierarchies of anatomical site, serum response and disease and identify disease specific pathways. PloS one 10, e0120917, doi:10.1371/journal.pone.0120917 (2015).
15. Fisch, K. M. et al. Identification of transcription factors responsible for dysregulated networks in human osteoarthritis cartilage by global gene expression analysis. Osteoarthritis and cartilage 26, 1531-1538, doi:10.1016/j.joca.2018.07.012 (2018).
16. Dong, S., Xia, T., Wang, L., Zhao, Q. & Tian, J. Investigation of candidate genes for osteoarthritis based on gene expression profiles. Acta orthopaedica et traumatologica turcica 50, 686-690, doi:10.1016/j.aott.2016.04.002 (2016).
17. Coppola, C. et al. Osteoarthritis: Insights into Diagnosis, Pathophysiology, Therapeutic Avenues, and the Potential of Natural Extracts. Current issues in molecular biology 46, 4063-4105, doi:10.3390/cimb46050251 (2024).
18. Luo, H., Li, L., Han, S. & Liu, T. The role of monocyte/macrophage chemokines in pathogenesis of osteoarthritis: A review. International journal of immunogenetics 51, 130-142, doi:10.1111/iji.12664 (2024).
19. Zhou, J. et al. Identification of aging-related biomarkers and immune infiltration characteristics in osteoarthritis based on bioinformatics analysis and machine learning. Frontiers in immunology 14, 1168780, doi:10.3389/fimmu.2023.1168780 (2023).
20. Han, Y. et al. Identification and development of a novel 5-gene diagnostic model based on immune infiltration analysis of osteoarthritis. Journal of translational medicine 19, 522, doi:10.1186/s12967-021-03183-9 (2021).
21. Yin, W., Lei, Y., Yang, X. & Zou, J. A two-gene random forest model to diagnose osteoarthritis based on RNA-binding protein-related genes in knee cartilage tissue. Aging 15, 193-212, doi:10.18632/aging.204469 (2023).
22. Chen, Z., Wang, W., Zhang, Y., Xue, X. & Hua, Y. Identification of four-gene signature to diagnose osteoarthritis through bioinformatics and machine learning methods. Cytokine 169, 156300, doi:10.1016/j.cyto.2023.156300 (2023).
23. Friedman, J., Hastie, T. & Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of statistical software 33, 1-22 (2010).
24. Luts, J. et al. A tutorial on support vector machine-based methods for classification problems in chemometrics. Analytica chimica acta 665, 129-145, doi:10.1016/j.aca.2010.03.030 (2010).
25. Decup, F., Léger, S., Lefèvre, S., Doméjean, S. & Grosgogeat, B. Risk factors or indicators for dental caries and tooth wear and their relative importance in adults according to age. Journal of dentistry, 105092, doi:10.1016/j.jdent.2024.105092 (2024).
26. Hart, C. R., Wilson, D. K., Pettit, C. L. & Nykaza, E. T. Machine-learning of long-range sound propagation through simulated atmospheric turbulence. The Journal of the Acoustical Society of America 149, 4384, doi:10.1121/10.0005280 (2021).
27. Islam, S. M. S. et al. Machine Learning Approaches for Predicting Hypertension and Its Associated Factors Using Population-Level Data From Three South Asian Countries. Frontiers in cardiovascular medicine 9, 839379, doi:10.3389/fcvm.2022.839379 (2022).
28. Xiao, J. et al. Icariin inhibits chondrocyte ferroptosis and alleviates osteoarthritis by enhancing the SLC7A11/GPX4 signaling. International immunopharmacology 133, 112010, doi:10.1016/j.intimp.2024.112010 (2024).
29. Chen, Y. et al. Icariin alleviates osteoarthritis through PI3K/Akt/mTOR/ULK1 signaling pathway. European journal of medical research 27, 204, doi:10.1186/s40001-022-00820-x (2022).
30. Mu, Y., Wang, L., Fu, L. & Li, Q. Knockdown of LMX1B Suppressed Cell Apoptosis and Inflammatory Response in IL-1β-Induced Human Osteoarthritis Chondrocytes through NF-κB and NLRP3 Signal Pathway. Mediators of inflammation 2022, 1870579, doi:10.1155/2022/1870579 (2022).
31. Xu, H. et al. Inhibition of CC chemokine receptor 1 ameliorates osteoarthritis in mouse by activating PPAR-γ. Molecular medicine (Cambridge, Mass.) 30, 74, doi:10.1186/s10020-024-00823-w (2024).
32. Martel-Pelletier, J. et al. Osteoarthritis. Nature reviews. Disease primers 2, 16072, doi:10.1038/nrdp.2016.72 (2016).
33. Benito, M. J., Veale, D. J., FitzGerald, O., van den Berg, W. B. & Bresnihan, B. Synovial tissue inflammation in early and late osteoarthritis. Annals of the rheumatic diseases 64, 1263-1267, doi:10.1136/ard.2004.025270 (2005).
34. Goldring, M. B. & Goldring, S. R. Osteoarthritis. Journal of cellular physiology 213, 626-634, doi:10.1002/jcp.21258 (2007).
35. Hunter, D. J., Schofield, D. & Callander, E. The individual and socioeconomic impact of osteoarthritis. Nature reviews. Rheumatology 10, 437-441, doi:10.1038/nrrheum.2014.44 (2014).
36. Hawker, G. A. et al. Understanding the pain experience in hip and knee osteoarthritis--an OARSI/OMERACT initiative. Osteoarthritis and cartilage 16, 415-422, doi:10.1016/j.joca.2007.12.017 (2008).
37. Prieto-Alhambra, D. et al. Incidence and risk factors for clinically diagnosed knee, hip and hand osteoarthritis: influences of age, gender and osteoarthritis affecting other joints. Annals of the rheumatic diseases 73, 1659-1664, doi:10.1136/annrheumdis-2013-203355 (2014).
38. Lin, J. et al. Bioinformatics analysis to identify key genes and pathways influencing synovial inflammation in osteoarthritis. Molecular medicine reports 18, 5594-5602, doi:10.3892/mmr.2018.9575 (2018).
39. Fang, C. et al. CDKN1A regulation on chondrogenic differentiation of human chondrocytes in osteoarthritis through single-cell and bulk sequencing analysis. Heliyon 10, e27466, doi:10.1016/j.heliyon.2024.e27466 (2024).
40. Sun, K. et al. The PI3K/AKT/mTOR signaling pathway in osteoarthritis: a narrative review. Osteoarthritis and cartilage 28, 400-409, doi:10.1016/j.joca.2020.02.027 (2020).
41. Xiao, J. et al. IL-17 in osteoarthritis: A narrative review. Open life sciences 18, 20220747, doi:10.1515/biol-2022-0747 (2023).
42. Wang, K., Li, Y. & Lin, J. Identification of diagnostic biomarkers for osteoarthritis through bioinformatics and machine learning. Heliyon 10, e27506, doi:10.1016/j.heliyon.2024.e27506 (2024).
43. Ono, J., Takai, M., Kamei, A., Azuma, Y. & Izuhara, K. Pathological Roles and Clinical Usefulness of Periostin in Type 2 Inflammation and Pulmonary Fibrosis. Biomolecules 11, doi:10.3390/biom11081084 (2021).
44. Yu, Y., Tan, C. M. & Jia, Y. Y. Research status and the prospect of POSTN in various tumors. Neoplasma 68, 673-682, doi:10.4149/neo_2021_210223N239 (2021).
45. Sonnenberg-Riethmacher, E., Miehe, M. & Riethmacher, D. Periostin in Allergy and Inflammation. Frontiers in immunology 12, 722170, doi:10.3389/fimmu.2021.722170 (2021).
46. Yoshihara, T. et al. Mechanisms of tissue degeneration mediated by periostin in spinal degenerative diseases and their implications for pathology and diagnosis: a review. Frontiers in medicine 10, 1276900, doi:10.3389/fmed.2023.1276900 (2023).
47. Han, T., Mignatti, P., Abramson, S. B. & Attur, M. Periostin interaction with discoidin domain receptor-1 (DDR1) promotes cartilage degeneration. PloS one 15, e0231501, doi:10.1371/journal.pone.0231501 (2020).
48. Chijimatsu, R. et al. Expression and pathological effects of periostin in human osteoarthritis cartilage. BMC musculoskeletal disorders 16, 215, doi:10.1186/s12891-015-0682-3 (2015).
49. Duan, X. et al. Amelioration of Posttraumatic Osteoarthritis in Mice Using Intraarticular Silencing of Periostin via Nanoparticle-Based Small Interfering RNA. Arthritis & rheumatology (Hoboken, N.J.) 73, 2249-2260, doi:10.1002/art.41794 (2021).
50. Attur, M. et al. Elevated expression of periostin in human osteoarthritic cartilage and its potential role in matrix degradation via matrix metalloproteinase-13. FASEB journal : official publication of the Federation of American Societies for Experimental Biology 29, 4107-4121, doi:10.1096/fj.15-272427 (2015).
51. Tajika, Y. et al. Influence of Periostin on Synoviocytes in Knee Osteoarthritis. In vivo (Athens, Greece) 31, 69-77, doi:10.21873/invivo.11027 (2017).
52. Attur, M. et al. Periostin loss-of-function protects mice from post-traumatic and age-related osteoarthritis. Arthritis research & therapy 23, 104, doi:10.1186/s13075-021-02477-z (2021).
53. Rousseau, J. C., Sornay-Rendu, E., Bertholon, C., Garnero, P. & Chapurlat, R. Serum periostin is associated with prevalent knee osteoarthritis and disease incidence/progression in women: the OFELY study. Osteoarthritis and cartilage 23, 1736-1742, doi:10.1016/j.joca.2015.05.015 (2015).
54. Honsawek, S., Wilairatana, V., Udomsinprasert, W., Sinlapavilawan, P. & Jirathanathornnukul, N. Association of plasma and synovial fluid periostin with radiographic knee osteoarthritis: Cross-sectional study. Joint bone spine 82, 352-355, doi:10.1016/j.jbspin.2015.01.023 (2015).
55. Tan, Q. et al. Serum periostin level is not sufficient to serve as a clinically applicable biomarker of osteoarthritis. BMC musculoskeletal disorders 23, 1039, doi:10.1186/s12891-022-06017-x (2022).
56. Yazihan, N. Midkine in inflammatory and toxic conditions. Current drug delivery 10, 54-57, doi:10.2174/1567201811310010009 (2013).
57. Mentlein, R. Targeting pleiotropin to treat osteoarthritis. Expert opinion on therapeutic targets 11, 861-867, doi:10.1517/14728222.11.7.861 (2007).
58. Pufe, T., Bartscher, M., Petersen, W., Tillmann, B. & Mentlein, R. Pleiotrophin, an embryonic differentiation and growth factor, is expressed in osteoarthritis. Osteoarthritis and cartilage 11, 260-264, doi:10.1016/s1063-4584(02)00385-0 (2003).
59. Pufe, T., Groth, G., Goldring, M. B., Tillmann, B. & Mentlein, R. Effects of pleiotrophin, a heparin-binding growth factor, on human primary and immortalized chondrocytes. Osteoarthritis and cartilage 15, 155-162, doi:10.1016/j.joca.2006.07.005 (2007).
60. Fadda, S. M. H., Bassyouni, I. H., Khalifa, R. H. & Elsaid, N. Y. Pleiotrophin, the angiogenic and mitogenic growth factor: levels in serum and synovial fluid in rheumatoid arthritis and osteoarthritis: And correlation with clinical, laboratory and radiological indices. Zeitschrift fur Rheumatologie 77, 322-329, doi:10.1007/s00393-016-0234-8 (2018).
61. Suthon, S., Perkins, R. S., Bryja, V., Miranda-Carboni, G. A. & Krum, S. A. WNT5B in Physiology and Disease. Frontiers in cell and developmental biology 9, 667581, doi:10.3389/fcell.2021.667581 (2021).
62. Shao, L. T. et al. The Protective Effects of Parathyroid Hormone (1-34) on Cartilage and Subchondral Bone Through Down-Regulating JAK2/STAT3 and WNT5A/ROR2 in a Collagenase-Induced Osteoarthritis Mouse Model. Orthopaedic surgery 13, 1662-1672, doi:10.1111/os.13019 (2021).
63. Martineau, X., Abed, É., Martel-Pelletier, J., Pelletier, J. P. & Lajeunesse, D. Alteration of Wnt5a expression and of the non-canonical Wnt/PCP and Wnt/PKC-Ca2+ pathways in human osteoarthritis osteoblasts. PloS one 12, e0180711, doi:10.1371/journal.pone.0180711 (2017).
64. Li, Y. et al. The Expression of Osteopontin and Wnt5a in Articular Cartilage of Patients with Knee Osteoarthritis and Its Correlation with Disease Severity. BioMed research international 2016, 9561058, doi:10.1155/2016/9561058 (2016).
65. Qi, Y., Tang, R., Shi, Z., Feng, G. & Zhang, W. Wnt5a/Platelet-rich plasma synergistically inhibits IL-1β-induced inflammatory activity through NF-κB signaling pathway and prevents cartilage damage and promotes meniscus regeneration. Journal of tissue engineering and regenerative medicine 15, 612-624, doi:10.1002/term.3198 (2021).
66. Ding, D. et al. Zoledronic acid generates a spatiotemporal effect to attenuate osteoarthritis by inhibiting potential Wnt5a-associated abnormal subchondral bone resorption. PloS one 17, e0271485, doi:10.1371/journal.pone.0271485 (2022).
67. Lambert, C. et al. Gene expression pattern of cells from inflamed and normal areas of osteoarthritis synovial membrane. Arthritis & rheumatology (Hoboken, N.J.) 66, 960-968, doi:10.1002/art.38315 (2014).
68. Lee, Y. H. & White, M. F. Insulin receptor substrate proteins and diabetes. Archives of pharmacal research 27, 361-370, doi:10.1007/bf02980074 (2004).
69. Eckstein, S. S., Weigert, C. & Lehmann, R. Divergent Roles of IRS (Insulin Receptor Substrate) 1 and 2 in Liver and Skeletal Muscle. Current medicinal chemistry 24, 1827-1852, doi:10.2174/0929867324666170426142826 (2017).

Supplementary Files

Supplemental.pdf
Figure 1. Identification and Enrichment Analysis of DEGs in OA.
(A) Heat Maps of RRA. The top 50 up-regulated genes and the top 10 down-regulated genes from RRA analysis are displayed across seven datasets. Each column represents a dataset, and each row represents a gene. Colors range from red (up-regulated) to green (down-regulated), with white indicating no change. The number in each rectangle represents the log2 fold change value. (B) Heat Map of OA DEGs. (C) Volcano Plot of OA DEGs. (D) Venn Diagram. The Venn diagram shows the intersection of differentially expressed genes after combining datasets with RRA and SVA methods. (E,H) Gene Ontology (GO) Analysis. Bar and circle charts present the GO analysis of key genes. (F,I) Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Analysis. Bar and circle charts present the KEGG pathway analysis of key genes. (G,J) Disease Ontology (DO) Analysis. Bar and circle charts present the DO analysis of key genes. In the circle charts, the outermost ring represents the ID of each GO, KEGG, and DO term. The next layer's color represents the enrichment significance, with deeper colors indicating higher significance. The values on this layer indicate the number of background genes. The following layer in purple represents the number of genes associated with each term. The innermost ring's height represents the enrichment ratio (the ratio of genes associated with the term to the number of background genes).

Figure 2. Visualization of the Protein-Protein Interaction (PPI) Network, Identification, and Enrichment Analysis of Hub Genes.
(A) PPI network visualization of key genes, where orange nodes represent up-regulated genes and green nodes represent down-regulated genes. (B) Analysis using the "cytoHubba" algorithm identifies 25 key genes. Each node represents a key gene recognized by one of the algorithms, with the number of algorithm recognitions shown as the "key gene" score. (C) PPI network visualization of 15 hub genes, where orange nodes indicate up-regulated genes and pink nodes indicate down-regulated genes. (D) Heat map displaying expression patterns of hub genes. (E) Volcano plot showing differential expression of hub genes. (F) Co-expression network of hub genes. GeneMANIA predicted relationships based on co-expression, shared protein domains, co-localization, and pathways among hub genes. (G-H) GO bar plot and category network plot for hub genes. (I-J) KEGG bar plot and category network plot for hub genes. The category network plots show the relationship between differentially expressed genes (DEGs) and GO/KEGG terms. The size of the term nodes represents the number of genes enriched in each term, the color of the gene nodes indicates the significance of up- or down-regulation (red for up-regulated, purple for down-regulated), and the size of the gene nodes represents their importance (participation in more terms).

Figure 3. Prediction and Partial Validation of Transcription Factors for Hub Genes.
(A) Predicted transcription factors (TFs) and their interaction network with hub genes. (Ovals in the inner circle represent hub genes, triangles in the outer circle represent predicted TFs). (B) Differential expression of JUN between the OA and control groups. (C) Differential expression of SP1 between the OA and control groups. (D) Differential expression of RELA between the OA and control groups. (E) Differential expression of HDAC4 between the OA and control groups. (F) Differential expression of ATF4 between the OA and control groups (*p < 0.05, **p < 0.01, ***p < 0.001).

Figure 4. Machine learning screening of OA feature genes.
(A-B) Residual boxplots (red dots represent the root mean square of the residuals, and black lines represent residual values; lower residual values indicate that the model's predictions are closer to the actual results) and ROC curves of machine learning methods when the training set accounts for 70% of the data. (C-D) Residual boxplots and ROC curves of machine learning methods when the training set accounts for 80% of the data. (E) Ranking of the relative importance of genes screened by XGBoost. (F) LASSO coefficient analysis, with vertical dotted lines drawn at the optimal lambda. (G) Ten cross-validations of adjustment parameter selection in the LASSO model, with each curve corresponding to a gene. (H-I) Maximum accuracy and minimum error diagrams of the SVM-RFE algorithm for screening the best OA feature genes. (J) Venn diagram of OA feature genes identified by the XGBoost, LASSO, and SVM-RFE algorithms.

Figure 5. Risk prediction model of characteristic genes.
(A) ROC curve and expression box diagram of characteristic genes. (B) Nomogram of characteristic genes for the diagnosis of OA patients. (C) Calibration curve used to estimate the prediction accuracy of the nomogram (the closer the line is to the ideal dashed line, the more reliable the result). (D) Accuracy of the clinical decision curve detection model (the farther the red line is from the gray line, the higher the accuracy). (E) ROC curve of the clinical decision model (*P<0.05; **P<0.01; ***P<0.001).

Figure 6. Differential expression diagram and ROC curve of characteristic genes in the OA external validation dataset.
(A-C) Violin plot of normal and OA characteristic gene expression in cartilage samples, synovial samples, and blood samples test sets, respectively (*P<0.05; **P<0.01; ***P<0.001). (D-F) ROC curve analysis of normal and OA characteristic genes in cartilage samples, synovial samples, and blood samples test sets, respectively.

Figure 7. IRS2 was verified by western blotting and qRT-PCR in OA chondrocytes.
(A) mRNA expression levels of IRS2, COL2A1, COX2, MMP13, and SOX9 were detected by qRT-PCR after treatment with 10 ng/ml IL-1β for 24 hours in the SW1353 cell line. (B) Protein blots of IRS2, COL2A1, COX2, MMP13, and SOX9 were detected after treatment with 10 ng/ml IL-1β for 24 hours in the SW1353 cell line. The blots were cut prior to antibody hybridization, with the original blots provided in the supplementary information. (C) Quantitative analysis of protein expression. Data are presented as mean ± standard deviation.
(*P\u0026lt;0.05, **P\u0026lt;0.01, ***P\u0026lt;0.001).\u003c/p\u003e","description":"","filename":"floatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-4706641/v1/ace7900a8bd2a215e37f7f80.png"},{"id":63759703,"identity":"92ab5246-7cf7-4836-8a39-f5551ae7019e","added_by":"auto","created_at":"2024-09-02 06:01:30","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2695248,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4706641/v1/ff32e6e4-4cb8-4419-badf-640e5ca15352.pdf"},{"id":62157217,"identity":"baad2173-9452-425c-bf51-5da5d1705747","added_by":"auto","created_at":"2024-08-09 21:15:41","extension":"pdf","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":1561951,"visible":true,"origin":"","legend":"","description":"","filename":"Supplemental.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4706641/v1/5e7493cd4c53899021750f6d.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Identification and validation of novel characteristic genes based on multi-tissue osteoarthritis","fulltext":[{"header":"Introduction","content":"\u003cp\u003eOsteoarthritis (OA) remains a global health challenge\u003csup\u003e1\u003c/sup\u003e, affecting an estimated 600 million people worldwide, or 7.6% of the world\u0026apos;s population\u003csup\u003e2\u003c/sup\u003e, with a higher prevalence in women\u003csup\u003e3\u003c/sup\u003e. The incidence and disability rate of OA have been rising since 1990, increasing by as much as 8% to 10%\u003csup\u003e4\u003c/sup\u003e. Its prevalence is expected to nearly double in the next decade\u003csup\u003e5\u003c/sup\u003e. Additionally, the socioeconomic costs of OA are substantial, estimated at $80 billion in 2016 alone and growing at a rate of 5.7% per year\u003csup\u003e4\u003c/sup\u003e. 
As the population ages, the prevalence of OA increases year by year, and in the elderly, the risk of mobility impairment and disability caused by OA surpasses that of any other disease\u003csup\u003e6,7\u003c/sup\u003e.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eOA is characterized by the degradation of the extracellular matrix (ECM), decreased synthesis of type II collagen, increased production of matrix metalloproteinase 13 (MMP13), and activation of cyclooxygenase-2 (COX2) by destructive intracellular cytokines. These cytokines promote the conversion of arachidonic acid into inflammatory prostaglandins, resulting in an inflammatory response\u003csup\u003e8\u003c/sup\u003e.\u0026nbsp;During this process, the synovial membrane, cartilage, and subchondral bone interact with each other, jointly promoting disease development\u003csup\u003e9-11\u003c/sup\u003e. Although the etiology of OA is multi-factorial, the exact cause remains unknown. Therefore, developing new OA-related targets for better diagnosis and treatment of patients with OA is urgently needed and remains a focus and challenge of current research.\u003c/p\u003e\n\u003cp\u003eWith the rapid development of high-throughput sequencing technology, an increasing number of OA-related datasets have been generated and utilized\u003csup\u003e12-15\u003c/sup\u003e.\u0026nbsp;Although some studies have focused on OA biomarkers, most are based on single tissues or individual datasets\u003csup\u003e16\u003c/sup\u003e.\u0026nbsp;The occurrence and development of OA are closely related to the interactions among cartilage, subchondral bone, and synovium. Despite some controversy regarding the order of pathological events, it is undeniable that these components are all involved in disease progression\u003csup\u003e17,18\u003c/sup\u003e,\u0026nbsp;which may undermine the reliability of current research results\u003csup\u003e19-22\u003c/sup\u003e. 
The application of machine learning (ML) in OA biomarker development has garnered growing research interest. For instance,\u0026nbsp;Chen et al.\u003csup\u003e22\u003c/sup\u003e analyzed synovial tissue microarray datasets and identified ZBTB16, TNFSF11, SCRG1, and KDELR3 as diagnostic biomarkers for OA. A study by Han et al.\u003csup\u003e14\u003c/sup\u003e\u0026nbsp;also reported that TLR7, CSF1R, APOE, C1QA, and CCL5 are key genes with high diagnostic value for OA after applying weighted gene co-expression network analysis (WGCNA) and the cytoHubba algorithm.\u0026nbsp;Although these findings may contribute to a better understanding of OA and provide new insights into its diagnosis and treatment, the limited sample sizes and the small number of algorithms used may undermine the reliability and robustness of these studies to some extent.\u003c/p\u003e\n\u003cp\u003eIn this study, we aimed to integrate public OA datasets from subchondral bone, cartilage, and synovium tissues to discover and validate meaningful OA biomarkers. We employed Robust Rank Aggregation (RRA) and Surrogate Variable Analysis (SVA) to jointly screen for differentially expressed genes. Additionally, we used machine learning simulations to identify the best method for screening disease biomarkers and validated these biomarkers in OA cartilage, synovium, and blood samples. Furthermore, we investigated the biological processes and predicted transcription factors of disease hub genes.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cp\u003e2.1 Data Collection\u003c/p\u003e\n\u003cp\u003eThe datasets GSE51588, GSE12021, GSE55457, GSE56409, GSE114007, GSE168505, GSE169077, GSE55235, GSE129147, and GSE48556 were obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). GSE51588 is a whole-genome expression profile of subchondral bone. GSE12021, GSE55457, GSE56409, and GSE55235 contain data measured in synovial samples from OA patients and normal controls. 
GSE114007, GSE168505, GSE169077, and GSE129147 are RNA sequencing data from joint cartilage tissues, while GSE48556 consists of expression data from blood mononuclear cells of OA patients and normal controls. All these data are derived from human samples.\u003c/p\u003e\n\u003cp\u003e2.2 Identification of Differentially Expressed Genes\u003c/p\u003e\n\u003cp\u003e2.2.1 Identification of DEGs Using the RRA Method\u003c/p\u003e\n\u003cp\u003eThe seven raw datasets were processed using the limma package in Bioconductor, which included normalization and log transformation. Subsequently, DEGs were analyzed using the limma package, and ranked lists of upregulated and downregulated DEGs were generated for each dataset based on their fold changes. Finally, we integrated the results of these seven datasets using the R package \u0026quot;RobustRankAggreg\u0026quot; based on the robust rank aggregation (RRA) method to identify the most significant DEGs. In the RRA analysis, genes with an absolute log fold change (|logFC|) \u0026gt;1 and an adjusted p-value \u0026lt;0.05 were selected as significant DEGs.\u003c/p\u003e\n\u003cp\u003e2.2.2 Identification of DEGs Using the SVA Method\u003c/p\u003e\n\u003cp\u003eBefore merging the seven datasets, we used the ComBat function in the Surrogate Variable Analysis (SVA) package to correct for batch effects, aiming to minimize experimental variance before subsequent analyses. Box plots and Principal Component Analysis (PCA) were used to assess the data before and after correction. The final merged dataset consisted of 167 samples, including 66 normal control samples and 101 OA samples. In the SVA analysis, genes with an absolute log fold change (|logFC|) \u0026gt;1 and an adjusted p-value \u0026lt;0.05 were selected as significant DEGs. 
Finally, the common genes identified by both the RRA and SVA methods were considered as key genes for OA.\u003c/p\u003e\n\u003cp\u003e2.3 Gene Ontology(GO)\u0026nbsp;Functional Annotation, Kyoto Encyclopedia of Genes and Genomes (KEGG) Analysis, and Disease Ontology (DO) Enrichment Analysis\u003c/p\u003e\n\u003cp\u003eGO functional annotation, KEGG analysis, and DO enrichment analysis were performed using R packages (enrichplot, RColorBrewer, Heatmap, etc.). p-value \u0026lt;0.05 was considered statistically significant for enrichment.\u003c/p\u003e\n\u003cp\u003e2.4 Identification of Gene Clusters and Construction of Protein-Protein Interaction Networks\u003c/p\u003e\n\u003cp\u003eThe STRING database (https://string-db.org/) was used to construct the protein-protein interaction (PPI) network, which was then imported into Cytoscape v3.9.1 software to visualize the key nodes in the molecular interaction network. The cytoHubba algorithms were employed to analyze the PPI network. Only genes identified as hub genes by all cytoHubba calculation methods (\u0026quot;MCC\u0026quot;, \u0026quot;DMNC\u0026quot;, \u0026quot;MNC\u0026quot;, \u0026quot;Degree\u0026quot;, \u0026quot;EPC\u0026quot;, \u0026quot;BottleNeck\u0026quot;, \u0026quot;EcCentricity\u0026quot;, \u0026quot;Closeness\u0026quot;, \u0026quot;Radiality\u0026quot;, \u0026quot;Betweenness\u0026quot;, \u0026quot;Stress\u0026quot;, \u0026quot;ClusteringCoefficient\u0026quot;) were included in the final selection. The interaction genes and functions of these hub genes were predicted, and relevant PPI networks were generated using the GeneMANIA database (https://genemania.org/). 
Enrichment analysis of the hub genes was further conducted using GO functional terms and KEGG pathway enrichment analysis, with a p-value \u0026lt;0.05 considered statistically significant.\u003c/p\u003e\n\u003cp\u003e2.5 Construction of Transcriptional Regulatory Networks\u003c/p\u003e\n\u003cp\u003eTo identify substantial transcriptional changes and gain deeper insights into the regulatory roles of key osteoarthritis (OA) genes, we utilized TRRUST (https://www.grnpedia.org/trrust/) for sentence-based text mining to uncover transcriptional regulatory relationships. Subsequently, we assessed the enrichment of hub genes to identify the corresponding transcription factors (TFs). Finally, we constructed the TF regulatory network using Cytoscape software and validated the TFs that regulate multiple hub genes.\u003c/p\u003e\n\u003cp\u003e2.6 Machine Learning\u003c/p\u003e\n\u003cp\u003eWe used the `train` function from the caret package to predict the best model using various machine learning methods, including Least Absolute Shrinkage and Selection Operator (LASSO), Support Vector Machines (SVM), Decision Trees (DT), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Generalized Linear Model (GLM).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eLASSO regression is a linear regression technique that combines L1 regularization to constrain the complexity of the model, thus achieving feature selection and preventing overfitting\u003csup\u003e23\u003c/sup\u003e.\u0026nbsp;SVM is a powerful supervised learning model that can be used for classification and regression tasks. 
It has shown wide applicability and superior performance in high-dimensional data, complex data structures, and small sample learning\u003csup\u003e24\u003c/sup\u003e.\u0026nbsp;DT is a supervised learning algorithm that splits nodes by selecting the features that best divide the data\u003csup\u003e25\u003c/sup\u003e.\u0026nbsp;RF is an ensemble learning method that improves model performance and stability by voting (classification) or averaging (regression) the results of multiple decision trees. It is very powerful in identifying subtle patterns in complex datasets\u003csup\u003e26\u003c/sup\u003e.\u0026nbsp;XGBoost is an efficient and flexible gradient boosting framework built on tree-based models. It is known for its strong performance and efficiency and is widely used in machine learning competitions and real-world problems\u003csup\u003e27\u003c/sup\u003e.\u0026nbsp;GLM is an extension of linear models that can be applied to a wide range of data types, with strong interpretability and good applicability.\u003c/p\u003e\n\u003cp\u003eAfter training and testing these models, we selected the three best-performing models to analyze the differentially expressed genes and identify the optimal feature genes. The resulting genes were intersected with the hub genes to obtain diagnostic genes for the disease.\u003c/p\u003e\n\u003cp\u003e2.7 Validation of Feature Genes with Receiver Operating Characteristic (ROC) Curve and Construction of Clinical Diagnostic Model\u003c/p\u003e\n\u003cp\u003eTo validate the importance of candidate genes in diagnosis, we used the \u0026quot;rms\u0026quot; R package to construct a nomogram and to predict and interpret the constructed model. Box plots and violin plots were used to visualize the differential expression of genes between the control and OA groups. 
We further evaluated the diagnostic value of the candidate genes through ROC analysis, obtaining the area under the ROC curve (AUC) and the 95% confidence interval (CI). An AUC value \u0026gt;0.7 was considered to indicate good diagnostic performance.\u003c/p\u003e\n\u003cp\u003e2.8 Validation of Feature Genes by Western Blot (WB) and Quantitative Real-Time PCR (qRT-PCR)\u003c/p\u003e\n\u003cp\u003eHuman SW1353 chondrocytes, commonly used for OA modeling\u003csup\u003e8,28\u003c/sup\u003e, were cultured in an incubator under standard conditions (37℃, 5% CO2) using specialized SW1353 culture medium (DMEM, Pricella, CN), with the medium changed every 2-3 days. When SW1353 cells reached 50% confluence, they were stimulated with 10 ng/ml IL-1\u0026beta; (Peprotech, USA) for 24 hours to model OA\u003csup\u003e29-31\u003c/sup\u003e. The treated chondrocytes were lysed in RIPA lysis buffer (Solarbio, CN) containing 1% protease and phosphatase inhibitors (GlpBio, USA) for 30 minutes. The lysates were collected by scraping the adherent cells thoroughly, then incubated on a shaker at 4℃ for 1 hour before centrifugation to collect the supernatant for subsequent experiments. The protein concentration in the supernatant was measured using a BCA kit (Solarbio, CN) on a microplate reader (Thermo Fisher Scientific, USA) at a wavelength of 562 nm.\u003c/p\u003e\n\u003cp\u003eSubsequently, equal amounts of sample protein and protein markers were separated by 10% sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) and transferred to PVDF membranes (Millipore, USA). After blocking with 5% BSA (Servicebio, Wuhan, CN) for 1 hour, the membranes were incubated with primary antibodies overnight at 4℃. After washing three times with Tris-buffered saline containing Tween (TBST), the membranes were incubated with secondary antibodies at room temperature for 2 hours. Details of the primary and secondary antibodies are listed in Table 1. 
Purified protein bands were visualized using enhanced chemiluminescence reagent (YEASEN, CN) and imaged with a Bio-Rad scanner (Bio-Rad, USA). GAPDH was used as the housekeeping gene. Band analysis was performed using ImageJ software.\u003c/p\u003e\n\u003cp\u003eTable 1 Antibody Information\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eName\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eType\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eSpecies\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eSource\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eCOL2A1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eprimary antibodies\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eRabbit\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eProteintech/China\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eIRS2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eprimary antibodies\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eMouse\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eProteintech/China\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eMMP13\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eprimary antibodies\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" 
valign=\"top\"\u003e\n \u003cp\u003eRabbit\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eProteintech/China\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eSOX9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eprimary antibodies\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eRabbit\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eZenbio/China\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eCOX2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eprimary antibodies\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eRabbit\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eZenbio/China\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eGAPDH\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eprimary antibodies\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eMouse\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eProteintech/China\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eGoat anti-Rabbit IgG\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003esecondary antibodies\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eProteintech/China\u003c/p\u003e\n 
\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eGoat anti-Mouse IgG\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003esecondary antibodies\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eProteintech/China\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eFor qRT-PCR, total RNA was extracted from the treated chondrocytes using Trizol reagent (TaKaRa, JPN) and purified with an RNA extraction kit (TransGen, CN). The total RNA concentration of the samples was measured using a micro-UV spectrophotometer (Thermo Fisher Scientific, USA). The RNA samples were then reverse transcribed into cDNA using a reverse transcription kit (TransGen, CN). Finally, the target mRNA was amplified on a real-time fluorescence quantitative PCR instrument (Bio-Rad, USA). Relative expression was normalized to GAPDH and calculated using the 2^(-ΔΔCt) method. 
The primer sequences for the target genes are shown in Table 2.\u003c/p\u003e\n\u003cp\u003eTable 2 Primer Information\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd width=\"16.197183098591548%\" valign=\"top\"\u003e\n \u003cp\u003ePrimer Name\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"73.2394366197183%\" valign=\"top\"\u003e\n \u003cp\u003ePrimer Sequence (5\u0026apos;-3\u0026apos;)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"10.56338028169014%\" valign=\"top\"\u003e\n \u003cp\u003eSpecies\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"16.197183098591548%\" rowspan=\"2\" valign=\"top\"\u003e\n \u003cp\u003eCOL2A1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"73.2394366197183%\" valign=\"top\"\u003e\n \u003cp\u003eForward: TGGACGATCAGGCGAAACC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"10.56338028169014%\" rowspan=\"2\" valign=\"top\"\u003e\n \u003cp\u003eHuman\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"100%\" valign=\"top\"\u003e\n \u003cp\u003eReverse: GCTGCGGATGCTCTCAATCT\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"16.197183098591548%\" rowspan=\"2\" valign=\"top\"\u003e\n \u003cp\u003eIRS2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"73.2394366197183%\" valign=\"top\"\u003e\n \u003cp\u003eForward: CGGTGAGTTCTACGGGTACAT\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"10.56338028169014%\" rowspan=\"2\" valign=\"top\"\u003e\n \u003cp\u003eHuman\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"100%\" valign=\"top\"\u003e\n \u003cp\u003eReverse: TCAGGGTGTATTCATCCAGCG\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"16.197183098591548%\" rowspan=\"2\" valign=\"top\"\u003e\n \u003cp\u003eMMP13\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
width=\"73.2394366197183%\" valign=\"top\"\u003e\n \u003cp\u003eForward: ACTGAGAGGCTCCGAGAAATG\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"10.56338028169014%\" rowspan=\"2\" valign=\"top\"\u003e\n \u003cp\u003eHuman\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"100%\" valign=\"top\"\u003e\n \u003cp\u003eReverse: GAACCCCGCATCTTGGCTT\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"16.197183098591548%\" rowspan=\"2\" valign=\"top\"\u003e\n \u003cp\u003eSOX9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"73.2394366197183%\" valign=\"top\"\u003e\n \u003cp\u003eForward: AGCGAACGCACATCAAGAC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"10.56338028169014%\" rowspan=\"2\" valign=\"top\"\u003e\n \u003cp\u003eHuman\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"100%\" valign=\"top\"\u003e\n \u003cp\u003eReverse: CTGTAGGCGATCTGTTGGGG\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"16.197183098591548%\" rowspan=\"2\" valign=\"top\"\u003e\n \u003cp\u003eCOX2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"73.2394366197183%\" valign=\"top\"\u003e\n \u003cp\u003eForward: GCACCCCGACATAGAGAGC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"10.56338028169014%\" rowspan=\"2\" valign=\"top\"\u003e\n \u003cp\u003eHuman\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"100%\" valign=\"top\"\u003e\n \u003cp\u003eReverse: CTGCGGAGTGCAGTGTTCT\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"16.197183098591548%\" rowspan=\"2\" valign=\"top\"\u003e\n \u003cp\u003eGAPDH\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"73.2394366197183%\" valign=\"top\"\u003e\n \u003cp\u003eForward: TGTGGGCATCAATGGATTTGG\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"10.56338028169014%\" rowspan=\"2\" valign=\"top\"\u003e\n \u003cp\u003eHuman\u003c/p\u003e\n 
\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"100%\" valign=\"top\"\u003e\n \u003cp\u003eReverse: ACACCATGTATTCCGGGTCAAT\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e2.9 Statistical Analysis\u003c/p\u003e\n\u003cp\u003eData were analyzed using R 4.3.0 (https://www.rproject.org/), Cytoscape 3.9.1 (https://cytoscape.org/index.html), Perl 5.32.1 (https://www.perl.org), and GraphPad Prism 9.7 (https://www.graphpad.com/) software. Continuous data were described as mean \u0026plusmn; standard deviation. For comparisons between two groups of continuous data, a t-test was used if the data followed a normal distribution and had equal variances. If the data did not follow a normal distribution, a non-parametric test (Mann-Whitney U test) was employed. All experiments were independently repeated three times, and a P value less than 0.05 was considered statistically significant.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e3.1 Identification of DEGs Using the RRA Method\u003c/p\u003e\n\u003cp\u003eTable 3 describes the sample characteristics of the datasets included in our study, including dataset ID, platform ID, total number of samples, number of normal samples, number of OA samples, and grouping information. Differential expression was analyzed for each dataset; from these per-dataset results, the RRA method identified a total of 202 significantly up-regulated and 49 significantly down-regulated DEGs. 
A heatmap displays the top 50 up-regulated and top 10 down-regulated DEGs (Figure 1A).\u003c/p\u003e\n\u003cp\u003eTable 3 Sample Characteristics of Included Datasets\u003c/p\u003e\n\u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd width=\"15.492957746478874%\" valign=\"top\"\u003e\n \u003cp\u003eGEO dataset\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25.176056338028168%\" valign=\"top\"\u003e\n \u003cp\u003eplatform\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"21.302816901408452%\" valign=\"top\"\u003e\n \u003cp\u003esamples\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.97887323943662%\" valign=\"top\"\u003e\n \u003cp\u003eTotal\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.091549295774648%\" valign=\"top\"\u003e\n \u003cp\u003enormal\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"9.330985915492958%\" valign=\"top\"\u003e\n \u003cp\u003eOA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.626760563380282%\" valign=\"top\"\u003e\n \u003cp\u003egroup\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"15.492957746478874%\" valign=\"top\"\u003e\n \u003cp\u003eGSE51588\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25.176056338028168%\" valign=\"top\"\u003e\n \u003cp\u003eGPL13497\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"21.302816901408452%\" valign=\"top\"\u003e\n \u003cp\u003eSubchondral Bone\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.97887323943662%\" valign=\"top\"\u003e\n \u003cp\u003e50\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.091549295774648%\" valign=\"top\"\u003e\n \u003cp\u003e10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"9.330985915492958%\" valign=\"top\"\u003e\n \u003cp\u003e40\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.626760563380282%\" valign=\"top\"\u003e\n \u003cp\u003eTrain\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd 
width=\"15.492957746478874%\" valign=\"top\"\u003e\n \u003cp\u003eGSE12021\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25.176056338028168%\" valign=\"top\"\u003e\n \u003cp\u003eGPL96\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"21.302816901408452%\" valign=\"top\"\u003e\n \u003cp\u003eSynovium\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.97887323943662%\" valign=\"top\"\u003e\n \u003cp\u003e19\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.091549295774648%\" valign=\"top\"\u003e\n \u003cp\u003e9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"9.330985915492958%\" valign=\"top\"\u003e\n \u003cp\u003e10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.626760563380282%\" valign=\"top\"\u003e\n \u003cp\u003eTrain\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"15.492957746478874%\" valign=\"top\"\u003e\n \u003cp\u003eGSE55457\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25.176056338028168%\" valign=\"top\"\u003e\n \u003cp\u003eGPL96\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"21.302816901408452%\" valign=\"top\"\u003e\n \u003cp\u003eSynovium\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.97887323943662%\" valign=\"top\"\u003e\n \u003cp\u003e20\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.091549295774648%\" valign=\"top\"\u003e\n \u003cp\u003e10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"9.330985915492958%\" valign=\"top\"\u003e\n \u003cp\u003e10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.626760563380282%\" valign=\"top\"\u003e\n \u003cp\u003eTrain\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"15.492957746478874%\" valign=\"top\"\u003e\n \u003cp\u003eGSE56409\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25.176056338028168%\" valign=\"top\"\u003e\n \u003cp\u003eGPL570\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"21.302816901408452%\" valign=\"top\"\u003e\n \u003cp\u003eSynovium\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd width=\"8.97887323943662%\" valign=\"top\"\u003e\n \u003cp\u003e22\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.091549295774648%\" valign=\"top\"\u003e\n \u003cp\u003e11\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"9.330985915492958%\" valign=\"top\"\u003e\n \u003cp\u003e11\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.626760563380282%\" valign=\"top\"\u003e\n \u003cp\u003eTrain\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"15.492957746478874%\" valign=\"top\"\u003e\n \u003cp\u003eGSE114007\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25.176056338028168%\" valign=\"top\"\u003e\n \u003cp\u003eGPL11154/GPL18573\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"21.302816901408452%\" valign=\"top\"\u003e\n \u003cp\u003eCartilage\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.97887323943662%\" valign=\"top\"\u003e\n \u003cp\u003e38\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.091549295774648%\" valign=\"top\"\u003e\n \u003cp\u003e18\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"9.330985915492958%\" valign=\"top\"\u003e\n \u003cp\u003e20\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.626760563380282%\" valign=\"top\"\u003e\n \u003cp\u003eTrain\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"15.492957746478874%\" valign=\"top\"\u003e\n \u003cp\u003eGSE168505\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25.176056338028168%\" valign=\"top\"\u003e\n \u003cp\u003eGPL16791\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"21.302816901408452%\" valign=\"top\"\u003e\n \u003cp\u003eCartilage\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.97887323943662%\" valign=\"top\"\u003e\n \u003cp\u003e7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.091549295774648%\" valign=\"top\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"9.330985915492958%\" valign=\"top\"\u003e\n 
\u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.626760563380282%\" valign=\"top\"\u003e\n \u003cp\u003eTrain\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"15.492957746478874%\" valign=\"top\"\u003e\n \u003cp\u003eGSE169077\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25.176056338028168%\" valign=\"top\"\u003e\n \u003cp\u003eGPL96\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"21.302816901408452%\" valign=\"top\"\u003e\n \u003cp\u003eCartilage\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.97887323943662%\" valign=\"top\"\u003e\n \u003cp\u003e11\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.091549295774648%\" valign=\"top\"\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"9.330985915492958%\" valign=\"top\"\u003e\n \u003cp\u003e6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.626760563380282%\" valign=\"top\"\u003e\n \u003cp\u003eTrain\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"15.492957746478874%\" valign=\"top\"\u003e\n \u003cp\u003eGSE55235\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25.176056338028168%\" valign=\"top\"\u003e\n \u003cp\u003eGPL96\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"21.302816901408452%\" valign=\"top\"\u003e\n \u003cp\u003eSynovium\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.97887323943662%\" valign=\"top\"\u003e\n \u003cp\u003e20\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.091549295774648%\" valign=\"top\"\u003e\n \u003cp\u003e10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"9.330985915492958%\" valign=\"top\"\u003e\n \u003cp\u003e10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.626760563380282%\" valign=\"top\"\u003e\n \u003cp\u003eTest\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"15.492957746478874%\" valign=\"top\"\u003e\n \u003cp\u003eGSE129147\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
width=\"25.176056338028168%\" valign=\"top\"\u003e\n \u003cp\u003eGPL6947\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"21.302816901408452%\" valign=\"top\"\u003e\n \u003cp\u003eCartilage\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.97887323943662%\" valign=\"top\"\u003e\n \u003cp\u003e40\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.091549295774648%\" valign=\"top\"\u003e\n \u003cp\u003e7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"9.330985915492958%\" valign=\"top\"\u003e\n \u003cp\u003e33\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.626760563380282%\" valign=\"top\"\u003e\n \u003cp\u003eTest\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"15.492957746478874%\" valign=\"top\"\u003e\n \u003cp\u003eGSE48556\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25.176056338028168%\" valign=\"top\"\u003e\n \u003cp\u003eGPL6947\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"21.302816901408452%\" valign=\"top\"\u003e\n \u003cp\u003eBlood\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.97887323943662%\" valign=\"top\"\u003e\n \u003cp\u003e139\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"11.091549295774648%\" valign=\"top\"\u003e\n \u003cp\u003e33\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"9.330985915492958%\" valign=\"top\"\u003e\n \u003cp\u003e106\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"8.626760563380282%\" valign=\"top\"\u003e\n \u003cp\u003eTest\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e3.2 Identification of DEGs using SVA Method\u003c/p\u003e\n\u003cp\u003eBox plots (Figures S1A-S1B) and PCA cluster plots (Figures S1C-S1D) present the gene expression matrix of the seven combined datasets before and after removing batch effects. 
The results indicate that, after normalization and batch effect adjustment, the batch effects between different datasets are not obvious, suggesting that the final combined dataset is suitable for subsequent analysis. The volcano plot and heatmap in Figures 1B-1C show the genetic differences between OA and normal controls in the combined dataset. Among these, 30 genes were found to be down-regulated and 64 genes were up-regulated. A total of 83 genes, identified by both RRA and SVA methods, were considered key genes for OA and retained for further analysis (Figure 1D).\u003c/p\u003e\n\u003cp\u003e3.3 Functional Enrichment and Pathway Analysis of Key Genes\u003c/p\u003e\n\u003cp\u003eWe analyzed the GO annotations, KEGG pathways, and DO enrichment for these key genes to explore their potential biological information. GO analysis results indicate that, in the biological process (BP) category, the genes are involved in extracellular matrix organization, extracellular structure organization, and external encapsulating structure organization. In the cellular component (CC) category, they are associated with the collagen-containing extracellular matrix, complex of collagen trimers, and fibrillar collagen trimer. The enriched molecular function (MF) terms include extracellular matrix structural constituent, extracellular matrix structural constituent conferring tensile strength, and platelet-derived growth factor binding (Figures 1E,1H). KEGG enrichment results show that key genes are primarily enriched in the Relaxin signaling pathway, Protein digestion and absorption, AGE-RAGE signaling pathway in diabetic complications, and Rheumatoid arthritis (Figures 1F,1I). 
DO enrichment analysis results (Figures 1G,1J) indicate that the key genes are enriched in diseases including degenerative disc disease, connective tissue cancer, cell type benign neoplasm, osteoarthritis, and musculoskeletal system cancer.\u003c/p\u003e\n\u003cp\u003e3.4 Construction of PPI Network for Key Genes\u003c/p\u003e\n\u003cp\u003eWe used the key genes to construct a PPI network in STRING to understand the potential connections between proteins, with a minimum required interaction score of 0.4 and a PPI enrichment p-value \u0026lt;1.0e-16. This score indicates that the connections have a high level of confidence. To understand the expression of interacting proteins in OA, we scored the protein-protein interactions and retained those with scores greater than 0.4 in the PPI network. We used Cytoscape software to visualize the PPI network constructed in STRING (Figure 2A). To understand the interactions between hub genes, we used the \u0026apos;cytoHubba\u0026apos; algorithm to identify hub genes. We applied all computational methods in the cytoHubba algorithm (\u0026quot;MCC\u0026quot;, \u0026quot;DMNC\u0026quot;, \u0026quot;MNC\u0026quot;, \u0026quot;Degree\u0026quot;, \u0026quot;EPC\u0026quot;, \u0026quot;BottleNeck\u0026quot;, \u0026quot;EcCentricity\u0026quot;, \u0026quot;Closeness\u0026quot;, \u0026quot;Radiality\u0026quot;, \u0026quot;Betweenness\u0026quot;, \u0026quot;Stress\u0026quot;, \u0026quot;ClusteringCoefficient\u0026quot;), and only genes identified by all methods were retained as final hub genes. In the end, we identified 25 hub genes (Figure 2B): COL1A2, COL3A1, POSTN, COL11A1, CDH11, SULF1, FAP, MMP13, MMP1, OGN, THY1, JUN, CDH2, WNT5A, ATF3, CDKN1A, GAP43, PTN, DDIT4, STMN2, TOP2A, IRS2, IGFBP1, TUBB3, and SPOCK1. The interactions of these genes were visualized, with yellow representing up-regulated genes and pink representing down-regulated genes (Figure 2C). 
We also plotted the volcano plot and heatmap of the hub genes (Figures 2D-2E). GeneMANIA was used alongside PPI to evaluate the 25 hub genes and 20 interacting genes to predict relationships in terms of co-expression, shared protein domains, co-localization, and pathways (Figure 2F). The outer circle represents predicted genes, and the inner circle represents hub genes. Network analysis indicated that these genes are related to fibroblast proliferation, regulation of fibroblast proliferation, sensory organ morphogenesis, axonogenesis, extracellular matrix organization, banded collagen fibril, and collagen-containing extracellular matrix.\u003c/p\u003e\n\u003cp\u003eTo further understand the function of hub genes in OA, we performed enrichment analysis on the hub genes. According to GO analysis, the biological processes (BP) enriched include extracellular matrix organization, extracellular structure organization, external encapsulating structure organization, axon development, and regeneration. The cellular components (CC) enriched include collagen-containing extracellular matrix, fibrillar collagen trimer, banded collagen fibril, endoplasmic reticulum lumen, and complex of collagen trimers. The molecular functions (MF) enriched include extracellular matrix structural constituent, extracellular matrix structural constituent conferring tensile strength, integrin binding, platelet-derived growth factor binding, and SMAD binding (Figures 2G-2H). KEGG analysis indicated that the hub genes are enriched in the Relaxin signaling pathway, IL-17 signaling pathway, AGE-RAGE signaling pathway in diabetic complications, Protein digestion and absorption, and PI3K-Akt signaling pathway (Figures 2I-2J).\u003c/p\u003e\n\u003cp\u003e3.5 Association between Hub Genes and TFs\u003c/p\u003e\n\u003cp\u003eTranscription factors (TFs) are involved in gene regulation. To explore the role of TFs, we used TRRUST to predict key TFs influencing OA through hub genes. 
The analysis of interactions between TFs and hub genes showed that 49 TFs coordinated the regulation of 18 common DEGs (JUN is both a differentially expressed gene and a transcription factor), suggesting a complex regulatory relationship (Figure 3A). The top-ranked TFs, based on the complexity of gene regulation, include JUN, NFKB1, SP1, ETS1, AR, RELA, HDAC4, ATF2, ATF4, ESR1, EGR1, SP3, and TP53. Additionally, we validated the expression of several of these top-ranked TFs in OA (Figures 3B-3F). These findings reveal significant relationships between OA hub genes and TFs.\u003c/p\u003e\n\u003cp\u003e3.6 Identifying Diagnostic Biomarkers through Machine Learning\u003c/p\u003e\n\u003cp\u003eTo identify key diagnostic biomarkers for osteoarthritis (OA), we conducted a thorough feature selection process. Initially, we employed the `train` function from the caret package to determine the best-performing machine learning algorithms. The dataset was divided into training and validation sets, with training set proportions of either 0.7 or 0.8. Among the six machine learning methods evaluated, XGBoost, SVM, and Lasso regression consistently ranked among the best in terms of residuals and root mean square error (Figures 4A, 4C), with areas under the ROC curve exceeding 0.9 (Figures 4B, 4D), indicating superior diagnostic efficacy and learning performance. Consequently, we selected XGBoost, SVM, and Lasso regression to screen for key diagnostic biomarkers for OA.\u003c/p\u003e\n\u003cp\u003eUsing the XGBoost algorithm, we identified 23 candidate genes (Figure 4E). The SVM algorithm identified 23 key genes (Figures 4F-4G), and Lasso regression selected 26 candidate genes (Figures 4H-4I). By intersecting the feature genes identified by these three machine learning methods, we pinpointed 15 candidate genes: IRS2, ADM, SIK1, PTN, CX3CR1, WNT5A, IL21R, APOD, CRLF1, FKBP5, PNMAL1, NPR3, RARRES1, ASPN, and POSTN. 
Further intersecting these candidate genes with the hub genes, we identified IRS2, PTN, POSTN, and WNT5A as diagnostic biomarkers for OA (Figure 4J).\u003c/p\u003e\n\u003cp\u003eNext, we constructed box plots for these candidate genes, which showed differential expression in the dataset. We used ROC curves to evaluate the specificity and sensitivity of these four features in distinguishing OA from normal tissues in the test set. The diagnostic values of the four genes were as follows: IRS2 (AUC = 0.879, 95% CI: 0.826-0.927), PTN (AUC = 0.784, 95% CI: 0.707-0.852), POSTN (AUC = 0.668, 95% CI: 0.583-0.748), and WNT5A (AUC = 0.783, 95% CI: 0.706-0.857) (Figure 5A). Since the AUC of POSTN in the test dataset was 0.668, which is less than 0.7, we ultimately selected IRS2, PTN, and WNT5A as the diagnostic biomarkers for OA. We then constructed a diagnostic nomogram based on the training dataset to develop a clinically applicable diagnostic model for OA (Figure 5B). The calibration curve (Figure 5C) and decision curve (Figure 5D) of the model indicated high predictive ability for OA (AUC = 0.913, 95% CI: 0.866-0.953, Figure 5E).\u003c/p\u003e\n\u003cp\u003e3.7 Validation of Relevant Screening Genes\u003c/p\u003e\n\u003cp\u003eTo better validate the clinical diagnostic capability of the candidate genes, we used the cartilage tissue dataset GSE129147, the synovial tissue dataset GSE55235, and the blood sample dataset from OA patients GSE48556 for verification, with sample sizes of 40, 20, and 139, respectively. We constructed violin plots of the candidate genes, which showed differential expression in the datasets (Figures 6A-6C). 
Additionally, we plotted ROC curves to evaluate the diagnostic value of each gene in different tissue samples (Figures 6D-6F).\u003c/p\u003e\n\u003cp\u003eAmong these, PTN had lower diagnostic efficiency in cartilage samples, POSTN had poor diagnostic efficiency in synovial samples, and only IRS2 exhibited significant differential expression (p\u0026lt;0.01) in all samples with good diagnostic value, indicating high clinical value.\u003c/p\u003e\n\u003cp\u003e3.8 Experimental Validation of IRS2 Gene\u003c/p\u003e\n\u003cp\u003eTo test the reliability of the candidate gene IRS2 in OA, we simulated OA by adding IL-1\u0026beta; to human SW1353 chondrocytes. We measured the protein expression and relative mRNA expression of IRS2 in these cells by WB and qRT-PCR. Additionally, we checked OA-related phenotypes to ensure the accuracy of the OA simulation. The data indicated that, compared to the control group, the OA-related markers COL2A1 and SOX9 decreased, while MMP13 and COX2 significantly increased, consistent with OA characteristics. In the OA group, the protein and mRNA expression levels of IRS2 were significantly reduced (Figures 7A-7C).\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eOsteoarthritis is a common degenerative joint disease that primarily affects weight-bearing joints such as the knees and hips. It is characterized by synovial inflammation, joint cartilage degradation, and subchondral bone alterations. Synovial inflammation leads to joint swelling and pain, while cartilage degradation results in direct bone contact, causing pain and joint stiffness. Subchondral bone sclerosis and osteophyte formation further exacerbate the progression of the disease\u003csup\u003e9,32-34\u003c/sup\u003e. These pathological changes interact with each other, leading to joint dysfunction and a decrease in quality of life\u003csup\u003e35\u003c/sup\u003e. 
With the global population aging, the number of elderly people suffering from osteoarthritis is increasing. Meanwhile, the underlying mechanisms of OA are not yet fully understood\u003csup\u003e36,37\u003c/sup\u003e. Therefore, it is urgent to identify new targets related to the pathogenesis of OA to prevent its progression at an early stage.\u003c/p\u003e\n\u003cp\u003eIn recent years, advances in sequencing technology and bioinformatics have made it possible to reanalyze previous osteoarthritis datasets\u003csup\u003e38,39\u003c/sup\u003e, thereby identifying relevant mechanisms and pathogenic targets. The application of machine learning algorithms to biomedicine has greatly facilitated a better understanding and interpretation of high-dimensional sequence information. Although some studies have made progress in decoding the heterogeneity of OA, a comprehensive understanding of the disease remains limited. Outdated algorithms have limited the reliability of such findings in clinical practice, so a comprehensive analysis of OA across multiple cohorts using advanced machine learning algorithms is urgently needed.\u003c/p\u003e\n\u003cp\u003eIn this study, we explored and validated OA biomarkers across multiple tissues using bioinformatics and machine learning algorithms, aiming to provide new insights into the mechanisms of OA from multiple perspectives. First, we identified 83 DEGs from multiple tissue datasets using RRA and SVA methods. Functional enrichment, pathway analysis, and disease ontology enrichment analysis of these DEGs revealed their involvement in extracellular matrix organization, extracellular structure organization, Relaxin signaling pathway, PI3K-Akt signaling pathway, and diseases such as connective tissue cancer and musculoskeletal system cancer.\u003c/p\u003e\n\u003cp\u003eNext, we constructed a PPI network and identified hub genes from the identified DEGs. Our methods identified 25 hub genes associated with OA. 
Functional enrichment and pathway analysis of these hub genes revealed their involvement in extracellular matrix organization, extracellular structure organization, Relaxin signaling pathway, IL-17 signaling pathway, AGE-RAGE signaling pathway in diabetic complications, and PI3K-Akt signaling pathway. These pathways and functions were highly consistent with those of the DEGs, demonstrating the representative value of the identified hub genes. Additionally, these pathways and enrichments are in line with previous studies\u003csup\u003e19,40-42\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eTo better identify OA characteristic biomarkers, we employed machine learning algorithms for screening. Initially, we divided the training dataset into training and test groups and evaluated the performance of six commonly used machine learning methods. We found that, regardless of the proportion of the training set, XGBoost, SVM, and Lasso regression demonstrated superior performance in screening for OA biomarker features and exhibited higher diagnostic efficiency, outperforming the other machine learning methods.\u003c/p\u003e\n\u003cp\u003eConsequently, we selected these three algorithms for screening OA characteristic biomarkers, identifying 15 candidates: IRS2, ADM, SIK1, PTN, CX3CR1, WNT5A, IL21R, APOD, CRLF1, FKBP5, PNMAL1, NPR3, RARRES1, ASPN, and POSTN. By intersecting these genes with the identified hub genes, we pinpointed IRS2, WNT5A, PTN, and POSTN as OA diagnostic genes.\u003c/p\u003e\n\u003cp\u003eAlthough POSTN showed statistically significant differences in OA (p\u0026lt;0.01), its area under the ROC curve was below 0.7, indicating poor diagnostic performance. Thus, we constructed an OA risk prediction model based on the remaining three genes. The constructed and tested risk score nomogram could distinguish OA from normal tissues, with an AUC value of 0.913, indicating high accuracy. 
We validated these diagnostic genes using external datasets of OA cartilage samples, synovial samples, and blood samples. In the cartilage dataset, PTN did not show significant expression differences and had an area under the ROC curve below 0.7. In the synovial tissue dataset, POSTN did not exhibit differential expression compared to the control group. In contrast, IRS2 showed statistically significant expression differences across all three sample types, including OA blood samples (p\u0026lt;0.01). Its area under the ROC curve was consistently above 0.7, demonstrating good diagnostic performance. Thus, we have reason to believe that IRS2 might be involved in the majority of OA progression processes, highlighting its potential as a powerful diagnostic biomarker for OA.\u003c/p\u003e\n\u003cp\u003ePeriostin (POSTN) is a 90 kDa matricellular protein discovered in 1993\u003csup\u003e43\u003c/sup\u003e, involved in the pathogenesis of various diseases including tumors, pulmonary fibrosis, and allergic diseases, with expression levels increasing as the disease progresses in most cases\u003csup\u003e43-45\u003c/sup\u003e. POSTN interacts with extracellular matrix (ECM) proteins to regulate cell-matrix organization, leading to remodeling and fibrosis. The unique characteristics of POSTN can be attributed to highly complex signaling pathways that lead to increased POSTN production\u003csup\u003e43\u003c/sup\u003e. In recent years, its association with orthopedic diseases has also become apparent\u003csup\u003e46\u003c/sup\u003e. Numerous studies have confirmed that POSTN expression increases in OA cartilage\u003csup\u003e47-50\u003c/sup\u003e, but its expression in OA synovial tissue is controversial. 
Tajika et al.\u003csup\u003e51\u003c/sup\u003e found that it is highly expressed in OA synovial cells, while Attur et al.\u003csup\u003e52\u003c/sup\u003e found no statistically significant difference in its expression in degenerative OA. In OA blood, Rousseau et al.\u003csup\u003e53\u003c/sup\u003e found that serum POSTN is associated with the development of OA in women, but Honsawek et al.\u003csup\u003e54\u003c/sup\u003e found no statistically significant difference in serum POSTN between OA and control groups. Recently, Tan et al.\u003csup\u003e55\u003c/sup\u003e also showed that serum periostin levels are insufficient as clinical biomarkers for osteoarthritis. In our study, as shown in Figure 1A, POSTN was highly expressed in OA chondrocytes but appeared to be lowly expressed in synovial tissue. In the external validation datasets (Figure 6), POSTN consistently showed high expression in cartilage but trended towards low expression in synovial tissue, without statistical significance, which is inconsistent with current research. This discrepancy requires further investigation through extensive related studies.\u003c/p\u003e\n\u003cp\u003ePleiotrophin (PTN) is a member of the midkine family\u003csup\u003e56\u003c/sup\u003e, a secreted heparin-binding peptide expressed during development in mesodermal and neuroectodermal cells but rarely in adult tissues\u003csup\u003e57\u003c/sup\u003e. Pufe et al.\u003csup\u003e58\u003c/sup\u003e found that PTN is almost undetectable in normal cartilage and synovial cells but is highly expressed in OA, with significant expression in early and middle stages of OA but not in late stages. Furthermore, Pufe et al.\u003csup\u003e59\u003c/sup\u003e suggested that PTN might participate in the early onset and development of OA by stimulating the activation of the AP-1 (activator protein-1) transcription factor and altering gene expression. 
In blood, studies by Fadda et al.\u003csup\u003e60\u003c/sup\u003e showed no significant difference in average PTN levels between OA patients and healthy controls, suggesting that while PTN may play an important role in OA, its potential as a disease biomarker requires larger-scale exploration and further research. In our study, as seen in Figure 6, PTN did not differ significantly in the external cartilage tissue dataset, possibly because our cartilage tissue dataset represented late-stage OA. However, its high expression in the OA synovial tissue dataset is consistent with previous findings. Combining these results, we believe that PTN might play a role in diagnosing early synovitis in OA.\u003c/p\u003e\n\u003cp\u003eWNT5A is a member of the Wnt protein family, which comprises a group of highly conserved signaling proteins that play crucial roles in embryonic development and various cellular processes such as cell migration, polarity, and differentiation. WNT5A operates in the non-canonical Wnt signaling pathway, particularly influencing cell movement and polarity rather than directly affecting cell proliferation\u003csup\u003e61\u003c/sup\u003e. The high expression of WNT5A in OA cartilage has been confirmed in numerous studies, and its mechanisms of action in OA have been widely explored\u003csup\u003e62-66\u003c/sup\u003e. However, the impact of WNT5A in OA synovium is less understood. Lambert et al.\u003csup\u003e67\u003c/sup\u003e found high expression of WNT5A in OA synovium, regulated via the Wnt signaling pathway. This finding aligns with our conclusions. However, the expression of WNT5A in OA blood remains unstudied, necessitating further investigation.\u003c/p\u003e\n\u003cp\u003eInsulin Receptor Substrate 2 (IRS2) is a crucial member of the insulin receptor substrate family, playing a key role in insulin signaling and metabolic regulation\u003csup\u003e68\u003c/sup\u003e. 
It is expressed in various tissues, including liver, muscle, and adipose tissue, and is involved in multiple physiological processes\u003csup\u003e69\u003c/sup\u003e. The role of IRS2 in OA, however, has not been studied. In our research, using external cartilage, synovial, and OA blood sample datasets, we found that IRS2 consistently showed low expression. Given the absence of in vivo and in vitro studies on IRS2 in OA, we conducted in vitro experiments for validation. Through WB and qRT-PCR experiments, we confirmed that IRS2 is significantly downregulated in OA chondrocytes (p\u0026lt;0.001), consistent with our dataset validation results.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eIn summary, this study integrates bioinformatics analysis and machine learning algorithms to identify and validate four promising biomarkers: IRS2, WNT5A, PTN, and POSTN. POSTN can serve as a biomarker for OA cartilage, PTN shows potential for early diagnosis of OA, and WNT5A and IRS2 introduce new diagnostic perspectives for OA.\u003c/p\u003e\n"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eData availability statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eAuthor contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eGuihao Zheng and Yulong Ouyang contributed equally to this work.\u003c/p\u003e\n\u003cp\u003eGHZ, YLOY, SLC, SX and BH were responsible for study concept and writing the article. GCS was responsible for reviewing and writing the article. 
All authors read and approved the final manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThere were no competing interests.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eVos, T.\u003cem\u003e et al.\u003c/em\u003e Years lived with disability (YLDs) for 1160 sequelae of 289 diseases and injuries 1990-2010: a systematic analysis for the Global Burden of Disease Study 2010. \u003cem\u003eLancet (London, England)\u003c/em\u003e \u003cstrong\u003e380\u003c/strong\u003e, 2163-2196, doi:10.1016/s0140-6736(12)61729-2 (2012).\u003c/li\u003e\n\u003cli\u003eGlobal, regional, and national burden of osteoarthritis, 1990-2020 and projections to 2050: a systematic analysis for the Global Burden of Disease Study 2021. \u003cem\u003eThe Lancet. Rheumatology\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, e508-e522, doi:10.1016/s2665-9913(23)00163-7 (2023).\u003c/li\u003e\n\u003cli\u003ePuig-Junoy, J. \u0026amp; Ruiz Zamora, A. Socio-economic costs of osteoarthritis: a systematic review of cost-of-illness studies. \u003cem\u003eSeminars in arthritis and rheumatism\u003c/em\u003e \u003cstrong\u003e44\u003c/strong\u003e, 531-541, doi:10.1016/j.semarthrit.2014.10.012 (2015).\u003c/li\u003e\n\u003cli\u003ePeat, G. \u0026amp; Thomas, M. J. Osteoarthritis year in review 2020: epidemiology \u0026amp; therapy. \u003cem\u003eOsteoarthritis and cartilage\u003c/em\u003e \u003cstrong\u003e29\u003c/strong\u003e, 180-189, doi:10.1016/j.joca.2020.10.007 (2021).\u003c/li\u003e\n\u003cli\u003eHolt, H. L.\u003cem\u003e et al.\u003c/em\u003e Forecasting the burden of advanced knee osteoarthritis over a 10-year period in a cohort of 60-64 year-old US adults. \u003cem\u003eOsteoarthritis and cartilage\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, 44-50, doi:10.1016/j.joca.2010.10.009 (2011).\u003c/li\u003e\n\u003cli\u003eGuccione, A. 
A.\u003cem\u003e et al.\u003c/em\u003e The effects of specific medical conditions on the functional limitations of elders in the Framingham Study. \u003cem\u003eAmerican journal of public health\u003c/em\u003e \u003cstrong\u003e84\u003c/strong\u003e, 351-358, doi:10.2105/ajph.84.3.351 (1994).\u003c/li\u003e\n\u003cli\u003eChen, D.\u003cem\u003e et al.\u003c/em\u003e Osteoarthritis: toward a comprehensive understanding of pathological mechanism. \u003cem\u003eBone research\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 16044, doi:10.1038/boneres.2016.44 (2017).\u003c/li\u003e\n\u003cli\u003eXiong, H., Huang, T. Y., Chang, Y. L. \u0026amp; Su, W. T. Achyranthes bidentate extracts protect the IL-1\u0026beta;-induced osteoarthritis of SW1353 chondrocytes. \u003cem\u003eJournal of bioscience and bioengineering\u003c/em\u003e \u003cstrong\u003e136\u003c/strong\u003e, 462-470, doi:10.1016/j.jbiosc.2023.09.008 (2023).\u003c/li\u003e\n\u003cli\u003eLi, X.\u003cem\u003e et al.\u003c/em\u003e Pathological progression of osteoarthritis: a perspective on subchondral bone. \u003cem\u003eFrontiers of medicine\u003c/em\u003e \u003cstrong\u003e18\u003c/strong\u003e, 237-257, doi:10.1007/s11684-024-1061-y (2024).\u003c/li\u003e\n\u003cli\u003eYang, D., Xu, K., Xu, X. \u0026amp; Xu, P. Revisiting prostaglandin E2: A promising therapeutic target for osteoarthritis. \u003cem\u003eClinical immunology (Orlando, Fla.)\u003c/em\u003e \u003cstrong\u003e260\u003c/strong\u003e, 109904, doi:10.1016/j.clim.2024.109904 (2024).\u003c/li\u003e\n\u003cli\u003eWen, Z.\u003cem\u003e et al.\u003c/em\u003e Endoplasmic Reticulum Stress in Osteoarthritis: A Novel Perspective on the Pathogenesis and Treatment. \u003cem\u003eAging and disease\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 283-286, doi:10.14336/ad.2022.0725 (2023).\u003c/li\u003e\n\u003cli\u003eWang, Q.\u003cem\u003e et al.\u003c/em\u003e Identification of a central role for complement in osteoarthritis. 
\u003cem\u003eNature medicine\u003c/em\u003e \u003cstrong\u003e17\u003c/strong\u003e, 1674-1679, doi:10.1038/nm.2543 (2011).\u003c/li\u003e\n\u003cli\u003eWoetzel, D.\u003cem\u003e et al.\u003c/em\u003e Identification of rheumatoid arthritis and osteoarthritis patients by transcriptome-based rule set generation. \u003cem\u003eArthritis research \u0026amp; therapy\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, R84, doi:10.1186/ar4526 (2014).\u003c/li\u003e\n\u003cli\u003eFiler, A.\u003cem\u003e et al.\u003c/em\u003e Stromal transcriptional profiles reveal hierarchies of anatomical site, serum response and disease and identify disease specific pathways. \u003cem\u003ePloS one\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, e0120917, doi:10.1371/journal.pone.0120917 (2015).\u003c/li\u003e\n\u003cli\u003eFisch, K. M.\u003cem\u003e et al.\u003c/em\u003e Identification of transcription factors responsible for dysregulated networks in human osteoarthritis cartilage by global gene expression analysis. \u003cem\u003eOsteoarthritis and cartilage\u003c/em\u003e \u003cstrong\u003e26\u003c/strong\u003e, 1531-1538, doi:10.1016/j.joca.2018.07.012 (2018).\u003c/li\u003e\n\u003cli\u003eDong, S., Xia, T., Wang, L., Zhao, Q. \u0026amp; Tian, J. Investigation of candidate genes for osteoarthritis based on gene expression profiles. \u003cem\u003eActa orthopaedica et traumatologica turcica\u003c/em\u003e \u003cstrong\u003e50\u003c/strong\u003e, 686-690, doi:10.1016/j.aott.2016.04.002 (2016).\u003c/li\u003e\n\u003cli\u003eCoppola, C.\u003cem\u003e et al.\u003c/em\u003e Osteoarthritis: Insights into Diagnosis, Pathophysiology, Therapeutic Avenues, and the Potential of Natural Extracts. \u003cem\u003eCurrent issues in molecular biology\u003c/em\u003e \u003cstrong\u003e46\u003c/strong\u003e, 4063-4105, doi:10.3390/cimb46050251 (2024).\u003c/li\u003e\n\u003cli\u003eLuo, H., Li, L., Han, S. \u0026amp; Liu, T. 
The role of monocyte/macrophage chemokines in pathogenesis of osteoarthritis: A review. \u003cem\u003eInternational journal of immunogenetics\u003c/em\u003e \u003cstrong\u003e51\u003c/strong\u003e, 130-142, doi:10.1111/iji.12664 (2024).\u003c/li\u003e\n\u003cli\u003eZhou, J.\u003cem\u003e et al.\u003c/em\u003e Identification of aging-related biomarkers and immune infiltration characteristics in osteoarthritis based on bioinformatics analysis and machine learning. \u003cem\u003eFrontiers in immunology\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 1168780, doi:10.3389/fimmu.2023.1168780 (2023).\u003c/li\u003e\n\u003cli\u003eHan, Y.\u003cem\u003e et al.\u003c/em\u003e Identification and development of a novel 5-gene diagnostic model based on immune infiltration analysis of osteoarthritis. \u003cem\u003eJournal of translational medicine\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, 522, doi:10.1186/s12967-021-03183-9 (2021).\u003c/li\u003e\n\u003cli\u003eYin, W., Lei, Y., Yang, X. \u0026amp; Zou, J. A two-gene random forest model to diagnose osteoarthritis based on RNA-binding protein-related genes in knee cartilage tissue. \u003cem\u003eAging\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 193-212, doi:10.18632/aging.204469 (2023).\u003c/li\u003e\n\u003cli\u003eChen, Z., Wang, W., Zhang, Y., Xue, X. \u0026amp; Hua, Y. Identification of four-gene signature to diagnose osteoarthritis through bioinformatics and machine learning methods. \u003cem\u003eCytokine\u003c/em\u003e \u003cstrong\u003e169\u003c/strong\u003e, 156300, doi:10.1016/j.cyto.2023.156300 (2023).\u003c/li\u003e\n\u003cli\u003eFriedman, J., Hastie, T. \u0026amp; Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. 
\u003cem\u003eJournal of statistical software\u003c/em\u003e \u003cstrong\u003e33\u003c/strong\u003e, 1-22 (2010).\u003c/li\u003e\n\u003cli\u003eLuts, J.\u003cem\u003e et al.\u003c/em\u003e A tutorial on support vector machine-based methods for classification problems in chemometrics. \u003cem\u003eAnalytica chimica acta\u003c/em\u003e \u003cstrong\u003e665\u003c/strong\u003e, 129-145, doi:10.1016/j.aca.2010.03.030 (2010).\u003c/li\u003e\n\u003cli\u003eDecup, F., L\u0026eacute;ger, S., Lef\u0026egrave;vre, S., Dom\u0026eacute;jean, S. \u0026amp; Grosgogeat, B. Risk factors or indicators for dental caries and tooth wear and their relative importance in adults according to age. \u003cem\u003eJournal of dentistry\u003c/em\u003e, 105092, doi:10.1016/j.jdent.2024.105092 (2024).\u003c/li\u003e\n\u003cli\u003eHart, C. R., Wilson, D. K., Pettit, C. L. \u0026amp; Nykaza, E. T. Machine-learning of long-range sound propagation through simulated atmospheric turbulence. \u003cem\u003eThe Journal of the Acoustical Society of America\u003c/em\u003e \u003cstrong\u003e149\u003c/strong\u003e, 4384, doi:10.1121/10.0005280 (2021).\u003c/li\u003e\n\u003cli\u003eIslam, S. M. S.\u003cem\u003e et al.\u003c/em\u003e Machine Learning Approaches for Predicting Hypertension and Its Associated Factors Using Population-Level Data From Three South Asian Countries. \u003cem\u003eFrontiers in cardiovascular medicine\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 839379, doi:10.3389/fcvm.2022.839379 (2022).\u003c/li\u003e\n\u003cli\u003eXiao, J.\u003cem\u003e et al.\u003c/em\u003e Icariin inhibits chondrocyte ferroptosis and alleviates osteoarthritis by enhancing the SLC7A11/GPX4 signaling. 
\u003cem\u003eInternational immunopharmacology\u003c/em\u003e \u003cstrong\u003e133\u003c/strong\u003e, 112010, doi:10.1016/j.intimp.2024.112010 (2024).\u003c/li\u003e\n\u003cli\u003eChen, Y.\u003cem\u003e et al.\u003c/em\u003e Icariin alleviates osteoarthritis through PI3K/Akt/mTOR/ULK1 signaling pathway. \u003cem\u003eEuropean journal of medical research\u003c/em\u003e \u003cstrong\u003e27\u003c/strong\u003e, 204, doi:10.1186/s40001-022-00820-x (2022).\u003c/li\u003e\n\u003cli\u003eMu, Y., Wang, L., Fu, L. \u0026amp; Li, Q. Knockdown of LMX1B Suppressed Cell Apoptosis and Inflammatory Response in IL-1\u0026beta;-Induced Human Osteoarthritis Chondrocytes through NF-\u0026kappa;B and NLRP3 Signal Pathway. \u003cem\u003eMediators of inflammation\u003c/em\u003e \u003cstrong\u003e2022\u003c/strong\u003e, 1870579, doi:10.1155/2022/1870579 (2022).\u003c/li\u003e\n\u003cli\u003eXu, H.\u003cem\u003e et al.\u003c/em\u003e Inhibition of CC chemokine receptor 1 ameliorates osteoarthritis in mouse by activating PPAR-\u0026gamma;. \u003cem\u003eMolecular medicine (Cambridge, Mass.)\u003c/em\u003e \u003cstrong\u003e30\u003c/strong\u003e, 74, doi:10.1186/s10020-024-00823-w (2024).\u003c/li\u003e\n\u003cli\u003eMartel-Pelletier, J.\u003cem\u003e et al.\u003c/em\u003e Osteoarthritis. \u003cem\u003eNature reviews. Disease primers\u003c/em\u003e \u003cstrong\u003e2\u003c/strong\u003e, 16072, doi:10.1038/nrdp.2016.72 (2016).\u003c/li\u003e\n\u003cli\u003eBenito, M. J., Veale, D. J., FitzGerald, O., van den Berg, W. B. \u0026amp; Bresnihan, B. Synovial tissue inflammation in early and late osteoarthritis. \u003cem\u003eAnnals of the rheumatic diseases\u003c/em\u003e \u003cstrong\u003e64\u003c/strong\u003e, 1263-1267, doi:10.1136/ard.2004.025270 (2005).\u003c/li\u003e\n\u003cli\u003eGoldring, M. B. \u0026amp; Goldring, S. R. Osteoarthritis. 
\u003cem\u003eJournal of cellular physiology\u003c/em\u003e \u003cstrong\u003e213\u003c/strong\u003e, 626-634, doi:10.1002/jcp.21258 (2007).\u003c/li\u003e\n\u003cli\u003eHunter, D. J., Schofield, D. \u0026amp; Callander, E. The individual and socioeconomic impact of osteoarthritis. \u003cem\u003eNature reviews. Rheumatology\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 437-441, doi:10.1038/nrrheum.2014.44 (2014).\u003c/li\u003e\n\u003cli\u003eHawker, G. A.\u003cem\u003e et al.\u003c/em\u003e Understanding the pain experience in hip and knee osteoarthritis--an OARSI/OMERACT initiative. \u003cem\u003eOsteoarthritis and cartilage\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 415-422, doi:10.1016/j.joca.2007.12.017 (2008).\u003c/li\u003e\n\u003cli\u003ePrieto-Alhambra, D.\u003cem\u003e et al.\u003c/em\u003e Incidence and risk factors for clinically diagnosed knee, hip and hand osteoarthritis: influences of age, gender and osteoarthritis affecting other joints. \u003cem\u003eAnnals of the rheumatic diseases\u003c/em\u003e \u003cstrong\u003e73\u003c/strong\u003e, 1659-1664, doi:10.1136/annrheumdis-2013-203355 (2014).\u003c/li\u003e\n\u003cli\u003eLin, J.\u003cem\u003e et al.\u003c/em\u003e Bioinformatics analysis to identify key genes and pathways influencing synovial inflammation in osteoarthritis. \u003cem\u003eMolecular medicine reports\u003c/em\u003e \u003cstrong\u003e18\u003c/strong\u003e, 5594-5602, doi:10.3892/mmr.2018.9575 (2018).\u003c/li\u003e\n\u003cli\u003eFang, C.\u003cem\u003e et al.\u003c/em\u003e CDKN1A regulation on chondrogenic differentiation of human chondrocytes in osteoarthritis through single-cell and bulk sequencing analysis. \u003cem\u003eHeliyon\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, e27466, doi:10.1016/j.heliyon.2024.e27466 (2024).\u003c/li\u003e\n\u003cli\u003eSun, K.\u003cem\u003e et al.\u003c/em\u003e The PI3K/AKT/mTOR signaling pathway in osteoarthritis: a narrative review. 
\u003cem\u003eOsteoarthritis and cartilage\u003c/em\u003e \u003cstrong\u003e28\u003c/strong\u003e, 400-409, doi:10.1016/j.joca.2020.02.027 (2020).\u003c/li\u003e\n\u003cli\u003eXiao, J.\u003cem\u003e et al.\u003c/em\u003e IL-17 in osteoarthritis: A narrative review. \u003cem\u003eOpen life sciences\u003c/em\u003e \u003cstrong\u003e18\u003c/strong\u003e, 20220747, doi:10.1515/biol-2022-0747 (2023).\u003c/li\u003e\n\u003cli\u003eWang, K., Li, Y. \u0026amp; Lin, J. Identification of diagnostic biomarkers for osteoarthritis through bioinformatics and machine learning. \u003cem\u003eHeliyon\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, e27506, doi:10.1016/j.heliyon.2024.e27506 (2024).\u003c/li\u003e\n\u003cli\u003eOno, J., Takai, M., Kamei, A., Azuma, Y. \u0026amp; Izuhara, K. Pathological Roles and Clinical Usefulness of Periostin in Type 2 Inflammation and Pulmonary Fibrosis. \u003cem\u003eBiomolecules\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, doi:10.3390/biom11081084 (2021).\u003c/li\u003e\n\u003cli\u003eYu, Y., Tan, C. M. \u0026amp; Jia, Y. Y. Research status and the prospect of POSTN in various tumors. \u003cem\u003eNeoplasma\u003c/em\u003e \u003cstrong\u003e68\u003c/strong\u003e, 673-682, doi:10.4149/neo_2021_210223N239 (2021).\u003c/li\u003e\n\u003cli\u003eSonnenberg-Riethmacher, E., Miehe, M. \u0026amp; Riethmacher, D. Periostin in Allergy and Inflammation. \u003cem\u003eFrontiers in immunology\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 722170, doi:10.3389/fimmu.2021.722170 (2021).\u003c/li\u003e\n\u003cli\u003eYoshihara, T.\u003cem\u003e et al.\u003c/em\u003e Mechanisms of tissue degeneration mediated by periostin in spinal degenerative diseases and their implications for pathology and diagnosis: a review. \u003cem\u003eFrontiers in medicine\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 1276900, doi:10.3389/fmed.2023.1276900 (2023).\u003c/li\u003e\n\u003cli\u003eHan, T., Mignatti, P., Abramson, S. B. 
\u0026amp; Attur, M. Periostin interaction with discoidin domain receptor-1 (DDR1) promotes cartilage degeneration. \u003cem\u003ePloS one\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, e0231501, doi:10.1371/journal.pone.0231501 (2020).\u003c/li\u003e\n\u003cli\u003eChijimatsu, R.\u003cem\u003e et al.\u003c/em\u003e Expression and pathological effects of periostin in human osteoarthritis cartilage. \u003cem\u003eBMC musculoskeletal disorders\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 215, doi:10.1186/s12891-015-0682-3 (2015).\u003c/li\u003e\n\u003cli\u003eDuan, X.\u003cem\u003e et al.\u003c/em\u003e Amelioration of Posttraumatic Osteoarthritis in Mice Using Intraarticular Silencing of Periostin via Nanoparticle-Based Small Interfering RNA. \u003cem\u003eArthritis \u0026amp; rheumatology (Hoboken, N.J.)\u003c/em\u003e \u003cstrong\u003e73\u003c/strong\u003e, 2249-2260, doi:10.1002/art.41794 (2021).\u003c/li\u003e\n\u003cli\u003eAttur, M.\u003cem\u003e et al.\u003c/em\u003e Elevated expression of periostin in human osteoarthritic cartilage and its potential role in matrix degradation via matrix metalloproteinase-13. \u003cem\u003eFASEB journal : official publication of the Federation of American Societies for Experimental Biology\u003c/em\u003e \u003cstrong\u003e29\u003c/strong\u003e, 4107-4121, doi:10.1096/fj.15-272427 (2015).\u003c/li\u003e\n\u003cli\u003eTajika, Y.\u003cem\u003e et al.\u003c/em\u003e Influence of Periostin on Synoviocytes in Knee Osteoarthritis. \u003cem\u003eIn vivo (Athens, Greece)\u003c/em\u003e \u003cstrong\u003e31\u003c/strong\u003e, 69-77, doi:10.21873/invivo.11027 (2017).\u003c/li\u003e\n\u003cli\u003eAttur, M.\u003cem\u003e et al.\u003c/em\u003e Periostin loss-of-function protects mice from post-traumatic and age-related osteoarthritis. 
\u003cem\u003eArthritis research \u0026amp; therapy\u003c/em\u003e \u003cstrong\u003e23\u003c/strong\u003e, 104, doi:10.1186/s13075-021-02477-z (2021).\u003c/li\u003e\n\u003cli\u003eRousseau, J. C., Sornay-Rendu, E., Bertholon, C., Garnero, P. \u0026amp; Chapurlat, R. Serum periostin is associated with prevalent knee osteoarthritis and disease incidence/progression in women: the OFELY study. \u003cem\u003eOsteoarthritis and cartilage\u003c/em\u003e \u003cstrong\u003e23\u003c/strong\u003e, 1736-1742, doi:10.1016/j.joca.2015.05.015 (2015).\u003c/li\u003e\n\u003cli\u003eHonsawek, S., Wilairatana, V., Udomsinprasert, W., Sinlapavilawan, P. \u0026amp; Jirathanathornnukul, N. Association of plasma and synovial fluid periostin with radiographic knee osteoarthritis: Cross-sectional study. \u003cem\u003eJoint bone spine\u003c/em\u003e \u003cstrong\u003e82\u003c/strong\u003e, 352-355, doi:10.1016/j.jbspin.2015.01.023 (2015).\u003c/li\u003e\n\u003cli\u003eTan, Q.\u003cem\u003e et al.\u003c/em\u003e Serum periostin level is not sufficient to serve as a clinically applicable biomarker of osteoarthritis. \u003cem\u003eBMC musculoskeletal disorders\u003c/em\u003e \u003cstrong\u003e23\u003c/strong\u003e, 1039, doi:10.1186/s12891-022-06017-x (2022).\u003c/li\u003e\n\u003cli\u003eYazihan, N. Midkine in inflammatory and toxic conditions. \u003cem\u003eCurrent drug delivery\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 54-57, doi:10.2174/1567201811310010009 (2013).\u003c/li\u003e\n\u003cli\u003eMentlein, R. Targeting pleiotropin to treat osteoarthritis. \u003cem\u003eExpert opinion on therapeutic targets\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 861-867, doi:10.1517/14728222.11.7.861 (2007).\u003c/li\u003e\n\u003cli\u003ePufe, T., Bartscher, M., Petersen, W., Tillmann, B. \u0026amp; Mentlein, R. Pleiotrophin, an embryonic differentiation and growth factor, is expressed in osteoarthritis. 
\u003cem\u003eOsteoarthritis and cartilage\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 260-264, doi:10.1016/s1063-4584(02)00385-0 (2003).\u003c/li\u003e\n\u003cli\u003ePufe, T., Groth, G., Goldring, M. B., Tillmann, B. \u0026amp; Mentlein, R. Effects of pleiotrophin, a heparin-binding growth factor, on human primary and immortalized chondrocytes. \u003cem\u003eOsteoarthritis and cartilage\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 155-162, doi:10.1016/j.joca.2006.07.005 (2007).\u003c/li\u003e\n\u003cli\u003eFadda, S. M. H., Bassyouni, I. H., Khalifa, R. H. \u0026amp; Elsaid, N. Y. Pleiotrophin, the angiogenic and mitogenic growth factor: levels in serum and synovial fluid in rheumatoid arthritis and osteoarthritis : And correlation with clinical, laboratory and radiological indices. \u003cem\u003eZeitschrift fur Rheumatologie\u003c/em\u003e \u003cstrong\u003e77\u003c/strong\u003e, 322-329, doi:10.1007/s00393-016-0234-8 (2018).\u003c/li\u003e\n\u003cli\u003eSuthon, S., Perkins, R. S., Bryja, V., Miranda-Carboni, G. A. \u0026amp; Krum, S. A. WNT5B in Physiology and Disease. \u003cem\u003eFrontiers in cell and developmental biology\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 667581, doi:10.3389/fcell.2021.667581 (2021).\u003c/li\u003e\n\u003cli\u003eShao, L. T.\u003cem\u003e et al.\u003c/em\u003e The Protective Effects of Parathyroid Hormone (1-34) on Cartilage and Subchondral Bone Through Down-Regulating JAK2/STAT3 and WNT5A/ROR2 in a Collagenase-Induced Osteoarthritis Mouse Model. \u003cem\u003eOrthopaedic surgery\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 1662-1672, doi:10.1111/os.13019 (2021).\u003c/li\u003e\n\u003cli\u003eMartineau, X., Abed, \u0026Eacute;., Martel-Pelletier, J., Pelletier, J. P. \u0026amp; Lajeunesse, D. Alteration of Wnt5a expression and of the non-canonical Wnt/PCP and Wnt/PKC-Ca2+ pathways in human osteoarthritis osteoblasts. 
\u003cem\u003ePloS one\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, e0180711, doi:10.1371/journal.pone.0180711 (2017).\u003c/li\u003e\n\u003cli\u003eLi, Y.\u003cem\u003e et al.\u003c/em\u003e The Expression of Osteopontin and Wnt5a in Articular Cartilage of Patients with Knee Osteoarthritis and Its Correlation with Disease Severity. \u003cem\u003eBioMed research international\u003c/em\u003e \u003cstrong\u003e2016\u003c/strong\u003e, 9561058, doi:10.1155/2016/9561058 (2016).\u003c/li\u003e\n\u003cli\u003eQi, Y., Tang, R., Shi, Z., Feng, G. \u0026amp; Zhang, W. Wnt5a/Platelet-rich plasma synergistically inhibits IL-1\u0026beta;-induced inflammatory activity through NF-\u0026kappa;B signaling pathway and prevents cartilage damage and promotes meniscus regeneration. \u003cem\u003eJournal of tissue engineering and regenerative medicine\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 612-624, doi:10.1002/term.3198 (2021).\u003c/li\u003e\n\u003cli\u003eDing, D.\u003cem\u003e et al.\u003c/em\u003e Zoledronic acid generates a spatiotemporal effect to attenuate osteoarthritis by inhibiting potential Wnt5a-associated abnormal subchondral bone resorption. \u003cem\u003ePloS one\u003c/em\u003e \u003cstrong\u003e17\u003c/strong\u003e, e0271485, doi:10.1371/journal.pone.0271485 (2022).\u003c/li\u003e\n\u003cli\u003eLambert, C.\u003cem\u003e et al.\u003c/em\u003e Gene expression pattern of cells from inflamed and normal areas of osteoarthritis synovial membrane. \u003cem\u003eArthritis \u0026amp; rheumatology (Hoboken, N.J.)\u003c/em\u003e \u003cstrong\u003e66\u003c/strong\u003e, 960-968, doi:10.1002/art.38315 (2014).\u003c/li\u003e\n\u003cli\u003eLee, Y. H. \u0026amp; White, M. F. Insulin receptor substrate proteins and diabetes. \u003cem\u003eArchives of pharmacal research\u003c/em\u003e \u003cstrong\u003e27\u003c/strong\u003e, 361-370, doi:10.1007/bf02980074 (2004).\u003c/li\u003e\n\u003cli\u003eEckstein, S. S., Weigert, C. \u0026amp; Lehmann, R. 
Divergent Roles of IRS (Insulin Receptor Substrate) 1 and 2 in Liver and Skeletal Muscle. \u003cem\u003eCurrent medicinal chemistry\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 1827-1852, doi:10.2174/0929867324666170426142826 (2017).\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Osteoarthritis, Biomarkers, Machine Learning, Bioinformatics, Tissue-specific expressed genes","lastPublishedDoi":"10.21203/rs.3.rs-4706641/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4706641/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cb\u003eBackground\u003c/b\u003e\u003c/p\u003e \u003cp\u003eOsteoarthritis (OA) is characterized by synovial inflammation, articular cartilage degradation, and subchondral bone changes. Currently, there are no reliable biomarkers for the diagnosis and treatment of OA. Therefore, exploring OA biomarkers is crucial for its prevention, diagnosis, and treatment.\u003c/p\u003e\u003cp\u003e\u003cb\u003eMaterials and Methods\u003c/b\u003e\u003c/p\u003e \u003cp\u003eThe GSE51588, GSE12021, GSE55457, GSE56409, GSE114007, GSE168505, GSE169077, GSE55235, GSE129147, and GSE48556 datasets of patients with OA and normal control samples were obtained from the GEO database. Differentially expressed genes (DEGs) in OA and normal controls were identified using R language. Protein-protein interaction (PPI) network and module analysis were performed to screen and filter key genes. Enrichment analyses were conducted to determine the biological functions and pathways of key DEGs and predict potential transcription factors. Machine learning models (XGBoost, LASSO regression, and SVM) were used to identify the best characteristic genes, and the intersection of hub genes was used as the final diagnostic genes. 
ROC analysis and nomogram were used to evaluate the diagnostic value of candidate genes. The expression levels of characteristic genes were validated in external GEO datasets containing cartilage, synovial membrane, and blood samples from patients. The expression levels of the key gene IRS2 in chondrocytes were further confirmed through in vitro experiments.\u003c/p\u003e\u003cp\u003e\u003cb\u003eResults\u003c/b\u003e\u003c/p\u003e \u003cp\u003eFifteen OA characteristic genes (IRS2, ADM, SIK1, PTN, CX3CR1, WNT5A, IL21R, APOD, CRLF1, FKBP5, PNMAL1, NPR3, RARRES1, ASPN, POSTN) were identified using three machine learning algorithms. Enrichment analysis indicated that abnormal expression of DEGs and hub genes may be mediated by extracellular matrix organization, extracellular structure organization, Relaxin signaling pathway, IL-17 signaling pathway, AGE-RAGE signaling pathway in diabetic complications, and PI3K-Akt signaling pathway, which are involved in OA occurrence. Four diagnostic genes (IRS2, WNT5A, PTN, POSTN) were highly correlated with OA. Validation data set analysis showed that IRS2 was down-regulated, while WNT5A, PTN, and POSTN were up-regulated in the experimental group compared to the normal group. qRT-PCR and WB results verified that the expression level of diagnostic gene IRS2 was consistent with bioinformatics analysis results.\u003c/p\u003e\u003cp\u003e\u003cb\u003eConclusion\u003c/b\u003e\u003c/p\u003e \u003cp\u003eThis study integrates bioinformatics analysis and machine learning algorithms to identify and validate four promising biomarkers: IRS2, WNT5A, PTN, and POSTN. POSTN can be used as a biomarker for OA cartilage, and early diagnosis of PTN in OA deserves attention. 
WNT5A and IRS2 offer new diagnostic perspectives for OA.\u003c/p\u003e","manuscriptTitle":"Identification and validation of novel characteristic genes based on multi-tissue osteoarthritis","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-08-09 21:15:37","doi":"10.21203/rs.3.rs-4706641/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"dfaedf7e-5dba-4a4e-bd48-971443668b0b","owner":[],"postedDate":"August 9th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2024-09-09T14:48:47+00:00","versionOfRecord":[],"versionCreatedAt":"2024-08-09 21:15:37","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4706641","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4706641","identity":"rs-4706641","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}