Mechanisms in the transition from systemic lupus erythematosus to lupus nephritis: A bioinformatics and functional analysis approach

Hua Li, Yike Zou, Xin Chen, Yuchi Wang, Peng Zu, Jiahao Chen, and 1 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-6250363/v1 This work is licensed under a CC BY 4.0 License. Status: Posted (Version 1).

Abstract

Lupus nephritis (LN), a severe renal complication of systemic lupus erythematosus (SLE), represents the primary driver of end-stage renal disease and mortality in SLE populations. Current immunosuppressive therapies carry significant comorbidity risks, underscoring the urgent need for non-invasive biomarkers enabling early LN detection. This study employed integrated bioinformatics approaches to identify circulating biomarkers predictive of SLE-to-LN transition.
Analysis of the GSE99967 dataset revealed peripheral blood-derived differentially expressed genes (DEGs) between LN patients and SLE controls. Through weighted gene co-expression network analysis (WGCNA) and protein-protein interaction network construction, we identified hub genes that were subsequently refined via three machine learning algorithms: LASSO regression, random forest, and SVM-RFE. Functional enrichment analyses of GO terms and KEGG pathways using DAVID elucidated immune-related biological processes. The diagnostic performance of the candidate biomarker was validated through ROC curve evaluation in both the training (GSE99967) and independent validation (GSE82221) cohorts, complemented by immune infiltration profiling to delineate cellular correlations. Our multi-modal approach identified inducible T-cell costimulator (ICOS) as a pivotal biomarker with high diagnostic accuracy (AUC 0.851 in the training set and 0.849 in the validation set) for LN progression prediction. Mechanistically, ICOS expression patterns showed significant associations with Th cell subset infiltration, suggesting a dual role as both diagnostic indicator and immunopathogenic mediator. These findings position ICOS as a promising non-invasive biomarker capable of guiding early therapeutic intervention and personalized management of SLE patients at risk for nephritis progression, potentially circumventing the need for invasive renal biopsies while addressing critical unmet needs in lupus nephropathy surveillance.

Subject areas: Biological sciences/Biological techniques; Biological sciences/Immunology; Health sciences/Diseases

Keywords: Lupus nephritis, Systemic lupus erythematosus, machine learning, weighted gene co-expression network analysis, immune infiltration analysis

1. Introduction

SLE is an autoimmune disease involving an inappropriate immune response to endogenous nuclear particles, which affects multiple organs and systems [1]. LN, an immune complex glomerulonephritis, is one of the most common and severe target organ manifestations of SLE and the leading cause of SLE-related death [2, 3]. Patients with LN usually present with findings of nephritic (e.g., hematuria, generalized edema, hypertension) and/or nephrotic (e.g., generalized edema, frothy urine) glomerular disease [4]. The treatment of LN usually involves immunosuppressive therapy, typically with mycophenolate mofetil or cyclophosphamide combined with glucocorticoids, although these treatments are not uniformly effective [5]. Within 10 years of an initial SLE diagnosis, 5–20% of patients with LN develop end-stage renal disease (ESRD), and the multiple comorbidities associated with immunosuppressive treatment, including infections, osteoporosis and cardiovascular and reproductive effects, remain a concern [5]. According to the guidelines, the reliable criterion for diagnosing LN is histopathological confirmation obtained through renal biopsy [6, 7]. However, kidney biopsy is an invasive and expensive procedure that carries a risk of bleeding. Consequently, it limits physicians' ability to dynamically monitor and manage disease progression in SLE. Currently, commonly used laboratory markers for LN include urinary protein, serum creatinine, glomerular filtration rate, anti-dsDNA antibody, and serum complement [8]. However, these parameters lack the sensitivity and specificity needed in clinical settings [9, 10]. Therefore, it is particularly important to find a simple, non-invasive and effective biomarker to predict the risk of LN in SLE patients.

The renal glomerulus plays a crucial role in the onset and progression of LN [11].
In individuals with SLE, the immune system produces autoantibodies and immune complexes that gradually accumulate within the renal glomeruli [ 12 – 14 ]. This accumulation triggers an inflammatory response, resulting in glomerular damage and dysfunction. In addition, several studies have provided insights into the involvement of susceptibility genes in LN, disrupting immune tolerance and promoting disease development, and these genes amplify innate immune signaling pathways, promote lymphocyte activation, and ultimately lead to renal damage [ 11 ]. Targeting specific immune cell populations or manipulating their functions could alleviate inflammation, reduce tissue damage, and improve the prognosis for patients with renal diseases [ 15 – 17 ]. In summary, understanding the molecular mechanisms underlying the transition from SLE to the LN disease stage can help develop more effective diagnostic and therapeutic strategies. This study leveraged bioinformatics tools to analyze the gene expression omnibus (GEO) datasets GSE99967 and GSE82221. In the GSE99967 data set, shared genes were obtained by crossing the MEturquoise module genes selected by weighted gene co-expression network analysis (WGCNA) and the DEGs obtained by differential analysis. Functional enrichment analyses were performed on shared genes using the DAVID database to elucidate the molecular mechanisms from SLE to LN. Subsequently, we constructed a protein-protein interaction (PPI) network to identify core genes, which were further refined using machine learning for biomarker gene selection. The candidate diagnostic gene was validated using the receiver operating characteristic (ROC) curve and single-gene difference analysis both in the training and validation sets. We also performed immune cell infiltration analysis and determined correlations between immune cells and the diagnostic biomarker. 
Collectively, our integrative bioinformatics approach aims to discover an early diagnostic gene associated with the transition from SLE to LN in order to improve clinical diagnosis and treatment.

2. Materials and methods

2.1 Data sources and workflow

The GSE99967 [18] and GSE82221 [19] datasets were downloaded from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) database [20]. The platform of GSE99967 was GPL21970 (Affymetrix Human Gene 2.0 ST Array), and whole peripheral blood samples from 29 LN patients and 13 SLE controls were retained for further analysis. We also obtained the GSE82221 dataset, consisting of peripheral blood samples from 15 LN patients and 15 SLE controls, for external validation. The platform of GSE82221 was GPL13534 (Illumina HumanMethylation450 BeadChip). The workflow of this study is illustrated in Fig. 1.

2.2 Differentially expressed gene analysis

The limma package in R was employed for differential expression analysis of the GSE99967 dataset. The criteria for selecting DEGs were p < 0.05 and |log(FC)| > 1. Finally, the R packages "ggplot2" and "pheatmap" were used to visualize the DEGs as volcano plots and heatmaps, respectively.

2.3 Weighted gene co-expression network analysis

WGCNA is a systems biology method used to construct gene co-expression networks and identify gene modules related to biological traits. By analyzing the similarity in expression patterns between genes, WGCNA reveals gene regulatory mechanisms and functional modules, which are crucial for understanding complex biological processes [21]. The WGCNA package was used to analyze GSE99967 and to examine the correlation between modules and disease status. We clustered genes into modules and selected the module with the highest correlation to LN, which was the MEturquoise module.
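As a rough illustration of the co-expression step, WGCNA turns pairwise gene correlations into a weighted network by raising their absolute values to a soft-thresholding power. The sketch below shows that unsigned adjacency in plain Python rather than with the R WGCNA package used in the study. The gene names, expression values, and the power `beta = 6` are hypothetical, since the paper does not report the power that was used.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta for every gene pair.

    expr maps a gene name to its expression values across samples.
    beta is an assumed soft-thresholding power, not a value from the paper.
    """
    genes = sorted(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


# Toy expression values, invented for illustration (not from GSE99967).
toy_expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with geneA
    "geneC": [1.0, -1.0, 1.0, -1.0],  # weakly anti-correlated with geneA
}
adjacency = soft_threshold_adjacency(toy_expr)
```

Raising the correlation to a power keeps strongly co-expressed pairs close to 1 while pushing weak correlations toward 0, which is what lets modules such as MEturquoise stand out in the subsequent clustering.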
2.4 Gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis

The shared genes were obtained by intersecting "DEGs" and "MEturquoise". DAVID (https://davidbioinformatics.nih.gov) is an open database that integrates biological data and analytical tools for functional annotation of genes and pathways [22]. GO is a bioinformatics resource for annotating genes and analyzing the biological processes they are involved in. KEGG is a database for analyzing relevant signaling pathways in large-scale molecular datasets generated by high-throughput experimental techniques [23]. We conducted GO and KEGG enrichment analysis in the DAVID database to explore the biological functions of the shared genes. The results were then visualized on a bioinformatics platform (http://bioinformatics.com.cn/).

2.5 Protein-protein interaction network analysis and core genes identification

STRING is a widely used protein interaction database that integrates known and predicted protein-protein interaction information [24]. To explore the functional associations and interactions among the shared genes, we imported them into STRING (https://string-db.org) to build a PPI network (confidence score > 0.4). In addition, we used the CytoHubba plugin in Cytoscape [25] to identify core genes, which were ranked by the Degree algorithm.

2.6 Machine learning

As a dimension reduction approach, least absolute shrinkage and selection operator (LASSO) regression performs better on high-dimensional data than ordinary regression analysis and uses regularization to improve prediction accuracy. LASSO analysis with 10-fold cross-validation was performed using the "glmnet" package to select the tuning (penalty) parameter [26]. Random forest (RF) is a supervised machine learning algorithm built from decision trees and is used to solve regression and classification problems.
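Random forests score a feature by how much the splits that use it reduce node impurity. The sketch below computes the Gini impurity decrease for a single split, as a minimal plain-Python illustration of the quantity that Gini-based importance accumulates over splits and averages across trees. The expression values, class labels, and threshold are invented for illustration and are not taken from the study.

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 for a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease obtained by splitting samples at value <= threshold."""
    left = [lab for val, lab in zip(values, labels) if val <= threshold]
    right = [lab for val, lab in zip(values, labels) if val > threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children


# Hypothetical expression of one gene in six samples (not real data).
expression = [2.1, 2.4, 2.2, 5.0, 5.3, 4.9]
status = ["LN", "LN", "LN", "SLE", "SLE", "SLE"]
decrease = gini_decrease(expression, status, threshold=3.0)
```

In this toy case the split separates the two groups perfectly, so the decrease equals the parent impurity of 0.5; a gene whose splits rarely separate the classes would contribute a value near 0.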
Feature importance was determined by the Mean Decrease Gini index calculated by RF [27]. A support vector machine-recursive feature elimination (SVM-RFE) model was built using the "e1071" package, and feature subsets were compared by their average misclassification rates under 10-fold cross-validation [28]. SVM-RFE ranks features by recursively eliminating the least informative ones, which helps to limit overfitting [29]. The selected core genes were analyzed by these three machine learning algorithms to screen out the candidate diagnostic biomarker. Finally, ICOS was identified as the candidate diagnostic biomarker by intersecting the results from the SVM-RFE, LASSO, and RF models.

2.7 Diagnostic biomarker validation

Receiver operating characteristic (ROC) analysis was conducted to evaluate the predictive accuracy of the candidate gene. The area under the curve (AUC), sensitivity, and 1-specificity were calculated; the AUC quantifies the ability of the gene to differentiate between classes, with higher values indicating better performance. A gene with AUC > 0.70 was considered to have good diagnostic value. ROC curve analyses in both the training (GSE99967) and validation (GSE82221) sets were used to evaluate the diagnostic value of the candidate gene. In addition, we validated the expression pattern of the candidate gene in these two datasets to substantiate its reliability. By employing the above methods, we identified ICOS as a significant blood diagnostic biomarker for the transition from SLE to LN.

2.8 Immune infiltration analysis

The CIBERSORT algorithm is used to infer the relative abundance of different immune cell subsets from gene expression data [30]. It estimates the relative abundance of various immune cell types by relating the gene expression profile to known immune cell gene expression patterns.
The CIBERSORT algorithm was applied to analyze immune cell infiltration in the samples. The results were visualized using the "pheatmap", "ggplot2", "corrplot", and "vioplot" packages. Spearman correlation analysis was employed to determine the correlation between immune cells and the diagnostic biomarker.

3. Results

3.1 Identification of DEGs

Using |log(FC)| > 1 and p-value < 0.05 as the screening threshold, 117 DEGs were identified in the GSE99967 dataset, of which 73 genes were up-regulated and 44 genes were down-regulated (volcano plot, Fig. 2A). The heatmap shows the top 30 up-regulated and down-regulated genes in the GSE99967 dataset (Fig. 2B). We designated these 117 up-regulated and down-regulated genes as the "DEGs" gene set.

3.2 WGCNA and key module identification

In the GSE99967 dataset, a key module named "MEturquoise" was selected through WGCNA. The blockwiseModules function was used to cluster the genes, resulting in a cluster dendrogram (Fig. 3A). A heatmap of the module-trait relationships was plotted (Fig. 3B). This network analysis allowed us to focus on gene co-expression patterns, highlighting genes that are likely to be functionally related and potentially significant in LN. The "MEturquoise" gene set comprised 578 genes. Finally, we obtained 27 shared genes by intersecting the "DEGs" and "MEturquoise" gene sets (Fig. 3C).

3.3 Functional enrichment analysis results

We conducted KEGG pathway enrichment analysis on the shared genes, which revealed significant enrichment in several key biological pathways. These genes were enriched in some immune response-related cancer pathways, such as thyroid cancer, endometrial cancer, basal cell carcinoma and acute myeloid leukemia (Fig. 4A).
In addition, these shared genes were enriched in typical immune response signaling pathways, such as intestinal immune network for IgA production and T cell receptor signaling pathway (Fig. 4A). GO analysis showed that the shared genes were mainly related to the following biological process (BP) and cellular component (CC) terms: adaptive immune response, immune response, T cell receptor signaling pathway, negative thymic T cell selection, T cell costimulation, and T cell receptor complex (Fig. 4B). Some biological processes related to the inflammatory response, such as cellular response to interleukin-4 and cellular response to cytokine stimulus, were also associated with the risk of LN in SLE patients (Fig. 4B). In summary, enrichment analysis identified several important biological pathways and processes associated with the progression of SLE to LN, particularly those related to immune response and inflammation. These findings provide insights into the molecular mechanisms of the transition from SLE to LN and suggest potential targets for diagnosis and treatment.

3.4 Construction of PPI network and core genes identification

To better understand the interactions among the shared genes, we used STRING to construct a PPI network (Fig. 5A), imported the results into Cytoscape, and used the CytoHubba plugin to identify the core genes. The Degree algorithm yielded five core genes (Fig. 5B): CD28, LEF1, TCF7, ICOS, and CCR7.

3.5 Identification of the candidate diagnostic biomarker

We constructed prediction models using three different algorithms to distinguish LN patients from SLE controls. Two of the five LN-related features were selected by the LASSO algorithm (Fig. 6A, B). Next, we ranked feature importance using random forest, and the top five genes were selected as diagnostic genes (Fig. 6C, D).
Then, three genes were identified as the best candidates for LN based on SVM-RFE (Fig. 6E, F). Finally, we intersected the candidate genes obtained from the SVM-RFE, LASSO, and RF models, and ICOS was identified as the candidate diagnostic biomarker for follow-up steps (Fig. 6G).

3.6 Validation of diagnostic biomarker

To verify the diagnostic value of the selected candidate gene in differentiating LN from SLE, ROC curve analysis was performed. The AUCs of ICOS in GSE99967 and GSE82221 were 0.851 and 0.849, respectively (Fig. 7A, B), indicating that ICOS has high diagnostic accuracy in distinguishing LN patients from SLE controls. ICOS can therefore be considered a potential molecular biomarker that provides an important reference for early diagnosis and personalized treatment of the transition from SLE to LN. We also visualized ICOS expression in the GSE99967 and GSE82221 datasets; ICOS expression showed a downward trend from SLE to LN (p < 0.05), which underlines the potential of ICOS for further development and application in this field (Fig. 7C, D).

3.7 Immune infiltration analysis

The proportions of infiltrating immune cells were estimated by CIBERSORT. Fig. 8A and B show the distribution of immune cells in each sample as a bar plot and a heatmap. Compared with SLE control samples, LN samples exhibited higher levels of neutrophils and fewer naive CD4+ T cells and eosinophils (Fig. 8C). The correlation heatmap between individual immune cell types revealed that activated memory CD4+ T cells (r = -0.55) and neutrophils (r = -0.77) were negatively correlated with resting memory CD4+ T cells, naive CD4+ T cells (r = -0.52) were negatively correlated with monocytes, and memory B cells (r = 0.55) were positively correlated with naive CD4+ T cells (Fig. 8D).
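For the relationship between a marker gene and estimated immune cell fractions, section 2.8 names Spearman's rank correlation. A minimal plain-Python sketch of that statistic is given here, computed as the Pearson correlation of average ranks so that ties are handled. The marker expression values and neutrophil fractions are hypothetical and are not results from the study.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical marker expression and neutrophil fractions in five samples.
marker_expression = [5.1, 4.8, 4.2, 3.9, 3.5]
neutrophil_fraction = [0.20, 0.25, 0.31, 0.30, 0.45]
rho = spearman(marker_expression, neutrophil_fraction)
```

Because only the ordering of values matters, the statistic is insensitive to the scale of the expression data and to monotone transformations such as log scaling.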
These findings indicate distinct immune patterns in LN patients compared to SLE controls, as well as interactions between various types of immune cells. In addition, the correlation between ICOS and immune cells was investigated. Naive B cells, memory B cells, plasma cells, naive CD4+ T cells, resting memory CD4+ T cells, and eosinophils were positively correlated with ICOS, whereas monocytes and neutrophils were negatively correlated with ICOS (Fig. 8E). This suggests that ICOS may play an important role in the pathogenesis of the SLE-to-LN transition by influencing immune cell infiltration.

4. Discussion

Currently, the reliable standard for diagnosing LN is histopathological confirmation obtained through renal biopsy, but renal biopsy is an invasive and expensive procedure with a risk of bleeding. It is particularly important to find a simple, non-invasive, and effective biomarker to predict the risk of LN in SLE patients. For this reason, this study used the GSE99967 and GSE82221 datasets, which contain peripheral blood samples, rather than renal biopsy tissue samples. This study integrated analysis of the GEO dataset GSE99967 with bioinformatics to screen and identify the diagnostic marker involved in the progression from SLE to LN. Using differential expression analysis and WGCNA, we identified DEGs and the key module MEturquoise. Subsequently, we identified 5 core genes using PPI network analysis. Machine learning is increasingly applied in medicine and has influenced clinical decision-making in multiple fields, including the prediction of mortality [31], response to biological agents [32], and disease activity [33]. This study identified a candidate diagnostic gene, ICOS, through three machine learning methods: LASSO, RF and SVM-RFE.
Therefore, ICOS may serve as a promising biomarker and predictor for future diagnostic and therapeutic strategies for progression from SLE to LN. To further validate our findings, we utilized an independent GEO dataset, GSE82221, for dataset verification. ICOS was validated through ROC curves and single-gene differential analysis. The results demonstrated a strong correlation between ICOS and the progression of SLE to LN, thereby further supporting our study results. Immune infiltration analysis indicated distinct immune patterns in LN patients compared to SLE controls, as well as interactions between various types of immune cells. Naive B cells, memory B cells, plasma cells, naive CD4 + T cells, resting memory CD4 + T cells, and eosinophils were positively correlated with ICOS, while monocytes and neutrophils were negatively correlated with ICOS. This suggests that ICOS may play an important role in the pathogenesis of SLE transition to LN by influencing immune cells infiltration. These immune infiltration changes not only provided crucial clues for understanding the immunopathological mechanisms of SLE to LN but also suggested potential avenues for future targeted immune therapy strategies. These might include modulating specific immune cell subgroup functions and disrupting intercellular interactions, aiming for precise interventions in the disease progression. Kyoto Encyclopedia of genes and genomes pathway enrichment analysis revealed that shared genes were enriched in some immune response-related cancer pathways, such as thyroid cancer, endometrial cancer, basal cell carcinoma and acute myeloid leukemia. Autoimmune thyroid diseases (AITD) result from a dysregulation of the immune system leading to an immune attack on the thyroid, which is associated with other organ specific (polyglandular autoimmune syndromes), or systemic autoimmune disorders[ 34 ]. Moreover, several studies have shown an association of AITD and papillary thyroid cancer[ 34 ]. 
Endometrial tissue contains numerous leukocytes varying in number and phenotype throughout the menstrual cycle [ 35 ]. Leukocytes are more abundant before menstruation, probably in relation to the immune protection required during endometrial disruption. Therefore, tumor immune response may be specifically enhanced in endometrial cancer cells. The immune system plays a key role in the suppression and progression of basal cell carcinoma, and there are multiple mechanisms by which basal cell carcinoma evades the anti-tumor immune response[ 36 ]. Acute myeloid leukemia is a genetically, epigenetically, and clinically heterogeneous disease characterized by accumulation and expansion of immature myeloid cells in the bone marrow and peripheral blood, with consequent failure of normal hematopoiesis [ 37 ], which has been considered an immunoresponsive malignancy [ 38 ]. In addition, these shared genes are enriched in some typical immune response signaling pathways, such as Intestinal immune network for IgA production and T cell receptor signaling pathway. Gene ontology analysis revealed some biological processes related to inflammatory and immune responses are related to the risk of LN in SLE patients. In summary, enrichment analysis revealed some key biological pathways and functions related to processes such as immune response and inflammation, providing a new perspective on the pathogenesis of transition from SLE to LN. Inducible co-stimulator (ICOS) is a cell-enhanced co-stimulatory receptor that has shown great potential in the regulation of innate and adaptive immunity[ 39 ]. ICOS may play an important role in adaptive immunity by regulating the interaction between T cells and antigen-presenting cells. Disruption of this molecule can lead to autoimmune diseases, in particular SLE. A case-control study found that ICOS rs11889031 may act as a risk factor for SLE and could be used as a genetic susceptibility biomarker [ 40 ]. 
In addition, ICOS is a potent promoter of organ inflammation in murine lupus. ICOS stimulates T follicular helper cell differentiation in lymphoid tissue, suggesting that it might drive autoimmunity by enhancing autoantibody production. Yet, the pathogenic relevance of this mechanism remains unclear. One study showed that selective ablation of ICOS ligand (ICOSL) in CD11c+ cells, but not in B cells, dramatically ameliorates kidney and lung inflammation in lupus-prone MRL mice [41]. In conclusion, in addition to being a potential biomarker for predicting the progression of SLE to LN, ICOS also represents a promising therapeutic target. By specifically regulating the expression or activity of ICOS, it may be possible to reduce the risk of LN in SLE patients and provide new clinical treatment strategies. Because our analysis relies on previously published datasets, it has limitations, and further in-depth research is required to verify these findings.

5. Conclusion

ICOS is not only a potential biomarker and predictor for the transition from SLE to LN but also holds potential as a therapeutic target. These findings offer insights into the molecular mechanisms of the SLE-to-LN transition and pave the way for the development of novel therapeutic approaches with considerable clinical application prospects.

Declarations

Author contributions

Conceptualization, writing, and original draft preparation: Hua Li; Writing, review, and editing: Hua Li and Yike Zou; Methodology: Hua Li, Yike Zou and Yuchi Wang; Resources: Hua Li, Yike Zou, Yuchi Wang and Peng Zu; Formal analysis and investigation: Hua Li, Yike Zou, Yuchi Wang, Peng Zu and Jiahao Chen; Supervision: Hongwei Su.

Disclosure statement

No potential conflict of interest was reported by the authors.

Funding

Science and Technology Department of Sichuan Province of China (No. 2022YFS0621).
Data availability statement The Gene expression profiles were obtained from the Gene Expression Omnibus (GEO) database at the following link: https://www.ncbi.nlm.nih.gov/geo/, and the gene expression profiles used in this research, GSE99967 and GSE82221, were accessed at the following link: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99967 and https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE82221. All data generated or analyzed during this study are included in this published article. References Kaul A, Gordon C, Crow MK, Touma Z, Urowitz MB, van Vollenhoven R, et al. Systemic lupus erythematosus. Nat Rev Dis Primers. (2016) 2:16039. 10.1038/nrdp.2016.39 Lech M, Anders HJ. The pathogenesis of lupus nephritis. J Am Soc Nephrol. (2013) 24:1357–66. 10.1681/ASN.2013010026 Shen L, Lan L, Zhu T, Chen H, Gu H, Wang C, Chen Y, Wang M, Tu H, Enghard P, Jiang H, Chen J. Identification and Validation of IFI44 as Key Biomarker in Lupus Nephritis. Front Med (Lausanne). 2021 Oct 25;8:762848. Omer MH, Shafqat A, Ahmad O, Nadri J, AlKattan K, Yaqinuddin A. Urinary Biomarkers for Lupus Nephritis: A Systems Biology Approach. J Clin Med. 2024 Apr 18;13(8):2339. Anders HJ, Saxena R, Zhao MH, Parodis I, Salmon JE, Mohan C. Lupus nephritis. Nat Rev Dis Primers. 2020 Jan 23;6(1):7. Fanouriakis A, Kostopoulou M, Cheema K, Anders HJ, Aringer M, Bajema I, et al. 2019 Update of the Joint European League Against Rheumatism and European Renal Association-European Dialysis and Transplant Association (EULAR/ERA-EDTA) recommendations for the management of lupus nephritis. Ann Rheum Dis (2020) 79:713–23. Kidney Disease: Improving Global Outcomes Glomerular Diseases Work G . KDIGO 2021 clinical practice guideline for the management of glomerular diseases. Kidney Int (2021) 100:S1–S276. Yu C, Li P, Dang X, Zhang X, Mao Y, Chen X. Lupus nephritis: new progress in diagnosis and treatment. J Autoimmun (2022) 132:102871. Morell M, Perez-Cozar F, Maranon C. 
Immune-related urine biomarkers for the diagnosis of lupus nephritis. Int J Mol Sci (2021) 22(13):7143. Mejia-Vilet JM, Malvar A, Arazi A, Rovin BH. The lupus nephritis management renaissance. Kidney Int (2022) 101:242–55. Wang Z, Hu D, Pei G, Zeng R, Yao Y. Identification of driver genes in lupus nephritis based on comprehensive bioinformatics and machine learning. Front Immunol. 2023 Dec 7;14:1288699. Tojo T, Friou GJ. Lupus nephritis: varying complement-fixing properties of immunoglobulin G antibodies to antigens of cell nuclei. Science (1968) 161(3844):904–6. Jacob N, Stohl W. Autoantibody-dependent and autoantibody-independent roles for B cells in systemic lupus erythematosus: past, present, and future. Autoimmunity (2010) 43(1):84–97. Yung S, Chan TM. Autoantibodies and resident renal cells in the pathogenesis of lupus nephritis: getting to know the unknown. Clin Dev Immunol 2012 (2012) p:139365. Arazi A, Rao DA, Berthier CC, Davidson A, Liu Y, Hoover PJ, et al. The immune cell landscape in kidneys of patients with lupus nephritis. Nat Immunol (2019) 20(7):902–14. Banchereau R, Hong S, Cantarel B, Baldwin N, Baisch J, Edens M, et al. Personalized immunomonitoring uncovers molecular networks that stratify lupus patients. Cell (2016) 165(3):551–65. Davidson A. What is damaging the kidney in lupus nephritis? Nat Rev Rheumatol (2016) 12(3):143–53. Wither JE, Prokopec SD, Noamani B, Chang NH, Bonilla D, Touma Z, Avila-Casado C, Reich HN, Scholey J, Fortin PR, Boutros PC, Landolt-Marticorena C. Identification of a neutrophil-related gene expression signature that is enriched in adult systemic lupus erythematosus patients with active nephritis: Clinical/pathologic associations and etiologic mechanisms. PLoS One. 2018 May 9;13(5):e0196117. Zhu H, Mi W, Luo H, Chen T, Liu S, Raman I, Zuo X, Li QZ. Whole-genome transcription and DNA methylation analysis of peripheral blood mononuclear cells identified aberrant gene regulation pathways in systemic lupus erythematosus. 
Arthritis Res Ther. 2016 Jul 13;18:162. Wang Z, Monteiro CD, Jagodnik KM, Fernandez NF, Gundersen GW, Rouillard AD, Jenkins SL, Feldmann AS, Hu KS, McDermott MG, et al. Extraction and analysis of signatures from the gene expression omnibus by the crowd. Nat Commun. 2016;7:12846. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9(1):559. Sherman BT, Hao M, Qiu J, Jiao X, Baseler MW, Lane HC, Imamichi T, Chang W. DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 2022 Jul 5;50(W1):W216-W221. Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 2017;45:D353–D361. Doncheva NT, Morris JH, Gorodkin J, et al. Cytoscape StringApp: network analysis and visualization of proteomics data. J Proteome Res. 2019;18(2):623–632. Shannon P, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–2504. Engebretsen S, Bohlin J. Statistical predictions with glmnet. Clin Epigenetics. 2019 Aug 23;11(1):123. Garge NR, Bobashev G, Eggleston B. Random forest methodology for model-based recursive partitioning: the mobForest package for R. BMC Bioinformatics. 2013 Apr 11;14:125. Sundermann B, Bode J, Lueken U, Westphal D, Gerlach AL, Straube B, Wittchen HU, Ströhle A, Wittmann A, Konrad C, Kircher T, Arolt V, Pfleiderer B. Support Vector Machine Analysis of Functional Magnetic Resonance Imaging of Interoception Does Not Reliably Predict Individual Outcomes of Cognitive Behavioral Therapy in Panic Disorder with Agoraphobia. Front Psychiatry. 2017 Jun 9;8:99. Zhao E, Xie H, Zhang Y. Predicting Diagnostic Gene Biomarkers Associated With Immune Infiltration in Patients With Acute Myocardial Infarction. Front Cardiovasc Med. 2020 Oct 23;7:586871. Newman AM, Liu CL, Green MR, et al. 
Robust enumeration of cell subsets from tissue expression profiles. Nat Methods. 2015;12(5):453–457. Lezcano-Valverde JM, Salazar F, León L, et al. Development and validation of a multivariate predictive model for rheumatoid arthritis mortality using a machine learning approach. Sci Rep. 2017;7(1):10189. Guan Y, Zhang H, Quang D, et al. Machine learning to predict anti-Tumor necrosis factor drug responses of rheumatoid arthritis patients by integrating clinical and genetic markers. Arthritis Rheumatol. 2019;71(12):1987–1996. Kegerreis B, Catalina MD, Bachali P, et al. Machine learning approaches to predict lupus disease activity from gene expression data. Sci Rep. 2019;9(1):9617. Antonelli A, Ferrari SM, Corrado A, Di Domenicantonio A, Fallahi P. Autoimmune thyroid disorders. Autoimmun Rev. 2015 Feb;14(2):174-80. Vanderstraeten A., Tuyaerts S., Amant F. The Immune System in the Normal Endometrium and Implications for Endometrial Cancer Development. J. Reprod. Immunol. 2015;109:7–16. Zilberg C, Lyons JG, Gupta R, Damian DL. The Immune Microenvironment in Basal Cell Carcinoma. Ann Dermatol. 2023 Aug;35(4):243-255. Döhner H, Weisdorf DJ, Bloomfield CD. Acute myeloid leukemia. N Engl J Med. 2015;373(12):1136–1152. doi: 10.1056/NEJMra1406184. Vago L, Gojo I. Immune escape and immunotherapy of acute myeloid leukemia. J Clin Invest. 2020 Apr 1;130(4):1552-1564. Wu G, He M, Ren K, Ma H, Xue Q. Inducible Co-Stimulator ICOS Expression Correlates with Immune Cell Infiltration and Can Predict Prognosis in Lung Adenocarcinoma. Int J Gen Med. 2022 Apr 6;15:3739-3751. Houssaini H, Bouallegui E, Abida O, Tahri S, Elloumi N, Hachicha H, Marzouk S, Bahloul Z, Masmoudi H, Fakhfakh R. ICOS gene polymorphisms in systemic lupus erythematosus: A case-control study. Int J Immunogenet. 2023 Aug;50(4):194-205. Teichmann LL, Cullen JL, Kashgarian M, Dong C, Craft J, Shlomchik MJ. Local triggering of the ICOS coreceptor by CD11c(+) myeloid cells drives organ inflammation in lupus. Immunity. 
2015 Mar 17;42(3):552-65. Additional Declarations No competing interests reported. © Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-6250363","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":478620768,"identity":"5f0f08ff-f7ba-42f8-b51b-2b398b864b99","order_by":0,"name":"Hua Li","email":"","orcid":"","institution":"Southwest Medical University","correspondingAuthor":false,"prefix":"","firstName":"Hua","middleName":"","lastName":"Li","suffix":""},{"id":478620769,"identity":"8089f247-701d-49fb-953d-ddd4db49f0ae","order_by":1,"name":"Yike Zou","email":"","orcid":"","institution":"Southwest Medical University","correspondingAuthor":false,"prefix":"","firstName":"Yike","middleName":"","lastName":"Zou","suffix":""},{"id":478620770,"identity":"6e969729-1395-4794-9cd3-29ec74bb49ad","order_by":2,"name":"Xin Chen","email":"","orcid":"","institution":"Affiliated Hospital of Traditional Chinese Medicine of Southwest Medical 
University","correspondingAuthor":false,"prefix":"","firstName":"Xin","middleName":"","lastName":"Chen","suffix":""},{"id":478620771,"identity":"198218ac-cba5-4587-9f99-8db298f3bb22","order_by":3,"name":"Yuchi Wang","email":"","orcid":"","institution":"Southwest Medical University","correspondingAuthor":false,"prefix":"","firstName":"Yuchi","middleName":"","lastName":"Wang","suffix":""},{"id":478620772,"identity":"faaffd2f-d856-46a5-b652-45b269ce56ca","order_by":4,"name":"Peng Zu","email":"","orcid":"","institution":"Southwest Medical University","correspondingAuthor":false,"prefix":"","firstName":"Peng","middleName":"","lastName":"Zu","suffix":""},{"id":478620773,"identity":"d6392a06-260e-4552-9ef5-a0be15e0ed59","order_by":5,"name":"Jiahao Chen","email":"","orcid":"","institution":"Southwest Medical University","correspondingAuthor":false,"prefix":"","firstName":"Jiahao","middleName":"","lastName":"Chen","suffix":""},{"id":478620774,"identity":"85a9d1d5-b5d9-43fc-ad84-cb7c3f56e420","order_by":6,"name":"Hongwei Su","email":"","orcid":"","institution":"Affiliated Hospital of Traditional Chinese Medicine of Southwest Medical University","correspondingAuthor":true,"prefix":"","firstName":"Hongwei","middleName":"","lastName":"Su","suffix":""}],"badges":[],"createdAt":"2025-03-18 
07:23:17","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6250363/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6250363/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":85859311,"identity":"006b6009-c146-407e-8c0b-e61ae540ed19","added_by":"auto","created_at":"2025-07-02 12:03:10","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":78171,"visible":true,"origin":"","legend":"\u003cp\u003eWorkflow of the study.\u003c/p\u003e","description":"","filename":"1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6250363/v1/64438e4ab5971c8eba62a833.jpg"},{"id":85858286,"identity":"aff46e09-ca6a-4de5-a134-2c63a17b13fd","added_by":"auto","created_at":"2025-07-02 11:55:10","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":58458,"visible":true,"origin":"","legend":"\u003cp\u003eScreening for DEGs. (A)Volcano plot of DEGs in GSE99967. (B)Heatmap of top 30 DEGs in GSE99967.\u003c/p\u003e","description":"","filename":"2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6250363/v1/416af00dada7e76b90e52a91.jpg"},{"id":85859314,"identity":"95b28daf-28d8-44bb-b33a-b5492f6d2fb0","added_by":"auto","created_at":"2025-07-02 12:03:11","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":106158,"visible":true,"origin":"","legend":"\u003cp\u003eWGCNA. (A) Gene cluster dendrogram, with each branch of the graph representing a gene and each color below representing a co-expression module. (C) Heatmap of module-trait relationships, where each color represents a co-expression module and the values represent module-trait correlation coefficients and p-values. It can be seen that the turquoise module has the highest correlation with LN. 
(D) Venn diagram of the intersection of DEGs and MEturquoise module genes.\u003c/p\u003e","description":"","filename":"3.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6250363/v1/d3a9a3f53501be9f402122c2.jpg"},{"id":85858291,"identity":"da86e378-e81e-40a9-be8e-b6a1439bc478","added_by":"auto","created_at":"2025-07-02 11:55:11","extension":"jpg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":70427,"visible":true,"origin":"","legend":"\u003cp\u003eShared genes functional enrichment analysis. (A) The KEGG outcomes are displayed with a bar plot. (B) The GO outcomes are displayed with a bar plot.\u003c/p\u003e","description":"","filename":"4.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6250363/v1/71e50be9d96bec799316df07.jpg"},{"id":85860252,"identity":"c905555b-ab4a-4d65-9e21-88c3e283cf4d","added_by":"auto","created_at":"2025-07-02 12:11:10","extension":"jpg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":54931,"visible":true,"origin":"","legend":"\u003cp\u003ePPI network and core genes identification. (A) PPI network. Each node represents a protein, and each connection represents an interaction between proteins. (B) Cytoscape was used for visualization of core genes. The core genes were screened using the Degree algorithm.\u003c/p\u003e","description":"","filename":"5.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6250363/v1/bd11047dba21ffebba6abb60.jpg"},{"id":85858298,"identity":"98dfad4b-df8b-490b-b46f-7342c58fb218","added_by":"auto","created_at":"2025-07-02 11:55:11","extension":"jpg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":96085,"visible":true,"origin":"","legend":"\u003cp\u003eICOS was identified as the candidate diagnostic biomarker. (A, B) Regression coefficient path diagram and cross-validation curves in LASSO logistic regression algorithm. (C, D) The identification of feature importance based on random forests. 
(E, F) The curve of change in the predicted true and error value of each gene in SVM-RFE algorithm. (G) Venn diagram demonstrates the intersection of diagnostic markers obtained from the three algorithms.\u003c/p\u003e","description":"","filename":"6.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6250363/v1/2de048f938e7bd44994e13fb.jpg"},{"id":85858295,"identity":"0500f194-14c7-4e2c-8345-445f5cb9a6df","added_by":"auto","created_at":"2025-07-02 11:55:11","extension":"jpg","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":62278,"visible":true,"origin":"","legend":"\u003cp\u003eValidation of ICOS. (A) ROC analysis of ICOS in GSE99967. (B) ROC analysis of ICOS in GSE82221. (C) Expression level of ICOS in GSE99967. (D) Expression level of ICOS in GSE82221.\u003c/p\u003e","description":"","filename":"7.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6250363/v1/7efe6d368f82d9475865ff44.jpg"},{"id":85858300,"identity":"45a74e42-fa0d-4808-9e9c-5ed634833dba","added_by":"auto","created_at":"2025-07-02 11:55:11","extension":"jpg","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":147080,"visible":true,"origin":"","legend":"\u003cp\u003eAnalysis of immune cell infiltration. (A, B) The barplot and heatmap visualizing the proportion of infiltrating immune cells. (C) The violin plot comparing the proportion of immune cells between LN and SLE controls. (D) The correlated heatmap representing the correlation between different immune cells. Both horizontal and vertical axes demonstrate immune cell subtypes. 
(E) Correlation analysis of infiltrating immune cells with ICOS.\u003c/p\u003e","description":"","filename":"8.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6250363/v1/5c75a48d99543b0dd9931ba6.jpg"},{"id":89916825,"identity":"cfa1819b-88e5-45aa-8212-d2752a72d15e","added_by":"auto","created_at":"2025-08-26 12:02:11","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1384764,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6250363/v1/2d3c1e91-7704-43cb-b039-01556dc0f61a.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Mechanisms in the transition from systemic lupus erythematosus to lupus nephritis: A bioinformatics and functional analysis approach","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eSLE is an autoimmune disease involving an inappropriate immune response to endogenous nuclear particles, which affects multiple organs and systems [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. LN, an immune complex glomerulonephritis, is one of the most common and severe target organ manifestations of SLE and the leading cause of SLE related death [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Patients with LN usually present with findings of nephritic (e.g., hematuria, generalized edema, hypertension) and/or nephrotic (generalized edema, frothy urine) glomerular disease [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. The treatment of LN usually involves immunosuppressive therapy, typically with mycophenolate mofetil or cyclophosphamide and with glucocorticoids, although these treatments are not uniformly effective [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. 
Within 10 years of an initial SLE diagnosis, 5\u0026ndash;20% of patients with LN develop end-stage renal disease (ESRD), and the multiple comorbidities associated with immunosuppressive treatment, including infections, osteoporosis and cardiovascular and reproductive effects, remain a concern [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. According to the guidelines, a reliable criterion for diagnosing LN is the histopathological confirmation obtained through renal biopsy [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. However, kidney biopsy is an invasive and expensive procedure that carries a risk of bleeding. Consequently, it limits physicians' ability to dynamically monitor and manage disease progression in SLE. Currently, commonly used laboratory markers for LN include urinary protein, serum creatinine, glomerular filtration rate, anti-dsDNA antibody, and serum complements [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. However, these parameters cannot meet clinical needs because they lack sensitivity and specificity [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. Therefore, it is particularly important to find simple, non-invasive and effective biomarkers to predict the risk of LN in SLE patients.\u003c/p\u003e \u003cp\u003eThe renal glomerulus plays a crucial role in the onset and progression of LN [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. 
In individuals with SLE, the immune system produces autoantibodies and immune complexes that gradually accumulate within the renal glomeruli [\u003cspan additionalcitationids=\"CR13\" citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. This accumulation triggers an inflammatory response, resulting in glomerular damage and dysfunction. In addition, several studies have provided insights into the involvement of susceptibility genes in LN, disrupting immune tolerance and promoting disease development, and these genes amplify innate immune signaling pathways, promote lymphocyte activation, and ultimately lead to renal damage [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. Targeting specific immune cell populations or manipulating their functions could alleviate inflammation, reduce tissue damage, and improve the prognosis for patients with renal diseases [\u003cspan additionalcitationids=\"CR16\" citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. In summary, understanding the molecular mechanisms underlying the transition from SLE to the LN disease stage can help develop more effective diagnostic and therapeutic strategies.\u003c/p\u003e \u003cp\u003eThis study leveraged bioinformatics tools to analyze the gene expression omnibus (GEO) datasets GSE99967 and GSE82221. In the GSE99967 data set, shared genes were obtained by crossing the MEturquoise module genes selected by weighted gene co-expression network analysis (WGCNA) and the DEGs obtained by differential analysis. Functional enrichment analyses were performed on shared genes using the DAVID database to elucidate the molecular mechanisms from SLE to LN. 
Subsequently, we constructed a protein-protein interaction (PPI) network to identify core genes, which were further refined using machine learning for biomarker gene selection. The candidate diagnostic gene was validated using the receiver operating characteristic (ROC) curve and single-gene differential expression analysis in both the training and validation sets. We also performed immune cell infiltration analysis and determined correlations between immune cells and the diagnostic biomarker. Collectively, our integrative bioinformatics approach aims to discover an early diagnostic gene associated with the transition from SLE to LN to improve clinical diagnosis and treatment.\u003c/p\u003e"},{"header":"2. Materials and methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Data sources and workflow\u003c/h2\u003e \u003cp\u003eGSE99967 [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] and GSE82221 [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e] datasets were downloaded from the Gene Expression Omnibus (GEO, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.ncbi.nlm.nih.gov/geo/\u003c/span\u003e\u003cspan address=\"https://www.ncbi.nlm.nih.gov/geo/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) database [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]. The platform of GSE99967 was GPL21970 (Affymetrix Human Gene 2.0 ST Array), and whole peripheral blood samples from 29 LN patients and 13 SLE controls were retained for further analysis. We also obtained the GSE82221 dataset, consisting of peripheral blood samples from 15 LN patients and 15 SLE controls, for external validation. The platform of GSE82221 was GPL13534 (Illumina HumanMethylation450 BeadChip). 
The workflow of this study is illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2 Differentially expressed gene analysis\u003c/h2\u003e \u003cp\u003eThe limma package in R was used for differential expression analysis of the GSE99967 dataset. DEGs were selected using the thresholds p\u0026thinsp;\u0026lt;\u0026thinsp;0.05 and |logFC|\u0026thinsp;\u0026gt;\u0026thinsp;1. Finally, the R packages \"ggplot2\" and \"pheatmap\" were used to visualize the DEGs as volcano plots and heatmaps, respectively.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.3 Weighted gene co-expression network analysis\u003c/h2\u003e \u003cp\u003eWGCNA is a systems biology method used to construct gene co-expression networks and identify gene modules related to biological traits. By analyzing the similarity in expression patterns between genes, WGCNA reveals gene regulatory mechanisms and functional modules, which are crucial for understanding complex biological processes [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe WGCNA package was used to analyze GSE99967 and to examine the correlation between modules and disease status. Through WGCNA, we clustered genes into different modules and selected the module with the highest correlation to LN, which was the MEturquoise module.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e2.4 Gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis\u003c/h2\u003e \u003cp\u003eThe shared genes were obtained by intersecting \"DEGs\" and \"MEturquoise\". 
DAVID (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://davidbioinformatics.nih.gov\u003c/span\u003e\u003cspan address=\"https://davidbioinformatics.nih.gov\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) is an open database that integrates biological data and analytical tools for functional annotation of genes and pathways [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]. GO is a bioinformatics tool for annotating genes and analyzing the biological processes they are involved in. KEGG is a database for analyzing relevant signaling pathways in large-scale molecular datasets generated by high-throughput experimental techniques [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]. We conducted GO and KEGG enrichment analyses in the DAVID database to explore the biological functions of the shared genes. The results were then visualized on a bioinformatics platform (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://bioinformatics.com.cn/\u003c/span\u003e\u003cspan address=\"http://bioinformatics.com.cn/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e2.5 Protein-protein interaction network analysis and core genes identification\u003c/h2\u003e \u003cp\u003eSTRING is a widely used protein interaction database that integrates known and predicted protein-protein interaction information [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. 
To explore the functional associations and interactions among the shared genes, we imported them into STRING (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://string-db.org\u003c/span\u003e\u003cspan address=\"https://string-db.org\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) to build a PPI network (confidence\u0026thinsp;\u0026gt;\u0026thinsp;0.4). In addition, we used the CytoHubba plugin in Cytoscape software [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e] to identify core genes, which were ranked by the Degree algorithm.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e2.6 Machine learning\u003c/h2\u003e \u003cp\u003eAs a dimension reduction approach, least absolute shrinkage and selection operator (LASSO) regression exhibits superior performance when evaluating high-dimensional data compared to regression analysis and uses regularization to improve prediction accuracy. LASSO analysis with 10-fold cross-validation was performed using the \u0026ldquo;glmnet\u0026rdquo; package to select the tuning (penalty) parameter [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]. Random forest (RF) is a supervised machine learning algorithm built from decision trees and is used to solve regression and classification problems. Feature importance was determined by the Mean Decrease Gini Index calculated by RF [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. Support vector machine-recursive feature elimination (SVM-RFE) models were compared by the average misclassification rates of their 10-fold cross-validations using the \u0026ldquo;e1071\u0026rdquo; software package [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]. 
As a novel machine learning technique, SVM-RFE can rank features recursively to reduce the risk of overfitting [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]. The selected core genes were analyzed by these three machine learning algorithms to further screen for the candidate diagnostic biomarker. Finally, ICOS was identified as the candidate diagnostic biomarker by intersecting the results acquired from the SVM-RFE, LASSO, and RF models.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e2.7 Diagnostic biomarker validation\u003c/h2\u003e \u003cp\u003eReceiver operating characteristic (ROC) analysis was conducted to evaluate the predictive accuracy of the candidate gene. The area under the curve (AUC), sensitivity, and 1-specificity were calculated, and the AUC values quantified the ability of the gene to differentiate between classes, with higher AUC values indicating better performance. A gene with an AUC value\u0026thinsp;\u0026gt;\u0026thinsp;0.70 was considered to have ideal diagnostic value. ROC curve analyses in both the training (GSE99967) and validation (GSE82221) sets were used to evaluate the diagnostic value of the candidate gene. In addition, we validated the expression patterns of the candidate gene in these two datasets to substantiate the reliability of the gene. By employing the above methods, we identified ICOS as a significant blood diagnostic biomarker for identifying the transition from SLE to LN.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e2.8 Immune infiltration analysis\u003c/h2\u003e \u003cp\u003eThe CIBERSORT algorithm is used to infer the relative abundance of different immune cell subsets from gene expression data [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]. 
It estimates the relative abundance of various immune cell types by calculating the correlation between the gene expression profile and known immune cell gene expression patterns. The CIBERSORT algorithm was applied to analyze immune cell infiltration in relation to the diagnostic biomarker. The results were visualized using the \"pheatmap\", \"ggplot2\", \"corrplot\", and \"vioplot\" packages. Spearman correlation analysis was employed to determine the correlation between immune cells and the diagnostic biomarker.\u003c/p\u003e \u003c/div\u003e"},{"header":"3. Results","content":"\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Identification of DEGs\u003c/h2\u003e \u003cp\u003eUsing |log(FC)|\u0026thinsp;\u0026gt;\u0026thinsp;1 and p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05 as the screening thresholds, 117 DEGs were identified in the GSE99967 dataset, of which 73 genes were up-regulated and 44 genes were down-regulated, as shown in the volcano plot (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA). The heatmap showed the top 30 up-regulated and down-regulated genes in the GSE99967 dataset (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eB). We created a gene set called \"DEGs\" (containing the above 117 up-regulated and down-regulated genes).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e3.2 WGCNA and key module identification\u003c/h2\u003e \u003cp\u003eIn the GSE99967 dataset, a key module named \"MEturquoise\" was selected through WGCNA. The blockwiseModules function was used to cluster the genes, resulting in a cluster dendrogram (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eA). 
A heatmap of the module-trait relationships was plotted (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eB). This network analysis allowed us to focus on gene co-expression patterns, highlighting genes that are likely to be functionally related and potentially significant in LN. The \"MEturquoise\" gene set comprised 578 genes. Finally, we obtained 27 shared genes by intersecting the \"DEGs\" and \"MEturquoise\" gene sets (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eC).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e3.3 Functional enrichment analysis results\u003c/h2\u003e \u003cp\u003eWe conducted KEGG pathway enrichment analysis on the shared genes, which revealed significant enrichment in several key biological pathways. These genes are enriched in some immune response-related cancer pathways, such as thyroid cancer, endometrial cancer, basal cell carcinoma and acute myeloid leukemia (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eA). In addition, these shared genes are enriched in some typical immune response signaling pathways, such as Intestinal immune network for IgA production and T cell receptor signaling pathway (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eA). GO analysis showed that the shared genes are mainly related to the following biological process (BP) and cellular component (CC) terms: adaptive immune response, immune response, T cell receptor signaling pathway, negative thymic T cell selection, T cell costimulation, and T cell receptor complex (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eB). 
Several enriched biological processes related to the inflammatory response, such as cellular response to interleukin-4 and cellular response to cytokine stimulus, may be relevant to the risk of LN in SLE patients (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eB).\u003c/p\u003e \u003cp\u003eIn summary, enrichment analysis identified several important biological pathways and processes associated with the progression of SLE to LN, particularly those related to immune response and inflammation. These findings provide important insights into the molecular mechanisms of the transition from SLE to LN and provide potential targets for diagnosis and treatment.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e3.4 Construction of PPI network and core genes identification\u003c/h2\u003e \u003cp\u003eTo better understand the interactions between the shared genes, we used STRING to construct a PPI network of the shared genes (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eA), imported the results into Cytoscape software, and used the CytoHubba plugin to identify the core genes. The Degree algorithm was used to obtain the five core genes (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eB). These core genes were CD28, LEF1, TCF7, ICOS, and CCR7.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e3.5 Identification of the candidate diagnostic biomarker\u003c/h2\u003e \u003cp\u003eWe constructed a prediction model for the diagnosis of disease using three different algorithms to distinguish LN patients from SLE controls. Two out of five LN-related features were selected using the LASSO algorithm (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eA, B). 
Next, we identified feature importance using random forests, and the top five genes were selected as diagnostic genes (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eC, D). Then, features were selected and three genes were identified as the best candidates for LN based on SVM-RFE (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eE, F). Finally, we intersected the candidate genes acquired from the SVM-RFE, LASSO, and RF models, and ICOS was identified as the candidate diagnostic biomarker for follow-up steps (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eG).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003e3.6 Validation of diagnostic biomarker\u003c/h2\u003e \u003cp\u003eTo verify the diagnostic value of the selected candidate gene in differentiating LN from SLE, ROC curve analysis was performed. The AUCs of ICOS in GSE99967 and GSE82221 were 0.851 and 0.849, respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eA, B). This indicates that ICOS has high diagnostic accuracy in distinguishing LN patients from SLE controls. ICOS may therefore serve as a potential molecular biomarker to provide an important reference for early diagnosis and personalized treatment of the transition from SLE to LN. 
We also visualized ICOS expression in the GSE99967 and GSE82221 datasets. ICOS expression decreased from the SLE to the LN disease state (p\u0026thinsp;\u0026lt;\u0026thinsp;0.05), which supports the potential of ICOS for further development and application in this field (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eC, D).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003e3.7 Immune infiltration analysis\u003c/h2\u003e \u003cp\u003eThe proportions of infiltrating immune cells were estimated with CIBERSORT. Figure\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003eA, B shows the distribution of immune cells in each sample as a bar plot and a heatmap. Compared with SLE control samples, LN samples had higher levels of neutrophils and lower levels of naive CD4+ T cells and eosinophils (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003eC). The correlation heatmap of immune cell types showed that activated memory CD4+ T cells (r\u0026thinsp;=\u0026thinsp;-0.55) and neutrophils (r\u0026thinsp;=\u0026thinsp;-0.77) were negatively correlated with resting memory CD4+ T cells, and naive CD4+ T cells (r\u0026thinsp;=\u0026thinsp;-0.52) were negatively correlated with monocytes, whereas memory B cells (r\u0026thinsp;=\u0026thinsp;0.55) were positively correlated with naive CD4+ T cells (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003eD). These findings indicate distinct immune patterns in LN patients compared with SLE controls, as well as interactions between various types of immune cells.\u003c/p\u003e \u003cp\u003eIn addition, we examined the correlation between ICOS and immune cells. 
The results showed that naive B cells, memory B cells, plasma cells, naive CD4+ T cells, resting memory CD4+ T cells, and eosinophils were positively correlated with ICOS, whereas monocytes and neutrophils were negatively correlated with ICOS (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003eE). This suggests that ICOS may play an important role in the transition from SLE to LN by influencing immune cell infiltration.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"4. Discussion","content":"\u003cp\u003eCurrently, the reference standard for diagnosing LN is histopathological confirmation by renal biopsy, but renal biopsy is an invasive and expensive procedure that carries a risk of bleeding. It is therefore important to find a simple, non-invasive, and effective biomarker to predict the risk of LN in SLE patients. For this reason, this study used the GSE99967 and GSE82221 microarray datasets, which contain peripheral blood samples rather than renal biopsy tissue. We integrated analysis of the GEO dataset GSE99967 with bioinformatics methods to screen for and identify the diagnostic marker involved in the progression from SLE to LN. Using WGCNA, we identified DEGs and highlighted the key module MEturquoise through clustering and functional annotation. Subsequently, we identified five core genes using PPI network analysis. Machine learning is increasingly applied in medicine and has informed clinical decision-making in multiple fields, including the prediction of mortality [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e], response to biological agents [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e], and disease activity [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. 
This study identified a candidate diagnostic gene, ICOS, using three machine learning methods: LASSO, RF, and SVM-RFE. ICOS may therefore serve as a promising biomarker and predictor for future diagnostic and therapeutic strategies addressing progression from SLE to LN.\u003c/p\u003e \u003cp\u003eTo further validate our findings, we used an independent GEO dataset, GSE82221. ICOS was validated through ROC curves and single-gene differential analysis. The results demonstrated an association between ICOS and the progression of SLE to LN, thereby further supporting our study results.\u003c/p\u003e \u003cp\u003eImmune infiltration analysis indicated distinct immune patterns in LN patients compared with SLE controls, as well as interactions between various types of immune cells. Naive B cells, memory B cells, plasma cells, naive CD4+ T cells, resting memory CD4+ T cells, and eosinophils were positively correlated with ICOS, while monocytes and neutrophils were negatively correlated with ICOS. This suggests that ICOS may play an important role in the transition from SLE to LN by influencing immune cell infiltration. These changes in immune infiltration not only provide clues for understanding the immunopathological mechanisms of the transition from SLE to LN but also suggest potential avenues for future targeted immune therapy strategies. These might include modulating the function of specific immune cell subsets and disrupting intercellular interactions, with the aim of intervening precisely in disease progression.\u003c/p\u003e \u003cp\u003eKyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis revealed that the shared genes were enriched in some immune response-related cancer pathways, such as thyroid cancer, endometrial cancer, basal cell carcinoma, and acute myeloid leukemia. 
Autoimmune thyroid diseases (AITD) result from dysregulation of the immune system leading to an immune attack on the thyroid, and are associated with other organ-specific (polyglandular autoimmune syndromes) or systemic autoimmune disorders [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e]. Moreover, several studies have shown an association between AITD and papillary thyroid cancer [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e]. Endometrial tissue contains numerous leukocytes that vary in number and phenotype throughout the menstrual cycle [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e]. Leukocytes are more abundant before menstruation, probably in relation to the immune protection required during endometrial disruption. Therefore, the tumor immune response may be specifically enhanced in endometrial cancer cells. The immune system plays a key role in the suppression and progression of basal cell carcinoma, and basal cell carcinoma evades the anti-tumor immune response through multiple mechanisms [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e]. Acute myeloid leukemia is a genetically, epigenetically, and clinically heterogeneous disease characterized by accumulation and expansion of immature myeloid cells in the bone marrow and peripheral blood, with consequent failure of normal hematopoiesis [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e], and has been considered an immunoresponsive malignancy [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e]. In addition, these shared genes are enriched in typical immune response signaling pathways, such as the intestinal immune network for IgA production and the T cell receptor signaling pathway. 
Gene Ontology analysis revealed that some biological processes related to inflammatory and immune responses are associated with the risk of LN in SLE patients. In summary, enrichment analysis revealed key biological pathways and functions related to immune response and inflammation, providing a new perspective on the pathogenesis of the transition from SLE to LN.\u003c/p\u003e \u003cp\u003eInducible co-stimulator (ICOS) is a cell-enhanced co-stimulatory receptor that has shown great potential in the regulation of innate and adaptive immunity [\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]. ICOS may play an important role in adaptive immunity by regulating the interaction between T cells and antigen-presenting cells. Disruption of this molecule can lead to autoimmune diseases, in particular SLE. A case-control study found that ICOS rs11889031 may act as a risk factor for SLE and could be used as a genetic susceptibility biomarker [\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e]. In addition, ICOS is a potent promoter of organ inflammation in murine lupus. ICOS stimulates T follicular helper cell differentiation in lymphoid tissue, suggesting that it might drive autoimmunity by enhancing autoantibody production. Yet the pathogenic relevance of this mechanism remains unclear. One study showed that selective ablation of ICOS ligand (ICOSL) in CD11c+ cells, but not in B cells, dramatically ameliorates kidney and lung inflammation in lupus-prone MRL mice [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eIn conclusion, in addition to being a potential biomarker for predicting the progression of SLE to LN, ICOS also represents a promising therapeutic target. By specifically regulating the expression or activity of ICOS, it may be possible to reduce the risk of LN in SLE patients and provide new clinical treatment strategies. 
Because these data come from previously published datasets, the findings have limitations and require verification in further in-depth studies.\u003c/p\u003e"},{"header":"5. Conclusion","content":"\u003cp\u003eICOS is not only a potential biomarker and predictor for the transition from SLE to LN but also a potential therapeutic target. These findings offer insights into the molecular mechanisms underlying the transition from SLE to LN and pave the way for the development of novel therapeutic approaches with considerable prospects for clinical application.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAuthor contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eConceptualization, writing, and original draft preparation: Hua Li; Writing, review, and editing: Hua Li and Yike Zou; Methodology: Hua Li, Yike Zou and Yuchi Wang; Resources: Hua Li, Yike Zou, Yuchi Wang and Peng Zu; Formal analysis and investigation: Hua Li, Yike Zou, Yuchi Wang, Peng Zu and Jiahao Chen; Supervision: Hongwei Su.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDisclosure statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNo potential conflict of interest was reported by the authors.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eScience and Technology Department of Sichuan Province of China (No. 2022YFS0621).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe gene expression profiles were obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). The datasets used in this research, GSE99967 and GSE82221, are available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99967 and https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE82221, respectively. 
All data generated or analyzed during this study are included in this published article.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eKaul A, Gordon C, Crow MK, Touma Z, Urowitz MB, van Vollenhoven R, et al. Systemic lupus erythematosus. Nat Rev Dis Primers. (2016) 2:16039. 10.1038/nrdp.2016.39\u003c/li\u003e\n\u003cli\u003eLech M, Anders HJ. The pathogenesis of lupus nephritis. J Am Soc Nephrol. (2013) 24:1357\u0026ndash;66. 10.1681/ASN.2013010026\u003c/li\u003e\n\u003cli\u003eShen L, Lan L, Zhu T, Chen H, Gu H, Wang C, Chen Y, Wang M, Tu H, Enghard P, Jiang H, Chen J. Identification and Validation of IFI44 as Key Biomarker in Lupus Nephritis. Front Med (Lausanne). 2021 Oct 25;8:762848. \u003c/li\u003e\n\u003cli\u003eOmer MH, Shafqat A, Ahmad O, Nadri J, AlKattan K, Yaqinuddin A. Urinary Biomarkers for Lupus Nephritis: A Systems Biology Approach. J Clin Med. 2024 Apr 18;13(8):2339.\u003c/li\u003e\n\u003cli\u003eAnders HJ, Saxena R, Zhao MH, Parodis I, Salmon JE, Mohan C. Lupus nephritis. Nat Rev Dis Primers. 2020 Jan 23;6(1):7.\u003c/li\u003e\n\u003cli\u003eFanouriakis A, Kostopoulou M, Cheema K, Anders HJ, Aringer M, Bajema I, et al. 2019 Update of the Joint European League Against Rheumatism and European Renal Association-European Dialysis and Transplant Association (EULAR/ERA-EDTA) recommendations for the management of lupus nephritis. Ann Rheum Dis (2020) 79:713\u0026ndash;23.\u003c/li\u003e\n\u003cli\u003eKidney Disease: Improving Global Outcomes Glomerular Diseases Work G . KDIGO 2021 clinical practice guideline for the management of glomerular diseases. Kidney Int (2021) 100:S1\u0026ndash;S276.\u003c/li\u003e\n\u003cli\u003eYu C, Li P, Dang X, Zhang X, Mao Y, Chen X. Lupus nephritis: new progress in diagnosis and treatment. J Autoimmun (2022) 132:102871.\u003c/li\u003e\n\u003cli\u003eMorell M, Perez-Cozar F, Maranon C. Immune-related urine biomarkers for the diagnosis of lupus nephritis. 
Int J Mol Sci (2021) 22(13):7143.\u003c/li\u003e\n\u003cli\u003eMejia-Vilet JM, Malvar A, Arazi A, Rovin BH. The lupus nephritis management renaissance. Kidney Int (2022) 101:242\u0026ndash;55.\u003c/li\u003e\n\u003cli\u003eWang Z, Hu D, Pei G, Zeng R, Yao Y. Identification of driver genes in lupus nephritis based on comprehensive bioinformatics and machine learning. Front Immunol. 2023 Dec 7;14:1288699.\u003c/li\u003e\n\u003cli\u003eTojo T, Friou GJ. Lupus nephritis: varying complement-fixing properties of immunoglobulin G antibodies to antigens of cell nuclei. Science (1968) 161(3844):904\u0026ndash;6.\u003c/li\u003e\n\u003cli\u003eJacob N, Stohl W. Autoantibody-dependent and autoantibody-independent roles for B cells in systemic lupus erythematosus: past, present, and future. Autoimmunity (2010) 43(1):84\u0026ndash;97.\u003c/li\u003e\n\u003cli\u003eYung S, Chan TM. Autoantibodies and resident renal cells in the pathogenesis of lupus nephritis: getting to know the unknown. Clin Dev Immunol 2012 (2012) p:139365. \u003c/li\u003e\n\u003cli\u003eArazi A, Rao DA, Berthier CC, Davidson A, Liu Y, Hoover PJ, et al. The immune cell landscape in kidneys of patients with lupus nephritis. Nat Immunol (2019) 20(7):902\u0026ndash;14.\u003c/li\u003e\n\u003cli\u003eBanchereau R, Hong S, Cantarel B, Baldwin N, Baisch J, Edens M, et al. Personalized immunomonitoring uncovers molecular networks that stratify lupus patients. Cell (2016) 165(3):551\u0026ndash;65.\u003c/li\u003e\n\u003cli\u003eDavidson A. What is damaging the kidney in lupus nephritis? Nat Rev Rheumatol (2016) 12(3):143\u0026ndash;53.\u003c/li\u003e\n\u003cli\u003eWither JE, Prokopec SD, Noamani B, Chang NH, Bonilla D, Touma Z, Avila-Casado C, Reich HN, Scholey J, Fortin PR, Boutros PC, Landolt-Marticorena C. Identification of a neutrophil-related gene expression signature that is enriched in adult systemic lupus erythematosus patients with active nephritis: Clinical/pathologic associations and etiologic mechanisms. 
PLoS One. 2018 May 9;13(5):e0196117.\u003c/li\u003e\n\u003cli\u003eZhu H, Mi W, Luo H, Chen T, Liu S, Raman I, Zuo X, Li QZ. Whole-genome transcription and DNA methylation analysis of peripheral blood mononuclear cells identified aberrant gene regulation pathways in systemic lupus erythematosus. Arthritis Res Ther. 2016 Jul 13;18:162.\u003c/li\u003e\n\u003cli\u003eWang Z, Monteiro CD, Jagodnik KM, Fernandez NF, Gundersen GW, Rouillard AD, Jenkins SL, Feldmann AS, Hu KS, McDermott MG, et al. Extraction and analysis of signatures from the gene expression omnibus by the crowd. Nat Commun. 2016;7:12846.\u003c/li\u003e\n\u003cli\u003eLangfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9(1):559.\u003c/li\u003e\n\u003cli\u003eSherman BT, Hao M, Qiu J, Jiao X, Baseler MW, Lane HC, Imamichi T, Chang W. DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 2022 Jul 5;50(W1):W216-W221.\u003c/li\u003e\n\u003cli\u003eKanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 2017;45:D353\u0026ndash;D361. \u003c/li\u003e\n\u003cli\u003eDoncheva NT, Morris JH, Gorodkin J, et al. Cytoscape StringApp: network analysis and visualization of proteomics data. J Proteome Res. 2019;18(2):623\u0026ndash;632.\u003c/li\u003e\n\u003cli\u003eShannon P, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498\u0026ndash;2504.\u003c/li\u003e\n\u003cli\u003eEngebretsen S, Bohlin J. Statistical predictions with glmnet. Clin Epigenetics. 2019 Aug 23;11(1):123.\u003c/li\u003e\n\u003cli\u003eGarge NR, Bobashev G, Eggleston B. Random forest methodology for model-based recursive partitioning: the mobForest package for R. BMC Bioinformatics. 
2013 Apr 11;14:125.\u003c/li\u003e\n\u003cli\u003eSundermann B, Bode J, Lueken U, Westphal D, Gerlach AL, Straube B, Wittchen HU, Str\u0026ouml;hle A, Wittmann A, Konrad C, Kircher T, Arolt V, Pfleiderer B. Support Vector Machine Analysis of Functional Magnetic Resonance Imaging of Interoception Does Not Reliably Predict Individual Outcomes of Cognitive Behavioral Therapy in Panic Disorder with Agoraphobia. Front Psychiatry. 2017 Jun 9;8:99.\u003c/li\u003e\n\u003cli\u003eZhao E, Xie H, Zhang Y. Predicting Diagnostic Gene Biomarkers Associated With Immune Infiltration in Patients With Acute Myocardial Infarction. Front Cardiovasc Med. 2020 Oct 23;7:586871.\u003c/li\u003e\n\u003cli\u003eNewman AM, Liu CL, Green MR, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods. 2015;12(5):453\u0026ndash;457.\u003c/li\u003e\n\u003cli\u003eLezcano-Valverde JM, Salazar F, Le\u0026oacute;n L, et al. Development and validation of a multivariate predictive model for rheumatoid arthritis mortality using a machine learning approach. Sci Rep. 2017;7(1):10189. \u003c/li\u003e\n\u003cli\u003eGuan Y, Zhang H, Quang D, et al. Machine learning to predict anti-Tumor necrosis factor drug responses of rheumatoid arthritis patients by integrating clinical and genetic markers. Arthritis Rheumatol. 2019;71(12):1987\u0026ndash;1996. \u003c/li\u003e\n\u003cli\u003eKegerreis B, Catalina MD, Bachali P, et al. Machine learning approaches to predict lupus disease activity from gene expression data. Sci Rep. 2019;9(1):9617.\u003c/li\u003e\n\u003cli\u003eAntonelli A, Ferrari SM, Corrado A, Di Domenicantonio A, Fallahi P. Autoimmune thyroid disorders. Autoimmun Rev. 2015 Feb;14(2):174-80.\u003c/li\u003e\n\u003cli\u003eVanderstraeten A., Tuyaerts S., Amant F. The Immune System in the Normal Endometrium and Implications for Endometrial Cancer Development. J. Reprod. Immunol. 2015;109:7\u0026ndash;16.\u003c/li\u003e\n\u003cli\u003eZilberg C, Lyons JG, Gupta R, Damian DL. 
The Immune Microenvironment in Basal Cell Carcinoma. Ann Dermatol. 2023 Aug;35(4):243-255.\u003c/li\u003e\n\u003cli\u003eD\u0026ouml;hner H, Weisdorf DJ, Bloomfield CD. Acute myeloid leukemia. N Engl J Med. 2015;373(12):1136\u0026ndash;1152. doi: 10.1056/NEJMra1406184.\u003c/li\u003e\n\u003cli\u003eVago L, Gojo I. Immune escape and immunotherapy of acute myeloid leukemia. J Clin Invest. 2020 Apr 1;130(4):1552-1564.\u003c/li\u003e\n\u003cli\u003eWu G, He M, Ren K, Ma H, Xue Q. Inducible Co-Stimulator ICOS Expression Correlates with Immune Cell Infiltration and Can Predict Prognosis in Lung Adenocarcinoma. Int J Gen Med. 2022 Apr 6;15:3739-3751.\u003c/li\u003e\n\u003cli\u003eHoussaini H, Bouallegui E, Abida O, Tahri S, Elloumi N, Hachicha H, Marzouk S, Bahloul Z, Masmoudi H, Fakhfakh R. ICOS gene polymorphisms in systemic lupus erythematosus: A case-control study. Int J Immunogenet. 2023 Aug;50(4):194-205.\u003c/li\u003e\n\u003cli\u003eTeichmann LL, Cullen JL, Kashgarian M, Dong C, Craft J, Shlomchik MJ. Local triggering of the ICOS coreceptor by CD11c(+) myeloid cells drives organ inflammation in lupus. Immunity. 2015 Mar 17;42(3):552-65.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Lupus nephritis, Systemic lupus erythematosus, machine learning, weighted gene co-expression network analysis, immune infiltration analysis","lastPublishedDoi":"10.21203/rs.3.rs-6250363/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6250363/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eLupus nephritis (LN), a severe renal complication of systemic lupus erythematosus (SLE), represents the primary driver of end-stage renal disease and mortality in SLE populations. Current immunosuppressive therapies carry significant comorbidity risks, underscoring the urgent need for non-invasive biomarkers enabling early LN detection. This study employed integrated bioinformatics approaches to identify circulating biomarkers predictive of SLE-to-LN transition. Analysis of the GSE99967 dataset revealed peripheral blood-derived differentially expressed genes (DEGs) between LN patients and SLE controls. Through weighted gene co-expression network analysis (WGCNA) and protein-protein interaction network construction, we identified hub genes subsequently refined via three machine learning algorithms: LASSO regression, random forest, and SVM-RFE. Functional enrichment analyses using DAVID, GO, and KEGG pathways elucidated immune-related biological processes. 
The diagnostic performance of candidate biomarkers was rigorously validated through ROC curve evaluation across both training (GSE99967) and independent validation (GSE82221) cohorts, complemented by immune infiltration profiling to delineate cellular correlations. Our multi-modal approach consistently identified inducible T-cell costimulator (ICOS) as a pivotal biomarker, demonstrating superior diagnostic accuracy (AUC [specify value] in training, [value] in validation sets) for LN progression prediction. Mechanistically, ICOS expression patterns showed significant associations with Th cell subset infiltration, suggesting its dual role as both diagnostic indicator and immunopathogenic mediator. These findings position ICOS as a promising non-invasive biomarker capable of guiding early therapeutic intervention and personalized management of SLE patients at risk for nephritis progression, potentially circumventing the need for invasive renal biopsies while addressing critical unmet needs in lupus nephropathy surveillance.\u003c/p\u003e","manuscriptTitle":"Mechanisms in the transition from systemic lupus erythematosus to lupus nephritis: A bioinformatics and functional analysis approach","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-07-02 11:55:06","doi":"10.21203/rs.3.rs-6250363/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"ad6afdca-88c5-40bd-89ec-51210ea1f4c5","owner":[],"postedDate":"July 2nd, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":50818774,"name":"Biological sciences/Biological techniques"},{"id":50818775,"name":"Biological sciences/Immunology"},{"id":50818776,"name":"Health sciences/Diseases"}],"tags":[],"updatedAt":"2025-08-26T11:54:04+00:00","versionOfRecord":[],"versionCreatedAt":"2025-07-02 11:55:06","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6250363","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6250363","identity":"rs-6250363","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
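The ROC evaluation reported in Section 3.6 can be illustrated with a minimal sketch. This is not the authors' implementation: it computes the AUC from its rank-based definition (the probability that a randomly chosen LN sample scores higher than a randomly chosen SLE control, with ties counted as one half). The function name `roc_auc`, the labels, and the expression values are hypothetical toy inputs, not data from GSE99967 or GSE82221. Because ICOS expression was reported to be lower in LN, the sketch negates expression so that a higher score points toward LN.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: P(score of a positive > score of a negative), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (assumption, for illustration only):
# 1 = LN patient, 0 = SLE control; values mimic log-scale expression.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
icos_expr = [5.1, 5.6, 6.0, 6.4, 6.2, 6.8, 7.0, 7.3]

# Lower ICOS is assumed to indicate LN, so the score is negated expression.
auc = roc_auc(labels, [-x for x in icos_expr])
print(f"AUC = {auc:.4f}")
```

Negating the score only fixes the direction of the marker. Scoring with the raw expression instead would give one minus this value when there are no ties, which is why the direction of a down-regulated marker should be stated alongside its AUC.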