A machine learning and SHAP-based systems analysis of post-translational modification–related genes in nonalcoholic steatohepatitis

preprint OA: closed CC-BY-4.0
Research Square · Research Article

Chao Yu, Ronglin Xu, Jingjing Bai, Pengcheng Jia, Jie Chen, Ruomu Ge, and 1 more

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-8782305/v1

This work is licensed under a CC BY 4.0 License.

Status: Posted (Version 1)

Abstract

Nonalcoholic steatohepatitis (NASH) is a multifactorial metabolic liver disease characterized by marked clinical and molecular heterogeneity, which complicates disease characterization and patient stratification. Post-translational modification (PTM)–related genes are known to participate in metabolic and inflammatory regulation; however, their system-level relevance in NASH remains insufficiently defined.
In this study, transcriptomic data from liver tissues of patients with NASH were analyzed across multiple independent cohorts. Machine learning models incorporating SHapley Additive exPlanations (SHAP) were applied to evaluate PTM-related gene patterns associated with disease status, with model performance assessed through cross-validation and external datasets. Among the evaluated approaches, the combination of least absolute shrinkage and selection operator (LASSO) and linear discriminant analysis (LDA) showed the most consistent performance. This analysis repeatedly highlighted three PTM-related genes (PELI2, DUSP2, and TRIM56) that were associated with NASH across cohorts. Expression of these genes was related to inflammatory gene programs, lipid metabolism–associated pathways, fibrosis-related markers, and variations in immune cell infiltration. Stratification based on their expression profiles further delineated molecular subgroups of NASH with distinct immune and metabolic characteristics. Overall, this study provides a system-oriented characterization of PTM-related gene alterations in NASH and illustrates the utility of integrative analytical approaches for exploring molecular heterogeneity in complex metabolic diseases.

General Cell Biology & Physiology, NASH, PTMs, Machine learning, SHAP, Systems analysis

Introduction

Non-alcoholic steatohepatitis (NASH) represents a progressive stage of non-alcoholic fatty liver disease (NAFLD) and is characterized by hepatic lipid accumulation, chronic inflammation, hepatocellular injury, and persistent fibrosis. It has emerged as one of the leading causes of end-stage liver disease and hepatocellular carcinoma worldwide [ 1 – 3 ].
With the steadily increasing prevalence of NAFLD, approximately one-third of patients progress from simple steatosis to NASH, ultimately developing irreversible liver fibrosis or cirrhosis, thereby imposing a substantial global health burden [ 3 , 4 ]. Although liver biopsy remains the gold standard for the diagnosis of NASH, its invasiveness, sampling variability, and inter-observer inconsistency considerably limit its routine clinical application, resulting in a large proportion of affected individuals remaining undiagnosed [ 4 , 5 ]. Consequently, there is an urgent need to develop non-invasive molecular biomarkers to complement or replace biopsy-based diagnosis and to identify core therapeutic targets with translational potential. At the pathological level, lipotoxicity-induced multimodal programmed cell death—including apoptosis, pyroptosis, necroptosis, autophagy-related cell death, and ferroptosis—represents one of the key driving forces underlying the transition from NAFLD to NASH [ 6 , 7 ]. Concurrently, mitochondrial dysfunction, endoplasmic reticulum stress, and excessive immune cell activation synergistically amplify inflammatory injury and promote fibrotic progression [ 4 , 8 , 9 ]. Accumulating evidence indicates that protein post-translational modifications (PTMs) constitute a critical molecular layer regulating these pathological processes [ 10 ]. By modulating protein stability, conformation, subcellular localization, and interaction networks, PTMs influence lipid metabolic homeostasis, mitochondrial respiratory activity, cell death thresholds, and immune-inflammatory signaling. Indeed, dysregulation of PTMs has emerged as a potential common denominator underlying multi-pathway disturbances in NASH[ 10 ]. However, systematic evaluations of how PTMs-related genes contribute to NASH progression, and which of these genes may serve as clinically actionable diagnostic or therapeutic targets, remain limited. 
In the context of translational hospital-based research, machine learning approaches have become powerful tools for identifying disease-driving genes from high-dimensional transcriptomic data [ 11 – 13 ]. Feature reduction methods based on LASSO enable efficient selection of genes with predictive relevance [ 14 ], while supervised learning algorithms such as linear discriminant analysis (LDA) can further enhance model classification performance [ 15 ]. Moreover, multi-algorithm integration strategies improve the robustness of feature selection and increase model applicability across independent cohorts. Nevertheless, conventional machine learning models alone often provide limited insight into the contribution of individual features to model predictions, which limits their interpretability when applied to complex diseases. With the emergence of explainable artificial intelligence, particularly SHapley Additive exPlanations (SHAP) based on cooperative game theory, new approaches have become available to address these challenges [ 16 – 18 ]. SHAP enables quantitative assessment of the contribution of each gene to model predictions, thereby highlighting PTMs-related genes with the greatest biological relevance in distinguishing NAFLD from NASH and allowing inference of the pathological pathways in which they may be involved. This transparent analytical framework facilitates closer integration of data-driven feature selection with mechanistic interpretation, helping to bridge the gap between statistical association and biological explanation.

Based on this background, we developed an analytical framework integrating 134 combinations of twelve machine learning algorithms with a SHAP-based global interpretability approach to identify key regulatory factors driving NASH progression from genome-wide PTMs-related genes.
Through cross-cohort validation, decomposition of feature contributions, and functional enrichment analyses, we established a non-invasive prediction model capable of accurately identifying patients with NASH. In parallel, our results highlight the involvement of core PTMs-related genes in lipotoxicity-induced cell death, inflammatory activation, mitochondrial dysfunction, and fibrotic progression, providing a basis for improved understanding of NASH molecular pathology and for the exploration of targeted therapeutic strategies. Methods Data acquisition and preprocessing A total of four NASH-related liver tissue datasets (GSE126848, GSE135251, GSE89632, and GSE130970) were retrieved from the Gene Expression Omnibus (GEO) database. All datasets were converted into expression matrices based on their corresponding platform annotation files ( Supplementary Table 1 ). Batch effects across the four datasets were corrected using the ComBat algorithm implemented in the sva package. Subsequently, GSE126848 and GSE135251 were merged to form the Merge Cohort as the training set, while GSE89632, GSE130970, and the Meta Cohort (comprising all four liver tissue datasets) were used as validation sets. Identification of differentially expressed genes and PTMs-related genes Gene expression profiles between NASH samples and normal controls were compared in the training cohort. Differentially expressed genes (DEGs) were identified using the thresholds of |log₂FC| > 0.585 and adjusted P < 0.05. The identified DEGs were intersected with a curated PTMs-related gene set, which included genes involved in 20 types of post-translational modifications such as ubiquitination, acetylation, and phosphorylation ( Supplementary Table 2 ), to obtain key PTMs-related genes significantly altered in NASH. 
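The thresholding and intersection step described above can be illustrated with a minimal sketch. It assumes that per-gene log₂ fold changes and adjusted P values are already available; the gene names, statistics, and the helper `select_degs` are hypothetical and are not taken from the study. Note that a |log₂FC| cutoff of 0.585 corresponds to roughly a 1.5-fold change.

```python
# Thresholds quoted in the Methods: |log2FC| > 0.585 and adjusted P < 0.05.
LOG2FC_CUTOFF = 0.585
ADJ_P_CUTOFF = 0.05


def select_degs(stats):
    """Return {gene: direction} for genes passing both thresholds.

    `stats` maps a gene symbol to (log2 fold change, adjusted P value).
    """
    degs = {}
    for gene, (log2fc, adj_p) in stats.items():
        if abs(log2fc) > LOG2FC_CUTOFF and adj_p < ADJ_P_CUTOFF:
            degs[gene] = "up" if log2fc > 0 else "down"
    return degs


# Hypothetical statistics, invented for illustration only.
toy_stats = {
    "GENE_A": (-1.20, 0.001),  # passes, downregulated
    "GENE_B": (0.90, 0.010),   # passes, upregulated
    "GENE_C": (0.30, 0.001),   # fails the fold-change cutoff
    "GENE_D": (-0.80, 0.200),  # fails the adjusted P cutoff
}
# Hypothetical stand-in for the curated PTMs-related gene set.
toy_ptm_set = {"GENE_A", "GENE_C", "GENE_D"}

degs = select_degs(toy_stats)
# Candidate genes are DEGs that also appear in the PTM gene set.
ptm_degs = sorted(set(degs) & toy_ptm_set)
# The log2 cutoff expressed as a linear fold change (about 1.5).
fold_change_cutoff = 2 ** LOG2FC_CUTOFF

print(degs)
print(ptm_degs)
print(round(fold_change_cutoff, 2))
```

In practice the statistics would come from a differential expression tool and the gene set from Supplementary Table 2; the sketch only shows the filtering logic.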
Protein–protein interaction (PPI) analysis of the candidate PTMs-related genes was subsequently performed using the GeneMANIA platform to explore potential functional relationships among these genes [ 19 , 20 ]. Machine learning–based identification of key PTMs-related genes and model development A total of twelve machine learning algorithms were applied, including Random Forest (RF), LASSO, Elastic Net (Enet), Ridge regression, stepwise generalized linear model (stepGLM), support vector machine (SVM), glmBoost, linear discriminant analysis (LDA), partial least squares generalized linear regression (plsRglm), gradient boosting machine (GBM), XGBoost, and Naive Bayes. Using a full permutation strategy, 134 combinations of “feature selection + model construction” were generated, in which the first algorithm was used for variable selection and the second algorithm for predictive model development. Candidate PTMs-related differentially expressed genes (PTMs-DEGs) were iteratively screened in the training cohort to identify the most stable set of key PTMs-related genes. Model performance was evaluated in both the training cohort and independent validation cohorts (GSE89632, GSE130970, and the Meta Cohort) using the area under the receiver operating characteristic curve (ROC-AUC), concordance index (C-index), and confusion matrix metrics (accuracy, specificity, and sensitivity). The model with the best overall performance and highest cross-cohort consistency was selected for subsequent interpretability analyses. SHAP-based interpretability analysis SHapley Additive exPlanations (SHAP) was applied to interpret the optimal predictive model. SHAP values were used to quantify the contribution of individual genes to model outputs, assess gene–gene interactions, evaluate the influence of gene expression thresholds on NASH risk (SHAP dependence plots), and identify gene-specific drivers of model predictions at the individual patient level (force plots) [ 16 , 17 ]. 
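To make the idea of a per-gene SHAP contribution concrete, the following minimal sketch computes exact Shapley values by enumerating all gene coalitions for a small linear score. It is an illustration under stated assumptions, not the authors' pipeline: the weights, intercept, baseline (cohort mean) values, and patient values are all invented, and genes outside a coalition are simply set to their baseline value. For a linear score under this choice, each contribution reduces to weight × (value − baseline).

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, sample, baseline):
    """Exact Shapley values by enumerating every feature coalition.

    Features outside a coalition take their baseline value
    (an assumption of this sketch).
    """
    names = list(sample)
    n = len(names)

    def value(coalition):
        x = {g: (sample[g] if g in coalition else baseline[g]) for g in names}
        return predict(x)

    phi = {}
    for gene in names:
        others = [g for g in names if g != gene]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {gene}) - value(s))
        phi[gene] = total
    return phi


# Hypothetical linear discriminant score; all numbers are invented.
weights = {"PELI2": -1.5, "DUSP2": -0.5, "TRIM56": 0.8}
intercept = 0.2


def score(x):
    return intercept + sum(weights[g] * x[g] for g in weights)


baseline = {"PELI2": 5.0, "DUSP2": 4.0, "TRIM56": 6.0}  # hypothetical cohort means
sample = {"PELI2": 4.0, "DUSP2": 4.5, "TRIM56": 7.0}    # hypothetical patient

phi = shapley_values(score, sample, baseline)
for gene, contribution in phi.items():
    print(gene, round(contribution, 3))
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that force plots visualize. Real SHAP analyses of non-linear models rely on approximations rather than full enumeration, since the number of coalitions grows exponentially with the number of genes.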
This approach enabled an interpretable assessment of the contribution of key PTMs-related genes to NASH prediction. Analysis of immune and metabolic characteristics Gene set variation analysis (GSVA) was performed to assess pathway-level perturbations regulated by PTMs-related genes in NASH, including inflammation, lipid metabolism, apoptosis, and fibrosis-related pathways[ 21 ]. Immune cell infiltration, encompassing macrophages, T cell subsets, NK cells, and dendritic cells, was evaluated using four complementary algorithms: single-sample gene set enrichment analysis (ssGSEA), CIBERSORT, Microenvironment Cell Populations-counter (MCPcounter), and Estimation of Proportions of Immune and Cancer cells (EPIC)[ 22 – 26 ]. Spearman correlation analysis was conducted to examine associations between key PTMs-related genes and immune cell abundance, clinical parameters (BMI, NAS score, and total cholesterol), as well as inflammation- and fibrosis-related genes[ 27 ]. NASH subtyping based on key PTMs-related genes Non-negative matrix factorization (NMF) was applied to cluster patients with NASH and identify molecular subtypes based on the expression profiles of key PTMs-related genes[ 28 ]. Differences between the identified subtypes were subsequently compared in terms of inflammatory activity, immune cell infiltration, lipid metabolism dysregulation, fibrosis severity, and signaling pathway perturbations to evaluate the stratification capacity of key PTMs-related genes. Experimental validation This study was approved by the Ethics Committee of the First Affiliated Hospital of Anhui Medical University (Approval No. Quick-PJ2024-5-43). Human liver tissue samples were obtained from the Department of General Surgery of the First Affiliated Hospital of Anhui Medical University. Liver tissues were subjected to hematoxylin and eosin (HE) staining and Oil Red O staining (Beyotime, China) for histopathological evaluation. 
HepG2 cells (provided by the Key Laboratory of Infectious Diseases of Anhui Province) were cultured in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% fetal bovine serum. The concentrations of TNF-α, IL-6, and IL-8 in the culture supernatants were determined using commercial ELISA kits (Beyotime, China). Total RNA was extracted from liver tissues using TRIzol reagent (Takara, China), followed by quantitative real-time PCR (qPCR). β-actin served as the internal reference gene, and relative mRNA expression levels were calculated using the 2^–ΔΔCt method. Protein expression was assessed by Western blotting using a PELI2 antibody (Proteintech, China). Proteins were separated by SDS–PAGE, transferred onto membranes, and detected using enhanced chemiluminescence (ECL). To investigate the functional role of PELI2, lentiviral vectors for PELI2 overexpression and knockdown (GenePharma Genomics, China) were constructed, and cell transfection was performed according to the manufacturer’s instructions. Steatosis, inflammation, and fibrosis in NASH liver tissues and cell models were evaluated by HE and Oil Red O staining. Immunohistochemistry (IHC) was performed to assess the expression of PELI2 in liver tissues. All experimental results were visualized and quantified using ImageJ software. Primer sequences and lentiviral constructs used in this study are provided in Supplementary Tables 3 and 4, respectively.

Statistical analysis

All computational analyses were performed using R software (version 4.3.2) and the Strawberry Perl environment. Statistical analyses were conducted using GraphPad Prism 10. Continuous variables were analyzed using Student’s t test or non-parametric tests, as appropriate. A two-sided P value < 0.05 was considered statistically significant. * indicates P < 0.05, ** indicates P < 0.01, *** indicates P < 0.001, and ns indicates no significance.

Workflow chart

The workflow of this study is illustrated in Fig. 1.
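For the qPCR quantification mentioned under Experimental validation, the 2^–ΔΔCt calculation can be sketched as follows. This is a minimal illustration assuming one target gene and one reference gene (β-actin in the study); the function name `relative_expression` and the Ct values are hypothetical, and equal amplification efficiency of roughly 100% is assumed for both genes.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative mRNA level by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference), computed per group.
    ddCt = dCt(sample) - dCt(control).
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2 ** (-delta_delta)


# Hypothetical Ct values, invented for illustration only.
fold = relative_expression(
    ct_target_sample=26.0, ct_ref_sample=18.0,    # dCt = 8
    ct_target_control=25.0, ct_ref_control=18.0,  # dCt = 7
)
print(fold)
```

With these toy values ΔΔCt equals 1, so the target transcript in the sample is at half the control level. Values below 1 indicate lower expression relative to the control group.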
Results

Sample characteristics and differential gene expression analysis

To systematically characterize the PTMs-related gene landscape associated with NASH, four public transcriptomic datasets were integrated and subjected to batch effect correction (Fig. 2 A–D). The distribution of normal controls and NASH samples across cohorts is shown, with the Merge Cohort used as the training set and GSE89632, GSE130970, and the Meta Cohort used as validation sets (Fig. 2 E). Differential expression analysis of the training cohort was performed using the thresholds of |log₂FC| > 0.585 and adjusted P < 0.05, identifying 338 downregulated and 350 upregulated genes in NASH (Fig. 2 F–H). Intersection of these differentially expressed genes with a curated set of 807 PTMs-related genes yielded 13 candidate genes that were both PTMs-annotated and significantly altered in NASH (Fig. 2 I–J). Functional enrichment analysis using the GeneMANIA platform revealed that these candidate genes were significantly enriched in pathways related to ubiquitin protein ligase activity, mitophagy, and I-κB kinase/NF-κB signaling (Fig. 2 K).

Machine learning–based prediction model for NASH

Twelve machine learning algorithms were combined into 134 algorithmic combinations (feature selection followed by model construction), and NASH prediction models were constructed based on the 13 candidate PTMs-related differentially expressed genes. Among all models, the LASSO + LDA combination demonstrated the best predictive performance, with a mean area under the ROC curve (AUC) of 0.932 across the four cohorts. The AUC values of this model in the Merge Cohort, GSE130970, GSE89632, and Meta Cohort were 0.944, 0.924, 0.950, and 0.911, respectively (Fig. 3 A). The optimal model ultimately identified 10 key PTMs-related genes, among which PELI2 exhibited the highest relative importance (Fig. 3 B–C). Across all four cohorts, the machine learning model consistently outperformed prediction models based on any single gene in distinguishing NASH from control samples (Fig. 3 D–G).
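The AUC values reported above can be read as the probability that a randomly chosen NASH sample receives a higher model score than a randomly chosen control. A minimal sketch of that rank-based calculation follows; the function `roc_auc`, the scores, and the labels are hypothetical and are not data from the study. For a binary outcome the concordance index coincides with this quantity.

```python
def roc_auc(scores, labels):
    """ROC-AUC as the fraction of (case, control) pairs ranked correctly.

    Ties between a case and a control count as half a correct pair.
    Labels are 1 for cases (NASH) and 0 for controls.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    correct = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                correct += 1.0
            elif c == k:
                correct += 0.5
    return correct / (len(cases) * len(controls))


# Hypothetical model scores, invented for illustration only.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(toy_scores, toy_labels)
print(round(auc, 3))
```

Here eight of the nine case-control pairs are ordered correctly, giving an AUC of 8/9. The pairwise loop is quadratic in sample size, which is acceptable for cohorts of this scale; rank-based formulas give the same value more efficiently.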
SHAP interpretability analysis of key gene contributions SHAP-based interpretability analysis indicated that model predictions were primarily driven by genes associated with ubiquitination and inflammatory regulation, with PELI2, PELI1, ZNF521, TRIM56, and RNF152 showing the highest contributions (Fig. 4 A–B). Single-sample SHAP force plots demonstrated that individual genes exerted directional effects on model predictions. Specifically, PELI2 and RNF152 predominantly contributed negatively, with PELI2 exhibiting the strongest negative contribution (SHAP value = − 0.407), whereas TRIM56, GALNT14, and ZNF521 provided positive contributions in a subset of samples (Fig. 4 C–D). Furthermore, SHAP dependence analysis revealed a pronounced nonlinear relationship between PELI2 expression levels and SHAP values, highlighting its critical influence on model output (Fig. 4 E). PELI2, DUSP2, and TRIM56 were consistently identified as key genes across multiple models After systematic screening across twelve machine learning frameworks, PELI2, DUSP2, and TRIM56 were consistently identified as the most stable signature genes across models (Fig. 5 A). Protein–protein interaction network analysis revealed that these three genes were closely connected with IRAK1, TLR5, BCL10, and downstream NF-κB–related signaling components, suggesting their involvement in ubiquitination–inflammation regulatory networks (Fig. 5 B). Single-gene ROC analysis demonstrated that PELI2 achieved AUC values of 0.929, 0.849, 0.913, and 0.864 in the four independent cohorts, respectively, whereas the multi-gene prediction model exhibited superior predictive performance across all cohorts (Fig. 5 C–F). In addition, expression analyses across multiple cohorts consistently showed that PELI2 was significantly downregulated in NASH samples (Fig. 5 G–J). 
Finally, inter-gene correlation analysis indicated that PELI2, DUSP2, and TRIM56 were strongly associated with key lipid metabolism regulators and fibrosis-related markers (Fig. 5 K). Pathway characteristics and immune microenvironment alterations associated with key genes GSVA enrichment analysis demonstrated that the high-expression groups of these key genes were predominantly enriched in pathways related to cellular stress responses, metabolic regulation, and inflammatory signaling. In contrast, the low-expression groups were more strongly associated with canonical NASH-related pathological processes, including lipid metabolism dysregulation, chemokine signaling, and immune cell recruitment (Fig. 6 A–C). Integrated immune infiltration analyses using multiple algorithms further revealed a pronounced remodeling of the immune microenvironment, with several immune cell populations—such as M1/M2 macrophages, dendritic cells, and T-cell subsets—showing significant alterations that were highly correlated with the expression levels of the key genes (Fig. 6 D). Bubble plot analysis additionally indicated significant associations between the three key genes and the infiltration of core immune cell populations in NASH, including monocytes, inflammatory macrophages, and CD4⁺ T-cell subsets (Fig. 6 E–F). Key gene–based NMF molecular subtyping analysis To further characterize the molecular heterogeneity of NASH, a non-negative matrix factorization (NMF) clustering model was constructed based on the three key genes, which robustly identified two distinct molecular subtypes (C1 and C2) among NASH samples (Fig. 7 A–B). Clinical feature comparison revealed that patients in the C1 subtype exhibited significantly higher NAFLD activity scores (NAS) than those in the C2 subtype (Fig. 7 C). Further analysis showed that PELI2 expression was markedly downregulated in the C1 subtype (Fig. 7 D). 
Integrated immune infiltration analyses using multiple algorithms demonstrated that the C1 subtype was enriched with various immune effector cells, including macrophages, monocytes, dendritic cells, and activated T-cell subsets, whereas the C2 subtype displayed relatively lower immune infiltration levels (Fig. 7 E). GSVA further indicated significant differences between the two subtypes in both immune-related and metabolism-related pathway activities (Fig. 7 F). Consistently, analyses based on four immune infiltration algorithms showed that multiple immune cell populations, including T cells, macrophages, NK cells, and dendritic cells, were significantly more abundant in the C1 subtype compared with the C2 subtype ( Supplementary Fig. 1 ). PELI2 expression is significantly downregulated in NASH HE staining and Oil Red O staining confirmed that the collected liver specimens exhibited typical histological features of NASH (Fig. 8 A–B). At the mRNA level, the expression of PELI2 and DUSP2 was significantly decreased in liver tissues from patients with NASH, whereas TRIM56 expression was significantly increased compared with controls (Fig. 8 C–E, Supplementary Table 5 ). Subsequently, further validation was performed focusing on the most critical gene, PELI2. At the protein level, Western blot analysis demonstrated a marked reduction in PELI2 protein expression in NASH liver tissues (Fig. 8 F). Consistently, immunohistochemical staining further confirmed significantly decreased PELI2 expression in liver tissues from patients with NASH (Fig. 8 G). Effects of PELI2 on NASH To further clarify the functional role of PELI2 in NASH, HepG2 cells were transduced with lentiviral vectors to achieve PELI2 overexpression or knockdown. Compared with control cells, palmitic acid (PA) treatment markedly reduced PELI2 expression, whereas enforced expression of PELI2 partially reversed the PA-induced downregulation; in contrast, PELI2 silencing further exacerbated this reduction (Fig. 
9 A–D). Functional assays demonstrated that PA treatment significantly increased intracellular lipid accumulation in HepG2 cells, while PELI2 overexpression markedly alleviated lipid deposition. Conversely, cells with PELI2 knockdown exhibited more pronounced lipid accumulation (Fig. 9 E–F). In parallel, PA exposure significantly enhanced the secretion of TNF-α, IL-6, and IL-8, whereas PELI2 overexpression substantially suppressed the release of these pro-inflammatory cytokines; by contrast, PELI2 knockdown further augmented their secretion ( Supplementary Fig. 2 ).

Discussion

Non-alcoholic steatohepatitis (NASH) is a metabolic liver disease characterized by hepatic lipid accumulation, chronic inflammation, and immune dysregulation. Its progression involves complex interactions between metabolic pathways and inflammatory signaling processes [ 29 – 31 ]. Previous studies have suggested that protein post-translational modifications (PTMs) play important roles in regulating lipid metabolism and inflammatory responses; however, the global expression patterns and potential roles of PTMs-related genes in NASH remain incompletely understood [ 4 , 32 , 33 ]. In this study, we integrated multiple transcriptomic datasets and applied machine learning approaches to systematically investigate PTMs-related genes associated with NASH.

Integrated analysis across multiple cohorts showed that PTMs-related genes were mainly enriched in immune signaling pathways, inflammatory regulation, ubiquitin-mediated protein degradation, and mitochondrial-associated functions. These results suggest that, during NASH development, abnormalities in lipid metabolism and immune-inflammatory responses may interact through PTMs-related regulatory mechanisms, thereby contributing to disease progression.

Using machine learning analysis, we constructed several predictive models and evaluated their stability in independent cohorts.
Among these models, the combination of least absolute shrinkage and selection operator (LASSO) and linear discriminant analysis (LDA) demonstrated relatively favorable diagnostic performance. Model interpretability analysis indicated that PELI2 contributed substantially to model prediction. Further examination revealed that the association between PELI2 expression and model output was not strictly linear, suggesting that dysregulation of PELI2 may reflect more complex regulatory alterations occurring during NASH progression. Through cross-model validation, PELI2, DUSP2, and TRIM56 were consistently identified as NASH-associated candidate genes. Protein–protein interaction network analysis showed that these genes were functionally linked to immune-related signaling molecules such as IRAK1, TLR5, and BCL10, indicating their potential involvement in inflammation-related signaling pathways. Among these candidates, PELI2 exhibited relatively stable performance across different cohorts, suggesting its potential relevance in NASH-related pathological processes. Analysis of the hepatic immune microenvironment showed that the expression levels of PELI2, DUSP2, and TRIM56 were significantly correlated with the infiltration of macrophages, T cells, and dendritic cells. In addition, non-negative matrix factorization (NMF)–based molecular subtyping revealed immune heterogeneity among NASH samples, with one subtype characterized by higher immune cell infiltration and more severe disease activity. These findings suggest that alterations in PTMs-related gene expression may be associated with differences in immune status in the NASH liver. Functional experiments further supported a potential role for PELI2 in NASH. We observed that PELI2 expression was reduced at both the mRNA and protein levels in NASH liver tissues. 
In hepatocyte models, overexpression of PELI2 partially attenuated palmitic acid–induced lipid accumulation and reduced the release of pro-inflammatory cytokines, whereas PELI2 knockdown exacerbated lipid deposition and inflammatory responses. These results suggest that PELI2 may be involved in regulating lipid-induced inflammatory responses in hepatocytes. Several limitations of this study should be noted. First, the analyses were primarily based on transcriptomic data, and PTMs were not directly assessed at the protein modification level. Second, in vitro hepatocyte models cannot fully recapitulate the complexity of cellular composition and immune interactions in the NASH liver. Future studies incorporating proteomic analyses, animal models, and larger clinical cohorts will be necessary to further validate these findings. In summary, this study identified PELI2, DUSP2, and TRIM56 as PTMs-related genes associated with NASH through integrated multi-cohort analysis and experimental validation. Among these genes, PELI2 may be involved in the regulation of lipid accumulation and inflammatory responses. These findings provide a basis for further investigation into the role of PTMs-related mechanisms in NASH. Conclusion In this study, analysis of multi-cohort transcriptomic data using machine learning approaches highlighted PELI2, DUSP2, and TRIM56 as genes associated with nonalcoholic steatohepatitis. Among these features, PELI2 showed consistent downregulation in NASH liver tissues and demonstrated stable performance across independent cohorts. Experimental observations further indicated that modulation of PELI2 expression was associated with changes in palmitic acid–induced lipid accumulation and inflammatory cytokine secretion, including TNF-α, IL-6, and IL-8. 
Taken together, these results illustrate how machine learning–based feature analysis can contribute to the characterization of molecular patterns in NASH and support the relevance of PELI2 as a gene of interest within a systems-level framework.

Declarations

Ethics approval and consent to participate

All research involving human data was conducted in accordance with the Declaration of Helsinki. All ethical issues related to this study were appropriately addressed. The study protocol was reviewed and approved by the Ethics Committee of the First Affiliated Hospital of Anhui Medical University (Approval No. Quick-PJ2024-05-43; Supplementary Figure 3). Written informed consent to participate was obtained from all participants prior to their inclusion in the study.

Consent for publication

Not applicable.

Availability of data and materials

The datasets supporting the conclusions of this article are available in the GEO database (https://www.ncbi.nlm.nih.gov/geo/).

Competing interests

The authors declare that they have no competing interests.

Funding

This work was supported by the Anhui Provincial Department of Education (Grant No. 2023AH053301) and the Postgraduate Innovation Research and Practice Program of Anhui Medical University (Grant No. YJS20250021).

Authors' contributions

Chao Yu: Writing – original draft, Investigation, Conceptualization. Ronglin Xu and Jingjing Bai: Methodology, Data curation. Pengcheng Jia: Visualization, Validation. Jie Chen: Writing – original draft, Visualization, Validation, Methodology. Ruomu Ge and Zhen Zhang: Writing – original draft, Validation, Supervision, Investigation, Conceptualization.

Acknowledgements

The authors would like to thank the Clinical Research Ethics Committee of the First Affiliated Hospital of Anhui Medical University for ethical approval of this study.
We also acknowledge the public databases and resources used in this work, including the Gene Expression Omnibus (GEO), GeneMANIA, GWAS summary statistics, and the FinnGen database, for providing access to high-quality data essential for this research.

Clinical Trial Registration
Clinical trial number: not applicable.

References
1. Quek J, Chan KE, Wong ZY, Tan C, Tan B, Lim WH, Tan DJH, Tang ASP, Tay P, Xiao J et al (2023) Global prevalence of non-alcoholic fatty liver disease and non-alcoholic steatohepatitis in the overweight and obese population: a systematic review and meta-analysis. Lancet Gastroenterol Hepatol 8:20–30
2. Lv JJ, Zhang YC, Li XY, Guo H, Yang CH (2024) The burden of non-alcoholic fatty liver disease among working-age people in the Western Pacific Region, 1990–2019: an age-period-cohort analysis of the Global Burden of Disease study. BMC Public Health 24:1852
3. Paternostro R, Trauner M (2022) Current treatment of non-alcoholic fatty liver disease. J Intern Med 292:190–204
4. Huby T, Gautier EL (2022) Immune cell-mediated features of non-alcoholic steatohepatitis. Nat Rev Immunol 22:429–443
5. Pouwels S, Sakran N, Graham Y, Leal A, Pintar T, Yang W, Kassir R, Singhal R, Mahawar K, Ramnarain D (2022) Non-alcoholic fatty liver disease (NAFLD): a review of pathophysiology, clinical management and effects of weight loss. BMC Endocr Disord 22:63
6. Leow WQ, Chan AW, Mendoza PGL, Lo R, Yap K, Kim H (2023) Non-alcoholic fatty liver disease: the pathologist's perspective. Clin Mol Hepatol 29:S302–S318
7. Grander C, Grabherr F, Tilg H (2023) Non-alcoholic fatty liver disease: pathophysiological concepts and treatment options. Cardiovasc Res 119:1787–1798
8. Fromenty B, Roden M (2023) Mitochondrial alterations in fatty liver diseases. J Hepatol 78:415–429
9. Flessa CM, Kyrou I, Nasiri-Ansari N, Kaltsas G, Papavassiliou AG, Kassi E, Randeva HS (2021) Endoplasmic Reticulum Stress and Autophagy in the Pathogenesis of Non-alcoholic Fatty Liver Disease (NAFLD): Current Evidence and Perspectives. Curr Obes Rep 10:134–161
10. Min Y, Zhang Y, Ji Y, Liu S, Guan C, Wei L, Yu H, Zhang Z (2025) Post-translational modifications in the pathophysiological process of metabolic dysfunction–associated steatotic liver disease. Cell Biosci 15:79
11. Reel PS, Reel S, Pearson E, Trucco E, Jefferson E (2021) Using machine learning approaches for multi-omics data analysis: A review. Biotechnol Adv 49:107739
12. Gomes MAS, Kovaleski JL, Pagani RN, da Silva VL (2022) Machine learning applied to healthcare: a conceptual review. J Med Eng Technol 46:608–616
13. Asnicar F, Thomas AM, Passerini A, Waldron L, Segata N (2024) Machine learning for microbiologists. Nat Rev Microbiol 22:191–205
14. Hu A (2023) Heterogeneous treatment effects analysis for social scientists: A review. Soc Sci Res 109:102810
15. Ye Z, Li Z, Zhong S, Xing Q, Li K, Sheng W, Shi X, Bao Y (2024) The recent two decades of traumatic brain injury: a bibliometric analysis and systematic review. Int J Surg 110:3745–3759
16. Ali S, Akhlaq F, Imran AS, Kastrati Z, Daudpota SM, Moosa M (2023) The enlightening role of explainable artificial intelligence in medical & healthcare domains: A systematic literature review. Comput Biol Med 166:107555
17. Allgaier J, Mulansky L, Draelos RL, Pryss R (2023) How does the model make predictions? A systematic literature review on the explainability power of machine learning in healthcare. Artif Intell Med 143:102616
18. Jin Y, Kattan MW (2023) Methodologic Issues Specific to Prediction Model Development and Evaluation. Chest 164:1281–1289
19. Franz M, Rodriguez H, Lopes C, Zuberi K, Montojo J, Bader GD, Morris Q (2018) GeneMANIA update 2018. Nucleic Acids Res 46:W60–W64
20. Zhou Y, Zhou B, Pache L, Chang M, Khodabakhshi AH, Tanaseichuk O, Benner C, Chanda SK (2019) Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat Commun 10:1523
21. Liu J, Li Y, Ma J, Wan X, Zhao M, Zhang Y, Shang D (2023) Identification and immunological characterization of lipid metabolism-related molecular clusters in nonalcoholic fatty liver disease. Lipids Health Dis 22:124
22. Zhou J, Huang J, Li Z, Song Q, Yang Z, Wang L, Meng Q (2023) Identification of aging-related biomarkers and immune infiltration characteristics in osteoarthritis based on bioinformatics analysis and machine learning. Front Immunol 14:1168780
23. Wang Z, Hu D, Pei G, Zeng R, Yao Y (2023) Identification of driver genes in lupus nephritis based on comprehensive bioinformatics and machine learning. Front Immunol 14:1288699
24. Zhu D, Zhang X, Fang Y, Xu Z, Yu Y, Zhang L, Yang Y, Li S, Wang Y, Jiang C, Huang D (2024) Identification of a lactylation-related gene signature as the novel biomarkers for early diagnosis of acute myocardial infarction. Int J Biol Macromol 282:137431
25. Ubago-Guisado E, Rodríguez-Barranco M, Ching-López A, Petrova D, Molina-Montes E, Amiano P, Barricarte-Gurrea A, Chirlaque MD, Agudo A, Sánchez MJ (2021) Evidence Update on the Relationship between Diet and the Most Common Cancers from the European Prospective Investigation into Cancer and Nutrition (EPIC) Study: A Systematic Review. Nutrients 13
26. Aran D (2020) Cell-Type Enrichment Analysis of Bulk Transcriptomes Using xCell. Methods Mol Biol 2120:263–276
27. Collings TJ, Bourne MN, Barrett RS, Meinders E, Gonçalves B, Shield AJ, Diamond LE (2025) Reconsidering Exercise Selection with EMG: Poor Agreement between Ranking Hip Exercises with Gluteal EMG and Muscle Force. Med Sci Sports Exerc 57:1829–1837
28. Liefeld T, Huang E, Wenzel AT, Yoshimoto K, Sharma AK, Sicklick JK, Mesirov JP, Reich M (2023) NMF Clustering: Accessible NMF-based Clustering Utilizing GPU Acceleration. J Bioinform Syst Biol 6:379–383
29. Sheka AC, Adeyi O, Thompson J, Hameed B, Crawford PA, Ikramuddin S (2020) Nonalcoholic Steatohepatitis: A Review. JAMA 323:1175–1183
30. Schuster S, Cabrera D, Arrese M, Feldstein AE (2018) Triggering and resolution of inflammation in NASH. Nat Rev Gastroenterol Hepatol 15:349–364
31. Park JS, Ma H, Roh YS (2021) Ubiquitin pathways regulate the pathogenesis of chronic liver disease. Biochem Pharmacol 193:114764
32. Ioannou GN (2016) The Role of Cholesterol in the Pathogenesis of NASH. Trends Endocrinol Metab 27:84–95
33. Milić S, Lulić D, Štimac D (2014) Non-alcoholic fatty liver disease and obesity: biochemical, metabolic and clinical presentations. World J Gastroenterol 20:9330–9337

Supplementary Files
WBoriginalimage.docx
SupplementaryTables.xls
Supplementaryiamge.docx
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-8782305","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":585447669,"identity":"638badea-3024-4f38-9eed-a837cdf9e30e","order_by":0,"name":"Chao Yu","email":"","orcid":"","institution":"The First Affiliated Hospital of Anhui Medical University","correspondingAuthor":false,"prefix":"","firstName":"Chao","middleName":"","lastName":"Yu","suffix":""},{"id":585447670,"identity":"c2bf8e0b-27c3-47b7-bf0e-a2879e27501c","order_by":1,"name":"Ronglin Xu","email":"","orcid":"","institution":"The First Affiliated Hospital of Anhui Medical University","correspondingAuthor":false,"prefix":"","firstName":"Ronglin","middleName":"","lastName":"Xu","suffix":""},{"id":585447671,"identity":"af66dede-b236-4bef-9c8c-bcfe135b3059","order_by":2,"name":"Jingjing Bai","email":"","orcid":"","institution":"Anhui University of Chinese Medicine","correspondingAuthor":false,"prefix":"","firstName":"Jingjing","middleName":"","lastName":"Bai","suffix":""},{"id":585447672,"identity":"ee26e263-a874-484d-8867-22f40db5e934","order_by":3,"name":"Pengcheng Jia","email":"","orcid":"","institution":"The First Affiliated Hospital of Anhui Medical University","correspondingAuthor":false,"prefix":"","firstName":"Pengcheng","middleName":"","lastName":"Jia","suffix":""},{"id":585447673,"identity":"2ef13321-3e29-49b6-be61-f98238269f9a","order_by":4,"name":"Jie Chen","email":"","orcid":"","institution":"The First Affiliated Hospital of Anhui Medical 
University","correspondingAuthor":false,"prefix":"","firstName":"Jie","middleName":"","lastName":"Chen","suffix":""},{"id":585447674,"identity":"a5b8ffdc-62b4-43fc-ad57-d0d86dd26030","order_by":5,"name":"Ruomu Ge","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA4klEQVRIiWNgGAWjYBADOfYGBgZmCDuBOC3GPAdI1ZLYQ7QWeffeZxIfcw6n90gkMH4uzDnMwM+eY8DwcwduLYZnjptJztx2OBeohVkayGCQ7HljwNh7Bo+WGWls0rxALfslEhhADAaDGzkGzIxteLTMfwbWks4DtOU3SIs9IS3yEmxgLQlALWwQWyQIaDHgSWO2nLkt3bCH52GbNe82oG1nnhUc7MVnS/sxxhsft1nL87AnH77Nu81ajr89eeODn/hsOQBjCSQ2gCgeEHEAq1qYLQ0wFj9edaNgFIyCUTCSAQAW40rbAydzswAAAABJRU5ErkJggg==","orcid":"","institution":"The First Affiliated Hospital of Anhui Medical University","correspondingAuthor":true,"prefix":"","firstName":"Ruomu","middleName":"","lastName":"Ge","suffix":""},{"id":585450983,"identity":"b296219e-aec3-4da5-9a61-c02227ffbd11","order_by":6,"name":"Zhen Zhang","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABEUlEQVRIiWNgGAWjYDAC5gMMBz/+seFhbG98AOQeAAtK4NXClsB4WLIhTY6557AB0VqYD/A2HDZmn5FMpBaDYzwGByR3HE7snfmYTYKh5k60wQHmg7d5GOzy8GopPJOeOHN2MlDLsWe5Gw6wJVvzMCQX49Ryv8fggASbdeLG2fnHJBjYDgO18JhJ8zAcSGzAZwsPG3Pi/puHgbb8A2nh/0ZYC2+bszHjDGY2CcY2sC1seLVIHmMrOCxxJk2OsSeZ2SKx73DuzMNsxpZzDJJxauE7xrz544cKUFQeZrzx4dvh3L7jzQ9vvKmww6lF4QCHAYzNIpEAopjBDsahHgjkG9gfwNjMH3CrGwWjYBSMgpEMABFoYobn46VAAAAAAElFTkSuQmCC","orcid":"","institution":"The First Affiliated Hospital of Anhui Medical University","correspondingAuthor":true,"prefix":"","firstName":"Zhen","middleName":"","lastName":"Zhang","suffix":""}],"badges":[],"createdAt":"2026-02-04 
06:03:14","currentVersionCode":1,"declarations":{"humanSubjects":true,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":true,"humanSubjectConsent":true,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-8782305/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8782305/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":101943684,"identity":"dbb6af8f-8bbe-44cb-9c07-00aae254d353","added_by":"auto","created_at":"2026-02-05 09:42:51","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":1411031,"visible":true,"origin":"","legend":"\u003cp\u003eSchematic overview of the study design\u003c/p\u003e","description":"","filename":"Figure1.tif.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8782305/v1/57872adc568fdc89300b674b.jpg"},{"id":101941232,"identity":"7bdacc38-1350-4ee0-a842-31a182c5a0f2","added_by":"auto","created_at":"2026-02-05 09:20:10","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":1818776,"visible":true,"origin":"","legend":"\u003cp\u003eIntegration of GEO datasets and identification of PTMs-related genes in NASH.(A, B) PCA plots showing the distribution of samples from the GSE126848 and GSE135251 datasets before and after batch effect correction.(C, D) PCA plots illustrating sample distributions of the four GEO datasets (GSE126848, GSE130970, GSE135251, and GSE89632) before and after batch effect removal.(E) Numbers of normal control and NASH samples across the individual datasets.(F) Volcano plot of differential expression analysis performed in the integrated training cohort, with genes screened using the criteria of |log₂FC| \u0026gt; 0.585 and \u003cem\u003eP\u003c/em\u003e \u0026lt; 0.05.(G) Heatmap showing the expression patterns of upregulated and downregulated 
differentially expressed genes in the training cohort.(H) Summary of differentially expressed genes identified in the training cohort, including 350 downregulated and 338 upregulated genes.(I) Venn diagram illustrating the overlap between PTMs-related genes and DEGs.(J) Bar plot displaying the log₂FC of the 13 overlapping genes, with downregulated genes shown in yellow and upregulated genes shown in blue.(K) PPI network of the 13 overlapping genes constructed using the GeneMANIA platform\u003c/p\u003e","description":"","filename":"Figure2.tif.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8782305/v1/b3f972fe4b7f4e4ce162966b.jpg"},{"id":101943720,"identity":"f1fc9af3-d425-4422-b082-0808b50a1773","added_by":"auto","created_at":"2026-02-05 09:42:58","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":1451530,"visible":true,"origin":"","legend":"\u003cp\u003eMachine learning models for NASH prediction and identification of key genes.(A) Heatmap of AUC values for 134 machine learning algorithm combinations across the Merge Cohort, GSE89632, GSE130970, and the Meta Cohort. 
The LASSO + LDA model achieved the highest performance and identified 10 feature genes.(B) Bar plot showing the number of genes selected by each algorithm, ranked from lowest to highest.(C) Bubble plot illustrating the variable importance of the 10 feature genes in the optimal model, with PELI2 showing the highest importance.(D–G) Bar plots showing the diagnostic performance of the 10 feature genes individually across the four cohorts\u003c/p\u003e","description":"","filename":"Figure3.tif.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8782305/v1/a04dd6d52a6785aabe6c6c71.jpg"},{"id":101941234,"identity":"63e20a68-f924-4726-97f6-1714ce1e5369","added_by":"auto","created_at":"2026-02-05 09:20:11","extension":"jpg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":1403478,"visible":true,"origin":"","legend":"\u003cp\u003eSHAP-based interpretability analysis of the optimal machine learning model.(A) Beeswarm plot showing the distribution of SHAP values for each gene across samples.(B) Bar plot displaying the mean absolute SHAP values of individual genes.(C) Waterfall plot illustrating the contribution of each gene to the prediction outcome for a representative sample.(D) Force plot showing how individual gene contributions combine to produce the final prediction for a single sample.(E) Dependence plot depicting the relationship between gene expression levels and SHAP values, highlighting potential interactions between features\u003c/p\u003e","description":"","filename":"Figure4.tif.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8782305/v1/1dc538de1d25328b7b104216.jpg"},{"id":101943662,"identity":"718fba80-3f9b-4eac-b031-e4167d8764fa","added_by":"auto","created_at":"2026-02-05 09:42:44","extension":"jpg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":1977334,"visible":true,"origin":"","legend":"\u003cp\u003eIdentification of key signature genes across machine learning 
models.(A) Selection of three robust feature genes (PELI2, DUSP2, and TRIM56) consistently identified across 12 machine learning algorithms.(B) PPI network of the three key genes constructed using the GeneMANIA platform.(C–F) ROC curves showing the diagnostic performance of the three key genes in distinguishing NASH from normal liver tissues across four cohorts (Merge Cohort, GSE89632, GSE130970, and the Meta Cohort).(G–J) Multi-cohort expression analysis demonstrating consistent downregulation of PELI2 in NASH.(K) Gene–gene correlation and network reconstruction analysis showing strong associations of PELI2, DUSP2, and TRIM56 with key lipid metabolism genes (SCD, ACACA, SREBF1) and fibrosis-related markers (COL1A1, CEBPB)\u003c/p\u003e","description":"","filename":"Figure5.tif.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8782305/v1/da29f7f0210116652ba72f70.jpg"},{"id":101943577,"identity":"1e5e3984-f320-4a2e-82ff-ce792fce7e51","added_by":"auto","created_at":"2026-02-05 09:42:25","extension":"jpg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":1183395,"visible":true,"origin":"","legend":"\u003cp\u003eGSEA/GSVA analyses associated with key genes.(A–C) GSEA bar plots illustrating the enrichment of biological pathways associated with the three key genes. 
Pathways are shown on the y-axis and t values on the x-axis; blue bars indicate upregulation, green bars indicate downregulation, and gray bars indicate no significant change.(D) GSVA heatmap showing associations between the expression levels of key genes and immune cell populations, with distinct differences observed between normal and NASH groups.(E–F) Bubble plots depicting significant correlations between the three key genes and infiltration levels of major immune cell populations in NASH\u003c/p\u003e","description":"","filename":"Figure6.tif.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8782305/v1/6e59912ad1318a994a30e054.jpg"},{"id":101941238,"identity":"ef906974-d503-4b76-bfdb-e78f4ab77596","added_by":"auto","created_at":"2026-02-05 09:20:11","extension":"jpg","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":2769518,"visible":true,"origin":"","legend":"\u003cp\u003eNMF-based molecular subtyping of NASH patients according to disease severity.(A) NMF clustering analysis classifying NASH patients into two distinct clusters.(B) Clear separation of samples between the two clusters.(C) Comparison of NAS scores between the two clusters in the training cohort, showing higher scores in Cluster 1.(D) Differential expression of key genes between the two clusters.(E) GSEA heatmap illustrating significant differences in immune cell–related signatures between the two clusters.(F) GSVA bar plot showing significantly enriched pathways between the two clusters; pathways are displayed on the y-axis and t values on the x-axis, with blue indicating upregulation and green indicating downregulation\u003c/p\u003e","description":"","filename":"Figure7.tif.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8782305/v1/7c89253f79f4e2dba5b06469.jpg"},{"id":101941242,"identity":"ce64c451-a8f1-42a3-bd0e-3de6e0ce991f","added_by":"auto","created_at":"2026-02-05 09:20:11","extension":"jpg","order_by":8,"title":"Figure 
8","display":"","copyAsset":false,"role":"figure","size":843029,"visible":true,"origin":"","legend":"\u003cp\u003eExperimental validation of key genes in NASH.(A, B) Representative HE staining and Oil Red O staining of liver tissues from normal controls and patients with NASH, showing typical pathological features (5× and 20×).(C–E) mRNA expression levels of key genes in liver tissues, showing downregulation of PELI2 and DUSP2 and upregulation of TRIM56 in NASH samples.(F, G) Western blot analysis confirming reduced PELI2 expression in NASH liver tissues and in palmitic acid–induced cellular models compared with controls, consistent with qPCR results.(H) IHC staining demonstrating decreased PELI2 expression in NASH liver tissues\u003c/p\u003e","description":"","filename":"Figure8.tif.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8782305/v1/bff890bb374bc3a0cb736309.jpg"},{"id":102397181,"identity":"c939b61d-41c8-4599-9828-616641c58234","added_by":"auto","created_at":"2026-02-11 10:07:49","extension":"jpg","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":1355103,"visible":true,"origin":"","legend":"\u003cp\u003eFunctional validation of PELI2 in hepatocyte models.(A, B) PELI2 expression in HepG2 cells following transduction with PELI2 overexpression lentivirus, in the presence or absence of PA.(C, D) PELI2 expression in HepG2 cells following transduction with PELI2 knockdown lentivirus, in the presence or absence of PA.(E, F) Oil Red O staining revealed the extent of lipid accumulation in HepG2 cells treated with oePELI2, shPELI2, and PA (5× and 20× )\u003c/p\u003e","description":"","filename":"Figure9.tif.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8782305/v1/2e9444bf5fd3e8104f959a14.jpg"},{"id":102398785,"identity":"1cf554d1-642d-4aef-a612-a3eacd97f9b7","added_by":"auto","created_at":"2026-02-11 
10:28:58","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":15030445,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8782305/v1/4dc758be-a040-4eea-95fe-2b1aa2583b37.pdf"},{"id":101941243,"identity":"6d292330-8935-4fce-ab5a-c283b0cd9aac","added_by":"auto","created_at":"2026-02-05 09:20:11","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":7899010,"visible":true,"origin":"","legend":"","description":"","filename":"WBoriginalimage.docx","url":"https://assets-eu.researchsquare.com/files/rs-8782305/v1/0bcb86c4c571fd3662c74ba6.docx"},{"id":101941231,"identity":"8f790151-9e28-412f-97c9-eb7c342fad9b","added_by":"auto","created_at":"2026-02-05 09:20:10","extension":"xls","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":49087,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryTables.xls","url":"https://assets-eu.researchsquare.com/files/rs-8782305/v1/11fa1a7f7257d8f01ad64e09.xls"},{"id":101941233,"identity":"d8821b65-c447-45f3-9941-1390630a48d9","added_by":"auto","created_at":"2026-02-05 09:20:11","extension":"docx","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":647813,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementaryiamge.docx","url":"https://assets-eu.researchsquare.com/files/rs-8782305/v1/7bcd08265b7e2a220793884c.docx"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003e\u003cstrong\u003eA machine learning and SHAP-based systems analysis of post-translational modification–related genes in nonalcoholic steatohepatitis\u003c/strong\u003e\u003c/p\u003e","fulltext":[{"header":"Introduction","content":"\u003cp\u003eNon-alcoholic steatohepatitis (NASH) represents a progressive stage of non-alcoholic fatty liver 
disease (NAFLD) and is characterized by hepatic lipid accumulation, chronic inflammation, hepatocellular injury, and persistent fibrosis. It has emerged as one of the leading causes of end-stage liver disease and hepatocellular carcinoma worldwide [\u003cspan additionalcitationids=\"CR2\" citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. With the steadily increasing prevalence of NAFLD, approximately one-third of patients progress from simple steatosis to NASH, ultimately developing irreversible liver fibrosis or cirrhosis, thereby imposing a substantial global health burden [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. Although liver biopsy remains the gold standard for the diagnosis of NASH, its invasiveness, sampling variability, and inter-observer inconsistency considerably limit its routine clinical application, resulting in a large proportion of affected individuals remaining undiagnosed [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. Consequently, there is an urgent need to develop non-invasive molecular biomarkers to complement or replace biopsy-based diagnosis and to identify core therapeutic targets with translational potential.\u003c/p\u003e \u003cp\u003eAt the pathological level, lipotoxicity-induced multimodal programmed cell death\u0026mdash;including apoptosis, pyroptosis, necroptosis, autophagy-related cell death, and ferroptosis\u0026mdash;represents one of the key driving forces underlying the transition from NAFLD to NASH [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. 
Concurrently, mitochondrial dysfunction, endoplasmic reticulum stress, and excessive immune cell activation synergistically amplify inflammatory injury and promote fibrotic progression [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. Accumulating evidence indicates that protein post-translational modifications (PTMs) constitute a critical molecular layer regulating these pathological processes [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. By modulating protein stability, conformation, subcellular localization, and interaction networks, PTMs influence lipid metabolic homeostasis, mitochondrial respiratory activity, cell death thresholds, and immune-inflammatory signaling. Indeed, dysregulation of PTMs has emerged as a potential common denominator underlying multi-pathway disturbances in NASH[\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. However, systematic evaluations of how PTMs-related genes contribute to NASH progression, and which of these genes may serve as clinically actionable diagnostic or therapeutic targets, remain limited.\u003c/p\u003e \u003cp\u003eIIn the context of translational hospital-based research, machine learning approaches have become powerful tools for identifying disease-driving genes from high-dimensional transcriptomic data[\u003cspan additionalcitationids=\"CR12\" citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. 
Feature reduction methods based on LASSO enable efficient selection of genes with predictive relevance[\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e], while supervised learning algorithms such as linear discriminant analysis (LDA) can further enhance model classification performance[\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. Moreover, multi-algorithm integration strategies improve the robustness of feature selection and increase model applicability across independent cohorts. Nevertheless, conventional machine learning models alone often provide limited insight into the contribution of individual features to model predictions, leaving challenges in interpretability when applied to complex diseases.\u003c/p\u003e \u003cp\u003eWith the emergence of explainable artificial intelligence, particularly SHapley Additive exPlanations (SHAP) based on cooperative game theory, new approaches have become available to address these challenges [\u003cspan additionalcitationids=\"CR17\" citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]. SHAP enables quantitative assessment of the independent contribution of each gene to model predictions, thereby highlighting PTMs-related genes with the greatest biological relevance in distinguishing NAFLD from NASH and allowing inference of the pathological pathways in which they may be involved. 
This transparent analytical framework facilitates closer integration of data-driven feature selection with mechanistic interpretation, helping to bridge the gap between statistical association and biological explanation.\u003c/p\u003e \u003cp\u003eBased on this background, we developed an analytical framework integrating 134 machine learning algorithms with a SHAP-based global interpretability approach to identify key regulatory factors driving NASH progression from genome-wide PTMs-related genes. Through cross-cohort validation, decomposition of feature contributions, and functional enrichment analyses, we established a non-invasive prediction model capable of accurately identifying patients with NASH. In parallel, our results highlight the involvement of core PTMs-related genes in lipotoxicity-induced cell death, inflammatory activation, mitochondrial dysfunction, and fibrotic progression, providing a basis for improved understanding of NASH molecular pathology and for the exploration of targeted therapeutic strategies.\u003c/p\u003e"},{"header":"Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eData acquisition and preprocessing\u003c/h2\u003e \u003cp\u003eA total of four NASH-related liver tissue datasets (GSE126848, GSE135251, GSE89632, and GSE130970) were retrieved from the Gene Expression Omnibus (GEO) database. All datasets were converted into expression matrices based on their corresponding platform annotation files (\u003cb\u003eSupplementary Table\u0026nbsp;1\u003c/b\u003e). Batch effects across the four datasets were corrected using the ComBat algorithm implemented in the sva package. 
Subsequently, GSE126848 and GSE135251 were merged to form the Merge Cohort as the training set, while GSE89632, GSE130970, and the Meta Cohort (comprising all four liver tissue datasets) were used as validation sets.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eIdentification of differentially expressed genes and PTMs-related genes\u003c/h3\u003e\n\u003cp\u003eGene expression profiles between NASH samples and normal controls were compared in the training cohort. Differentially expressed genes (DEGs) were identified using the thresholds of |log₂FC| \u0026gt; 0.585 and adjusted \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05. The identified DEGs were intersected with a curated PTMs-related gene set, which included genes involved in 20 types of post-translational modifications such as ubiquitination, acetylation, and phosphorylation (\u003cb\u003eSupplementary Table\u0026nbsp;2\u003c/b\u003e), to obtain key PTMs-related genes significantly altered in NASH. Protein\u0026ndash;protein interaction (PPI) analysis of the candidate PTMs-related genes was subsequently performed using the GeneMANIA platform to explore potential functional relationships among these genes [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e].\u003c/p\u003e\n\u003ch3\u003eMachine learning–based identification of key PTMs-related genes and model development\u003c/h3\u003e\n\u003cp\u003eA total of twelve machine learning algorithms were applied, including Random Forest (RF), LASSO, Elastic Net (Enet), Ridge regression, stepwise generalized linear model (stepGLM), support vector machine (SVM), glmBoost, linear discriminant analysis (LDA), partial least squares generalized linear regression (plsRglm), gradient boosting machine (GBM), XGBoost, and Naive Bayes. 
Using a full permutation strategy, 134 combinations of \u0026ldquo;feature selection\u0026thinsp;+\u0026thinsp;model construction\u0026rdquo; were generated, in which the first algorithm was used for variable selection and the second algorithm for predictive model development. Candidate PTMs-related differentially expressed genes (PTMs-DEGs) were iteratively screened in the training cohort to identify the most stable set of key PTMs-related genes. Model performance was evaluated in both the training cohort and independent validation cohorts (GSE89632, GSE130970, and the Meta Cohort) using the area under the receiver operating characteristic curve (ROC-AUC), concordance index (C-index), and confusion matrix metrics (accuracy, specificity, and sensitivity). The model with the best overall performance and highest cross-cohort consistency was selected for subsequent interpretability analyses.\u003c/p\u003e\n\u003ch3\u003eSHAP-based interpretability analysis\u003c/h3\u003e\n\u003cp\u003eSHapley Additive exPlanations (SHAP) was applied to interpret the optimal predictive model. SHAP values were used to quantify the contribution of individual genes to model outputs, assess gene\u0026ndash;gene interactions, evaluate the influence of gene expression thresholds on NASH risk (SHAP dependence plots), and identify gene-specific drivers of model predictions at the individual patient level (force plots) [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. 
This approach enabled an interpretable assessment of the contribution of key PTMs-related genes to NASH prediction.\u003c/p\u003e\n\u003ch3\u003eAnalysis of immune and metabolic characteristics\u003c/h3\u003e\n\u003cp\u003eGene set variation analysis (GSVA) was performed to assess pathway-level perturbations regulated by PTMs-related genes in NASH, including inflammation, lipid metabolism, apoptosis, and fibrosis-related pathways[\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. Immune cell infiltration, encompassing macrophages, T cell subsets, NK cells, and dendritic cells, was evaluated using four complementary algorithms: single-sample gene set enrichment analysis (ssGSEA), CIBERSORT, Microenvironment Cell Populations-counter (MCPcounter), and Estimation of Proportions of Immune and Cancer cells (EPIC)[\u003cspan additionalcitationids=\"CR23 CR24 CR25\" citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]. Spearman correlation analysis was conducted to examine associations between key PTMs-related genes and immune cell abundance, clinical parameters (BMI, NAS score, and total cholesterol), as well as inflammation- and fibrosis-related genes[\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e].\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eNASH subtyping based on key PTMs-related genes\u003c/h2\u003e \u003cp\u003eNon-negative matrix factorization (NMF) was applied to cluster patients with NASH and identify molecular subtypes based on the expression profiles of key PTMs-related genes[\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]. 
Differences between the identified subtypes were subsequently compared in terms of inflammatory activity, immune cell infiltration, lipid metabolism dysregulation, fibrosis severity, and signaling pathway perturbations to evaluate the stratification capacity of key PTMs-related genes.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eExperimental validation\u003c/h3\u003e\n\u003cp\u003eThis study was approved by the Ethics Committee of the First Affiliated Hospital of Anhui Medical University (Approval No. Quick-PJ2024-5-43). Human liver tissue samples were obtained from the Department of General Surgery of the First Affiliated Hospital of Anhui Medical University. Liver tissues were subjected to hematoxylin and eosin (HE) staining and Oil Red O staining (Beyotime, China) for histopathological evaluation.\u003c/p\u003e \u003cp\u003eHepG2 cells (provided by the Key Laboratory of Infectious Diseases of Anhui Province) were cultured in Dulbecco\u0026rsquo;s modified Eagle\u0026rsquo;s medium (DMEM) supplemented with 10% fetal bovine serum. The concentrations of TNF-α, IL-6, and IL-8 in the culture supernatants were determined using commercial ELISA kits (Beyotime, China). Total RNA was extracted from liver tissues using TRIzol reagent (Takara, China), followed by quantitative real-time PCR (qPCR). β-actin served as the internal reference gene, and relative mRNA expression levels were calculated using the 2\u003csup\u003e\u0026minus;ΔΔCt\u003c/sup\u003e method.\u003c/p\u003e \u003cp\u003eProtein expression was assessed by Western blotting using a PELI2 antibody (Proteintech, China). Proteins were separated by SDS\u0026ndash;PAGE, transferred onto membranes, and detected using enhanced chemiluminescence (ECL). 
To investigate the functional role of PELI2, lentiviral vectors for PELI2 overexpression and knockdown (GenePharma Genomics, China) were constructed, and cell transfection was performed according to the manufacturer\u0026rsquo;s instructions.\u003c/p\u003e \u003cp\u003eSteatosis, inflammation, and fibrosis in NASH liver tissues and cell models were evaluated by HE and Oil Red O staining. Immunohistochemistry (IHC) was performed to assess the expression of PELI2 in liver tissues. All experimental results were visualized and quantified using ImageJ software. Primer sequences and lentiviral constructs used in this study are provided in \u003cb\u003eSupplementary Tables\u0026nbsp;3 and 4\u003c/b\u003e, respectively.\u003c/p\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003eStatistical analysis\u003c/h2\u003e \u003cp\u003eAll computational analyses were performed using R software (version 4.3.2) and the Strawberry Perl environment. Statistical analyses were conducted using GraphPad Prism 10. Continuous variables were analyzed using Student\u0026rsquo;s t test or non-parametric tests, as appropriate. 
A two-sided \u003cem\u003eP\u003c/em\u003e value\u0026thinsp;\u0026lt;\u0026thinsp;0.05 was considered statistically significant. * indicates P\u0026thinsp;\u0026lt;\u0026thinsp;0.05, ** indicates P\u0026thinsp;\u0026lt;\u0026thinsp;0.01, *** indicates P\u0026thinsp;\u0026lt;\u0026thinsp;0.001, and ns indicates not significant.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e⸻\u003c/h2\u003e \u003cdiv id=\"Sec12\" class=\"Section3\"\u003e \u003ch2\u003eWorkflow chart\u003c/h2\u003e \u003cp\u003eThe workflow of this study is illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eSample characteristics and differential gene expression analysis\u003c/h2\u003e \u003cp\u003eTo systematically characterize the PTMs-related gene landscape associated with NASH, four public transcriptomic datasets were integrated and subjected to batch effect correction (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA\u0026ndash;D). The distribution of normal controls and NASH samples across cohorts is shown, with the Merge Cohort used as the training set and GSE89632, GSE130970, and the Meta Cohort used as independent validation sets (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eE). Differential expression analysis of the training cohort was performed using the thresholds of |log₂FC| \u0026gt; 0.585 and \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05, identifying 338 downregulated and 350 upregulated genes in NASH (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eF\u0026ndash;H). 
Intersection of these differentially expressed genes with a curated set of 807 PTMs-related genes yielded 13 candidate genes that were both PTMs-annotated and significantly altered in NASH (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eI\u0026ndash;J). Functional enrichment analysis using the GeneMANIA platform revealed that these candidate genes were significantly enriched in pathways related to ubiquitin protein ligase activity, mitophagy, and I-κB kinase/NF-κB signaling (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eK).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eMachine learning\u0026ndash;based prediction model for NASH\u003c/p\u003e \u003cp\u003eTwelve machine learning algorithms were paired, one for feature selection and one for model construction, to generate 134 algorithmic combinations, and NASH prediction models were constructed from the 13 candidate PTMs-DEGs. Among all models, the LASSO\u0026thinsp;+\u0026thinsp;LDA combination demonstrated the best predictive performance, with a mean area under the ROC curve (AUC) of 0.932 across the four cohorts. The AUC values of this model in the Merge Cohort, GSE130970, GSE89632, and Meta Cohort were 0.944, 0.924, 0.950, and 0.911, respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eA). The optimal model ultimately identified 10 key PTMs-related genes, among which PELI2 exhibited the highest relative importance (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eB\u0026ndash;C). 
Across all four cohorts (the training cohort and the three validation cohorts), the machine learning model consistently outperformed prediction models based on any single gene in distinguishing NASH from control samples (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eD\u0026ndash;G).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eSHAP interpretability analysis of key gene contributions\u003c/p\u003e \u003cp\u003eSHAP-based interpretability analysis indicated that model predictions were primarily driven by genes associated with ubiquitination and inflammatory regulation, with PELI2, PELI1, ZNF521, TRIM56, and RNF152 showing the highest contributions (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eA\u0026ndash;B). Single-sample SHAP force plots demonstrated that individual genes exerted directional effects on model predictions. Specifically, PELI2 and RNF152 predominantly contributed negatively, with PELI2 exhibiting the strongest negative contribution (SHAP value\u0026thinsp;=\u0026thinsp;\u0026minus;0.407), whereas TRIM56, GALNT14, and ZNF521 provided positive contributions in a subset of samples (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eC\u0026ndash;D). 
Furthermore, SHAP dependence analysis revealed a pronounced nonlinear relationship between PELI2 expression levels and SHAP values, highlighting its critical influence on model output (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eE).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003ePELI2, DUSP2, and TRIM56 were consistently identified as key genes across multiple models\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eAfter systematic screening across twelve machine learning frameworks, PELI2, DUSP2, and TRIM56 were consistently identified as the most stable signature genes across models (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eA). Protein\u0026ndash;protein interaction network analysis revealed that these three genes were closely connected with IRAK1, TLR5, BCL10, and downstream NF-κB\u0026ndash;related signaling components, suggesting their involvement in ubiquitination\u0026ndash;inflammation regulatory networks (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eB). Single-gene ROC analysis demonstrated that PELI2 achieved AUC values of 0.929, 0.849, 0.913, and 0.864 in the four independent cohorts, respectively, whereas the multi-gene prediction model exhibited superior predictive performance across all cohorts (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eC\u0026ndash;F). In addition, expression analyses across multiple cohorts consistently showed that PELI2 was significantly downregulated in NASH samples (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eG\u0026ndash;J). 
Finally, inter-gene correlation analysis indicated that PELI2, DUSP2, and TRIM56 were strongly associated with key lipid metabolism regulators and fibrosis-related markers (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eK).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003ePathway characteristics and immune microenvironment alterations associated with key genes\u003c/p\u003e \u003cp\u003eGSVA enrichment analysis demonstrated that the high-expression groups of these key genes were predominantly enriched in pathways related to cellular stress responses, metabolic regulation, and inflammatory signaling. In contrast, the low-expression groups were more strongly associated with canonical NASH-related pathological processes, including lipid metabolism dysregulation, chemokine signaling, and immune cell recruitment (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eA\u0026ndash;C). Integrated immune infiltration analyses using multiple algorithms further revealed a pronounced remodeling of the immune microenvironment, with several immune cell populations\u0026mdash;such as M1/M2 macrophages, dendritic cells, and T-cell subsets\u0026mdash;showing significant alterations that were highly correlated with the expression levels of the key genes (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eD). 
Bubble plot analysis additionally indicated significant associations between the three key genes and the infiltration of core immune cell populations in NASH, including monocytes, inflammatory macrophages, and CD4⁺ T-cell subsets (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eE\u0026ndash;F).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eKey gene\u0026ndash;based NMF molecular subtyping analysis\u003c/h2\u003e \u003cp\u003eTo further characterize the molecular heterogeneity of NASH, a non-negative matrix factorization (NMF) clustering model was constructed based on the three key genes, which robustly identified two distinct molecular subtypes (C1 and C2) among NASH samples (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eA\u0026ndash;B). Clinical feature comparison revealed that patients in the C1 subtype exhibited significantly higher NAFLD activity scores (NAS) than those in the C2 subtype (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eC). Further analysis showed that PELI2 expression was markedly downregulated in the C1 subtype (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eD). Integrated immune infiltration analyses using multiple algorithms demonstrated that the C1 subtype was enriched with various immune effector cells, including macrophages, monocytes, dendritic cells, and activated T-cell subsets, whereas the C2 subtype displayed relatively lower immune infiltration levels (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eE). GSVA further indicated significant differences between the two subtypes in both immune-related and metabolism-related pathway activities (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eF). 
Consistently, analyses based on four immune infiltration algorithms showed that multiple immune cell populations, including T cells, macrophages, NK cells, and dendritic cells, were significantly more abundant in the C1 subtype compared with the C2 subtype (\u003cb\u003eSupplementary Fig.\u0026nbsp;1\u003c/b\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003ePELI2 expression is significantly downregulated in NASH\u003c/h2\u003e \u003cp\u003eHE staining and Oil Red O staining confirmed that the collected liver specimens exhibited typical histological features of NASH (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003eA\u0026ndash;B). At the mRNA level, the expression of PELI2 and DUSP2 was significantly decreased in liver tissues from patients with NASH, whereas TRIM56 expression was significantly increased compared with controls (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003eC\u0026ndash;E, \u003cb\u003eSupplementary Table\u0026nbsp;5\u003c/b\u003e). Subsequently, further validation was performed focusing on the most critical gene, PELI2. At the protein level, Western blot analysis demonstrated a marked reduction in PELI2 protein expression in NASH liver tissues (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003eF). Consistently, immunohistochemical staining further confirmed significantly decreased PELI2 expression in liver tissues from patients with NASH (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003eG).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eEffects of PELI2 on NASH\u003c/p\u003e \u003cp\u003eTo further clarify the functional role of PELI2 in NASH, HepG2 cells were transduced with lentiviral vectors to achieve PELI2 overexpression or knockdown. 
Compared with control cells, palmitic acid (PA) treatment markedly reduced PELI2 expression, whereas enforced expression of PELI2 partially reversed the PA-induced downregulation; in contrast, PELI2 silencing further exacerbated this reduction (Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003eA\u0026ndash;D). Functional assays demonstrated that PA treatment significantly increased intracellular lipid accumulation in HepG2 cells, while PELI2 overexpression markedly alleviated lipid deposition. Conversely, cells with PELI2 knockdown exhibited more pronounced lipid accumulation (Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003eE\u0026ndash;F). In parallel, PA exposure significantly enhanced the secretion of TNF-α, IL-6, and IL-8, whereas PELI2 overexpression substantially suppressed the release of these pro-inflammatory cytokines; by contrast, PELI2 knockdown further augmented their secretion (\u003cb\u003eSupplementary Fig.\u0026nbsp;2\u003c/b\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eNonalcoholic steatohepatitis (NASH) is a metabolic liver disease characterized by hepatic lipid accumulation, chronic inflammation, and immune dysregulation. Its progression involves complex interactions between metabolic pathways and inflammatory signaling processes[\u003cspan additionalcitationids=\"CR30\" citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e]. 
Previous studies have suggested that protein post-translational modifications (PTMs) play important roles in regulating lipid metabolism and inflammatory responses; however, the global expression patterns and potential roles of PTMs-related genes in NASH remain incompletely understood[\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. In this study, we integrated multiple transcriptomic datasets and applied machine learning approaches to systematically investigate PTMs-related genes associated with NASH.\u003c/p\u003e \u003cp\u003eIntegrated analysis across multiple cohorts showed that PTMs-related genes were mainly enriched in immune signaling pathways, inflammatory regulation, ubiquitin-mediated protein degradation, and mitochondrial-associated functions. These results suggest that, during NASH development, abnormalities in lipid metabolism and immune-inflammatory responses may interact through PTMs-related regulatory mechanisms, thereby contributing to disease progression.\u003c/p\u003e \u003cp\u003eUsing machine learning analysis, we constructed several predictive models and evaluated their stability in independent cohorts. Among these models, the combination of least absolute shrinkage and selection operator (LASSO) and linear discriminant analysis (LDA) demonstrated relatively favorable diagnostic performance. Model interpretability analysis indicated that PELI2 contributed substantially to model prediction. 
Further examination revealed that the association between PELI2 expression and model output was not strictly linear, suggesting that dysregulation of PELI2 may reflect more complex regulatory alterations occurring during NASH progression.\u003c/p\u003e \u003cp\u003eThrough cross-model validation, PELI2, DUSP2, and TRIM56 were consistently identified as NASH-associated candidate genes. Protein\u0026ndash;protein interaction network analysis showed that these genes were functionally linked to immune-related signaling molecules such as IRAK1, TLR5, and BCL10, indicating their potential involvement in inflammation-related signaling pathways. Among these candidates, PELI2 exhibited relatively stable performance across different cohorts, suggesting its potential relevance in NASH-related pathological processes.\u003c/p\u003e \u003cp\u003eAnalysis of the hepatic immune microenvironment showed that the expression levels of PELI2, DUSP2, and TRIM56 were significantly correlated with the infiltration of macrophages, T cells, and dendritic cells. In addition, non-negative matrix factorization (NMF)\u0026ndash;based molecular subtyping revealed immune heterogeneity among NASH samples, with one subtype characterized by higher immune cell infiltration and more severe disease activity. These findings suggest that alterations in PTMs-related gene expression may be associated with differences in immune status in the NASH liver.\u003c/p\u003e \u003cp\u003eFunctional experiments further supported a potential role for PELI2 in NASH. We observed that PELI2 expression was reduced at both the mRNA and protein levels in NASH liver tissues. In hepatocyte models, overexpression of PELI2 partially attenuated palmitic acid\u0026ndash;induced lipid accumulation and reduced the release of pro-inflammatory cytokines, whereas PELI2 knockdown exacerbated lipid deposition and inflammatory responses. 
These results suggest that PELI2 may be involved in regulating lipid-induced inflammatory responses in hepatocytes.\u003c/p\u003e \u003cp\u003eSeveral limitations of this study should be noted. First, the analyses were primarily based on transcriptomic data, and PTMs were not directly assessed at the protein modification level. Second, in vitro hepatocyte models cannot fully recapitulate the complexity of cellular composition and immune interactions in the NASH liver. Future studies incorporating proteomic analyses, animal models, and larger clinical cohorts will be necessary to further validate these findings.\u003c/p\u003e \u003cp\u003eIn summary, this study identified PELI2, DUSP2, and TRIM56 as PTMs-related genes associated with NASH through integrated multi-cohort analysis and experimental validation. Among these genes, PELI2 may be involved in the regulation of lipid accumulation and inflammatory responses. These findings provide a basis for further investigation into the role of PTMs-related mechanisms in NASH.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eIn this study, analysis of multi-cohort transcriptomic data using machine learning approaches highlighted PELI2, DUSP2, and TRIM56 as genes associated with nonalcoholic steatohepatitis. Among these features, PELI2 showed consistent downregulation in NASH liver tissues and demonstrated stable performance across independent cohorts. Experimental observations further indicated that modulation of PELI2 expression was associated with changes in palmitic acid\u0026ndash;induced lipid accumulation and inflammatory cytokine secretion, including TNF-α, IL-6, and IL-8. 
Taken together, these results illustrate how machine learning\u0026ndash;based feature analysis can contribute to the characterization of molecular patterns in NASH and support the relevance of PELI2 as a gene of interest within a systems-level framework.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll research involving human data was conducted in accordance with the Declaration of Helsinki. All ethical issues related to this study were appropriately addressed. The study protocol was reviewed and approved by the Ethics Committee of the First Affiliated Hospital of Anhui Medical University (Approval No. Quick-PJ2024-05-43; Supplementary Figure 3). Written informed consent to participate was obtained from all participants prior to their inclusion in the study.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability of data and materials\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe datasets supporting the conclusions of this article are available in the GEO database (https://www.ncbi.nlm.nih.gov/geo/).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work was supported by the Anhui Provincial Department of Education (Grant No. 2023AH053301) and the Postgraduate Innovation Research and Practice Program of Anhui Medical University (Grant No. 
YJS20250021).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026apos; contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eChao Yu:\u0026nbsp;Writing \u0026ndash; original draft, Investigation, Conceptualization.\u0026nbsp;Ronglin Xu and Jingjing Bai:\u0026nbsp;Methodology, Data curation.\u0026nbsp;Pengcheng Jia:\u0026nbsp;Visualization, Validation.\u0026nbsp;Jie Chen:\u0026nbsp;Writing \u0026ndash; original draft, Visualization, Validation, Methodology.\u0026nbsp;Ruomu Ge and Zhen Zhang:\u0026nbsp;Writing \u0026ndash; original draft, Validation, Supervision, Investigation, Conceptualization.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors would like to thank the Clinical Research Ethics Committee of the First Affiliated Hospital of Anhui Medical University for ethical approval of this study. We also acknowledge the public databases and resources used in this work, including the Gene Expression Omnibus (GEO), GeneMANIA, GWAS summary statistics, and the FinnGen database, for providing access to high-quality data essential for this research.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eClinical Trial Registration\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eClinical trial number: not applicable.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eQuek J, Chan KE, Wong ZY, Tan C, Tan B, Lim WH, Tan DJH, Tang ASP, Tay P, Xiao J et al (2023) Global prevalence of non-alcoholic fatty liver disease and non-alcoholic steatohepatitis in the overweight and obese population: a systematic review and meta-analysis. 
Lancet Gastroenterol Hepatol 8:20\u0026ndash;30\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLv JJ, Zhang YC, Li XY, Guo H, Yang CH (2024) The burden of non-alcoholic fatty liver disease among working-age people in the Western Pacific Region, 1990\u0026ndash;2019: an age-period-cohort analysis of the Global Burden of Disease study. BMC Public Health 24:1852\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePaternostro R, Trauner M (2022) Current treatment of non-alcoholic fatty liver disease. J Intern Med 292:190\u0026ndash;204\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuby T, Gautier EL (2022) Immune cell-mediated features of non-alcoholic steatohepatitis. Nat Rev Immunol 22:429\u0026ndash;443\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePouwels S, Sakran N, Graham Y, Leal A, Pintar T, Yang W, Kassir R, Singhal R, Mahawar K, Ramnarain D (2022) Non-alcoholic fatty liver disease (NAFLD): a review of pathophysiology, clinical management and effects of weight loss. BMC Endocr Disord 22:63\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLeow WQ, Chan AW, Mendoza PGL, Lo R, Yap K, Kim H (2023) Non-alcoholic fatty liver disease: the pathologist's perspective. Clin Mol Hepatol 29:S302\u0026ndash;s318\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGrander C, Grabherr F, Tilg H (2023) Non-alcoholic fatty liver disease: pathophysiological concepts and treatment options. Cardiovasc Res 119:1787\u0026ndash;1798\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFromenty B, Roden M (2023) Mitochondrial alterations in fatty liver diseases. J Hepatol 78:415\u0026ndash;429\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFlessa CM, Kyrou I, Nasiri-Ansari N, Kaltsas G, Papavassiliou AG, Kassi E, Randeva HS (2021) Endoplasmic Reticulum Stress and Autophagy in the Pathogenesis of Non-alcoholic Fatty Liver Disease (NAFLD): Current Evidence and Perspectives. 
Curr Obes Rep 10:134\u0026ndash;161\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMin Y, Zhang Y, Ji Y, Liu S, Guan C, Wei L, Yu H, Zhang Z (2025) Post-translational modifications in the pathophysiological process of metabolic dysfunction\u0026ndash;associated steatotic liver disease. Cell Biosci 15:79\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eReel PS, Reel S, Pearson E, Trucco E, Jefferson E (2021) Using machine learning approaches for multi-omics data analysis: A review. Biotechnol Adv 49:107739\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGomes MAS, Kovaleski JL, Pagani RN, da Silva VL (2022) Machine learning applied to healthcare: a conceptual review. J Med Eng Technol 46:608\u0026ndash;616\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAsnicar F, Thomas AM, Passerini A, Waldron L, Segata N (2024) Machine learning for microbiologists. Nat Rev Microbiol 22:191\u0026ndash;205\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHu A (2023) Heterogeneous treatment effects analysis for social scientists: A review. Soc Sci Res 109:102810\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYe Z, Li Z, Zhong S, Xing Q, Li K, Sheng W, Shi X, Bao Y (2024) The recent two decades of traumatic brain injury: a bibliometric analysis and systematic review. Int J Surg 110:3745\u0026ndash;3759\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAli S, Akhlaq F, Imran AS, Kastrati Z, Daudpota SM, Moosa M (2023) The enlightening role of explainable artificial intelligence in medical \u0026amp; healthcare domains: A systematic literature review. Comput Biol Med 166:107555\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAllgaier J, Mulansky L, Draelos RL, Pryss R (2023) How does the model make predictions? A systematic literature review on the explainability power of machine learning in healthcare. 
Artif Intell Med 143:102616\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJin Y, Kattan MW (2023) Methodologic Issues Specific to Prediction Model Development and Evaluation. Chest 164:1281\u0026ndash;1289\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFranz M, Rodriguez H, Lopes C, Zuberi K, Montojo J, Bader GD, Morris Q (2018) GeneMANIA update 2018. Nucleic Acids Res 46:W60\u0026ndash;w64\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhou Y, Zhou B, Pache L, Chang M, Khodabakhshi AH, Tanaseichuk O, Benner C, Chanda SK (2019) Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat Commun 10:1523\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu J, Li Y, Ma J, Wan X, Zhao M, Zhang Y, Shang D (2023) Identification and immunological characterization of lipid metabolism-related molecular clusters in nonalcoholic fatty liver disease. Lipids Health Dis 22:124\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhou J, Huang J, Li Z, Song Q, Yang Z, Wang L, Meng Q (2023) Identification of aging-related biomarkers and immune infiltration characteristics in osteoarthritis based on bioinformatics analysis and machine learning. Front Immunol 14:1168780\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang Z, Hu D, Pei G, Zeng R, Yao Y (2023) Identification of driver genes in lupus nephritis based on comprehensive bioinformatics and machine learning. Front Immunol 14:1288699\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhu D, Zhang X, Fang Y, Xu Z, Yu Y, Zhang L, Yang Y, Li S, Wang Y, Jiang C, Huang D (2024) Identification of a lactylation-related gene signature as the novel biomarkers for early diagnosis of acute myocardial infarction. 
Int J Biol Macromol 282:137431\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUbago-Guisado E, Rodr\u0026iacute;guez-Barranco M, Ching-L\u0026oacute;pez A, Petrova D, Molina-Montes E, Amiano P, Barricarte-Gurrea A, Chirlaque MD, Agudo A, S\u0026aacute;nchez MJ (2021) Evidence Update on the Relationship between Diet and the Most Common Cancers from the European Prospective Investigation into Cancer and Nutrition (EPIC) Study: A Systematic Review. \u003cem\u003eNutrients\u003c/em\u003e 13\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAran D (2020) Cell-Type Enrichment Analysis of Bulk Transcriptomes Using xCell. Methods Mol Biol 2120:263\u0026ndash;276\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCollings TJ, Bourne MN, Barrett RS, Meinders E, Gon\u0026ccedil;alves B, Shield AJ, Diamond LE (2025) Reconsidering Exercise Selection with EMG: Poor Agreement between Ranking Hip Exercises with Gluteal EMG and Muscle Force. Med Sci Sports Exerc 57:1829\u0026ndash;1837\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiefeld T, Huang E, Wenzel AT, Yoshimoto K, Sharma AK, Sicklick JK, Mesirov JP, Reich M (2023) NMF Clustering: Accessible NMF-based Clustering Utilizing GPU Acceleration. J Bioinform Syst Biol 6:379\u0026ndash;383\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSheka AC, Adeyi O, Thompson J, Hameed B, Crawford PA, Ikramuddin S (2020) Nonalcoholic Steatohepatitis: A Review. JAMA 323:1175\u0026ndash;1183\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSchuster S, Cabrera D, Arrese M, Feldstein AE (2018) Triggering and resolution of inflammation in NASH. Nat Rev Gastroenterol Hepatol 15:349\u0026ndash;364\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePark JS, Ma H, Roh YS (2021) Ubiquitin pathways regulate the pathogenesis of chronic liver disease. 
Biochem Pharmacol 193:114764\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIoannou GN (2016) The Role of Cholesterol in the Pathogenesis of NASH. Trends Endocrinol Metab 27:84\u0026ndash;95\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMilić S, Lulić D, Štimac D (2014) Non-alcoholic fatty liver disease and obesity: biochemical, metabolic and clinical presentations. World J Gastroenterol 20:9330\u0026ndash;9337\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"First Affiliated Hospital of Anhui Medical University","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"NASH, PTMs, Machine learning, SHAP, Systems analysis","lastPublishedDoi":"10.21203/rs.3.rs-8782305/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8782305/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eNonalcoholic steatohepatitis (NASH) is a multifactorial metabolic liver disease characterized by marked clinical and molecular heterogeneity, which complicates disease characterization and patient stratification. 
Post-translational modification (PTM)\u0026ndash;related genes are known to participate in metabolic and inflammatory regulation; however, their system-level relevance in NASH remains insufficiently defined. In this study, transcriptomic data from liver tissues of patients with NASH were analyzed across multiple independent cohorts. Machine learning models incorporating SHapley Additive exPlanations (SHAP) were applied to evaluate PTM-related gene patterns associated with disease status, with model performance assessed through cross-validation and external datasets.\u003c/p\u003e \u003cp\u003eAmong the evaluated approaches, the combination of least absolute shrinkage and selection operator (LASSO) and linear discriminant analysis (LDA) showed the most consistent performance. This analysis repeatedly highlighted three PTM-related genes\u0026mdash;PELI2, DUSP2, and TRIM56\u0026mdash;that were associated with NASH across cohorts. Expression of these genes was related to inflammatory gene programs, lipid metabolism\u0026ndash;associated pathways, fibrosis-related markers, and variations in immune cell infiltration. 
Stratification based on their expression profiles further delineated molecular subgroups of NASH with distinct immune and metabolic characteristics.\u003c/p\u003e \u003cp\u003eOverall, this study provides a system-oriented characterization of PTM-related gene alterations in NASH and illustrates the utility of integrative analytical approaches for exploring molecular heterogeneity in complex metabolic diseases.\u003c/p\u003e","manuscriptTitle":"A machine learning and SHAP-based systems analysis of post-translational modification–related genes in nonalcoholic steatohepatitis","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-02-05 09:20:06","doi":"10.21203/rs.3.rs-8782305/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"acc4a05c-00bc-40a6-8095-b180159c6095","owner":[],"postedDate":"February 5th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":62285510,"name":"General Cell Biology \u0026 Physiology"}],"tags":[],"updatedAt":"2026-02-05T09:20:06+00:00","versionOfRecord":[],"versionCreatedAt":"2026-02-05 
09:20:06","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8782305","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8782305","identity":"rs-8782305","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

License: CC-BY-4.0