Interpretable Mortality Prediction Model for ICU Patients with Pneumonia: Using Shapley Additive Explanation Method

Jiaxi Li; Yu Zhang; Shenyang He; Yan Tang

doi:10.21203/rs.3.rs-3757487/v1

2023 · doi:10.21203/rs.3.rs-3757487/v1

preprint OA: closed

⚙ AI-generated summary by claude@2026-06+body, 2026-06-08 ⓘ

This study developed an interpretable XGBoost model to predict pneumonia mortality in ICUs, identifying Aspartate Aminotransferase as the most crucial predictor.

One-sentence paraphrase of the abstract; not a substitute for reading it. No clinical advice.

⚙ AI-generated deep summary by claude@2026-06, 2026-06-19 · read from full text ⓘ

This retrospective cohort study used electronic health records from the eICU-CRD (2014–2015) to develop an interpretable model predicting in-hospital mortality among 10,962 adult ICU patients with pneumonia, using data from the first 24 hours of ICU admission and 73 EHR-derived variables. An XGBoost model was trained on 70% of the data and validated on 30%, with performance evaluated by AUC; the model's prognostic factors were interpreted using Shapley Additive Explanation (SHAP). The XGBoost model achieved an AUC of 0.732 ± 0.0065, which the authors report as an improvement of 8 percentage points over traditional scoring systems and other machine-learning approaches, and SHAP highlighted AST as the most important predictor, followed by age, albumin, and BMI. The authors note that the retrospective EHR design carries biases related to data quality, completeness, and confounding, and that performance may vary across healthcare settings and patient populations.

Read from the paper's body, not the abstract. Not a substitute for reading the paper. No clinical advice.

Research Article

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-3757487/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 13 Sep, 2024 in BMC Pulmonary Medicine. Version 1 is the latest preprint version.

Abstract

Background
Pneumonia, a leading cause of morbidity and mortality worldwide, often necessitates Intensive Care Unit (ICU) admission. Accurate prediction of pneumonia mortality is crucial for tailored prevention and treatment plans. However, existing mortality prediction models face limited adoption in clinical practice due to their lack of interpretability.
Objective
This study aimed to develop an interpretable model for predicting pneumonia mortality in ICUs. Using the Shapley Additive Explanation (SHAP) method, we sought to explain the Extreme Gradient Boosting (XGBoost) model and identify prognostic factors for pneumonia.

Methods
In this retrospective cohort study, we used electronic health records from the eICU-CRD (2014–2015) for all adult pneumonia patients. Records from the first 24 hours of each ICU admission were considered, with 70% of the dataset allocated for model training and 30% for validation. The XGBoost model was employed, and performance was assessed using the area under the receiver operating characteristic curve (AUC). The SHAP method provided insights into the XGBoost model.

Results
Among 10,962 pneumonia patients, in-hospital mortality was 16.33%. The XGBoost model demonstrated superior predictive performance (AUC: 0.732 ± 0.0065) compared to traditional scoring systems and other machine learning methods, an improvement of 8 percentage points. SHAP analysis identified Aspartate Aminotransferase (AST) as the most crucial predictor.

Conclusions
Interpretable predictive models enhance mortality risk assessment for pneumonia patients in the ICU, fostering transparency. AST emerged as the foremost predictor, followed by patient age, albumin, BMI, and others. These insights, rooted in strong correlations with mortality, facilitate improved clinical decision-making and resource allocation.

Keywords: Pneumonia, Interpretable model, machine learning, ICU

Introduction
Pneumonia is a common reason for patients to be admitted to the Intensive Care Unit (ICU) [1]. The probability of pneumonia patients experiencing in-hospital mortality is around 17% [2].
Making a complete and accurate prediction and evaluation of in-hospital mortality risk from Electronic Health Record (EHR) data is an urgent task in medicine, particularly for patients admitted to the ICU. However, in real-life situations, it is often not feasible to make a quick and effective diagnosis for pneumonia patients, given factors such as the complexity of the patient's condition [3]. It is therefore meaningful to establish an accurate and efficient predictive model to assist doctors' decision-making.

Traditional methods for assessing mortality risk, such as the APACHE score [4] and the SOFA score [5], have their respective advantages and limitations in early diagnosis. These methods rely heavily on doctors' close attention and long-term clinical experience. In recent years, artificial intelligence has been widely used to explore warning signs and predictive factors for various diseases [6, 7, 8]. Machine learning algorithms are effective at capturing nonlinear relationships, and a growing body of research advocates new predictive models based on machine learning to support clinical treatment of patients [9].

The purpose of this study is to develop an interpretable model based on the free and open eICU Collaborative Research Database (eICU-CRD) [10] to predict the risk of mortality in ICU pneumonia patients. Additionally, the Shapley Additive Explanation (SHAP) [11] algorithm is used to explore the prognostic factors of pneumonia by explaining the XGBoost (extreme gradient boosting) [12] predictive model.

Methods

Data Source
The eICU Collaborative Research Database (version 2.0) is a publicly available multicenter database [10] containing deidentified data from over 200,000 ICU admissions across 208 hospitals in the United States from 2014–2015. This database provides detailed demographic information, vital sign measurements, diagnostic information, laboratory test data, and treatment information for all patients.
Ethical Considerations
The dataset used in this study is publicly available, and all protected health information has been de-identified. Therefore, there is no need for ethical approval or individual patient consent.

Subject
This study is based on the publicly available eICU dataset. In addition to using ICD-9-CM codes (481–487) to filter patients, we also used the diagnostic keyword 'Pneumonia' as a screening criterion to ensure comprehensive and accurate patient selection [13]. Patients were included if they were aged 18 years or older, and excluded if they had a hospital stay of less than 1 day or were missing gender information. Patients with severe immune suppression, such as human immunodeficiency virus (HIV) infection or a neutrophil count of less than (×10⁹/L) [14], were also excluded. Patients were divided into two groups, the survival group and the expired group, based on their discharge outcome.

Basic patient information, including age, gender, BMI, APACHE score, and length of ICU stay, was collected during the ICU stay for baseline analysis. The first laboratory test information, vital signs at admission, and other parameters recorded after patients entered the ICU were extracted using the open-source PostgreSQL management and development tool pgAdmin [15]. The processed data (73 variables) serve as input features for the machine learning model. To avoid overfitting, the "Feature Importance" in the XGBoost algorithm [16] was used to evaluate the importance of each feature in the model and perform feature selection accordingly. Based on this, the data was divided into a training set and a test set in a 7:3 ratio.

Missing Data Handling
Missing data are common in eICU-CRD, and ignoring them in the analysis may lead to biased results. Therefore, we adopt a random forest regression imputation method [17] to predict missing values by utilizing the correlations between multiple variables.
This approach reduces the impact of missing values on the model as much as possible.

Dealing with Imbalanced Datasets
ADASYN (Adaptive Synthetic Sampling) [18] is an algorithm used to address class imbalance in classification problems. In imbalanced datasets, there is a large difference in the number of samples between classes, which can result in poor classification performance for minority classes. ADASYN balances the dataset by increasing the number of minority-class samples, thus improving the model's performance. Considering the imbalance between the survival group and the expired group and its impact on the accuracy of the prediction results, we applied ADASYN to oversample the training set, while the test set kept the original sample ratio.

Interpretability of Machine Learning
The prediction model is explained through SHAP, a unified method that calculates the contribution and impact of each feature on the final prediction. The SHAP value shows the degree to which each predictive variable contributes to the target variable, and whether that contribution is positive or negative. Furthermore, each observation in the dataset can be explained by its own set of SHAP values [19].

Numerical Analysis and Machine Learning Models
The statistical analysis and calculations in this study were performed using R software [20] and Python version 3.8.0 [21]. Categorical variables are reported as counts and percentages, and intergroup differences are compared using the chi-square test. Continuous variables are reported as mean and standard deviation, and differences between two groups are compared using the Wilcoxon rank-sum test [22].
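The ADASYN oversampling described in this section can be illustrated with a minimal pure-Python sketch. It follows the general scheme of He et al. [18] (weight each minority sample by the share of majority-class points among its k nearest neighbours, then interpolate toward a nearby minority sample), but it is not the authors' implementation: the function name `adasyn`, the toy points, and the choices `k=3` and `beta=1.0` are assumptions made for illustration only.

```python
import math
import random

def adasyn(X, y, k=5, beta=1.0, seed=0):
    """Return synthetic minority-class (label 1) points, ADASYN-style sketch."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    n_major = len(X) - len(minority)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    G = (n_major - len(minority)) * beta  # number of synthetic samples wanted

    def nearest(x, pool):
        # pool holds (point, label) pairs; x itself is skipped by identity
        others = [item for item in pool if item[0] is not x]
        return sorted(others, key=lambda item: math.dist(x, item[0]))[:k]

    everyone = list(zip(X, y))
    minority_pool = [(x, 1) for x in minority]
    # r_i: share of majority-class points among the k nearest neighbours of x_i
    ratios = [sum(label == 0 for _, label in nearest(x, everyone)) / k
              for x in minority]
    total = sum(ratios)
    if total == 0:
        return []  # no minority point has a majority-class neighbour
    synthetic = []
    for x, r in zip(minority, ratios):
        partners = [p for p, _ in nearest(x, minority_pool)]
        for _ in range(round(r / total * G)):  # harder points get more samples
            z = rng.choice(partners)
            lam = rng.random()
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, z)))
    return synthetic

# Hypothetical 2-D toy data: 8 majority (0) and 3 minority (1) points.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8), (0.8, 0.2),
     (0.3, 0.3), (2, 2), (2.2, 2.1), (1.2, 1.2)]
y = [0] * 8 + [1] * 3
synthetic = adasyn(X, y, k=3)
print(len(synthetic), "synthetic minority points")
```

Because each per-sample count is rounded, the number of generated points can differ slightly from the class gap, which is one plausible reason a resampled training set ends up close to, but not exactly, balanced.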
Four machine learning models (XGBoost, logistic regression [23], random forest [24], and support vector machine [25]) were used in this study to develop the prediction model. The predictive performance of each model was evaluated using the area under the receiver operating characteristic curve. In addition, considering the recent popularity of neural network models, we also evaluated a multi-layer perceptron (MLP) model, aiming to enhance classification performance [26]. We calculated accuracy, sensitivity, positive predictive value, negative predictive value, and F1 score for all test results.

Results
The eICU-CRD contained 16,146 patients with pneumonia. After applying the inclusion criteria, 10,962 adult patients were eligible for the study. The patient screening process is depicted in Fig. 1. Table 1 presents a comparison of baseline information between the survival group and the non-survival group; the mortality rate of severe pneumonia patients was 16.33% (1790/10962). Patients in the non-survival group were older and had higher APACHE scores with greater variability than those in the survival group, and their ICU length of stay was also longer (P < 0.001).
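The chi-square comparison used for categorical baseline variables can be checked against the gender counts reported in Table 1. The sketch below is an illustration rather than the authors' script: it assumes a 2×2 test with Yates' continuity correction (the default for 2×2 tables in R's `chisq.test`), which the paper does not state explicitly, and the function name `chi2_2x2_yates` is hypothetical.

```python
import math

def chi2_2x2_yates(a, b, c, d):
    """Chi-square test (1 df) with Yates' correction for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = 0.0
    for obs, row, col in ((a, row1, col1), (b, row1, col2),
                          (c, row2, col1), (d, row2, col2)):
        expected = row * col / n
        # Yates: shrink each |observed - expected| by 0.5, not below zero
        chi2 += max(0.0, abs(obs - expected) - 0.5) ** 2 / expected
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Gender counts from Table 1: rows are gender 0 and 1,
# columns are survival and non-survival.
chi2, p_value = chi2_2x2_yates(4343, 799, 4829, 991)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

Under these assumptions the p-value comes out close to the 0.0377 reported for gender, which suggests the table's P column for categorical variables is consistent with a continuity-corrected chi-square test.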
Table 1 Baseline characteristics of the survival and non-survival groups

Variable                        Level  Survival (0), N = 9172  Non-survival (1), N = 1790  P
Age (Mean (SD))                 /      65.697 (16.007)         71.446 (13.684)             < 0.001
Gender (%)                      0      4343 (47.35)            799 (44.64)                 0.0377
                                1      4829 (52.65)            991 (55.36)
Underlying disease (%)
  CKD                           0      7927 (86.43)            1488 (83.13)                0.0003
                                1      1245 (13.57)            302 (16.87)
  Diabetes                      0      8897 (97.00)            1745 (97.49)                0.299
                                1      275 (3.00)              45 (2.51)
  COPD                          0      7208 (78.59)            1459 (81.51)                < 0.001
                                1      1964 (21.41)            331 (18.49)
  Sepsis                        0      6758 (73.68)            1227 (68.55)                < 0.001
                                1      2414 (26.32)            563 (31.45)
  Cerebrovascular               0      9042 (98.58)            1739 (97.15)                < 0.001
                                1      130 (1.42)              51 (2.85)
  Cardiac                       0      5124 (55.87)            812 (45.36)                 < 0.001
                                1      4048 (44.13)            978 (54.64)
BMI (Mean (SD))                 /      27.41 (5.82)            26.43 (5.782)               < 0.001
APACHE score                    /      62.517 (23.930)         82.737 (30.688)             < 0.001
Length of ICU stay (days)       /      5.084 (5.667)           6.869 (7.720)               < 0.001

The dataset was divided randomly into two parts, with 70% (n = 7673) of the data used for model training and 30% (n = 3289) used for model validation. For the training data, we used the XGBoost model to compute Gini importance for feature selection. The Gini importance values were computed by evaluating the average reduction in Gini impurity across all split nodes associated with each feature. The 37 features selected in this way showed pronounced discriminatory and predictive capability within the decision tree framework, thereby enhancing the overall performance of the model.

To address the sample imbalance, ADASYN oversampling was applied to the training data, resulting in a revised distribution of training samples. Originally, there were 6433 survival patients and 1240 non-survival patients in the training set. After resampling, the ratio of survival patients to non-survival patients was 6433:6170. Five models, namely XGBoost, LR, RF, SVM, and MLP, were established using the training dataset.
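The AUC used to evaluate the models has a simple rank interpretation: it is the probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor. A minimal sketch of that definition follows; the labels and scores are made-up toy values, not study data, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 1 = died in hospital, 0 = survived (hypothetical values).
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.2, 0.6]

auc = roc_auc(y_true, scores)
# Any strictly increasing transform of the scores leaves the ranking,
# and therefore the AUC, unchanged.
auc_transformed = roc_auc(y_true, [s ** 3 for s in scores])
print(auc, auc_transformed)
```

Because only the ordering of scores matters, the metric does not depend on a particular classification threshold, unlike accuracy or F1.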
The models were then tested on the respective testing datasets, and AUC, accuracy, mean average precision (mAP), and F1 score were calculated to facilitate comparison between the models. The AUC was selected as the primary evaluation metric because it is robust to class imbalance, insensitive to classification thresholds, allows comparison of performance across different classifiers, and intuitively presents the model's performance at various thresholds through the ROC curve (as shown in Fig. 2). Among the models, XGBoost achieved the highest predictive performance (AUC = 0.732 ± 0.0065). This is a notable improvement over the traditional APACHE score (AUC = 0.677). The use of complex neural network models did not improve the accuracy of the model; instead, there was a decline in various aspects. We conducted cross-validation using k-fold (k = 5) [27] to obtain the average performance of each model (Table 2).

Table 2 The classification results of different models.

Model          AUC             Accuracy        mAP             F1
XGBoost        0.732 (0.0065)  0.759 (0.0071)  0.257 (0.0050)  0.355 (0.0021)
Random forest  0.677 (0.0063)  0.692 (0.0076)  0.288 (0.0083)  0.322 (0.0305)
LR             0.643 (0.0507)  0.576 (0.0638)  0.533 (0.0109)  0.411 (0.0825)
SVM            0.702 (0.0068)  0.642 (0.0115)  0.494 (0.0587)  0.481 (0.0352)
MLP            0.683 (0.0128)  0.696 (0.0181)  0.405 (0.0564)  0.389 (0.0390)

To assess the generalization of the model, we additionally conducted external validation using 549 randomly selected cases from MIMIC (Medical Information Mart for Intensive Care), selected by the same criteria. The confusion matrix (Fig. 3) shows that the XGBoost-based model performed best, with an accuracy of 78.14%.

To determine the importance of each predictive variable for the XGBoost model, the SHAP algorithm was used. A variable importance plot was generated, listing the variables in descending order of importance.
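The idea behind a SHAP importance ranking can be sketched with exact Shapley values on a tiny model. The example below is not the TreeSHAP algorithm used for XGBoost and not the authors' code: it brute-forces every feature coalition against a single baseline patient, and the risk function `toy_log_odds`, its coefficients, the baseline, and the three patients are all hypothetical values chosen for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the patient's value, the rest the baseline's
        return predict([x[i] if i in subset else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = set(coalition)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_log_odds(z):
    # Hypothetical risk score in log-odds with an age x AST interaction
    age, ast, albumin = z
    return (0.03 * (age - 65) + 0.01 * (ast - 30) - 0.8 * (albumin - 3.5)
            + 0.0002 * (age - 65) * (ast - 30))

features = ["age", "AST", "albumin"]
baseline = [65, 30, 3.5]
patients = [[80, 120, 2.8], [50, 25, 4.0], [72, 45, 3.1]]

all_phi = [shapley_values(toy_log_odds, p, baseline) for p in patients]
# Global importance: mean absolute Shapley value per feature, sorted descending
importance = [sum(abs(phi[i]) for phi in all_phi) / len(all_phi)
              for i in range(len(features))]
ranking = sorted(features, key=lambda f: importance[features.index(f)], reverse=True)
print(ranking)
```

For each patient the contributions add up to the difference between that patient's score and the baseline score, and the sign of each contribution shows whether the feature pushes the prediction toward or away from the adverse outcome; averaging absolute contributions across patients gives the kind of descending importance list described above.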
Among the conventional indicators, AST, which reflects the patient's metabolic condition, was the strongest predictor for all prediction periods. Average age, albumin, and BMI were also identified as strong predictive features. Figure 4 provides a visualization of these findings.

To investigate the positive or negative relationship between predictive factors and the target outcome, SHAP values were used to identify risk factors for death. Figure 5 illustrates the results: the horizontal position indicates whether a value pushes the prediction higher or lower, and the color indicates whether the variable is high (red) or low (blue) for that observation. The findings suggest that an increase in average age has a positive effect on the prediction results and is associated with a higher risk of death, while an increase in AST has a negative effect on the prediction results and is associated with higher mortality, and a lower BMI means a better survival rate.

To confirm the model's accuracy for individuals, we chose a single patient for visual validation of the prediction model (all index values in the figure have been normalized). The patient had developed sepsis during the ICU stay, was older, and had other negative predictors of clinical outcome (indicated in red on the graph). The final predicted score was greater than 0 (shown in Fig. 6), which forecast the outcome as expired, in line with the actual result (the patient unit stay ID was 1589468 in the eICU database).

Discussion
Pneumonia has consistently been one of the leading causes of morbidity and mortality worldwide. The mortality rate of pneumonia is closely associated with age, incidence, and the severity of the disease at the time of admission [28]. In this study, the focus was on developing an interpretable model for predicting mortality risk in ICU patients with pneumonia.
The XGBoost model showed the best predictive performance, with an AUC of 0.732 ± 0.0065, surpassing traditional scoring systems and other machine learning methods. External validation using MIMIC data further supported the model's classification ability. The SHAP method played a pivotal role in explaining the predictive factors influencing pneumonia mortality. Notably, AST emerged as the foremost predictor, shedding light on its role in prognosis. The study's findings emphasize the value of interpretable predictive models in helping physicians gauge the risk of mortality in ICU patients with pneumonia.

Methods for assessing the severity of pneumonia, such as the Pneumonia Severity Index (PSI) [29] and the CURB-65 score [30], and clinical scoring systems, such as the Sequential Organ Failure Assessment (SOFA) score [5] and the Acute Physiology and Chronic Health Evaluation (APACHE-II) score [4], can assist clinical practitioners in evaluating patients' overall risk, predicting disease progression, and formulating treatment plans. However, many scoring systems rely on simplified criteria and parameters and may not fully reflect an individual patient's condition. Their applicability may be limited, especially for specific age groups or patients with particular medical conditions. Because learning algorithms are well suited to capturing nonlinear characteristics, machine learning predictive models are increasingly applied to explore early warning signs and analyze risk factors in various diseases [31, 32], with the aim of supporting personalized treatment for patients.

In this retrospective cohort study of a large-scale public ICU database, we developed and validated five machine learning algorithms to predict the mortality of patients with pneumonia.
Compared to the traditional APACHE scoring system (AUC 0.677), Random Forest had an AUC of 0.677, Support Vector Machine (SVM) an AUC of 0.702, and Multi-Layer Perceptron (MLP) an AUC of 0.683. In our study, the XGBoost model predicted pneumonia mortality better than the others, with an AUC of 0.732. To validate the efficacy of the model, we conducted external validation using data from MIMIC, where XGBoost again achieved the best classification performance. This also indicates that in clinical scenarios, simple machine learning models may be easier to train and less susceptible to overfitting. In our study, the risk analysis for pneumonia patients is a simple binary classification task. Given the tabular nature of the data, it is relatively straightforward and requires no complex feature extraction or pattern recognition. The XGBoost model proved competent and demonstrated better classification performance, which is consistent with the relevant literature [33].

Moreover, within the medical domain, there is frequently a heightened demand for interpretability in decision-making. Physicians and patients seek to understand the rationale behind the model's predictions. Combining XGBoost with the SHAP algorithm makes the model's decision-making process more accessible, in contrast to complex neural network models, which are commonly perceived as black boxes. Using the SHAP method to explain the XGBoost model preserved both its performance and its clinical interpretability. This will help doctors better understand the model's decision-making process and promote the use of predictive results.
In our study, we combined the basic clinical information, ventilator parameters, and laboratory characteristics of patients, which provided important vital sign parameters and information related to the severity of pneumonia. We found that age and AST were the most significant variables associated with in-hospital mortality among pneumonia patients. The mortality rate for severe pneumonia patients in the intensive care unit (ICU) was as high as 16.33%, while for patients in the ward, an increase in age was accompanied by an increase in mortality rate, with the average age of the deceased higher than that of the survivors. According to a study by Venceslau Pinto Hespanhol et al., the hospital mortality rate for pneumonia patients over 80 years of age rose sharply after the age of 60, reaching 38.5% after the age of 90 [34]. Previous research on severe pneumonia has found that age (> 65 years), severe leukopenia or leukocytosis, and bacteremia are risk factors for death [35]. The risk of death from pneumonia often increases with patient age and comorbidities, possibly due to systemic inflammation and decreased immunity [36].

Apart from age among the patients' basic information, AST provides the most important information for predicting patient mortality. Although AST is well known as a marker of liver dysfunction, we found that serum AST concentration is closely related to the mortality rate of severe pneumonia. Enveloped viruses such as SARS-CoV and HCoV-NL63 enter host cells through direct membrane fusion between the host cell surface receptor (ACE2) and the virus (spike protein), leading to the release of the viral ssRNA genome into host cells. After the virus enters, ACE2 is down-regulated, leading to excessive ACE/Ang II activity, increased pulmonary vascular permeability, and subsequent lung injury [37–39]. However, the ACE2 receptor is also widely expressed in bile ducts and hepatic epithelial cells [40].
A recent study on the association between markers of liver injury and in-hospital mortality in coronavirus disease 2019 reported that, in comparison to other indicators of liver injury, AST abnormalities in particular were associated with death during hospitalization [41]. This suggests that measuring serum AST concentration can serve as a valuable tool for predicting patient outcomes and assessing liver injury in cases of severe pneumonia. Therefore, AST provides critical information that healthcare providers should consider when monitoring patients with severe pneumonia.

To date, none of the severity-of-illness assessment tools commonly used in critical care settings require the evaluation of serum albumin levels. Nevertheless, the relationship between hypoalbuminemia and the incidence, mortality, and length of hospital stay of patients in the ICU has been established [42]. Critically ill patients often exhibit acute-phase reactions, leading to a decrease in albumin levels due to changes in distribution. The synthesis or breakdown of albumin may also be altered [43]. Previous studies found that decreased serum albumin levels may serve as an independent predictor of pneumonia in patients with acute ischemic stroke (AIS), especially in cases of mild stroke [44]. Additionally, the risk of pneumonia may be inversely correlated with the albumin level.

Multiple studies have also identified an overall association between disease severity and obesity, as well as other metabolic risk factors, including diabetes and hypertension [45–47]. Viral pneumonia often necessitates invasive mechanical ventilation (IMV), imposing a significant strain on global intensive care resources.
In a study conducted by Chetboun et al., the need for IMV increased progressively with body mass index (BMI) among viral pneumonia patients admitted to intensive care units (ICU) [48]. Similarly, recent experience during the viral pneumonia pandemic suggests that the mortality rate for patients requiring invasive mechanical ventilation falls within the range of 35–50% [49]. Hraiech et al. noted a survival advantage in patients with Community-Acquired Pneumonia (CAP) who required mechanical ventilation within 72 hours of the onset of CAP compared to those who required mechanical ventilation 4 or more days after onset (28% vs. 51%, p = 0.03) [50]. Therefore, any delay in identifying severe illness, in recognizing patients at risk of mechanical ventilation or in need of ICU-level care, or in providing timely treatment may have detrimental effects on the prognosis of severe CAP patients [51].

Conclusion
In summary, this study developed an interpretable XGBoost model for predicting pneumonia mortality in ICUs, offering improved performance compared to existing models. The SHAP method facilitated a deeper understanding of the model's decision-making process. These findings contribute to advancing individualized care strategies for pneumonia patients in intensive care settings.

However, several limitations should be considered in interpreting the results of this study. First, the retrospective nature of the study using electronic health records introduces inherent biases and limitations related to data quality, completeness, and potential confounding variables. Addressing these issues requires ongoing efforts to improve data collection processes and to incorporate real-world data. Moreover, the model's performance, while promising, may vary across different healthcare settings and patient populations.
External validation using datasets from diverse sources and geographical locations is imperative to assess the generalizability of the model and ensure its applicability in varied clinical scenarios. In future research, incorporating real-world data and continuous model updates could enable the development of a dynamic prediction tool that responds to evolving patient conditions and treatment protocols. Collaboration with clinicians is crucial for refining the model and ensuring its integration into clinical workflows [52–54].

Declarations
Ethics approval and consent to participate: The data used in this research were acquired from the publicly available eICU and MIMIC databases. The data do not include personal patient information and thus do not necessitate ethical approval or individual patient consent.
Consent for publication: All the authors' names are included in the title page. The order of authors is accurate and has been agreed by all authors. All authors have reviewed the final version of the manuscript and approve it for publication.
Availability of data and materials: The data for this study are sourced from a publicly available dataset. Materials used in experiments are available upon request. Contact the corresponding author for further details.
Competing interests: Not applicable
Funding: Not applicable
Author Contributions Statement: J.L and Y.Z contributed equally. Y.T conceptualized the study. S.H carried out the collection. J.L and Y.Z analyzed the literature and data and drafted the manuscript.

References
1. Nair GB, Niederman MS. Updates on community acquired pneumonia management in the ICU. Pharmacol Ther. 2021;217:107663. doi:10.1016/j.pharmthera.2020.107663. Epub 2020 Aug 15. PMID: 32805298; PMCID: PMC7428725.
2. Ramirez JA, Wiemken TL, et al. Adults Hospitalized With Pneumonia in the United States: Incidence, Epidemiology, and Mortality. Clin Infect Dis. 2017;65(11):1806–12. https://doi.org/10.1093/cid/cix647.
3. Koenig SM, Truwit JD. Ventilator-associated pneumonia: diagnosis, treatment, and prevention. Clin Microbiol Rev. 2006;19(4):637–57.
4. Knaus WA, Draper EA, Wagner DP, et al. APACHE II: a severity of disease classification system. Crit Care Med. 1985;13(10):818–29.
5. Lambden S, Laterre PF, Levy MM, et al. The SOFA score—development, utility and challenges of accurate assessment in clinical trials. Crit Care. 2019;23(1):1–9.
6. Liu C, Wang X, Liu C, et al. Differentiating novel coronavirus pneumonia from general pneumonia based on machine learning. Biomed Eng Online. 2020;19(1):1–14.
7. Kim SY, Diggans J, Pankratz D, et al. Classification of usual interstitial pneumonia in patients with interstitial lung disease: assessment of a machine learning approach using high-dimensional transcriptional data. Lancet Respir Med. 2015;3(6):473–82.
8. Kuo KM, Talley PC, Huang CH, et al. Predicting hospital-acquired pneumonia among schizophrenic patients: a machine learning approach. BMC Med Inf Decis Mak. 2019;19(1):1–8.
9. Johnson AEW, Ghassemi MM, Nemati S, et al. Machine learning and decision support in critical care. Proceedings of the IEEE. 2016;104(2):444–66.
10. Pollard TJ, Johnson AEW, Raffa JD, et al. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci Data. 2018;5(1):1–13.
11. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30.
12. Chen T, He T, Benesty M, et al. Xgboost: extreme gradient boosting. R package version 0.4-2. 2015;1(4):1–4.
13. van de Garde EMW, Oosterheert JJ, Bonten M, et al. International classification of diseases codes showed modest sensitivity for detecting community-acquired pneumonia. J Clin Epidemiol. 2007;60(8):834–8.
14. Van Schyndel SJ, Carrier J, Bogado Pascottini O, et al. The effect of pegbovigrastim on circulating neutrophil count in dairy cattle: A randomized controlled trial. PLoS ONE. 2018;13(6):e0198701.
15. Robinson C. Basic introduction into pgAdmin III and SQL queries. 2011.
16. Ramraj S, Uzir N, Sunil R, et al. Experimenting XGBoost algorithm for prediction and classification of different datasets. Int J Control Theory Appl. 2016;9(40):651–62.
17. Tang F, Ishwaran H. Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Sci J. 2017;10(6):363–77.
18. He H, Bai Y, Garcia EA, et al. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). IEEE; 2008:1322–8.
19. Lundberg SM, Erion G, Chen H, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell. 2020;2(1):56–67.
20. Ihaka R, Gentleman R. R: a language for data analysis and graphics. J Comput Graphical Stat. 1996;5(3):299–314.
21. Python W. Python. Python Releases Wind. 2021;24.
22. Cuzick J. A Wilcoxon-type test for trend. Stat Med. 1985;4(1):87–90.
23. LaValley MP. Logistic regression. Circulation. 2008;117(18):2395–9.
24. Rigatti SJ. Random forest. J Insur Med. 2017;47(1):31–9.
25. Suthaharan S. Support vector machine. In: Machine learning models and algorithms for big data classification: thinking with examples for effective learning. 2016:207–35.
26. Taud H, Mas JF. Multilayer perceptron (MLP). In: Geomatic approaches for modeling land change scenarios. 2018:451–5.
27. Rodriguez JD, Perez A, Lozano JA. Sensitivity analysis of k-fold cross validation in prediction error estimation. IEEE Trans Pattern Anal Mach Intell. 2009;32(3):569–75.
28. Ito A, Ishida T, Tokumasu H, et al. Prognostic factors in hospitalized community-acquired pneumonia: a retrospective study of a prospective observational cohort. BMC Pulm Med. 2017;17(1):78. Published 2017 May 2. doi:10.1186/s12890-017-0424-4.
29. Wang D, Willis DR, Yih Y. The pneumonia severity index: Assessment and comparison to popular machine learning classifiers. Int J Med Inform. 2022;163:104778. doi:10.1016/j.ijmedinf.2022.104778.
30. Patel S. Calculated decisions: CURB-65 score for pneumonia severity. Emerg Med Pract. 2021;23(Suppl 2):CD1–CD2. Published 2021 Feb 1.
31. Kang MW, Kim J, Kim DK, et al. Machine learning algorithm to predict mortality in patients undergoing continuous renal replacement therapy. Crit Care. 2020;24(1):42. doi:10.1186/s13054-020-2752-7. Published 2020 Feb 6.
32. Liu C, Liu X, Mao Z. Interpretable Machine Learning Model for Early Prediction of Mortality in ICU Patients with Rhabdomyolysis. Med Sci Sports Exerc.
33. Grinsztajn L, Oyallon E, Varoquaux G. Why do tree-based models still outperform deep learning on typical tabular data? Adv Neural Inf Process Syst. 2022;35:507–20.
34. Hespanhol V, Bárbara C. Pneumonia mortality, comorbidities matter? Pulmonology. 2020 May-Jun;26(3):123–129. doi:10.1016/j.pulmoe.2019.10.003. Epub 2019 Nov 29. PMID: 31787563.
35. Metersky ML, Waterer G, Nsa W, et al. Predictors of in-hospital vs postdischarge mortality in pneumonia. Chest. 2012;142(2):476–481. doi:10.1378/chest.11-2393. PMID: 22383662.
36. Cillóniz C, Polverino E, Ewig S, et al. Impact of age and comorbidity on cause and outcome in community-acquired pneumonia. Chest. 2013;144(3):999–1007.
37. Malik YA. Properties of Coronavirus and SARS-CoV-2. Malays J Pathol. 2020;42(1):3–11. PMID: 32342926.
38. Suryamohan K, Diwanji D, Stawiski EW, et al. Human ACE2 receptor polymorphisms and altered susceptibility to SARS-CoV-2. Commun Biol. 2021;4(1):475. doi:10.1038/s42003-021-02030-3. PMID: 33846513; PMCID: PMC8041869.
39. Bakhshandeh B, Sorboni SG, Javanmard AR, et al. Variants in ACE2; potential influences on virus infection and COVID-19 severity.
Infect Genet Evol. 2021;90:104773. 10.1016/j.meegid.2021.104773 . Epub 2021 Feb 17. PMID: 33607284; PMCID: PMC7886638. Chai X, Hu L, Zhang Y et al. Specific ACE2 expression in cholangiocytes may cause liver damage after 2019-nCoV infection. bioRxiv 2020:2020.02.03.931766. Lei F, Liu YM, Zhou F et al. Longitudinal association between markers of liver injury and mortality in COVID-19 in China. Hepatology 2020:1.0.1002/hep.31301. Jellinge ME, Henriksen DP, Hallas P et al. Hypoalbuminemia is a strong predictor of 30-day all-cause mortality in acutely admitted medical patients: a prospective, observational, cohort study. PLoS One. 2014;9(8): e105983. Published 2014 Aug 22. 10.1371/journal.pone.0105983 . Nicholson JP, Wolmarans MR, Park GR. The role of albumin in critical illness. Br J Anaesth. 2000;85(4):599–610. 10.1093/bja/85.4.599 . Yang X, Wang L, Zheng L, et al. Serum Albumin as a Potential Predictor of Pneumonia after an Acute Ischemic Stroke. Curr Neurovasc Res. 2020;17(4):385–93. 10.2174/1567202617666200514120641 . [113] Williamson EJ, Walker AJ, Bhaskaran K, et al. Factors associated with COVID-19-related death using OpenSAFELY. Nature. 2020;584(7821):430–6. 10.1038/s41586-020-2521-4 . Truog RD, Mitchell C, Daley GQ. The Toughest Triage - Allocating Ventilators in a Pandemic. N Engl J Med. 2020;382(21):1973–5. 10.1056/NEJMp2005689 . High Prevalence of Obesity in Severe Acute Respiratory Syndrome. Coronavirus-2 (SARS-CoV-2) Requiring Invasive Mechanical Ventilation. Obes (Silver Spring). 2020;28(10):1994. 10.1002/oby.23006 . Chetboun M, Raverdy V, Labreuche J, et al. BMI and pneumonia outcomes in critically ill covid-19 patients: An international multicenter study. Obes (Silver Spring). 2021;29(9):1477–86. 10.1002/oby.23223 . Richardson S, Hirsch JS, Narasimhan M et al. Presenting Characteristics, Comorbidities, and Outcomes Among 5700 Patients Hospitalized With COVID-19 in the New York City Area [published correction appears in JAMA. 2020;323(20):2098]. JAMA. 
2020;323(20):2052–2059. 10.1001/jama.2020.6775 . Hraiech S, Alingrin J, Dizier S et al. Time to intubation is associated with outcome in patients with community-acquired pneumonia. PLoS One. 2013;8(9): e74937. Published 2013 Sep 19. 10.1371/journal.pone.0074937 . Restrepo MI, Mortensen EM, Rello J, Brody J, Anzueto A. Late admission to the ICU in patients with community-acquired pneumonia is associated with higher mortality. Chest. 2010;137(3):552–7. 10.1378/chest.09-1547 . Luo C, et al. A machine learning-based risk stratification tool for in-hospital mortality of intensive care unit patients with heart failure. J Translational Med. 2022;20(1):136. Huang T, et al. Machine learning for prediction of in-hospital mortality in lung cancer patients admitted to intensive care unit. PLoS ONE. 2023;18(1):e0280606. Wang B, et al. Novel pneumonia score based on a machine learning model for predicting mortality in pneumonia patients on admission to the intensive care unit. Respir Med. 2023;217:107363. Additional Declarations No competing interests reported. Cite Share Download PDF Status: Published Journal Publication published 13 Sep, 2024 Read the published version in BMC Pulmonary Medicine → Version 1 posted Editorial decision: Revision requested 23 Jul, 2024 Reviews received at journal 19 Jul, 2024 Reviewers agreed at journal 12 Jul, 2024 Reviewers agreed at journal 10 Jul, 2024 Reviews received at journal 17 Apr, 2024 Reviews received at journal 08 Apr, 2024 Reviewers agreed at journal 03 Apr, 2024 Reviewers agreed at journal 16 Mar, 2024 Reviewers invited by journal 15 Dec, 2023 Editor assigned by journal 15 Dec, 2023 Editor invited by journal 15 Dec, 2023 Submission checks completed at journal 15 Dec, 2023 First submitted to journal 15 Dec, 2023 You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. 
Author affiliations: jiaxi li, shenyang he, and yan tang (corresponding author), Jinniu Maternity and Child Health Hospital of Chengdu; yu zhang, Sichuan University.
Published version: https://doi.org/10.1186/s12890-024-03252-x
Figure 1: Patient inclusion and exclusion process
Figure 2: Receiver Operating Characteristic Curve of different models
Figure 3: External validation: The confusion matrix of different models
Figure 4: The weights of variables importance calculated by SHAP method
Figure 5: SHAP values to show the distribution of the impacts each feature has on the model
Figure 6: Visualization of individual classification decision process
\u003cp\u003e27.41(5.82)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e26.43(5.782)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAPACHE score\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e/\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e62.517 (23.930)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e82.737 (30.688)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLength of ICU stay (days)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e/\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e5.084 (5.667)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e6.869 (7.720)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe dataset was divided randomly into two parts, with 70% (n\u0026thinsp;=\u0026thinsp;7673) of the data used for model training and 30% (n\u0026thinsp;=\u0026thinsp;3289) used for model validation. For the training data, we employed the XGBoost model to compute Gini importance for feature selection. The Gini importance values were computed by evaluating the average reduction in Gini impurity across all split nodes associated with each feature. 
The selection of these 37 features reflects their pronounced discriminatory and predictive capabilities within the decision tree framework, thereby enhancing the overall performance of the model. To address the class imbalance, ADASYN oversampling was applied to the training data, resulting in a revised distribution of training samples. Originally, there were 6433 survivors and 1240 non-survivors in the training set. After resampling, the ratio of survivors to non-survivors was 6433:6170.\u003c/p\u003e \u003cp\u003eFive models, namely XGBoost, LR, RF, SVM and MLP, were established using the training dataset. The models were then evaluated on the respective testing datasets, and AUC, accuracy, mean average precision (mAP), and F1 score were calculated to facilitate comparison between the models. The AUC was selected as the primary evaluation metric because it is robust to class imbalance, insensitive to classification thresholds, allows for comparison of performance across different classifiers, and intuitively presents the model's performance at various thresholds through the ROC curve (as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). Among the models, XGBoost achieved the highest predictive performance (AUC\u0026thinsp;=\u0026thinsp;0.732\u0026thinsp;\u0026plusmn;\u0026thinsp;0.0065). This is a notable improvement over the traditional APACHE score (AUC\u0026thinsp;=\u0026thinsp;0.677). 
The use of a more complex neural network model (MLP) did not improve accuracy; instead, both its AUC and its accuracy were lower than those of XGBoost.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eWe conducted multiple cross-validation tests using k-fold (k\u0026thinsp;=\u0026thinsp;5) [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e] to obtain the average performance results of the models (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eThe classification results of different models.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAUC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003emAP\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eF1\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e 
\u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eXGBoost\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.732(0.0065)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.759(0.0071)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.257(0.0050)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.355(0.0021)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRandom forest\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.677(0.0063)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.692(0.0076)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.288(0.0083)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.322(0.0305)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.643(0.0507)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.576(0.0638)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.533(0.0109)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.411(0.0825)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.702(0.0068)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c3\"\u003e \u003cp\u003e0.642(0.0115)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.494(0.0587)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.481(0.0352)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMLP\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.683(0.0128)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.696(0.0181)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.405(0.0564)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.389(0.0390)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eTo assess the generalizability of the model, we additionally conducted external validation using 549 cases randomly selected from MIMIC (Medical Information Mart for Intensive Care) according to the same criteria. From the confusion matrix (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e), it can be observed that the XGBoost-based model performed the best, with an accuracy of 78.14%.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eTo determine the importance of each predictive variable for the XGBoost model, the SHAP algorithm was utilized. A variable importance plot was generated, listing the variables in descending order of importance. Among the conventional indicators, AST, which reflects the patient's metabolic condition, was the strongest predictor for all prediction periods. Age, albumin, and BMI were also identified as strong predictive features. 
Figure\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e provides a visualization of these findings.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eTo investigate the positive or negative relationship between predictive factors and the target outcome, SHAP values were utilized to identify death risk factors. Figure\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e illustrates the results, with the horizontal position indicating whether the value is associated with a higher or lower prediction, and the color indicating whether the variable is high (red) or low (blue) for that observation. The findings suggest that an increase in age has a positive effect on the prediction results and is associated with a higher risk of death, that an increase in AST adversely affects the predicted outcome and is associated with higher mortality, and that a lower BMI is associated with a better survival rate.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eTo confirm the model's diagnostic accuracy in individuals, we chose a single patient for visual validation of the prediction model (all index values in the figure have been normalized). The patient had developed sepsis during the ICU stay, was older, and had other negative predictors for clinical outcomes (indicated in red on the graph). The final predicted score was greater than 0 (shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e), which predicted death and was in line with the actual outcome (the patient unit stay ID was 1589468 in the eICU database).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003ePneumonia has consistently been one of the leading causes of morbidity and mortality worldwide. 
The mortality rate of pneumonia is closely associated with age, incidence, and the severity of the disease at the time of admission [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]. This study focused on developing an interpretable model for predicting mortality risk in ICU patients with pneumonia. The XGBoost model achieved the best predictive performance, with an AUC of 0.732\u0026thinsp;\u0026plusmn;\u0026thinsp;0.0065, surpassing the traditional scoring system and the other machine learning methods. External validation using MIMIC data further supported the model's classification performance. The SHAP method was used to elucidate the predictive factors influencing pneumonia mortality. Notably, AST emerged as the foremost predictor, highlighting its role in prognosis. The study's findings emphasize the significance of interpretable predictive models in enhancing physicians' ability to accurately gauge the risk of mortality in ICU patients with pneumonia.\u003c/p\u003e \u003cp\u003eMethods for assessing the severity of pneumonia, such as the Pneumonia Severity Index (PSI) [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e] and the CURB-65 score [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e], and clinical scoring systems, such as the Sequential Organ Failure Assessment (SOFA) score [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e] and the Acute Physiology and Chronic Health Evaluation (APACHE-II) score [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e], can assist clinicians in evaluating patients' overall risk, predicting disease progression, and formulating treatment plans. However, many scoring systems rely on simplified criteria and parameters and may not comprehensively reflect an individual patient's condition. 
The applicability of these scoring systems might be constrained, especially for specific age groups or patients with particular medical conditions. Because learning algorithms can capture strong nonlinear relationships, machine learning predictive models are increasingly applied to explore early warning signs and analyze risk factors in various diseases [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e], with the aim of supporting personalized treatment for patients.\u003c/p\u003e \u003cp\u003eIn this retrospective cohort study of a large-scale public ICU database, we developed and validated five machine learning algorithms to predict the mortality of patients with pneumonia.\u003c/p\u003e \u003cp\u003eThe traditional APACHE scoring system achieved an AUC of 0.677. Among the machine learning algorithms, Random Forest had an AUC of 0.677, Support Vector Machine (SVM) had an AUC of 0.702, and Multi-Layer Perceptron (MLP) had an AUC of 0.683. In our study, the XGBoost model predicted pneumonia mortality better than the others, with an AUC of 0.732. To validate the efficacy of the model, we conducted external validation using data from MIMIC, where XGBoost again achieved the best classification performance. This also indicates that in clinical scenarios, simple machine learning models may be easier to train and less susceptible to overfitting. In our study, the risk analysis for pneumonia patients is a simple binary classification task. Because the data are tabular, the task is relatively straightforward and requires no complex feature extraction or pattern recognition. The XGBoost model has proven competent and demonstrated better classification performance. 
This is consistent with the relevant literature [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. Moreover, within the medical domain, there is frequently a heightened demand for interpretability in decision-making. Physicians and patients seek to comprehend the rationale behind the model's predictions. The integration of XGBoost with the SHAP algorithm facilitates a more accessible interpretation of the model's decision-making process, in contrast to the often-opaque nature of complex neural network models, which are commonly perceived as black-box models.\u003c/p\u003e \u003cp\u003eUsing the SHAP method to explain the XGBoost model ensured both its performance and clinical interpretability. This will help doctors better understand the model's decision-making process and promote the use of predictive results. In our study, we combined the basic clinical information, ventilator parameters, and laboratory characteristics of patients, which provided important vital sign parameters and information related to the severity of pneumonia. We found that age and AST were the most significant variables associated with in-hospital mortality among pneumonia patients. The mortality rate for severe pneumonia patients in the intensive care unit (ICU) was as high as 16.33%, while for patients in the ward, an increase in age was accompanied by an increase in mortality rate, with the average age of the deceased higher than that of the survivors. According to a study by Venceslau Pinto Hespanhol et al., the hospital mortality rate for pneumonia patients over 80 years of age rose sharply after the age of 60, reaching 38.5% after the age of 90 [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e]. 
Previous research on severe pneumonia has found that age (\u0026gt;\u0026thinsp;65 years), severe leukopenia or leukocytosis, and bacteremia are risk factors for death [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e]. The risk of death from pneumonia often increases with patient age and comorbidities, possibly due to systemic inflammation and decreased immunity [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e]. Apart from age in the patients' basic information, AST provides the most important information for predicting patient mortality. Although AST is well known as a marker of liver dysfunction, we found in our study that serum AST concentration is closely related to the mortality rate of severe pneumonia. Enveloped viruses such as SARS-CoV and HCoV-NL63 enter host cells through direct membrane fusion between the host cell surface receptor (ACE2) and the virus (spike protein), leading to the release of the viral ssRNA genome into host cells. After the virus enters, ACE2 is down-regulated, leading to excessive ACE/Ang II activity, increased pulmonary vascular permeability, and subsequent lung injury [\u003cspan additionalcitationids=\"CR38\" citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]. However, the ACE2 receptor is also widely expressed in bile ducts and hepatic epithelial cells [\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e]. A recent study of the association between markers of liver injury and in-hospital mortality in coronavirus disease 2019 reported that, compared with other indicators of liver injury, AST abnormalities in particular were associated with death [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e]. 
This suggests that measuring serum AST concentration can serve as a valuable tool for predicting patient outcomes and assessing liver injury in cases of severe pneumonia. Therefore, AST provides critical information that healthcare providers should consider when monitoring patients with severe pneumonia.\u003c/p\u003e \u003cp\u003eTo date, none of the severity-of-illness assessment tools commonly used in critical care settings requires the evaluation of serum albumin levels. Nevertheless, the relationship between hypoalbuminemia and the incidence, mortality, and length of hospital stay of patients in the ICU has been established [\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e]. Critically ill patients often exhibit acute-phase reactions, leading to a decrease in albumin levels due to changes in distribution. Concurrently, the synthesis and catabolism of albumin may also be altered [\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e]. Previous studies found that a decreased serum albumin level may serve as an independent predictor of pneumonia in patients with acute ischemic stroke (AIS), especially in cases of mild stroke [\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e]. Additionally, the risk of pneumonia may exhibit an inverse correlation with the albumin level. Multiple studies have also identified an overall association between disease severity and obesity, as well as other metabolic risk factors, including diabetes and hypertension [\u003cspan additionalcitationids=\"CR46\" citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e]. Viral pneumonia often necessitates invasive mechanical ventilation (IMV), imposing a significant strain on global intensive care resources. 
Chetboun et al. observed that the need for IMV increased progressively with body mass index (BMI) among viral pneumonia patients admitted to intensive care units (ICUs) [\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e]. Similarly, recent experiences during the viral pneumonia pandemic suggest that the mortality rate for patients requiring invasive mechanical ventilation falls within the range of 35\u0026ndash;50% [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e]. Hraiech et al. noted a survival advantage in patients who required mechanical ventilation within 72 hours of the onset of community-acquired pneumonia (CAP) compared to those who required mechanical ventilation 4 or more days after the onset of CAP (28% vs. 51%, p\u0026thinsp;=\u0026thinsp;0.03) [\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e]. Therefore, any delay in identifying severe illness, in recognizing patients at risk of requiring mechanical ventilation or ICU-level care, or in providing timely treatment may have detrimental effects on the prognosis of severe CAP patients [\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e].\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eIn summary, this study developed an interpretable XGBoost model for predicting pneumonia mortality in ICUs, offering improved performance compared to existing models. The SHAP method facilitated a deeper understanding of the model's decision-making process. These findings contribute to advancing individualized care strategies for pneumonia patients in intensive care settings. However, several limitations should be considered in interpreting the results of this study. 
First, the retrospective nature of the study using electronic health records introduces inherent biases and limitations related to data quality, completeness, and potential confounding variables. Addressing these issues requires ongoing efforts to improve data collection processes and to incorporate real-world data. Moreover, the model's performance, while promising, may be subject to variations across different healthcare settings and patient populations. External validation using datasets from diverse sources and geographical locations is imperative to assess the generalizability of the model and ensure its applicability in varied clinical scenarios. In future research, incorporating real-world data and continuous model updates could enable the development of a dynamic prediction tool, responsive to evolving patient conditions and treatment protocols. Collaboration with clinicians is crucial for refining the model and ensuring its seamless integration into clinical workflows [\u003cspan additionalcitationids=\"CR53\" citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e].\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate:\u0026nbsp;\u003c/strong\u003eThe data utilized in this research were acquired from the publicly available eICU and MIMIC databases. Importantly, the data do not include personal patient information and thus do not necessitate ethical approval or individual patient consent.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication:\u0026nbsp;\u003c/strong\u003eAll the authors\u0026rsquo; names are included in the title page. The order of authors is accurate and has been agreed by all authors. 
All authors have reviewed the final version of the manuscript and approve it for publication.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability of data and materials:\u0026nbsp;\u003c/strong\u003eThe data for this study are sourced from a publicly available dataset. Materials used in experiments are available upon request. Contact the corresponding author for further details.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests:\u0026nbsp;\u003c/strong\u003eNot applicable\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding:\u003c/strong\u003e Not applicable\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions Statement:\u0026nbsp;\u003c/strong\u003eJ.L. and Y.Z. contributed equally. Y.T. conceptualized the study. S.H. carried out the data collection. J.L. and Y.Z. analyzed the literature and data and drafted the manuscript.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eNair GB, Niederman MS. Updates on community acquired pneumonia management in the ICU. Pharmacol Ther. 2021;217:107663. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.pharmthera.2020.107663\u003c/span\u003e\u003cspan address=\"10.1016/j.pharmthera.2020.107663\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Epub 2020 Aug 15. PMID: 32805298; PMCID: PMC7428725.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRamirez JA, Wiemken TL, et al. Adults Hospitalized With Pneumonia in the United States: Incidence, Epidemiology, and Mortality. Clin Infect Dis. 2017;65(11):1806\u0026ndash;12. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/cid/cix647\u003c/span\u003e\u003cspan address=\"10.1093/cid/cix647\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKoenig SM, Truwit JD. 
Ventilator-associated pneumonia: diagnosis, treatment, and prevention[J]. Clin Microbiol Rev. 2006;19(4):637\u0026ndash;57.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKnaus WA, Draper EA, Wagner DP, et al. APACHE II: a severity of disease classification system[J]. Crit Care Med. 1985;13(10):818\u0026ndash;29.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLambden S, Laterre PF, Levy MM, et al. The SOFA score\u0026mdash;development, utility and challenges of accurate assessment in clinical trials[J]. Crit Care. 2019;23(1):1\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu C, Wang X, Liu C, et al. Differentiating novel coronavirus pneumonia from general pneumonia based on machine learning[J]. Biomed Eng Online. 2020;19(1):1\u0026ndash;14.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKim SY, Diggans J, Pankratz D, et al. Classification of usual interstitial pneumonia in patients with interstitial lung disease: assessment of a machine learning approach using high-dimensional transcriptional data[J]. Lancet Respir Med. 2015;3(6):473\u0026ndash;82.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKuo KM, Talley PC, Huang CH, et al. Predicting hospital-acquired pneumonia among schizophrenic patients: a machine learning approach[J]. BMC Med Inf Decis Mak. 2019;19(1):1\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJohnson AEW, Ghassemi MM, Nemati S, et al. Machine learning and decision support in critical care[J]. Proceedings of the IEEE, 2016, 104(2): 444\u0026ndash;466.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePollard TJ, Johnson AEW, Raffa JD, et al. The eICU Collaborative Research Database, a freely available multi-center database for critical care research[J]. Sci Data. 2018;5(1):1\u0026ndash;13.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLundberg SM, Lee SI. 
A unified approach to interpreting model predictions[J]. Adv Neural Inf Process Syst, 2017, 30.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen T, He T, Benesty M, et al. Xgboost: extreme gradient boosting[J]. R package version 0.4-2, 2015, 1(4): 1\u0026ndash;4.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003evan de Garde EMW, Oosterheert JJ, Bonten M, et al. International classification of diseases codes showed modest sensitivity for detecting community-acquired pneumonia[J]. J Clin Epidemiol. 2007;60(8):834\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVan Schyndel SJ, Carrier J, Bogado Pascottini O, et al. The effect of pegbovigrastim on circulating neutrophil count in dairy cattle: A randomized controlled trial[J]. PLoS ONE. 2018;13(6):e0198701.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRobinson C. Basic introduction into pgAdmin III and SQL queries[J]. 2011.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRamraj S, Uzir N, Sunil R, et al. Experimenting XGBoost algorithm for prediction and classification of different datasets[J]. Int J Control Theory Appl. 2016;9(40):651\u0026ndash;62.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTang F, Ishwaran H. Random forest missing data algorithms[J]. Statistical Analysis and Data Mining: The ASA Data Sci J. 2017;10(6):363\u0026ndash;77.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHe H, Bai Y, Garcia EA, et al. ADASYN: Adaptive synthetic sampling approach for imbalanced learning[C]//2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence). IEEE, 2008: 1322\u0026ndash;1328.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLundberg SM, Erion G, Chen H, et al. From local explanations to global understanding with explainable AI for trees[J]. Nat Mach Intell. 
2020;2(1):56\u0026ndash;67.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIhaka R, Gentleman R. R: a language for data analysis and graphics[J]. J Comput graphical Stat. 1996;5(3):299\u0026ndash;314.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePython W. Python[J]. Python Releases Wind, 2021, 24.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCuzick J. A Wilcoxon-type test for trend[J]. Stat Med. 1985;4(1):87\u0026ndash;90.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLaValley MP. Logistic regression[J] Circulation. 2008;117(18):2395\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRigatti SJ. Random forest[J]. J Insur Med. 2017;47(1):31\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSuthaharan S, Suthaharan S. Support vector machine[J]. Machine learning models and algorithms for big data classification: thinking with examples for effective learning, 2016: 207\u0026ndash;235.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTaud H, Mas JF. Multilayer perceptron (MLP)[J]. Geomatic approaches for modeling land change scenarios, 2018: 451\u0026ndash;455.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRodriguez JD, Perez A, Lozano JA. Sensitivity analysis of k-fold cross validation in prediction error estimation[J]. IEEE Trans Pattern Anal Mach Intell. 2009;32(3):569\u0026ndash;75.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIto A, Ishida T, Tokumasu H et al. Prognostic factors in hospitalized community-acquired pneumonia: a retrospective study of a prospective observational cohort. BMC Pulm Med. 2017;17(1):78. Published 2017 May 2. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s12890-017-0424-4\u003c/span\u003e\u003cspan address=\"10.1186/s12890-017-0424-4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang D, Willis DR, Yih Y. The pneumonia severity index: Assessment and comparison to popular machine learning classifiers. Int J Med Inform. 2022;163:104778. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.ijmedinf.2022.104778 1\u003c/span\u003e\u003cspan address=\"10.1016/j.ijmedinf.2022.104778 1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePatel S. Calculated decisions: CURB-65 score for pneumonia severity. Emerg Med Pract. 2021;23(Suppl 2):CD1-CD2. Published 2021 Feb 1. 1.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKang MW, Kim J, Kim DK, et al. Machine learning algorithm to predict mortality in patients undergoing continuous renal replacement therapy. Crit Care. 2020;24(1):42. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s13054-020-2752-7\u003c/span\u003e\u003cspan address=\"10.1186/s13054-020-2752-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Published 2020 Feb 6.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu C, Liu X, Mao Z, Interpretable Machine Learning Model for Early Prediction of Mortality in ICU Patients with Rhabdomyolysis. Med Sci Sports Exerc., Grinsztajn L, Oyallon E, Varoquaux G et al. Why do tree-based models still outperform deep learning on typical tabular data? [J]. Advances in Neural Information Processing Systems, 2022, 35: 507\u0026ndash;520.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGrinsztajn L, Oyallon E, Varoquaux G. 
Why do tree-based models still outperform deep learning on typical tabular data?[J]. Adv Neural Inf Process Syst. 2022;35:507\u0026ndash;20.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHespanhol V, B\u0026aacute;rbara C. Pneumonia mortality, comorbidities matter? Pulmonology. 2020 May-Jun;26(3):123\u0026ndash;129. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.pulmoe.2019.10.003\u003c/span\u003e\u003cspan address=\"10.1016/j.pulmoe.2019.10.003\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Epub 2019 Nov 29. PMID: 31787563.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMetersky ML, Waterer G, Nsa W et al. Predictors of in-hospital vs postdischarge mortality in pneumonia. Chest. 2012;142(2):476\u0026ndash;481. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1378/chest.11-2393\u003c/span\u003e\u003cspan address=\"10.1378/chest.11-2393\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. PMID: 22383662.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCillo\u0026acute;niz C, Polverino E, Ewig S, et al. Impact of age and comorbidity on cause and outcome in community-acquired pneumonia. Chest. 2013;144(3):999\u0026ndash;1007.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMalik YA. Properties of Coronavirus and SARS-CoV-2. Malays J Pathol. 2020;42(1):3\u0026ndash;11. PMID: 32342926.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSuryamohan K, Diwanji D, Stawiski EW, et al. Human ACE2 receptor polymorphisms and altered susceptibility to SARS-CoV-2. Commun Biol. 2021;4(1):475. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s42003-021-02030-3\u003c/span\u003e\u003cspan address=\"10.1038/s42003-021-02030-3\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. 
PMID: 33846513; PMCID: PMC8041869.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBakhshandeh B, Sorboni SG, Javanmard AR, et al. Variants in ACE2; potential influences on virus infection and COVID-19 severity. Infect Genet Evol. 2021;90:104773. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.meegid.2021.104773\u003c/span\u003e\u003cspan address=\"10.1016/j.meegid.2021.104773\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Epub 2021 Feb 17. PMID: 33607284; PMCID: PMC7886638.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChai X, Hu L, Zhang Y et al. Specific ACE2 expression in cholangiocytes may cause liver damage after 2019-nCoV infection. bioRxiv 2020:2020.02.03.931766.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLei F, Liu YM, Zhou F et al. Longitudinal association between markers of liver injury and mortality in COVID-19 in China. Hepatology 2020:1.0.1002/hep.31301.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJellinge ME, Henriksen DP, Hallas P et al. Hypoalbuminemia is a strong predictor of 30-day all-cause mortality in acutely admitted medical patients: a prospective, observational, cohort study. PLoS One. 2014;9(8): e105983. Published 2014 Aug 22. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1371/journal.pone.0105983\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0105983\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNicholson JP, Wolmarans MR, Park GR. The role of albumin in critical illness. Br J Anaesth. 2000;85(4):599\u0026ndash;610. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/bja/85.4.599\u003c/span\u003e\u003cspan address=\"10.1093/bja/85.4.599\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYang X, Wang L, Zheng L, et al. Serum Albumin as a Potential Predictor of Pneumonia after an Acute Ischemic Stroke. Curr Neurovasc Res. 2020;17(4):385\u0026ndash;93. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.2174/1567202617666200514120641\u003c/span\u003e\u003cspan address=\"10.2174/1567202617666200514120641\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e[113] Williamson EJ, Walker AJ, Bhaskaran K, et al. Factors associated with COVID-19-related death using OpenSAFELY. Nature. 2020;584(7821):430\u0026ndash;6. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41586-020-2521-4\u003c/span\u003e\u003cspan address=\"10.1038/s41586-020-2521-4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTruog RD, Mitchell C, Daley GQ. The Toughest Triage - Allocating Ventilators in a Pandemic. N Engl J Med. 2020;382(21):1973\u0026ndash;5. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1056/NEJMp2005689\u003c/span\u003e\u003cspan address=\"10.1056/NEJMp2005689\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHigh Prevalence of Obesity in Severe Acute Respiratory Syndrome. Coronavirus-2 (SARS-CoV-2) Requiring Invasive Mechanical Ventilation. Obes (Silver Spring). 2020;28(10):1994. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1002/oby.23006\u003c/span\u003e\u003cspan address=\"10.1002/oby.23006\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChetboun M, Raverdy V, Labreuche J, et al. BMI and pneumonia outcomes in critically ill covid-19 patients: An international multicenter study. Obes (Silver Spring). 2021;29(9):1477\u0026ndash;86. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1002/oby.23223\u003c/span\u003e\u003cspan address=\"10.1002/oby.23223\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRichardson S, Hirsch JS, Narasimhan M et al. Presenting Characteristics, Comorbidities, and Outcomes Among 5700 Patients Hospitalized With COVID-19 in the New York City Area [published correction appears in JAMA. 2020;323(20):2098]. JAMA. 2020;323(20):2052\u0026ndash;2059. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1001/jama.2020.6775\u003c/span\u003e\u003cspan address=\"10.1001/jama.2020.6775\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHraiech S, Alingrin J, Dizier S et al. Time to intubation is associated with outcome in patients with community-acquired pneumonia. PLoS One. 2013;8(9): e74937. Published 2013 Sep 19. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1371/journal.pone.0074937\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0074937\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRestrepo MI, Mortensen EM, Rello J, Brody J, Anzueto A. 
Late admission to the ICU in patients with community-acquired pneumonia is associated with higher mortality. Chest. 2010;137(3):552\u0026ndash;7. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1378/chest.09-1547\u003c/span\u003e\u003cspan address=\"10.1378/chest.09-1547\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLuo C, et al. A machine learning-based risk stratification tool for in-hospital mortality of intensive care unit patients with heart failure. J Translational Med. 2022;20(1):136.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuang T, et al. Machine learning for prediction of in-hospital mortality in lung cancer patients admitted to intensive care unit. PLoS ONE. 2023;18(1):e0280606.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang B, et al. Novel pneumonia score based on a machine learning model for predicting mortality in pneumonia patients on admission to the intensive care unit. Respir Med. 
2023;217:107363.

Keywords: Pneumonia, Interpretable model, machine learning, ICU
Journal: BMC Pulmonary Medicine (submitted 2023-12-15; revision requested 2024-07-23)
Preprint: posted December 19th, 2023; doi:10.21203/rs.3.rs-3757487/v1 (https://doi.org/10.21203/rs.3.rs-3757487/v1)
Version of record: published September 13th, 2024; doi:10.1186/s12890-024-03252-x (https://doi.org/10.1186/s12890-024-03252-x)
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

Abstract

Background
Pneumonia, a leading cause of morbidity and mortality worldwide, often necessitates Intensive Care Unit (ICU) admission. Accurate prediction of pneumonia mortality is crucial for tailored prevention and treatment plans. However, existing mortality prediction models face limited adoption in clinical practice due to their lack of interpretability.

Objective
This study aimed to develop an interpretable model for predicting pneumonia mortality in ICUs. Using the Shapley Additive Explanation (SHAP) method, we sought to explain the predictions of the Extreme Gradient Boosting (XGBoost) model and identify prognostic factors for pneumonia.

Methods
In this retrospective cohort study, we used electronic health records from the eICU-CRD (2014–2015) for all adult pneumonia patients. Records from the first 24 hours of each ICU admission were considered, with 70% of the dataset allocated for model training and 30% for validation. An XGBoost model was trained, and performance was assessed using the area under the receiver operating characteristic curve (AUC). The SHAP method was used to interpret the XGBoost model.

Results
Among 10,962 pneumonia patients, in-hospital mortality was 16.33%. The XGBoost model demonstrated superior predictive performance (AUC: 0.732 ± 0.0065) compared with traditional scoring systems and other machine learning methods, an improvement of 8 percentage points. SHAP analysis identified aspartate aminotransferase (AST) as the most important predictor.

Conclusions
Interpretable predictive models enhance mortality risk assessment for pneumonia patients in the ICU, fostering transparency. AST emerged as the foremost predictor, followed by patient age, albumin, and BMI, among others. These insights, rooted in strong correlations with mortality, facilitate improved clinical decision-making and resource allocation.
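The Methods paragraph of the abstract names two ideas that a small example may make more concrete: scoring a model by AUC and attributing a prediction to its inputs with Shapley values. The following is a minimal sketch under stated assumptions, not the authors' pipeline (the paper reports XGBoost with SHAP on eICU-CRD data). The function `toy_risk`, its coefficients, and the `patient` and `baseline` values are all hypothetical and chosen only for illustration; the Shapley values are computed by brute-force enumeration rather than by the tree-specific algorithm SHAP uses.

```python
from itertools import combinations
from math import factorial


def auc(labels, scores):
    """AUC as the chance that a random positive scores above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only suitable for tiny illustrative models.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                without = {g: (x[g] if g in coalition else baseline[g]) for g in names}
                with_f = dict(without)
                with_f[f] = x[f]
                total += weight * (model(with_f) - model(without))
        phi[f] = total
    return phi


def toy_risk(p):
    """Hypothetical risk score; the coefficients are invented, not from the paper."""
    return (0.004 * p["ast"] + 0.01 * p["age"] - 0.2 * p["albumin"]
            + 0.001 * p["ast"] * (p["age"] > 70))  # small AST-by-age interaction


# Hypothetical patient and reference values (illustrative only).
patient = {"ast": 200.0, "age": 80.0, "albumin": 2.0}
baseline = {"ast": 40.0, "age": 60.0, "albumin": 3.5}

phi = shapley_values(toy_risk, patient, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
print("AUC on four toy cases:", auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The attributions sum to the difference between the patient's score and the baseline score, which is the additivity property that lets SHAP rank predictors such as AST, age, and albumin by their contribution to an individual prediction.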

Source provenance

crossref: last seen: 2026-05-21T01:00:31.012261+00:00
europepmc: last seen: 2026-05-20T01:45:00.602351+00:00