Ensemble Machine Learning for Institutional-Level Hospital Mortality Prediction Using Clinical and Operational Indicators

Research Square preprint, Research Article

Sri Murdiati, Murnawan Murnawan, Safrizal Rahman, Yoga Yuniadi, and 1 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9203293/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1 posted.

Abstract

Background: Accurate prediction of in-hospital mortality is essential for hospital-wide clinical decision-making and resource planning. Most machine learning frameworks rely on patient-level clinical data and focus on intensive care or disease-specific populations, while hospital operational metrics are rarely incorporated.

Methods: We conducted a retrospective observational study using 36 months of institutional data from 2022 to 2024 obtained from a regional referral hospital.
The dataset integrated clinical indicators with hospital operational metrics, including length of stay, bed occupancy rate, and bed turnover rate. Three machine learning models (Random Forest, XGBoost, and a feed-forward neural network) were developed alongside a linear regression baseline. A stacked ensemble approach was applied to capture nonlinear relationships. Model performance was evaluated using R², root mean squared error (RMSE), and mean absolute error (MAE) with five-fold cross-validation. Model interpretability was assessed using SHapley Additive exPlanations (SHAP).

Results: The stacked ensemble achieved the strongest predictive performance (R² = 0.84; RMSE = 4.49), while the neural network yielded the lowest MAE (2.74). Heart failure and cardiogenic shock emerged as influential clinical predictors. Although operational metrics showed limited direct effects, interaction terms improved model stability. Shapley analyses demonstrated consistent feature attributions across models, supporting interpretability.

Conclusions: Integrating clinical severity indicators with hospital operational metrics using an explainable ensemble machine learning framework improves hospital-wide mortality prediction. Operational variables contribute modestly in isolation but enhance model robustness through interaction effects, highlighting the value of interpretable machine learning for institutional-level clinical decision support.

Keywords: Artificial Intelligence and Machine Learning, Medical Informatics, hospital mortality, machine learning, hospital performance metrics, ensemble learning, healthcare analytics

1 Introduction

1.1 Research Background

The accurate prediction of inpatient mortality is a key challenge in modern medicine and has stimulated the development of advanced computational approaches to improve prognostic power and guide clinical decision-making.
Machine learning (ML) techniques use high-dimensional electronic health records (EHRs) to uncover predictive patterns. These patterns can support early identification of high-risk patients and enable timely clinical intervention that may reduce hospital mortality [1]. Recent empirical studies suggest that ML approaches (including ensemble learning, deep neural architectures, and decision tree families) tend to outperform conventional statistical models. For example, decision tree models have achieved area under the curve (AUC) values close to 0.96 in intensive care settings, and boosting-based ensemble classifiers have outperformed regression baselines in selected groups such as patients with heart failure [2–4]. However, concerns about generalizability, interpretability, and external validation persist and limit routine deployment. Concurrently, a multifactorial perspective that combines clinical determinants with operational signals has developed: studies show that incorporating structured admission data improves the assessment of mortality risk, whereas the integration of real-time EHRs provides a hospital-wide predictive capacity [5–11].

1.2 Research Gap

Despite this progress, four pertinent gaps persist. First, many models still favour administrative variables and underutilize critical clinical biomarkers and physiological parameters, which limits predictive accuracy and the transportability of findings across populations [12, 13]. Second, interpretability is uneven: black-box behaviour may undermine transparency and clinician trust, making models difficult to incorporate smoothly into clinical workflows [14, 15]. Third, the evidence base is predominantly intensive care unit (ICU)-centered and single-site; hospital-wide performance metrics are rarely used, and model performance deteriorates over time in the absence of maintenance strategies [8].
Fourth, external validity is not always carefully evaluated: cross-institutional verification is limited, and tailoring for high-risk sub-populations (e.g., paediatric, oncology) is rare [16–20]. Addressing these deficiencies is essential to ensure that ML-based predictions become clinically actionable and broadly adoptable.

This study makes three main contributions to the field of health informatics and hospital analytics. First, it proposes an institutional-level mortality prediction framework that integrates hospital operational indicators with clinical severity variables. Second, it evaluates the predictive performance of stacked ensemble machine learning models compared with individual machine learning algorithms and a conventional linear regression baseline. Third, the study incorporates explainable artificial intelligence using SHAP to enhance model transparency and support interpretable decision-making for hospital-wide mortality monitoring.

1.3 Research Problem

Current hospital mortality prediction models often undervalue enterprise-level performance measures, which limits the operational utility of risk scores in day-to-day decisions such as bed allocation, staffing, and timely escalation. In clinical practice, key metrics such as Bed Occupancy Rate (BOR), Length of Stay (LOS), and Bed Turnover Rate (BTR) are often not integrated with clinical covariates, which undermines generalizability, actionability, and interpretability across heterogeneous wards. This research therefore aims to address the lack of a validated, explainable ML approach that uses performance and clinical data together to provide accurate, calibrated, and operationally meaningful mortality risk estimates for hospital-wide deployment [5, 21–26].

1.4 Research Questions

RQ1.
Do ensemble machine learning frameworks that integrate hospital operational metrics (bed occupancy rate, length of stay, and bed turnover rate) with clinical covariates outperform individual machine learning models and a linear regression baseline in predicting institutional-level in-hospital mortality burden, as measured by Gross Death Rate?

RQ2. What is the relative contribution of clinical severity indicators compared with hospital operational metrics in explaining variation in institutional-level in-hospital mortality?

RQ3. Can explainable artificial intelligence techniques, such as SHAP, enhance the transparency and interpretability of ensemble machine learning models without compromising predictive performance?

RQ4. Does feature engineering, particularly the inclusion of interaction terms among hospital operational indicators, improve predictive accuracy and model stability for institutional-level mortality prediction?

1.5 Research Objectives

O1. To compare the predictive performance of ensemble machine learning models, individual machine learning algorithms, and a linear regression baseline using explained variance (R²) and prediction error (RMSE, MAE).

O2. To measure the relative importance of clinical and operational predictors of Gross Death Rate using feature-importance analyses, correlation analyses, and regression diagnostics.

O3. To assess model interpretability using SHAP-based global and local feature attribution and post-hoc regression analysis.

O4. To examine the effect of feature engineering, specifically interaction terms between operational metrics, on predictive accuracy and error stability.

1.6 Significance of the Study

From a theoretical perspective, the study bridges the gap between health care operations and clinical risk prediction by formalizing the role of hospital performance metrics in mortality modelling, thereby addressing the historical ICU-centric bias [22].
Methodologically, it introduces a replicable workflow that combines machine learning with explainable AI and supports transparent performance evaluation and institutional-level decision support [24, 27]. Practically, integrating performance and clinical signals may improve patient safety and throughput by enabling earlier risk stratification, improved bed and staff planning, and more timely clinical response [23, 28]. At the systems level, the framework can inform quality dashboards and performance contracts, helps identify operational indicators that correlate with outcomes, clarifies boundary conditions, and promotes external validation, prospective trials, and fairness audits across heterogeneous hospitals [29, 30].

2 Methods

2.1 Study Design and Overall Workflow

This study uses a retrospective design and compares machine learning (ML) models for predicting hospital mortality risk from institutional performance metrics and clinical determinants. Both individual ML algorithms and an ensemble learning approach were evaluated:

Individual models: Linear Regression, Random Forest, XGBoost, and Neural Network.
Ensemble model: a stacked ensemble that combines Random Forest and XGBoost as base learners, with a Neural Network as the meta-learner.

These models were chosen because they can capture the complex nonlinearity of clinical data. Linear Regression is used as a baseline because of its interpretability and because it provides a benchmark against more complex algorithms. Random Forest and XGBoost, as tree-based algorithms, tend to pick up nonlinear relationships, while Neural Networks are well suited to learning complex feature interactions.
The ensemble strategy combines the strengths of these constituent algorithms, which helps to improve the predictive performance and generalizability of the ensemble. Neural Networks offer strong predictive power, while tree-based models such as Random Forest and XGBoost are more readily interpretable; combining them is intended to make the ensemble both robust and clinically usable.

2.2 Data Source and Study Setting

The dataset used in this study was obtained from Rumah Sakit Umum Daerah (RSUD) Dr. Zainoel Abidin, a regional referral hospital located in Banda Aceh, Indonesia. The hospital serves as a major tertiary care centre that receives patients from multiple districts across the province. The dataset contains aggregated institutional-level records of hospital admissions and clinical indicators collected over a 36-month observation period from January 2022 to December 2024.

The dataset integrates both clinical severity indicators and hospital operational metrics, enabling the investigation of their combined influence on institutional mortality patterns. Administrative indicators include operational measures commonly used in hospital performance monitoring, such as Length of Stay (LOS), Bed Occupancy Rate (BOR), and Bed Turnover Rate (BTR). These metrics reflect hospital capacity utilisation and patient flow dynamics, which may indirectly influence mortality risk at the institutional level.

In addition to administrative indicators, several clinical variables associated with severe cardiovascular complications were incorporated into the dataset. These include Heart Failure, Cardiogenic Shock, and Deep Vein Thrombosis (DVT), which have been reported in previous studies as important predictors of adverse clinical outcomes. The inclusion of both operational and clinical variables enables the present study to examine the relative contribution of system-level factors and patient-level severity indicators within a unified predictive modelling framework.

Table 1 summarises the main characteristics of the dataset used in this study, including the observation period, hospital activity indicators, and mortality statistics.

Table 1 Summary of dataset characteristics

| Characteristic | Value |
| --- | --- |
| Hospital type | Regional referral hospital |
| Observation period | January 2022 – December 2024 |
| Total observation units | 36 monthly records |
| Total hospital admissions | 42,685 |
| Total in-hospital deaths | 1,142 |
| Average monthly admissions | 1,186 |
| Gross Death Rate (mean) | 2.67% |
| Gross Death Rate (range) | 1.90% – 3.52% |
| Net Death Rate (mean) | 1.84% |
| Average Length of Stay (LOS) | 5.7 days |
| Bed Occupancy Rate (BOR) | 73.4% |
| Bed Turnover Rate (BTR) | 6.2 times/month |
| Average Bed Turnover Interval (TOI) | 1.9 days |
| Number of predictor variables | 18 |
| Outcome variable | Gross Death Rate (GDR) |

The dataset comprises 36 monthly institutional observations.

2.3 Dataset Characteristics and Variables

The dataset used in this study consists of aggregated institutional-level observations derived from hospital activity records, rather than individual patient-level data. Each observation unit represents a monthly summary of hospital performance indicators and clinical severity variables. This aggregation allows the modelling framework to capture hospital-wide mortality patterns and operational dynamics across the study period.

Overall, the dataset contains 36 monthly observations covering the period from January 2022 to December 2024, representing institutional summaries of admissions, mortality outcomes, and operational indicators. During this period, the hospital recorded 42,685 admissions and 1,142 in-hospital deaths, corresponding to an average Gross Death Rate (GDR) of approximately 2.67%. Additional operational statistics such as Bed Occupancy Rate (BOR), Bed Turnover Rate (BTR), and Bed Turnover Interval (TOI) were included to describe hospital capacity utilisation and patient flow dynamics.
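
As a quick arithmetic check on the totals reported above, the pooled death rate and the average monthly admissions can be recomputed directly. The sketch below is illustrative only: the function name `gross_death_rate` is hypothetical, and the pooled ratio it computes (about 2.68%) need not match the reported mean of 2.67% exactly if that figure is an average of monthly rates, which is an assumption since the source does not say how the mean was formed.

```python
def gross_death_rate(deaths, admissions):
    """Gross Death Rate as a percentage of admissions (hypothetical helper)."""
    if admissions <= 0:
        raise ValueError("admissions must be positive")
    return 100.0 * deaths / admissions


# Totals reported for the 36-month observation period.
total_deaths = 1142
total_admissions = 42685
n_months = 36

pooled_gdr = gross_death_rate(total_deaths, total_admissions)
avg_monthly_admissions = total_admissions / n_months

print(f"Pooled GDR: {pooled_gdr:.2f}%")
print(f"Average monthly admissions: {avg_monthly_admissions:.0f}")
```

Both values are consistent with the corresponding entries in Table 1 to within rounding.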

The dataset includes 18 predictor variables, representing a combination of administrative indicators and aggregated clinical severity measures. These variables were selected based on their relevance in prior hospital mortality studies and their availability within institutional reporting systems. For analytical clarity, the variables were organised into three categories: target variable, administrative indicators, and clinical indicators.

1. Target Variable

The primary outcome variable in this study is Gross Death Rate (GDR). GDR represents the proportion of patients who died during hospitalisation relative to the total number of hospital admissions within a defined observation period. In this study, GDR is treated as a continuous institutional-level outcome, enabling the modelling of variations in overall mortality burden across time.

Using GDR as the outcome variable allows the model to capture hospital-wide mortality dynamics rather than individual patient outcomes. This aggregated perspective is particularly relevant for institutional risk monitoring, quality benchmarking, and operational planning, where decision-making is often based on hospital-level indicators.

2. Administrative Indicators

Administrative variables describe hospital operational performance and resource utilisation. These indicators are widely used in hospital management and health system evaluation because they reflect patient flow and capacity utilisation. The administrative indicators included in this study are:

Length of Stay (LOS): the average duration of hospitalisation for admitted patients.
Bed Occupancy Rate (BOR): the proportion of hospital beds occupied during a given time period.
Bed Turnover Rate (BTR): the frequency at which hospital beds are used by different patients within a specified time interval.

These variables capture operational constraints within the hospital system and may indirectly influence mortality outcomes through factors such as patient throughput, bed availability, and resource allocation.

3. Clinical Indicators

Clinical variables represent aggregated indicators of patient severity and underlying health conditions associated with increased mortality risk. The clinical indicators included in this study are:

Heart Failure: the prevalence of cardiac dysfunction among hospitalised patients.
Cardiogenic Shock: a severe cardiac complication associated with high mortality risk.
Deep Vein Thrombosis (DVT): a clinical condition associated with elevated risk of adverse outcomes.

These clinical indicators capture patterns of disease severity across hospital admissions and provide important contextual information for mortality prediction models.

The combination of clinical severity indicators and hospital operational metrics enables the modelling framework to examine how both patient-level severity patterns and system-level operational conditions contribute to variations in institutional mortality burden. This integrated perspective supports a more comprehensive understanding of mortality dynamics within hospital systems.

2.4 Outcome Definition

The primary outcome of this study is the Gross Death Rate (GDR), defined as the proportion of in-hospital deaths relative to the total number of hospital admissions within a defined observation period. In this study, GDR is treated as a continuous institutional-level outcome variable, representing variations in the overall mortality burden experienced by the hospital across time.

Gross Death Rate is widely used as a hospital performance indicator in health system evaluation and quality monitoring because it captures cumulative mortality outcomes across heterogeneous wards and patient populations.
Unlike patient-level binary outcomes, GDR reflects the aggregate mortality burden at the institutional level, making it suitable for analysing system-wide performance patterns and operational decision contexts.

Although GDR does not represent individual patient mortality events directly, it serves as an aggregated proxy for underlying patient-level mortality dynamics. By modelling this institutional-level outcome, the study aims to capture the combined influence of clinical severity indicators and hospital operational conditions on overall mortality patterns. This formulation enables the integration of operational indicators such as Length of Stay (LOS), Bed Occupancy Rate (BOR), and Bed Turnover Rate (BTR) with clinical variables within a unified predictive modelling framework.

In addition to the primary outcome variable, several mortality-related indicators were incorporated as contextual predictors to represent broader institutional performance patterns. Importantly, these indicators were calculated from historical time windows preceding the prediction horizon, so the predictors did not include contemporaneous mortality information associated with the outcome variable. For example, mortality-related predictors such as Net Death Rate were computed using lagged historical windows (t-1), so that mortality information from the prediction period was not used as a model input. This design was implemented to prevent target leakage: the predictors represent historical institutional patterns rather than direct components of the outcome variable.

By focusing on an aggregated institutional mortality indicator, this study prioritises applicability to hospital-level risk surveillance, operational planning, and early identification of periods associated with elevated mortality burden.
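
The lagging of mortality-related predictors described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the helper `lag_series` and the four-month series are hypothetical, and the numbers are invented for demonstration only.

```python
def lag_series(values, lag=1):
    """Shift a monthly series so that position t holds the value from t - lag.

    The first `lag` positions have no history and are filled with None.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    padded = [None] * lag + list(values)
    return padded[: len(values)]


# Hypothetical monthly series (illustrative numbers, not taken from the study).
net_death_rate = [1.8, 1.9, 1.7, 2.0]    # candidate predictor, months 0..3
gross_death_rate = [2.6, 2.7, 2.5, 2.9]  # outcome, months 0..3

ndr_lag1 = lag_series(net_death_rate, lag=1)

# Keep only months with a full history: each row pairs the outcome at month t
# with the predictor observed at month t - 1, so no same-month mortality
# information reaches the model.
rows = [
    {"month": t, "ndr_lag1": x, "gdr": y}
    for t, (x, y) in enumerate(zip(ndr_lag1, gross_death_rate))
    if x is not None
]
```

One consequence of a one-month lag is that the first observation has no usable predictor value and is dropped, which matters when only 36 monthly records are available.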

This institutional perspective complements existing patient-level mortality prediction models by addressing a distinct analytical level relevant for hospital management and system-level decision-making. Future research may extend this framework to incorporate patient-level outcomes or time-to-event survival modelling, enabling a more granular analysis of mortality risk while maintaining the integration of operational hospital indicators.

2.5 Data Preprocessing and Feature Engineering

Prior to model training, several preprocessing procedures were applied to ensure data quality, improve model stability, and enhance the predictive capacity of the machine learning algorithms. These procedures included missing value handling, categorical encoding, feature scaling, and feature engineering techniques designed to capture complex relationships among hospital performance indicators and clinical variables.

1. Handling Missing Values

Missing values within the dataset were handled using median imputation. This approach was chosen because median imputation is less sensitive to extreme values than mean-based imputation and therefore reduces potential bias caused by outliers in aggregated hospital indicators. By replacing missing entries with the median value of the corresponding variable, the overall distribution of the data was preserved while maintaining numerical stability during model training.

2. Categorical Variable Encoding

Certain variables in the dataset were categorical in nature and therefore required transformation into numerical representations before being used in machine learning models. Label encoding was applied to convert categorical variables into numerical values. This transformation enabled the algorithms to interpret categorical attributes while maintaining computational efficiency.

3. Feature Scaling

To ensure consistency across variables measured on different scales, feature normalization was performed using min-max scaling (MinMaxScaler).
Feature scaling is particularly important for algorithms such as neural networks, where differences in feature magnitude may affect gradient-based optimization. By transforming all variables into a comparable numerical range, scaling improves convergence during training and enhances overall model stability.

4. Feature Selection

To identify the most relevant predictors of institutional mortality patterns, feature selection techniques were applied prior to model development. Two complementary approaches were used:

Recursive Feature Elimination (RFE), which iteratively removes less informative variables based on model performance.
SHAP-based feature importance analysis, which evaluates the contribution of each predictor variable to the model output.

These methods allowed the study to focus on the most influential predictors while reducing potential noise from less informative variables.

5. Feature Engineering and Interaction Terms

In addition to feature selection, feature engineering techniques were implemented to capture potential nonlinear relationships and synergistic effects among hospital operational indicators. Specifically, interaction terms were constructed among selected administrative variables, including:

Length of Stay × Bed Occupancy Rate (LOS × BOR)
Bed Occupancy Rate × Bed Turnover Rate (BOR × BTR)

These interaction terms were introduced to represent complex operational dynamics within hospital systems, where the combined effect of resource utilisation indicators may influence mortality outcomes differently than individual variables considered in isolation.

The inclusion of interaction features was guided by empirical correlation analysis, domain knowledge, and prior literature on hospital performance metrics. By incorporating such engineered variables, the modelling framework is better able to capture underlying system-level constraints and operational conditions that may influence institutional mortality patterns.
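
The preprocessing steps in this section (median imputation, min-max scaling, and the two interaction terms) can be sketched in a few lines. The example below is a simplified illustration with hypothetical values and helper names; in particular, computing the interactions on the scaled columns is an assumption, since the source does not state whether interactions were formed before or after scaling.

```python
from statistics import median


def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def min_max_scale(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical monthly indicators (illustrative values only).
los = [5.0, 6.0, None, 7.0]     # Length of Stay in days, one month missing
bor = [70.0, 75.0, 80.0, 72.0]  # Bed Occupancy Rate in percent
btr = [6.0, 6.5, 5.5, 6.2]      # Bed Turnover Rate in times per month

los = impute_median(los)
los_s, bor_s, btr_s = (min_max_scale(col) for col in (los, bor, btr))

# Interaction terms as element-wise products (assumed here to use scaled values).
los_x_bor = [a * b for a, b in zip(los_s, bor_s)]
bor_x_btr = [a * b for a, b in zip(bor_s, btr_s)]
```

In a cross-validated workflow, the median and the min/max statistics would normally be estimated on the training portion of each fold only and then applied to the validation portion, so that validation data does not influence the transformation.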

2.6 Model Development and Hyperparameter Optimization

To evaluate the predictive capability of machine learning approaches for institutional-level mortality estimation, several models were developed and compared within a unified experimental framework. The modelling strategy consisted of a conventional statistical baseline model, multiple machine learning algorithms, and an ensemble learning architecture designed to capture complex relationships among clinical and operational variables.

1. Baseline Model

A Linear Regression model was implemented as the baseline approach. Linear regression is widely used in health services research due to its interpretability and its ability to provide a clear benchmark against which more complex algorithms can be evaluated. By including a baseline statistical model, the study allows for a transparent comparison between traditional regression-based prediction and machine learning methods.

2. Machine Learning Models

Three machine learning algorithms were selected for model development based on their ability to capture nonlinear relationships and complex interactions among predictors:

a. Random Forest (RF). Random Forest is an ensemble tree-based algorithm that constructs multiple decision trees and aggregates their predictions. This method is effective for modelling nonlinear relationships and is relatively robust to noise and overfitting, making it suitable for healthcare datasets that contain heterogeneous variables.

b. Extreme Gradient Boosting (XGBoost). XGBoost is a boosting-based algorithm that sequentially builds decision trees to minimise prediction errors. By iteratively correcting residual errors from previous trees, XGBoost often achieves high predictive accuracy and has been widely applied in clinical prediction modelling.

c. Neural Network (Multilayer Perceptron). A feed-forward neural network model was implemented using a multilayer perceptron (MLP) architecture.
Neural networks are capable of capturing complex nonlinear patterns and feature interactions within high-dimensional datasets, making them suitable for modelling intricate relationships between clinical indicators and hospital operational metrics.

3. Ensemble Learning Architecture

In addition to individual machine learning models, a stacked ensemble model was developed to improve predictive performance and model robustness. Stacked ensemble learning combines predictions from multiple base learners to generate a final prediction using a meta-learning algorithm. In this study, Random Forest and XGBoost were used as base learners, while a Neural Network served as the meta-learner responsible for combining the outputs of the base models.

This architecture allows the ensemble model to integrate complementary strengths from different learning algorithms. Tree-based models capture nonlinear decision boundaries, while neural networks can learn higher-order interactions among predictors. By aggregating predictions from heterogeneous learners, the ensemble approach can reduce variance and improve generalisation performance, which is particularly important for modelling institutional-level mortality dynamics influenced by both clinical severity and operational hospital conditions.

4. Hyperparameter Optimization

To improve model performance and avoid overfitting, hyperparameter tuning was conducted using a grid search strategy combined with cross-validation. The grid search procedure systematically evaluated multiple combinations of hyperparameters for each machine learning algorithm to identify the configuration that yielded the best predictive performance.
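
The grid search procedure described above amounts to enumerating every combination of candidate values and keeping the one with the best average cross-validated score. The sketch below shows that loop in isolation. Everything specific in it is an assumption made for illustration: the grid values, the parameter names `n_estimators` and `max_depth`, and the `toy_cv_errors` function, which merely stands in for fitting a model on each fold and returning its validation error.

```python
from itertools import product
from statistics import mean


def grid_search(param_grid, cv_errors):
    """Return (best_params, best_score), minimising the mean per-fold error.

    param_grid maps parameter names to lists of candidate values;
    cv_errors(params) must return one error value per validation fold.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = mean(cv_errors(params))
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid for a tree ensemble (values are illustrative only).
rf_grid = {"n_estimators": [100, 300, 500], "max_depth": [3, 5, None]}


def toy_cv_errors(params):
    """Stand-in for real model fitting: returns five made-up fold errors."""
    depth = params["max_depth"] or 10  # treat None (unlimited) as a deep tree
    base = abs(params["n_estimators"] - 300) / 100 + abs(depth - 5)
    return [base + 0.1 * fold for fold in range(5)]


best_params, best_score = grid_search(rf_grid, toy_cv_errors)
```

With three candidate values for each of two parameters, the loop evaluates nine configurations, each scored on five folds.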

The hyperparameters explored during model tuning included:

Random Forest: number of trees and maximum tree depth
XGBoost: learning rate and number of boosting iterations
Neural Network: hidden layer architecture and learning rate

Model tuning was conducted using five-fold cross-validation, allowing the models to be trained and evaluated across multiple data partitions. This procedure improves the robustness of the estimated model performance and reduces the risk of overfitting to a specific training subset.

The combination of baseline modelling, individual machine learning algorithms, and stacked ensemble learning provides a comprehensive framework for evaluating the effectiveness of advanced predictive techniques in modelling institutional-level hospital mortality.

2.7 Validation Strategy and Performance Metrics

Given the temporal nature of monthly institutional observations, we used a time-aware cross-validation strategy (blocked/forward-chaining) to avoid information leakage across adjacent months. Model performance is reported as the average across folds.

1. Cross-Validation Strategy

Because the dataset consists of temporally ordered monthly institutional observations, a five-fold cross-validation procedure was implemented that preserves the temporal structure of the data and minimizes information leakage across adjacent time periods. This approach ensures that model training and validation are performed on temporally separated subsets of the data, thereby providing a more realistic estimate of predictive performance for institutional mortality forecasting.

Cross-validation provides a more reliable estimate of model performance compared with a single train–test split, particularly for datasets with relatively limited sample sizes.
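
One common way to realise a forward-chaining scheme of the kind described above is an expanding training window followed by a fixed-size validation block. The sketch below shows what such folds would look like for 36 monthly records and five folds. The fold sizes and the helper name `forward_chaining_splits` are assumptions; the source does not report the exact partitioning that was used.

```python
def forward_chaining_splits(n_samples, n_splits=5):
    """Yield (train_indices, validation_indices) pairs in temporal order.

    Every training set contains only observations that precede its
    validation block, and the training window grows from fold to fold.
    """
    val_size = n_samples // (n_splits + 1)
    if val_size == 0:
        raise ValueError("too few samples for the requested number of splits")
    first_val_start = n_samples - n_splits * val_size
    for start in range(first_val_start, n_samples, val_size):
        train_idx = list(range(start))
        val_idx = list(range(start, start + val_size))
        yield train_idx, val_idx


# 36 monthly observations, five folds (month indices 0..35).
splits = list(forward_chaining_splits(36, n_splits=5))
for fold, (train_idx, val_idx) in enumerate(splits, start=1):
    print(
        f"fold {fold}: train months 0-{train_idx[-1]}, "
        f"validate months {val_idx[0]}-{val_idx[-1]}"
    )
```

Under these assumptions the earliest fold trains on only six months of data, which illustrates how little training history is available to early folds when the full series has 36 points.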
By evaluating models across multiple data partitions, this strategy reduces the risk that performance estimates are biased by a specific subset of observations. Final model performance was calculated as the average across the five validation folds, which gives a more stable estimate of predictive accuracy. This design helps mitigate temporal dependence among observations and makes model evaluation more robust for longitudinal institutional datasets.

2. Performance Evaluation Metrics

To assess model performance comprehensively, several complementary evaluation metrics were used. They capture different aspects of predictive accuracy and allow meaningful comparison between traditional statistical models and machine learning approaches:

a. Mean Absolute Error (MAE). MAE measures the average magnitude of prediction errors without considering their direction. It gives a direct reading of how close the predicted values are to the observed outcomes.
b. Mean Squared Error (MSE). MSE is the average squared difference between predicted and observed values. Because it penalises larger errors more heavily, it highlights models that produce extreme prediction deviations.
c. Root Mean Squared Error (RMSE). RMSE is the square root of the MSE and expresses prediction error in the same units as the outcome variable.
d. Coefficient of Determination (R²). R² measures the proportion of variance in the outcome variable that is explained by the model. Higher R² values indicate better fit.

3. Model Comparison

The performance of the baseline linear regression model, the individual machine learning algorithms, and the stacked ensemble was compared using the metrics described above, to assess whether advanced machine learning approaches provide meaningful improvements over traditional statistical modelling. Final model selection was based on a combination of prediction accuracy, robustness across validation folds, and model stability. Using multiple metrics together with cross-validation means that the selected model must perform consistently across data partitions rather than on a single training configuration.

3 Results

This study applies ensemble machine learning to predict the Gross Death Rate (GDR) from hospital performance indicators and clinical characteristics of the patient population. The results comprise a comparative evaluation of model performance, an analysis of feature importance, correlation analyses, hypothesis checks through data visualisation, regression diagnostics, and a synthesis of the main findings.

3.1 Predictive Performance of Models

Predictive accuracy was measured using five-fold time-aware cross-validation with three key metrics: the coefficient of determination (R²), root mean squared error (RMSE), and mean absolute error (MAE). A larger R² and smaller RMSE and MAE indicate better model performance. The cross-validation results for each model are given in Table 2.
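As a concrete reading of the metrics named above, the following sketch computes MAE, MSE, RMSE, and R² for a single validation fold using only the Python standard library. The function names and the three observed and predicted values are made up for illustration; they are not the study's code or data.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average size of the errors, ignoring their sign."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: average squared error, so large misses weigh more."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the outcome."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when the observed values are constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted GDR values for one validation fold
observed = [40.0, 50.0, 60.0]
predicted = [42.0, 50.0, 55.0]

print(
    f"MAE={mae(observed, predicted):.3f}  "
    f"RMSE={rmse(observed, predicted):.3f}  "
    f"R2={r2(observed, predicted):.3f}"
)
```

In a five-fold setting these functions would be applied to each validation fold and the fold scores averaged, in line with the averaging described in the validation strategy. Because RMSE squares the errors before averaging, a model can rank first on RMSE and still not rank first on MAE.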
Table 2 Cross-validation results of predictive models

Model                              R²      RMSE    MAE
Random Forest (RF)                 0.733   5.83    3.54
Linear Regression (LR)             0.777   5.33    3.58
XGBoost (XGB)                      0.575   7.36    4.59
Neural Network (NN)                0.801   5.03    2.74
Stacked Ensemble (NN + RF + XGB)   0.841   4.49    3.39

Table 2 shows that the stacked ensemble, which combines a neural network, random forest, and XGBoost, achieved the highest R² (0.841) and the lowest RMSE (4.49). XGBoost showed the weakest performance, with the lowest R² (0.575) and the highest RMSE (7.36) and MAE (4.59). This may reflect the relatively small dataset and the aggregated nature of the institutional indicators, which may limit the boosting algorithm's ability to capture complex hierarchical patterns. Although the stacked ensemble achieved the highest R² and the lowest RMSE, the neural network produced the lowest MAE (2.74, versus 3.39 for the ensemble). This indicates a trade-off between variance explanation and absolute error; the ensemble was selected as the primary model because of its overall robustness across metrics.

Figures 1–3 provide a visual comparison of predictive performance. Figure 1 shows that the stacked ensemble achieves the highest explained variance (R²), and Figure 2 shows that it also yields the lowest RMSE. Figure 3, in contrast, shows that the neural network achieves the lowest MAE.

3.2 Feature Importance and SHAP Interpretation

To identify the most important predictors of GDR, a feature importance analysis was performed using the random forest model, complemented by a correlation analysis. The main predictors of GDR are summarised in Table 3.
Table 3 presents the five most influential predictors identified by the feature importance analysis; the remaining variables had substantially lower importance scores.

Table 3 Most powerful predictors of Gross Death Rate (GDR)

Feature                    Importance Score
Net Death Rate             0.39
Heart Failure              0.35
Cardiogenic Shock          0.28
Length of Stay (LOS)       0.26
Bed Turnover Rate (BTR)    0.22

Table 3 indicates that Net Death Rate (importance score = 0.39) is the most influential predictor, followed by Heart Failure (0.35) and Cardiogenic Shock (0.28). This pattern indicates that mortality-related indicators and acute cardiac conditions account for most of the variation in GDR.

To prevent target leakage, mortality-related indicators such as Net Death Rate were calculated from historical time windows preceding the prediction period and therefore did not include contemporaneous mortality information associated with the target variable (GDR). This design ensures that the predictors represent prior institutional patterns rather than direct components of the outcome.

Figure 4 supports these findings visually, showing that clinical conditions have a more substantial influence on mortality prediction than hospital operational metrics.

3.3 Correlation Analysis

To clarify the relationships among the variables, a correlation heatmap (Fig. 5) was created. The heatmap visualises the magnitude and direction of each correlation, which helps to identify patterns, while the underlying correlation matrix gives the exact coefficients. The key findings from Fig. 5 are:

Cardiogenic Shock has a moderate positive correlation with GDR (r = 0.50), supporting its role as a key correlate of institutional-level mortality.
Heart Failure shows a moderate negative correlation with GDR (r = -0.36).
Length of Stay (LOS) is negatively associated with GDR (r = -0.32), meaning that months with longer average stays tended to have lower mortality rates.

These correlations should be interpreted as descriptive associations rather than causal relationships, because institutional case-mix differences and referral patterns may influence the observed relationships.

3.4 Regression Diagnostics and Distributional Evidence

To explore further the relationship between clinical severity indicators and institutional mortality burden, we examined the distribution of monthly GDR across periods characterised by different levels of those indicators. Because the dataset consists of aggregated institutional observations rather than individual patient records, the analysis compares months with relatively higher versus lower prevalence of specific clinical conditions. This enables a descriptive assessment of how fluctuations in clinical severity indicators correspond to variations in institutional-level mortality.

The distributions (Figs. 6 and 7) suggest that periods with a higher prevalence of cardiogenic shock correspond to higher GDR, whereas periods with a higher prevalence of heart failure correspond to lower GDR. This inverse association may reflect context-specific care pathways, structured disease management programmes, or referral patterns rather than a direct protective effect.

An Ordinary Least Squares (OLS) regression was also fitted as a point of comparison with the machine learning methods. Its residual plot (Fig. 8) shows non-random patterns, which suggests model mis-specification due to non-linear effects. This supports the use of machine learning approaches for modelling non-linear associations.
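The comparison of higher- versus lower-prevalence months described above can be sketched as follows. The manuscript does not say how months were assigned to the two groups, so the median split used here is an assumption, as are the helper names and the toy monthly values.

```python
from statistics import mean, median
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def split_by_prevalence(
    prevalence: Sequence[float], outcome: Sequence[float]
) -> Dict[str, List[float]]:
    """Group monthly outcomes into lower and higher prevalence months (assumed median split)."""
    cut = median(prevalence)
    groups: Dict[str, List[float]] = {"lower": [], "higher": []}
    for p, o in zip(prevalence, outcome):
        groups["higher" if p > cut else "lower"].append(o)
    return groups


# Toy monthly series, invented for illustration only (not the study's data)
shock_cases = [2, 5, 3, 8, 6, 1, 7, 4]
gdr = [38.0, 47.0, 41.0, 55.0, 49.0, 36.0, 52.0, 44.0]

groups = split_by_prevalence(shock_cases, gdr)
r = pearson_r(shock_cases, gdr)
print(f"mean GDR, lower-prevalence months:  {mean(groups['lower']):.2f}")
print(f"mean GDR, higher-prevalence months: {mean(groups['higher']):.2f}")
print(f"Pearson r = {r:.3f}")
```

Both summaries are descriptive: a difference between the two groups or a non-zero r on aggregated monthly data does not separate the effect of a condition from case-mix or referral patterns.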
4 Discussion

4.1 Principal Findings

This study investigated the performance of ensemble machine learning models for predicting the Gross Death Rate (GDR) from clinical severity and hospital operational measures at the hospital level. Three main findings emerged. First, the stacked ensemble outperformed the individual machine learning models and linear regression on R² and RMSE in predicting aggregated in-hospital mortality burden, although the neural network achieved a lower MAE. Second, clinical variables, especially cardiogenic shock and heart failure, were the major factors in mortality prediction, whereas the administrative variables showed little direct predictive value. Third, feature engineering, in particular the introduction of interaction terms among operational indicators, improved model stability and predictive accuracy.

Unlike many mortality prediction studies that evaluate patient-level risk using clinical scoring systems such as SOFA or APACHE, the present study focuses on institutional-level mortality burden measured through GDR. Because these scoring systems are designed for individual patient risk stratification rather than hospital-wide performance monitoring, direct comparison with institutional mortality metrics would not be methodologically appropriate. Instead, this study evaluates the relative contribution of clinical severity indicators and operational hospital metrics within a machine learning framework designed for system-level decision support.

4.2 Ensemble Learning Superiority for Institutional-Level Prediction of Mortality

The superior performance of the stacked ensemble is in line with ensemble learning theory, which holds that combining heterogeneous learners can reduce variance and improve generalisation by capturing complementary structure in the data.
In this study, combining tree-based models with a neural network allowed the ensemble to learn non-linear and interaction effects that the individual learners did not represent well. The gain in explained variance over the best single model (R² = 0.841 versus 0.801 for the neural network) indicates that ensemble architectures suit the complex dynamics of hospital-level mortality, where both clinical severity and system-level operational conditions shape outcomes. Importantly, the performance gains came with stable error distributions, which suggests that the ensemble is less prone to overfitting the training data and is therefore more robust. This characteristic is particularly relevant for institutional-level applications, where operational planning and quality monitoring depend on stable predictions.

4.3 Clinical Supremacy over Administrative Indicators

The feature importance analyses, correlation assessments, and regression diagnostics gave a consistent result: clinical severity measures were considerably more prominent than administrative measures in explaining the variation in GDR. Cardiogenic shock was the strongest positive predictor of mortality burden, consistent with its status as an accepted marker of acute haemodynamic compromise with a high risk of death. Heart failure showed an inverse relation to GDR at the institutional level, an observation that probably reflects context, i.e., specialised care pathways, structured disease management programmes, and referral patterns in centres with higher heart failure case volumes. In contrast, operational metrics such as length of stay, bed occupancy rate, and bed turnover rate had limited predictive ability on their own. Their contribution appeared to be mostly indirect, operating through interaction effects rather than as independent determinants of mortality.
These findings suggest that administrative indicators play a secondary role in mortality risk and that clinical severity is the most important signal to monitor in hospital-wide mortality surveillance.

4.4 Contribution of Feature Engineering and Interaction Effect

Adding the engineered interaction features improved the model's performance and stability. Interaction terms between operational indicators, such as length of stay and bed occupancy rate, allowed the models to represent system-level constraints that may amplify or dampen the effect of clinical risk. The lower variance of the prediction error and the narrower inter-quartile range in the ensemble model indicate that feature engineering contributed to generalisability and reliability. These results show that administrative metrics, although seemingly important, may be inadequate as stand-alone predictors of mortality, whereas their interaction with clinical variables provides valuable contextual information. This supports a hybrid modelling approach in which operational data complement rather than replace clinical data.

4.5 Clinical, Managerial, and Methodological Implications

From a clinical perspective, the findings suggest that hospital-wide risk stratification frameworks should focus on indicators of acute clinical severity, specifically shock states, and use operational measures to put resource allocation and care coordination in context. Embedding such predictive outputs into multidisciplinary bed management and escalation protocols may help improve the early identification of periods of higher mortality burden. For hospital management, however, the results caution against using administrative measures of efficiency, such as bed occupancy, as direct levers of mortality reduction.
Instead, operational decisions should be informed by clinically driven risk signals, so that capacity planning and staffing changes are in line with the underlying patient acuity. Methodologically, this research shows the usefulness of explainable ensemble learning at the institutional level. The combination of SHAP analysis and post-hoc regression offers transparency without giving up the predictive advantage of non-linear models, which supports the development of trustworthy AI tools for healthcare systems.

5 Conclusions

This study shows that explainable ensemble machine learning can robustly model the hospital-wide burden of mortality, as measured by the Gross Death Rate (GDR). In particular, the stacked ensemble outperformed the individual learning algorithms on R² and RMSE, achieving high predictive performance (R² ≈ 0.84). These results highlight the value of model aggregation for institutional-level mortality estimation compared with single-algorithm models.

Across the analyses, clinical variables, most prominently cardiogenic shock and heart failure, appeared to be the major determinants of variation in mortality. By contrast, administrative metrics such as length of stay, bed occupancy rate, and bed turnover rate had relatively little direct explanatory power. Nonetheless, the deliberate inclusion of interaction terms improved both model stability and accuracy, which shows that operational factors have meaningful explanatory capacity when interpreted in conjunction with clinical risk.

From an applied perspective, these results suggest that hospital mortality surveillance should be underpinned by clinically derived risk signals and that operational indicators are better used to provide context for care coordination than as independent levers of outcome improvement.
Such ensemble-based risk estimates could be integrated into routine operational workflows, such as bed management and escalation meetings, to support earlier detection of high-risk periods and better-informed capacity planning. Methodologically, the research provides a stable and interpretable ensemble framework suitable for institutional decision support. While the single-centre design and aggregated outcome limit generalisability and patient-level inference, the results provide a basis for future multi-centre validation, extension to longitudinal outcomes, and prospective evaluation of clinical impact. In sum, the proposed approach offers a pragmatic means of linking predictive analytics with real-world hospital management and quality improvement programmes.

Declarations

Competing Interests
The authors declare that they have no competing interests.

Ethics Approval
Ethical approval was obtained from the institutional review board of RSUD Dr. Zainoel Abidin, Banda Aceh.

Funding
The authors received no specific funding for this research.

Author Contributions
All authors contributed to the study conception, methodology, analysis, and manuscript preparation.

Data Availability
The datasets used and analysed during the current study are not publicly available due to hospital data protection regulations but may be available from the corresponding author upon reasonable request.

References

König S, Pellissier V, Hohenstein S, Leiner J, Meier-Hellmann A, Kuhlen R et al (2022) From population- to patient-based prediction of in-hospital mortality in heart failure using machine learning. Eur Heart J - Digit Health 3. https://doi.org/10.1093/ehjdh/ztac012
Trentino KM, Schwarzbauer K, Mitterecker A, Hofmann A, Lloyd A, Leahy MF et al (2022) Machine Learning-Based Mortality Prediction of Patients at Risk During Hospital Admission. J Patient Saf 18. https://doi.org/10.1097/PTS.0000000000000957
Yun K, Oh J, Hong TH, Kim EY (2021) Prediction of Mortality in Surgical Intensive Care Unit Patients Using Machine Learning Algorithms. Front Med (Lausanne) 8. https://doi.org/10.3389/fmed.2021.621861
Jawadi Z, He R, Srivastava PK, Fonarow GC, Khalil SO, Krishnan S et al (2024) Predicting in-hospital mortality among patients admitted with a diagnosis of heart failure: a machine learning approach. ESC Heart Fail 2490–2498. https://doi.org/10.1002/ehf2.14796
Thorsen-Meyer H-C, Nielsen AB, Nielsen AP, Kaas-Hansen BS, Toft P, Schierbeck J et al (2020) Dynamic and explainable machine learning prediction of mortality in patients in the intensive care unit: a retrospective study of high-frequency data in electronic patient records. Lancet Digit Health 2:e179–e191. https://doi.org/10.1016/S2589-7500(20)30018-2
Deschepper M, Waegeman W, Vogelaers D, Eeckloo K (2020) Using structured pathology data to predict hospital-wide mortality at admission. PLoS ONE 15. https://doi.org/10.1371/journal.pone.0235117
Shah N, Konchak C, Chertok D, Au L, Kozlov A, Ravichandran U et al (2020) Clinical Analytics Prediction Engine (CAPE): Development, electronic health record integration and prospective validation of hospital mortality, 180-day mortality and 30-day readmission risk prediction models. PLoS ONE 15. https://doi.org/10.1371/journal.pone.0238065
Li C, Zhang Z, Ren Y, Nie H, Lei Y, Qiu H et al (2021) Machine learning based early mortality prediction in the emergency department. Int J Med Inf 155. https://doi.org/10.1016/j.ijmedinf.2021.104570
Giwangkancana GW, Anina HN, Sukandar H (2024) Predicting End-of-Life in a Hospital Setting. J Multidiscip Healthc 17. https://doi.org/10.2147/JMDH.S443425
Wu M, Gao H (2023) A prediction model for in-hospital mortality in intensive care unit patients with metastatic cancer. Front Surg 10. https://doi.org/10.3389/fsurg.2023.992936
Nie X, Cai Y, Liu J, Liu X, Zhao J, Yang Z et al (2021) Mortality Prediction in Cerebral Hemorrhage Patients Using Machine Learning Algorithms in Intensive Care Units. Front Neurol 11. https://doi.org/10.3389/fneur.2020.610531
Naemi A, Schmidt T, Mansourvar M, Naghavi-Behzad M, Ebrahimi A, Wiil UK (2021) Machine learning techniques for mortality prediction in emergency departments: A systematic review. BMJ Open 11. https://doi.org/10.1136/bmjopen-2021-052663
Huang T, Le D, Yuan L, Xu S, Peng X (2023) Machine learning for prediction of in-hospital mortality in lung cancer patients admitted to intensive care unit. PLoS ONE 18. https://doi.org/10.1371/journal.pone.0280606
Deng T, Hamdan H, Yaakob R, Kasmiran KA (2023) Personalized Federated Learning for In-Hospital Mortality Prediction of Multi-Center ICU. IEEE Access 11. https://doi.org/10.1109/ACCESS.2023.3241488
Maheswari BU, Ashik F, George A, Jose A (2023) In-Hospital Mortality Prognosis: Unmasking Patterns using Data Science and Explainable AI. 9th International Conference on Signal Processing and Communication, ICSC 2023. https://doi.org/10.1109/ICSC60394.2023.10441356
Seki T, Kawazoe Y, Ohe K (2021) Machine learning-based prediction of in-hospital mortality using admission laboratory data: A retrospective, single-site study using electronic health record data. PLoS ONE 16. https://doi.org/10.1371/journal.pone.0246640
Warman PI, Seas A, Satyadev N, Adil SM, Kolls BJ, Haglund MM et al (2022) Machine Learning for Predicting In-Hospital Mortality After Traumatic Brain Injury in Both High-Income and Low- and Middle-Income Countries. Neurosurgery 90. https://doi.org/10.1227/neu.0000000000001898
Sinha S, Dong T, Dimagli A, Judge A, Angelini GD (2024) A machine learning algorithm-based risk prediction score for in-hospital/30-day mortality after adult cardiac surgery. Eur J Cardiothorac Surg 66:ezae368. https://doi.org/10.1093/ejcts/ezae368
Lee B, Kim K, Hwang H, Kim YS, Chung EH, Yoon JS et al (2021) Development of a machine learning model for predicting pediatric mortality in the early stages of intensive care unit admission. Sci Rep 11. https://doi.org/10.1038/s41598-020-80474-z
Qiao EM, Qian AS, Nalawade V, Voora RS, Kotha NV, Vitzthum LK et al (2022) Evaluating High-Dimensional Machine Learning Models to Predict Hospital Mortality Among Older Patients With Cancer. JCO Clin Cancer Inf. https://doi.org/10.1200/cci.21.00186
Li L, Ding L, Zhang Z, Zhou L, Zhang Z, Xiong Y et al (2023) Development and Validation of Machine Learning–Based Models to Predict In-Hospital Mortality in Life-Threatening Ventricular Arrhythmias: Retrospective Cohort Study. J Med Internet Res 25. https://doi.org/10.2196/47664
Li M, Han S, Liang F, Hu C, Zhang B, Hou Q et al (2024) Machine Learning for Predicting Risk and Prognosis of Acute Kidney Disease in Critically Ill Elderly Patients During Hospitalization: Internet-Based and Interpretable Model Study. J Med Internet Res 26. https://doi.org/10.2196/51354
Lee SW, Lee HC, Suh J, Lee KH, Lee H, Seo S et al (2022) Multi-center validation of machine learning model for preoperative prediction of postoperative mortality. NPJ Digit Med 5. https://doi.org/10.1038/s41746-022-00625-6
Fang C, Pan Y, Zhao L, Niu Z, Guo Q, Zhao B (2022) A Machine Learning-Based Approach to Predict Prognosis and Length of Hospital Stay in Adults and Children With Traumatic Brain Injury: Retrospective Cohort Study. J Med Internet Res 24. https://doi.org/10.2196/41819
Kwun JS, Ahn HB, Kang SH, Yoo S, Kim S, Song W et al (2025) Developing a Machine Learning Model for Predicting 30-Day Major Adverse Cardiac and Cerebrovascular Events in Patients Undergoing Noncardiac Surgery: Retrospective Study. J Med Internet Res 27. https://doi.org/10.2196/66366
Muralitharan S, Nelson W, Di S, McGillion M, Devereaux PJ, Barr NG et al (2021) Machine learning–Based early warning systems for clinical deterioration: Systematic scoping review. J Med Internet Res 23. https://doi.org/10.2196/25187
Lv H, Yang X, Wang B, Wang S, Du X, Tan Q et al (2021) Machine Learning–Driven Models to Predict Prognostic Outcomes in Patients Hospitalized With Heart Failure Using Electronic Health Records: Retrospective Study. J Med Internet Res 23. https://doi.org/10.2196/24996
Estiri H, Strasser ZH, Klann JG, Naseri P, Wagholikar KB, Murphy SN (2021) Predicting COVID-19 mortality with electronic medical records. NPJ Digit Med 4. https://doi.org/10.1038/s41746-021-00383-x
Hsu C-N, Liu C-L, Tain Y-L, Kuo C-Y, Lin Y-C (2020) Machine Learning Model for Risk Prediction of Community-Acquired Acute Kidney Injury Hospitalization From Electronic Health Records: Development and Validation Study. J Med Internet Res 22. https://doi.org/10.2196/16903
Halasz G, Sperti M, Villani M, Michelucci U, Agostoni P, Biagi A et al (2021) A machine learning approach for mortality prediction in COVID-19 pneumonia: Development and evaluation of the Piacenza score. J Med Internet Res 23. https://doi.org/10.2196/29058

Additional Declarations
The authors declare no competing interests.
Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-9203293","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":610878933,"identity":"bae8d9d5-363b-46a0-ab32-e469357e330f","order_by":0,"name":"Sri Murdiati","email":"","orcid":"","institution":"Syiah Kuala University","correspondingAuthor":false,"prefix":"","firstName":"Sri","middleName":"","lastName":"Murdiati","suffix":""},{"id":610878934,"identity":"b87294e4-2784-462c-ac78-24ac2b7f3e67","order_by":1,"name":"Murnawan Murnawan","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA/klEQVRIiWNgGAWjYDADfgbGBgkkvgRjAyEtkg1gLQYkaDE4AFSGpIUBpxbdGbnPHvxss8szvpHceJun5o+8bvsBxg8/GCxkcWkxu5FubtjbllxsdiOx2ZrnmIHhtjMJzJI9DBLGuLWksUnwbmNO3HY7sU2ah82AcduBBAZpoCMT8WmR/LutPnHzbJCWfwb2284/YP5NSIs077bDiRukgVp42wwSt91IYMNvy5lnbNKy/44nzrj/sNlybp9x8rYbD9ssewzw+OU40GFvzlQn9vccf3jjzTc5223nkw/f+FFRhzPEUAATD5gCxYgBfpVwwPiDSIWjYBSMglEwsgAA8exa9tyw20MAAAAASUVORK5CYII=","orcid":"https://orcid.org/0000-0001-8919-8997","institution":"Widyatama University","correspondingAuthor":true,"prefix":"","firstName":"Murnawan","middleName":"","lastName":"Murnawan","suffix":""},{"id":610878935,"identity":"44b81494-7104-4072-a7ad-ba18105c153e","order_by":2,"name":"Safrizal Rahman","email":"","orcid":"","institution":"Syiah Kuala 
University","correspondingAuthor":false,"prefix":"","firstName":"Safrizal","middleName":"","lastName":"Rahman","suffix":""},{"id":610878936,"identity":"4b3b5833-7945-40c2-a623-7faec743ea91","order_by":3,"name":"Yoga Yuniadi","email":"","orcid":"","institution":"University of Indonesia","correspondingAuthor":false,"prefix":"","firstName":"Yoga","middleName":"","lastName":"Yuniadi","suffix":""},{"id":610878937,"identity":"f9755ac8-a57a-48b5-bf9a-d40f1232d355","order_by":4,"name":"Nirwana Lazuardi Sary","email":"","orcid":"","institution":"Syiah Kuala University","correspondingAuthor":false,"prefix":"","firstName":"Nirwana","middleName":"Lazuardi","lastName":"Sary","suffix":""}],"badges":[],"createdAt":"2026-03-23 17:20:08","currentVersionCode":1,"declarations":{"humanSubjects":true,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":true,"humanSubjectConsent":true,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-9203293/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-9203293/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":105363708,"identity":"343e5e18-f120-4f5b-b873-a89c44d92a78","added_by":"auto","created_at":"2026-03-25 08:21:50","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":127944,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of R\u003csup\u003e2\u003c/sup\u003e across predictive models\u003c/p\u003e","description":"","filename":"RScoreModelComparison.png","url":"https://assets-eu.researchsquare.com/files/rs-9203293/v1/aa13347f8258fe1a3a146e9f.png"},{"id":105363712,"identity":"d117c187-f3d0-4edf-a1ad-803c8efe4e17","added_by":"auto","created_at":"2026-03-25 08:21:50","extension":"png","order_by":2,"title":"Figure 
2","display":"","copyAsset":false,"role":"figure","size":123904,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of the RMSE scores from models\u003c/p\u003e","description":"","filename":"RMSEModelComparison.png","url":"https://assets-eu.researchsquare.com/files/rs-9203293/v1/f02b061f4054174f00ad3f08.png"},{"id":105363711,"identity":"e79f6cf1-fef3-4b23-bce2-82c51f416d18","added_by":"auto","created_at":"2026-03-25 08:21:50","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":117616,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of the MAE scores of models\u003c/p\u003e","description":"","filename":"MAEModelComparison.png","url":"https://assets-eu.researchsquare.com/files/rs-9203293/v1/845ac87f25982b41dbaee334.png"},{"id":105565782,"identity":"8aa370f1-18b0-4794-944e-a522e679818b","added_by":"auto","created_at":"2026-03-27 12:54:22","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":85747,"visible":true,"origin":"","legend":"\u003cp\u003eFeature Importance ranking from the Random Forest model\u003c/p\u003e","description":"","filename":"FeatureImportanceRandomForest.png","url":"https://assets-eu.researchsquare.com/files/rs-9203293/v1/ab53e1d0139309035e001201.png"},{"id":105565337,"identity":"124f8c39-ac21-4843-b4c7-cc41d4a68e2e","added_by":"auto","created_at":"2026-03-27 12:52:57","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":69891,"visible":true,"origin":"","legend":"\u003cp\u003eCorrelation heatmap\u003c/p\u003e","description":"","filename":"HeatmapCorrelationKeyPredictorsandGrossDeathRate.png","url":"https://assets-eu.researchsquare.com/files/rs-9203293/v1/e540575d6f8d06519b27071f.png"},{"id":105363709,"identity":"e9b21674-b885-4f5e-bb34-17bc65d49cb4","added_by":"auto","created_at":"2026-03-25 08:21:50","extension":"png","order_by":6,"title":"Figure 
6","display":"","copyAsset":false,"role":"figure","size":27519,"visible":true,"origin":"","legend":"\u003cp\u003eDistribution of monthly Gross Death Rate (GDR) in periods with higher versus lower prevalence of cardiogenic shock\u003c/p\u003e","description":"","filename":"DistributionofGrossDeathRateBasedonCardiogenicShock.png","url":"https://assets-eu.researchsquare.com/files/rs-9203293/v1/a9e87e408ff583d06604f9ed.png"},{"id":105363715,"identity":"aa4a5d7a-f0e1-43d3-8b6a-8037e7801d6f","added_by":"auto","created_at":"2026-03-25 08:21:50","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":26794,"visible":true,"origin":"","legend":"\u003cp\u003eDistribution of monthly Gross Death Rate (GDR) in periods with higher versus lower prevalence of heart failure cases\u003c/p\u003e","description":"","filename":"DistributionofGrossDeathRateBasedonHeartFailure.png","url":"https://assets-eu.researchsquare.com/files/rs-9203293/v1/6aa29423316649f96126f7d4.png"},{"id":105566250,"identity":"e6b02ad7-1f87-4894-9943-35e22b9eafe9","added_by":"auto","created_at":"2026-03-27 12:55:53","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":34548,"visible":true,"origin":"","legend":"\u003cp\u003eOLS residual plot\u003c/p\u003e","description":"","filename":"ResidualPlotforRegressionModel.png","url":"https://assets-eu.researchsquare.com/files/rs-9203293/v1/3e57e13166e729fd2d8027f1.png"},{"id":105570231,"identity":"842d5c1e-94b3-4114-80e3-23ce2af54874","added_by":"auto","created_at":"2026-03-27 13:15:36","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1757912,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9203293/v1/16375ef5-c532-4de3-961e-ea76ad68badd.pdf"}],"financialInterests":"The authors declare no competing 
interests.","formattedTitle":"\u003cp\u003e\u003cstrong\u003eEnsemble Machine Learning for Institutional-Level Hospital Mortality Prediction Using Clinical and Operational Indicators\u003c/strong\u003e\u003c/p\u003e","fulltext":[{"header":"1 Introduction","content":"\u003cdiv id=\"Sec2\" class=\"Section2\"\u003e \u003ch2\u003e1.1 Research Background\u003c/h2\u003e \u003cp\u003eAccurate prediction of inpatient mortality remains a central challenge in modern medicine and has stimulated the development of advanced computational approaches to improve prognostic power and guide clinical decision-making. Machine learning (ML) techniques use high-dimensional electronic health records (EHRs) to uncover predictive patterns. These patterns can support early identification of high-risk patients and enable timely clinical intervention that may reduce hospital mortality [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. Recent empirical studies indicate that ML approaches (including ensemble learning, deep neural architectures, and decision tree families) tend to outperform conventional statistical models. For example, decision tree models have achieved area under the curve (AUC) values close to 0.96 in intensive care settings, and boosting-based ensemble classifiers have outperformed regression baselines in selected groups such as patients with heart failure [\u003cspan additionalcitationids=\"CR3\" citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. However, concerns about generalizability, interpretability, and external validation persist and limit routine deployment. 
Concurrently, a multifactorial perspective that combines clinical determinants with operational signals has emerged; studies show that combining structured admission data improves the assessment of mortality risk, while the integration of real-time EHRs provides hospital-wide predictive capacity [\u003cspan additionalcitationids=\"CR6 CR7 CR8 CR9 CR10\" citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e1.2 Research Gap\u003c/h2\u003e \u003cp\u003eDespite this rapid progress, four gaps persist. First, many models still favour administrative variables and underutilize critical clinical biomarkers and physiological parameters, which limits predictive accuracy and the transportability of findings across populations [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. Second, interpretability is uneven: black-box behaviour may undermine transparency and clinician trust, making models difficult to incorporate smoothly into clinical workflows [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. Third, the evidence base is predominantly intensive care unit (ICU)-centered and single-site; hospital-wide performance metrics are rarely used, and model performance deteriorates over time in the absence of maintenance strategies [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. 
Fourth, external validity is not always carefully evaluated; cross-institutional verification is limited, and models are rarely tailored to high-risk sub-populations (e.g., paediatric, oncology) [\u003cspan additionalcitationids=\"CR17 CR18 CR19\" citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]. Addressing these deficiencies is essential to ensure that ML-based predictions become clinically actionable and broadly adoptable.\u003c/p\u003e \u003cp\u003eThis study makes three main contributions to the field of health informatics and hospital analytics. First, it proposes an institutional-level mortality prediction framework that integrates hospital operational indicators with clinical severity variables. Second, it evaluates the predictive performance of stacked ensemble machine learning models compared with individual machine learning algorithms and a conventional linear regression baseline. Third, the study incorporates explainable artificial intelligence using SHAP to enhance model transparency and support interpretable decision-making for hospital-wide mortality monitoring.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e1.3 Research Problem\u003c/h2\u003e \u003cp\u003eCurrent hospital mortality prediction models often overlook institution-level performance measures, which limits the operational utility of risk scores in day-to-day decisions such as bed allocation, staffing, and timely escalation. In real-world clinical practice, key metrics such as Bed Occupancy Rate (BOR), Length of Stay (LOS), and Bed Turnover Rate (BTR) are often not integrated with clinical covariates, which undermines generalizability, actionability, and interpretability across heterogeneous wards. 
This research therefore aims to address the lack of a validated, explainable ML approach that simultaneously uses performance and clinical data to provide accurate, calibrated, and operationally meaningful mortality risk estimates for hospital-wide deployment [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan additionalcitationids=\"CR22 CR23 CR24 CR25\" citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e1.4 Research Questions\u003c/h2\u003e \u003cp\u003e \u003cb\u003eRQ1.\u003c/b\u003e Do ensemble machine learning frameworks that integrate hospital operational metrics (bed occupancy rate, length of stay, and bed turnover rate) with clinical covariates outperform individual machine learning models and a linear regression baseline in predicting institutional-level in-hospital mortality burden, as measured by Gross Death Rate?\u003c/p\u003e \u003cp\u003e \u003cb\u003eRQ2.\u003c/b\u003e What is the relative contribution of clinical severity indicators compared with hospital operational metrics in explaining variation in institutional-level in-hospital mortality?\u003c/p\u003e \u003cp\u003e \u003cb\u003eRQ3.\u003c/b\u003e Can explainable artificial intelligence techniques, such as SHAP, enhance the transparency and interpretability of ensemble machine learning models without compromising predictive performance?\u003c/p\u003e \u003cp\u003e \u003cb\u003eRQ4.\u003c/b\u003e Does feature engineering, particularly the inclusion of interaction terms among hospital operational indicators, improve predictive accuracy and model stability for institutional-level mortality prediction?\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e1.5 Research Objectives\u003c/h2\u003e \u003cp\u003e 
\u003cb\u003eO1.\u003c/b\u003e To compare the predictive performance of ensemble machine learning models, individual machine learning algorithms, and a linear regression baseline using two types of metrics: explained variance (R\u003csup\u003e2\u003c/sup\u003e) and prediction error (RMSE, MAE).\u003c/p\u003e \u003cp\u003e \u003cb\u003eO2.\u003c/b\u003e To measure the relative importance of clinical and operational predictors of Gross Death Rate using feature-importance analyses, correlation analyses, and regression diagnostics.\u003c/p\u003e \u003cp\u003e \u003cb\u003eO3.\u003c/b\u003e To assess model interpretability using SHAP-based global and local feature attribution and post-hoc regression analysis.\u003c/p\u003e \u003cp\u003e \u003cb\u003eO4.\u003c/b\u003e To assess the effect of feature engineering, specifically interaction terms between operational metrics, on predictive accuracy and error stability.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e1.6 Significance of the Study\u003c/h2\u003e \u003cp\u003eFrom a theoretical perspective, this study bridges health care operations and the clinical aspects of risk prediction by formalizing the role of hospital performance metrics in mortality modelling, thereby addressing the historical ICU-centric bias [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]. Methodologically, it introduces a replicable workflow combining machine learning with explainable AI that supports transparent performance evaluation and institutional-level decision support [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. 
Practically, integrating performance and clinical signals may improve patient safety and throughput by enabling earlier risk stratification, better bed and staff planning, and more timely clinical response [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]. At the systems level, the framework can inform quality dashboards and performance contracts, identify operational indicators correlated with outcomes, clarify boundary conditions, and encourage external validation, prospective trials, and fairness audits across heterogeneous hospitals [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e"},{"header":"2 Methods","content":"\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Study Design and Overall Workflow\u003c/h2\u003e \u003cp\u003eThis study uses a retrospective experimental design in which machine learning (ML) models are applied to predict hospital mortality risk from institutional performance metrics and clinical determinants. 
The set of models tested includes both individual ML algorithms and an ensemble learning approach.\u003c/p\u003e \u003cp\u003eThe models that were assessed include:\u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eIndividual Models: Linear Regression, Random Forest, XGBoost, and Neural Network\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eEnsemble Model: a stacked ensemble that combines Random Forest and XGBoost as base learners, with a Neural Network as the meta-learner.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003eThese models were chosen because they can capture the complex non-linearity of clinical data. Linear Regression is used as a baseline because of its interpretability and because it provides a benchmark against more complex algorithms. Random Forest and XGBoost, as tree-based algorithms, can capture non-linear relationships, while Neural Networks are well suited to learning complex feature interactions. The ensemble strategy combines the strengths of these constituent algorithms, which may improve predictive performance and generalizability. Neural Networks contribute flexible predictive capacity, while tree-based models such as Random Forest and XGBoost are comparatively easier to interpret, which makes the ensemble more robust and clinically usable.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e2.2 Data Source and Study Setting\u003c/h2\u003e \u003cp\u003eThe dataset used in this study was obtained from Rumah Sakit Umum Daerah (RSUD) Dr. Zainoel Abidin, a regional referral hospital located in Banda Aceh, Indonesia. The hospital serves as a major tertiary care centre that receives patients from multiple districts across the province. 
The dataset contains aggregated institutional-level records of hospital admissions and clinical indicators collected over a 36-month observation period from January 2022 to December 2024.\u003c/p\u003e \u003cp\u003eThe collected dataset integrates both clinical severity indicators and hospital operational metrics, enabling the investigation of their combined influence on institutional mortality patterns. Administrative indicators include operational measures commonly used in hospital performance monitoring, such as Length of Stay (LOS), Bed Occupancy Rate (BOR), and Bed Turnover Rate (BTR). These metrics reflect hospital capacity utilisation and patient flow dynamics, which may indirectly influence mortality risk at the institutional level.\u003c/p\u003e \u003cp\u003eIn addition to administrative indicators, several clinical variables associated with severe cardiovascular complications were incorporated into the dataset. These include Heart Failure, Cardiogenic Shock, and Deep Vein Thrombosis (DVT), which have been reported in previous studies as important predictors of adverse clinical outcomes. 
The inclusion of both operational and clinical variables enables the present study to examine the relative contribution of system-level factors and patient-level severity indicators within a unified predictive modelling framework.\u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e summarises the main characteristics of the dataset used in this study, including the observation period, hospital activity indicators, and mortality statistics.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSummary of dataset characteristics\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCharacteristic\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eValue\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHospital type\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRegional referral hospital\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eObservation period\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eJanuary 2022 \u0026ndash; December 2024\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTotal observation units\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e36 monthly records\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTotal hospital admissions\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e42,685\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTotal in-hospital deaths\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1,142\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAverage monthly admissions\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1,186\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGross Death Rate (mean)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2.67%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGross Death Rate (range)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1.90% \u0026ndash; 3.52%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNet Death Rate (mean)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1.84%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAverage Length of Stay (LOS)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e5.7 days\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBed Occupancy Rate (BOR)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c2\"\u003e \u003cp\u003e73.4%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBed Turnover Rate (BTR)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e6.2 times/month\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAverage Bed Turnover Interval (TOI)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1.9 days\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNumber of predictor variables\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e18\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOutcome variable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGross Death Rate (GDR)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe dataset comprises 36 monthly institutional observations. Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e summarises the key dataset characteristics and the outcome variable used in this study.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e2.3 Dataset Characteristics and Variables\u003c/h2\u003e \u003cp\u003eThe dataset used in this study consists of aggregated institutional-level observations derived from hospital activity records, rather than individual patient-level data. Each observation unit represents a monthly summary of hospital performance indicators and clinical severity variables. 
This aggregation allows the modelling framework to capture hospital-wide mortality patterns and operational dynamics across the study period.\u003c/p\u003e \u003cp\u003eOverall, the dataset contains 36 monthly observations covering the period from January 2022 to December 2024, representing institutional summaries of admissions, mortality outcomes, and operational indicators. During this period, the hospital recorded 42,685 admissions and 1,142 in-hospital deaths, corresponding to an average Gross Death Rate (GDR) of approximately 2.67%. Additional operational statistics such as Bed Occupancy Rate (BOR), Bed Turnover Rate (BTR), and Bed Turnover Interval (TOI) were included to describe hospital capacity utilisation and patient flow dynamics.\u003c/p\u003e \u003cp\u003eThe dataset includes 18 predictor variables, representing a combination of administrative indicators and aggregated clinical severity measures. These variables were selected based on their relevance in prior hospital mortality studies and their availability within institutional reporting systems.\u003c/p\u003e \u003cp\u003eFor analytical clarity, the variables were organised into three categories: target variable, administrative indicators, and clinical indicators.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003e1. Target Variable\u003c/h3\u003e\n\u003cp\u003eThe primary outcome variable in this study is Gross Death Rate (GDR). GDR represents the proportion of patients who died during hospitalisation relative to the total number of hospital admissions within a defined observation period. In this study, GDR is treated as a continuous institutional-level outcome, enabling the modelling of variations in overall mortality burden across time.\u003c/p\u003e \u003cp\u003eUsing GDR as the outcome variable allows the model to capture hospital-wide mortality dynamics rather than individual patient outcomes. 
This aggregated perspective is particularly relevant for institutional risk monitoring, quality benchmarking, and operational planning, where decision-making is often based on hospital-level indicators.\u003c/p\u003e\n\u003ch3\u003e2. Administrative Indicators\u003c/h3\u003e\n\u003cp\u003eAdministrative variables describe hospital operational performance and resource utilisation. These indicators are widely used in hospital management and health system evaluation because they reflect patient flow and capacity utilisation. The administrative indicators included in this study are:\u003c/p\u003e \u003cp\u003e \u003col style=\"list-style-type:lower-alpha;\"\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eLength of Stay (LOS): the average duration of hospitalisation for admitted patients.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eBed Occupancy Rate (BOR): the proportion of hospital beds occupied during a given time period.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eBed Turnover Rate (BTR): the frequency at which hospital beds are used by different patients within a specified time interval.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003eThese variables capture operational constraints within the hospital system and may indirectly influence mortality outcomes through factors such as patient throughput, bed availability, and resource allocation.\u003c/p\u003e\n\u003ch3\u003e3. Clinical Indicators\u003c/h3\u003e\n\u003cp\u003eClinical variables represent aggregated indicators of patient severity and underlying health conditions associated with increased mortality risk. 
The clinical indicators included in this study are:\u003c/p\u003e \u003cp\u003e \u003col style=\"list-style-type:lower-alpha;\"\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eHeart Failure, representing the prevalence of cardiac dysfunction among hospitalised patients.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eCardiogenic Shock, a severe cardiac complication associated with high mortality risk.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eDeep Vein Thrombosis (DVT), a clinical condition associated with elevated risk of adverse outcomes.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003eThese clinical indicators capture patterns of disease severity across hospital admissions and provide important contextual information for mortality prediction models.\u003c/p\u003e \u003cp\u003eThe combination of clinical severity indicators and hospital operational metrics enables the modelling framework to examine how both patient-level severity patterns and system-level operational conditions contribute to variations in institutional mortality burden. This integrated perspective supports a more comprehensive understanding of mortality dynamics within hospital systems.\u003c/p\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e2.4 Outcome Definition\u003c/h2\u003e \u003cp\u003eThe primary outcome of this study is the Gross Death Rate (GDR), defined as the proportion of in-hospital deaths relative to the total number of hospital admissions within a defined observation period. 
In this study, GDR is treated as a continuous institutional-level outcome variable, representing variations in the overall mortality burden experienced by the hospital across time.\u003c/p\u003e \u003cp\u003eGross Death Rate is widely used as a hospital performance indicator in health system evaluation and quality monitoring because it captures cumulative mortality outcomes across heterogeneous wards and patient populations. Unlike patient-level binary outcomes, GDR reflects the aggregate mortality burden at the institutional level, making it suitable for analysing system-wide performance patterns and operational decision contexts.\u003c/p\u003e \u003cp\u003eAlthough GDR does not represent individual patient mortality events directly, it serves as an aggregated proxy for underlying patient-level mortality dynamics. By modelling this institutional-level outcome, the study aims to capture the combined influence of clinical severity indicators and hospital operational conditions on overall mortality patterns. This formulation enables the integration of operational indicators such as Length of Stay (LOS), Bed Occupancy Rate (BOR), and Bed Turnover Rate (BTR) with clinical variables within a unified predictive modelling framework.\u003c/p\u003e \u003cp\u003eIn addition to the primary outcome variable, several mortality-related indicators were incorporated as contextual predictors to represent broader institutional performance patterns. Importantly, these indicators were calculated from historical time windows preceding the prediction horizon, ensuring that the predictors did not include contemporaneous mortality information associated with the outcome variable. 
This methodological design was implemented to prevent target leakage, ensuring that the predictive variables represent historical institutional patterns rather than direct components of the outcome variable.\u003c/p\u003e \u003cp\u003eIn addition, mortality-related predictors such as Net Death Rate were computed using lagged historical windows (t-1), ensuring that mortality information from the prediction period was not included as input features for the model.\u003c/p\u003e \u003cp\u003eBy focusing on an aggregated institutional mortality indicator, this study prioritises applicability to hospital-level risk surveillance, operational planning, and early identification of periods associated with elevated mortality burden. This institutional perspective complements existing patient-level mortality prediction models by addressing a distinct analytical level relevant for hospital management and system-level decision-making.\u003c/p\u003e \u003cp\u003eFuture research may extend this framework to incorporate patient-level outcomes or time-to-event survival modelling, enabling a more granular analysis of mortality risk while maintaining the integration of operational hospital indicators.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e2.5 Data Preprocessing and Feature Engineering\u003c/h2\u003e \u003cp\u003ePrior to model training, several preprocessing procedures were applied to ensure data quality, improve model stability, and enhance the predictive capacity of the machine learning algorithms. These procedures included missing value handling, categorical encoding, feature scaling, and feature engineering techniques designed to capture complex relationships among hospital performance indicators and clinical variables.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003e1. Handling Missing Values\u003c/h3\u003e\n\u003cp\u003eMissing values within the dataset were handled using median imputation. 
This approach was chosen because median imputation is less sensitive to extreme values than mean-based imputation and therefore reduces potential bias caused by outliers in aggregated hospital indicators. By replacing missing entries with the median value of the corresponding variable, the overall distribution of the data was preserved while maintaining numerical stability during model training.\u003c/p\u003e\n\u003ch3\u003e2. Categorical Variable Encoding\u003c/h3\u003e\n\u003cp\u003eCertain variables in the dataset were categorical in nature and therefore required transformation into numerical representations before being used in machine learning models. Label encoding was applied to convert categorical variables into numerical values. This transformation enabled the algorithms to interpret categorical attributes while maintaining computational efficiency.\u003c/p\u003e\n\u003ch3\u003e3. Feature Scaling\u003c/h3\u003e\n\u003cp\u003eTo ensure consistency across variables measured on different scales, feature normalization was performed using the MinMaxScaler technique. Feature scaling is particularly important for algorithms such as neural networks, where differences in feature magnitude may affect gradient-based optimization. By transforming all variables into a comparable numerical range, scaling improves convergence during training and enhances overall model stability.\u003c/p\u003e\n\u003ch3\u003e4. Feature Selection\u003c/h3\u003e\n\u003cp\u003eTo identify the most relevant predictors of institutional mortality patterns, feature selection techniques were applied prior to model development. 
Two complementary approaches were used:\u003c/p\u003e \u003cp\u003e \u003col style=\"list-style-type:lower-alpha;\"\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eRecursive Feature Elimination (RFE), which iteratively removes less informative variables based on model performance.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eSHAP-based feature importance analysis, which evaluates the contribution of each predictor variable to the model output.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003eThese methods allowed the study to focus on the most influential predictors while reducing potential noise from less informative variables.\u003c/p\u003e\n\u003ch3\u003e5. Feature Engineering and Interaction Terms\u003c/h3\u003e\n\u003cp\u003eIn addition to feature selection, feature engineering techniques were implemented to capture potential nonlinear relationships and synergistic effects among hospital operational indicators. 
Specifically, interaction terms were constructed among selected administrative variables, including:\u003c/p\u003e \u003cp\u003e \u003col style=\"list-style-type:lower-alpha;\"\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eLength of Stay \u0026times; Bed Occupancy Rate (LOS \u0026times; BOR)\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eBed Occupancy Rate \u0026times; Bed Turnover Rate (BOR \u0026times; BTR)\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003eThese interaction terms were introduced to represent complex operational dynamics within hospital systems, where the combined effect of resource utilisation indicators may influence mortality outcomes differently than individual variables considered in isolation.\u003c/p\u003e \u003cp\u003eThe inclusion of interaction features was guided by empirical correlation analysis, domain knowledge, and prior literature on hospital performance metrics. By incorporating such engineered variables, the modelling framework is better able to capture underlying system-level constraints and operational conditions that may influence institutional mortality patterns.\u003c/p\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003e2.6 Model Development and Hyperparameter Optimization\u003c/h2\u003e \u003cp\u003eTo evaluate the predictive capability of machine learning approaches for institutional-level mortality estimation, several models were developed and compared within a unified experimental framework. The modelling strategy consisted of a conventional statistical baseline model, multiple machine learning algorithms, and an ensemble learning architecture designed to capture complex relationships among clinical and operational variables.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003e1. Baseline Model\u003c/h3\u003e\n\u003cp\u003eA Linear Regression model was implemented as the baseline approach. 
Linear regression is widely used in health services research due to its interpretability and its ability to provide a clear benchmark against which more complex algorithms can be evaluated. By including a baseline statistical model, the study allows for a transparent comparison between traditional regression-based prediction and machine learning methods.\u003c/p\u003e\n\u003ch3\u003e2. Machine Learning Models\u003c/h3\u003e\n\u003cp\u003eThree machine learning algorithms were selected for model development based on their ability to capture nonlinear relationships and complex interactions among predictors:\u003c/p\u003e \u003cp\u003ea. Random Forest (RF).\u003c/p\u003e\u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eRandom Forest is an ensemble tree-based algorithm that constructs multiple decision trees and aggregates their predictions. This method is effective for modelling nonlinear relationships and is robust to noise and overfitting, making it suitable for healthcare datasets that contain heterogeneous variables.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e\u003cp\u003eb. Extreme Gradient Boosting (XGBoost).\u003c/p\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eXGBoost is a boosting-based algorithm that sequentially builds decision trees to minimise prediction errors. By iteratively correcting residual errors from previous trees, XGBoost often achieves high predictive accuracy and has been widely applied in clinical prediction modelling.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003ec. Neural Network (Multilayer Perceptron).\u003c/p\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eA feed-forward neural network model was implemented using a multilayer perceptron (MLP) architecture. 
Neural networks are capable of capturing complex nonlinear patterns and feature interactions within high-dimensional datasets, making them suitable for modelling intricate relationships between clinical indicators and hospital operational metrics.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e\n\u003ch3\u003e3. Ensemble Learning Architecture\u003c/h3\u003e\n\u003cp\u003eIn addition to individual machine learning models, a stacked ensemble model was developed to improve predictive performance and model robustness. Stacked ensemble learning combines predictions from multiple base learners to generate a final prediction using a meta-learning algorithm.\u003c/p\u003e \u003cp\u003eIn this study, Random Forest and XGBoost were used as base learners, while a Neural Network served as the meta-learner responsible for combining the outputs of the base models. This architecture allows the ensemble model to integrate complementary strengths from different learning algorithms. Tree-based models capture nonlinear decision boundaries, while neural networks can learn higher-order interactions among predictors.\u003c/p\u003e \u003cp\u003eBy aggregating predictions from heterogeneous learners, the ensemble approach reduces variance and improves generalisation performance, which is particularly important for modelling institutional-level mortality dynamics influenced by both clinical severity and operational hospital conditions.\u003c/p\u003e\n\u003ch3\u003e4. Hyperparameter Optimization\u003c/h3\u003e\n\u003cp\u003eTo improve model performance and avoid overfitting, hyperparameter tuning was conducted using a grid search strategy combined with cross-validation. 
The grid search procedure systematically evaluated multiple combinations of hyperparameters for each machine learning algorithm to identify the configuration that yielded the best predictive performance.\u003c/p\u003e \u003cp\u003eThe hyperparameters explored during model tuning included:\u003c/p\u003e \u003cp\u003e \u003col style=\"list-style-type:lower-alpha;\"\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eRandom Forest: number of trees and maximum tree depth\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eXGBoost: learning rate and number of boosting iterations\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eNeural Network: hidden layer architecture and learning rate\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003eModel tuning was conducted using five-fold cross-validation, allowing the models to be trained and evaluated across multiple data partitions. This procedure improves the robustness of the estimated model performance and reduces the risk of overfitting to a specific training subset.\u003c/p\u003e \u003cp\u003eThe combination of baseline modelling, individual machine learning algorithms, and stacked ensemble learning provides a comprehensive framework for evaluating the effectiveness of advanced predictive techniques in modelling institutional-level hospital mortality.\u003c/p\u003e \u003cdiv id=\"Sec27\" class=\"Section2\"\u003e \u003ch2\u003e2.7 Validation Strategy and Performance Metrics\u003c/h2\u003e \u003cp\u003eGiven the temporal nature of monthly institutional observations, we used a time-aware cross-validation strategy (blocked/forward-chaining) to avoid information leakage across adjacent months. Model performance is reported as the average across folds.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003e1. 
Cross-Validation Strategy\u003c/h3\u003e\n\u003cp\u003eGiven that the dataset consists of temporally ordered monthly institutional observations, a time-aware cross-validation strategy was applied. Specifically, a five-fold cross-validation procedure was implemented while preserving the temporal structure of the data to minimize information leakage across adjacent time periods. This approach ensures that model training and validation are performed on temporally separated subsets of the data, thereby providing a more realistic estimate of predictive performance for institutional mortality forecasting.\u003c/p\u003e \u003cp\u003eCross-validation provides a more reliable estimate of model performance compared with a single train\u0026ndash;test split, particularly for datasets with relatively limited sample sizes. By evaluating models across multiple data partitions, this strategy reduces the risk that model performance estimates are biased by a specific subset of observations.\u003c/p\u003e \u003cp\u003eThe final model performance was calculated as the average performance across the five validation folds, providing a more stable estimate of predictive accuracy. This validation design helps mitigate potential temporal dependence among observations and improves the robustness of model evaluation for longitudinal institutional datasets.\u003c/p\u003e\n\u003ch3\u003e2. Performance Evaluation Metrics\u003c/h3\u003e\n\u003cp\u003eTo comprehensively assess model performance, several complementary evaluation metrics were used. These metrics capture different aspects of predictive accuracy and allow meaningful comparison between traditional statistical models and machine learning approaches.\u003c/p\u003e \u003cp\u003eThe evaluation metrics used in this study include:\u003c/p\u003e \u003cp\u003ea. Mean Absolute Error (MAE).\u003c/p\u003e\u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eMAE measures the average magnitude of prediction errors without considering their direction. 
It provides a straightforward interpretation of how close the predicted values are to the actual outcomes.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003eb. Mean Squared Error (MSE).\u003c/p\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eMSE evaluates the average squared difference between predicted and observed values. By penalising larger errors more heavily, MSE highlights models that produce extreme prediction deviations.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e\u003cp\u003ec. Root Mean Squared Error (RMSE).\u003c/p\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eRMSE is the square root of the mean squared error and expresses prediction error in the same units as the outcome variable. It is commonly used in predictive modelling to provide an intuitive measure of model accuracy.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003ed. Coefficient of Determination (R\u003csup\u003e2\u003c/sup\u003e).\u003c/p\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eR\u003csup\u003e2\u003c/sup\u003e measures the proportion of variance in the outcome variable that can be explained by the predictive model. Higher R\u003csup\u003e2\u003c/sup\u003e values indicate better model fit and stronger explanatory capability.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e\n\u003ch3\u003e3. Model Comparison\u003c/h3\u003e\n\u003cp\u003eThe performance of the baseline linear regression model, individual machine learning algorithms, and the ensemble learning architecture was compared using the evaluation metrics described above. This comparative evaluation allows the study to assess whether advanced machine learning approaches provide meaningful improvements over traditional statistical modelling.\u003c/p\u003e \u003cp\u003eThe final model selection was based on a combination of prediction accuracy, robustness across validation folds, and model stability. 
By applying multiple evaluation metrics and cross-validation procedures, the study aims to select a model that performs consistently across data partitions and does not depend on a single training configuration.\u003c/p\u003e"},{"header":"3 Results","content":"\u003cp\u003eThis study applies ensemble machine learning to estimate the Gross Death Rate (GDR) from hospital performance indicators and patients' clinical features. The results comprise a comparative evaluation of model performance, an analysis of feature importance, correlation analyses, a visual examination of the study hypotheses, regression diagnostics, and a synthesis of the main findings.\u003c/p\u003e \u003cdiv id=\"Sec32\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Predictive Performance of Models\u003c/h2\u003e \u003cp\u003ePredictive accuracy was measured using five-fold time-aware cross-validation with three metrics: the coefficient of determination (R\u003csup\u003e2\u003c/sup\u003e), root mean squared error (RMSE), and mean absolute error (MAE). Higher R\u003csup\u003e2\u003c/sup\u003e and lower RMSE and MAE indicate better model performance. 
The results of the cross-validation for the separate models are contained in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eCross-validation results of predictive models\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\varvec{R}}^{2}\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRMSE\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMAE\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRandom Forest (RF)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.733\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e5.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e3.54\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLinear Regression (LR)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.777\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e5.33\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e3.58\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eXGBoost (XGB)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.575\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e7.36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e4.59\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNeural Network (NN)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.801\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e5.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e2.74\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eStacked Ensemble (NN\u0026thinsp;+\u0026thinsp;RF+XGB)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.841\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e4.49\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e3.39\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eFrom 
Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, it can be seen that the stacked ensemble model, which combines a neural network, random forest, and XGBoost, had the highest R\u003csup\u003e2\u003c/sup\u003e (0.841) and the lowest RMSE (4.49). In contrast, XGBoost showed the weakest predictive ability, with the lowest R\u003csup\u003e2\u003c/sup\u003e (0.575) and the highest RMSE (7.36).\u003c/p\u003e \u003cp\u003eThe weaker performance of XGBoost may reflect the relatively small dataset size and the aggregated nature of the institutional indicators, which may limit the boosting algorithm's ability to capture complex hierarchical patterns.\u003c/p\u003e \u003cp\u003eAlthough the stacked ensemble achieved the highest R\u003csup\u003e2\u003c/sup\u003e and the lowest RMSE, the neural network produced the lowest MAE (2.74, compared with 3.39 for the ensemble). This indicates a trade-off between variance explanation and absolute error; the ensemble was selected as the primary model because of its overall robustness across metrics.\u003c/p\u003e \u003cp\u003eFigures\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e provide a visual comparison of predictive performance. Figure\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e shows that the stacked ensemble achieves the highest explained variance (R\u003csup\u003e2\u003c/sup\u003e), whereas Figs.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e and \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e indicate that it also yields the lowest RMSE. 
In contrast, the neural network achieves the lowest MAE, highlighting a trade-off between variance explanation and absolute error.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec33\" class=\"Section2\"\u003e \u003ch2\u003e3.2 Feature Importance and SHAP Interpretation\u003c/h2\u003e \u003cp\u003eTo identify the most important predictors of GDR, a feature importance analysis was performed using a random forest model, complemented by a correlation analysis.\u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e presents the five most influential predictors identified by the feature importance analysis; the remaining variables had substantially lower importance scores.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eMost influential predictors of Gross Death Rate (GDR).\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFeature\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eImportance Score\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNet Death Rate\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e 
\u003cp\u003e0.39\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHeart Failure\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.35\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCardiogenic Shock\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.28\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLength of Stay (LOS)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.26\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBed Turnover Rate (BTR)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.22\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e indicates that Net Death Rate (importance score\u0026thinsp;=\u0026thinsp;0.39) is the most influential predictor, followed by Heart Failure (importance score\u0026thinsp;=\u0026thinsp;0.35) and Cardiogenic Shock (importance score\u0026thinsp;=\u0026thinsp;0.28). This pattern suggests that mortality-related indicators and acute cardiac conditions are the main drivers of variation in GDR.\u003c/p\u003e \u003cp\u003eTo prevent target leakage, mortality-related indicators such as Net Death Rate were calculated from historical time windows preceding the prediction period and therefore did not include contemporaneous mortality information associated with the target variable (Gross Death Rate). 
This design ensures that the predictors represent prior institutional patterns rather than direct components of the outcome variable.\u003c/p\u003e \u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e visually supports these findings, showing that clinical conditions have a more substantial influence on mortality prediction than hospital operational metrics.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec34\" class=\"Section2\"\u003e \u003ch2\u003e3.3 Correlation Analysis\u003c/h2\u003e \u003cp\u003eTo further clarify the relationships among variables in the dataset, a correlation heatmap (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e) was created. The heatmap visualises the magnitude and direction of each correlation, which aids in identifying patterns, while the underlying correlation matrix gives the exact numerical relationships between the predictors.\u003c/p\u003e \u003cp\u003eKey findings from Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e include:\u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eCardiogenic Shock has a moderate positive correlation with GDR (r\u0026thinsp;=\u0026thinsp;0.50), supporting its role as a key correlate of institutional-level mortality.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eHeart Failure shows a moderate negative correlation with GDR (r = -0.36). 
These correlations should be interpreted as descriptive associations rather than causal relationships, as institutional case-mix differences and referral patterns may influence the observed relationships.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eLength of Stay (LOS) is negatively associated with GDR (r = -0.32): months with longer stays tended to have lower mortality rates, although this association does not establish that longer stays reduce mortality.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec35\" class=\"Section2\"\u003e \u003ch2\u003e3.4 Regression Diagnostics and Distributional Evidence\u003c/h2\u003e \u003cp\u003eTo further explore the relationship between clinical severity indicators and institutional mortality burden, we examined the distribution of monthly Gross Death Rate (GDR) across periods characterised by different levels of clinical severity indicators. Because the dataset consists of aggregated institutional observations rather than individual patient records, the analysis compares months with relatively higher versus lower prevalence of specific clinical conditions. This approach enables descriptive assessment of how fluctuations in clinical severity indicators correspond to variations in institutional-level mortality patterns.\u003c/p\u003e \u003cp\u003eThese distributions suggest that periods with a higher prevalence of cardiogenic shock correspond to higher GDR, whereas periods with higher heart failure prevalence correspond to lower GDR. This inverse association may reflect context-specific care pathways, structured disease management programmes, or referral patterns, rather than a direct protective effect.\u003c/p\u003e \u003cp\u003eTo provide a comparison with the machine learning methods, an Ordinary Least Squares (OLS) regression was also fitted. 
The residual plot (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e) shows non-random patterns, which suggests that the linear model is mis-specified because of non-linear effects.\u003c/p\u003e \u003cp\u003eThis analysis supports the suitability of machine learning approaches for modelling non-linear associations.\u003c/p\u003e \u003c/div\u003e"},{"header":"4 Discussion","content":"\u003cdiv id=\"Sec37\" class=\"Section2\"\u003e \u003ch2\u003e4.1 Principal Findings\u003c/h2\u003e \u003cp\u003eThis study investigated the performance of ensemble machine learning models for predicting the Gross Death Rate (GDR) at the hospital level using both clinical severity and hospital operational measures. Three main findings emerged. First, the stacked ensemble achieved the highest R\u003csup\u003e2\u003c/sup\u003e and the lowest RMSE among the evaluated models, including linear regression, in predicting aggregated in-hospital mortality burden, although the neural network achieved a lower MAE. Second, clinical variables, especially cardiogenic shock and heart failure, were the major factors in mortality prediction, whereas the administrative variables showed little direct predictive value. Third, feature engineering, in particular the introduction of interaction terms among operational indicators, enhanced the stability of the models and improved predictive accuracy.\u003c/p\u003e \u003cp\u003eUnlike many mortality prediction studies that evaluate patient-level risk using clinical scoring systems such as SOFA or APACHE, the present study focuses on institutional-level mortality burden measured through Gross Death Rate. Because these clinical scoring systems are designed for individual patient risk stratification rather than hospital-wide performance indicators, direct comparison with institutional mortality metrics would not be methodologically appropriate. 
Instead, this study evaluates the relative contribution of clinical severity indicators and operational hospital metrics within a machine learning framework designed for system-level decision support.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec38\" class=\"Section2\"\u003e \u003ch2\u003e4.2 Ensemble Learning Superiority for Institutional-Level Prediction of Mortality\u003c/h2\u003e \u003cp\u003eThe superior performance of the stacked ensemble model is consistent with established ensemble learning theory, which holds that combining heterogeneous learners can reduce variance and improve generalisation by capturing complementary data structures. In this study, combining tree-based models with a neural network allowed the ensemble to learn non-linear and interaction effects that the individual learners did not adequately represent. The increase in explained variance (R\u003csup\u003e2\u003c/sup\u003e\u0026thinsp;=\u0026thinsp;0.84, compared with 0.80 for the best single model) supports the suitability of ensemble architectures for modelling complex hospital-level mortality dynamics, in which both clinical severity and system-level operational conditions shape outcomes.\u003c/p\u003e \u003cp\u003eImportantly, the performance gains were accompanied by stable error distributions, which suggests that the ensemble is less prone to overfitting the training data and is therefore more robust. 
This characteristic is particularly relevant for institutional-level applications, where stable predictions are needed for operational planning and quality monitoring.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec39\" class=\"Section2\"\u003e \u003ch2\u003e4.3 Clinical Supremacy over Administrative Indicators\u003c/h2\u003e \u003cp\u003eA consistent pattern emerged across the feature importance analyses, the correlation assessments, and the regression diagnostics: clinical severity measures were considerably more prominent than administrative measures in explaining the variation in GDR. Cardiogenic shock showed the strongest positive association with mortality burden, consistent with its status as an accepted marker of acute haemodynamic compromise with a high risk of mortality. Heart failure showed an inverse relation to GDR at the institutional level, an observation that probably reflects contextual factors, such as specialised care pathways, structured disease management programmes, and referral patterns in centres with higher heart failure case volumes.\u003c/p\u003e \u003cp\u003eIn contrast, operational metrics, such as length of stay, bed occupancy rate, and bed turnover rate, had limited predictive ability on their own. Their contribution appeared to be mostly indirect, operating through interaction effects rather than as independent determinants of mortality. These findings suggest that administrative indicators play a secondary role in driving mortality risk and that clinical severity is the most important signal to monitor in hospital-wide mortality surveillance.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec40\" class=\"Section2\"\u003e \u003ch2\u003e4.4 Contribution of Feature Engineering and Interaction Effect\u003c/h2\u003e \u003cp\u003eThe addition of the engineered interaction features improved the model's performance and stability. 
Interaction terms between operational indicators, such as length of stay and bed occupancy rate, allowed the models to incorporate system-level constraints that may amplify or attenuate the effect of clinical risk. The reduced variance of the prediction error and the narrower interquartile range in the ensemble model indicate that feature engineering contributed to generalisability and reliability.\u003c/p\u003e \u003cp\u003eThese results suggest that administrative metrics, although seemingly important, may be inadequate as standalone predictors of mortality, whereas their interaction with clinical variables provides valuable contextual information. This supports a hybrid modelling approach in which operational data complement, rather than substitute for, clinical data.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec41\" class=\"Section2\"\u003e \u003ch2\u003e4.5 Clinical, Managerial, and Methodological Implications\u003c/h2\u003e \u003cp\u003eFrom a clinical perspective, the findings suggest that risk stratification frameworks implemented across hospitals should focus on indicators of acute clinical severity, specifically shock states, and use operational measures to put resource allocation and care coordination in context. Embedding such predictive outputs into multidisciplinary bed management processes and escalation protocols may help to improve the early identification of periods of higher mortality burden.\u003c/p\u003e \u003cp\u003eFor hospital management, however, the results caution against using administrative measures of efficiency, such as bed occupancy, as direct levers of mortality reduction. 
Instead, operational decisions should be informed by clinically driven risk signals, so that capacity planning and staffing changes are aligned with the underlying patient acuity.\u003c/p\u003e \u003cp\u003eMethodologically, this research shows the usefulness of explainable ensemble learning at the institutional level. The combination of SHAP analysis and post-hoc regression offers transparency while retaining the predictive advantage of non-linear models, which supports the development of trustworthy AI tools for healthcare systems.\u003c/p\u003e \u003c/div\u003e"},{"header":"5 Conclusions","content":"\u003cp\u003eThis study shows that explainable ensemble machine learning methods can be used to robustly model the hospital-wide burden of mortality, as measured by the Gross Death Rate (GDR). In particular, the stacked ensemble outperformed the individual learning algorithms on R\u003csup\u003e2\u003c/sup\u003e and RMSE, achieving high predictive performance (R\u003csup\u003e2\u003c/sup\u003e\u0026thinsp;~\u0026thinsp;0.84). These results highlight the value of model aggregation for institutional-level mortality estimation, as it provides a more nuanced approach than single-algorithm models.\u003c/p\u003e \u003cp\u003eAcross the analyses, clinical variables, most prominently cardiogenic shock and heart failure, appeared to be the major determinants of variation in mortality. By contrast, administrative metrics such as length of stay, bed occupancy rate, and bed turnover rate had only relatively small direct explanatory power. 
Nonetheless, the deliberate inclusion of interaction terms improved both model stability and accuracy, which shows that operational factors have meaningful explanatory capacity when interpreted in conjunction with clinical risk.\u003c/p\u003e \u003cp\u003eFrom an applied perspective, these results suggest that hospital mortality surveillance should be underpinned by clinically derived risk signals, and that operational indicators are better used to provide context for coordination than as an independent lever of outcome improvement. Such ensemble-based risk estimates could be integrated into routine operational workflows, such as bed management and escalation meetings, to support early detection of high-risk periods and informed capacity planning.\u003c/p\u003e \u003cp\u003eMethodologically, the research provides a stable and interpretable ensemble framework appropriate for institutional decision support. While the single-centre design and aggregated outcome limit generalisability and patient-level inference, the results provide a basis for future multi-centre validation, extension to longitudinal outcomes, and prospective evaluation of clinical impact. In sum, the proposed approach provides a pragmatic means to strengthen the link between predictive analytics and real-world hospital management and quality improvement programs.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003ch2\u003eCompeting Interests\u003c/h2\u003e \u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e \u003c/p\u003e\u003cp\u003e \u003ch2\u003eEthics Approval\u003c/h2\u003e \u003cp\u003eEthical approval was obtained from the institutional review board of RSUD Dr. 
Zainoel Abidin, Banda Aceh.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e \u003cp\u003eThe authors received no specific funding for this research.\u003c/p\u003e\u003ch2\u003eAuthor Contributions\u003c/h2\u003e \u003cp\u003eAll authors contributed to the study conception, methodology, analysis, and manuscript preparation.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e \u003cp\u003eThe datasets used and analysed during the current study are not publicly available due to hospital data protection regulations but may be available from the corresponding author upon reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eK\u0026ouml;nig S, Pellissier V, Hohenstein S, Leiner J, Meier-Hellmann A, Kuhlen R et al (2022) From population-to patient-based prediction of in-hospital mortality in heart failure using machine learning. Eur Heart J - Digit Health 3. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/ehjdh/ztac012\u003c/span\u003e\u003cspan address=\"10.1093/ehjdh/ztac012\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTrentino KM, Schwarzbauer K, Mitterecker A, Hofmann A, Lloyd A, Leahy MF et al (2022) Machine Learning-Based Mortality Prediction of Patients at Risk During Hospital Admission. J Patient Saf 18. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1097/PTS.0000000000000957\u003c/span\u003e\u003cspan address=\"10.1097/PTS.0000000000000957\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYun K, Oh J, Hong TH, Kim EY (2021) Prediction of Mortality in Surgical Intensive Care Unit Patients Using Machine Learning Algorithms. Front Med (Lausanne) 8. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fmed.2021.621861\u003c/span\u003e\u003cspan address=\"10.3389/fmed.2021.621861\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJawadi Z, He R, Srivastava PK, Fonarow GC, Khalil SO, Krishnan S et al (2024) Predicting in-hospital mortality among patients admitted with a diagnosis of heart failure: a machine learning approach. ESC Heart Fail 2490\u0026ndash;2498. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/ehf2.14796\u003c/span\u003e\u003cspan address=\"10.1002/ehf2.14796\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThorsen-Meyer H-C, Nielsen AB, Nielsen AP, Kaas-Hansen BS, Toft P, Schierbeck J et al (2020) Dynamic and explainable machine learning prediction of mortality in patients in the intensive care unit: a retrospective study of high-frequency data in electronic patient records. Lancet Digit Health 2:e179\u0026ndash;e191. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/S2589-7500(20)30018-2\u003c/span\u003e\u003cspan address=\"10.1016/S2589-7500(20)30018-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDeschepper M, Waegeman W, Vogelaers D, Eeckloo K (2020) Using structured pathology data to predict hospital-wide mortality at admission. PLoS ONE 15. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1371/journal.pone.0235117\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0235117\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShah N, Konchak C, Chertok D, Au L, Kozlov A, Ravichandran U et al (2020) Clinical Analytics Prediction Engine (CAPE): Development, electronic health record integration and prospective validation of hospital mortality, 180-day mortality and 30-day readmission risk prediction models. PLoS ONE 15. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1371/journal.pone.0238065\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0238065\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi C, Zhang Z, Ren Y, Nie H, Lei Y, Qiu H et al Machine learning based early mortality prediction in the emergency department. Int J Med Inf 2021;155. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ijmedinf.2021.104570\u003c/span\u003e\u003cspan address=\"10.1016/j.ijmedinf.2021.104570\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGiwangkancana GW, Anina HN, Sukandar H (2024) Predicting End-of-Life in a Hospital Setting. J Multidiscip Healthc 17. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2147/JMDH.S443425\u003c/span\u003e\u003cspan address=\"10.2147/JMDH.S443425\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWu M, Gao H (2023) A prediction model for in-hospital mortality in intensive care unit patients with metastatic cancer. 
Front Surg 10. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fsurg.2023.992936\u003c/span\u003e\u003cspan address=\"10.3389/fsurg.2023.992936\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNie X, Cai Y, Liu J, Liu X, Zhao J, Yang Z et al Mortality Prediction in Cerebral Hemorrhage Patients Using Machine Learning Algorithms in Intensive Care Units. Front Neurol 2021;11. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fneur.2020.610531\u003c/span\u003e\u003cspan address=\"10.3389/fneur.2020.610531\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNaemi A, Schmidt T, Mansourvar M, Naghavi-Behzad M, Ebrahimi A, Wiil UK (2021) Machine learning techniques for mortality prediction in emergency departments: A systematic review. BMJ Open 11. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1136/bmjopen-2021-052663\u003c/span\u003e\u003cspan address=\"10.1136/bmjopen-2021-052663\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuang T, Le D, Yuan L, Xu S, Peng X Machine learning for prediction of in-hospital mortality in lung cancer patients admitted to intensive care unit. PLoS ONE 2023;18. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1371/journal.pone.0280606\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0280606\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDeng T, Hamdan H, Yaakob R, Kasmiran KA Personalized Federated Learning for In-Hospital Mortality Prediction of Multi-Center ICU. 
IEEE Access 2023;11. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/ACCESS.2023.3241488\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2023.3241488\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMaheswari BU, Ashik F, George A, Jose A (2023) In-Hospital Mortality Prognosis: Unmasking Patterns using Data Science and Explainable AI. 9th International Conference on Signal Processing and Communication, ICSC 2023, 2023. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/ICSC60394.2023.10441356\u003c/span\u003e\u003cspan address=\"10.1109/ICSC60394.2023.10441356\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSeki T, Kawazoe Y, Ohe K Machine learning-based prediction of in-hospital mortality using admission laboratory data: A retrospective, single-site study using electronic health record data. PLoS ONE 2021;16. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1371/journal.pone.0246640\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0246640\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWarman PI, Seas A, Satyadev N, Adil SM, Kolls BJ, Haglund MM et al Machine Learning for Predicting In-Hospital Mortality After Traumatic Brain Injury in Both High-Income and Low- and Middle-Income Countries. Neurosurgery 2022;90. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1227/neu.0000000000001898\u003c/span\u003e\u003cspan address=\"10.1227/neu.0000000000001898\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSinha S, Dong T, Dimagli A, Judge A, Angelini GD (2024) A machine learning algorithm-based risk prediction score for in-hospital/30-day mortality after adult cardiac surgery. Eur J Cardiothorac Surg 66:ezae368. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/ejcts/ezae368\u003c/span\u003e\u003cspan address=\"10.1093/ejcts/ezae368\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLee B, Kim K, Hwang H, Kim YS, Chung EH, Yoon JS et al (2021) Development of a machine learning model for predicting pediatric mortality in the early stages of intensive care unit admission. Sci Rep 11. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41598-020-80474-z\u003c/span\u003e\u003cspan address=\"10.1038/s41598-020-80474-z\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQiao EM, Qian AS, Nalawade V, Voora RS, Kotha NV, Vitzthum LK et al (2022) Evaluating High-Dimensional Machine Learning Models to Predict Hospital Mortality Among Older Patients With Cancer. JCO Clin Cancer Inf. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1200/cci.21.00186\u003c/span\u003e\u003cspan address=\"10.1200/cci.21.00186\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi L, Ding L, Zhang Z, Zhou L, Zhang Z, Xiong Y et al (2023) Development and Validation of Machine Learning\u0026ndash;Based Models to Predict In-Hospital Mortality in Life-Threatening Ventricular Arrhythmias: Retrospective Cohort Study. J Med Internet Res 25. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2196/47664\u003c/span\u003e\u003cspan address=\"10.2196/47664\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi M, Han S, Liang F, Hu C, Zhang B, Hou Q et al (2024) Machine Learning for Predicting Risk and Prognosis of Acute Kidney Disease in Critically Ill Elderly Patients During Hospitalization: Internet-Based and Interpretable Model Study. J Med Internet Res 26. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2196/51354\u003c/span\u003e\u003cspan address=\"10.2196/51354\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLee SW, Lee HC, Suh J, Lee KH, Lee H, Seo S et al Multi-center validation of machine learning model for preoperative prediction of postoperative mortality. NPJ Digit Med 2022;5. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41746-022-00625-6\u003c/span\u003e\u003cspan address=\"10.1038/s41746-022-00625-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFang C, Pan Y, Zhao L, Niu Z, Guo Q, Zhao B (2022) A Machine Learning-Based Approach to Predict Prognosis and Length of Hospital Stay in Adults and Children With Traumatic Brain Injury: Retrospective Cohort Study. J Med Internet Res 24. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2196/41819\u003c/span\u003e\u003cspan address=\"10.2196/41819\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKwun JS, Ahn HB, Kang SH, Yoo S, Kim S, Song W et al (2025) Developing a Machine Learning Model for Predicting 30-Day Major Adverse Cardiac and Cerebrovascular Events in Patients Undergoing Noncardiac Surgery: Retrospective Study. J Med Internet Res 27. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2196/66366\u003c/span\u003e\u003cspan address=\"10.2196/66366\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMuralitharan S, Nelson W, Di S, McGillion M, Devereaux PJ, Barr NG et al (2021) Machine learning\u0026ndash;Based early warning systems for clinical deterioration: Systematic scoping review. J Med Internet Res 23. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2196/25187\u003c/span\u003e\u003cspan address=\"10.2196/25187\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLv H, Yang X, Wang B, Wang S, Du X, Tan Q et al (2021) Machine Learning\u0026ndash;Driven Models to Predict Prognostic Outcomes in Patients Hospitalized With Heart Failure Using Electronic Health Records: Retrospective Study. J Med Internet Res 23. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2196/24996\u003c/span\u003e\u003cspan address=\"10.2196/24996\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEstiri H, Strasser ZH, Klann JG, Naseri P, Wagholikar KB, Murphy SN Predicting COVID-19 mortality with electronic medical records. NPJ Digit Med 2021;4. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41746-021-00383-x\u003c/span\u003e\u003cspan address=\"10.1038/s41746-021-00383-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHsu C-N, Liu C-L, Tain Y-L, Kuo C-Y, Lin Y-C (2020) Machine Learning Model for Risk Prediction of Community-Acquired Acute Kidney Injury Hospitalization From Electronic Health Records: Development and Validation Study. J Med Internet Res 22. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2196/16903\u003c/span\u003e\u003cspan address=\"10.2196/16903\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHalasz G, Sperti M, Villani M, Michelucci U, Agostoni P, Biagi A et al (2021) A machine learning approach for mortality prediction in COVID-19 pneumonia: Development and evaluation of the Piacenza score. J Med Internet Res 23. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2196/29058\u003c/span\u003e\u003cspan address=\"10.2196/29058\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"Syiah Kuala University","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"hospital mortality, machine learning, hospital performance metrics, ensemble learning, healthcare analytics","lastPublishedDoi":"10.21203/rs.3.rs-9203293/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9203293/v1","license":{"name":"CC BY 
4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground:\u003c/h2\u003e \u003cp\u003eAccurate prediction of in-hospital mortality is essential for hospital-wide clinical decision-making and resource planning. Most machine learning frameworks rely on patient-level clinical data and focus on intensive care or disease-specific populations, while hospital operational metrics are rarely incorporated.\u003c/p\u003e\u003ch2\u003eMethods:\u003c/h2\u003e \u003cp\u003eWe conducted a retrospective observational study using 36 months of institutional data from 2022 to 2024 obtained from a regional referral hospital. The dataset integrated clinical indicators with hospital operational metrics, including length of stay, bed occupancy rate, and bed turnover rate. Three machine learning models, Random Forest, XGBoost, and a feed-forward neural network, were developed alongside a linear regression baseline. A stacked ensemble approach was applied to capture nonlinear relationships. Model performance was evaluated using R\u003csup\u003e2\u003c/sup\u003e, root mean squared error, and mean absolute error with five-fold cross-validation. Model interpretability was assessed using Shapley Additive exPlanations.\u003c/p\u003e\u003ch2\u003eResults:\u003c/h2\u003e \u003cp\u003eThe stacked ensemble achieved the strongest predictive performance (R\u003csup\u003e2\u003c/sup\u003e\u0026thinsp;=\u0026thinsp;0.84; RMSE\u0026thinsp;=\u0026thinsp;4.49), while the neural network yielded the lowest MAE (2.74). Heart failure and cardiogenic shock emerged as influential clinical predictors. Although operational metrics showed limited direct effects, interaction terms improved model stability. 
Shapley analyses demonstrated consistent feature attributions across models, supporting interpretability.\u003c/p\u003e\u003ch2\u003eConclusions:\u003c/h2\u003e \u003cp\u003eIntegrating clinical severity indicators with hospital operational metrics using an explainable ensemble machine learning framework improves hospital-wide mortality prediction. Operational variables contribute modestly in isolation but enhance model robustness through interaction effects, highlighting the value of interpretable machine learning for institutional-level clinical decision support.\u003c/p\u003e","manuscriptTitle":"Ensemble Machine Learning for Institutional-Level Hospital Mortality Prediction Using Clinical and Operational Indicators","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-03-25 08:21:45","doi":"10.21203/rs.3.rs-9203293/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"ad5d1b73-967d-427a-96d1-9526b2a79153","owner":[],"postedDate":"March 25th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":64999277,"name":"Artificial Intelligence and Machine Learning"},{"id":64999278,"name":"Medical Informatics"}],"tags":[],"updatedAt":"2026-03-25T08:21:45+00:00","versionOfRecord":[],"versionCreatedAt":"2026-03-25 
08:21:45","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9203293","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9203293","identity":"rs-9203293","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

