Prospective Multicenter Validation of Machine Learning Models for Mortality Prediction in Adult Critically Ill Patients using Transfer Learning

Ioannis Papapanagiotou, Charikleia S. Vrettou, Maria Theodorakopoulou, and 16 more

Research Square. This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8872055/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1.

Abstract

Mortality prediction in critically ill patients remains challenging due to poor cross-institutional performance and limited generalizability of machine learning models. This study addresses this problem by systematically benchmarking and prospectively validating transfer learning frameworks. We trained our models on MIMIC-IV and validated them on a multicenter prospective cohort of 539 patients from three hospitals.
We compared tree-based methods and modern deep learning architectures for tabular data. Results demonstrated that both Domain Adaptation (DA) and Inductive Transfer Learning (ITL) significantly enhanced model performance under realistic conditions where target-domain data are limited. DA consistently improved discrimination across all evaluated models, with LightGBM showing the most significant gains in Area Under the Receiver Operating Characteristic Curve (AUC) (p = 0.0010), and XGBoost yielding the largest improvements in Area Under the Precision-Recall Curve (AUPRC) (p = 0.0419). Among all evaluated models, Random Forest (RF) achieved the highest discriminative performance, with 90.7% AUC under DA and 81.3% AUPRC under ITL. Notably, the domain-adapted models significantly outperformed APACHE II (p = 0.0044) and SOFA (p = 0.0077). These findings suggest that transfer learning provides a robust and data-efficient pathway for improving model generalizability across heterogeneous populations, offering a pragmatic solution to the challenge of model degradation in clinical deployment.

Subjects: Biological sciences/Computational biology and bioinformatics; Health sciences/Health care; Physical sciences/Mathematics and computing; Health sciences/Medical research

Keywords: machine learning, mortality prediction, intensive care unit, adults, domain adaptation, inductive transfer learning, prospective study

1. Introduction

Accurate and timely prediction of mortality in critically ill adults remains a central challenge in modern intensive care medicine.
Despite decades of research and the widespread adoption of scoring systems such as the Acute Physiology and Chronic Health Evaluation (APACHE) II [1], the Simplified Acute Physiology Score (SAPS) [2], and the Sequential Organ Failure Assessment (SOFA) score [3], these tools often exhibit limited discrimination across institutions and patient populations due to their constraints as linear models [4]. Recently, machine learning (ML)-based models have shown promise in capturing complex, nonlinear relationships in tabular Intensive Care Unit (ICU) data, often outperforming traditional scoring systems [5, 6, 7].

A wide range of ML approaches have been applied to ICU mortality prediction from admission data, with tree-based ensemble methods frequently outperforming linear models and deep learning architectures in statistical metrics. In their real-time mortality prediction study on Medical Information Mart for Intensive Care (MIMIC)-III [8] data, Johnson et al. showed that gradient boosting outperformed traditional severity scores [9]. Similarly, Pang et al. reported that Extreme Gradient Boosting (XGBoost) outperformed Logistic Regression (LR) and Support Vector Machines (SVMs) trained on MIMIC-IV [10] data [11]. Large institutional datasets beyond MIMIC have also been leveraged; Choi et al. analyzed ICU admissions from two major South Korean hospitals, finding that Light Gradient-Boosting Machine (LightGBM, also abbreviated LGBM) and XGBoost yielded the best predictive performance [12].

Random Forests (RF) have also shown promising results in a number of studies. Hu et al. evaluated nine ML models on MIMIC-IV and identified RF as the best performer [5]. Kim et al. applied ML methods to nursing records in MIMIC-IV and again found RF superior to LR, SVMs, and Naïve Bayes [13], while Alghatani et al. likewise demonstrated that RF outperformed XGBoost and linear models when trained on MIMIC-III [14].
Variability in the best-performing models across MIMIC-III and MIMIC-IV studies likely reflects differences in cohort selection, feature selection, and evaluation setups rather than fundamental inconsistencies in model capabilities; overall, tree-based methods generally outperform other approaches.

In addition to tree-based classical machine learning, deep learning architectures have been increasingly used in recent years. However, standard neural networks often struggle with heterogeneous tabular data due to mixed feature types, missing values, and the lack of inherent spatial or sequential structure. To address this, architectures such as the Self-Attention and Intersample Attention Transformer (SAINT) [15] and Neural Oblivious Decision Ensembles (NODE) [16] have been proposed. Studies on SAINT and NODE have shown competitive performance against traditional boosting models, often outperforming them, indicating that they could be promising architectures for tasks such as ICU mortality prediction, where capturing subtle relationships among vital signs, laboratory data, and comorbidities is crucial [17, 18].

Although ML approaches have shown promising results, a major barrier to their clinical deployment is their susceptibility to performance degradation when applied prospectively or across heterogeneous healthcare settings. To overcome these limitations, methodological frameworks such as Domain Adaptation (DA) and Inductive Transfer Learning (ITL) aim to improve cross-site robustness by leveraging shared structure across diverse clinical environments. Mutnuri et al. applied DA and ITL by fine-tuning the last hidden layer of their neural networks [19]. This allowed the model to adapt to population-specific patterns while maintaining the structure learned from the larger internal dataset. The resulting model achieved markedly better external discrimination and calibration than the baseline model trained solely on internal data.
DA has also shown very promising results on dynamic temporal data using convolutional and recurrent layers [20], as well as in cross-institutional ICU outcome prediction, as demonstrated by Zhu et al., who pretrained a gradient boosting algorithm on adult MIMIC-II data and fine-tuned it on a pediatric cohort, significantly improving ICU mortality prediction in the target pediatric population [21].

Despite these advances, several important gaps remain unaddressed in the literature. To the best of our knowledge, no ICU mortality prediction model using DA or ITL has yet undergone prospective, real-time validation in adult ICUs when restricted to the first 24 hours of admission data. The first 24 hours of ICU admission constitute a critical period for early risk assessment, as decisions made during this window can substantially influence patient management and outcomes. Many existing mortality prediction models leverage longitudinal data collected throughout the ICU stay, which reduces their utility for early risk assessment [19, 22]. Although DA and ITL have shown promise, existing ICU outcome studies have focused predominantly on neural-network–based architectures [19, 20, 22]. Apart from Zhu et al., who combined DA with gradient boosting using temporal data [21], we found no systematic benchmarking of DA/ITL frameworks using modern tree-based boosting models such as XGBoost or LightGBM, despite their strong performance in ICU mortality prediction. SAINT and NODE-based architectures have also recently been explored in similar domains [23, 24, 25, 26, 27], but they have not been evaluated within DA or ITL cross-domain frameworks or in prospective studies.

As such, the contributions of this work are as follows:

We provide a prospectively validated study of ICU mortality prediction models using DA and ITL in adult ICU populations, focusing specifically on the clinically critical first 24 hours of admission data.
We benchmark DA and ITL using tree-based models as well as SAINT and NODE for adult ICU mortality prediction, which have not been investigated in the context of 24-hour ICU admission data.

2. Methods

2.1. Data sources

The publicly available MIMIC-IV v3.1 database [10], which contains clinical and physiological measurements from a heterogeneous cohort of ICU patients and a total of 94,458 unique adult ICU stays, was used for model training and validation. The prospective data were collected from the ICUs of “Evangelismos” General Hospital, General Hospital of Thoracic Diseases “Sotiria”, and “KAT” Attica General Hospital. The Ethics Committees of the three hospitals approved the study (study approval numbers 14–17/01/2025; 27−5/02/2025 & 8260-26/03/2025). Informed consent was obtained from all patients’ next-of-kin before inclusion, and all procedures complied with the Helsinki Declaration. The prospective data collected from the three ICUs yielded a combined cohort of 539 adult patients, of whom 141 died during hospitalization. Harmonized data collection protocols ensured consistent variable definitions across centers. Demographic characteristics of the MIMIC-IV and the prospective data are shown in Table 1.

Table 1: Comparison of demographic characteristics between MIMIC-IV and prospective data.

2.2. Data preprocessing and feature alignment

All clinical and laboratory measurements were obtained within the first 24 hours of ICU admission. To ensure a shared feature space, we selected a subset of 44 features available in both datasets.
Collected variables included demographics (age, sex), vital signs and physiologic measures (temperature, heart rate, respiratory rate, oxygen saturation, Glasgow coma scale, lactate), and an extensive panel of laboratory values including hematologic indices (white blood cell count, hematocrit, platelet count), renal function tests (blood urea nitrogen, creatinine), metabolic and electrolyte markers (glucose, sodium, potassium), and liver function markers (total bilirubin, albumin). The complete variable lists for the MIMIC-IV and our prospective data are provided in the supplementary material Tables S2 and S3.

2.3. Model development and validation

We trained supervised tree-based models (RF, XGBoost, LightGBM, CatBoost) as well as deep learning models (NODE, SAINT). We selected these architectures because prior literature identified them among the most promising approaches for ICU mortality prediction. The MIMIC-IV (source) dataset was first split 80/20, with 80% used for model training and the remaining 20% held out as a test set. We performed five-fold cross-validation within the training portion to avoid overfitting and ensure model robustness. Model evaluation was conducted primarily using AUC and AUPRC, given the class imbalance in ICU mortality. We also report the Brier score, sensitivity, and specificity. Hyperparameters used in all ML experiments were optimized with Optuna, a Bayesian optimizer [28].

In addition to the transfer learning experiments, we trained models exclusively on the prospective (target) dataset. These locally trained models served as comparators to assess the added value of transfer learning beyond what could be achieved using institution-specific data alone. To ensure a fair comparison, identical feature sets, preprocessing steps, performance metrics, and hyperparameter optimization procedures were applied.

2.4. ITL implementation

In the ITL setting, models were trained on the internal MIMIC-IV cohort (source domain) and were further fine-tuned using the outcome labels from the prospective data (target domain).

For boosting models, ITL was performed by continuing gradient boosting from the internally trained model using labeled external data. Specifically, the boosting model was first fitted on the internal MIMIC-IV training data to obtain a base booster. This booster was then updated by training additional trees on the external transfer learning subset. To control the extent of adaptation and mitigate overfitting to the small external sample, only a limited number of new trees were added, and the learning rate for these trees was reduced relative to the original model. For each external split, the ITL procedure introduced three tunable hyperparameters: (i) the learning-rate scaling factor κ, (ii) the number of additional boosting rounds, and (iii) the class imbalance correction term (scale_pos_weight) applied during fine-tuning. These parameters were optimized using Optuna. Additionally, Platt scaling was applied as a post-hoc calibration step.

Random forests were adapted using a model-based ensemble expansion strategy enabled by warm-start training. The source-domain model was reinitialized with warm-start enabled, and additional decision trees were appended to the existing ensemble. These newly added trees were trained solely on labeled target-domain data, while the original source-domain trees remained fixed. The tunable hyperparameters were (i) the number of trees, (ii) the maximum tree depth, and (iii) the minimum leaf size.

In the NODE architecture, ITL was performed by initializing a classifier with the base weights learned on the original internal cohort and then fine-tuning a restricted subset of parameters on the external cohort.
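The warm-start ensemble expansion described above for random forests can be sketched with scikit-learn, where refitting a `RandomForestClassifier` that has `warm_start=True` and a larger `n_estimators` keeps the existing trees and fits only the new ones on the data passed to the second `fit` call. This is a minimal sketch under stated assumptions, not the study's implementation: the synthetic arrays, the tree counts, and the depth and leaf-size values are hypothetical placeholders for quantities that would be tuned in practice.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the source (internal) and target (prospective) data.
N_FEATURES = 44
X_src = rng.normal(size=(2000, N_FEATURES))
y_src = (X_src[:, 0] + rng.normal(size=2000) > 1.0).astype(int)
X_tgt = rng.normal(loc=0.3, size=(300, N_FEATURES))
y_tgt = (X_tgt[:, 0] + rng.normal(size=300) > 0.8).astype(int)

N_SOURCE_TREES = 100  # assumed size of the source-domain forest
N_TARGET_TREES = 20   # assumed number of appended target-domain trees

# 1) Fit the source-domain forest with warm_start enabled.
rf = RandomForestClassifier(
    n_estimators=N_SOURCE_TREES, warm_start=True, random_state=0
)
rf.fit(X_src, y_src)
source_trees = list(rf.estimators_)  # keep references to check they stay fixed

# 2) Enlarge the ensemble. Only the additional trees are fitted, and they see
#    target-domain data alone; depth and leaf size apply to the new trees.
rf.set_params(
    n_estimators=N_SOURCE_TREES + N_TARGET_TREES,
    max_depth=6,
    min_samples_leaf=5,
)
rf.fit(X_tgt, y_tgt)

# Predictions now average over source-domain and target-domain trees.
target_risk = rf.predict_proba(X_tgt)[:, 1]
```

Because the forest averages over all trees, the ratio of appended to original trees controls how strongly the target cohort influences the final prediction.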
A new NODE instance (stacked Oblivious Decision Tree layers, ODST, with additive per-layer readouts) was loaded with the stored base checkpoint; all parameters were frozen, and we then unfroze only (i) the readout layers, (ii) the global logit bias term, and (iii) the final ODST layer. The fine-tuned model was then trained using AdamW and weighted binary cross-entropy with logits.

As with NODE, SAINT was adapted via parameter-restricted fine-tuning, starting from the base weights learned on the original internal cohort and then fine-tuning a restricted subset of parameters on the external cohort. A new SAINT instance (feature-wise tokenization via learnable per-feature affine embeddings followed by multi-head self-attention blocks and mean pooling) was loaded with the stored base checkpoint; all parameters were frozen, and we then unfroze (i) the classification head and (ii) the final transformer block.

2.5. DA implementation

For DA with all tree-based models as well as NODE, we estimated an importance weight for each internal training instance by fitting a domain classifier that discriminates between internal and external feature distributions. We constructed a pooled dataset $X_{\text{domain}}$ with corresponding binary domain labels $d$, and a logistic regression model was trained on $X_{\text{domain}}$ to estimate the posterior probability $P(d = 1 \mid x)$, which represents the probability that an observation $x$ originates from the external cohort. Using the trained domain classifier, we computed an importance weight for each internal training sample $x_i$ as a monotonic function of the odds of originating from the external cohort. To ensure numerical stability, the probabilities $p_i = P(d = 1 \mid x_i)$ were clipped to the interval $p_i \in [10^{-6}, 1 - 10^{-6}]$, and the resulting importance weights were further constrained to lie within a bounded range. A DA model was then trained on the same internal training data, using the computed importance weights.
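The importance-weighting step can be illustrated as follows. This is a sketch under stated assumptions rather than the study's code: the text specifies only that the weight is a monotonic function of the odds and that it is bounded, so the raw odds rescaled by the source/target size ratio, the bounds `w_min` and `w_max`, and the synthetic data are all hypothetical choices made here for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def importance_weights(X_src, X_tgt, eps=1e-6, w_min=0.05, w_max=20.0):
    """Weight each source row by its clipped odds of belonging to the target domain."""
    # Pooled features with binary domain labels: 0 = internal, 1 = external.
    X_domain = np.vstack([X_src, X_tgt])
    d = np.concatenate([np.zeros(len(X_src)), np.ones(len(X_tgt))])

    domain_clf = LogisticRegression(max_iter=1000).fit(X_domain, d)

    # p_i = P(d = 1 | x_i), clipped for numerical stability.
    p = np.clip(domain_clf.predict_proba(X_src)[:, 1], eps, 1.0 - eps)

    # Assumed weight function: the odds, rescaled by the source/target size
    # ratio so the domain imbalance does not shrink every weight toward zero.
    w = (p / (1.0 - p)) * (len(X_src) / len(X_tgt))

    # Assumed bounds on the final weights.
    return np.clip(w, w_min, w_max)


rng = np.random.default_rng(0)
X_src = rng.normal(loc=0.0, size=(3000, 5))
y_src = (X_src[:, 0] + rng.normal(size=3000) > 1.0).astype(int)
X_tgt = rng.normal(loc=0.5, size=(300, 5))  # shifted target features, no labels used

weights = importance_weights(X_src, X_tgt)

# The outcome model is trained on source data only, with the weights applied.
da_model = RandomForestClassifier(n_estimators=50, random_state=0)
da_model.fit(X_src, y_src, sample_weight=weights)
```

Source rows that resemble the target cohort receive larger weights, so the outcome model emphasizes them during training even though no target labels are involved.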
Thus, adaptation was achieved without using external labels for training the outcome model, as external data are only used to learn the domain shift and derive weights.

Unlike the tree-based approaches, SAINT was first trained once to obtain a baseline model on the internal training set. Starting from the baseline weights, we fine-tuned an alignment model using labeled internal source batches together with unlabeled external transfer learning batches by minimizing a joint objective:

$$\mathcal{L} = \mathcal{L}_{\text{task}}(X_s, y_s) + \lambda \, \mathcal{L}_{\text{align}}\big(f(X_s), f(X_t)\big),$$

where $\mathcal{L}_{\text{task}}$ is binary cross-entropy with logits on internal labels, $f(\cdot)$ denotes the pooled SAINT representation, and $\mathcal{L}_{\text{align}}$ is a Radial Basis Function (RBF) kernel Maximum Mean Discrepancy (MMD) loss computed between source and target representations. Alignment hyperparameters were fixed from a prior Optuna run: $\lambda = 9.7454 \times 10^{-4}$, $\sigma = 13.8262$, and freeze_epochs = 3, where the feature extractor was frozen for the first 3 epochs and the classification head remained trainable throughout.

2.6. Performance aggregation strategy

To account for the relatively small size of the external prospective cohort (n = 539), both ITL and DA procedures were repeated across 5 random 70/30 splits, each controlled by a separate, sequentially numbered seed. For every seed, the adapted model was trained, optimized by Optuna, and evaluated on its corresponding split. The resulting performance metrics were aggregated across all repetitions. Final reported results therefore reflect the mean and 95% confidence intervals computed over five sequential seeds. APACHE II and SOFA scores used to benchmark our ML models were also computed over the same seeds.

2.7. Statistical analysis

Paired hypothesis testing was conducted to compare model performance across repeated data splits, using performance metrics computed from five sequential seeds.
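The aggregation over repeated splits and the paired comparison can be sketched as follows. The per-seed AUC values below are hypothetical and purely illustrative (they are not results from this study), and the t-distribution-based confidence interval is an assumption, since the text does not state how the intervals were computed.

```python
import numpy as np
from scipy import stats


def mean_ci(values, level=0.95):
    """Mean and two-sided t-based confidence interval across repeated splits."""
    v = np.asarray(values, dtype=float)
    mean = v.mean()
    sem = v.std(ddof=1) / np.sqrt(len(v))  # standard error of the mean
    half_width = stats.t.ppf(0.5 + level / 2.0, df=len(v) - 1) * sem
    return mean, mean - half_width, mean + half_width


# Hypothetical AUCs for one model on the same five splits (one value per seed).
baseline_auc = np.array([0.870, 0.881, 0.876, 0.869, 0.884])
adapted_auc = np.array([0.889, 0.893, 0.895, 0.886, 0.897])

mean, lo, hi = mean_ci(adapted_auc)

# Paired t-test: each seed contributes one baseline/adapted pair, so the test
# is run on the per-split differences rather than on two independent samples.
t_stat, p_value = stats.ttest_rel(adapted_auc, baseline_auc)

print(f"adapted AUC: {mean:.3f} [{lo:.3f}, {hi:.3f}]")
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.4f}")
```

Pairing by seed removes split-to-split variability from the comparison, which matters when only five repetitions are available.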
Specifically, paired t-tests were used to assess whether the mean difference in performance between models differed significantly from zero, and differences were considered statistically significant when p < 0.05.

2.8. Model interpretation

To enhance the interpretability of our findings, we used SHapley Additive exPlanations (SHAP) [29], an interpretability tool that helps to elucidate how our models compute their predictions. We used a SHAP Kernel Explainer to derive feature attributions, followed by a SHAP Beeswarm plot to visualize the distribution of these effects and explore how individual predictors influence the model’s outcome predictions.

2.9. Software

Our machine learning models were developed and trained using Python 3.12. Experiments were conducted on an Intel Core i7-14700K CPU and RTX 5600 GPU.

3. Results

3.1. Assessment of DA and ITL over baseline models

Across all evaluated models, DA consistently improved discrimination relative to baseline in 6 out of 6 models, while ITL improved discrimination in 4 out of 6. The largest AUC improvements were observed for the LightGBM model (Fig. 1a), while the largest AUPRC gains were observed for the XGBoost model (Fig. 1b). Both models showed improvements in discrimination relative to the baseline that were statistically significant for both adaptation strategies. The domain-adapted LGBM model achieved a statistically significant performance gain in AUC (Fig. 1a) (paired t-test; p = 0.0010). Similarly, XGBoost achieved a statistically significant performance gain in AUPRC using DA (Fig. 1b) (paired t-test; p = 0.0419). All evaluated models are summarized in Table 2, while additional performance measures, including the Brier score, sensitivity, and specificity, are reported in the supplementary material Table S1.

3.2. Comparison with scoring systems and locally trained models

Figure 2 compares the performance of the domain-adapted LGBM (Fig. 2a) and XGBoost (Fig. 2b) models with the APACHE II and SOFA scores, as well as locally trained baselines, using AUC and AUPRC. We chose to plot LGBM and XGBoost as they demonstrated the largest gains in AUC and AUPRC, respectively. The domain-adapted models significantly outperformed APACHE II, showing higher AUC (p = 0.0044) and AUPRC (p = 0.00026). They also achieved superior discrimination relative to SOFA, with statistically significant improvements in both AUC (p = 0.0077) and AUPRC (p = 0.0013). Moreover, they demonstrated significant gains over the locally trained models in AUC (p = 0.033) and AUPRC (p = 0.022).

Table 2: Performance comparison of all ML models used in this study.

Model     Setting   MIMIC-IV test AUC  MIMIC-IV test AUPRC  Prospective AUC [95% CI]  Prospective AUPRC [95% CI]
RF        Baseline  0.920              0.602                0.897 [0.885, 0.910]      0.791 [0.763, 0.818]
          ITL       -                  -                    0.903 [0.893, 0.914]      0.813 [0.788, 0.838]*
          DA        -                  -                    0.907 [0.895, 0.918]*     0.802 [0.773, 0.832]
XGBoost   Baseline  0.922*             0.608*               0.892 [0.878, 0.905]      0.771 [0.747, 0.795]
          ITL       -                  -                    0.891 [0.874, 0.907]      0.771 [0.741, 0.800]
          DA        -                  -                    0.905 [0.895, 0.914]      0.803 [0.779, 0.827]
LightGBM  Baseline  0.920              0.607                0.877 [0.863, 0.891]      0.761 [0.734, 0.789]
          ITL       -                  -                    0.881 [0.865, 0.897]      0.768 [0.737, 0.799]
          DA        -                  -                    0.892 [0.882, 0.902]      0.782 [0.755, 0.809]
CatBoost  Baseline  0.920              0.600                0.893 [0.884, 0.903]      0.772 [0.738, 0.806]
          ITL       -                  -                    0.896 [0.883, 0.908]      0.782 [0.757, 0.807]
          DA        -                  -                    0.896 [0.885, 0.906]      0.776 [0.746, 0.807]
NODE      Baseline  0.913              0.577                0.876 [0.854, 0.898]      0.734 [0.686, 0.783]
          ITL       -                  -                    0.882 [0.859, 0.905]      0.750 [0.692, 0.807]
          DA        -                  -                    0.878 [0.860, 0.896]      0.724 [0.700, 0.749]
SAINT     Baseline  0.911              0.581                0.896 [0.883, 0.909]      0.793 [0.758, 0.828]
          ITL       -                  -                    0.889 [0.877, 0.900]      0.785 [0.755, 0.814]
          DA        -                  -                    0.897 [0.883, 0.910]      0.794 [0.760, 0.829]

Best AUC and AUPRC values in each column are marked with an asterisk (*). Means and CIs were calculated for the prospective cohort due to its lower sample size (n = 539). Abbreviations: DA, Domain Adaptation; ITL, Inductive Transfer Learning; RF, Random Forest.

3.3. Model interpretability

Feature-level interpretability analyses for the domain-adapted RF model are illustrated in Figs. 3 and 4, including Mean Decrease in Impurity (MDI)-based feature importance rankings and SHAP Beeswarm plots, respectively. We chose to plot RF, as it achieved the highest AUC after domain adaptation.

4. Discussion

This study demonstrates that both DA and ITL can significantly improve the discriminatory performance of ML models for predicting mortality in ICU patients compared to baseline approaches. Across all evaluated architectures, DA consistently improved discrimination, while ITL yielded improvements in most models. These improvements were systematic and, in several cases, statistically significant. Among the evaluated models, LightGBM exhibited the largest improvements in AUC, and XGBoost showed the most pronounced gains in AUPRC, suggesting that these models may be particularly well suited to benefit from DA or ITL in this setting. LightGBM’s leaf-wise tree growth and strong handling of heterogeneous feature interactions may enable more effective refinement of decision boundaries under covariate shift, leading to larger gains in overall ranking performance.

Several studies have investigated DA in temporal and dynamic clinical data using deep learning approaches [19, 20, 21, 22]. However, these works differ substantially from our study in both scope and methodology. Prior approaches primarily rely on time-series data, focus on pediatric populations, or restrict DA to deep neural architectures, without considering tree-based or transformer-based models, and without benchmarking DA alongside ITL. Most closely related to our work, Mutnuri et al. evaluated both DA and ITL within fully connected neural networks [19]. In contrast to our setting, their source and target domains were of comparable size, their data contained temporal features, and adaptation was performed exclusively within fully connected neural networks.
NODE and SAINT have been mainly assessed on synthetic and real-world benchmark datasets [23, 24], focusing on representation learning and feature interaction modeling rather than on clinical DA or transfer learning across heterogeneous healthcare cohorts. To our knowledge, neither NODE nor SAINT has previously been systematically evaluated for DA or ITL in ICU mortality prediction, nor compared directly against classical tree-based methods under similar experimental conditions. Our study therefore extends prior work [25, 26, 27] by examining how modern deep tabular architectures respond to both DA and ITL in a clinically realistic, cross-cohort setting, and by benchmarking their behavior alongside established methods.

Notably, beyond comparisons with baseline machine learning models, the domain-adapted LGBM and XGBoost models substantially outperformed established clinical scoring systems, including APACHE II and SOFA, in both AUC and AUPRC. These findings underscore the added value of data-driven, adapted ML approaches over traditional rule-based scores. In addition, the domain-adapted models demonstrated significant improvements over their respective baselines trained on our prospective data, indicating that performance gains were not solely attributable to local retraining but rather to the explicit incorporation of information from external domains.

Interpretability analyses of the RF model showed a coherent and clinically plausible set of dominant predictors, with strong agreement between global Gini importance and local SHAP attributions. Across both representations, APACHE II, lactate, and SOFA score emerged as top predictors in most models. SHAP values further demonstrated directionally consistent effects, in which higher lactate, higher severity scores, and lower oxygen saturation were associated with higher predicted mortality (Fig. 4).
Importantly, DA primarily reweighted these same core clinical variables rather than introducing new drivers, suggesting that performance gains were achieved by refining the relative importance of clinically meaningful signals to better match the external cohort. In our study, there is an imbalance between source and target sample sizes. With 94,458 samples available in the source domain and 539 in the target domain, instance-level weighting or selection could become inherently unstable, as similarity estimates and importance weights are dominated by high-dimensional noise and may be overly sensitive to a small number of target observations. In contrast, DA operates at a distributional level, enabling the model to leverage the large source dataset while explicitly adjusting for systematic differences between domains, rather than relying on sparse target instances to guide transfer. Under these conditions, DA could provide a more robust and data-efficient mechanism for aligning source and target domains, which likely explains its more consistent performance compared with ITL in our study. The limitations of our study should be addressed. First, the target domain was substantially smaller than the source dataset, reflecting a common real-world constraint but potentially limiting model stability and the generalizability of conclusions regarding adaptation effectiveness. Second, performance estimation relied on repeating the ITL and DA procedures across five random data splits. Although this approach improves robustness compared with a single split, the limited number of repetitions, which were driven by computational constraints, may reduce the precision of estimated confidence intervals and the power of statistical comparisons. Lastly, the use of a logistic regression domain classifier may inadequately capture complex, nonlinear domain shifts typical of ICU data, potentially resulting in suboptimal importance weighting. 
The strengths of our study should also be acknowledged. By leveraging the large-scale, widely recognized MIMIC-IV database for source training and validating our findings on an independent, multicenter prospective ICU cohort, we ensured a rigorous test of model generalizability in a clinically realistic setting. A key strength lies in our systematic benchmarking of modern deep tabular architectures, such as NODE and SAINT, alongside established tree-based methods. Our results highlight the utility of tree-based methods, which demonstrated superior capacity to refine decision boundaries under covariate shift, with LightGBM and XGBoost achieving the most pronounced gains in AUC and AUPRC, respectively. Furthermore, by demonstrating that domain-adapted models substantially outperform traditional clinical scoring systems like APACHE II and SOFA, this study provides strong evidence for the superiority of adaptive, data-driven approaches over static rule-based metrics.

5. Conclusions

This study demonstrates that transfer learning strategies can meaningfully improve ICU mortality prediction under realistic cross-domain conditions in which target-domain data are limited. Across multiple modeling paradigms, domain adaptation consistently improved discrimination, achieving statistically significant improvements over baseline models and traditional scoring systems such as APACHE II and SOFA. From a clinical perspective, these findings support DA and ITL as pragmatic pathways to improve model generalizability across heterogeneous ICU populations without requiring large volumes of local retraining data. Such approaches may help mitigate well-documented performance degradation when models are transferred between hospitals, healthcare systems, or patient cohorts. The prospective validation performed in this study further strengthens the translational relevance of these results.
Future work should extend this framework to incorporate multimodal representations to assess whether DA benefits persist in richer data settings. Broader multicenter prospective evaluations are also needed to examine clinical robustness across heterogeneous ICU environments. Declarations Data and code availability The processed MIMIC-IV dataset generated and analyzed during the current study is publicly available in PhysioNet: https://physionet.org/content/mimiciv/3.1. The code to reproduce our experiments will also be available in the Github repository: https://github. com/giannis3p/TL-mortality-pred. The datasets analyzed during the current study are not publicly available due to participant confidentiality and informed consent restrictions, but are available from the corresponding author on reasonable request. Author Contributions AKo, AGV, SP, and ID conceived the work; IP, AKa, SG, and SP designed the work; MT, ZM, VG, OK, MP, GP, VI, KK, SP, CK, and NSL acquired the data; IP, CSV, MT, ZM, VG, OK, AKa, MP, SG, GP, VI, KK, SP, CK, NSL, AKo, AGV, SP, ID analyzed and interpreted the data; IP, MT, ZM, VG, OK, AKa, MP, GP, VI, KK, SP, CK, NSL drafted the work; CSV, SG, AKo, AGV, SP, and ID substantively revised it. All authors have approved the submitted version and have agreed both to be personally accountable for the author's own contributions and to ensure that questions related to the accuracy or integrity of any part of the work, even ones in which the author was not personally involved, are appropriately investigated, resolved, and the resolution documented in the literature. Competing Interests The authors declare no competing interests. Funding The work was implemented under the Clusters of Research Excellence (CREs), funded by the European Union, Recovery and Resilience Facility (RRF), Greece 2.0, Next Generation EU. Grant number: ΥΠ3ΤΑ-0559412. References Knaus, W. A., Draper, E. A., Wagner, D. P. & Zimmerman, J. E. 
APACHE II: a severity of disease classification system. Crit. Care Med. 13 (10), 818–829 (1985).
Le Gall, J. R. et al. A simplified acute physiology score for ICU patients. Crit. Care Med. 12 (11), 975–977 (1984).
Vincent, J. L. et al. The SOFA (sepsis-related organ failure assessment) score to describe organ dysfunction/failure. Intensive Care Med. 22 (7), 707–710 (1996).
Raith, E. P. et al. Prognostic accuracy of the SOFA score, SIRS criteria, and qSOFA score for in-hospital mortality among adults with suspected infection admitted to the intensive care unit. JAMA 317 (3), 290–300. 10.1001/jama.2016.20328 (2017).
Hu, C., Gao, C., Li, T., Liu, C. & Peng, Z. Explainable artificial intelligence model for mortality risk prediction in the intensive care unit: a derivation and validation study. Postgrad. Med. J. 100 (1182), 219–227. 10.1093/postmj/qgad144 (2024).
Olang, O. et al. Artificial intelligence-based models for prediction of mortality in ICU patients: a scoping review. Journal of Intensive Care Medicine 40 (12), 1240–1246. 10.1177/08850666241277134 (2025). PMID: 39150821.
Keuning, B. E. et al. Mortality prediction models in the adult critically ill: a scoping review. Acta Anaesthesiologica Scandinavica 64 (4), 424–442. 10.1111/aas.13527 (2020).
Johnson, A. E. W. et al. MIMIC-III, a freely accessible critical care database. Sci. Data 3 (1), 160035 (2016).
Johnson, A. & Mark, R. G. Real-time mortality prediction in the intensive care unit. AMIA Annu. Symp. Proc. 2017, 994–1003 (2018).
Johnson, A. E. W. et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci. Data 10 (1), 1 (2023).
Pang, K., Li, L., Ouyang, W., Liu, X. & Tang, Y. Establishment of ICU mortality risk prediction models with machine learning algorithm using MIMIC-IV database. Diagnostics (Basel) 12 (5), 1068 (2022).
Choi, M. H. et al. Mortality prediction of patients in intensive care units using machine learning algorithms based on electronic health records. Sci. Rep. 12 (1), 7180 (2022).
Kim, Y., Kim, Y. & Choi, M. Machine learning-based prediction models of mortality for intensive care unit patients using nursing records. In: Studies in Health Technology and Informatics. IOS Press (2024).
Alghatani, K., Ammar, N., Rezgui, A. & Shaban-Nejad, A. Predicting intensive care unit length of stay and mortality using patient vital signs: Machine learning model development and validation. JMIR Med. Inf. 9 (5), e21347. 10.2196/21347 (2021).
Somepalli, G., Goldblum, M., Schwarzschild, A., Bruss, C. B. & Goldstein, T. SAINT: Improved neural networks for tabular data via row attention and contrastive pre-training. arXiv:2106.01342. https://arxiv.org/abs/2106.01342 (2021).
Popov, S., Morozov, S. & Babenko, A. Neural oblivious decision ensembles for deep learning on tabular data. arXiv:1909.06312. https://arxiv.org/abs/1909.06312 (2019).
Grinsztajn, L., Oyallon, E. & Varoquaux, G. Why do tree-based models still outperform deep learning on tabular data? arXiv:2207.08815. https://arxiv.org/abs/2207.08815 (2022).
Chen, K. Y., Chiang, P. H., Chou, H. R., Chen, T. W. & Chang, T. H. Trompt: Towards a better deep neural network for tabular data. arXiv:2305.18446. https://arxiv.org/abs/2305.18446 (2023).
Mutnuri, M. K., Stelfox, H. T., Forkert, N. D. & Lee, J. Using domain adaptation and inductive transfer learning to improve patient outcome prediction in the intensive care unit: Retrospective observational study. J. Med. Internet Res. 26, e52730. 10.2196/52730 (2024). https://www.jmir.org/2024/1/e52730
Alves, T., Laender, A., Veloso, A. & Ziviani, N. Dynamic prediction of ICU mortality risk using domain adaptation. In: IEEE International Conference on Big Data (Big Data), 1328–1336. IEEE. 10.1109/BigData.2018.8621927 (2018).
Zhu, Y. et al. Domain adaptation using convolutional autoencoder and gradient boosting for adverse events prediction in the intensive care unit. Frontiers in Artificial Intelligence 5. 10.3389/frai.2022.640926 (2022).
Shickel, B. et al. Deep multi-modal transfer learning for augmented patient acuity assessment in the intelligent ICU. Front. Digit. Health 3. 10.3389/fdgth.2021.640685 (2021).
Hwang, Y. & Song, J. Recent deep learning methods for tabular data. Commun. Stat. Appl. Methods 30, 215–226. 10.29220/CSAM.2023.30.2.215 (2023).
Gardner, J., Popovic, Z. & Schmidt, L. Benchmarking distribution shift in tabular data with TableShift. arXiv:2312.07577. https://arxiv.org/abs/2312.07577 (2024).
Gutheil, J. & Donsa, K. SAINTENS: Self-Attention and Intersample Attention Transformer for Digital Biomarker Development Using Tabular Healthcare Real World Data. Vol. 293. 10.3233/SHTI220371 (2022).
Heikal, M. et al. Using machine learning and electronic health records to identify neuropsychiatric risk scores for delirium in ICU and general hospital settings. Neuropsychiatr. Dis. Treat. 20, 1861–1876 (2024).
Lin, Y. T., Deng, Y. X., Tsai, C. L., Huang, C. H. & Fu, L. C. Interpretable deep learning system for identifying critical patients through the prediction of triage level, hospitalization, and length of stay: Prospective study. JMIR Med. Inform. 12, e48862. 10.2196/48862 (2024). https://medinform.jmir.org/2024/1/e48862
Akiba, T., Sano, S., Yanase, T., Ohta, T. & Koyama, M. Optuna: A next-generation hyperparameter optimization framework. arXiv:1907.10902. https://arxiv.org/abs/1907.10902 (2019).
Lundberg, S. & Lee, S. I. A unified approach to interpreting model predictions. arXiv:1705.07874. https://arxiv.org/abs/1705.07874 (2017).
Additional Declarations
No competing interests reported.
Supplementary Files
SupplementaryScientificReports.docx
Figure legends
Figure 1. Performance comparison between Baseline (BASE) (blue), ITL (magenta), and DA (black). (a) Mean ROC curves for BASE, ITL, and DA LGBM models. Solid lines represent the mean across 5 seeds, and shaded regions show ±1 standard deviation. (b) Mean Precision–Recall curves for BASE, ITL, and DA XGBoost models. Solid lines represent the mean across 5 seeds, and shaded regions show ±1 standard deviation. Abbreviations: BASE, Baseline; DA, Domain Adaptation; ITL, Inductive Transfer Learning; LGBM, Light Gradient-Boosting Machine; XGBoost, Extreme Gradient Boosting.
Figure 2. Performance comparison between APACHE II (blue), SOFA (magenta), domain-adapted LGBM (black), and models trained on local data (green). (a) Mean ROC curves for APACHE II, SOFA, domain-adapted LGBM, and local LGBM. Solid lines represent the mean across 5 seeds, and shaded regions show ±1 standard deviation (SD). (b) Mean Precision–Recall curves for APACHE II, SOFA, domain-adapted XGB, and local XGB. Solid lines represent the mean across 5 seeds, and shaded regions show ±1 standard deviation (SD). Abbreviations: APACHE II, Acute Physiology and Chronic Health Evaluation II; LGBM, Light Gradient-Boosting Machine; SOFA, Sequential Organ Failure Assessment; XGBoost, Extreme Gradient Boosting.
Figure 3. Feature-level MDI importances for baseline and domain-adapted Random Forest models. Abbreviations: APACHE II, Acute Physiology and Chronic Health Evaluation II; AST, Aspartate Aminotransferase; BUN, Blood Urea Nitrogen; GCS, Glasgow Coma Scale; FiO2, Fraction of Inspired Oxygen; LDH, Lactate Dehydrogenase; HCO3, Bicarbonate; ALT, Alanine Aminotransferase; NA, Sodium; PCO2, Partial Pressure of Carbon Dioxide; RR, Respiratory Rate; APTT, Activated Partial Thromboplastin Time; HCT, Hematocrit; WBC, White blood cell count; CREA, Creatinine; PaO2, Partial Pressure of Arterial Oxygen; PLT, Platelet count; SOFA, Sequential Organ Failure Assessment.
Figure 4. SHAP beeswarm plot for Random Forest models. Abbreviations: APACHE II, Acute Physiology and Chronic Health Evaluation II; AST, Aspartate Aminotransferase; BUN, Blood Urea Nitrogen; GCS, Glasgow Coma Scale; FiO2, Fraction of Inspired Oxygen; LDH, Lactate Dehydrogenase; HCO3, Bicarbonate; ALT, Alanine Aminotransferase; NA, Sodium; PCO2, Partial Pressure of Carbon Dioxide; Lymph#, Absolute Number of Lymphocytes; HCT, Hematocrit; SOFA, Sequential Organ Failure Assessment; Systolic BP, Systolic Blood Pressure.
© Research Square 2026 | ISSN 2693-5015 (online)
5mn7kc/nPfM5m822PJdnyP49YyUSiUQiWRf8FM90Oo2pqSlHvKRzuh5zlkgkEolE0lukcZZIJBKJZMCQxlkikUgk60b0sKaV7yTdIcecJRKJRLIuxOWERaR5WT/SOEskEolEMmC4jHOrmpBEIpFIJJL+Ydu22zhLJBKJRCK5uUiHMIlEIpFIBgxpnDukXq9jZmYGoVDI5aXYCyqVCkKhEGZmZsSonhCPxxEKhVzrdEskEolkcOjKOPttij43NycmBZobNfDnhEIhxONxVCoV1Go1dt7c3JzrmnSIiPFDQ0N9MZpoGualpSUsLi669iKVSAaFer2OeDzOykMsFmtri0/LshAKhXzLL0+lUgksa4VCwVUuR0dHxWSYm5vD8PAwSxOPx9t6VslgwX9DOsTpVLTV4tDQEIaHh33lbL3yu+lwrubZOfzi4VFh4wpidXXVVlWVbaZAa0KbpulYl1lc05RfnzVosXBa81c8v1/Qc3mt+yqR3ExoUwxay9g0TbaZRit5JbluVY5M02Rl1u+atMEAf4hrH6dSKcdmA/Ss4qYKksEmn8+7vje/brzd3KuA1rWmb+wla93I72aja+Nsc4u4ixltc5nttSMJYTZ38xAXrCejK4Z7gQ1cyJ42F9hqwiIZfKLCzmYEVY79ykixWGQbBHiVY55UKsXSepUB2vQkCMMwbPjs8ANh5zPJYKO12DmONqrgId2uCjujrVd+NyNddWu3w+HDh9FoNDA/P++7z2Y4HMbi4qIY3DG34p6/EkmvqNVqKJVKePrpp8UoHDlyBI1Gw3ODesuy8Oabb+LEiRNilItKpQLDMDzvQbzxxhs4cOCAGOzg559/FoMYqqrKYaNbBNpTPkj3btu2DfPz846wsbExaJqGtbU1FrZe+d2s9NU4Ly8vo1qtQlGUlhtMRyIRPPXUU2Jw38jlcq6xLi8nKcuykEwmEQqF2PjH1atXxWRYXl5GLBbD3NwcLMtiTmO8cxdda6g53l4oFMTLsHF5cTymXq+z8Xoaqxc3r8/lcuw5R0dH2Xg+fw16Li/48T8aE+I3C0dTOSeTScRiMXYO3dPLia1SqTiumUwmXe8m6Q2//fabGMTYtWsXAOD8+fNiFCYnJ3HixAncfvvtYpQDy7Lw8ssv49SpU2IUo1AowDAMjI+PI5lM+o5J33fffUDTkPMyVq/XYRgGDh48yKWWDCpvvPEGMpkMYrEYcrmcS18goNFkGIbDLqxXfjcrfTXOH330EQBg586dYpQnR48eFYP6Qi6XwwsvvIDZ2VnYto18Po/FxUWXsbMsC7FYDFeuXEG1WoVpmohEIq5W/vLyMj766COUSiXcuHEDk5OT+NOf/oSPP/4Yqqri1KlTKBQKmJycxJ49e1AsFqGqKiYmJhyODsvLyzh//jxKpZLj+gCg6zr27t0L27ZRrVbxyy+/OCoTy8vLeOedd1CtVmHbNl599VXHc1qWhaWlJWQyGc9WyejoKD7//HNcuHABtm3j9ddfx/T0NGKxGCtwtVoNX3/9NTKZDABgZmYGd999N7788ktomoZTp065KgNPPPEEzp07B9u2ce7cOVy4cIHFS/qDV+WR+OWXXxy/yWln//79jnAv3nvvPRw+fBiRSESMYnz77bdQVRWGYSCTyWD37t1IJpMupR0Oh5FKpWAYBh5++GHUajVYlgVd15HP5zE2NuZILxk8qOINAKVSCS+88ALuvfde3woZDzUk/uM//kOM6kh+NzViP/d68Btz9nP0apdOx5zbhZ6Xh8bQeMhxQRzn8HKGoWcVx1+KxaJnOG3eLeaNl3Pb6uqqDWFM3TRNR77ouu4ap0ulUq4xQa93pzF00QmHHDHEZ0Rz03E+vddzkxMIT7FYdF1P0htoHNdLZr3KEjl2UVqvb0iUy2XHua38LkzTtNPpNHP88SvDJGOUxs8vRTLYkHzQt/STC8MwmOyI/gadyu9mp68tZ8MwxKCBIBKJQNM0R5jY9WJZFjKZDOLxuCtu
fHzc8Zvn6aefdqSnrkIx/K677mL/t+K2224DABw/fpyFhcNhxzBAKBRCJpNxtMSfeeYZ9n8QCwsL0DTN1SJ67bXXAABvv/22IxzN3hAxPQBcunSJ/b9t2zZUq1VH9/3+/fuxbds29lvSOyKRCFKpFBqNBmKxGOtZWV5exssvvwwAuOOOO1j6yclJvP/++y75FqHubHHcMIhwOIypqSlUq1VomoZSqeQ5jDM/Pw9VVYFm6yuXy4lJJLcAY2NjWFlZQTabBQA8//zzYhJUKhWoqorp6WkAwMTEhGMorFP53ez01TgPKvPz87hy5QrQFJh4PO7qSibHg3vuuccR7hfWTyKRCBKJBDKZDIaHh5mS44cB6P8HHngAyWQS9XodIyMjLbsHa7UaDMPwVNCRSASKoqDRaLTVVSWyb98+aJqGiYkJxGIxdo2NGr7Yipw8eRLpdBqGYUBVVYyOjuLXX39l8Y899hjQh+5sPyKRCM6cOQN4jBdaloXR0VHMzs6iXC5DURRkMhnX8JLk1uHo0aPQdR2GYbjmJo+NjcG2bayuriKRSACAayisXfndCvTVOEejUQDAP//5TzHqpkPjJbOzs3juuefYsxI//fST4/fNZn5+HtlsFmtra5iYmMDo6KhD+EdGRlCtVhGNRpHJZKCqaluOV0FOGOjAX8CLcDiMlZUVpFIplEol7N69G/F43DX+KOktU1NTWFtbg23buHLlCnbu3IlqtQo0K0yWZeGll17Chx9+KJ7qolKp4O9//3tXFaqRkRFEo1HXeOGzzz7LHBfHxsbwzTffQNM0ZDKZtmRXMpi8+OKLQIBuGRkZwfz8PNLpNADg008/dcS3kt+tQl+NM3W7dusEdO3aNTGoKwqFAnbv3o3Z2VmsrKy01XoYBI4ePYoff/wR6XQa1WoVjz32mMMpLBKJYGVlBeVyGZqmYXp62tOD2ovLly+LQQ7uvPNOMagtwuEwTp48CcMwoOs6FhcXmae3ZGNIpVLsbyQSwdmzZ2EYBrZv3+5Y0Wn37t0AgOnpaQw1V/qbnZ1FqVRypBsaGmJdk7t378ZQwEphxN69ex2/adoM75U9MjKCxcVFKIqChYUFR3rJrUOr3jri0KFDAOCYTuWFKL9bhb4a5wMHDkBRFBiG4TneJCLWlmm6hWEYga2tWq3mGkP2o16vY2JiArquB07v2rFjBwDg888/F6NuKjSWl81m0Wg0sLS0JCZh4z/kKR7E2NgY67oWu6HQ7HpUVbXrQhGJRFAoFKDrOqrVaktlLukNhUIBi4uL0DQNr7zyCtCclpJOp10HdTVGo1Gk02ns2rULzz//vCtdOp1mPU2JRALpdLpl5e3q1auOcUhxGImgIZxB9VeRtKZWq0FV1ZZGmobSgrqqveR3yyB6iK0HP29tm/NKDlohzG6mE5f3s7lrZ7NZMYqh67rnvb0grz9xeTnRW9s0TeZpKnoVplIpl0ciXVf0mPbzgO0kvFwue74/n46Wu+Mhr0ieIG9tcRUfv1Wc4OE16eVNmU6nXd+c0vl5c0p6B80UUFXV9R288JI9P1p5a/Osrq66vPZpBoLXvRKJhEu+JLcOuq576nIRWj1S1FtEp/K72ejaOPNra+u67pnRvIEWFfbq6qqt67rLABD89dPCuty05KBoVILgjW42m7WLxaKt6zq7Rz6fZ4aQnpvuXS6X7XQ6zQw5L4TZbNaGx5Qpv3BSblFhPXKv9KQ0s9ksC0s3p6lQfkSby95R3lK+8ZUFmjpD78mH0zvRuxuGYWua5qrEkFIVp1LRc/NTc+gZKY9M07R1XXcpaklvKZfLbIqSKF9BdGucSQZ0XWeyUSwWfZ+BnpFkzjRNO5vNtqzISwYDpbksM+kSWqZT1OUkF4lEgslFuVy2VVX1rNytV343G10ZZ35eG394FW7DMBxr8pIi5z+YH2ZzziR/Ln04URDaIZ/PMwNNLU5SNqlUyiEMxWKRGTRVVdk83Si3kL9XPpSFeX+twv2uYzeF
VdM09hzwmBOq67rjfKoIEaR4+YOPN7lNB9B8V/E7Uh6J1/B6bqrIRKNRdk1w+S3pD/Ttgyq8fnRrnM3m3Hv61pqmefb48KTTaYdc67ouDfMtAq9HleZmFV7fbrW58RF94yC93438bjaG7N8zRCKRSCQSyYDQV4cwiUQikUgknSONs0QikUgkA4Y0zhKJRCKRDBjSOEskEolEMmBI4yyRSCQSyYAhjbNEIpFIJAOGNM4SiUQikQwY0jhLJBKJRDJgSOMskUgkEsmAIY2zRCKRSCQDhjTOm4h6vY6ZmRmEQiExCrlcDkNDQ1heXhajNhTLsjA3N+fYh7pbKpUKcrmcGCzpASRTm5l6vY65ubnAbWklTmSZc5LL5Xq+DW5XxjkWi7k2YadN2r0oFAqOc0KhEOLxOCqVCmq1Gjtvbm7OdU06RMT4oTY2ft+MWJaFpaUlZDIZNBoNMXogsCwLx48fx6FDh9j+0LVaDcPDwxgaGkIsFvPcU5onFou5DPvY2BgeeeQRxOPxm65g4/G4Sx75g38/y7Ic6ePxuO/7Ly8vY3R01JFWzAeiXq87rttOvnpRq9Vw9uxZnDx5koV18n5ehEIh1zl0eFUqCaokUB7wOqZbGYpEIjh06BAmJydbnrtRUCU2KE9ELMtCKBTy1b9oGtVYLBaYphWFQgHXr1/H0aNHWVitVnPIxujoaFt6OOi7BrGe/CkUCojH40wGCcuymF0aHh5u2YCZmZlxpTl69CiuX7/e24qsuBNGp/BbOvpt70W7ktBOSfy2j/l83rElJI+u62wnE6/dTohOdtMZdLp9B9oVqBPK5bLn1m29hLal5L+jYRhMJmhHo6DtJFOpVOAuR9ls1o7exH2ATdNkuzFFo1HHoaqqraqqIy1td5rmdlxTPPa3pa1L6bpUJry2VqQ9cmn3L9ptDG3uvUysNvdg5p+lk/fzgvbnFc+NNncu89v6lXY/UhTFzmazfZMhyjsx/zca2maTvnO7kL700iGGYdjp5haufmnaIZ/Pe24jqyiKraoq+5b07EE7SwV91yA6zR9+tzTa5pf/xrQ1rmmadiqVshVhO1we2mLYD13X1523Iq3frA3oxb0eij6clyIhSFFFBcVKRlcM9wLATS9U3WI295ruBvoWnRCNRjtS3OtB13WX8qVtB+m7kRHyepZyuRxYKIhoNOophxtBNpv1VUa6rjv21k6lUq7yQN9OvIYq7HtLe22TsuHxM05UOW6njJjNfb/F5+jk/bzw21LQbu4NTFuw8njtb87TaxlKN7c5HQSo0dIOtLe9nx4mKH+C0vix2tzLXTRcuq47Kjx8hdBPn7X6ru3QTv7wZcVLdsnGkOwZhuGbP2Sngp6XdLiX/HVK8Ju1SZBxDsoYHsMwXIWiU+N8K0Mfvtv36NQ4r6dV1Sn0HUXFLD4rpRPliIxFUKEgSPmICmQj8GuRmc0WJ//+Xu9CLUv+OmWfXg1SInz+kfIU88/mjFircmg303op1U7eT8Rs9pJ5USwWPe9H3zJIIfZahihf/Z51IxHfzQ/SHSQ/4rvz+OVPO0SjUc/Kjd+1yHiK8tvOd22HVvnD61S/70nlgn9G+NicdhsxiUSiZS9SO3Q15tyK5eVlVKtVKIqCeDwuRjuIRCJ46qmnxOC+sby8zMZeLMtijlShUIiNG1iWhWQyycbDCoWCeBmAS0djGaOjo55pacx9bm6OjZOFQiEUi0U8/PDDqFarADeOzo/Z5HI5Nq421GLMUcSyLHY+f81YLIZMJgMA2L17NxvzoXvQEYvF2DmVSsUR1w6zs7NQFAUjIyNiVFtMTk7i3LlzCIfDYpSLnTt3AgDOnj0rRvUdfgyO5+LFi1BV1fH+Xu/yr3/9CwCwb98+FjY2NoaxsTEu1e/QmL2iKCzst99+41I42bVrFwDg/PnzYpQDy7Lw9ttvY3x8XIzq6P1EwuGwrw746quvXHH1eh3JZBIAsLi46JlfndCuDEUiEaiqinfffVeMGlgm
Jydx4sQJ3H777WJUz6hUKiiVSnjsscfEKExNTYlBAIDh4WEAwG233cbCev1dg3jvvfdQrVaRSCRc8tUpuVwOe/fu9SyLInv27IFhGK5x6U7pq3H+6KOPAE5htsKv8Pea5eVlfPTRRyiVSrhx4wYmJyfxpz/9CR9//DFUVcWpU6dQKBQwOTmJPXv2oFgsQlVVTExMuBxG6vU67r33XgDAjz/+CNM0MTo6iomJCSaEaN7z/Pnz7J7T09MYHh5Go9HA9evXce3aNUSjUeD3qiBs22aCkMvl8MILL2B2dha2bSOfz2NxcdFx/SCWlpZw5swZGIbhCF9ZWUE6nQYAlMtl2LaNqakpGIYBTdMAALquY2VlhZ0zNjbG8sM0TRbuR71eR6lU8pSBvXv3Ak2DAADXr18HAOzYsYOlKRQKUFW1rUIBzmgtLi6KUTeN8+fPQ9d1MdiBZVl45513kM1m2Tu0g5fSuXr1qhjE+OWXX8QgBxcvXkSj0cCDDz4oRvnSzvsFUSgU8Je//MURdvbsWTQaDSQSicD86IcMDQ8Po1qttl35vZmQAdi/f78Y1VM+/fRTAMD9998vRgUiVtra/a7dYlkWTp06BQB47bXXxGgGVVp//fVXoKmvAOChhx5iaWq1Gv72t7/5VkJE7rrrLqBZ6ewKsSm9Hvy6tf0cvdqln93adG2xa4W6hsRwv7GaaNMZRsSrS4fuyY/98fF+3TRe4X5d4F5pbZ/um6BwelavbqwgpxoRyk+v8UgvZx4+L72GOtqB8iCom3UjCfK3sLnxQlHmgqDuV/67UZjX2HK7ZSmVStngxuDaodX7BUE+KSIK57iUSCTY72g06rhXP2SIykS7Mt4v/MoyIXbVt9Nl3U4aL0jfiHIVhKqqrvu0+13bISh/+DHtbDbLnp9khUfj/J1I/vlhsWg02vEwGd2rG/rachZbaoPI008/7ehaoa4hMZxqQzzUKvRqNczOzgIAPvjgAzEKx44dY9dupzYfiURYS5boZ3cQms8VjUaxuLjoakFkMhkcOHDAEebHf//3fwMAtm3bJkYhEongyy+/xMLCArZv3w4A+OKLL1i8ruusZY/mFAYaemhnykVQN+9Gsby8jFAo5NvlG4vF8Pjjj6NaraJareLhhx929c54cfr0aaRSKYf8RCIRpFIpNBoNx3Sh5eVlvPzyywCAO+64g6X34u9//zvAlYNWtHq/Vnz22Weu1n+tVnNMB3zttdewtraGbDbLulbp3fopQzdu3BCDBorJyUm8//77fdcFANiQW7v3KhQKUBTF0drs5Lt2yzfffMP+/+Mf/4grV67AMAyoqorp6WnHlCfqZRsaGsLi4iKKxSJr1c/MzOCpp55ivwuFAhteTCaTgVM3u53S2lfjvNm5ePEi4GN47rvvPqBH3avz8/O4cuUK0Bz7icfjKJVKYrKe88YbbwBNQ0AUCgWMj4+3XUhbMTY2hmvXrsG2baysrDAlPzc3h8OHDzt+nzp1Ct988w2++eYbTE9Pe47r83z//fdi0Ibz1VdfeVbeiJWVFZimiXw+D1VVYRgGpqenxWQOarUarly5gldeeUWMwsmTJ5FOp5kiGh0dZV12ADzHDLuh1fu1YnFx0dWlTZUqTdMwNTXFFOPRo0eRSCTQaDQcMtkvGbp06ZIYNDBsVHf2erAsC++++y7OnDnjCO/0u3YDDd+cOHGC5VEkEmH6+NSpU44K3srKCmzbxrVr11j6SqUCwzDYcGulUsHExARef/11mKaJCxcu4Pjx4807etPOXG8/+mqcaQz1n//8pxi1KQiqWa+3JeFHpVLB8PAwZmdn8dxzz7G87SdjY2NQVRWZTIYJ8rlz5/Dcc8+JSXtKpVLBpUuXHD4ICwsLiEajGBkZwcjICKLRKKs8DDKFQgHPPPOMGOyAnKV++OEHKIqCUqnk24KwLAvT09M4c+aMbwVpamoKa2trsG0bV65cwc6dO1nLh3c26wXtvJ8ftVoNa2trvgbG6/1I
Ji5cuCBGOdhMMiRiWRZeeuklfPjhh2LUQHD8+HG8+uqrvjqwm+/aKWIPUCQSYZXJy5cvO+J4LMvC888/78hjGnc/evQowuEwjhw54tCNvaavxpm8r7vN8GvXrolBA0WQA46qqmJQxxQKBezevRuzs7NYWVnxVWb9gLrnT58+jXq97qhZ9gPLsvDyyy/jk08+cYQbhuFw0qCwIMjZ42bRaZdvOBxGIpEAArrk33vvPaTT6bavCQCpVIr97aUTTqfvJ+LVpQ3Bu1eE7rW2tiZGMXolQ+RsNmicPXsWhmFg+/btjpkTu3fvBgBMT0+zmRcbTaFQwJ49e/ryXTshaPiGnB2DGlde3v31et01vAgAP//8sxjEaGfY0o++GucDBw5AURQYhhHYfUSIwkRdw4ZhBPbt12o1z0zrN9R69ap80PN6TUnphHq9jomJCei67inw/SYej7PW8+nTpzvuwiSv2XZ7T44fP44TJ0541q69hg+CCFIGG8FXX33V8ffftm2b77SzmZkZPPPMM55xfhQKBSwuLkLTNM9ucBEyXuT1HMR63o9ncXERe/bsEYMxMjICVVUDh26C7tsrGeok7Uaya9cupNNp10EVu2g0inQ63dPKKenXoFYi6Xg/PdXtd+2EJ598EmgxtPXII4+IQUAL734vmfKDn+a4HvpqnMPhMObn5wEAyWQy0NGlUCg4pj+geT4ZwKWlJUccz1tvvYWnn35aDG6LoNpTK6hrrNFouCofNB7t5cbf7j3r9bpvrSyostINXoXv9ddfB5qOYIcOHRKjA3n44YcBn+uKBI2jqarqGgP065W4fPmyr4HbSAoeU4RacfXqVRw7dkwMRjKZxKOPPup6J6s5R9+L5eVlTExMQFXVwG5wnkcffRQA8NNPP4lRLoLez7IsVCoVXzmt1WowDMO3m/3IkSNAcxohD8nIiy++6AgneiFD1BPmp7xvNmNjY5iamnIdBw8eBJot/qmpKU/jsl7IaPrpo0KhgP/6r//yNMz8lM9OvmutVgu0GUHs27cPiqJgYWHBJYOff/45G94QqdfrOHfunGdFNhKJeFYs7rzzTsdveuauKxqi+3anGNza2rQ+qQhNQyI3dt5lfnV11dZ13XcFF/76aWFdbpqCIi4L2Q5+y8f5hdP0iqiwfjhNBVG4JdvK5bKtNNeL5aFrqKrq6ZpPq3Wl02k7n8/b+XzeNpvLwaE5tYPWdqU8yefz7D40tYLCefTmurtiXtG30ZtrzorTDGxuSbpOp6QQuq63nFZgNJfZ85Ifm8u71dXVwJWwzOZqVV5TtzYSv1WvCKW5FjF9J7O55KHX1DWachL1WJMaHtOeyuUykyVRXttBURTP5+Bp9X50f1HeiFQq1fIe0eY6zWK58rtmr2SIprXdTGiKGDymOfrRzjQpmirkp6v9oPwSdZrN6RBRNun7iWWxne9K0wLhs9pfO/lD+ZEQ1plXAqb+acL6/zx0vWw2y3Stl06k6aNiueyUrowzKQfx8BIOwzDsVCrF5puhaaQSiYRn5vOYpmmnuc0B6IhGoy4j1A5ez10ulzsK56EKBsVpzaX0eLyuIQoVVUQUYS5evrlAPDhBI0WTSqVs0zSZ4PAHCY5fOEHPRtf2IqgC1Qp6Nj+ht5vPEBRvc8tKivnDQ4qilUz1m0Qi4WtEbE5J8t/EK3/FdOIhGkgK6+Z7UT4H0er96Dv4PQNfMfHD5DYiQFNfeBkHohcyREah1bP1Ey9dIZZZL4KMs5d+gIcOCiLqsXwnGaKgQ/wm7XxXs7n0pldlq5P8KZfLDruh67qvbkin067nECkWi6wB5KcvEz1avnPI/r1ASySBDA8P44cffmira9QL2qqNhjn6RSwWY916kvVhWRYefvhhzM7OenZTbmbm5uZw6dIlx6p4kt+p1Wp44IEHYBhGTx0LNxOWZeHee+/Fl19+2fWwQl/HnCWbg17Mbf7www9x5cqVdY8htQPNKZSGuTvC4TC++OILvPvuu67x
us2MZVlYWFhweXlLfmdkZAT5fJ55/0vcnD17FseOHevaMEMaZ4kfvAPXuXPnPB3bOiEcDmNlZQXT09N9MdC1Wg0ffPCBVKw9YmRkBGfOnMGzzz67JQy0ZVmYnJzEF1980VUldLMTj8fx5JNPtr2u/1aiUCjgxo0bPWscyG5tiQvqvlIUBaqqYnx8HCdPnhSTrQvLsrC0tIR9+/b1rGusVqvhP//zPzds45StRL1ex9LSUs8UziBSr9dx8eJFHDhwQBrmNqlUKrh+/fqWG/bwI2iWwHqRxlniol6vY3x8HGtrazh27NimVswSiUQyiEjjLJFIJBLJgCHHnCUSiUQiGTCkcZZIJBKJZMCQxlkikUgkkgFDGmeJRCKRSAaMTWOcK5UKkskkhoaGXOGhUMh3cwCJRCKRSAaNrozz3Nwc4vE4209U3PLRj7m5Occ+pLFYTEzSEZVKBd9//z0ymYwYJZFIusCyLORyOcRisZZltVAosGVaxUqyZOsSJBe1Wg2jo6MYGhpCKBRCMpncEovetIWw1va64BcwbwdK77UoejfQdSUSSfesrq6yRf5TqZTvJgmmabLNCGh3M68NASRbi1ZyQfKVSqXsVCrFZM1vE4utRk8sGb9LSKvdXPL5fN8+Aj2HRCLpjnJzG7+g7fVsbpvGdsq+ZOvQjlyI21aa3Pa4fjtHbSW66tbmSSQSAIB3331XjHLwxhtvsA23JRLJ4FGr1fDEE08AAL755hvPTenR7PLWdR3VahX5fF4u5SgBOpCLDz/80LFcajgcZnbkt99+41JuTXpmnA8ePAhVVVGtVtnuQCKVSgXDw8PYtWuXGMWwLAvJZJKNT4yOjnper1arOca7/cYqCoUCYrGYazy8Xq87zh8eHkYul3Ok4c+l56KxkUKh4EjbDpVKhY2vBL2bmE48CDEP4vG4Y8MKiWQ9HD58GI1GA/Pz876GGQDee+89VKtVJBIJXwUs2Xq0Kxde65gbhgFVVQPlbssgNqXXQzQatcvlsp1OpwO7q2lDedr4W0xHG2zTZuH8mBfftba6umorisI2u+a7UPhXKhaLtq7rNoQNyE3TtFVVZRt5033BbT7On5tKpexEImEXi0W7WCyy7r5OxtVWV1dtNDfopt9e1xE3TKfuRbF7iPKmWCzaNrfpuaqqHT2XRMLDy1EQpmmy8ia7ICVEN3JBOjJoGGUr0VPjHDRmYBgGK/B+xjmdTrvCSFnous7CVFV1pTMMgwkFj2js/MLoPl7potGow+BRJYQMeTt4nZNKpVxhmqbZiqKw3zZ3Lhl2u5nn/LPa3PWy2awjXCJpF6qQJhIJh5MOXxG0bdvOZrM2AFvTNDubzbLKraIoLrmUbB3WIxemadr5fJ41uCS/07NubTS7Kagb4/Tp046406dP4/XXX3eEibz99tt46qmnHGF/+MMfAACLi4tAc2suwzDw/PPPO9JFIhGoquoI8+O2226DoijYsWMHC7v99tsdaXj27t3r2QXz/fffi0G+3H333VAUBXfeeScL27ZtmyMNAFSrVezcudMRRsMA1GVdq9VQKpUQjUYd6e6++24AwN/+9jdHuETSLhcuXAAArK2t4ZlnnsG1a9dQLpextraGxx9/nG2N980337Bz/vjHP+LKlSusS3J6elquK7BFWY9cbN++HRMTE2g0GshkMojFYp5DlFsO0VqvB2o520ILllrP1I1MrU+vljN/nt9hGIadSCRs+LRavby1vVrJPKZp2ul0mrUQvFrO4rnUkhXD24Xeg3oZ+HeBR5ciPQf1HuTzeVfe8Id4vkTSLiRD4tAIyRzJFpU1vjVt+5R/ydZhvXJhGIadTqeZTlyvbt1M9LTljGYLVtd1AMDS0hIA4OzZs9B13bP1Sfz8888AgGKxiGZ3u+uIRCI9c3iyLAszMzO49957AQCzs7Nikp5Tr9cRi8UwPj6OPXv2MM9EHl3XYRiG4z1//fVXAMCTTz4JAPjpp58AAKZpuvLItm1cu3aN
nSuRrAexrMbjcSiK4pJNsceJL/+XL192xEm2Dp3KRSQSwdTUFL788ksAwMLCgphky9Fz4wwAL774ItDsprYsCwsLCzh06JCYzMFtt90GAPjuu+/EqJ5jWRZisRguXLiAH3/8EVNTU7jrrrvEZD2lVqtB0zREIhFcu3YN8Xjcs1v73//936EoCvM+tywLb775JjRNY0MGdN4PP/wgnC2RdIeiKGIQg4ZbfvvtN9xxxx1iNOPBBx8EANy4cUOMkmxyupWLsbExaJqGtbU1MWrL0RfjTBncaDTw7LPPYnx8HJFIREzmYGRkBIqiYHFx0XO8gaYuPfTQQwCAr7/+WkjRPsePH0e1WsWZM2dcLYR+8de//hUAMD8/L0Y5GBkZwVtvvYU77rgD27dvx7333ovR0VGsrKywNI888ggA4KOPPuLO/P+sZ5qXRIJmCxnN6XxeKIqCkZER1osT5HdBcirZOvRCLsLhMMbHx8XgLUdPjLOXMX311VcBAKVSCQcPHhSjPYnH4zAMA7FYzNF1VqvVcP78eQDAn//8ZwBAJpNBrVZjaeDzHF54dY1T13G/MAxDDPKsQRYKBdRqNRQKBdi2jbW1NczPzzsqESMjI1BVFYuLi6753YVCgXV7SySdQmX1gw8+cIRbloVSqYRjx44BAPbt2wdFUbCwsOAqd59//jmi0aicq7oF6VYuLMvCtWvXWO/rlkYchO4UmpvmNYDvNeXJ5qZrQJjTxs83RtNhjKYW8enIKUxRFDZvOpFIMKeuVCrF0vOu/eTkQg5dmqbZ5eb8bHJkiEajdjabtQ3DYOeKU6no+fnpXa2g6ycSCfa89K6pVIrln6qqbNoBHfSOvCMFzZOmfIhGo4652xLJeqHyQVPyjOY6AqJskaMirTdgmiZzdJRzVbcu7cgF2Q3RmZimYEm6nOdMRpQ/eM9jMiwEfTTx4A04fUiK03Xds6Dznn3RaNReXV21o9GonUgkWHoyiPxhCwuy0/xNszlHW1VVu1wue57r9/ztQIuG8M9L14tGo8zw0lxlv0NciISek+YISsMs6QXZbJbJq6IodiqV8pStcrnsqFDruu7pjSvZWrSSC14Hk4zpuu6wH1udIft3IysZEGj7y127duH69euOLuqrV6/iwoUL0llCIpFINjnSOA8QtVoN09PTDucvkVgsFhgvkUgkklufnjiESXrDW2+95XKi4KnVati7d68YLJFIJJJNhmw5DxAzMzM4deoUVFXFkSNHsGPHDtx+++24fv06vv32W6CNqVgSiUQiufWRxnnAKBQKOHfuHEqlEgvTdR3PPfcc9u/f70grkUgkks2JNM4SiUQikQwYcsxZIpFIJJIBQxpniUQikUgGDGmcJRKJRCIZMKRxlkgkEolkwJDGWSKRSCSSAeOWNM71eh0zMzMIhUJiVM+pVCrI5XJi8LqZmZnx3BVLIpFIJBKiK+NcKBQwMzODoaEhDA0NIRaLiUk8qVQq7Bw62sWyLCwtLSGTyaDRaIjRPaVQKOD69es4evQo0Lx3LBbD0NAQhoeHsby8LJ7iYGZmxpXm5MmTOHv2rNxzWdI3ksmkq3zNzMyIyRiWZSGXyzHZ7qQcj46OBpaHWq2G4eFhdl1xm1cRcbtYyc2jUCggHo8jFAq5dLRlWYjH40y+4vF4y28rUq/XXXI6NDTkWCVxS8uPcx+M9cFvAdnOjjR8+vVuD0Y7mvSLfD7v2hJS0zRb13XbNE07lUrZiqL4vm+xWHSdT5imyXa/kkh6iWEYbKtW/vCTU363tFQq1bZM5vN5R/mlHdb4XdMMw2Dbn9IuRJqmcVdxkkql1q0PJL2D3zFK13W2ax8fr2manUgk7HQ6zXafUhTFc+cyPxKJhEtO+a2Ht7r89MS60f6vaO7hGQQpD0q/XvppnGmvUV6hkfIpFou23XwP+OxjTcIbJKjlcrljYZZIWkH7hbcDyWCn+y/T9qpiWac9e6nckF4gGSeD7vV85XLZtzIr2ThoT2Wx
osWTSqVc8kL62O8cEcMw7Ci3VbAXW11+emLd0um0Yw/mIINDta1BNs7RaNT1oemZecGAsBc1EeU2EA9CVVWXgpNI1gtVfNthdXV1XYbZ5soCVVQJqsCmUinb9iijFC9WaKknKUhvSPoPNSpaGVmv71QsFm100BOaSCQC72FL+bG7GnPmueeee5BIJAAAZ8+eFaOB5jjFhQsXcOjQITGKUa/XHWMZw8PDHTlkWZaFZDLJxklGR0dRqVTEZL5UKhWUSiU89thjYlRb5HI57N27F2NjY2KUi/HxcWQymcCdqCSSdjl9+jQMw8Do6GhLx8PDhw+j0Whgfn4eIyMjYnQgn3/+OQDgD3/4gyOcZH5xcdER3orJyUmcO3cO4XBYjJJsIO+99x6q1SoSiQTi8bgYzfD6Tv/6178AAPv27ROjXNTrdWQyGSSTScTj8a79bzar/PTMOAPAwYMHAQBvv/22p8E5e/YsxsfHfTPRsiyMj4/DMAyYpgnTNKEoCl544YW2DCw5bN1zzz1YW1vD6uoqGo0Gdu/e3dKRgPj0008BAPfff78jfNeuXQCAX3/9FWgKGAA89NBDLE2tVsPf/vY3TE1NsbAg7rnnHgDAxYsXxSiJpCMsy0K9XoeiKKhWq2x3M6+K7fLyMqrVKlRVDVTCflSrVQDwNeqGYcCyLLa9KemC69evAwB27NjB0hYKBaiq2lZlVtI/LMvCqVOnAACvvfaaGB2IZVl45513kM1mEYlExGgXFy9ehKZpaDQaWFxcxMTEBEZHR12VyS0vP2JTej2k02nW1RA09qCqKhuP8urW9uq2oO4SsStD7PKwm88hdjPT+WI3tR/UrePVRaJpGrt+KpWyIYxLRwMcb7yg92332SSSdjAMwzHMJJYdcshMJBJ2KpViPiCqqrq6qkVIZsWyR1BcuVz2dOjhu93bGXeUbAzZbNYGYGuaZmezWYeTlyg/PMVi0dY0raWPjR/5fJ7Jn+iDs9Xlx7uEdQhvnGnQXhz7Er2fvQo4jYPxCsLLYNs+xllRFNeYBzl3iWn9CEpLwkDvxz+n6CnIC10ikfAUXHq3zSxgkptHuenwBaESSWG6rrPxZj5tkIHuxDjbzfRUDqLRqGN8W9M0x2+aAdHKIEh6D1XYNE1zOL2SkSY/Ah7ShXSoqtqx/4LdHDOmyqTog7OV5ce7hHUIb5ztZgsZQus5KjhJBRVwu/nB0uk0u5aY2aJxJu/poKOdVm2r5/KiLHgKkgLLZrPMWUEUOj5dp/eTSNqFWkR8xZFkTqww+lWseVrJLMXxZd2LdDrteKZ008lsdXWVVai9et8k/YH0qVgx4/Wql/40TdPREFlvQ4P0pKIoYpQnW0F+vEtYh4jGmRQCfahyueyan+ZXwE1uDnE6nWYKo5VxJqUhClen+D2XHyRUvKKjWiBBgiMKdytFJ5H0ArH8BMmcV0tbJOj8oDiiXC67lLjanJtNiF2Ykv5C+tSrUkWt6iBjZzan17WSnSBIT7Ziq8hPTx3CiAMHDkBRFJRKJVQqFXz66ad49dVXxWQuyKHrwoUL+PHHHzE1NYW77rpLTObJbbfdBgD47rvvxKi+4uUpWK/XoWmaIx0A/Pzzz2IQACAajYpBEknPEOVLURTHb56dO3cCAH777TcxiqGqKsA56hDk0EPxXliWhZdffhmffPKJI9wwDIdzJYVJNoY77rhDDGI8+OCDAIAbN26IUYxwOMxm6wTJThDkdBvEVpKfvhhn/kPNzs7iwoULbXmFHj9+HNVqFWfOnPH16PZjZGQEiqJgcXHRpTTQ9OprBzKqouegF0Gegp08f1DBkEi65fLlyzhw4AD7TWXRbwaEoii+ntgAoOs6AOCHH35whP/v//6vI96L48eP48SJE57lY9u2bWKQZIN48sknAQDff/+9GMV45JFHxCAH27Ztayk7QfzjH/9gdsOPrSQ/PTHOXjUqmstcKpVw5MgRMdoTL4NIU5faIR6PwzAM1/qqtVoN58+f
d6T1Y3x8HAho5RL1eh3nzp3DK6+8IkYhEomgVCqJwbjzzjsdv//xj38AwLrnVEskrcjlckgkEo4pLjTl8YMPPuBS/t4qKZVKOHbsmCOsUqk4KryHDh2Coij46quvWBgAfPXVV1AUxXcdA1p7e//+/WIUVFXFpUuXXGGSjWHfvn1QFAULCwuuxs3nn3+OaDTa0uhevXrVITto6km/SiCPZVk4c+ZM4DSuLSc/Yj93p9CYK605zUPL+YnhNCYNwVGFxhw0TbPL5bKdbk6NQnP8OpvN2oZhsHuK4yAmt8INnaNpmq10sAoSOROIXt8ioqcgj5dDmDhGYnPTscT8kUg6hcYF081pJ6Zp2tls1tMR0ebKGsk5eeaKU2L8vGjJF4TKH5Vpv3FJur6frHs59Ih+JpL+QnqLZpeYTS9qUX8qimKrqsq+NaXzmhIq6mn6tqTjKSwqeGKLbEX56co4k+HkDz5DyuWyq1CL6emwmx+Zrqk2pyqZTUcDtblRBO9ERQdv+EhQKI6fLtIuUY/lO3nSgqegF8VikQmm31Qq1ceLWyLplFVuAwuS+1bOkdlslp2jKIqdSqVccioaYR5exnll60Ur5Ws3y9VmmgpzK1JuOu/yciQ6eFGjgte/XvJhcw00/tvz56uq6il3IltRfobs3w2mhKNWq+GBBx6AYRhtrXizHiqVCp544gn8+OOPnuMnEolEItm69GTMebMxMjKCfD6PVColRvWM2dlZfPnll9IwSyQSicSFq+UsbqotkUgkEolkYyCT7DLOEieVSgXXr19vaypYO+RyOTzyyCMtPR8lEolEsnWRxlkikUgkkgFDjjlLJBKJRDJgSOMskUgkEsmAIY2zRCKRSCQDhjTOEolEIpEMGP8HZ8AAslKvQjYAAAAASUVORK5CYII=\"\u003e\u003c/div\u003e\n \u003c/div\u003e\n \u003cp\u003e2.2.Data preprocessing and feature alignment\u003c/p\u003e\n \u003cp\u003eAll clinical and laboratory measurements were obtained within the first 24 hours of ICU admission. To ensure compatibility with our feature space, we selected a subset of 44 features from both datasets. Collected variables included demographics (age, sex), vital signs and physiologic measures (temperature, heart rate, respiratory rate, oxygen saturation, Glasgow coma scale, lactate), and an extensive panel of laboratory values including hematologic indices (white blood cell count, hematocrit, platelet count), renal function tests (blood urea nitrogen, creatinine), metabolic and electrolyte markers (glucose, sodium, potassium), and liver function markers (total bilirubin, albumin). The complete variable lists for the MIMIC-IV and our prospective data are provided in the supplementary material Tables S2 and S3.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\n \u003ch2\u003e2.3. 
Model development and validation\u003c/h2\u003e\n \u003cp\u003eWe trained supervised tree-based models (RF, XGBoost, LightGBM, CatBoost) as well as deep learning models (NODE, SAINT). We selected these architectures because the literature has previously identified them among the most promising approaches for ICU mortality prediction. The MIMIC-IV (source) dataset was first split 80/20, with 80% used for model training and the remaining 20% held out as a test set. We performed five-fold cross-validation within the training portion to limit overfitting and ensure model robustness. Model evaluation was conducted primarily using AUC and AUPRC, given the class imbalance in ICU mortality. We also report the Brier score, sensitivity, and specificity. Hyperparameters in all ML experiments were optimized with Optuna, a Bayesian hyperparameter optimization framework [\u003cspan class=\"CitationRef\"\u003e28\u003c/span\u003e].\u003c/p\u003e\n \u003cp\u003eIn addition to the transfer learning experiments, we trained models exclusively on the prospective (target) dataset. These locally trained models served as comparators to assess the added value of transfer learning beyond what could be achieved using institution-specific data alone. To ensure a fair comparison, identical feature sets, preprocessing steps, performance metrics, and hyperparameter optimization procedures were applied.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\n \u003ch2\u003e2.4. ITL implementation\u003c/h2\u003e\n \u003cp\u003eIn the ITL setting, models were trained on the internal MIMIC-IV cohort (source domain) and then fine-tuned using the outcome labels from the prospective data (target domain). For boosting models, ITL was performed by continuing gradient boosting from the internally trained model using labeled external data. Specifically, the boosting model was first fitted on the internal MIMIC-IV training data to obtain a base booster. 
This booster was then updated by training additional trees on the external transfer learning subset. To control the extent of adaptation and mitigate overfitting to the small external sample, only a limited number of new trees were added, and the learning rate for these trees was reduced relative to the original model. For each external split, the ITL procedure introduced three tunable hyperparameters: \u003cem\u003e(i)\u003c/em\u003e the learning-rate scaling factor \u003cem\u003e\u0026kappa;\u003c/em\u003e, \u003cem\u003e(ii)\u003c/em\u003e the number of additional boosting rounds, and \u003cem\u003e(iii)\u003c/em\u003e the class imbalance correction term (\u003cem\u003escale_pos_weight\u003c/em\u003e) applied during fine-tuning. These parameters were optimized using Optuna. Additionally, Platt scaling was applied as a post-hoc calibration step.\u003c/p\u003e\n \u003cp\u003eRandom forests were adapted using a model-based ensemble expansion strategy enabled by warm-start training. The source-domain forest was reinitialized with warm-start enabled, and additional decision trees were appended to the existing ensemble. These newly added trees were trained solely on labeled target-domain data, while the original source-domain trees remained fixed. The tunable hyperparameters were \u003cem\u003e(i)\u003c/em\u003e the number of trees, \u003cem\u003e(ii)\u003c/em\u003e the maximum tree depth, and \u003cem\u003e(iii)\u003c/em\u003e the minimum leaf size.\u003c/p\u003e\n \u003cp\u003eFor the NODE architecture, ITL was performed by initializing a classifier with the base weights learned on the internal cohort, and then fine-tuning a restricted subset of parameters on the external cohort. 
A new NODE instance (stacked Oblivious Decision Tree (ODST) layers with additive per-layer readouts) was loaded with the stored base checkpoint; all parameters were frozen, and we unfroze only \u003cem\u003e(i)\u003c/em\u003e the readout layers, \u003cem\u003e(ii)\u003c/em\u003e the global logit bias term, and \u003cem\u003e(iii)\u003c/em\u003e the final ODST layer. The model was then fine-tuned using the AdamW optimizer and a weighted binary cross-entropy loss with logits.\u003c/p\u003e\n \u003cp\u003eSimilarly to tree-based methods, SAINT was adapted via parameter-restricted fine-tuning: starting from the base weights learned on the internal cohort, a restricted subset of parameters was fine-tuned on the external cohort. A new SAINT instance (feature-wise tokenization via learnable per-feature affine embeddings followed by multi-head self-attention blocks and mean pooling) was loaded with the stored base checkpoint; all parameters were frozen, and we unfroze \u003cem\u003e(i)\u003c/em\u003e the classification head and \u003cem\u003e(ii)\u003c/em\u003e the final transformer block.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e\n \u003ch2\u003e2.5. DA implementation\u003c/h2\u003e\n \u003cp\u003eFor DA with all tree-based models as well as NODE, we estimated an importance weight for each internal training instance by fitting a domain classifier that discriminates between internal and external feature distributions. 
We constructed a pooled dataset with corresponding binary domain labels and trained a logistic regression model on \u003cstrong\u003eX\u003c/strong\u003e\u003csub\u003edomain\u003c/sub\u003e to estimate the posterior probability \u003cem\u003eP\u003c/em\u003e(\u003cem\u003ed\u003c/em\u003e\u0026thinsp;=\u0026thinsp;1 | \u003cstrong\u003ex\u003c/strong\u003e), which represents the probability that an observation \u003cstrong\u003ex\u003c/strong\u003e originates from the external cohort. Using the trained domain classifier, we computed an importance weight for each internal training sample \u003cstrong\u003ex\u003c/strong\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e as a monotonic function of the odds of originating from the external cohort. To ensure numerical stability, the probabilities \u003cem\u003ep\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e were clipped to the interval \u003cem\u003ep\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e \u0026isin; [10\u003csup\u003e\u0026minus;6\u003c/sup\u003e, 1\u0026thinsp;\u0026minus;\u0026thinsp;10\u003csup\u003e\u0026minus;6\u003c/sup\u003e], and the resulting importance weights were further constrained to lie within a bounded range. A DA model was then trained on the same internal training data, using the computed importance weights. Thus, adaptation was achieved without using external labels for training the outcome model, because external data were used only to learn the domain shift and derive the weights.\u003c/p\u003e\n \u003cp\u003eUnlike the tree-based approaches, SAINT was trained once to obtain a baseline model on the internal training set. 
Starting from the baseline weights, we fine-tuned an alignment model using labeled internal source batches together with unlabeled external transfer learning (TL) batches by minimizing a joint objective:\u003c/p\u003e\n \u003cp\u003eL\u0026thinsp;=\u0026thinsp;L\u003csub\u003etask\u003c/sub\u003e(\u003cem\u003eX\u003c/em\u003e\u003csub\u003e\u003cem\u003es\u003c/em\u003e\u003c/sub\u003e, \u003cem\u003ey\u003c/em\u003e\u003csub\u003e\u003cem\u003es\u003c/em\u003e\u003c/sub\u003e) + \u003cem\u003e\u0026lambda;\u003c/em\u003e L\u003csub\u003ealign\u003c/sub\u003e(\u003cem\u003ef\u003c/em\u003e(\u003cem\u003eX\u003c/em\u003e\u003csub\u003e\u003cem\u003es\u003c/em\u003e\u003c/sub\u003e), \u003cem\u003ef\u003c/em\u003e(\u003cem\u003eX\u003c/em\u003e\u003csub\u003e\u003cem\u003et\u003c/em\u003e\u003c/sub\u003e)),\u003c/p\u003e\n \u003cp\u003ewhere L\u003csub\u003etask\u003c/sub\u003e is binary cross-entropy with logits on internal labels, \u003cem\u003ef\u003c/em\u003e(\u0026middot;) denotes the pooled SAINT representation, and L\u003csub\u003ealign\u003c/sub\u003e is a Radial Basis Function (RBF) kernel Maximum Mean Discrepancy (MMD) loss computed between source and target representations. Alignment hyperparameters were fixed from a prior Optuna run: \u003cem\u003e\u0026lambda;\u003c/em\u003e\u0026thinsp;=\u0026thinsp;9.7454 \u0026times; 10\u003csup\u003e\u0026minus;4\u003c/sup\u003e, \u003cem\u003e\u0026sigma;\u003c/em\u003e\u0026thinsp;=\u0026thinsp;13.8262, and freeze_epochs\u0026thinsp;=\u0026thinsp;3, meaning that the feature extractor was frozen for the first 3 epochs while the classification head remained trainable throughout.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\n \u003ch2\u003e2.6. 
Performance aggregation strategy\u003c/h2\u003e\n \u003cp\u003eTo account for the relatively small size of the external prospective cohort (\u003cem\u003en\u003c/em\u003e\u0026thinsp;=\u0026thinsp;539), both the ITL and DA procedures were repeated across five random 70/30 splits, each controlled by its own seed, with seeds assigned sequentially. For every seed, the adapted model was trained, optimized by Optuna, and evaluated on its corresponding split. The resulting performance metrics were aggregated across all repetitions. The final reported results therefore reflect the mean and 95% confidence intervals computed over the five sequential seeds. The APACHE II and SOFA scores used to benchmark our ML models were also computed over the same seeds.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\n \u003ch2\u003e2.7. Statistical analysis\u003c/h2\u003e\n \u003cp\u003ePaired hypothesis testing was conducted to compare model performance across repeated data splits, using performance metrics computed from the five sequential seeds. Specifically, paired \u003cem\u003et\u003c/em\u003e-tests were used to assess whether the mean difference in performance between models differed significantly from zero, and differences were considered statistically significant when \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\n \u003ch2\u003e2.8. Model interpretation\u003c/h2\u003e\n \u003cp\u003eTo enhance the interpretability of our findings, we used SHapley Additive exPlanations (SHAP) [\u003cspan class=\"CitationRef\"\u003e29\u003c/span\u003e], an interpretability tool that helps to elucidate how our models compute their predictions. 
We used a SHAP Kernel Explainer to derive feature attributions, followed by a SHAP beeswarm plot to visualize the distribution of these effects and explore how individual predictors influence the model\u0026rsquo;s outcome predictions.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\n \u003ch2\u003e2.9. Software\u003c/h2\u003e\n \u003cp\u003eOur machine learning models were developed and trained using Python 3.12. Experiments were conducted on an Intel Core i7-14700K CPU and an RTX 5600 GPU.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"3. Results","content":"\u003cdiv id=\"Sec12\"\u003e\n \u003ch2\u003e3.1. Assessment of DA and ITL over baseline models\u003c/h2\u003e\n \u003cp\u003eDA improved discrimination relative to baseline in all 6 evaluated models, while ITL improved discrimination in 4 out of 6. The largest AUC improvements were observed for the LightGBM model (Fig. 1a), while the largest AUPRC gains were observed for the XGBoost model (Fig. 1b). Both models showed improvements in discrimination relative to the baseline that were statistically significant for both adaptation strategies. The domain-adapted LightGBM model achieved a statistically significant performance gain in AUC (Fig. 1a) (paired t-test; p = 0.0010). Similarly, XGBoost achieved a statistically significant performance gain in AUPRC using DA (Fig. 1b) (paired t-test; p = 0.0419). All evaluated models are summarized in Table 2, while additional performance measures, including the Brier score, sensitivity, and specificity, are reported in the supplementary material Table S1.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec13\"\u003e\n \u003ch2\u003e3.2. Comparison with scoring systems and locally trained models\u003c/h2\u003e\n \u003cp\u003eFigure 2 compares the performance of the domain-adapted LightGBM (Fig. 2a) and XGBoost (Fig. 2b) models with the APACHE II and SOFA scores, as well as locally trained baselines, using AUC and AUPRC. 
We chose to plot LGBM and XGBoost as they demonstrated the largest gains in AUC and AUPRC, respectively. The domain-adapted models significantly outperformed APACHE II, showing higher AUC (p = 0.0044) and AUPRC (p = 0.00026). They also achieved superior discrimination relative to SOFA, with statistically significant improvements in both AUC (p = 0.0077) and AUPRC (p = 0.0013). Moreover, they demonstrated significant gains over the locally trained models in AUC (p = 0.033) and AUPRC (p = 0.022).\u003c/p\u003e\n \u003cdiv\u003e\n \u003ctable id=\"Tab2\" border=\"1\"\u003e\u003c/table\u003e\n \u003c/div\u003e\n \u003cdiv\u003e\n \u003cdiv align=\"char\"\u003e\u003cstrong\u003eTable 2\u003c/strong\u003e: Performance comparison of all ML models used in this study.\u003c/div\u003e\n \u003ctable id=\"Taba\" border=\"1\"\u003e\n \u003ccolgroup cols=\"5\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eRF\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e0.920\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e0.602\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e0.897 [0.885, 0.910]\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e0.791 [0.763, 0.818]\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eITL\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.903 [0.893, 0.914]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.813 [0.788, 0.838]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDA\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.907 [0.895, 0.918]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.802 [0.773, 0.832]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eXGBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.922\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.608\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.892 [0.878, 0.905]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.771 [0.747, 0.795]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eITL\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.891 [0.874, 0.907]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.771 [0.741, 0.800]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.905 [0.895, 0.914]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.803 [0.779, 0.827]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLightGBM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.920\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.607\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.877 [0.863, 0.891]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.761 [0.734, 0.789]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eITL\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.881 [0.865, 0.897]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.768 [0.737, 0.799]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.892 [0.882, 0.902]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.782 [0.755, 0.809]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCatBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.920\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.600\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.893 [0.884, 0.903]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.772 [0.738, 0.806]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eITL\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.896 [0.883, 0.908]\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd align=\"char\"\u003e\n \u003cp\u003e0.782 [0.757, 0.807]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.896 [0.885, 0.906]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.776 [0.746, 0.807]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNODE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.913\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.577\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.876 [0.854, 0.898]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.734 [0.686, 0.783]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eITL\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.882 [0.859, 0.905]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.750 [0.692, 0.807]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.878 [0.860, 0.896]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.724 [0.700, 0.749]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSAINT\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.911\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.581\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.896 [0.883, 0.909]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.793 [0.758, 0.828]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eITL\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.889 [0.877, 0.900]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.785 [0.755, 0.814]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.897 [0.883, 0.910]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.794 [0.760, 0.829]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cp\u003eBest AUC and AUPRC values are shown in bold. Means and CIs were calculated for the prospective cohort due to its lower sample size (n = 539). \u003cem\u003eAbbreviations: DA, Domain Adaptation; ITL, Inductive Transfer Learning; RF, Random Forest.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec14\"\u003e\n \u003ch2\u003e3.3. Model interpretability\u003c/h2\u003e\n \u003cp\u003eFeature-level interpretability analyses for the domain-adapted RF model are illustrated in Figs. 3 and 4, including Mean Decrease in Impurity (MDI)-based feature importance rankings and SHAP Beeswarm plots, respectively. 
We chose to plot RF, as it achieved the highest AUC after domain adaptation.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"4. Discussion","content":"\u003cp\u003eThis study demonstrates that both DA and ITL can significantly improve the discriminatory performance of ML models for predicting mortality in ICU patients compared to baseline approaches. Across all evaluated architectures, DA consistently improved discrimination, while ITL yielded improvements in most models. These improvements were systematic and, in several cases, statistically significant. Among the evaluated models, LightGBM exhibited the largest improvements in AUC, and XGBoost showed the most pronounced gains in AUPRC, suggesting that these models may be particularly well suited to benefit from DA or ITL in this setting. LightGBM\u0026rsquo;s leaf-wise tree growth and strong handling of heterogeneous feature interactions may enable more effective refinement of decision boundaries under covariate shift, leading to larger gains in overall ranking performance.\u003c/p\u003e \u003cp\u003eSeveral studies have investigated DA in temporal and dynamic clinical data using deep learning approaches [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e, \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e, \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e, \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. However, these works differ substantially from our study in both scope and methodology. Prior approaches primarily rely on time-series data, focus on pediatric populations, or restrict DA to deep neural architectures, without considering tree-based or transformer-based models, and without benchmarking DA alongside ITL. Most closely related to our work, Mutnuri et al. 
evaluated both DA and ITL within fully connected neural networks [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. In contrast to our setting, their source and target domains were of comparable size, their data contained temporal features, and adaptation was performed exclusively within fully connected neural networks. NODE and SAINT have been mainly assessed on synthetic and real-world benchmark datasets [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e], focusing on representation learning and feature interaction modeling, rather than on clinical DA or transfer learning across heterogeneous healthcare cohorts. To our knowledge, neither NODE nor SAINT has previously been systematically evaluated for DA or ITL in ICU mortality prediction, nor compared directly against classical tree-based methods under similar experimental conditions. Our study, therefore, extends prior work [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e, \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e] by examining how modern deep tabular architectures respond to both DA and ITL in a clinically realistic, cross-cohort setting, and furthermore by benchmarking their behavior alongside established methods.\u003c/p\u003e \u003cp\u003eNotably, beyond comparisons with baseline machine learning models, the domain-adapted LGBM and XGBoost models substantially outperformed established clinical scoring systems, including APACHE II and SOFA, in both AUC and AUPRC. These findings underscore the added value of data-driven, adapted ML approaches over traditional rule-based scores. 
In addition, the domain-adapted models demonstrated significant improvements over their respective baselines trained on our prospective data, indicating that performance gains were not solely attributable to local retraining but also to the explicit incorporation of information from external domains.\u003c/p\u003e \u003cp\u003eInterpretability analyses of the RF model showed a coherent and clinically plausible set of dominant predictors, with strong agreement between global Gini importance and local SHAP attributions. Across both representations, APACHE II, lactate, and SOFA score emerged as top predictors in most models. SHAP values further demonstrated directionally consistent effects, in which higher lactate, higher severity scores, and lower saturation were associated with higher predicted mortality (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). Importantly, DA primarily reweighted these same core clinical variables rather than introducing new drivers, suggesting that performance gains were achieved by refining the relative importance of clinically meaningful signals to better match the external cohort.\u003c/p\u003e \u003cp\u003eIn our study, there is an imbalance between source and target sample sizes. With 94,458 samples available in the source domain and 539 in the target domain, instance-level weighting or selection could become inherently unstable, as similarity estimates and importance weights are dominated by high-dimensional noise and may be overly sensitive to a small number of target observations. In contrast, DA operates at a distributional level, enabling the model to leverage the large source dataset while explicitly adjusting for systematic differences between domains, rather than relying on sparse target instances to guide transfer. 
Under these conditions, DA could provide a more robust and data-efficient mechanism for aligning source and target domains, which likely explains its more consistent performance compared with ITL in our study.\u003c/p\u003e \u003cp\u003eOur study has several limitations. First, the target domain was substantially smaller than the source dataset, reflecting a common real-world constraint but potentially limiting model stability and the generalizability of conclusions regarding adaptation effectiveness. Second, performance estimation relied on repeating the ITL and DA procedures across five random data splits. Although this approach improves robustness compared with a single split, the limited number of repetitions, which was driven by computational constraints, may reduce the precision of estimated confidence intervals and the power of statistical comparisons. Lastly, the use of a logistic regression domain classifier may inadequately capture complex, nonlinear domain shifts typical of ICU data, potentially resulting in suboptimal importance weighting.\u003c/p\u003e \u003cp\u003eThe strengths of our study should also be acknowledged. By leveraging the large-scale, widely recognized MIMIC-IV database for source training and validating our findings on an independent, multicenter prospective ICU cohort, we ensured a rigorous test of model generalizability in a clinically realistic setting. A key strength lies in our systematic benchmarking of modern deep tabular architectures, such as NODE and SAINT, alongside established tree-based methods. Our results highlight the utility of tree-based methods, which demonstrated superior capacity to refine decision boundaries under covariate shift, with LightGBM and XGBoost achieving the most pronounced gains in AUC and AUPRC, respectively. 
Furthermore, by demonstrating that domain-adapted models substantially outperform traditional clinical scoring systems like APACHE II and SOFA, this study provides strong evidence for the superiority of adaptive, data-driven approaches over static rule-based metrics.\u003c/p\u003e"},{"header":"5. Conclusions","content":"\u003cp\u003eThis study demonstrates that transfer learning strategies can meaningfully improve ICU mortality prediction under realistic cross-domain conditions in which target-domain data are limited. Across multiple modeling paradigms, domain adaptation consistently improved discrimination, achieving statistically significant improvements over baseline models and over traditional scoring systems such as APACHE II and SOFA. From a clinical perspective, these findings support DA and ITL as pragmatic pathways to improve model generalizability across heterogeneous ICU populations without requiring large volumes of local retraining data. Such approaches may help mitigate well-documented performance degradation when models are transferred between hospitals, healthcare systems, or patient cohorts. The prospective validation performed in this study further strengthens the translational relevance of these results. Future work should extend this framework to incorporate multimodal representations to assess whether DA benefits persist in richer data settings. Broader multicenter prospective evaluations are also needed to examine clinical robustness across heterogeneous ICU environments.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eData and code availability\u003c/p\u003e\n\u003cp\u003eThe processed MIMIC-IV dataset generated and analyzed during the current study is publicly available in PhysioNet: https://physionet.org/content/mimiciv/3.1. The code to reproduce our experiments will also be available in the GitHub repository: https://github.com/giannis3p/TL-mortality-pred. 
The datasets analyzed during the current study are not publicly available due to participant confidentiality and informed consent restrictions, but are available from the corresponding author on reasonable request.\u003c/p\u003e\n\u003cp\u003eAuthor Contributions\u003c/p\u003e\n\u003cp\u003eAKo, AGV, SP, and ID conceived the work; IP, AKa, SG, and SP designed the work; MT, ZM, VG, OK, MP, GP, VI, KK, SP, CK, and NSL acquired the data; IP, CSV, MT, ZM, VG, OK, AKa, MP, SG, GP, VI, KK, SP, CK, NSL, AKo, AGV, SP, ID analyzed and interpreted the data; IP, MT, ZM, VG, OK, AKa, MP, GP, VI, KK, SP, CK, NSL drafted the work; CSV, SG, AKo, AGV, SP, and ID substantively revised it. All authors have approved the submitted version and have agreed both to be personally accountable for the author\u0026apos;s own contributions and to ensure that questions related to the accuracy or integrity of any part of the work, even ones in which the author was not personally involved, are appropriately investigated, resolved, and the resolution documented in the literature.\u003c/p\u003e\n\u003cp\u003eCompeting Interests\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e\n\u003cp\u003eFunding\u003c/p\u003e\n\u003cp\u003eThe work was implemented under the Clusters of Research Excellence (CREs), funded by the European Union, Recovery and Resilience Facility (RRF), Greece 2.0, Next Generation EU. Grant number: \u0026Upsilon;\u0026Pi;3\u0026Tau;\u0026Alpha;-0559412.\u003c/p\u003e\n"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eKnaus, W. A., Draper, E. A., Wagner, D. P. \u0026amp; Zimmerman, J. E. APACHE II: a severity of disease classification system. \u003cem\u003eCrit. Care Med.\u003c/em\u003e \u003cb\u003e13\u003c/b\u003e (10), 818\u0026ndash;829 (1985).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLe Gall, J. R. et al. A simplified acute physiology score for ICU patients, Crit. 
\u003cem\u003eCare Med.\u003c/em\u003e \u003cb\u003e12\u003c/b\u003e (11), 975\u0026ndash;977 (1984).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVincent, J. L. et al. The SOFA (sepsis-related organ failure assessment) score to describe organ dysfunction/failure. \u003cem\u003eIntensive Care Med.\u003c/em\u003e \u003cb\u003e22\u003c/b\u003e (7), 707\u0026ndash;710 (1996).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRaith, E. P. et al. D. V. Pilcher, for the Australian, N. Z. I. C. S. A. C. for Outcomes, R. E. (CORE), Prognostic accuracy of the sofa score, sirs criteria, and qsofa score for in-hospital mortality among adults with suspected infection admitted to the intensive care unit, JAMA 317 (3) 290\u0026ndash;300. (2017). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1001/jama.2016.20328\u003c/span\u003e\u003cspan address=\"10.1001/jama.2016.20328\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHu, C., Gao, C., Li, T., Liu, C. \u0026amp; Peng, Z. Explainable artificial intelligence model for mortality risk prediction in the intensive care unit: a derivation and validation study. \u003cem\u003ePostgrad. Med. J.\u003c/em\u003e \u003cb\u003e100\u003c/b\u003e (1182), 219\u0026ndash;227. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/postmj/qgad144\u003c/span\u003e\u003cspan address=\"10.1093/postmj/qgad144\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOlang, O. et al. Artificial intelligence-based models for prediction of mortality in icu patients: A scoping review, Journal of Intensive Care Medicine 40 (12) 1240\u0026ndash;1246, pMID: 39150821. (2025). 10.1177/ 08850666241277134.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKeuning, B. E. et al. H. 
consortium, Mortality prediction models in the adult critically ill: A scoping review, Acta Anaesthesiologica Scandinavica 64 (4) 424\u0026ndash;442. (2020). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps: //doi.org/10.1111/aas.13527\u003c/span\u003e\u003cspan address=\"https: //10.1111/aas.13527\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJohnson, A. E. W. et al. MIMIC-III, a freely accessible critical care database. \u003cem\u003eSci. Data\u003c/em\u003e. \u003cb\u003e3\u003c/b\u003e (1), 160035 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJohnson, A. \u0026amp; Mark, R. G. Real-time mortality prediction in the intensive care unit. \u003cem\u003eAMIA Annu. Symp. Proc.\u003c/em\u003e \u003cb\u003e2017\u003c/b\u003e, 994\u0026ndash;1003 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJohnson, A. E. W. et al. MIMIC-IV, a freely accessible electronic health record dataset. \u003cem\u003eSci. Data\u003c/em\u003e. \u003cb\u003e10\u003c/b\u003e (1), 1 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePang, K., Li, L., Ouyang, W., Liu, X. \u0026amp; Tang, Y. Establishment of ICU mortality risk prediction models with machine learning algorithm using MIMIC-IV database. \u003cem\u003eDiagnostics (Basel)\u003c/em\u003e. \u003cb\u003e12\u003c/b\u003e (5), 1068 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChoi, M. H. et al. Mortality prediction of patients in intensive care units using machine learning algorithms based on electronic health records. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e12\u003c/b\u003e (1), 7180 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKim, Y., Kim, Y. \u0026amp; Choi, M. 
Machine learning-based prediction models of mortality for intensive care unit patients using nursing records, in: Studies in Health Technology and Informatics, IOS, (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlghatani, K., Ammar, N., Rezgui, A. \u0026amp; Shaban-Nejad, A. Predicting intensive care unit length of stay and mortality using patient vital signs: Machine learning model development and validation. \u003cem\u003eJMIR Med. Inf.\u003c/em\u003e \u003cb\u003e9\u003c/b\u003e (5), e21347. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.2196/21347\u003c/span\u003e\u003cspan address=\"10.2196/21347\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSomepalli, G., Goldblum, M., Schwarzschild, A., Bruss, C. B. \u0026amp; Goldstein, T. Saint: Improved neural networks for tabular data via row attention and contrastive pre-training arXiv:2106.01342. URL (2021). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://arxiv.org/abs/2106.01342\u003c/span\u003e\u003cspan address=\"https://arxiv.org/abs/2106.01342\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePopov, S., Morozov, S. \u0026amp; Babenko, A. Neural oblivious decision ensembles for deep learning on tabular data (2019). arXiv:1909.06312. URL \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://arxiv.org/abs/1909.06312\u003c/span\u003e\u003cspan address=\"https://arxiv.org/abs/1909.06312\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGrinsztajn, L., Oyallon, E. \u0026amp; Varoquaux, G. Why do tree-based models still outperform deep learning on tabular data? 
arXiv:2207.08815. URL (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://arxiv.org/abs/2207.08815\u003c/span\u003e\u003cspan address=\"https://arxiv.org/abs/2207.08815\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen, K. Y., Chiang, P. H., Chou, H. R., Chen, T. W. \u0026amp; Chang, T. H. Trompt: Towards a better deep neural network for tabular data arXiv:2305.18446. URL (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://arxiv.org/abs/2305.18446\u003c/span\u003e\u003cspan address=\"https://arxiv.org/abs/2305.18446\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMutnuri, M. K., Stelfox, H. T., Forkert, N. D. \u0026amp; Lee, J. Using domain adaptation and inductive transfer learning to improve patient outcome prediction in the intensive care unit: Retrospective observational study, J Med Internet Res 26 e52730. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.2196/52730\u003c/span\u003e\u003cspan address=\"10.2196/52730\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. URL (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.jmir.org/2024/1/e52730\u003c/span\u003e\u003cspan address=\"https://www.jmir.org/2024/1/e52730\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlves, T., Laender, A., Veloso, A. \u0026amp; Ziviani, N. Dynamic prediction of icu mortality risk using domain adaptation, in: IEEE International Conference on Big Data (Big Data), IEEE, 2018, pp. 1328\u0026ndash;1336. (2018). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/bigdata.2018.8621927\u003c/span\u003e\u003cspan address=\"10.1109/bigdata.2018.8621927\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. URL http://dx.doi.org/10.1109/BigData.2018.8621927.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhu, Y. et al. Domain adaptation using convolutional autoencoder and gradient boosting for adverse events prediction in the intensive care unit, Frontiers in Artificial Intelligence Volume 5\u0026ndash;2022 (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/frai.2022.640926\u003c/span\u003e\u003cspan address=\"10.3389/frai.2022.640926\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShickel, B. et al. Deep multi-modal transfer learning for augmented patient acuity assessment in the intelligent icu. \u003cem\u003eFront. Digit. Health Volume\u003c/em\u003e. 3\u0026ndash;2021. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fdgth.2021.640685\u003c/span\u003e\u003cspan address=\"10.3389/fdgth.2021.640685\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHwang, Y. \u0026amp; Song, J. Recent deep learning methods for tabular data. \u003cem\u003eCommun. Stat. Appl. Methods\u003c/em\u003e. \u003cb\u003e30\u003c/b\u003e, 215\u0026ndash;226. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.29220/CSAM.2023.30.2.215\u003c/span\u003e\u003cspan address=\"10.29220/CSAM.2023.30.2.215\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGardner, J., Popovic, Z. \u0026amp; Schmidt, L. 
Benchmarking distribution shift in tabular data with tableshift arXiv:2312.07577. URL (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://arxiv.org/abs/2312.07577\u003c/span\u003e\u003cspan address=\"https://arxiv.org/abs/2312.07577\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGutheil, J. \u0026amp; Donsa, K. SAINTENS: Self-Attention and Intersample Attention Transformer for Digital Biomarker Development Using Tabular Healthcare Real World Data, Vol. 293, (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3233/SHTI220371\u003c/span\u003e\u003cspan address=\"10.3233/SHTI220371\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHeikal, M. et al. Using machine learning and electronic health records to identify neuropsychiatric risk scores for delirium in ICU and general hospital settings. \u003cem\u003eNeuropsychiatr Dis. Treat.\u003c/em\u003e \u003cb\u003e20\u003c/b\u003e, 1861\u0026ndash;1876 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLin, Y. T., Deng, Y. X., Tsai, C. L., Huang, C. H. \u0026amp; Fu, L. C. Interpretable deep learning system for identifying critical patients through the prediction of triage level, hospitalization, and length of stay: Prospective study, JMIR Med Inform 12 e48862. (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.2196/48862\u003c/span\u003e\u003cspan address=\"10.2196/48862\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. URL https://medinform.jmir.org/2024/1/e48862.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAkiba, T., Sano, S., Yanase, T., Ohta, T. \u0026amp; Koyama, M. Optuna: A next-generation hyperparameter optimization framework (2019). arXiv:1907.10902. 
URL \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://arxiv.org/abs/1907.10902\u003c/span\u003e\u003cspan address=\"https://arxiv.org/abs/1907.10902\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLundberg, S. \u0026amp; Lee, S. I. A unified approach to interpreting model predictions (2017). arXiv:1705.07874.\u003c/span\u003e\u003cspan\u003eURL. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://arxiv.org/abs/1705.07874\u003c/span\u003e\u003cspan address=\"https://arxiv.org/abs/1705.07874\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"machine learning, mortality prediction, intensive care unit, adults, domain adaptation, inductive transfer learning, prospective study","lastPublishedDoi":"10.21203/rs.3.rs-8872055/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8872055/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eMortality prediction in critically ill patients remains challenging due to poor cross-institutional performance and limited generalizability of machine learning models. This study addresses this, by systematically benchmarking and prospectively validating transfer learning frameworks. We trained our models on MIMIC-IV and validated them on a multicenter prospective cohort of 539 patients from three hospitals. We compared tree-based methods and modern deep learning architectures for tabular data. Results demonstrated that both Domain Adaptation (DA) and Inductive Transfer Learning (ITL) significantly enhanced model performance under realistic conditions where target-domain data are limited. DA consistently improved discrimination across all evaluated models, with LightGBM showing the most significant gains in Area Under the Receiver Operating Characteristic Curve (AUC) (p\u0026thinsp;=\u0026thinsp;0.0010), and XGBoost yielding the largest improvements in Area Under the Precision-Recall Curve (AUPRC) (p\u0026thinsp;=\u0026thinsp;0.0419). Among all evaluated models, Random Forest (RF) achieved the highest discriminative performance, achieving 90.7% AUC with DA and 81.3% AUPRC with ITL. 
Notably, the domain-adapted models significantly outperformed APACHE II (p\u0026thinsp;=\u0026thinsp;0.0044) and SOFA (p\u0026thinsp;=\u0026thinsp;0.0077). These findings suggest that transfer learning provides a robust and data-efficient pathway for improving model generalizability across heterogeneous populations, offering a pragmatic solution to the challenge of model degradation in clinical deployment.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e","manuscriptTitle":"Prospective Multicenter Validation of Machine Learning Models for Mortality Prediction in Adult Critically Ill Patients using Transfer Learning","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-02-19 12:39:35","doi":"10.21203/rs.3.rs-8872055/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"bf592790-4833-436b-9fa9-ed5b47f24475","owner":[],"postedDate":"February 19th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":63128878,"name":"Biological sciences/Computational biology and bioinformatics"},{"id":63128879,"name":"Health sciences/Health care"},{"id":63128880,"name":"Physical sciences/Mathematics and computing"},{"id":63128881,"name":"Health sciences/Medical research"}],"tags":[],"updatedAt":"2026-02-19T12:39:36+00:00","versionOfRecord":[],"versionCreatedAt":"2026-02-19 12:39:35","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8872055","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8872055","identity":"rs-8872055","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}