Risk prediction models for delirium in ICU patients: A systematic review and critical appraisal

Research Article

Wen-Hua Chen, Lei Ding, Yue Sha, Gongqian Lu, Kaimin Qian, Bin Wang, and 1 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7799974/v1
This work is licensed under a CC BY 4.0 License
Status: Published. Journal publication published 13 Dec, 2025; the published version is in BMC Anesthesiology.
Version 1 posted 13

Abstract

This systematic review critically appraises 26 studies on risk prediction models for delirium in ICU patients. Despite the development of 25 distinct models incorporating common predictors like age, sedation use, and APACHE-II scores, and demonstrating apparently strong discriminatory performance, most models exhibited significant methodological limitations.
These included widespread overfitting, inadequate handling of missing data, predominant reliance on internal validation only, and heterogeneous outcome assessment. Only four models underwent robust external validation. The findings indicate that while machine learning approaches like XGBoost show promise, fundamental methodological shortcomings substantially limit the clinical applicability and generalizability of existing prediction tools. Future research must prioritize methodological rigor, external validation in diverse populations, and implementation studies to assess real-world clinical impact before these models can be recommended for routine use.

Keywords: Delirium, Intensive Care Units, Prediction Models, Systematic review

1. Introduction

Delirium is an acute neurocognitive disorder, marked by transient and fluctuating impairments in consciousness and awareness, that frequently arises in intensive care unit (ICU) settings [1]. The prevalence of delirium in the ICU has been reported to be as high as 60–80% [2, 3], contributing to an estimated annual healthcare expenditure of $164 billion [4]. Delirium is strongly associated with prolonged mechanical ventilation, extended hospital stays, long-term cognitive impairment, and increased mortality rates [5–7]. Furthermore, it exacerbates patients' physical and psychological burdens [8–10] and significantly increases socioeconomic costs for families and healthcare systems [11, 12], posing multidimensional challenges for patients, healthcare providers, and the medical system. Given these circumstances, early identification and intervention are critical. However, the heterogeneous clinical manifestations of delirium, combined with factors such as sedative use and mechanical ventilation that often mask or interfere with symptom presentation, result in persistently high rates of underdiagnosis and delayed interventions in clinical practice.
Therefore, the development of effective risk prediction models to enable accurate identification of high-risk patients has emerged as a pivotal strategy for achieving precise delirium management and improving long-term outcomes in critically ill populations.

In recent years, researchers have developed multiple risk prediction models for ICU delirium to assist healthcare providers in rapidly screening high-risk populations through quantitative assessment. For instance, the PRE-DELIRIC model incorporates 10 risk factors (e.g., use of sedatives, APACHE-II score, metabolic disorders) to predict delirium risk within 24 hours of ICU admission [13], while the E-PRE-DELIRIC model further simplifies variables and extends applicability to early admission assessments [14]. Additionally, other models integrate diverse predictors such as age, mechanical ventilation, sedation use, and biomarkers [1, 2, 4, 15]. These models are constructed using algorithms ranging from logistic regression to machine learning, some of which have undergone validation in specific cohorts. However, marked heterogeneity in variable selection and applicable populations across different models limits their clinical generalizability.

Although existing models provide tools for delirium risk stratification, their methodological quality and clinical applicability remain controversial. Due to population heterogeneity, such as differences in patient characteristics between surgical ICU and medical ICU settings, some models demonstrate limited generalizability in external validation. The complexity of variable collection further restricts their practical implementation in clinical practice. Additionally, most models are developed based on single-center, retrospective data with limited sample sizes and insufficient external validation, leading to a high risk of overfitting.
In this context, there is an urgent need for a systematic evaluation of the predictive performance and risk of bias in delirium risk prediction models for ICU patients. Therefore, this review seeks to synthesize evidence from studies that developed or validated ICU delirium risk prediction models, critically appraise existing models, provide evidence-based references for clinical practice, and offer insights to guide future research.

2. Methods

This systematic review adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidance to ensure transparent reporting of prediction model studies [16]. The study protocol was prospectively registered on PROSPERO (CRD420251028221).

2.1 Search Strategy

A systematic literature search was conducted from database inception to April 8, 2025, across PubMed, Embase, Web of Science, and the Cochrane Library, restricted to English-language publications. The search strategy employed the following Boolean logic: (predict* OR prognos* OR risk OR prediction model) AND (critical care OR intensive care unit OR ICU OR critically ill) AND (delirium OR ICU syndrome OR acute confusional state OR acute brain dysfunction).

2.2 Eligibility Criteria

Inclusion criteria: (1) Study design: cohort studies or case report studies; (2) Study participants: ICU patients (age ≧ 18 years); (3) Study content: development or validation of multivariable prediction models to predict the risk of delirium in ICU patients; (4) Model performance was evaluated by at least one metric, such as the area under the curve (AUC), the Hosmer-Lemeshow test, or sensitivity.

Exclusion criteria: (1) Reviews, conference abstracts, letters; (2) Studies focusing on pediatric ICU patients, non-ICU settings (such as general wards), or subsyndromic delirium; (3) Studies utilizing univariable prediction models; (4) Studies focused only on risk factors or incidence rates.
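The Boolean logic given in Section 2.1 combines three concept blocks (model terms, care setting, and outcome) with OR inside each block and AND between blocks. A minimal Python sketch of that structure follows. The variable and function names are illustrative assumptions, and database-specific syntax (field tags, phrase quoting, subject headings) is not stated in the text and is therefore not reproduced.

```python
# Concept blocks copied from the Boolean logic in Section 2.1.
# All names below are hypothetical, chosen only for illustration.
MODEL_TERMS = ["predict*", "prognos*", "risk", "prediction model"]
SETTING_TERMS = ["critical care", "intensive care unit", "ICU", "critically ill"]
OUTCOME_TERMS = ["delirium", "ICU syndrome", "acute confusional state",
                 "acute brain dysfunction"]


def or_block(terms):
    """Join the synonyms of one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*blocks):
    """Join the concept blocks with AND, so a record must match every concept."""
    return " AND ".join(or_block(block) for block in blocks)


QUERY = build_query(MODEL_TERMS, SETTING_TERMS, OUTCOME_TERMS)
print(QUERY)
```

Keeping each concept as a separate list makes it straightforward to adapt the same logic to the syntax of each database searched.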
2.3 Study selection

The initial screening of titles and abstracts was performed independently by two reviewers to identify studies that potentially satisfied the inclusion criteria. Subsequently, full-text assessments of the remaining studies were performed to confirm eligibility. Any discrepancies during the screening process were resolved through discussion or by consulting a third independent reviewer.

2.4 Data Extraction

Two independent reviewers extracted data using a pre-designed Excel template based on the CHARMS (Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies) checklist [17]. Any discrepancies were resolved through discussion or adjudication by a third reviewer. The extracted information included: (1) basic information: authors (years), participant country, study design, participant profile, delirium cases, sample size, outcome measurement methods; (2) model development details: predictors, methods for handling missing data, modeling algorithms, model validation approaches, model performance metrics.

2.5 Risk of Bias and Applicability Assessment

The methodological quality of the included studies was evaluated independently by two reviewers with the Prediction Model Risk of Bias Assessment Tool (PROBAST) [18, 19], which evaluates four domains: participant selection, predictors, outcome assessment, and statistical analysis (e.g., handling of missing data, overfitting risk). Each domain was rated as low risk, high risk, or unclear risk of bias (ROB), and an overall judgment was derived from the domain-specific ratings. Discrepancies were resolved through discussion. If consensus could not be reached, a third reviewer was consulted.

2.6 Data synthesis

Due to substantial heterogeneity in participant characteristics, model predictors, and modeling methodologies, we employed a narrative synthesis approach to summarize study findings, without conducting quantitative analysis.
We systematically extracted and summarized the following information from all included models and their development/validation processes: study design, participant characteristics, outcome measurement, predictors, handling of missing data, model development/validation methods, discrimination and calibration results, and model presentation formats. For model discrimination, an AUC or c-index ≥ 0.70 was considered indicative of good discriminatory ability [20, 21]. For calibration, acceptable performance was defined as a Hosmer-Lemeshow test p-value > 0.05 or a Brier score < 0.25 [20, 22].

3. Results

3.1 Study Selection

The initial database search yielded a total of 10563 records, of which 5717 were duplicates. Following the review of titles and abstracts, 4803 records were excluded as they did not meet the inclusion criteria, leaving 43 articles for full-text assessment. During the full-text screening phase, 17 articles were excluded for the reasons depicted in Fig. 1. Consequently, 26 studies were ultimately included [1, 2, 4, 13–15, 23–42]. The process of study selection is summarized in Fig. 1.

3.2 Characteristics of included studies

Table 1 summarizes the key characteristics of the included studies. Published between 2014 and 2025, these studies introduced a total of 25 prediction models. The research encompassed ICU patients from countries including the United States, China, South Korea, Japan, Australia, Canada, and Denmark. Among these, 10 studies were prospective cohort studies, 14 utilized a retrospective cohort design, one employed a combination of prospective and retrospective methods, and one was a case-control study.
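The record counts in Section 3.1 and the study design counts above can be checked with simple arithmetic. The following Python sketch uses only the figures reported in the text; the variable names are illustrative assumptions.

```python
# Counts reported in Section 3.1 (study selection).
records_identified = 10563
duplicates_removed = 5717
excluded_title_abstract = 4803
excluded_full_text = 17

# Each screening stage removes records from the previous stage.
screened = records_identified - duplicates_removed
full_text_assessed = screened - excluded_title_abstract
included = full_text_assessed - excluded_full_text

# Study designs reported in Section 3.2.
designs = {
    "prospective cohort": 10,
    "retrospective cohort": 14,
    "prospective and retrospective": 1,
    "case-control": 1,
}

print(screened, full_text_assessed, included)  # 4846 43 26
print(sum(designs.values()) == included)       # True
```

The stages reconcile: 4846 records remain after deduplication, 43 reach full-text assessment, and 26 are included, which matches the sum of the reported study designs.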
Nineteen studies assessed delirium using the Confusion Assessment Method for the Intensive Care Unit (CAM-ICU), one study used the Intensive Care Delirium Screening Checklist (ICDSC), three studies employed both CAM-ICU and ICDSC, and one study utilized CAM-ICU alongside the Richmond Agitation-Sedation Scale (RASS). Delirium assessments were conducted twice daily or more frequently in 15 of these studies. Regarding candidate predictors, the studies by Wassenaar [26, 29], Shi, and Anton [13] did not provide any information on the initial pool of candidate predictors. The studies by Ko [32], Hur [35, 36], Green, and Gong [4] did not report the specific number of candidate predictors considered but did describe their general categories or types. In the remaining studies, the number of candidate predictors ranged from 9 to 70.

Table 1. Characteristics of included studies.

Authors (years) | Participant country | Study design | Participants (male/female) | Delirium cases/Sample size | Outcome measurement | Timing of delirium assessment | Candidate predictors
Zhang et al. (2023) | America | Retrospective cohort | Age ≧ 18, sepsis patients in different ICUs (6102/8518) | 5390/14620 | CAM-ICU | ─ | 53 (age, weight, gender, ethnicity, ICU type, vital signs, laboratory tests, GCS and SOFA scores, treatment measures and comorbidity)
Zhang et al. (2021) | China | Prospective cohort | Age ≧ 18, patients in mixed ICU (145/78) | 46/223 | CAM-ICU | Twice a day (at 9 am-11 am and 3 pm-5 pm) | 16 (age, gender, history of hypertension, heart disease, history of pulmonary dysfunction, alcohol abuse, history of nicotine, history of peptic ulcer, hypoxaemia, hypotension, deep sedation, benzodiazepines, mechanical ventilation, metabolic acidosis, sepsis and surgery)
Wu et al. (2025) | America | Retrospective cohort | Age ≧ 65, elderly patients with COPD and respiratory failure in ICU | 146/1155 | ─ | ─ | 65 (age, weight, length of hospitalization, clinical scores, vital signs, ventilation mode, and laboratory tests)
Wassenaar et al. (2015) | 7 countries | Prospective cohort | Age ≧ 18, patients in ICU (1716/1198) | 689/2914 | CAM-ICU | Once every 8 h or 12 h | 18 (age, gender, history of cognitive impairment, history of alcohol, nicotine and drugs abuse, history of vascular disease, Glasgow Coma Scale score, diabetes, blood urea nitrogen, use of opiates in the 24 h before ICU admission, use of anti-psychotics before ICU admission, admission category, urgent admission, mean arterial blood pressure, infection, use of corticosteroids, or respiratory failure)
Wassenaar et al. (2018) | 7 countries | Prospective cohort | Age ≧ 18, patients in ICU (1324/854) | 467/2178 | CAM-ICU/ICDSC | At least every 12 h | ─
Wang et al. (2020) | China | Prospective cohort | Age ≧ 18, patients in NICU (140/170) | 118/310 | CAM-ICU | At 08:00–09:00 and 20:00–21:00 | 28 (general data, condition monitoring indicators, the Glasgow Coma Scale and APACHE II score)
Boogaard et al. (2014) | 6 countries | Prospective cohort | Age ≧ 18, patients in ICU (1040/784) | 363/1824 | CAM-ICU | At least twice daily | 10 (age, APACHE-II, urgent and admission category, infection, coma, sedation, morphine use, urea level, metabolic acidosis)
Tang et al. (2024) | America | Retrospective cohort | Age ≧ 65, elderly patients in ICU (5277/4471) | 4243/9748 | CAM-ICU | ─ | 48 (demographic characteristics, admission condition, chronic comorbidities, disease severity scores, vital signs and laboratory indicators)
Shi et al. (2022) | China | Retrospective cohort | Age ≧ 18, patients in EICU (204/115) | 96/319 | CAM-ICU/RASS | Every 24 h | ─
Park et al. (2025) | Korea | Retrospective and prospective cohorts | Age ≧ 18, patients in ICU (6404/4182) | ─/10586 | CAM-ICU | ─ | 19 (demographic variables and parameters derived from ECG lead II, PPG, and respiratory waveforms)
Miyamoto et al. (2020) | Japan | Prospective cohort | Mechanically ventilated patients with sepsis in ICU (99/59) | 63/158 | CAM-ICU | At least once daily | 10 (age, APACHE II score, presence of coma, admission route, presence of infection, presence of metabolic acidosis, morphine dose on the first day, sedative usage on the first day, blood urea nitrogen, and incidence of urgent admission)
Ma et al. (2024) | America | Retrospective cohort | Age ≧ 65, patients in ICU (10285/8475) | 3463/18760 | CAM-ICU | ─ | 60 (demographic characteristics, vital signs and laboratory indicators)
Ko et al. (2024) | Korea | Retrospective cohort | Age ≧ 18, patients in CICU (1794/980) | 677/2774 | CAM-ICU | Three times a day | Clinical characteristics, primary diagnoses, vital signs, laboratory test results, and clinical presentations at CICU admission
Kim et al. (2022) | Korea | Retrospective cohort | Age ≧ 20, patients in mixed ICU (2246/1451) | 741/3697 | CAM-ICU | At 10 am | 14 (basic information, drug usage, and procedure/intervention application)
Kim et al. (2024) | Korea | Case-control | Acute stroke patients in NICU (261/159) | 84/420 | CAM-ICU/ICDSC | Every 8 h | 50 (clinical features at admission and features based on vital signs)
Hur et al. (2021) | Korea | Retrospective cohort | Age ≧ 18, patients in the medical or surgical ICU (7877/4532) | 3816/12409 | CAM-ICU | 3 times a day | General information, admission category, reason for ICU admission, vital signs, comorbidity, laboratory tests, medications
Green et al. (2019) | Australia | Prospective cohort | Patients in ICU (241/214) | 160/455 | CAM-ICU | Twice a day | Demographic information, APACHE II and III illness severity scores, a morphine equivalent and blood urea nitrogen
Gong et al. (2023) | America | Retrospective cohort | Patients in ICU (9616/2536) | 2536/18302 | CAM-ICU | ─ | Demographics, medical history and comorbidities, laboratory studies, medications administered, other treatments, nurse documentation, and physiologic time series
Gao W et al. (2022) | China | Retrospective cohort | Age ≧ 18, patients following cardiac surgeries in CSICU (483/242) | 120/725 | CAM-ICU | Every 12 hours | 9 (age, history of cognitive impairment and alcohol abuse, admission category (surgical), urgent admission, MAP and blood urea nitrogen at time of ICU admission, use of corticosteroids, and respiratory failure)
Fan et al. (2019) | China | Prospective cohort | Age ≧ 18, patients in SICU, TVICU, CCU, RICU (202/134) | 68/336 | CAM-ICU | At around 7 AM and 7 PM | 13 (baseline demographic data, history of chronic diseases and delirium, history of alcohol drinking or abuse, visual and hearing deficits, disease-related factors, and iatrogenic and environmental factors)
Esumi et al. (2025) | Japan | Retrospective cohort | Age ≧ 18, patients with burns in ICU (52/30) | 32/82 | CAM-ICU/ICDSC | Every 8 hours | 70 (physiological, biochemical, and clinical data)
Coombes et al. (2021) | America | Retrospective cohort | Age ≧ 18, patients in ICU (27220/21321) | 3850/48541 | ─ | ─ | 31 (laboratory and imaging orders and 4 medications)
Cherak et al. (2020) | Canada | Retrospective cohort | Age ≧ 18, patients in ICU (5113/3765) | 4431/8878 | ICDSC | Twice a day | 10 (age, sex, APACHE II score at admission, GCS score at admission, SOFA score at admission, Charlson Comorbidity Index at admission, vasoactive medication receipt within 24 hours of ICU admission, pre-existing neuropsychiatric disorder, continuous renal replacement therapy receipt within 24 hours of ICU admission, and invasive mechanical ventilation receipt within 24 hours of ICU admission)
Chen et al. (2017) | China | Prospective cohort | Age ≧ 18, patients in ICU (305/315) | 160/620 | CAM-ICU | At both 9 am and 5 pm | 11 (age, APACHE II score, coma, emergency operation, mechanical ventilation, multiple trauma, metabolic acidosis, history of hypertension, history of delirium, history of dementia, and the application of dexmedetomidine hydrochloride within 24 hours after admission to the ICU)
Bhattacharyya et al. (2022) | America | Retrospective cohort | Age ≧ 18, patients in ICU (12384/10456) | 4421/22840 | CAM-ICU | ─ | 21 (demographic data, vital signs, laboratory values, and vasopressor dose that fulfilled above criteria)
Anton et al. (2024) | Denmark | Prospective cohort | Age ≧ 18, patients in four mixed ICUs (395/265) | 247/660 | CAM-ICU | Twice daily | ─

CAM-ICU, Confusion Assessment Method for ICU; ICDSC, Intensive Care Delirium Screening Checklist; ─, no information; ICU, Intensive Care Unit; NICU, Neurological Intensive Care Unit; EICU, Emergency Intensive Care Unit; CICU, Cardiac Intensive Care Unit; CSICU, Cardiac Surgery Intensive Care Unit; GCS, Glasgow Coma Scale; SOFA, Sequential Organ Failure Assessment; APACHE, Acute Physiology And Chronic Health Evaluation; ECG, Electrocardiogram; PPG, Photoplethysmography; MAP, Mean Arterial Pressure.

3.3 Characteristics of Prediction Models

Table 2 details the characteristics of the risk prediction models for delirium in ICU patients. The methods for handling missing data varied considerably. One model handled missing data by excluding cases [1], and four models used a combination of excluding incomplete cases and multiple imputation methods [23, 25]. One model employed the Multivariate Imputation by Chained Equations (MICE) method [34], one combined mean imputation with leaving values blank [35], and one model used an unspecified "replacement" method [37]. Two models applied mean imputation [13, 38], and one used a combination of case exclusion and mean imputation [4]. One model utilized forward and backward imputation [42]. Four studies reported that either no method was used to handle missing data or that no missing data were present. Additionally, the method for handling missing data was not reported for eight models.

Univariable analysis was the most frequently employed method for predictor screening.
Three models used Least Absolute Shrinkage and Selection Operator (LASSO) regression for predictor selection [2, 23, 31], and one model combined LASSO regression with the optimal subset method [25]. One model utilized a combination of Random Forest (RF), extreme gradient boosting, partial least squares, and Plmnet-elastic-net [32]. One model employed stepwise selection [34]. Furthermore, the predictor pre-screening process was not reported in fourteen studies.

The majority of the included models utilized logistic regression analysis to identify the final predictors. Machine learning methods were also employed to generate and compare models, including Support Vector Machine (SVM), eXtreme Gradient Boosting (XGBoost), Random Forest (RF), K-Nearest Neighbors (KNN), Decision Tree (DT), Naïve Bayes (NB), extra-trees classifier, LightGBM, Deep Neural Network (DNN), neural network, adaptive boosting (AdaBoost), Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), Bidirectional Long Short-Term Memory (BiLSTM), and CatBoost. After generating and comparing the models, 13 studies demonstrated that the model developed using logistic regression exhibited the best performance [14, 15, 24, 27–29, 32–34, 38–41]. Four studies found that the model generated by XGBoost performed best [2, 23, 25, 31]. One study reported that the XGBoost-based model showed superior performance upon internal validation, whereas the RF-based model performed better upon external validation [35]. Two other studies indicated that models generated by CatBoost and BiLSTM, respectively, yielded the best performance [4, 42]. An additional five studies did not specify the optimal modeling method.

The final number of predictors included in the models ranged from 5 to 59, encompassing a total of 170 distinct predictor variables. Figure 2 presents the top 21 most frequently occurring predictors.
The most common predictor for delirium in ICU patients was age, followed by sedation, APACHE-II score, urgent admission, mechanical ventilation, and GCS score. Other high-frequency predictors included blood urea nitrogen, coma, infection, and metabolic acidosis, each of which was incorporated into eight models. Three predictors (history of cognitive impairment, urea concentration, and morphine use) were each included in six models. Additionally, eight predictors (respiratory rate, sex, history of alcohol abuse, admission category, mean arterial pressure, use of corticosteroids, respiratory failure, and heart rate) were featured in five models each. The specific predictors included in the final models are detailed in Table 3.

3.4 Characteristics of model validation

Table 2 presents the model validation characteristics. Among them, four models were validated both internally and externally [1, 14, 23, 35]. Fourteen models underwent internal validation only [2, 15, 24, 25, 29, 31–34, 38–42], and seven models were subjected to external validation only [4, 13, 26, 27, 30, 36, 37]. Additionally, one model reported no validation following its development. Of the 18 models that completed internal validation, the methods varied: one utilized bootstrapping [29], 13 employed a random split method [1, 2, 14, 15, 24, 25, 31, 33–35, 38, 39, 41], one used internal cross-validation [40], and one implemented stratified repeated cross-validation [42]. The specific internal validation method was not reported for the remaining two models.

3.5 Characteristics of model performance

As detailed in Table 2, all included studies reported metrics for model discrimination, which were evaluated using the area under the curve (AUC). Among the 21 studies involving model development, 15 reported discrimination performance, with AUC values ranging from 0.67 to 0.921.
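For reference when reading the reported values, the thresholds defined in Section 2.6 (AUC or c-index ≥ 0.70 for good discrimination; Hosmer-Lemeshow p-value > 0.05 or Brier score < 0.25 for acceptable calibration) can be expressed as two small predicates. This is a minimal Python sketch: the function and constant names are hypothetical, and the example values are taken from Table 2.

```python
from typing import Optional

# Thresholds stated in Section 2.6; the names are illustrative assumptions.
AUC_GOOD = 0.70    # AUC or c-index at or above this counts as good discrimination
HL_P_MIN = 0.05    # Hosmer-Lemeshow p-value must exceed this
BRIER_MAX = 0.25   # Brier score must be below this


def good_discrimination(auc: float) -> bool:
    """Return True when the AUC (or c-index) meets the Section 2.6 threshold."""
    return auc >= AUC_GOOD


def acceptable_calibration(hl_p: Optional[float] = None,
                           brier: Optional[float] = None) -> bool:
    """Return True when either reported calibration metric meets its threshold.

    A metric that was not reported (None) cannot satisfy the criterion.
    """
    hl_ok = hl_p is not None and hl_p > HL_P_MIN
    brier_ok = brier is not None and brier < BRIER_MAX
    return hl_ok or brier_ok


# Example values from Table 2.
print(good_discrimination(0.831))           # Ma et al. (2024), internal AUC
print(good_discrimination(0.60))            # Miyamoto et al. (2020), PRE-DELIRIC
print(acceptable_calibration(brier=0.113))  # Ma et al. (2024), internal Brier score
print(acceptable_calibration(hl_p=0.13))    # Cherak et al. (2020), lowest reported P
```

Both calibration criteria are treated as alternatives, matching the "or" in the definition; a study that reports neither metric is classed as not meeting the criterion rather than as passing by default.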
Regarding model validation, substantial variation was observed across all included models, with AUC values ranging from 0.63 to 0.932.

A total of 14 studies reported on model calibration. Three studies utilized calibration curves for validation, indicating good calibration performance [1, 2, 23]. Four studies employed calibration plots for assessment [14, 26, 30, 31]; among these, only the study by Miyamoto et al. [30] showed inadequate calibration. Four studies evaluated calibration using the Hosmer–Lemeshow test, with results suggesting well-calibrated models [24, 28, 37, 40]. Three studies applied the Brier score as a metric [4, 31, 35]. Only two studies reported calibration slope values (1.09, 1.07, and 0.63) [13, 28]. Additionally, the study by Anton et al. provided intercept values (0.11 and 0.19) [13].

Furthermore, 18 studies reported model performance using metrics including sensitivity, specificity, negative predictive value, positive predictive value, accuracy, the kappa coefficient, precision, recall, the F1-score, and the Matthews correlation coefficient (MCC). To assess net clinical benefit, decision curve analysis was employed in five of the included studies, with favorable outcomes [1, 25, 29, 31, 35].

Table 2. Characteristics of delirium risk prediction models for ICU patients.

Authors (years) | Missing data | Variable selection | Modeling method | Model validation | Model name | AUC | Other indexes | Calibration
Zhang et al. (2023) | Excluding and multiple imputation | LASSO regression | LR, SVM, XGBoost, RF, KNN, DT, and NB | Internal and external validation | XGBoost | AUC_int = 0.793; AUC_ext = 0.701 | Se_int = 0.852, Se_ext = 0.664, Sp_int = 0.568, Sp_ext = 0.579, PPV_int = 0.769, PPV_ext = 0.789, NPV_int = 0.694, NPV_ext = 0.421, Acc_int = 0.746, Acc_ext = 0.639, Ka_int = 0.436, Ka_ext = 0.220 | Calibration curve: best fit
Zhang et al. (2021) | ─ | UR | LR | Internal validation (random split) | LR | AUC = 0.862; AUC_int = 0.739 | ─ | H-L test: P > 0.05
Wu et al. (2025) | Excluding and multiple imputation | LASSO regression and the optimal subset method | LR, KNN, RF and XGBoost | Internal validation (80/20 split ratio) | XGBoost | AUC = 0.921; AUC_int = 0.932 | Acc_int = 0.891, F1_int = 0.810, Pre_int = 0.839, Recall_int = 0.795; DCA: favorable results | ─
Wassenaar et al. (2015) | Multiple imputation | LR | LR | Internal validation (random split) and external validation (temporal validation) | E-PRE-DELIRIC | AUC = 0.76; AUC_int = 0.75; AUC_ext: 0.70–0.81 | Se = 0.71, Sp = 0.69, Se_ext: 0.62–0.78, Sp_ext: 0.67–0.68 | Calibration plots: well
Wassenaar et al. (2018) | ─ | ─ | ─ | External validation | PRE-DELIRIC and E-PRE-DELIRIC | AUC_PRE-DELIRIC = 0.76; AUC_E-PRE-DELIRIC = 0.68 | Se_PRE-DELIRIC = 0.69, Sp_PRE-DELIRIC = 0.66, Se_E-PRE-DELIRIC = 0.60, Sp_E-PRE-DELIRIC = 0.65 | Calibration plots: well
Wang et al. (2020) | ─ | UR | LR (stepwise) | External validation | LR | AUC_ext = 0.80 | Se_ext = 0.68, Sp_ext = 0.83 | ─
Boogaard et al. (2014) | Multiple imputation | ─ | LR (stepwise) | ─ | PRE-DELIRIC (Recalibration) | AUC = 0.76 | Se = 0.70, Sp = 0.73 | Calibration slope = 1.09; H-L test: P = 0.045
Tang et al. (2024) | Excluding and multiple imputation | LASSO regression | LR, DT, SVM, XGBoost, KNN, and NB | Internal validation (random split) | XGBoost | AUC = 0.836; AUC_int = 0.810 | Acc = 0.765, Se = 0.713, F1 = 0.725, Recall = 0.713, Acc_int = 0.744 | Calibration curve: best fit
Shi et al. (2022) | Not handling | UR | LR | Bootstrap internal validation | BS full and BS stepwise | AUC_full = 0.75; AUC_stepwise = 0.75 | DCA: good clinical practicability | ─
Park et al. (2025) | Excluding | ─ | RF, extra-trees classifier and LightGBM | Internal validation (8:2 split ratio) and external validation (temporal validation) | RF | AUC = 0.757; AUC_int = 0.82; AUC_ext = 0.82 | DCA: a greater net benefit | Calibration curve: strong agreement
Miyamoto et al. (2020) | ─ | ─ | ─ | External validation | PRE-DELIRIC | AUC = 0.60 | Se = 0.57, Sp = 0.68 | Calibration plot: poor calibration
Ma et al. (2024) | Excluding and multiple imputation | LASSO regression | LR and XGBoost | Internal validation (random split) | XGBoost | AUC = 0.853; AUC_int = 0.831 | Acc = 0.757, Se = 0.794, Sp = 0.748, F1 = 0.547, Acc_int = 0.753, Se_int = 0.775, Sp_int = 0.748, F1_int = 0.534; DCA: a higher net benefit | Brier score = 0.106, Brier score_int = 0.113; Calibration plot: high clinical utility
Ko et al. (2024) | ─ | RF, extreme gradient boosting, partial least squares, and Plmnet-elastic.net | LR | Internal validation | LR | AUC = 0.860; AUC_int = 0.855 | ─ | ─
Kim et al. (2022) | ─ | Univariable LR | LR (stepwise) and RF | Internal validation (random split) | LR | AUC = 0.820; AUC_int = 0.779 | Se_int: 0.42, 0.67, 0.86; Sp_int: 0.49, 0.71, 0.90 | ─
Kim et al. (2024) | MICE replacement method | Stepwise selection | LR, RF, LightGBM, SVM and XGBoost | Internal validation (random split) | LR | AUC = 0.80; AUC_int = 0.71 | Se = 0.75, Sp = 0.72 | ─
Hur et al. (2021) | Filled with mean values and left blank | ─ | RF, XGBoost, DNN and LR | Internal validation (random split) and external validation | PRIDE | XGBoost: AUC_int = 0.919; RF: AUC_ext = 0.721 | Se_int = 0.904, Se_ext = 0.91, Sp_int = 0.731, Sp_ext = 0.27, PPV_int = 0.565, PPV_ext = 0.159, NPV_int = 0.952, NPV_ext = 0.952; DCA: net benefit | Brier score_int = 0.094, Brier score_ext = 0.168
Green et al. (2019) | ─ | ─ | ─ | External validation | PRE-DELIRIC, recalibrated PRE-DELIRIC, E-PRE-DELIRIC and Lanzhou | AUC_PRE-DELIRIC = 0.79; AUC_recalibrated PRE-DELIRIC = 0.79; AUC_E-PRE-DELIRIC = 0.72; AUC_Lanzhou = 0.77 | ─ | ─
Gong et al. (2023) | Excluding and mean imputation | ─ | CatBoost | External validation | 24-h model and dynamic model | 24-h model: AUC = 0.785, AUC_ext: 0.796, 0.810; dynamic model: AUC = 0.845, AUC_ext: 0.804, 0.838 | 24-h model: Se = 0.85, Sp = 0.556; dynamic model: Se = 0.85, Sp = 0.657 | 24-h model: Brier score = 0.102, Brier score_ext = 0.105, 0.110; dynamic model: Brier score = 0.111, Brier score_ext = 0.165, 0.132
Gao W et al. (2022) | Replacement | ─ | ─ | External validation | E-PRE-DELIRIC | AUC = 0.54 | ─ | H-L test: P = 0.027
Fan et al. (2019) | Mean imputation | UR | Multiple LR (backward stepwise) | Internal validation (random split) | DYNAMIC-ICU | AUC_int: 0.907, 0.888, 0.874, 0.900 | ─ | ─
Esumi et al. (2025) | Not handling | ─ | LR, RF, SVM, neural network, KNN, DT, NB, AdaBoost, GBM and LDA | Internal validation (random split) | LR | AUC = 0.906 | MCC = 0.625, Acc = 0.818, Pre = 0.797, Recall = 0.743, F1 = 0.755 | ─
Coombes et al. (2021) | Not handling | ─ | LR, CART, RF, NB and SVM | Internal validation (random split) | LR | AUC = 0.83; AUC_int = 0.83 | Se_int = 0.794, Sp_int = 0.715, PPV_int = 0.197, NPV_int = 0.976 | ─
Cherak et al. (2020) | Not handling | ─ | LASSO LR | Internal cross-validation | LASSO LR | AUC: 0.67–0.78 | Se: 0.532–0.639; Sp: 0.690–0.746 | H-L test: P: 0.13–0.98
Chen et al. (2017) | ─ | ─ | Multiple LR | Internal validation (random split) | Multiple LR | AUC_int = 0.78 | Se_int: 0.305–0.756; Sp_int: 0.667–0.982 | ─
Bhattacharyya et al. (2022) | Forward and backward imputation | ─ | LR, RF and BiLSTM | Stratified repeated cross-validation | BiLSTM | AUC: 0.849–0.884 | Pre: 0.375, 0.175, 0.270, 0.113; Recall: 0.861, 0.756, 0.937, 0.926 | ─
Anton et al. (2024) | Mean imputation | ─ | ─ | External validation | PRE-DELIRIC and E-PRE-DELIRIC | AUC_PRE-DELIRIC = 0.70; AUC_E-PRE-DELIRIC = 0.63 | ─ | Calibration slope_PRE-DELIRIC = 1.07, Intercept_PRE-DELIRIC = 0.11; Calibration slope_E-PRE-DELIRIC = 0.63, Intercept_E-PRE-DELIRIC = 0.19

LASSO, Least Absolute Shrinkage and Selection Operator; LR, logistic regression; SVM, Support Vector Machine; XGBoost, eXtreme Gradient Boosting; RF, Random Forest; KNN, K-Nearest Neighbors; DT, Decision Tree; NB, Naïve Bayes; AUC, area under the curve; Se, sensitivity; Sp, specificity; NPV, negative predictive value; PPV, positive predictive value; Acc, accuracy; Ka, the kappa coefficient; Pre, precision; UR, univariate regression; H-L test, Hosmer–Lemeshow test; DCA, decision curve analysis; BS, backward selection; LightGBM, Light Gradient Boosting Machine; DNN, Deep Neural Network; PRIDE, prediction of ICU delirium; AdaBoost, adaptive boosting; GBM, gradient-boosting machine; LDA, Linear Discriminant Analysis; MCC, the Matthews correlation coefficient; CART, Classification and Regression Trees; BiLSTM, Bidirectional Long Short-Term Memory; MICE, Multivariate Imputation by Chained Equations; ─, no information.

Table 3. Final predictors in risk prediction models for ICU delirium patients.

Authors (years) | No. of predictors | Final predictors
Zhang et al. (2023) | 15 | Mechanical ventilation, cardiovascular ICU (CVICU), GCS score, sedation, acute kidney injury (AKI), temperature, anion gap, blood sodium, vasopressors, respiratory rate, age, stroke, bicarbonate, platelets, and white blood cells
Zhang et al. (2021) | 6 | History of hypertension, hypoxaemia, use of benzodiazepines, deep sedation, sepsis and mechanical ventilation
Wu et al. (2025) | 8 | GCS verbal score, length of hospital stay, mean SpO₂ on the first day of ICU admission, Modification of Diet in Renal Disease (MDRD) equation score, mean diastolic blood pressure, GCS motor score, gender, and duration of noninvasive ventilation
Wassenaar et al. (2015) | 9 | Age, history of cognitive impairment, history of alcohol abuse, blood urea nitrogen, admission category, urgent admission, mean arterial blood pressure, use of corticosteroids, and respiratory failure
Wassenaar et al. (2018) | 10/9 | PRE-DELIRIC: age, APACHE-II score, admission group, coma, infection, metabolic acidosis, use of sedatives and morphine, urea concentration, and urgent admission; E-PRE-DELIRIC: age, history of cognitive impairment, history of alcohol abuse, blood urea nitrogen, admission category, urgent admission, mean arterial blood pressure, use of corticosteroids, and respiratory failure
Wang et al. (2020) | 6 | Cognitive dysfunction on admission, fever, hypoalbuminemia, abnormal liver function, sedative use ≥ 4 times, and physical restraint
Boogaard et al. (2014) | 10 | Age, APACHE-II, urgent and admission category, infection, coma, sedation, morphine use, urea level, metabolic acidosis
Tang et al. (2024) | 10 | GCS score, mechanical ventilation, sedation, ICU type, the Acute Physiology Score III (APSIII), temperature, age, diastolic blood pressure, oxyhemoglobin saturation and SOFA score
Shi et al. (2022) | 5 | Stomach and urinary tubes, sedative, mechanical ventilation and APACHE-II scores
Park et al. (2025) | 8 | Age, sex, ECG-derived features (activity, complexity, mobility, kurtosis, skewness), PPG-derived features (activity, kurtosis, skewness), respiratory waveform-derived features (activity, kurtosis, skewness), HR (median, SD), RR (median, SD), and SpO2 (median, SD)
Miyamoto et al. (2020) | 10 | Age, APACHE-II score, admission group, coma, infection, metabolic acidosis, use of sedatives and morphine, urea concentration, and urgent admission
Ma et al. (2024) | 22 | Age, dementia, SOFA, first care unit, infection, the maximum values of GCS, creatinine, calcium, sodium, heart rate, SBP, DBP, respiratory rate, temperature, the minimum values of hematocrit, platelets, MCHC, creatinine, glucose, potassium, DBP, respiratory rate
Ko et al. (2024) | 8 | Albumin level, international normalized ratio, blood urea nitrogen, white blood cell count, C-reactive protein level, age, heart rate, and mechanical ventilation
Kim et al. (2022) | 6 | Old age, hospitalization through the emergency room, applying restraint, drainage tube, using benzodiazepines, and some types of opioid analgesics
Kim et al. (2024) | 20 | Age, sex, alcohol intake, National Institute of Health Stroke Scale (NIHSS), HbA1c, prothrombin time, D-dimer, and hemoglobin; mean or variability indexes calculated from body temperature (BT), heart rate (HR), respiratory rate (RR), oxygen saturation (SpO2), systolic blood pressure (SBP), and diastolic blood pressure (DBP)
Hur et al. (2021) | 59 | Age, sex, and invasive mechanical ventilation, medical ICU or surgical ICU, respiratory, cardiovascular, gastrointestinal, neurology, perioperative, nephrology, metabolic, and trauma, systolic blood pressure, diastolic blood pressure, heart rate, respiratory rate, peripheral capillary oxygen saturation, and Glasgow Coma Scale (eye, verbal, and motor), Charlson Comorbidity Index, white blood count, hemoglobin, hematocrit, platelet count, and erythrocyte sedimentation rate, prothrombin time (INR) and activated partial thromboplastin time, total protein, albumin, total bilirubin, aspartate aminotransferase, alanine aminotransferase, glucose fasting, blood urea nitrogen, creatinine, phosphorus, sodium, potassium, magnesium, calcium (ionized), C-reactive protein quantitative, and lactic acid, pH, PaCO2, PaO2, HCO3 and O2 saturation
Green et al.
(2019) 10/10/9/11 PRE-DELIRIC: age, APACHE II score, coma (drug-induced or otherwise), patient classification (medical, surgical, trauma, neurologic), presence of infection, metabolic acidosis, morphine dose, use of sedatives, urea concentration and emergency admission recalibrated PRE-DELIRIC: age, APACHE II score, coma (drug-induced or otherwise), patient classification (medical, surgical, trauma, neurologic), presence of infection, metabolic acidosis, morphine dose, use of sedatives, urea concentration and mergency admission E-PRE-DELIRIC: age, history of cognitive impairment, history of alcohol abuse, patient classification (medical, surgical, trauma, neurologic), mean arterial pressure at ICU admission, use of corticosteroids, presence of respiratory failure, blood urea nitrogen at ICU admission and emergency admission Lanzhou: age, APACHE II score, mechanical ventilation, emergency surgery, coma, multiple trauma, metabolic acidosis, history of hypertension, history of delirium, history of dementia and use of dexmedetomidine Gong et al. 
(2023) 20/20 24-h model: mean total glasgow coma score, mean verbal glasgow coma score, age in years, maximum richmond agitation sedation scale, minimum richmond agitation sedation scale, APACHE IV score, mean richmond agitation sedation scale, 24-hour urine output, SOFA respiratory subscore, mean corpuscle volume labs ordered, neuro ICU type, H 2 blocker administrations, minimum temperature, maximum red cell distribution width, maximum temperature, medical-surgical ICU type, maximum potassium, mean blood urea nitrogen, minimum bicarbonate and bicarbonate labs ordered dynamic model: current ICU length of stay, last richmond agitation sedation scale, last total glasgow coma score, last verbal glasgow coma score, APACHE IV score, hospital teaching status, age in years, last temperature, last motor glasgow coma score, hospital size, given glucose elevating agents, medical-surgical ICU type, given H 2 blockers, given monoamine oxidase inhibitors, SOFA subscore: nervous, alcohol abuse, given tetracyclic antidepressants, current time of day, given glycopeptides and neurology-related diagnosis Gao W et al. (2022) 9 Age, history of cognitive impairment, history of alcohol abuse, blood urea nitrogen, admission category, urgent admission, mean arterial blood pressure, use of corticosteroids and respiratory failure Fan et al. (2019) 7 History of chronic diseases, hearing deficits, infection, higher APACHE II scores, the use of sedatives and analgesics, indwelling catheter, and sleep disturbance Esumi et al. (2025) 15 Daily urinary output, eosinophil count, age, basophil count, fibrinogen, dBP a , PO 2 , CPK b , LDH c , AST d , sBP e , glucose, height, ALP f and burn area Coombes et al. (2021) 17 Mental status, deliri*, hallucin*, confus*, reorient*, urine culture, ABG a , renal function panel, CBC b , thyroid function test, toxicology screen, autoimmune serology, B vitamins, HIV antibody, antipsychotics, benzodiazepines and medetomidine Cherak et al. 
(2020) 12 Sex, age, admission type, APACHE II score, GCS, SOFA score, charlson comorbidity Index, pre-existing neuropsychiatric disorder, vasoactive medication use, required continuous renal replacement therapy, required invasive mechanical ventilation and constant Chen et al. (2017) 11 Age, APACHE II score, mechanical ventilation, emergency operation, Coma, multiple trauma, metabolic acidosis, history of hypertension, history of delirium, history of dementia and dexmedetomidine hydrochloride Bhattachary-ya et al. (2022) 21 Age, sex, height, weight, heart rate, oxygen saturation, glucose, temperature, serum sodium, BUN, WBC, hemoglobin, platelets, serum potassium, chloride, serum bicarbonate, serum creatinine, ventilation, total norepinephrine dose, SOFA and SOFA without GCS Anton et al. (2024) 10/9 PRE-DELIRIC: age, APACHE-II score, admission group, coma, infection, metabolic acidosis, use of sedatives and morphine, urea concentration, and urgent admission E-PRE-DELIRIC: Age, history of cognitive impairment, history of alcohol abuse, blood urea nitrogen, admission category, urgent admission, mean arterial blood pressure, use of corticosteroids, and respiratory failure ICU, Intensive Care Unit; GCS, Glasgow Coma Scale; APACHE, acute physiology and chronic health evaluation; SOFA, Sequential Organ Failure Assessment; HR, heart rate, RR, respiratory rate; SpO2, Peripheral Oxygen Saturation; SBP, systolic blood pressure; DBP, diastolic blood pressure; pH, potential of hydrogen; PaCO 2 , arterial partial pressure of carbon dioxide; PaO 2 , arterial partial pressure of oxygen; HCO 3 , bicarbonate ion; CPK b , creatine phosphokinase; LDH c , lactate dehydrogenase; AST d , aspartate aminotransferase; ALP f , alkaline phosphatase; ABG, arterial blood gas; CBC, complete blood count; BUN, blood urea nitrogen; WBC, white blood cell. 
3.6 Study quality
The risk of bias and applicability of the included models are summarized in Table 4 and Supplementary Material Table S1. In terms of applicability assessment, 15 models were rated as low risk, 6 as high risk, and 5 had unclear risk levels. Risk of bias was high across the studies and was driven primarily by critical methodological flaws in two core domains: Analysis and Predictors.
Analysis Domain: The most prevalent and critical flaw was the failure to adequately account for overfitting and model optimism. The vast majority of studies relied solely on internal validation techniques [ 2 , 15 , 24 , 25 , 29 , 31 – 34 , 38 – 42 ] . Because external validation was almost universally absent, the reported performance metrics (e.g., high AUC) are likely to be severely optimistic, and the models' performance in new, independent populations remains unknown.
Predictors Domain: Due to the ubiquitous retrospective study design [ 2 , 4 , 15 , 23 , 25 , 29 , 31 – 35 , 37 , 39 , 40 , 42 ] , predictor assessments were not blinded to the outcome data. This introduces a clear measurement bias, as the data collection or extraction process could be influenced by the known outcome status.
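As a hedged illustration of the optimism described under the Analysis domain, the following Python sketch simulates a development set in which no predictor is truly related to the outcome, selects the single "best" predictor by its apparent AUC, and then evaluates it in a large independent sample. The function names (`auc`, `simulate`) and all settings (40 development patients, 100 candidate predictors, 2000 validation patients) are assumptions chosen for illustration and are not taken from any included study.

```python
import random


def auc(scores, labels):
    """C-statistic: share of event/non-event pairs in which the event scores higher."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def simulate(n_dev=40, n_val=2000, n_predictors=100, seed=0):
    """Return (apparent AUC, validation AUC) for a predictor picked from pure noise."""
    rng = random.Random(seed)
    # Outcomes alternate 0/1 and are unrelated to every predictor,
    # so the true AUC of any candidate is 0.5.
    y_dev = [i % 2 for i in range(n_dev)]
    y_val = [i % 2 for i in range(n_val)]

    # Data-driven screening on the development set: keep the noise predictor
    # with the highest apparent AUC, flipping its sign if needed.
    best_auc, best_sign = 0.0, 1
    for _ in range(n_predictors):
        x_dev = [rng.gauss(0, 1) for _ in range(n_dev)]
        a = auc(x_dev, y_dev)
        sign = 1 if a >= 0.5 else -1
        a = max(a, 1 - a)
        if a > best_auc:
            best_auc, best_sign = a, sign

    # The selected predictor is noise, so its values in new patients are fresh draws.
    x_val = [best_sign * rng.gauss(0, 1) for _ in range(n_val)]
    return best_auc, auc(x_val, y_val)


if __name__ == "__main__":
    apparent, validated = simulate()
    print(f"apparent AUC = {apparent:.2f}, validation AUC = {validated:.2f}")
```

Under these assumptions the apparent AUC is expected to sit well above 0.5 while the validation AUC stays close to 0.5, which is the pattern that internal-only evaluation of a small, heavily screened dataset can conceal.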
In addition to the core issues in the Analysis and Predictors domains, other methodological shortcomings were common. In the Participants domain, concerns about selection bias arose from inappropriate inclusion/exclusion criteria or the failure to include all enrolled participants in the analysis [ 15 , 23 , 25 , 26 , 29 , 37 ] . Within the Analysis domain, the handling of missing data was frequently poorly reported or inappropriate (often rated 'PN', probably no, or 'NI', no information), introducing potential model bias [ 2 , 27 , 30 ] . In the Predictors domain, several studies selected variables by univariable analysis, a suboptimal method that tends to produce unstable and overly optimistic models [ 24 , 27 , 29 , 33 , 38 ] . The study by Coombes et al. [ 39 ] represented an extreme case of high risk of bias due to a fundamental methodological flaw: clinician actions (e.g., antipsychotic administration) were used to define the delirium outcome, and those same actions were then used as predictors. This circular reasoning produced an artificially strong but uninformative statistical relationship, thereby invalidating the study's conclusions.

Table 4 ROB and clinical applicability of included studies.
Authors (year) | ROB: Participants | ROB: Predictors | ROB: Outcome | ROB: Analysis | Applicability: Participants | Applicability: Predictors | Applicability: Outcome | Overall: ROB | Overall: Applicability
Zhang et al. (2023) | - | - | + | - | + | ? | + | - | +
Zhang et al. (2021) | - | - | + | - | + | + | + | - | +
Wu et al. (2025) | - | - | + | - | + | + | + | - | +
Wassenaar et al. (2015) | - | - | + | - | + | + | + | - | +
Wassenaar et al. (2018) | - | ? | + | - | + | + | + | - | ?
Wang et al. (2020) | - | + | + | - | + | + | + | - | +
Boogaard et al. (2014) | ? | - | + | ? | + | + | + | ? | +
Tang et al. (2024) | - | + | + | - | + | + | + | - | +
Shi et al. (2022) | - | - | + | - | + | + | + | - | +
Park et al. (2025) | ? | + | + | - | + | + | + | - | +
Miyamoto et al. (2020) | - | - | + | ? | + | + | + | - | ?
Ma et al. (2024) | ? | ? | + | - | + | + | + | - | +
Ko et al. (2024) | ? | + | + | - | + | + | + | - | ?
Kim et al. (2022) | - | - | + | - | + | + | + | - | +
Kim et al. (2024) | ? | + | + | - | + | + | + | - | ?
Hur et al. (2021) | ? | + | + | - | + | + | + | - | +
Green et al. (2019) | - | - | + | ? | + | + | + | - | +
Gong et al. (2023) | ? | ? | + | ? | + | + | + | ? | +
Gao W et al. (2022) | ? | + | - | ? | - | - | - | + | -
Fan et al. (2019) | - | + | - | ? | - | - | - | + | -
Esumi et al. (2025) | + | - | - | + | - | - | - | + | ?
Coombes et al. (2021) | + | + | + | + | ? | + | + | + | +
Cherak et al. (2020) | ? | + | - | + | - | - | - | + | -
Chen et al. (2017) | - | + | - | + | - | - | - | + | -
Bhattacharyya et al. (2022) | ? | - | - | + | - | - | - | + | -
Anton et al. (2024) | - | ? | - | ? | - | - | - | ? | -
ROB, risk of bias; +, low risk of bias/low concern regarding applicability; -, high risk of bias/high concern regarding applicability; ?, unclear

4. Discussion
This systematic review identified, synthesized, and critically appraised 26 studies of risk prediction models for delirium in ICU patients. The findings illustrate a field in rapid evolution, increasingly leveraging machine learning (ML) techniques, yet also reveal profound methodological heterogeneity and widespread risks of bias that substantially limit the clinical applicability and generalizability of existing models. The most frequently identified predictors across models (age, sedation, APACHE-II score, urgent admission, mechanical ventilation, and GCS) are consistent with established clinical knowledge and pathophysiological understanding of delirium. These factors align with previously proposed mechanisms such as neuroinflammation, neurotransmitter dysregulation, and physiological stress, often exacerbated by critical illness and iatrogenic interventions [ 43 , 44 ] . The predominance of logistic regression reflects its interpretability and ease of clinical implementation. However, the superior discrimination performance of certain ML models (e.g., XGBoost, RF, BiLSTM) in multiple studies [ 2 , 4 , 42 ] suggests that complex, non-linear relationships between predictors and delirium outcomes may be better captured by these algorithms. This is particularly relevant in critical care settings where patient data are high-dimensional and interactions between clinical variables are complex [ 45 ] .
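Because the comparison above rests on discrimination alone, a small numerical sketch may help show what the AUC does and does not capture. The Python example below uses eight invented patients (toy values, not data from any included study) and hypothetical helper names (`auc`, `brier`, `mean`). It applies a monotone distortion to the predicted risks: the ranking of patients, and therefore the AUC, is unchanged, while the Brier score and the mean predicted risk move away from the observed event rate.

```python
def auc(probs, labels):
    """C-statistic: share of event/non-event pairs ranked in the right order."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(probs, labels):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def mean(values):
    return sum(values) / len(values)


# Toy data (assumed for illustration): 4 of 8 patients develop delirium.
labels = [0, 0, 0, 1, 0, 1, 1, 1]
original = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

# Monotone distortion: cubing keeps the ordering but shrinks every risk.
distorted = [p ** 3 for p in original]

if __name__ == "__main__":
    for name, probs in (("original", original), ("distorted", distorted)):
        print(
            f"{name}: AUC = {auc(probs, labels):.4f}, "
            f"Brier = {brier(probs, labels):.4f}, "
            f"mean predicted risk = {mean(probs):.4f}, "
            f"observed rate = {mean(labels):.4f}"
        )
```

In this toy setting both sets of predictions have an AUC of 0.9375, yet the distorted risks average about 0.24 against an observed rate of 0.50, so a model could rank patients equally well while systematically understating their risk.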
A central finding of this review is the high risk of bias pervasive across most studies, primarily driven by flaws in the analysis and predictors domains as assessed by PROBAST. The near-universal reliance on internal validation only—often via simple random splitting—fails to account for model optimism and overfitting. This renders reported performance metrics (e.g., AUC values upwards of 0.90) likely inflated and not replicable in external, real-world populations [ 18 , 46 ] . Only four models underwent both internal and external validation [ 1 , 14 , 23 , 35 ] , a fundamental requirement for establishing generalizability [ 47 ] . Furthermore, the handling of missing data was frequently poorly documented or methodologically unsound (e.g., complete case analysis, mean imputation), potentially introducing selection bias and reducing model robustness [ 48 ] . Many studies also relied on univariable screening for predictor selection, a practice strongly discouraged in contemporary prediction modeling as it can lead to unstable models that omit important multivariable relationships [ 45 ] . An extreme example of methodological flaw was illustrated by Coombes et al., where the outcome (delirium) was defined by clinician actions (e.g., antipsychotic use) which were also used as predictors [ 39 ] . This circularity perfectly exemplifies how flawed design can yield statistically significant but clinically meaningless results. Variability in delirium assessment methods (CAM-ICU, ICDSC, RASS) and frequency (ranging from once daily to every 8 hours) across studies introduces significant heterogeneity. This lack of standardization challenges the comparative evaluation of models and their translation into clinical practice. The CAM-ICU, while widely used, has shown variable sensitivity and specificity across different patient subgroups and clinical settings [ 5 , 49 ] . 
Inconsistent outcome measurement directly impacts model training and validation, potentially leading to misclassification bias and reduced accuracy [ 19 ] . Despite the proliferation of models, their integration into routine ICU workflow remains limited. Many models require predictors that are not routinely collected, are retrospectively derived, or are computationally intensive, creating barriers to real-time point-of-care use [ 50 ] . For instance, models incorporating complex ML algorithms or high-frequency physiological waveform data [ 1 ] may offer high accuracy but face practical challenges related to data integration, processing latency, and clinical interpretability. Moreover, most models are static, predicting delirium at a single time point (e.g., at 24 hours post-ICU admission). Delirium is a dynamic syndrome; therefore, models capable of providing updated predictions based on evolving clinical data—such as the dynamic model developed by Gong et al.—represent a promising direction for future research [ 4 ] . The methodological quality of prediction studies remains concerning despite the availability of reporting guidelines such as TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) [ 45 ] . Our analysis reveals inconsistent adherence to these standards, particularly in critical areas such as sample size justification, handling of missing data, and model performance reporting. Only 35% of studies provided a sample size calculation or justification, raising concerns about potential overfitting, especially in models with large numbers of predictors. This is particularly problematic for machine learning approaches, which typically require larger sample sizes to achieve stable performance [ 50 ] . The reporting of model performance measures also showed substantial variability. 
While most studies reported discrimination metrics (typically AUC), fewer reported calibration measures such as calibration plots or slopes. This omission is significant because a model with good discrimination can still produce poorly calibrated predictions that may mislead clinical decision-making [ 51 ] . Recent guidelines emphasize the importance of reporting both discrimination and calibration metrics, along with clinical utility measures such as decision curve analysis [ 52 ] . The variable selection approaches used across studies ranged from hypothesis-driven selection based on clinical knowledge to purely data-driven approaches using machine learning algorithms. While data-driven approaches can identify novel predictors, they risk including clinically implausible or spurious associations, especially when applied to small datasets [ 53 ] . Several studies attempted to balance these approaches by combining clinical expertise with statistical selection methods, but the optimal strategy remains uncertain. The clinical interpretability of models varied considerably. Traditional regression models typically produced odds ratios or risk scores that are intuitively understandable to clinicians. In contrast, many machine learning models function as "black boxes," providing accurate predictions but limited insight into the underlying reasoning [ 54 ] . This represents a significant barrier to clinical adoption, as clinicians may be reluctant to trust predictions without understanding their basis [ 55 ] . Some studies addressed this limitation through feature importance analysis or model simplification, but more work is needed to enhance the interpretability of complex models. The transition from prediction model development to clinical implementation faces numerous challenges. First, many models require data elements that are not routinely collected or documented in structured formats in electronic health records [ 56 ] .
This creates additional documentation burdens that may limit practical implementation. Second, the computational requirements of some complex models may exceed the capabilities of existing clinical infrastructure, particularly in resource-limited settings [ 57 ] . The timing of prediction also presents implementation challenges. Most models provide a single prediction at ICU admission, but delirium risk evolves throughout the ICU stay as clinical conditions change. Dynamic models that update predictions based on new information may be more clinically useful but also more complex to implement [ 4 ] . Additionally, the optimal method for presenting risk predictions to clinicians remains uncertain. Should predictions be presented as numerical probabilities, risk categories, or specific recommendations? How should uncertainty be communicated? These questions require further investigation through implementation studies [ 58 ] . The implementation of prediction models raises several ethical considerations. First, there is a risk of algorithmic bias if models perform differently across patient subgroups based on age, gender, race, or socioeconomic status [ 59 ] . Few studies reported testing for such differential performance, highlighting an important gap in the current literature. Second, there is a concern about alert fatigue if models generate excessive false positive predictions [ 56 ] . This is particularly relevant for delirium prediction, given the high prevalence of the condition in ICU settings. Additionally, the use of prediction models could potentially change clinical behavior in unintended ways. For example, clinicians might become over-reliant on model predictions or develop negative expectations about patients predicted to develop delirium [ 57 ] . These potential unintended consequences underscore the importance of studying not only the accuracy of prediction models but also their impact on clinical processes and patient outcomes. 
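The concern about false positive alerts can be made concrete with a short calculation. The Python sketch below takes the sensitivity and specificity reported in Table 2 for the 24-h model of Gong et al. (Se = 0.85, Sp = 0.556) and shows how the positive predictive value and the number of false alerts per 100 patients would change with delirium prevalence. The two prevalence values (0.60 and 0.20) and the function name `alert_profile` are assumptions for illustration only; the calculation also assumes that sensitivity and specificity stay constant across settings, which may not hold in practice.

```python
def alert_profile(sensitivity, specificity, prevalence, n_patients=100):
    """Expected true alerts, false alerts and PPV for a binary risk alert."""
    true_alerts = sensitivity * prevalence * n_patients
    false_alerts = (1 - specificity) * (1 - prevalence) * n_patients
    ppv = true_alerts / (true_alerts + false_alerts)
    return {"true_alerts": true_alerts, "false_alerts": false_alerts, "ppv": ppv}


if __name__ == "__main__":
    # Se and Sp as reported for the 24-h model in Table 2; prevalences are hypothetical.
    for prevalence in (0.60, 0.20):
        result = alert_profile(0.85, 0.556, prevalence)
        print(
            f"prevalence = {prevalence:.2f}: "
            f"true alerts = {result['true_alerts']:.1f}, "
            f"false alerts = {result['false_alerts']:.1f}, "
            f"PPV = {result['ppv']:.2f}"
        )
```

Under these assumptions, roughly three in four alerts would be correct at a prevalence of 0.60, but only about one in three at a prevalence of 0.20, which suggests that the alert burden of the same model could differ markedly between units.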
Based on our findings, we recommend several priorities for future research. First, there is a critical need for external validation studies of existing models across diverse populations and settings. Rather than developing new models, researchers should prioritize validating and potentially refining existing models. Second, future studies should follow established guidance, such as the TRIPOD reporting guideline and the PROBAST risk-of-bias tool, to enhance methodological quality and transparency [ 18 , 60 ] . Third, researchers should explore the integration of novel data sources, such as electronic health record data, physiological waveforms, and potentially biomarkers, while balancing predictive accuracy with clinical feasibility [ 58 ] . Fourth, more attention should be paid to the implementation aspects of prediction models, including development of user-friendly interfaces, integration into clinical workflows, and assessment of impact on patient outcomes [ 59 ] . Finally, there is a need for randomized trials evaluating the impact of prediction model use on clinical processes and patient outcomes. Such studies would provide crucial evidence about whether these models actually improve care rather than simply providing accurate predictions. Several limitations of this systematic review should be acknowledged, as they constrain the interpretation of the results. First, restricting the review to English-language publications is a potential source of language bias, because pertinent studies published in other languages were not considered. Secondly, the substantial methodological and clinical heterogeneity among the included studies, particularly in terms of participant characteristics, predictor selection, delirium assessment methods, and modeling approaches, precluded a quantitative meta-analysis of model performance metrics.
Thirdly, the quality assessment relied solely on information reported in the published articles; incomplete reporting may have led to an underestimation of both methodological strengths and weaknesses in some studies. Finally, while comprehensive search strategies were employed, it is possible that some relevant studies were not identified through database searching alone.

5. Conclusions
In conclusion, this systematic review demonstrates that while the field of ICU delirium prediction is advancing with increasingly sophisticated modeling approaches, particularly machine learning, the clinical applicability of existing models remains limited. Fundamental methodological shortcomings (including widespread overfitting, inadequate handling of missing data, insufficient external validation, and heterogeneous outcome assessment) substantially constrain the generalizability and real-world utility of these models. Future research should prioritize methodological rigor, robust external validation in diverse populations, and implementation studies that assess not only predictive accuracy but also clinical impact and workflow integration before these tools can be recommended for routine clinical use.

Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Availability of data and materials
Data and materials are provided in the manuscript or the supplementary information file.
Competing interests
The authors declare no competing interests.
Funding
No funding.
Authors' contributions
Wen-Hua Chen and Lei Ding drafted the main manuscript text; Yue Sha and gongqian lu performed data extraction; Kaimin Qian and Bin Wang analyzed the data; Huiling Wang was responsible for the design and oversight of the study, and critically revised the manuscript. All authors approved the final submitted version.
Acknowledgements
Not applicable.
Clinical trial number
Not applicable.

References
1. Park C, et al. Development and validation of a machine learning model for early prediction of delirium in intensive care units using continuous physiological data: retrospective study. J Med Internet Res. 2025;27:e59520. 10.2196/59520.
2. Tang D, Ma C, Xu Y. Interpretable machine learning model for early prediction of delirium in elderly patients following intensive care unit admission: a derivation and validation study. Front Med (Lausanne). 2024;11:1399848. 10.3389/fmed.2024.1399848.
3. Stollings JL, et al. Delirium in critical illness: clinical manifestations, outcomes, and management. Intensive Care Med. 2021;47(10):1089-1103. 10.1007/s00134-021-06503-1.
4. Gong KD, et al. Predicting Intensive Care Delirium with Machine Learning: Model Development and External Validation. Anesthesiology. 2023;138(3):299-311. 10.1097/ALN.0000000000004478.
5. Chen TJ, et al. Diagnostic accuracy of the CAM-ICU and ICDSC in detecting intensive care unit delirium: A bivariate meta-analysis. Int J Nurs Stud. 2021;113:103782. 10.1016/j.ijnurstu.2020.103782.
6. Kotfis K, et al. The future of intensive care: delirium should no longer be an issue. Crit Care. 2022;26(1):200. 10.1186/s13054-022-04077-y.
7. van den Boogaard M, et al. Incidence and short-term consequences of delirium in critically ill patients: A prospective observational cohort study. Int J Nurs Stud. 2012;49(7):775-83. 10.1016/j.ijnurstu.2011.11.016.
8. Ko RE, et al. Association between the presence of delirium during intensive care unit admission and cognitive impairment or psychiatric problems: the Korean ICU National Data Study. J Intensive Care. 2022;10(1):7. 10.1186/s40560-022-00598-4.
9. Hofhuis JGM, Schermer T, Spronk PE. Mental health-related quality of life is related to delirium in intensive care patients. Intensive Care Med. 2022;48(9):1197-1205. 10.1007/s00134-022-06841-8.
10. Wilcox ME, Girard TD, Hough CL. Delirium and long term cognition in critically ill patients. BMJ. 2021;373:n1007. 10.1136/bmj.n1007.
11. Devlin JW, et al. Clinical Practice Guidelines for the Prevention and Management of Pain, Agitation/Sedation, Delirium, Immobility, and Sleep Disruption in Adult Patients in the ICU. Crit Care Med. 2018;46(9):e825-e873. 10.1097/CCM.0000000000003299.
12. Kinchin I, et al. The economic cost of delirium: A systematic review and quality assessment. Alzheimers Dement. 2021;17(6):1026-1041. 10.1002/alz.12262.
13. Anton Joseph N, et al. Validation of PRE-DELIRIC and E-PRE-DELIRIC in a Danish population of intensive care unit patients-A prospective observational multicenter study. Acta Anaesthesiol Scand. 2024;68(3):385-393. 10.1111/aas.14363.
14. Wassenaar A, et al. Multinational development and validation of an early prediction model for delirium in ICU patients. Intensive Care Med. 2015;41(6):1048-56. 10.1007/s00134-015-3777-2.
15. Esumi R, et al. Machine Learning-Based Prediction of Delirium and Risk Factor Identification in Intensive Care Unit Patients With Burns: Retrospective Observational Study. JMIR Form Res. 2025;9:e65190. 10.2196/65190.
16. Page MJ, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. 10.1136/bmj.n71.
17. Moons KG, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. 2014;11(10):e1001744. 10.1371/journal.pmed.1001744.
18. Moons KGM, et al. PROBAST: A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies: Explanation and Elaboration. Ann Intern Med. 2019;170(1):W1-W33. 10.7326/M18-1377.
19. Wolff RF, et al. PROBAST: A Tool to Assess the Risk of Bias and Applicability of Prediction Model Studies. Ann Intern Med. 2019;170(1):51-58. 10.7326/M18-1376.
20. Abe T, et al. Development of risk prediction models for incident frailty and their performance evaluation. Prev Med. 2021;153:106768. 10.1016/j.ypmed.2021.106768.
21. Li Q, et al. Risk factors and a nomogram for frailty in Chinese older patients with Alzheimer's disease: A single-center cross-sectional study. Geriatr Nurs. 2022;47:47-54. 10.1016/j.gerinurse.2022.06.012.
22. Liu Q, et al. Development and validation of a preliminary clinical support system for measuring the probability of incident 2-year (pre)frailty among community-dwelling older adults: A prospective cohort study. Int J Med Inform. 2023;177:105138. 10.1016/j.ijmedinf.2023.105138.
23. Zhang Y, et al. Development of a machine learning-based prediction model for sepsis-associated delirium in the intensive care unit. Sci Rep. 2023;13(1):12697. 10.1038/s41598-023-38650-4.
24. Zhang H, et al. Development and validation of a predictive score for ICU delirium in critically ill patients. BMC Anesthesiol. 2021;21(1):37. 10.1186/s12871-021-01259-z.
25. Wu ZB, et al. Enhanced machine learning predictive modeling for delirium in elderly ICU patients with COPD and respiratory failure: A retrospective study based on MIMIC-IV. PLoS One. 2025;20(3):e0319297. 10.1371/journal.pone.0319297.
26. Wassenaar A, et al. Delirium prediction in the intensive care unit: comparison of two delirium prediction models. Crit Care. 2018;22(1):114. 10.1186/s13054-018-2037-6.
27. Wang J, et al. Establishment and validation of a delirium prediction model for neurosurgery patients in intensive care. Int J Nurs Pract. 2020;26(4):e12818. 10.1111/ijn.12818.
28. van den Boogaard M, et al. Recalibration of the delirium prediction model for ICU patients (PRE-DELIRIC): a multinational observational study. Intensive Care Med. 2014;40(3):361-9. 10.1007/s00134-013-3202-7.
29. Shi Y, et al. Nomogram Models for Predicting Delirium of Patients in Emergency Intensive Care Unit: A Retrospective Cohort Study. Int J Gen Med. 2022;15:4259-4272. 10.2147/IJGM.S353318.
30. Miyamoto K, et al. Utility of a prediction model for delirium in intensive care unit patients (PRE-DELIRIC) in mechanically ventilated patients with sepsis. Acute Med Surg. 2020;7(1):e589. 10.1002/ams2.589.
31. Ma R, et al. Machine learning for the prediction of delirium in elderly intensive care unit patients. Eur Geriatr Med. 2024;15(5):1393-1403. 10.1007/s41999-024-01012-y.
32. Ko RE, et al. Machine learning methods for developing a predictive model of the incidence of delirium in cardiac intensive care units. Rev Esp Cardiol (Engl Ed). 2024;77(7):547-555. 10.1016/j.rec.2023.12.007.
33. Kim MK, et al. Development and Validation of Simplified Delirium Prediction Model in Intensive Care Unit. Front Psychiatry. 2022;13:886186. 10.3389/fpsyt.2022.886186.
34. Kim H, et al. Prediction of delirium occurrence using machine learning in acute stroke patients in intensive care unit. Front Neurosci. 2025;18:1425562. 10.3389/fnins.2024.1425562.
35. Hur S, et al. A Machine Learning-Based Algorithm for the Prediction of Intensive Care Unit Delirium (PRIDE): Retrospective Study. JMIR Med Inform. 2021;9(7):e23401. 10.2196/23401.
36. Green C, et al. Prediction of ICU Delirium: Validation of Current Delirium Predictive Models in Routine Clinical Practice. Crit Care Med. 2019;47(3):428-435. 10.1097/CCM.0000000000003577.
37. Gao W, Zhang Y, Jin J. Validation of E-PRE-DELIRIC in cardiac surgical ICU delirium: A retrospective cohort study. Nurs Crit Care. 2022;27(2):233-239. 10.1111/nicc.12674.
38. Fan H, et al. Development and validation of a dynamic delirium prediction rule in patients admitted to the Intensive Care Units (DYNAMIC-ICU): A prospective cohort study. Int J Nurs Stud. 2019;93:64-73. 10.1016/j.ijnurstu.2018.10.008.
39. Coombes CE, Coombes KR, Fareed N. A novel model to label delirium in an intensive care unit from clinician actions. BMC Med Inform Decis Mak. 2021;21(1):97. 10.1186/s12911-021-01461-6.
40. Cherak SJ, et al. Development and validation of delirium prediction model for critically ill adults parameterized to ICU admission acuity. PLoS One. 2020;15(8):e0237639. 10.1371/journal.pone.0237639.
41. Chen Y, et al. Development and validation of risk-stratification delirium prediction model for critically ill patients: A prospective, observational, single-center study. Medicine (Baltimore). 2017;96(29):e7543. 10.1097/MD.0000000000007543.
42. Bhattacharyya A, et al. Delirium prediction in the ICU: designing a screening tool for preventive interventions. JAMIA Open. 2022;5(2):ooac048. 10.1093/jamiaopen/ooac048.
43. Girard TD, et al. Clinical phenotypes of delirium during critical illness and severity of subsequent long-term cognitive impairment: a prospective cohort study. Lancet Respir Med. 2018;6(3):213-222. 10.1016/S2213-2600(18)30062-6.
44. Wilson JE, et al. Delirium. Nat Rev Dis Primers. 2020;6(1):90. 10.1038/s41572-020-00223-4.
45. Sauerbrei W, et al. State of the art in selection of variables and functional forms in multivariable analysis-outstanding issues. Diagn Progn Res. 2020;4:3. 10.1186/s41512-020-00074-3.
46. Steyerberg EW, Vergouwe Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J. 2014;35(29):1925-31. 10.1093/eurheartj/ehu207.
47. Collins GS, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ. 2015;350:g7594. 10.1136/bmj.g7594.
48. Sterne JA, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009;338:b2393. 10.1136/bmj.b2393.
49. Gusmao-Flores D, et al. The confusion assessment method for the intensive care unit (CAM-ICU) and intensive care delirium screening checklist (ICDSC) for the diagnosis of delirium: a systematic review and meta-analysis of clinical studies. Crit Care. 2012;16(4):R115. 10.1186/cc11407.
50. Wong A, et al. External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients. JAMA Intern Med. 2021;181(8):1065-1070. 10.1001/jamainternmed.2021.2626.
51. Wiens J, et al. Do no harm: a roadmap for responsible machine learning for health care. Nat Med. 2019;25(9):1337-1340. 10.1038/s41591-019-0548-6.
52. Marra A, et al. The ABCDEF Bundle in Critical Care. Crit Care Clin. 2017;33(2):225-243. 10.1016/j.ccc.2016.12.005.
53. Kappen TH, et al. Evaluating the impact of prediction models: lessons learned, challenges, and recommendations. Diagn Progn Res. 2018;2:11. 10.1186/s41512-018-0033-6.
54. Obermeyer Z, et al. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447-453. 10.1126/science.aax2342.
55. Ancker JS, et al. Effects of workload, work complexity, and repeated alerts on alert fatigue in a clinical decision support system. BMC Med Inform Decis Mak. 2017;17(1):36. 10.1186/s12911-017-0430-8.
56. Cabitza F, Rasoini R, Gensini GF. Unintended Consequences of Machine Learning in Medicine. JAMA. 2017;318(6):517-518. 10.1001/jama.2017.7797.
57. Debray TP, et al. A framework for developing, implementing, and evaluating clinical prediction models in an individual participant data meta-analysis. Stat Med. 2013;32(18):3158-80. 10.1002/sim.5732.
58. Greenhalgh T, et al. Beyond Adoption: A New Framework for Theorizing and Evaluating Nonadoption, Abandonment, and Challenges to the Scale-Up, Spread, and Sustainability of Health and Care Technologies. J Med Internet Res. 2017;19(11):e367. 10.2196/jmir.8775.
59. Steyerberg EW, et al. Prognosis Research Strategy (PROGRESS) 3: prognostic model research. PLoS Med. 2013;10(2):e1001381. 10.1371/journal.pmed.1001381.
60. Collins GS, Moons KGM. Reporting of artificial intelligence prediction models. Lancet. 2019;393(10181):1577-1579. 10.1016/S0140-6736(19)30037-6.

Additional Declarations
No competing interests reported.
Supplementary Files
Supplementarymaterial.docx
Status: Published
Journal Publication published 13 Dec, 2025 (published version in BMC Anesthesiology)
Version 1 posted
Editorial decision: Revision requested 18 Nov, 2025
Reviews received at journal 17 Nov, 2025
Reviews received at journal 16 Nov, 2025
Reviews received at journal 14 Nov, 2025
Reviewers agreed at journal 04 Nov, 2025
Reviewers agreed at journal 31 Oct, 2025
Reviewers agreed at journal 28 Oct, 2025
Reviewers agreed at journal 28 Oct, 2025
Reviewers invited by journal 28 Oct, 2025
Editor assigned by journal 28 Oct, 2025
Editor invited by journal 27 Oct, 2025
Submission checks completed at journal 23 Oct, 2025
First submitted to journal 23 Oct, 2025
© Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-7799974","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":540258659,"identity":"4ccda397-6016-4019-a180-4b0a6024d51c","order_by":0,"name":"Wen-Hua Chen","email":"","orcid":"","institution":"Nantong Haimen People’s Hospital","correspondingAuthor":false,"prefix":"","firstName":"Wen-Hua","middleName":"","lastName":"Chen","suffix":""},{"id":540258660,"identity":"309cbf45-baed-4720-953d-c903a3756c4a","order_by":1,"name":"Lei Ding","email":"","orcid":"","institution":"Nantong Haimen People’s Hospital","correspondingAuthor":false,"prefix":"","firstName":"Lei","middleName":"","lastName":"Ding","suffix":""},{"id":540258661,"identity":"fb3008a2-2620-415e-89f7-f56f60af11c7","order_by":2,"name":"Yue Sha","email":"","orcid":"","institution":"Nantong Haimen People’s Hospital","correspondingAuthor":false,"prefix":"","firstName":"Yue","middleName":"","lastName":"Sha","suffix":""},{"id":540258662,"identity":"85b23bb6-f23e-4247-ae70-912a5118c859","order_by":3,"name":"gongqian lu","email":"","orcid":"","institution":"Nantong Haimen People’s Hospital","correspondingAuthor":false,"prefix":"","firstName":"gongqian","middleName":"","lastName":"lu","suffix":""},{"id":540258663,"identity":"4856765e-9781-428c-8033-35b83e7b018d","order_by":4,"name":"Kaimin Qian","email":"","orcid":"","institution":"Nantong Haimen People’s Hospital","correspondingAuthor":false,"prefix":"","firstName":"Kaimin","middleName":"","lastName":"Qian","suffix":""},{"id":540258664,"identity":"566c0fd0-25d4-4ad9-88ae-73a362d53fde","order_by":5,"name":"Bin 
Wang","email":"","orcid":"","institution":"Nantong Haimen People’s Hospital","correspondingAuthor":false,"prefix":"","firstName":"Bin","middleName":"","lastName":"Wang","suffix":""},{"id":540258665,"identity":"73d273b5-3057-4d7a-8ef4-db69acc983e2","order_by":6,"name":"Huiling Wang","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAw0lEQVRIiWNgGAWjYBACNmbmw49/8EjI2R9vIFILHztbmjGDjI0xw5kDRGqR4+dRkGawSUtsuJFAtMN4GIwLcg4zNs58vPEGQ41NNBFaeA88nnHmMDOzdFqxBcOxtNwGwlr4Egx4ew6zsUnnmEkwNhwmRguPgQTvv8M8PJJnSNAizcOTJiEhwUO0FrY0wxk8NgYGPEC/JBDjF/n+w4cffOCRqN/AfnjjjQ81NoS1IAMDiQRSlEO0kKpjFIyCUTAKRgYAAO4XNzHegrUzAAAAAElFTkSuQmCC","orcid":"","institution":"Nantong Haimen People’s Hospital","correspondingAuthor":true,"prefix":"","firstName":"Huiling","middleName":"","lastName":"Wang","suffix":""}],"badges":[],"createdAt":"2025-10-07 13:23:30","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7799974/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7799974/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1186/s12871-025-03562-5","type":"published","date":"2025-12-13T15:58:04+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":95358086,"identity":"4e655a7d-cfde-4a92-a858-bc1504b31cba","added_by":"auto","created_at":"2025-11-07 07:03:58","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":537410,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.docx","url":"https://assets-eu.researchsquare.com/files/rs-7799974/v1/6ff39dad4c8ec698f86a361c.docx"},{"id":95524616,"identity":"6cad6a8c-773a-458d-a68f-d03da69aa269","added_by":"auto","created_at":"2025-11-10 
10:03:04","extension":"json","order_by":1,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":7504,"visible":true,"origin":"","legend":"","description":"","filename":"74d7fa9899a748d7bf87effc8df7f711.json","url":"https://assets-eu.researchsquare.com/files/rs-7799974/v1/2e000a20f5e8eceeaa46937c.json"},{"id":95525902,"identity":"445987ef-654f-418f-bfd7-3b657c0b1ca4","added_by":"auto","created_at":"2025-11-10 10:05:49","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":54258,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarymaterial.docx","url":"https://assets-eu.researchsquare.com/files/rs-7799974/v1/d446a735989ff644807003d9.docx"},{"id":95358091,"identity":"7b35e1d1-e100-4d60-ac1f-4358b6091518","added_by":"auto","created_at":"2025-11-07 07:03:58","extension":"xml","order_by":3,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":214486,"visible":true,"origin":"","legend":"","description":"","filename":"74d7fa9899a748d7bf87effc8df7f7111enriched.xml","url":"https://assets-eu.researchsquare.com/files/rs-7799974/v1/35e7ed58c3ff131aec97f727.xml"},{"id":95525966,"identity":"0ce9164b-a27c-420d-ba28-0a57cb8d27d7","added_by":"auto","created_at":"2025-11-10 10:05:58","extension":"png","order_by":4,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":78722,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7799974/v1/371cd71985733060cbab1ace.png"},{"id":95358082,"identity":"92eb3965-8e43-42d8-a003-8f1aa42e25ea","added_by":"auto","created_at":"2025-11-07 
07:03:58","extension":"png","order_by":5,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":26161,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7799974/v1/55a34c88bf3398d39e479723.png"},{"id":95358084,"identity":"827dfcb4-b15b-4c08-b83f-57dda8c99f8a","added_by":"auto","created_at":"2025-11-07 07:03:58","extension":"png","order_by":6,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":21591,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7799974/v1/262d2f1849fa118be2060404.png"},{"id":95526003,"identity":"2ad5a1f8-664d-45bd-902f-76a5e11175d6","added_by":"auto","created_at":"2025-11-10 10:06:01","extension":"png","order_by":7,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":11268,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7799974/v1/f11967d04c6c00eff2e0cb85.png"},{"id":95358093,"identity":"bc7052cd-6d05-46bd-8ff7-6d8464279e81","added_by":"auto","created_at":"2025-11-07 07:03:59","extension":"xml","order_by":8,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":213068,"visible":true,"origin":"","legend":"","description":"","filename":"74d7fa9899a748d7bf87effc8df7f7111structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-7799974/v1/d9e784cf2e9f61ea6b214bf3.xml"},{"id":95358092,"identity":"561f82c1-8192-49d4-b9c4-283a189c64ad","added_by":"auto","created_at":"2025-11-07 
07:03:59","extension":"html","order_by":9,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":218089,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-7799974/v1/f58db7d77d14a2689c07dc9f.html"},{"id":95525085,"identity":"aa69b259-e9d2-49d5-9416-72232fefee95","added_by":"auto","created_at":"2025-11-10 10:04:13","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":97687,"visible":true,"origin":"","legend":"\u003cp\u003eSee image above for figure legend.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-7799974/v1/0149e53052768dfd6d9af73e.png"},{"id":95358088,"identity":"3f854e6e-6304-42fc-bd23-249264db3cab","added_by":"auto","created_at":"2025-11-07 07:03:58","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":61860,"visible":true,"origin":"","legend":"\u003cp\u003eSee image above for figure legend.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-7799974/v1/3a66a504d3d3fd783c0b197b.png"},{"id":98243809,"identity":"1d008e93-13ac-4fb0-ad96-811f4bae7149","added_by":"auto","created_at":"2025-12-15 16:10:34","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1807930,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7799974/v1/28b2792d-4743-4dbd-8ce2-e3b5f86f4f02.pdf"},{"id":95358081,"identity":"6f0c1d20-2679-4452-9706-c6db40d4798e","added_by":"auto","created_at":"2025-11-07 
07:03:58","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":54258,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarymaterial.docx","url":"https://assets-eu.researchsquare.com/files/rs-7799974/v1/75462979483af48321bff1be.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Risk prediction models for delirium in ICU patients: A systematic review and critical appraisal","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eDelirium is an acute neurocognitive disorder, marked by transient and fluctuating impairments in consciousness and awareness, that frequently arises in intensive care unit (ICU) settings\u003csup\u003e[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]\u003c/sup\u003e. The prevalence of delirium in ICU has been reported to be as high as 60\u0026ndash;80%\u003csup\u003e[\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]\u003c/sup\u003e, contributing to an estimated annual healthcare expenditure of \u003cspan\u003e$\u003c/span\u003e164 billion\u003csup\u003e[\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]\u003c/sup\u003e. Delirium is strongly associated with prolonged mechanical ventilation, extended hospital stays, long-term cognitive impairment, and increased mortality rates\u003csup\u003e[\u003cspan additionalcitationids=\"CR6\" citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]\u003c/sup\u003e. 
Furthermore, it exacerbates patients' physical and psychological burdens\u003csup\u003e[\u003cspan additionalcitationids=\"CR9\" citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]\u003c/sup\u003e and significantly increases socioeconomic costs for families and healthcare systems\u003csup\u003e[\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]\u003c/sup\u003e, posing multidimensional challenges for patients, healthcare providers, and the medical system. Given these circumstances, early identification and intervention are critical. However, heterogeneous clinical manifestations of delirium\u0026mdash;combined with factors such as sedative use and mechanical ventilation that often mask or interfere with symptom presentation\u0026mdash;result in persistently high rates of underdiagnosis and delayed interventions in clinical practice. Therefore, the development of effective risk prediction models to enable accurate identification of high-risk patients has emerged as a pivotal strategy for achieving precise delirium management and improving long-term outcomes in critically ill populations.\u003c/p\u003e\u003cp\u003eIn recent years, researchers have developed multiple risk prediction models for ICU delirium to assist healthcare providers in rapidly screening high-risk populations through quantitative assessment. 
For instance, the PRE-DELIRIC model incorporates 10 risk factors (e.g., use of sedatives, APACHE-II score, metabolic disorders) to predict delirium risk within 24 hours of ICU admission\u003csup\u003e[\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]\u003c/sup\u003e, while the E-PRE-DELIRIC model further simplifies variables and extends applicability to early admission assessments\u003csup\u003e[\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]\u003c/sup\u003e. Additionally, other models integrate diverse predictors such as age, mechanical ventilation, sedation use, and biomarkers\u003csup\u003e[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]\u003c/sup\u003e. These models are constructed using algorithms ranging from logistic regression to machine learning, some of which have undergone validation in specific cohorts. However, marked heterogeneity in variable selection and applicable populations across different models limits their clinical generalizability.\u003c/p\u003e\u003cp\u003eAlthough existing models provide tools for delirium risk stratification, their methodological quality and clinical applicability remain controversial. Due to population heterogeneity, such as differences in patient characteristics between surgical ICU and medical ICU settings, some models demonstrate limited generalizability in external validation. The complexity of variable collection further restricts their practical implementation in clinical practice. Additionally, most models are developed based on single-center, retrospective data with limited sample sizes and insufficient external validation, leading to a high risk of overfitting. 
In this context, there is an urgent need for a systematic evaluation of the predictive performance and risk of bias in delirium risk prediction models for ICU patients.\u003c/p\u003e\u003cp\u003eTherefore, this review seeks to synthesize evidence from studies that developed or validated ICU delirium risk prediction models, critically appraise existing models, provide evidence-based references for clinical practice, and offer insights to guide future research.\u003c/p\u003e"},{"header":"2. Methods","content":"\u003cp\u003eThis systematic review adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidance to ensure transparent reporting of prediction model studies\u003csup\u003e[\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]\u003c/sup\u003e. The study protocol was prospectively registered on PROSPERO (CRD420251028221).\u003c/p\u003e\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003e2.1 Search Strategy\u003c/h2\u003e\u003cp\u003eA systematic literature search was conducted from database inception to April 8, 2025, across PubMed, Embase, Web of Science, and the Cochrane Library, restricted to English-language publications. 
The search strategy employed the following Boolean logic: (predict* OR prognos* OR risk OR prediction model) AND (critical care OR intensive care unit OR ICU OR critically ill) AND (delirium OR ICU syndrome OR acute confusional state OR acute brain dysfunction).\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\u003ch2\u003e2.2 Eligibility Criteria\u003c/h2\u003e\u003cp\u003eInclusion Criteria: (1) Study design: Cohort studies or case report studies; (2) Study participants: ICU patients (age\u0026thinsp;≧\u0026thinsp;18 years); (3) Study content: development or validation of multivariable prediction models to predict the risk of delirium in ICU patients; (4) Model performance was evaluated by at least one metric, such as the area under the curve (AUC), Hosmer-Lemeshow test, and sensitivity.\u003c/p\u003e\u003cp\u003eExclusion criteria: (1) Reviews, conference abstracts, letters; (2) Studies focusing on pediatric ICU patients, non-ICU settings (such as general ward), or subsyndromic delirium; (3) Studies utilizing univariable prediction models; (4) Studies focused only on risk factors or incidence rates.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\u003ch2\u003e2.3 Study selection\u003c/h2\u003e\u003cp\u003eThe initial screening of titles and abstracts was performed independently by two reviewers to identify studies that potentially satisfied the inclusion criteria. Subsequently, full-text assessments of the remaining studies were performed to confirm eligibility. 
Any discrepancies during the screening process were resolved through discussion or by consulting a third independent reviewer.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e\u003ch2\u003e2.4 Data Extraction\u003c/h2\u003e\u003cp\u003eTwo independent reviewers extracted data using a pre-designed Excel template based on the CHARMS (Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies) checklist\u003csup\u003e[\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]\u003c/sup\u003e. Any discrepancies were resolved through discussion or adjudication by a third reviewer. The extracted information included: (1) basic information: authors(years), participant country, study design, participant profile, delirium cases, sample size, outcome measurement methods; (2) model development details: predictors, methods for handling missing data, modeling algorithms, model validation approaches, model performance metrics.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\u003ch2\u003e2.5 Risk of Bias and Applicability Assessment\u003c/h2\u003e\u003cp\u003eThe methodological quality of the included studies was evaluated independently by two reviewers with the Prediction Model Risk of Bias Assessment Tool (PROBAST)\u003csup\u003e[\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e, \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]\u003c/sup\u003e, which evaluates four domains: participant selection, predictors, outcome assessment, and statistical analysis (e.g., handling of missing data, overfitting risk). Each domain was rated as low risk, high risk, or unclear risk of bias (ROB), and an overall judgment was derived from the domain-specific ratings. Discrepancies were resolved through discussion. 
If consensus could not be reached, a third reviewer was consulted to reach a consensus.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003e2.6 Data synthesis\u003c/h2\u003e\u003cp\u003eDue to substantial heterogeneity in participant characteristics, model predictors, and modeling methodologies, we employed a narrative synthesis approach to summarize study findings, without conducting quantitative analysis. We systematically extracted and summarized the following information from all included models and their development/validation processes: study design, participant characteristics, outcome measurement, predictors, handling of missing data, model development/validation methods, discrimination and calibration results, and model presentation formats. For model discrimination, an AUC or c-index\u0026thinsp;\u0026ge;\u0026thinsp;0.70 was considered indicative of good discriminatory ability\u003csup\u003e[\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e, \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]\u003c/sup\u003e. For calibration, acceptable performance was defined as a Hosmer-Lemeshow test p-value\u0026thinsp;\u0026gt;\u0026thinsp;0.05 or a Brier score\u0026thinsp;\u0026lt;\u0026thinsp;0.25\u003csup\u003e[\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e, \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e\u003c/div\u003e"},{"header":"3. Results","content":"\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\n \u003ch2\u003e3.1 Study Selection\u003c/h2\u003e\n \u003cp\u003eThe initial database search yielded a total of 10563 records. Among these, 5717 were duplicate records. Following the review of titles and abstracts, 4803 records were excluded as they did not meet the inclusion criteria, retaining 43 articles for further assessment. 
During the full-text screening phase, 17 articles were excluded for the reasons depicted in Fig.\u0026nbsp;1. Consequently, 26 studies were ultimately included\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e13\u003c/span\u003e\u0026ndash;\u003cspan class=\"CitationRef\"\u003e15\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e23\u003c/span\u003e\u0026ndash;\u003cspan class=\"CitationRef\"\u003e42\u003c/span\u003e]\u003c/sup\u003e. The process of study selection is summarized in Fig. 1.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\n \u003ch2\u003e3.2 Characteristics of included studies\u003c/h2\u003e\n \u003cp\u003eTable \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e summarized the key characteristics of the included studies. Published between 2014 and 2025, these studies introduced a total of 25 prediction models. The research encompassed ICU patients from countries including the United States, China, South Korea, Japan, Australia, Canada, and Denmark. Among these, 10 studies were prospective cohort studies, 14 utilized a retrospective cohort design, one employed a combination of prospective and retrospective methods, and one was a case-control study. Nineteen studies assessed delirium using the Confusion Assessment Method for the Intensive Care Unit (CAM-ICU), one study used the Intensive Care Delirium Screening Checklist (ICDSC), three studies employed both CAM-ICU and ICDSC, and one study utilized CAM-ICU alongside the Richmond Agitation-Sedation Scale (RASS). Delirium assessments were conducted twice daily or more frequently in 15 of these studies. 
Regarding candidate predictors, only the studies by Wassenaar\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e26\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e29\u003c/span\u003e]\u003c/sup\u003e, Shi, and Anton\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e13\u003c/span\u003e]\u003c/sup\u003e did not provide any information on the initial pool of candidate predictors. The studies by Ko\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e32\u003c/span\u003e]\u003c/sup\u003e, Hur\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e35\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e36\u003c/span\u003e]\u003c/sup\u003e, Green, and Gong\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e4\u003c/span\u003e]\u003c/sup\u003e did not report the specific number of candidate predictors considered but did describe their general categories or types. In the remaining studies, the number of candidate predictors ranged from 9 to 70.\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\n \u003ctable id=\"Tab1\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eCharacteristics of included studies.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eAuthors (years)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eParticipant\u003c/p\u003e\n \u003cp\u003ecountry\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eStudy\u003c/p\u003e\n \u003cp\u003edesign\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eParticipants (male/female)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eDelirium cases\u003c/p\u003e\n \u003cp\u003e/Sample size\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n 
\u003cp\u003eOutcome\u003c/p\u003e\n \u003cp\u003emeasurement\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eTiming of delirium assessment\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eCandidate predictors\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eZhang et al. (2023)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAmerica\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRetrospective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;18, sepsis patients in different ICUs (6102/8518)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e5390/14620\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e53 (age, weight, gender, ethnicity, ICU type, vital signs, laboratory tests, GCS and SOFA scores, treatment measures and comorbidity)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eZhang et al.\u003c/p\u003e\n \u003cp\u003e(2021)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eChina\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eProspective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;18, patients in mixed ICU (145/78)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e46/223\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003eTwice a day (at 9\u0026thinsp;am-11 am and 3 pm-5 pm)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e16 (age, gender, history of hypertension, heart disease, history of pulmonary dysfunction, alcohol abuse, history of nicotine, history of peptic ulcer, hypoxaemia, hypotension, deep sedation, benzodiazepines, mechanical ventilation, metabolic acidosis, sepsis and surgery)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eWu et al. (2025)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAmerica\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRetrospective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;65, elderly patients with COPD and respiratory failure in ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e146/1155\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e65 (age, weight, length of hospitalization, clinical scores, vital signs, ventilation mode, and laboratory tests)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eWassenaar et al. 
(2015)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e7 countries\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eProspective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;18, patients in ICU (1716/1198)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e689/2914\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eOnce every 8 h or 12 h\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e18 (age, gender, history of cognitive impairment, history of alcohol, nicotine, and drug abuse, history of vascular disease, Glasgow Coma Scale score, diabetes, blood urea nitrogen, use of opiates in the 24 h before ICU admission, use of antipsychotics before ICU admission, admission category, urgent admission, mean arterial blood pressure, infection, use of corticosteroids, and respiratory failure)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eWassenaar et al. 
(2018)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e7 countries\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eProspective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;18, patients in ICU (1324/854)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e467/2178\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU/ ICDSC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAt least every 12 h\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eWang et al. (2020)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eChina\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eProspective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;18, patients in NICU (140/170)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e118/310\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAt 08:00\u0026ndash;09:00 and 20:00\u0026ndash;21:00\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e28 (General data, condition monitoring indicators, the Glasgow Coma Scale and APACHE II score)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBoogaard et al. 
(2014)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e6 countries\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eProspective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;18, patients in ICU (1040/784)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e363/1824\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAt least twice daily\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e10 (age, APACHE-II score, urgent admission, admission category, infection, coma, sedation, morphine use, urea level, and metabolic acidosis)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eTang et al. (2024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAmerica\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRetrospective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;65, elderly patients in ICU (5277/4471)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4243/9748\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e48 (demographic characteristics, admission condition, chronic comorbidities, disease severity scores, vital signs and laboratory indicators)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eShi et al. 
(2022)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eChina\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRetrospective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;18, patients in EICU (204/115)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e96/319\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU/ RASS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAt every 24 h\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePark et al. (2025)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eKorea\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRetrospective and prospective cohorts\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;18, patients in ICU (6404/4182)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─/10586\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e19 (demographic variables and parameters derived from ECG lead II, PPG, and respiratory waveforms)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMiyamoto et al. 
(2020)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eJapan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eProspective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMechanically ventilated patients with sepsis in ICU (99/59)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e63/158\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAt least once daily\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e10 (age, APACHE II score, presence of coma, admission route, presence of infection, presence of metabolic acidosis, morphine dose on the first day, sedative usage on the first day, blood urea nitrogen, and incidence of urgent admission)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMa et al. (2024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAmerica\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRetrospective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;65, patients in ICU (10285/8475)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3463/18760\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e60 (demographic characteristics, vital signs and laboratory indicators)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eKo et al. 
(2024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eKorea\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRetrospective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;18, patients in CICU (1794/980)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e677/2774\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eThree times a day\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eClinical characteristics, primary diagnoses, vital signs, laboratory test results, and clinical presentations at CICU admission\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eKim et al. (2022)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eKorea\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRetrospective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;20, patients in mixed ICU (2246/1451)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e741/3697\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAt 10 am\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e14 (basic information, drug usage, and procedure/intervention application)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eKim et al. (2024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eKorea\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003eCase-control\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAcute stroke patients in NICU (261/159)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e84/420\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU/ ICDSC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eEvery 8h\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e50 (clinical features at admission and features based on vital signs)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eHur et al. (2021)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eKorea\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRetrospective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;18, patients in the medical or surgical ICU (7877/4532)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3816/12409\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3 times a day\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eGeneral information, Admission category, Reason for ICU admission, Vital signs, Comorbidity, Laboratory tests, Medications\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eGreen et al. 
(2019)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAustralia\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eProspective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePatients in ICU (241/214)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e160/455\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eTwice a day\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDemographic information, APACHE II and III illness severity scores, a morphine equivalent and blood urea nitrogen\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eGong et al. (2023)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAmerica\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRetrospective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePatients in ICU (9616/2536)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2536/18302\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDemographics, medical history and comorbidities, laboratory studies, medications administered, other treatments, nurse documentation, and physiologic time series\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eGao W et al. 
(2022)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eChina\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRetrospective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;18, patients following cardiac surgeries in CSICU (483/242)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e120/725\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eEvery 12 hours\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e9 (age, history of cognitive impairment and alcohol abuse, admission category (surgical), urgent admission, MAP and blood urea nitrogen at time of ICU admission, use of corticosteroids, and respiratory failure)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFan et al. 
(2019)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eChina\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eProspective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;18, patients in SICU, TVICU, CCU, RICU (202/134)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e68/336\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAt around 7 AM and 7 PM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e13 (baseline demographic data, history of chronic diseases and delirium, history of alcohol drinking or abuse, visual and hearing deficits, disease-related factors, and iatrogenic and environmental factors)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eEsumi et al. (2025)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eJapan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRetrospective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;18, patients with burns in ICU (52/30)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e32/82\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU/ ICDSC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eEvery 8 hours\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e70 (Physiological, biochemical, and clinical data)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCoombes et al. 
(2021)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAmerica\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRetrospective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;18, patients in ICU (27220/21321)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3850/48541\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e31 (laboratory and imaging orders and 4 medications)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCherak et al. (2020)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCanada\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRetrospective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;18, patients in ICU (5113/3765)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4431/8878\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eICDSC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eTwice a day\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e10 (age, sex, APACHE II score at admission, GCS score at admission, SOFA score at admission, Charlson Comorbidity Index at admission, vasoactive medication receipt within 24 hours of ICU admission, pre-existing neuropsychiatric disorder, continuous renal replacement therapy receipt within 24 hours of ICU admission, and invasive mechanical ventilation receipt within 24 hours of ICU admission)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd 
align=\"left\"\u003e\n \u003cp\u003eChen et al. (2017)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eChina\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eProspective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;18, patients in ICU (305/315)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e160/620\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAt both 9 am and 5 pm\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e11 (age, APACHE II score, coma, emergency operation, mechanical ventilation, multiple trauma, metabolic acidosis, history of hypertension, history of delirium, history of dementia, and the application of dexmedetomidine hydrochloride within 24 hours after admission to the ICU)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBhattacharyya et al. 
(2022)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAmerica\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRetrospective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;18, patients in ICU (12384/10456)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4421/22840\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e21 (demographic data, vital signs, laboratory values, and vasopressor dose that fulfilled above criteria)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAnton et al. (2024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDenmark\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eProspective cohort\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge\u0026thinsp;≧\u0026thinsp;18, patients in four mixed ICUs (395/265)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e247/660\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCAM-ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eTwice daily\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"8\"\u003e\n \u003cp\u003eCAM-ICU, Confusion Assessment Method for ICU; ICDSC, Intensive Care Delirium Screening Checklist; ─, no information; ICU, Intensive Care Unit; NICU, Neurological Intensive Care Unit; EICU, Emergency Intensive Care Unit; CICU, Cardiac Intensive Care Unit; CSICU, Cardiac Surgery 
Intensive Care Unit; GCS, Glasgow Coma Scale; SOFA, Sequential Organ Failure Assessment; APACHE, Acute Physiology And Chronic Health Evaluation; ECG, Electrocardiogram; PPG, Photoplethysmography; MAP, Mean Arterial Pressure.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\n \u003ch2\u003e3.3 Characteristics of prediction models\u003c/h2\u003e\n \u003cp\u003eTable \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e details the characteristics of the risk prediction models for delirium in ICU patients. Approaches to handling missing data varied considerably. One model excluded cases with missing data\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e1\u003c/span\u003e]\u003c/sup\u003e, and four models combined exclusion of incomplete cases with multiple imputation\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e25\u003c/span\u003e]\u003c/sup\u003e. One model employed Multivariate Imputation by Chained Equations (MICE)\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e34\u003c/span\u003e]\u003c/sup\u003e, one filled some missing values with the mean and left others blank\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e35\u003c/span\u003e]\u003c/sup\u003e, and one used an unspecified \u0026quot;replacement\u0026quot; method\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e37\u003c/span\u003e]\u003c/sup\u003e. Two models applied mean imputation\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e13\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e38\u003c/span\u003e]\u003c/sup\u003e, and one combined case exclusion with mean imputation\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e4\u003c/span\u003e]\u003c/sup\u003e. 
One model used forward and backward imputation\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e42\u003c/span\u003e]\u003c/sup\u003e. Four studies reported that either no method was used to handle missing data or that no missing data were present. The method for handling missing data was not reported for eight models.\u003c/p\u003e\n \u003cp\u003eUnivariable analysis was the most frequently employed method for predictor screening. Three models used Least Absolute Shrinkage and Selection Operator (LASSO) regression for predictor selection\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e31\u003c/span\u003e]\u003c/sup\u003e, and one model combined LASSO regression with best subset selection\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e25\u003c/span\u003e]\u003c/sup\u003e. One model used a combination of Random Forest (RF), extreme gradient boosting, partial least squares, and glmnet elastic net\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e32\u003c/span\u003e]\u003c/sup\u003e. One model employed stepwise selection\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e34\u003c/span\u003e]\u003c/sup\u003e. The predictor pre-screening process was not reported in fourteen studies.\u003c/p\u003e\n \u003cp\u003eMost of the included models used logistic regression analysis to identify the final predictors. 
Furthermore, machine learning methods were employed to generate and compare models, including Support Vector Machine (SVM), eXtreme Gradient Boosting (XGBoost), Random Forest (RF), K-Nearest Neighbors (KNN), Decision Tree (DT), Na\u0026iuml;ve Bayes (NB), extra-trees classifier, LightGBM, Deep Neural Network (DNN), neural network, adaptive boosting (AdaBoost), Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), Bidirectional Long Short-Term Memory (BiLSTM), and CatBoost. After generating and comparing the models, 13 studies demonstrated that the model developed using logistic regression exhibited the best performance\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e15\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e27\u003c/span\u003e\u0026ndash;\u003cspan class=\"CitationRef\"\u003e29\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e32\u003c/span\u003e\u0026ndash;\u003cspan class=\"CitationRef\"\u003e34\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e38\u003c/span\u003e\u0026ndash;\u003cspan class=\"CitationRef\"\u003e41\u003c/span\u003e]\u003c/sup\u003e. Four studies found that the model generated by XGBoost performed best\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e25\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e31\u003c/span\u003e]\u003c/sup\u003e. One study reported that the XGBoost-based model showed superior performance upon internal validation, whereas the RF-based model performed better upon external validation\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e35\u003c/span\u003e]\u003c/sup\u003e. 
Two other studies indicated that models generated by CatBoost and BiLSTM, respectively, yielded the best performance\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e42\u003c/span\u003e]\u003c/sup\u003e. An additional five studies did not specify the optimal modeling method.\u003c/p\u003e\n \u003cp\u003eThe final number of predictors included in the models ranged from 5 to 59, encompassing a total of 170 distinct predictor variables. Figure 2 presents the top 21 most frequently occurring predictors. The most common predictor for delirium in ICU patients was age, followed by sedation, APACHE-II score, urgent admission, mechanical ventilation, and GCS score. Other high-frequency predictors included blood urea nitrogen, coma, infection, and metabolic acidosis, each of which was incorporated into eight models. Three predictors\u0026mdash;history of cognitive impairment, urea concentration, and morphine use\u0026mdash;were included in six models. Additionally, eight predictors\u0026mdash;respiratory rate, sex, history of alcohol abuse, admission category, mean arterial pressure, use of corticosteroids, respiratory failure, and heart rate\u0026mdash;were featured in five models each. The specific predictors included in the final models are detailed in Table \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\n \u003ch2\u003e3.4 Characteristics of model validation\u003c/h2\u003e\n \u003cp\u003eTable \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e presents the model validation characteristics. 
Among them, four models were validated both internally and externally\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e35\u003c/span\u003e]\u003c/sup\u003e. Fourteen models underwent internal validation only\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e15\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e25\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e29\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e31\u003c/span\u003e\u0026ndash;\u003cspan class=\"CitationRef\"\u003e34\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e38\u003c/span\u003e\u0026ndash;\u003cspan class=\"CitationRef\"\u003e42\u003c/span\u003e]\u003c/sup\u003e, and seven models were subjected to external validation only\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e13\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e26\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e27\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e30\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e36\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e37\u003c/span\u003e]\u003c/sup\u003e. Additionally, one model reported no validation following its development. 
Of the 18 models that completed internal validation, the methods varied: one utilized bootstrapping\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e29\u003c/span\u003e]\u003c/sup\u003e, 13 employed a random split method\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e15\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e25\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e31\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e33\u003c/span\u003e\u0026ndash;\u003cspan class=\"CitationRef\"\u003e35\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e38\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e39\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e41\u003c/span\u003e]\u003c/sup\u003e, one used internal cross-validation\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e40\u003c/span\u003e]\u003c/sup\u003e, and one implemented stratified repeated cross-validation\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e42\u003c/span\u003e]\u003c/sup\u003e. The specific internal validation method was not reported for the remaining two models.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\n \u003ch2\u003e3.5 Characteristics of model performance\u003c/h2\u003e\n \u003cp\u003eAs detailed in Table \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e, all included studies reported metrics for model discrimination, which were evaluated using the area under the curve (AUC). Among the 21 studies involving model development, 15 reported discrimination performance, with AUC values ranging from 0.67 to 0.921. 
Regarding model validation, substantial variation was observed across all included models, with AUC values ranging from 0.63 to 0.932.\u003c/p\u003e\n \u003cp\u003eA total of 14 studies reported on model calibration. Three studies utilized calibration curves for validation, indicating good calibration performance\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e23\u003c/span\u003e]\u003c/sup\u003e. Four studies employed calibration plots for assessment\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e26\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e30\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e31\u003c/span\u003e]\u003c/sup\u003e; among these, only the study by Miyamoto et al.\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e30\u003c/span\u003e]\u003c/sup\u003e showed inadequate calibration. Four studies evaluated calibration using the Hosmer\u0026ndash;Lemeshow test, with results suggesting well-calibrated models\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e28\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e37\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e40\u003c/span\u003e]\u003c/sup\u003e. Three studies applied the Brier score as a metric\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e31\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e35\u003c/span\u003e]\u003c/sup\u003e. Only two studies reported calibration slope values (1.09, 1.07, and 0.63)\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e13\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e28\u003c/span\u003e]\u003c/sup\u003e. Additionally, the study by Anton et al. 
provided intercept values (0.11 and 0.19)\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e13\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e\n \u003cp\u003eFurthermore, 18 studies reported model performance using metrics including sensitivity, specificity, negative predictive value, positive predictive value, accuracy, the kappa coefficient, precision, recall, the F1-score, and the Matthews correlation coefficient (MCC). To assess net clinical benefit, decision curve analysis was employed in five of the included studies, with favorable outcomes\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e25\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e29\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e31\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e35\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\n \u003ctable id=\"Tab2\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eCharacteristics of delirium risk prediction models for ICU patients.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eAuthors (years)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eMissing data\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eVariable selection\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eModeling method\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eModel validation\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eModel\u003c/p\u003e\n \u003cp\u003ename\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eAUC\u003c/p\u003e\n \u003c/th\u003e\n \u003cth 
align=\"left\"\u003e\n \u003cp\u003eOther indexes\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eCalibration\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eZhang et al. (2023)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eExcluding and multiple imputation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLASSO regression\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR, SVM, XGBoost, RF, KNN, DT, and NB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eInternal and external validation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eXGBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u003csub\u003eint\u003c/sub\u003e = 0.793\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eext\u003c/sub\u003e = 0.701\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSe\u003csub\u003eint\u003c/sub\u003e = 0.852, Se\u003csub\u003eext\u003c/sub\u003e = 0.664, Sp\u003csub\u003eint\u003c/sub\u003e = 0.568, Sp\u003csub\u003eext\u003c/sub\u003e = 0.579, PPV\u003csub\u003eint\u003c/sub\u003e = 0.769, PPV\u003csub\u003eext\u003c/sub\u003e = 0.789, NPV\u003csub\u003eint\u003c/sub\u003e = 0.694, NPV\u003csub\u003eext\u003c/sub\u003e = 0.421, Acc\u003csub\u003eint\u003c/sub\u003e = 0.746, Acc\u003csub\u003eext\u003c/sub\u003e = 0.639, Ka\u003csub\u003eint\u003c/sub\u003e = 0.436, Ka\u003csub\u003eext\u003c/sub\u003e = 0.220\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCalibration curve: best fit\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eZhang et al.\u003c/p\u003e\n \u003cp\u003e(2021)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eUR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eInternal validation\u003c/p\u003e\n \u003cp\u003e(random split)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u0026thinsp;=\u0026thinsp;0.862\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eint\u003c/sub\u003e = 0.739\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eH-L test: \u003cem\u003eP\u003c/em\u003e \u0026gt;0.05\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eWu et al. (2025)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eExcluding and multiple imputation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLASSO regression and the optimal subset method\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR, KNN, RF and XGBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eInternal validation\u003c/p\u003e\n \u003cp\u003e(80/20 split ratio)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eXGBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u0026thinsp;=\u0026thinsp;0.921\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eint\u003c/sub\u003e = 0.932\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAcc\u003csub\u003eint\u003c/sub\u003e = 0.891, F1\u003csub\u003eint\u003c/sub\u003e\u0026thinsp;=\u0026thinsp;0.810,\u003c/p\u003e\n \u003cp\u003ePre\u003csub\u003eint\u003c/sub\u003e = 
0.839, Recall\u003csub\u003eint\u003c/sub\u003e = 0.795\u003c/p\u003e\n \u003cp\u003eDCA: favorable results\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eWassenaar et al. (2015)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMultiple imputation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eInternal validation\u003c/p\u003e\n \u003cp\u003e(random split) and external validation (temporal validation)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eE-PRE-DELIRIC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u0026thinsp;=\u0026thinsp;0.76\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eint\u003c/sub\u003e = 0.75\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eext\u003c/sub\u003e: 0.70\u0026ndash;0.81\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSe\u0026thinsp;=\u0026thinsp;0.71, Sp\u0026thinsp;=\u0026thinsp;0.69,\u003c/p\u003e\n \u003cp\u003eSe\u003csub\u003eext\u003c/sub\u003e: 0.62\u0026ndash;0.78,\u003c/p\u003e\n \u003cp\u003eSp\u003csub\u003eext\u003c/sub\u003e: 0.67\u0026ndash;0.68\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCalibration plots: well calibrated\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eWassenaar et al. 
(2018)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eExternal validation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePRE-DELIRIC and E-PRE-DELIRIC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u003csub\u003ePRE\u0026minus;DELIRIC\u003c/sub\u003e = 0.76\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eE\u0026minus;PRE\u0026minus;DELIRIC\u003c/sub\u003e = 0.68\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSe\u003csub\u003ePRE\u0026minus;DELIRIC\u003c/sub\u003e = 0.69\u003c/p\u003e\n \u003cp\u003eSp\u003csub\u003ePRE\u0026minus;DELIRIC\u003c/sub\u003e = 0.66\u003c/p\u003e\n \u003cp\u003eSe\u003csub\u003eE\u0026minus;PRE\u0026minus;DELIRIC\u003c/sub\u003e = 0.60\u003c/p\u003e\n \u003cp\u003eSp\u003csub\u003eE\u0026minus;PRE\u0026minus;DELIRIC\u003c/sub\u003e = 0.65\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCalibration plots: well calibrated\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eWang et al. 
(2020)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eUR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR (stepwise)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eExternal validation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u003csub\u003eext\u003c/sub\u003e = 0.80\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSe\u003csub\u003eext\u003c/sub\u003e = 0.68, Sp\u003csub\u003eext\u003c/sub\u003e = 0.83\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBoogaard et al. (2014)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMultiple imputation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR (stepwise)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePRE-DELIRIC (Recalibration)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u0026thinsp;=\u0026thinsp;0.76\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSe\u0026thinsp;=\u0026thinsp;0.70, Sp\u0026thinsp;=\u0026thinsp;0.73\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCalibration slope\u0026thinsp;=\u0026thinsp;1.09 H-L test: \u003cem\u003eP\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.045\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eTang et 
al. (2024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eExcluding and multiple imputation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLASSO regression\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR, DT, SVM, XGBoost, KNN, and NB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eInternal validation\u003c/p\u003e\n \u003cp\u003e(random split)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eXGBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u0026thinsp;=\u0026thinsp;0.836\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eint\u003c/sub\u003e = 0.810\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAcc\u0026thinsp;=\u0026thinsp;0.765, Se\u0026thinsp;=\u0026thinsp;0.713, F1\u0026thinsp;=\u0026thinsp;0.725, Recall\u0026thinsp;=\u0026thinsp;0.713, Acc\u003csub\u003eint\u003c/sub\u003e = 0.744\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCalibration curve: best fit\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eShi et al. 
(2022)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNot handling\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eUR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBootstrap internal validation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBS full and BS stepwise\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u003csub\u003efull\u003c/sub\u003e = 0.75\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003estepwise\u003c/sub\u003e = 0.75\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDCA: good clinical practicability\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePark et al. (2025)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eExcluding\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRF, extra-trees classifier and LightGBM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eInternal validation (8:2 split ratio) and external validation (temporal validation)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRF\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u0026thinsp;=\u0026thinsp;0.757\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eint\u003c/sub\u003e = 0.82\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eext\u003c/sub\u003e = 0.82\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDCA: a greater net benefit\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCalibration 
curve: strong agreement\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMiyamoto et al. (2020)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eExternal validation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePRE-DELIRIC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u0026thinsp;=\u0026thinsp;0.60\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSe\u0026thinsp;=\u0026thinsp;0.57, Sp\u0026thinsp;=\u0026thinsp;0.68\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCalibration plot: poor calibration\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMa et al. 
(2024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eExcluding and multiple imputation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLASSO regression\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR and XGBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eInternal validation\u003c/p\u003e\n \u003cp\u003e(random split)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eXGBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u0026thinsp;=\u0026thinsp;0.853\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eint\u003c/sub\u003e = 0.831\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAcc\u0026thinsp;=\u0026thinsp;0.757, Se\u0026thinsp;=\u0026thinsp;0.794, Sp\u0026thinsp;=\u0026thinsp;0.748, F1\u0026thinsp;=\u0026thinsp;0.547, Acc\u003csub\u003eint\u003c/sub\u003e = 0.753, Se\u003csub\u003eint\u003c/sub\u003e = 0.775, Sp\u003csub\u003eint\u003c/sub\u003e = 0.748, F1\u003csub\u003eint\u003c/sub\u003e\u0026thinsp;=\u0026thinsp;0.534, DCA: a higher net benefit\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBrier score\u0026thinsp;=\u0026thinsp;0.106,\u003c/p\u003e\n \u003cp\u003eBrier score\u003csub\u003eint\u003c/sub\u003e = 0.113\u003c/p\u003e\n \u003cp\u003eCalibration plot: high clinical utility\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eKo et al. 
(2024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRF, extreme gradient boosting, partial least squares, and Plmnet-elastic.net\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eInternal validation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u0026thinsp;=\u0026thinsp;0.860\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eint\u003c/sub\u003e = 0.855\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eKim et al. (2022)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eUnivariable LR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR (stepwise) and RF\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eInternal validation\u003c/p\u003e\n \u003cp\u003e(random split)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u0026thinsp;=\u0026thinsp;0.820\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eint\u003c/sub\u003e = 0.779\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSe\u003csub\u003eint\u003c/sub\u003e: 0.42, 0.67, 0.86,\u003c/p\u003e\n \u003cp\u003eSp\u003csub\u003eint\u003c/sub\u003e: 0.49, 0.71, 0.90\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n 
\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eKim et al. (2024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMICE replacement\u003c/p\u003e\n \u003cp\u003emethod\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eStepwise selection\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR, RF, LightGBM, SVM and XGBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eInternal validation\u003c/p\u003e\n \u003cp\u003e(random split)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u0026thinsp;=\u0026thinsp;0.80\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eint\u003c/sub\u003e = 0.71\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSe\u0026thinsp;=\u0026thinsp;0.75, Sp\u0026thinsp;=\u0026thinsp;0.72\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eHur et al. 
(2021)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFilled with mean values and left blank\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRF, XGBoost, DNN and LR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eInternal validation (random split) and external validation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePRIDE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eXGBoost:\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eint\u003c/sub\u003e = 0.919\u003c/p\u003e\n \u003cp\u003eRF:\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eext\u003c/sub\u003e = 0.721\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSe\u003csub\u003eint\u003c/sub\u003e = 0.904, Se\u003csub\u003eext\u003c/sub\u003e = 0.91,\u003c/p\u003e\n \u003cp\u003eSp\u003csub\u003eint\u003c/sub\u003e = 0.731, Sp\u003csub\u003eext\u003c/sub\u003e = 0.27, PPV\u003csub\u003eint\u003c/sub\u003e = 0.565, PPV\u003csub\u003eext\u003c/sub\u003e = 0.159, NPV\u003csub\u003eint\u003c/sub\u003e = 0.952, NPV\u003csub\u003eext\u003c/sub\u003e = 0.952\u003c/p\u003e\n \u003cp\u003eDCA: net benefit\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBrier score\u003csub\u003eint\u003c/sub\u003e = 0.094,\u003c/p\u003e\n \u003cp\u003eBrier score\u003csub\u003eext\u003c/sub\u003e = 0.168\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eGreen et al. 
(2019)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eExternal validation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePRE-DELIRIC, recalibrated PRE-DELIRIC, E-PRE-DELIRIC and Lanzhou\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u003csub\u003ePRE\u0026minus;DELIRIC\u003c/sub\u003e = 0.79\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003erecalibrated PRE\u0026minus;DELIRIC\u003c/sub\u003e = 0.79\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eE\u0026minus;PRE\u0026minus;DELIRIC\u003c/sub\u003e = 0.72\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eLanzhou\u003c/sub\u003e = 0.77\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eGong et al. 
(2023)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eExcluding and mean imputation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCatBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eExternal validation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e24-h model and dynamic model\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e24-h model:\u003c/p\u003e\n \u003cp\u003eAUC\u0026thinsp;=\u0026thinsp;0.785\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eext\u003c/sub\u003e: 0.796, 0.810\u003c/p\u003e\n \u003cp\u003edynamic model:\u003c/p\u003e\n \u003cp\u003eAUC\u0026thinsp;=\u0026thinsp;0.845\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eext\u003c/sub\u003e: 0.804, 0.838\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e24-h model:\u003c/p\u003e\n \u003cp\u003eSe\u0026thinsp;=\u0026thinsp;0.85, Sp\u0026thinsp;=\u0026thinsp;0.556\u003c/p\u003e\n \u003cp\u003edynamic model:\u003c/p\u003e\n \u003cp\u003eSe\u0026thinsp;=\u0026thinsp;0.85, Sp\u0026thinsp;=\u0026thinsp;0.657\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e24-h model:\u003c/p\u003e\n \u003cp\u003eBrier score\u0026thinsp;=\u0026thinsp;0.102,\u003c/p\u003e\n \u003cp\u003eBrier score\u003csub\u003eext\u003c/sub\u003e = 0.105, 0.110\u003c/p\u003e\n \u003cp\u003edynamic model:\u003c/p\u003e\n \u003cp\u003eBrier score\u0026thinsp;=\u0026thinsp;0.111,\u003c/p\u003e\n \u003cp\u003eBrier score\u003csub\u003eext\u003c/sub\u003e = 0.165, 0.132\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eGao W et al. 
(2022)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eReplacement\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eExternal validation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eE-PRE-DELIRIC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u0026thinsp;=\u0026thinsp;0.54\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eH-L test: \u003cem\u003eP\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.027\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFan et al. (2019)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMean imputation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eUR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMultiple LR (backward stepwise)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eInternal validation (random split)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDYNAMIC-ICU\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u003csub\u003eint\u003c/sub\u003e: 0.907, 0.888, 0.874, 0.900\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eEsumi et al. 
(2025)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNot handling\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR, RF, SVM, neural network, KNN, DT, NB, AdaBoost, GBM and LDA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eInternal validation (random split)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u0026thinsp;=\u0026thinsp;0.906\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMCC\u0026thinsp;=\u0026thinsp;0.625, Acc\u0026thinsp;=\u0026thinsp;0.818, Pre\u0026thinsp;=\u0026thinsp;0.797, Recall\u0026thinsp;=\u0026thinsp;0.743, F1\u0026thinsp;=\u0026thinsp;0.755\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCoombes et al. 
(2021)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNot handling\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR, CART, RF, NB and SVM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eInternal validation (random split)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u0026thinsp;=\u0026thinsp;0.83\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eint\u003c/sub\u003e = 0.83\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSe\u003csub\u003eint\u003c/sub\u003e = 0.794, Sp\u003csub\u003eint\u003c/sub\u003e = 0.715,\u003c/p\u003e\n \u003cp\u003ePPV\u003csub\u003eint\u003c/sub\u003e = 0.197, NPV\u003csub\u003eint\u003c/sub\u003e = 0.976\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCherak et al. 
(2020)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNot handling\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLASSO LR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eInternal cross-validation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLASSO LR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC: 0.67\u0026ndash;0.78\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSe:0.532\u0026ndash;0.639\u003c/p\u003e\n \u003cp\u003eSp: 0.690\u0026ndash;0.746\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eH-L test: \u003cem\u003eP\u003c/em\u003e: 0.13\u0026ndash;0.98\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eChen et al. (2017)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMultiple LR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eInternal validation (random split)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMultiple LR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u003csub\u003eint\u003c/sub\u003e = 0.78\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSe\u003csub\u003eint\u003c/sub\u003e: 0.305\u0026ndash;0.756\u003c/p\u003e\n \u003cp\u003eSp\u003csub\u003eint\u003c/sub\u003e: 0.667\u0026ndash;0.982\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003eBhattachary-ya et al. (2022)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eForward and backward imputation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLR, RF and BiLSTM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eStratified repeated cross-validation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBiLSTM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC: 0.849\u0026ndash;0.884\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePre: 0.375, 0.175, 0.270, 0.113\u003c/p\u003e\n \u003cp\u003eRecall: 0.861, 0.756, 0.937, 0.926\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAnton et al. 
(2024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMean imputation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eExternal validation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePRE-DELIRIC and E-PRE-DELIRIC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC\u003csub\u003ePRE\u0026minus;DELIRIC\u003c/sub\u003e = 0.70\u003c/p\u003e\n \u003cp\u003eAUC\u003csub\u003eE\u0026minus;PRE\u0026minus;DELIRIC\u003c/sub\u003e = 0.63\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e─\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCalibration slope\u003csub\u003ePRE\u0026minus;DELIRIC\u003c/sub\u003e = 1.07\u003c/p\u003e\n \u003cp\u003eIntercept\u003csub\u003ePRE\u0026minus;DELIRIC\u003c/sub\u003e = 0.11\u003c/p\u003e\n \u003cp\u003eCalibration slope\u003csub\u003eE\u0026minus;PRE\u0026minus;DELIRIC\u003c/sub\u003e = 0.63\u003c/p\u003e\n \u003cp\u003eIntercept\u003csub\u003eE\u0026minus;PRE\u0026minus;DELIRIC\u003c/sub\u003e = 0.19\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"9\"\u003e\n \u003cp\u003eLASSO, Least Absolute Shrinkage and Selection Operator; LR, logistic regression; SVM, Support Vector Machine; XGBoost, eXtreme Gradient Boosting; RF, Random Forest; KNN, K-Nearest Neighbors; DT, Decision Tree; NB, Na\u0026iuml;ve Bayes; AUC, area under the curve; Se, sensitivity; Sp, specificity; NPV, negative predictive value; PPV, positive predictive value; Acc, accuracy; Ka, the kappa coefficient; Pre, precision; UR, univariate regression; H-L test, Hosmer\u0026ndash;Lemeshow test; DCA, decision curve analysis; BS, backward selection; LightGBM, Light 
Gradient Boosting Machine; DNN, Deep Neural Network; PRIDE, prediction of ICU delirium; AdaBoost, adaptive boosting; GBM, gradient-boosting machine; LDA, Linear Discriminant Analysis; MCC, the Matthews correlation coefficient; CART, Classification and Regression Trees; BiLSTM, Bidirectional Long Short-Term Memory; MICE, Multivariate Imputation by Chained Equations; ─, no information.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cdiv class=\"gridtable\"\u003e\n \u003ctable id=\"Tab3\" border=\"1\"\u003e\u003c/table\u003e\n \u003c/div\u003e\n \u003cp\u003eTable 3 Final predictors in risk prediction models for ICU delirium patients.\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\n \u003ctable id=\"Taba\" border=\"1\"\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eAuthors (years)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eNO. of predictors\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eFinal predictors\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eZhang et al. 
(2023)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMechanical ventilation, cardiovascular ICU (CVICU), GCS score, sedation, acute kidney injury (AKI), temperature, anion gap, blood sodium, vasopressors, respiratory rate, age, stroke, bicarbonate, platelets, and white blood cells\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eZhang et al.\u003c/p\u003e\n \u003cp\u003e(2021)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eHistory of hypertension, hypoxaemia, use of benzodiazepines, deep sedation, sepsis and mechanical ventilation\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eWu et al. (2025)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eGCS verbal score, length of hospital stay, mean SpO₂ on the first day of ICU admission, Modification of Diet in Renal Disease (MDRD) equation score, mean diastolic blood pressure, GCS motor score, gender, and duration of noninvasive ventilation\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eWassenaar et al. (2015)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge, history of cognitive impairment, history of alcohol abuse, blood urea nitrogen, admission category, urgent admission, mean arterial blood pressure, use of corticosteroids, and respiratory failure\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eWassenaar et al. 
(2018)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e10/9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePRE-DELIRIC: age, APACHE-II score, admission group, coma, infection, metabolic acidosis, use of sedatives and morphine, urea concentration, and urgent admission\u003c/p\u003e\n \u003cp\u003eE-PRE-DELIRIC: Age, history of cognitive impairment, history of alcohol abuse, blood urea nitrogen, admission category, urgent admission, mean arterial blood pressure, use of corticosteroids, and respiratory failure\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eWang et al. (2020)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCognitive dysfunction on admission, fever, hypoalbuminemia, abnormal liver function, sedative use\u0026thinsp;\u0026ge;\u0026thinsp;4 times, and physical restraint\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBoogaard et al. (2014)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge, APACHE-II, urgent admission, admission category, infection, coma, sedation, morphine use, urea level, and metabolic acidosis\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eTang et al. 
(2024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eGCS score, mechanical ventilation, sedation, ICU type, the Acute Physiology Score III (APSIII), temperature, age, diastolic blood pressure, oxyhemoglobin saturation and SOFA score\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eShi et al. (2022)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eStomach and urinary tubes, sedative, mechanical ventilation and APACHE-II scores\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePark et al. (2025)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge, sex, ECG-derived features (activity, complexity, mobility, kurtosis, skewness), PPG-derived features (activity, kurtosis, skewness), respiratory waveform-derived features (activity, kurtosis, skewness), HR (median, SD), RR (median, SD), and SpO2 (median, SD)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMiyamoto et al. (2020)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge, APACHE-II score, admission group, coma, infection, metabolic acidosis, use of sedatives and morphine, urea concentration, and urgent admission\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMa et al. 
(2024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e22\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge, dementia, SOFA, first care unit, infection, the maximum values of GCS, creatinine, calcium, sodium, heart rate, SBP, DBP, respiratory rate, temperature, the minimum values of hematocrit, platelets, MCHC, creatinine, glucose, potassium, DBP, respiratory rate\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eKo et al. (2024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAlbumin level, international normalized ratio, blood urea nitrogen, white blood cell count, C-reactive protein level, age, heart rate, and mechanical ventilation\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eKim et al. 
(2022)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eOld age, hospitalization through the emergency room, applying restraint, drainage tube, using benzodiazepines, and some types of opioid analgesics\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eKim et al. (2024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e20\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge, Sex, Alcohol Intake, National Institute of Health Stroke Scale (NIHSS), HbA1c, Prothrombin time, D-dimer, and Hemoglobin, Mean or Variability indexes calculated from Body Temperature (BT), Heart Rate (HR), Respiratory Rate (RR), Oxygen saturation (SpO2), Systolic Blood Pressure (SBP), and Diastolic Blood Pressure (DBP)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eHur et al. 
(2021)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e59\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge, sex, and invasive mechanical ventilation, medical ICU or surgical ICU, respiratory, cardiovascular, gastrointestinal, neurology, perioperative, nephrology, metabolic, and trauma, systolic blood pressure, diastolic blood pressure, heart rate, respiratory rate, peripheral capillary oxygen saturation, and Glasgow Coma Scale (eye, verbal, and motor), Charlson Comorbidity Index, white blood count, hemoglobin, hematocrit, platelet count, and erythrocyte sedimentation rate, prothrombin time (INR) and activated partial thromboplastin time, total protein, albumin, total bilirubin, aspartate aminotransferase, alanine aminotransferase, glucose fasting, blood urea nitrogen, creatinine, phosphorus, sodium, potassium, magnesium, calcium (ionized), C-reactive protein quantitative, and lactic acid, pH, PaCO\u003csub\u003e2\u003c/sub\u003e, PaO\u003csub\u003e2\u003c/sub\u003e, HCO\u003csub\u003e3\u003c/sub\u003e and O\u003csub\u003e2\u003c/sub\u003e Saturation\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eGreen et al. 
(2019)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e10/10/9/11\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePRE-DELIRIC: age, APACHE II score, coma (drug-induced or otherwise), patient classification (medical, surgical,\u003c/p\u003e\n \u003cp\u003etrauma, neurologic), presence of infection, metabolic acidosis, morphine dose, use of sedatives, urea concentration and emergency admission\u003c/p\u003e\n \u003cp\u003erecalibrated PRE-DELIRIC: age, APACHE II score, coma (drug-induced or otherwise), patient classification (medical, surgical, trauma, neurologic), presence of infection, metabolic acidosis, morphine dose, use of sedatives, urea concentration and emergency admission\u003c/p\u003e\n \u003cp\u003eE-PRE-DELIRIC: age, history of cognitive impairment, history of alcohol abuse, patient classification (medical, surgical,\u003c/p\u003e\n \u003cp\u003etrauma, neurologic), mean arterial pressure at ICU admission, use of corticosteroids, presence of respiratory failure, blood urea nitrogen at ICU admission and emergency admission\u003c/p\u003e\n \u003cp\u003eLanzhou: age, APACHE II score, mechanical ventilation, emergency surgery, coma, multiple trauma, metabolic acidosis, history of hypertension, history of delirium, history of dementia and use of dexmedetomidine\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eGong et al. 
(2023)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e20/20\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e24-h model: mean total Glasgow coma score, mean verbal Glasgow coma score, age in years, maximum Richmond Agitation-Sedation Scale, minimum Richmond Agitation-Sedation Scale, APACHE IV score, mean Richmond Agitation-Sedation Scale, 24-hour urine output, SOFA respiratory subscore, mean corpuscular volume labs ordered, neuro ICU type, H\u003csub\u003e2\u003c/sub\u003e blocker administrations, minimum temperature, maximum red cell distribution width, maximum temperature, medical-surgical ICU type, maximum potassium, mean blood urea nitrogen, minimum bicarbonate and bicarbonate labs ordered\u003c/p\u003e\n \u003cp\u003edynamic model: current ICU length of stay, last Richmond Agitation-Sedation Scale, last total Glasgow coma score, last verbal Glasgow coma score, APACHE IV score, hospital teaching status, age in years, last temperature, last motor Glasgow coma score, hospital size, given glucose elevating agents, medical-surgical ICU type, given H\u003csub\u003e2\u003c/sub\u003e blockers, given monoamine oxidase inhibitors, SOFA subscore: nervous, alcohol abuse, given tetracyclic antidepressants, current time of day, given glycopeptides and neurology-related diagnosis\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eGao W et al. (2022)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge, history of cognitive impairment, history of alcohol abuse, blood urea nitrogen, admission category, urgent admission, mean arterial blood pressure, use of corticosteroids and respiratory failure\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFan et al. 
(2019)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eHistory of chronic diseases, hearing deficits, infection, higher APACHE II scores, the use of sedatives and analgesics, indwelling catheter, and sleep disturbance\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eEsumi et al. (2025)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDaily urinary output, eosinophil count, age, basophil count, fibrinogen, dBP\u003csup\u003ea\u003c/sup\u003e, PO\u003csub\u003e2\u003c/sub\u003e, CPK\u003csup\u003eb\u003c/sup\u003e, LDH\u003csup\u003ec\u003c/sup\u003e, AST\u003csup\u003ed\u003c/sup\u003e, sBP\u003csup\u003ee\u003c/sup\u003e, glucose, height, ALP\u003csup\u003ef\u003c/sup\u003e and burn area\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCoombes et al. (2021)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e17\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMental status, deliri*, hallucin*, confus*, reorient*, urine culture, ABG\u003csup\u003ea\u003c/sup\u003e, renal function panel, CBC\u003csup\u003eb\u003c/sup\u003e, thyroid function test, toxicology screen, autoimmune serology, B vitamins, HIV antibody, antipsychotics, benzodiazepines and medetomidine\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCherak et al. 
(2020)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSex, age, admission type, APACHE II score, GCS, SOFA score, Charlson Comorbidity Index, pre-existing neuropsychiatric disorder, vasoactive medication use, required continuous renal replacement therapy, required invasive mechanical ventilation and constant\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eChen et al. (2017)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e11\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge, APACHE II score, mechanical ventilation, emergency operation, coma, multiple trauma, metabolic acidosis, history of hypertension, history of delirium, history of dementia and dexmedetomidine hydrochloride\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBhattachary-ya et al. (2022)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e21\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge, sex, height, weight, heart rate, oxygen saturation, glucose, temperature, serum sodium, BUN, WBC, hemoglobin, platelets, serum potassium, chloride, serum bicarbonate, serum creatinine, ventilation, total norepinephrine dose, SOFA and SOFA without GCS\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAnton et al. 
(2024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e10/9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePRE-DELIRIC: age, APACHE-II score, admission group, coma, infection, metabolic acidosis, use of sedatives and morphine, urea concentration, and urgent admission\u003c/p\u003e\n \u003cp\u003eE-PRE-DELIRIC: Age, history of cognitive impairment, history of alcohol abuse, blood urea nitrogen, admission category, urgent admission, mean arterial blood pressure, use of corticosteroids, and respiratory failure\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"3\"\u003e\n \u003cp\u003eICU, Intensive Care Unit; GCS, Glasgow Coma Scale; APACHE, acute physiology and chronic health evaluation; SOFA, Sequential Organ Failure Assessment; HR, heart rate, RR, respiratory rate; SpO2, Peripheral Oxygen Saturation; SBP, systolic blood pressure; DBP, diastolic blood pressure; pH, potential of hydrogen; PaCO\u003csub\u003e2\u003c/sub\u003e, arterial partial pressure of carbon dioxide; PaO\u003csub\u003e2\u003c/sub\u003e, arterial partial pressure of oxygen; HCO\u003csub\u003e3\u003c/sub\u003e, bicarbonate ion; CPK\u003csup\u003eb\u003c/sup\u003e, creatine phosphokinase; LDH\u003csup\u003ec\u003c/sup\u003e, lactate dehydrogenase; AST\u003csup\u003ed\u003c/sup\u003e, aspartate aminotransferase; ALP\u003csup\u003ef\u003c/sup\u003e, alkaline phosphatase; ABG, arterial blood gas; CBC, complete blood count; BUN, blood urea nitrogen; WBC, white blood cell.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\n \u003ch2\u003e3.6 Study quality\u003c/h2\u003e\n \u003cp\u003eThe results summarizing the risk of bias and applicability of the included models are presented in Table \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e and Supplementary 
Material Table \u003cspan class=\"InternalRef\"\u003eS1\u003c/span\u003e. In the applicability assessment, 15 models were rated as low risk, 6 as high risk, and 5 as unclear. Risk of bias was high across the included studies, driven primarily by critical methodological flaws in two core domains: Analysis and Predictors. Analysis domain: The most prevalent flaw was the failure to adequately account for overfitting and model optimism. Most studies relied solely on internal validation techniques\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e15\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e25\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e29\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e31\u003c/span\u003e\u0026ndash;\u003cspan class=\"CitationRef\"\u003e34\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e38\u003c/span\u003e\u0026ndash;\u003cspan class=\"CitationRef\"\u003e42\u003c/span\u003e]\u003c/sup\u003e. Without external validation, the reported performance metrics (e.g., high AUC) are likely optimistic, and the models\u0026apos; performance in new, independent populations remains unknown. 
Predictors domain: Because most studies were retrospective\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e15\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e25\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e29\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e31\u003c/span\u003e\u0026ndash;\u003cspan class=\"CitationRef\"\u003e35\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e37\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e39\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e40\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e42\u003c/span\u003e]\u003c/sup\u003e, predictors were not assessed blind to the outcome data. This introduces a risk of measurement bias, because data collection or extraction could be influenced by the known outcome status.\u003c/p\u003e\n \u003cp\u003eBeyond these core issues in the Analysis and Predictors domains, other methodological shortcomings were common: in the Participants domain, concerns regarding selection bias arose from inappropriate inclusion/exclusion criteria or the failure to include all enrolled participants in the analysis\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e15\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e25\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e26\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e29\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e37\u003c/span\u003e]\u003c/sup\u003e; within the Analysis domain, the handling of missing data was frequently poorly reported or inappropriate (often rated \u0026apos;PN\u0026apos; or \u0026apos;NI\u0026apos;), introducing potential model 
bias\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e27\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e30\u003c/span\u003e]\u003c/sup\u003e; furthermore, in the Predictors domain, the use of suboptimal variable selection methods based on univariable analysis in several studies threatened to produce unstable and overly optimistic models\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e27\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e29\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e33\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e38\u003c/span\u003e]\u003c/sup\u003e. The study\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e39\u003c/span\u003e]\u003c/sup\u003e by Coombes et al. represented an extreme case of high risk of bias due to a fundamental methodological flaw: using clinician actions (e.g., antipsychotic administration) to define the delirium outcome and then using those same actions as predictors. 
This circular reasoning resulted in a perfect but entirely meaningless statistical relationship, thereby invalidating its conclusions.\u003c/p\u003e\n \u003cp\u003eTable 4 ROB and clinical applicability of included studies.\u003c/p\u003e\n \u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"645\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"2\" style=\"width: 104px;\"\u003e\n \u003cp\u003eAuthors (year)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"4\" style=\"width: 240px;\"\u003e\n \u003cp\u003eROB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"3\" style=\"width: 176px;\"\u003e\n \u003cp\u003eApplicability\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"2\" style=\"width: 125px;\"\u003e\n \u003cp\u003eOverall\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 60px;\"\u003e\n \u003cp\u003eParticipants\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 60px;\"\u003e\n \u003cp\u003ePredictors\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 60px;\"\u003e\n \u003cp\u003eOutcome\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 60px;\"\u003e\n \u003cp\u003eAnalysis\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 60px;\"\u003e\n \u003cp\u003eParticipants\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 60px;\"\u003e\n \u003cp\u003ePredictors\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003eOutcome\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 65px;\"\u003e\n \u003cp\u003eROB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 60px;\"\u003e\n \u003cp\u003eApplicability\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eZhang et al. 
(2023)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eZhang et al. 
(2021)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eWu et al. 
(2025)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eWassenaar et al. 
(2015)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eWassenaar et al. 
(2018)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eWang et al. 
(2020)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eBoogaard et al. 
(2014)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eTang et al. 
(2024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eShi et al. 
(2022)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003ePark et al. 
(2025)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eMiyamoto et al. 
(2020)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eMa et al. 
(2024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eKo et al. 
(2024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eKim et al. 
(2022)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eKim et al. (2024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" 
style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eHur et al. (2021)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eGreen et al. 
(2019)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eGong et al. 
(2023)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eGao W et al. 
(2022)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eFan et al. 
(2019)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eEsumi et al. 
(2025)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eCoombes et al. 
(2021)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eCherak et al. 
(2020)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eChen et al. 
(2017)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eBhattacharyya et al. 
(2022)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eAnton et al. 
(2024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e?\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"10\" valign=\"top\" style=\"width: 645px;\"\u003e\n \u003cp\u003eROB, risk of bias; +, low risk of bias/low concern regarding applicability; \u0026minus;, high risk of bias/high concern regarding applicability; ?, unclear\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e"},{"header":"4. Discussion","content":"\u003cp\u003eThis systematic review identified, synthesized, and critically appraised 26 studies of risk prediction models for delirium in ICU patients. 
The findings illustrate a field in rapid evolution, increasingly leveraging machine learning (ML) techniques, yet also reveal profound methodological heterogeneity and widespread risks of bias that substantially limit the clinical applicability and generalizability of existing models.\u003c/p\u003e\u003cp\u003eThe most frequently identified predictors across models\u0026mdash;age, sedation, APACHE-II score, urgent admission, mechanical ventilation, and GCS\u0026mdash;are consistent with established clinical knowledge and pathophysiological understanding of delirium. These factors align with previously proposed mechanisms such as neuroinflammation, neurotransmitter dysregulation, and physiological stress, often exacerbated by critical illness and iatrogenic interventions\u003csup\u003e[\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e, \u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e]\u003c/sup\u003e. The predominance of logistic regression reflects its interpretability and ease of clinical implementation. However, the superior discrimination performance of certain ML models (e.g., XGBoost, RF, BiLSTM) in multiple studies\u003csup\u003e[\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e]\u003c/sup\u003e suggests that complex, non-linear relationships between predictors and delirium outcomes may be better captured by these algorithms. 
This is particularly relevant in critical care settings where patient data are high-dimensional and interactions between clinical variables are complex\u003csup\u003e[\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003eA central finding of this review is the high risk of bias pervasive across most studies, primarily driven by flaws in the analysis and predictors domains as assessed by PROBAST. The near-universal reliance on internal validation only\u0026mdash;often via simple random splitting\u0026mdash;fails to account for model optimism and overfitting. This renders reported performance metrics (e.g., AUC values upwards of 0.90) likely inflated and not replicable in external, real-world populations\u003csup\u003e[\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e, \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e]\u003c/sup\u003e. Only four models underwent both internal and external validation\u003csup\u003e[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e]\u003c/sup\u003e, a fundamental requirement for establishing generalizability\u003csup\u003e[\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003eFurthermore, the handling of missing data was frequently poorly documented or methodologically unsound (e.g., complete case analysis, mean imputation), potentially introducing selection bias and reducing model robustness\u003csup\u003e[\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e]\u003c/sup\u003e. 
Many studies also relied on univariable screening for predictor selection, a practice strongly discouraged in contemporary prediction modeling because it can yield unstable models that omit important multivariable relationships\u003csup\u003e[\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e]\u003c/sup\u003e. An extreme example of a methodological flaw is the study by Coombes et al., in which the outcome (delirium) was defined by clinician actions (e.g., antipsychotic use) that were also used as predictors\u003csup\u003e[\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]\u003c/sup\u003e. This circularity exemplifies how flawed design can yield statistically significant but clinically meaningless results.\u003c/p\u003e\u003cp\u003eVariability in delirium assessment methods (CAM-ICU, ICDSC, RASS) and frequency (ranging from once daily to every 8 hours) across studies introduces significant heterogeneity. This lack of standardization complicates the comparative evaluation of models and their translation into clinical practice. The CAM-ICU, while widely used, has shown variable sensitivity and specificity across different patient subgroups and clinical settings\u003csup\u003e[\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e]\u003c/sup\u003e. Inconsistent outcome measurement directly impacts model training and validation, potentially leading to misclassification bias and reduced accuracy\u003csup\u003e[\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003eDespite the proliferation of models, their integration into routine ICU workflow remains limited. 
Many models require predictors that are not routinely collected, are retrospectively derived, or are computationally intensive, creating barriers to real-time point-of-care use\u003csup\u003e[\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e]\u003c/sup\u003e. For instance, models incorporating complex ML algorithms or high-frequency physiological waveform data\u003csup\u003e[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]\u003c/sup\u003e may offer high accuracy but face practical challenges related to data integration, processing latency, and clinical interpretability. Moreover, most models are static, predicting delirium at a single time point (e.g., at 24 hours post-ICU admission). Delirium is a dynamic syndrome; therefore, models capable of providing updated predictions based on evolving clinical data\u0026mdash;such as the dynamic model developed by Gong et al.\u0026mdash;represent a promising direction for future research\u003csup\u003e[\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003eThe methodological quality of prediction studies remains concerning despite the availability of reporting guidelines such as TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis)\u003csup\u003e[\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e]\u003c/sup\u003e. Our analysis reveals inconsistent adherence to these standards, particularly in critical areas such as sample size justification, handling of missing data, and model performance reporting. Only 35% of studies provided a sample size calculation or justification, raising concerns about potential overfitting, especially in models with large numbers of predictors. 
This is particularly problematic for machine learning approaches, which typically require larger sample sizes to achieve stable performance\u003csup\u003e[\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e]\u003c/sup\u003e. The reporting of model performance measures also showed substantial variability. While most studies reported discrimination metrics (typically AUC), fewer reported calibration measures such as calibration plots or slopes. This omission is significant because a model with good discrimination can still produce poorly calibrated predictions that may mislead clinical decision-making\u003csup\u003e[\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e]\u003c/sup\u003e. Recent guidelines emphasize the importance of reporting both discrimination and calibration metrics, along with clinical utility measures such as decision curve analysis\u003csup\u003e[\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003eThe variable selection approaches used across studies ranged from hypothesis-driven selection based on clinical knowledge to purely data-driven approaches using machine learning algorithms. While data-driven approaches can identify novel predictors, they risk including clinically implausible or spurious associations, especially when applied to small datasets\u003csup\u003e[\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e]\u003c/sup\u003e. Several studies attempted to balance these approaches by combining clinical expertise with statistical selection methods, but the optimal strategy remains uncertain. The clinical interpretability of models varied considerably. Traditional regression models typically produced odds ratios or risk scores that are intuitively understandable to clinicians. 
In contrast, many machine learning models function as \"black boxes,\" providing accurate predictions but limited insight into the underlying reasoning\u003csup\u003e[\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e]\u003c/sup\u003e. This represents a significant barrier to clinical adoption, as clinicians may be reluctant to trust predictions without understanding their basis\u003csup\u003e[\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e]\u003c/sup\u003e. Some studies addressed this limitation through feature importance analysis or model simplification, but more work is needed to enhance the interpretability of complex models.\u003c/p\u003e\u003cp\u003eThe transition from prediction model development to clinical implementation faces numerous challenges. First, many models require data elements that are not routinely collected or documented in structured formats in electronic health records\u003csup\u003e[\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e]\u003c/sup\u003e. This creates additional documentation burdens that may limit practical implementation. Second, the computational requirements of some complex models may exceed the capabilities of existing clinical infrastructure, particularly in resource-limited settings\u003csup\u003e[\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e]\u003c/sup\u003e. The timing of prediction also presents implementation challenges. Most models provide a single prediction at ICU admission, but delirium risk evolves throughout the ICU stay as clinical conditions change. Dynamic models that update predictions based on new information may be more clinically useful but also more complex to implement\u003csup\u003e[\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]\u003c/sup\u003e. Additionally, the optimal method for presenting risk predictions to clinicians remains uncertain. 
Should predictions be presented as numerical probabilities, risk categories, or specific recommendations? How should uncertainty be communicated? These questions require further investigation through implementation studies\u003csup\u003e[\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003eThe implementation of prediction models raises several ethical considerations. First, there is a risk of algorithmic bias if models perform differently across patient subgroups based on age, gender, race, or socioeconomic status\u003csup\u003e[\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e]\u003c/sup\u003e. Few studies reported testing for such differential performance, highlighting an important gap in the current literature. Second, there is a concern about alert fatigue if models generate excessive false positive predictions\u003csup\u003e[\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e]\u003c/sup\u003e. This is particularly relevant for delirium prediction, given the high prevalence of the condition in ICU settings. Additionally, the use of prediction models could potentially change clinical behavior in unintended ways. For example, clinicians might become over-reliant on model predictions or develop negative expectations about patients predicted to develop delirium\u003csup\u003e[\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e]\u003c/sup\u003e. These potential unintended consequences underscore the importance of studying not only the accuracy of prediction models but also their impact on clinical processes and patient outcomes.\u003c/p\u003e\u003cp\u003eBased on our findings, we recommend several priorities for future research. First, there is a critical need for external validation studies of existing models across diverse populations and settings. 
Rather than developing new models, researchers should prioritize validating and potentially refining existing models. Second, future studies should adhere to established reporting guidelines such as TRIPOD and PROBAST to enhance methodological quality and transparency\u003csup\u003e[\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e, \u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e]\u003c/sup\u003e. Third, researchers should explore the integration of novel data sources, such as electronic health record data, physiological waveforms, and potentially biomarkers, while balancing predictive accuracy with clinical feasibility\u003csup\u003e[\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e]\u003c/sup\u003e. Fourth, more attention should be paid to the implementation aspects of prediction models, including development of user-friendly interfaces, integration into clinical workflows, and assessment of impact on patient outcomes\u003csup\u003e[\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e]\u003c/sup\u003e. Finally, there is a need for randomized trials evaluating the impact of prediction model use on clinical processes and patient outcomes. Such studies would provide crucial evidence about whether these models actually improve care rather than simply providing accurate predictions.\u003c/p\u003e\u003cp\u003eSeveral limitations of this systematic review should be acknowledged, as they constrain the interpretation of the results. First, restricting this review to English-language publications is a potential source of language bias, as pertinent studies published in other languages were not considered. 
Secondly, the substantial methodological and clinical heterogeneity among the included studies, particularly in terms of participant characteristics, predictor selection, delirium assessment methods, and modeling approaches, precluded a quantitative meta-analysis of model performance metrics. Thirdly, the quality assessment relied solely on information reported in the published articles; incomplete reporting may have led to an underestimation of both methodological strengths and weaknesses in some studies. Finally, while comprehensive search strategies were employed, it is possible that some relevant studies were not identified through database searching alone.\u003c/p\u003e"},{"header":"5. Conclusions","content":"\u003cp\u003eIn conclusion, this systematic review demonstrates that while the field of ICU delirium prediction is advancing with increasingly sophisticated modeling approaches, particularly machine learning, the clinical applicability of existing models remains limited. Fundamental methodological shortcomings\u0026mdash;including widespread overfitting, inadequate handling of missing data, insufficient external validation, and heterogeneous outcome assessment\u0026mdash;substantially constrain the generalizability and real-world utility of these models. 
Future research should prioritize methodological rigor, robust external validation in diverse populations, and implementation studies that assess not only predictive accuracy but also clinical impact and workflow integration before these tools can be recommended for routine clinical use.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eEthics approval and consent to participate\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003eConsent for publication\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003eAvailability of data and materials\u003c/p\u003e\n\u003cp\u003eData and materials are provided in the manuscript or the supplementary information file.\u003c/p\u003e\n\u003cp\u003eCompeting interests\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e\n\u003cp\u003eFunding\u003c/p\u003e\n\u003cp\u003eNo Funding.\u003c/p\u003e\n\u003cp\u003eAuthors' contributions\u003c/p\u003e\n\u003cp\u003eWen-Hua Chen and Lei Ding drafted the main manuscript text; Yue Sha and gongqian lu performed data extraction; Kaimin Qian and Bin Wang analyzed the data; Huiling Wang was responsible for the design and oversight of the study, and critically revised the manuscript. All authors approved the final submitted version.\u003c/p\u003e\n\u003cp\u003eAcknowledgements\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003eClinical trial number\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003ePark C, et al. Development and validation of a machine learning model for early prediction of delirium in intensive care units using continuous physiological data: retrospective study. J Med Internet Res. 2025;27:e59520. 10.2196/59520.\u003c/li\u003e\n\u003cli\u003eTang D, Ma C, Xu Y. 
Interpretable machine learning model for early prediction of delirium in elderly patients following intensive care unit admission: a derivation and validation study. Front Med (Lausanne). 2024;11:1399848. 10.3389/fmed.2024.1399848.\u003c/li\u003e\n\u003cli\u003eStollings JL, et al. Delirium in critical illness: clinical manifestations, outcomes, and management. Intensive Care Med. 2021;47(10):1089-1103. 10.1007/s00134-021-06503-1.\u003c/li\u003e\n\u003cli\u003eGong KD, et al. Predicting Intensive Care Delirium with Machine Learning: Model Development and External Validation. Anesthesiology. 2023;138(3):299-311. 10.1097/ALN.0000000000004478.\u003c/li\u003e\n\u003cli\u003eChen TJ, et al. Diagnostic accuracy of the CAM-ICU and ICDSC in detecting intensive care unit delirium: A bivariate meta-analysis. Int J Nurs Stud. 2021;113:103782. 10.1016/j.ijnurstu.2020.103782.\u003c/li\u003e\n\u003cli\u003eKotfis K, et al. The future of intensive care: delirium should no longer be an issue. Crit Care. 2022;26(1):200. 10.1186/s13054-022-04077-y.\u003c/li\u003e\n\u003cli\u003evan den Boogaard M, et al. Incidence and short-term consequences of delirium in critically ill patients: A prospective observational cohort study. Int J Nurs Stud. 2012;49(7):775-83. 10.1016/j.ijnurstu.2011.11.016.\u003c/li\u003e\n\u003cli\u003eKo RE, et al. Association between the presence of delirium during intensive care unit admission and cognitive impairment or psychiatric problems: the Korean ICU National Data Study. J Intensive Care. 2022;10(1):7. 10.1186/s40560-022-00598-4.\u003c/li\u003e\n\u003cli\u003eHofhuis JGM, Schermer T, Spronk PE. Mental health-related quality of life is related to delirium in intensive care patients. Intensive Care Med. 2022;48(9):1197-1205. 10.1007/s00134-022-06841-8.\u003c/li\u003e\n\u003cli\u003eWilcox ME, Girard TD, Hough CL. Delirium and long term cognition in critically ill patients. BMJ. 2021;373:n1007. 10.1136/bmj.n1007.\u003c/li\u003e\n\u003cli\u003eDevlin JW, et al. 
Clinical Practice Guidelines for the Prevention and Management of Pain, Agitation/Sedation, Delirium, Immobility, and Sleep Disruption in Adult Patients in the ICU. Crit Care Med. 2018;46(9):e825-e873. 10.1097/CCM.0000000000003299.\u003c/li\u003e\n\u003cli\u003eKinchin I, et al. The economic cost of delirium: A systematic review and quality assessment. Alzheimers Dement. 2021;17(6):1026-1041. 10.1002/alz.12262.\u003c/li\u003e\n\u003cli\u003eAnton Joseph N, et al. Validation of PRE-DELIRIC and E-PRE-DELIRIC in a Danish population of intensive care unit patients-A prospective observational multicenter study. Acta Anaesthesiol Scand. 2024;68(3):385-393. 10.1111/aas.14363.\u003c/li\u003e\n\u003cli\u003eWassenaar A, et al. Multinational development and validation of an early prediction model for delirium in ICU patients. Intensive Care Med. 2015;41(6):1048-56. 10.1007/s00134-015-3777-2.\u003c/li\u003e\n\u003cli\u003eEsumi R, et al. Machine Learning-Based Prediction of Delirium and Risk Factor Identification in Intensive Care Unit Patients With Burns: Retrospective Observational Study. JMIR Form Res. 2025;9:e65190. 10.2196/65190.\u003c/li\u003e\n\u003cli\u003ePage MJ, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. 10.1136/bmj.n71.\u003c/li\u003e\n\u003cli\u003eMoons KG, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. 2014;11(10):e1001744. 10.1371/journal.pmed.1001744.\u003c/li\u003e\n\u003cli\u003eMoons KGM, et al. PROBAST: A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies: Explanation and Elaboration. Ann Intern Med. 2019;170(1):W1-W33. 10.7326/M18-1377.\u003c/li\u003e\n\u003cli\u003eWolff RF, et al. PROBAST: A Tool to Assess the Risk of Bias and Applicability of Prediction Model Studies. Ann Intern Med. 2019;170(1):51-58. 10.7326/M18-1376.\u003c/li\u003e\n\u003cli\u003eAbe T, et al. 
Development of risk prediction models for incident frailty and their performance evaluation. Prev Med. 2021;153:106768. 10.1016/j.ypmed.2021.106768.\u003c/li\u003e\n\u003cli\u003eLi Q, et al. Risk factors and a nomogram for frailty in Chinese older patients with Alzheimer\u0026apos;s disease: A single-center cross-sectional study. Geriatr Nurs. 2022;47:47-54. 10.1016/j.gerinurse.2022.06.012.\u003c/li\u003e\n\u003cli\u003eLiu Q, et al. Development and validation of a preliminary clinical support system for measuring the probability of incident 2-year (pre)frailty among community-dwelling older adults: A prospective cohort study. Int J Med Inform. 2023;177:105138. 10.1016/j.ijmedinf.2023.105138.\u003c/li\u003e\n\u003cli\u003eZhang Y, et al. Development of a machine learning-based prediction model for sepsis-associated delirium in the intensive care unit. Sci Rep. 2023;13(1):12697. 10.1038/s41598-023-38650-4.\u003c/li\u003e\n\u003cli\u003eZhang H, et al. Development and validation of a predictive score for ICU delirium in critically ill patients. BMC Anesthesiol. 2021;21(1):37. 10.1186/s12871-021-01259-z.\u003c/li\u003e\n\u003cli\u003eWu ZB, et al. Enhanced machine learning predictive modeling for delirium in elderly ICU patients with COPD and respiratory failure: A retrospective study based on MIMIC-IV. PLoS One. 2025;20(3):e0319297. 10.1371/journal.pone.0319297.\u003c/li\u003e\n\u003cli\u003eWassenaar A, et al. Delirium prediction in the intensive care unit: comparison of two delirium prediction models. Crit Care. 2018;22(1):114. 10.1186/s13054-018-2037-6.\u003c/li\u003e\n\u003cli\u003eWang J, et al. Establishment and validation of a delirium prediction model for neurosurgery patients in intensive care. Int J Nurs Pract. 2020;26(4):e12818. 10.1111/ijn.12818.\u003c/li\u003e\n\u003cli\u003evan den Boogaard M, et al. Recalibration of the delirium prediction model for ICU patients (PRE-DELIRIC): a multinational observational study. Intensive Care Med. 2014;40(3):361-9. 
10.1007/s00134-013-3202-7.\u003c/li\u003e\n\u003cli\u003eShi Y, et al. Nomogram Models for Predicting Delirium of Patients in Emergency Intensive Care Unit: A Retrospective Cohort Study. Int J Gen Med. 2022;15:4259-4272. 10.2147/IJGM.S353318.\u003c/li\u003e\n\u003cli\u003eMiyamoto K, et al. Utility of a prediction model for delirium in intensive care unit patients (PRE-DELIRIC) in mechanically ventilated patients with sepsis. Acute Med Surg. 2020;7(1):e589. 10.1002/ams2.589.\u003c/li\u003e\n\u003cli\u003eMa R, et al. Machine learning for the prediction of delirium in elderly intensive care unit patients. Eur Geriatr Med. 2024;15(5):1393-1403. 10.1007/s41999-024-01012-y.\u003c/li\u003e\n\u003cli\u003eKo RE, et al. Machine learning methods for developing a predictive model of the incidence of delirium in cardiac intensive care units. Rev Esp Cardiol (Engl Ed). 2024;77(7):547-555. 10.1016/j.rec.2023.12.007.\u003c/li\u003e\n\u003cli\u003eKim MK, et al. Development and Validation of Simplified Delirium Prediction Model in Intensive Care Unit. Front Psychiatry. 2022;13:886186. 10.3389/fpsyt.2022.886186.\u003c/li\u003e\n\u003cli\u003eKim H, et al. Prediction of delirium occurrence using machine learning in acute stroke patients in intensive care unit. Front Neurosci. 2025;18:1425562. 10.3389/fnins.2024.1425562.\u003c/li\u003e\n\u003cli\u003eHur S, et al. A Machine Learning-Based Algorithm for the Prediction of Intensive Care Unit Delirium (PRIDE): Retrospective Study. JMIR Med Inform. 2021;9(7):e23401. 10.2196/23401.\u003c/li\u003e\n\u003cli\u003eGreen C, et al. Prediction of ICU Delirium: Validation of Current Delirium Predictive Models in Routine Clinical Practice. Crit Care Med. 2019;47(3):428-435. 10.1097/CCM.0000000000003577.\u003c/li\u003e\n\u003cli\u003eGao W, Zhang Y, Jin J. Validation of E-PRE-DELIRIC in cardiac surgical ICU delirium: A retrospective cohort study. Nurs Crit Care. 2022;27(2):233-239. 10.1111/nicc.12674.\u003c/li\u003e\n\u003cli\u003eFan H, et al. 
Development and validation of a dynamic delirium prediction rule in patients admitted to the Intensive Care Units (DYNAMIC-ICU): A prospective cohort study. Int J Nurs Stud. 2019;93:64-73. 10.1016/j.ijnurstu.2018.10.008.\u003c/li\u003e\n\u003cli\u003eCoombes CE, Coombes KR, Fareed N. A novel model to label delirium in an intensive care unit from clinician actions. BMC Med Inform Decis Mak. 2021;21(1):97. 10.1186/s12911-021-01461-6.\u003c/li\u003e\n\u003cli\u003eCherak SJ, et al. Development and validation of delirium prediction model for critically ill adults parameterized to ICU admission acuity. PLoS One. 2020;15(8):e0237639. 10.1371/journal.pone.0237639.\u003c/li\u003e\n\u003cli\u003eChen Y, et al. Development and validation of risk-stratification delirium prediction model for critically ill patients: A prospective, observational, single-center study. Medicine (Baltimore). 2017;96(29):e7543. 10.1097/MD.0000000000007543.\u003c/li\u003e\n\u003cli\u003eBhattacharyya A, et al. Delirium prediction in the ICU: designing a screening tool for preventive interventions. JAMIA Open. 2022;5(2):ooac048. 10.1093/jamiaopen/ooac048.\u003c/li\u003e\n\u003cli\u003eGirard TD, et al. Clinical phenotypes of delirium during critical illness and severity of subsequent long-term cognitive impairment: a prospective cohort study. Lancet Respir Med. 2018;6(3):213-222. 10.1016/S2213-2600(18)30062-6.\u003c/li\u003e\n\u003cli\u003eWilson JE, et al. Delirium. Nat Rev Dis Primers. 2020;6(1):90. 10.1038/s41572-020-00223-4.\u003c/li\u003e\n\u003cli\u003eSauerbrei W, et al. State of the art in selection of variables and functional forms in multivariable analysis-outstanding issues. Diagn Progn Res. 2020;4:3. 10.1186/s41512-020-00074-3.\u003c/li\u003e\n\u003cli\u003eSteyerberg EW, Vergouwe Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J. 2014;35(29):1925-31. 10.1093/eurheartj/ehu207.\u003c/li\u003e\n\u003cli\u003eCollins GS, et al. 
Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ. 2015;350:g7594. 10.1136/bmj.g7594.\u003c/li\u003e\n\u003cli\u003eSterne JA, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009;338:b2393. 10.1136/bmj.b2393.\u003c/li\u003e\n\u003cli\u003eGusmao-Flores D, et al. The confusion assessment method for the intensive care unit (CAM-ICU) and intensive care delirium screening checklist (ICDSC) for the diagnosis of delirium: a systematic review and meta-analysis of clinical studies. Crit Care. 2012;16(4):R115. 10.1186/cc11407.\u003c/li\u003e\n\u003cli\u003eWong A, et al. External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients. JAMA Intern Med. 2021;181(8):1065-1070. 10.1001/jamainternmed.2021.2626. \u003c/li\u003e\n\u003cli\u003eWiens J, et al. Do no harm: a roadmap for responsible machine learning for health care. Nat Med. 2019;25(9):1337-1340. 10.1038/s41591-019-0548-6.\u003c/li\u003e\n\u003cli\u003eMarra A, et al. The ABCDEF Bundle in Critical Care. Crit Care Clin. 2017;33(2):225-243. 10.1016/j.ccc.2016.12.005.\u003c/li\u003e\n\u003cli\u003eKappen TH, et al. Evaluating the impact of prediction models: lessons learned, challenges, and recommendations. Diagn Progn Res. 2018;2:11. 10.1186/s41512-018-0033-6.\u003c/li\u003e\n\u003cli\u003eObermeyer Z, et al. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447-453. 10.1126/science.aax2342.\u003c/li\u003e\n\u003cli\u003eAncker JS, et al. Effects of workload, work complexity, and repeated alerts on alert fatigue in a clinical decision support system. BMC Med Inform Decis Mak. 2017;17(1):36. 10.1186/s12911-017-0430-8.\u003c/li\u003e\n\u003cli\u003eCabitza F, Rasoini R, Gensini GF. Unintended Consequences of Machine Learning in Medicine. JAMA. 2017;318(6):517-518. 
10.1001/jama.2017.7797.\u003c/li\u003e\n\u003cli\u003eDebray TP, et al. A framework for developing, implementing, and evaluating clinical prediction models in an individual participant data meta-analysis. Stat Med. 2013;32(18):3158-80. 10.1002/sim.5732.\u003c/li\u003e\n\u003cli\u003eGreenhalgh T, et al. Beyond Adoption: A New Framework for Theorizing and Evaluating Nonadoption, Abandonment, and Challenges to the Scale-Up, Spread, and Sustainability of Health and Care Technologies. J Med Internet Res. 2017;19(11):e367. 10.2196/jmir.8775.\u003c/li\u003e\n\u003cli\u003eSteyerberg EW, et al. Prognosis Research Strategy (PROGRESS) 3: prognostic model research. PLoS Med. 2013;10(2):e1001381. 10.1371/journal.pmed.1001381.\u003c/li\u003e\n\u003cli\u003eCollins GS, Moons KGM. Reporting of artificial intelligence prediction models. Lancet. 2019;393(10181):1577-1579. 10.1016/S0140-6736(19)30037-6.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"bmc-anesthesiology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"bane","sideBox":"Learn more about [BMC Anesthesiology](http://bmcanesthesiol.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/bane","title":"BMC Anesthesiology","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Delirium, Intensive Care Units, Prediction Models, Systematic review","lastPublishedDoi":"10.21203/rs.3.rs-7799974/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7799974/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThis systematic review critically appraises 26 studies on risk prediction models for delirium in ICU patients. Despite the development of 25 distinct models incorporating common predictors like age, sedation use, and APACHE-II scores, and demonstrating apparently strong discriminatory performance, most models exhibited significant methodological limitations. These included widespread overfitting, inadequate handling of missing data, predominant reliance on internal validation only, and heterogeneous outcome assessment. Only four models underwent robust external validation. The findings indicate that while machine learning approaches like XGBoost show promise, fundamental methodological shortcomings substantially limit the clinical applicability and generalizability of existing prediction tools. 
Future research must prioritize methodological rigor, external validation in diverse populations, and implementation studies to assess real-world clinical impact before these models can be recommended for routine use.\u003c/p\u003e","manuscriptTitle":"Risk prediction models for delirium in ICU patients: A systematic review and critical appraisal","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-11-07 07:03:53","doi":"10.21203/rs.3.rs-7799974/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2025-11-19T04:25:25+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-11-17T19:42:40+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-11-16T05:03:43+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-11-14T10:34:46+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"237945280958859049878252860129553024230","date":"2025-11-04T13:27:00+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"329558043969676616923173693002114410155","date":"2025-10-31T17:36:48+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"143895000492957830245451837653590687321","date":"2025-10-28T18:02:14+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"213093515222614252841388641855267328931","date":"2025-10-28T16:58:03+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-10-28T16:16:06+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-10-28T16:12:01+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-10-27T14:59:38+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-10-23T11:04:45+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC 
Anesthesiology","date":"2025-10-23T11:01:13+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"bmc-anesthesiology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"bane","sideBox":"Learn more about [BMC Anesthesiology](http://bmcanesthesiol.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/bane","title":"BMC Anesthesiology","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"823db294-4f0f-4e35-b091-2eceee96f2c3","owner":[],"postedDate":"November 7th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2025-12-15T16:02:42+00:00","versionOfRecord":{"articleIdentity":"rs-7799974","link":"https://doi.org/10.1186/s12871-025-03562-5","journal":{"identity":"bmc-anesthesiology","isVorOnly":false,"title":"BMC Anesthesiology"},"publishedOn":"2025-12-13 15:58:04","publishedOnDateReadable":"December 13th, 2025"},"versionCreatedAt":"2025-11-07 07:03:53","video":"","vorDoi":"10.1186/s12871-025-03562-5","vorDoiUrl":"https://doi.org/10.1186/s12871-025-03562-5","workflowStages":[]},"version":"v1","identity":"rs-7799974","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7799974","identity":"rs-7799974","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}