Machine Learning Prediction of Surgical Site Infections Following Major Gastrointestinal Surgery: A Comprehensive Model Development and Validation Study in Yemeni Patients

doi:10.21203/rs.3.rs-8232610/v1

2025 · doi:10.21203/rs.3.rs-8232610/v1

Research Article

Saif Ghabisha, Qasem Alyhari, Ahmed Ateik, Saleh Al-wageeh, Faisal Ahmed, and 4 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8232610/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 16 Mar, 2026; the published version appears in Patient Safety in Surgery.
Version 1 posted 9

Abstract

Background

Surgical site infections (SSIs) continue to exert a substantial burden on healthcare systems, particularly in resource-limited settings where they contribute to prolonged hospitalizations, escalated costs, and increased patient morbidity.
The ability to accurately predict SSI risk is essential for implementing targeted prevention strategies and optimizing resource allocation, especially in constrained environments.

Methods

We conducted a retrospective cohort study utilizing data from 525 patients who underwent major gastrointestinal surgery at Ibb University-affiliated hospitals in Yemen between 2018 and 2023. Four machine learning models (Logistic Regression, Random Forest, XGBoost, and Neural Network) were developed using 38 preoperative and intraoperative variables. Temporal validation was performed, with data from 2018–2022 used for model training (n = 420) and 2023 data (n = 105) reserved for testing. Model performance was evaluated by area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), calibration metrics, and decision curve analysis. Subgroup analyses assessed model fairness across demographic and clinical strata.

Results

The observed SSI rate was 16.2%, consistent across both training and test sets. XGBoost achieved the highest predictive performance (AUROC: 0.934; 95% CI: 0.891–0.967; AUPRC: 0.809), outperforming logistic regression (AUROC: 0.868, p = 0.012) and neural network (AUROC: 0.890, p = 0.038) models. Random Forest also demonstrated competitive accuracy (AUROC: 0.924; AUPRC: 0.787). Robust performance was maintained across critical subgroups, with XGBoost yielding an AUROC of 0.967 among elderly patients and Random Forest achieving an AUROC of 0.979 among diabetic patients. All models systematically overestimated SSI risk (calibration slopes > 2.0), though XGBoost exhibited the best calibration (Brier score: 0.080). Decision curve analysis confirmed clinical utility within probability thresholds of 15–35%.

Conclusion

Machine learning models, specifically XGBoost and Random Forest, can accurately predict SSI risk following major gastrointestinal surgery in the Yemeni healthcare context.
Despite calibration limitations, these models demonstrate strong discriminative ability and clinical utility, supporting their use for risk stratification in resource-limited settings. The development of a simplified risk score offers a pragmatic alternative for implementation in environments with limited technological infrastructure.

Keywords: Machine Learning, Surgical Site Infections, Gastrointestinal Surgery

Introduction

Surgical site infections (SSIs) persist as a critical challenge in contemporary surgical care, particularly within the domain of gastrointestinal surgery. SSIs account for a considerable proportion of postoperative complications, affecting approximately 2–5% of patients undergoing inpatient surgical procedures and resulting in significant clinical and economic burdens worldwide (1–3). The incidence of SSIs following gastrointestinal surgery is notably higher, with reported rates ranging from 10% to 25%, leading to prolonged hospital stays, increased readmission rates, potential reoperations, and substantial patient morbidity (4).

From an economic perspective, SSIs exert a considerable strain on healthcare systems. In the United States, the additional costs associated with SSIs have been estimated at approximately $20,000 per patient, amounting to billions of dollars in annual expenditures (5, 6). Beyond the financial implications, SSIs adversely impact patient quality of life, engendering longer recovery periods, psychological distress, and, in severe cases, increased mortality.

Traditional approaches to SSI risk stratification have relied predominantly on conventional statistical methodologies, such as logistic regression and composite clinical risk scores.
While these models, including the American Society of Anesthesiologists (ASA) classification and the National Healthcare Safety Network (NHSN) risk index, are widely used in clinical practice, they suffer from limited discriminative capability. Typical AUROC values for these models range from 0.60 to 0.70, reflecting suboptimal predictive accuracy and an inability to capture complex, non-linear relationships among patient demographics, comorbidities, intraoperative variables, and infection risk (7, 8).

The advent of machine learning (ML) in healthcare has catalyzed a paradigm shift in predictive modeling. ML algorithms, particularly ensemble methods such as random forests and gradient boosting machines, as well as deep learning approaches, have demonstrated superior performance in a variety of medical prediction tasks due to their capacity to identify intricate patterns in high-dimensional clinical data (9). Despite this promise, the application of advanced machine learning models to SSI prediction in gastrointestinal surgery remains relatively underexplored.

Crucially, predictive accuracy alone is insufficient for the successful clinical translation of ML-based risk models. To engender trust and facilitate adoption among clinicians, models must also demonstrate clinical utility, fairness across diverse patient subgroups, and interpretability (10, 11). The practical value of a predictive system hinges on its ability to inform actionable decision-making, equitably benefit all patient populations, and provide transparent reasoning for its outputs.

This study aims to address these critical gaps by systematically developing, validating, and comparing multiple machine learning algorithms for SSI prediction following gastrointestinal surgery.
Using a rigorous comparative framework, we evaluate traditional and advanced ML models not only in terms of predictive performance but also clinical utility (via decision curve analysis), fairness (through subgroup analyses), and interpretability (leveraging SHapley Additive exPlanations, SHAP). By focusing on these multidimensional aspects, we endeavor to bridge the chasm between predictive excellence and practical implementation in surgical care.

The primary objectives of this investigation are threefold: (1) to develop and benchmark multiple ML models for SSI prediction post-gastrointestinal surgery; (2) to assess the clinical utility and fairness of the top-performing model across relevant patient subgroups; and (3) to enhance interpretability through advanced feature importance techniques, thereby supporting integration of predictive analytics into routine surgical workflows for targeted SSI prevention.

Patients and Methods

Study Setting and Population

This retrospective cohort study was conducted at Ibb University-affiliated hospitals in Yemen, focusing exclusively on patients who underwent major gastrointestinal surgery between January 2018 and December 2023. The participating hospitals serve a large and diverse catchment area in central Yemen, reflecting the patient and procedural diversity characteristic of resource-limited settings. The initial study cohort included 580 consecutive adults (aged 18 years or older) who underwent elective or emergency major gastrointestinal operations, including procedures involving the stomach, small and large intestines, hepatobiliary system, and colorectal region. Patients were excluded if they lacked complete postoperative follow-up data for at least 30 days, had missing critical predictor variables, or were lost to follow-up, yielding a final analytic sample of 525 patients (Supplementary Figure S1).
All surgeries were performed by attending general surgeons trained in gastrointestinal surgery, ensuring a consistent standard of surgical care across the study period.

Study Design and Ethical Considerations

This study employed a retrospective cohort design, leveraging routinely collected clinical data. Ethical approval was obtained from the Ibb University Faculty of Medicine Institutional Review Board, with a waiver of informed consent in accordance with the retrospective nature of the analysis. The investigation adhered to the principles of the Declaration of Helsinki and followed TRIPOD guidelines for transparent reporting of multivariable prediction model development and validation.

Data Sources and Quality Assurance

Data extraction was performed from the hospitals' archived health records, integrating information from preoperative assessment forms, anesthesia records, operative reports, and postoperative clinical documentation. A standardized data abstraction protocol was developed and implemented by two trained research assistants. Data completeness and accuracy were monitored through periodic audits conducted by the principal investigator. To ensure inter-rater reliability, a random subset of records was independently reviewed, and Cohen's kappa statistic was calculated, demonstrating substantial agreement for the primary outcome and key predictors (κ = 0.82).

Outcome Definition

The primary outcome was the occurrence of surgical site infection (SSI) within 30 days of gastrointestinal surgery. SSI was defined according to the Centers for Disease Control and Prevention National Healthcare Safety Network criteria: (a) purulent drainage from the incision, (b) positive microbial cultures obtained from the wound, (c) clinical signs of infection (erythema, warmth, tenderness, or induration), or (d) surgeon diagnosis requiring therapeutic intervention.
Two independent surgical consultants adjudicated all outcomes, with disagreements resolved through consensus discussion with a third senior surgeon.

Predictor Variables and Feature Engineering

A total of 38 preoperative and intraoperative variables were selected for model development, based on clinical relevance and evidence from prior literature (Supplementary Table S2). Demographic variables included age (both continuous and dichotomized at ≥ 60 years), gender (coded as Gender_0 for female and Gender_1 for male), and body mass index (BMI, categorized per WHO guidelines). Comorbidities comprised diabetes mellitus, hypertension, chronic renal failure, chronic liver disease, pulmonary disease, and current smoking status. Procedural variables encompassed the type of gastrointestinal surgery (classified by anatomical site and complexity), wound contamination class (clean, clean-contaminated, contaminated, dirty), anesthesia type, surgical urgency (elective, urgent, emergency), and estimated intraoperative blood loss. Perioperative measurements included operative duration (minutes), preoperative leukocyte count, and temperature at the conclusion of the procedure. Risk assessment tools such as the American Society of Anesthesiologists (ASA) physical status and the National Nosocomial Infections Surveillance (NNIS) risk index were also included. Continuous predictors were retained in their natural scale for tree-based models (Random Forest, XGBoost) and standardized (z-score normalization) for logistic regression and neural network models. Categorical variables underwent one-hot encoding to facilitate algorithm compatibility. Missing data, present in less than 3% of all variables, were imputed using multivariate imputation by chained equations, ensuring that no predictor with > 10% missingness was used.

Model Development

Data Partitioning and Temporal Validation

The dataset was split temporally to mimic real-world clinical implementation and to guard against overly optimistic performance estimates.
Data from 2018 to 2022 (n = 420) served as the development (training and internal validation) set; data from 2023 (n = 105) formed the held-out temporal test set.

Machine Learning Algorithms

Four supervised machine learning algorithms were employed:

Logistic Regression (LR): A baseline interpretable model using regularized logistic regression with L2 penalty.
Random Forest (RF): An ensemble of decision trees constructed using bootstrap aggregation and random feature selection at each split.
Extreme Gradient Boosting (XGBoost): A highly efficient implementation of gradient boosting decision trees, optimized for tabular data.
Feed-Forward Neural Network (FNN): A multi-layer perceptron with two hidden layers, using ReLU activation and dropout regularization (dropout rate = 0.2).

All models were implemented in Python using Scikit-learn and XGBoost libraries. Hyperparameters were optimized using five-fold cross-validation on the 2018–2022 data, with performance assessed by mean area under the receiver operating characteristic curve (AUROC). Grid search and random search strategies were employed for hyperparameter tuning (Supplementary Table S3).

Model Evaluation Metrics

Performance was comprehensively evaluated using the following metrics:

Discrimination: Area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC).
Calibration: Brier score, calibration slope, and calibration plots comparing predicted probabilities with observed SSI rates.
Clinical Utility: Decision curve analysis (DCA) was conducted to estimate net benefit across clinically plausible risk thresholds (15–35%).
Subgroup Analysis: Model performance was assessed within predefined demographic and clinical subgroups (age < 60 vs ≥ 60 years, gender, diabetes status, wound class, urgency), to evaluate fairness and generalizability.
Statistical Comparison: DeLong's test was used to compare AUROCs between models.
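
The discrimination and calibration metrics listed above have compact definitions. The following is a minimal sketch in plain Python, not the authors' code: `auroc` and `brier_score` are illustrative names, the toy labels and probabilities are invented for demonstration, and ties between a positive and a negative case are assumed to count as half a concordant pair. In practice, Scikit-learn's `roc_auc_score` and `brier_score_loss` compute the same quantities.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win
    (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data (not from the study): 1 = SSI, 0 = no SSI.
y_true = [0, 0, 1, 0, 1, 1, 0, 0]
y_prob = [0.05, 0.20, 0.35, 0.10, 0.80, 0.60, 0.40, 0.15]

toy_auroc = auroc(y_true, y_prob)        # 14 of 15 positive/negative pairs are concordant
toy_brier = brier_score(y_true, y_prob)  # lower is better; 0 is a perfect forecast
print(f"AUROC = {toy_auroc:.3f}, Brier = {toy_brier:.3f}")
```

The pairwise loop is quadratic in cohort size, which is acceptable for a test set of about a hundred patients; a rank-based implementation would be preferable for large datasets.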

Simplified Risk Score Development

A parsimonious risk score was derived from the best-performing machine learning model, using backward stepwise feature selection and clinical interpretability criteria. Model coefficients were transformed into integer points proportional to their effect size, yielding a user-friendly tool for clinical implementation in low-resource environments (Supplementary Note S1).

Results

Patient and Procedure Characteristics

Of the 525 included patients, the mean age was 52.4 years (SD 15.8), with 41.9% aged ≥ 60. Males represented 57.9% of the cohort. The prevalence of major comorbidities included diabetes mellitus (21.3%), hypertension (29.1%), chronic renal failure (4.6%), and current smoking (22.1%). The majority of procedures were classified as elective (62.7%), with the remainder being urgent (21.1%) or emergency (16.2%). Wound contamination classes were distributed as follows: clean (24.8%), clean-contaminated (38.3%), contaminated (23.1%), and dirty (13.7%). The observed 30-day SSI rate was 16.2% (n = 85), with similar rates in both the development (16.0%) and test cohorts (17.1%) (Table 1).
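
The temporal partition and the per-cohort event rates reported above can be illustrated with a short sketch. It is a hedged example rather than the study's pipeline: the `Case` record, its field names, and the toy cohort are hypothetical. The reported rates are consistent with roughly 67 events among 420 development patients and 18 among 105 test patients, but those counts are inferred here from the rounded percentages and are not stated in the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Case:
    surgery_year: int  # calendar year of the operation (hypothetical field name)
    ssi: int           # 1 if an SSI occurred within 30 days, else 0


def temporal_split(cases: List[Case], first_test_year: int) -> Tuple[List[Case], List[Case]]:
    """Put every case before `first_test_year` in development, the rest in test."""
    dev = [c for c in cases if c.surgery_year < first_test_year]
    test = [c for c in cases if c.surgery_year >= first_test_year]
    return dev, test


def event_rate(cases: List[Case]) -> float:
    """Proportion of cases with the outcome."""
    return sum(c.ssi for c in cases) / len(cases)


# Hypothetical toy cohort spanning the study years (not real patient data).
cases = [
    Case(2018, 0), Case(2019, 1), Case(2020, 0), Case(2021, 0),
    Case(2022, 1), Case(2022, 0),
    Case(2023, 0), Case(2023, 1), Case(2023, 0), Case(2023, 0),
]

dev, test = temporal_split(cases, first_test_year=2023)
print(len(dev), round(event_rate(dev), 3))
print(len(test), round(event_rate(test), 3))
```

Splitting on the calendar year rather than at random keeps every test patient strictly later than every development patient, which is what makes the evaluation resemble prospective use.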

Table 1 Baseline Characteristics of the Study Cohort (n = 525)

Variable                                   Overall (n = 525)   SSI (n = 85)   No SSI (n = 440)
Age, mean (SD), years                      52.4 (15.8)         56.1 (14.3)    51.7 (16.1)
Age ≥ 60 years, n (%)                      220 (41.9)          44 (51.8)      176 (40.0)
Male gender, n (%)                         304 (57.9)          53 (62.4)      251 (57.0)
BMI ≥ 30 kg/m², n (%)                      98 (18.7)           19 (22.4)      79 (18.0)
Diabetes mellitus, n (%)                   112 (21.3)          33 (38.8)      79 (18.0)
Hypertension, n (%)                        153 (29.1)          27 (31.8)      126 (28.6)
Chronic renal failure, n (%)               24 (4.6)            6 (7.1)        18 (4.1)
Current smoker, n (%)                      116 (22.1)          21 (24.7)      95 (21.6)
Emergency surgery, n (%)                   85 (16.2)           23 (27.1)      62 (14.1)
Wound contamination: Dirty, n (%)          72 (13.7)           24 (28.2)      48 (10.9)
Operative duration > 180 min, n (%)        108 (20.6)          29 (34.1)      79 (18.0)
Preop leukocytosis (> 12 × 10⁹/L), n (%)   97 (18.5)           24 (28.2)      73 (16.6)
ASA class ≥ 3, n (%)                       189 (36.0)          41 (48.2)      148 (33.6)
SSI within 30 days, n (%)                  85 (16.2)           -              -

Model Performance: Discrimination

On the held-out 2023 test set (n = 105), XGBoost demonstrated the highest discrimination for SSI prediction, achieving an AUROC of 0.934 (95% CI: 0.891–0.967) and an AUPRC of 0.809. Random Forest yielded comparably strong performance (AUROC: 0.924, AUPRC: 0.787), while the neural network model (AUROC: 0.890, AUPRC: 0.712) and logistic regression (AUROC: 0.868, AUPRC: 0.677) performed less well (Fig. 1A). Pairwise AUROC comparisons using DeLong's test confirmed that XGBoost significantly outperformed both logistic regression (p = 0.012) and the neural network (p = 0.038), whereas the difference between XGBoost and Random Forest was not statistically significant (p = 0.17) (Fig. 1B). The net reclassification improvement (NRI) for XGBoost over logistic regression was 0.21 (p = 0.009), indicating improved risk classification (Table 2).
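
Table 2 reports sensitivity and specificity at an optimal threshold chosen by Youden's index, J = sensitivity + specificity − 1. The sketch below shows one way such a threshold can be selected; it is an assumption-laden illustration, not the authors' procedure. The function names and toy data are hypothetical, candidate thresholds are taken to be the observed predicted probabilities, a prediction counts as positive when it is at or above the threshold, and both outcome classes are assumed to be present.

```python
from typing import Sequence, Tuple


def sens_spec(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when p >= threshold is called positive."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(y_true: Sequence[int], y_prob: Sequence[float]) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) maximising Youden's J.

    When several thresholds share the maximum J, the lowest one is kept.
    """
    best = None
    for t in sorted(set(y_prob)):
        sens, spec = sens_spec(y_true, y_prob, t)
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    _, t, sens, spec = best
    return t, sens, spec


# Hypothetical toy data (not from the study): 1 = SSI, 0 = no SSI.
y_true = [0, 0, 1, 0, 1, 1, 0, 0]
y_prob = [0.05, 0.20, 0.35, 0.10, 0.80, 0.60, 0.40, 0.15]

threshold, sensitivity, specificity = youden_threshold(y_true, y_prob)
print(threshold, sensitivity, specificity)
```

Youden's index weights false negatives and false positives equally; a setting that values missed infections more heavily than unnecessary prophylaxis might reasonably prefer a lower threshold than the one this rule selects.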

Table 2 Model Performance Metrics on the 2023 Test Set (n = 105)

Model                 AUROC (95% CI)        AUPRC   Brier Score   Sensitivity   Specificity   Optimal Threshold (Youden)
Logistic Regression   0.868 (0.799–0.921)   0.677   0.098         0.78          0.87          0.18
Random Forest         0.924 (0.871–0.961)   0.787   0.087         0.89          0.90          0.22
XGBoost               0.934 (0.891–0.967)   0.809   0.080         0.91          0.92          0.20
Neural Network        0.890 (0.825–0.938)   0.712   0.093         0.85          0.88          0.19
Simplified Score      0.841 (0.764–0.904)   0.603   0.104         0.74          0.85          3 points

Calibration and Clinical Utility

All models demonstrated a tendency to overestimate absolute SSI risk, as reflected by calibration slopes > 2.0. Nonetheless, XGBoost exhibited the best calibration (Brier score: 0.080), followed by Random Forest (Brier score: 0.087), neural network (Brier score: 0.093), and logistic regression (Brier score: 0.098). Calibration plots (Fig. 2A) showed that recalibration using isotonic regression improved alignment between predicted and observed risks, particularly for XGBoost and Random Forest. Decision curve analysis indicated that the net benefit of XGBoost and Random Forest models exceeded that of both logistic regression and neural network models across clinically relevant SSI risk thresholds (15–35%), supporting their clinical applicability for risk stratification (Fig. 2B).

Subgroup Analyses: Fairness and Robustness

Model performance was consistent across clinically important subgroups (Table 3). For elderly patients (≥ 60 years), XGBoost achieved an AUROC of 0.967, with Random Forest also performing strongly (AUROC: 0.959). Among diabetic patients, Random Forest exhibited the highest discrimination (AUROC: 0.979), while XGBoost and neural network models maintained AUROCs above 0.91. No significant differences in performance were observed by gender or surgical urgency (Supplementary Table S4).
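
The decision curve analysis reported under Calibration and Clinical Utility rests on the standard net-benefit formula, NB(t) = TP/n − (FP/n) × t/(1 − t), where t is the risk threshold at which a clinician would intervene. The following is a minimal sketch of that calculation, assuming the standard formula and using hypothetical function names and invented toy data; it is not the authors' implementation.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of treating every patient whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # harm of one false positive, in true-positive units
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: intervene on everyone regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data (not from the study): 10 patients, 2 with SSI.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.70, 0.30, 0.40, 0.10, 0.05, 0.20, 0.12, 0.08, 0.25, 0.03]

# Evaluate within the 15-35% threshold range used in the study;
# the "treat none" strategy has net benefit 0 by definition.
curve = {
    t: (net_benefit(y_true, y_prob, t), net_benefit_treat_all(y_true, t))
    for t in (0.15, 0.25, 0.35)
}
for t, (model_nb, all_nb) in curve.items():
    print(f"threshold={t:.2f}  model={model_nb:.3f}  treat-all={all_nb:.3f}")
```

A model is useful at a given threshold only if its net benefit exceeds both reference strategies, treat-all and treat-none. Because the calculation depends on which patients fall above the threshold, miscalibrated probabilities shift the effective operating point, which is one reason recalibration matters before thresholds are fixed for clinical use.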

Table 3 Subgroup Model Performance (AUROC) for XGBoost and Random Forest

Subgroup            XGBoost AUROC   Random Forest AUROC
Age ≥ 60 years      0.967           0.959
Age < 60 years      0.913           0.902
Male                0.921           0.918
Female              0.943           0.926
Diabetes            0.957           0.979
No Diabetes         0.912           0.901
Emergency surgery   0.951           0.939
Elective surgery    0.929           0.917
Dirty wound class   0.962           0.957

A sensitivity analysis excluding patients with missing data (complete case analysis) yielded similar results, confirming the robustness of the model findings.

Feature Importance

Both XGBoost and Random Forest models highlighted the following predictors as most influential: wound contamination class, operative duration, diabetes mellitus status, ASA class, and age. Additional important features included preoperative leukocyte count, estimated blood loss, and emergency status. Figure 3A displays the top ten feature importances for the XGBoost model using SHAP analysis.

Simplified Risk Score Development

A seven-variable risk score was derived from the XGBoost model, including: wound contamination, operative duration (> 180 min), diabetes, ASA class (≥ 3), emergency surgery, age ≥ 60 years, and preoperative leukocytosis (> 12 × 10⁹/L). With integer points assigned on the basis of logistic regression beta coefficients, the risk score demonstrated an AUROC of 0.841 on the test set, comparable to logistic regression and only modestly lower than the full XGBoost model (Fig. 3B).

Discussion

The study presents a comprehensive evaluation of machine learning algorithms designed to predict surgical site infection (SSI) following major gastrointestinal surgery in a resource-constrained setting. Notably, ensemble methods such as XGBoost and Random Forest demonstrated superior discriminative performance compared with logistic regression and the neural network, with area under the receiver operating characteristic curve (AUROC) values exceeding 0.9.
These results underscore the potential of sophisticated machine learning techniques to effectively identify patients at enhanced risk of SSI. The derivation of a simplified risk score from the highest-performing model offers a pragmatic tool for clinical implementation, particularly in environments with limited computational resources. This contribution adds to the growing evidence suggesting that high-performance SSI prediction can be achieved using structured clinical data alone, circumventing the need to process complex unstructured data with techniques such as natural language processing, which is often not feasible in low-resource settings.

The superior performance of our ensemble models over logistic regression highlights the value of algorithms capable of capturing complex, non-linear relationships within clinical data. This finding is consistent with investigations such as those conducted by Chen et al. (2020), who similarly identified the efficacy of ensemble methods in the prediction of SSI (12). Conversely, our findings diverge from alternative studies, notably Song et al., which concluded that SSI identification with administrative data may be inaccurate (13). This divergence is likely attributable to methodological variations. Our investigation employed a comprehensive array of clinical and intraoperative variables, the complex interdependencies of which are more effectively captured by tree-based algorithms. In contrast, research that relies on less complex or administrative datasets may not demonstrate the same degree of intricate feature interactions, thus rendering linear models adequate. This observation implies that the most suitable algorithm is contingent upon contextual factors, shaped by the specificity and nature of the predictor variables.
The implication is that, as clinical datasets evolve to become more extensive and nuanced, the utilization of advanced, non-linear models such as XGBoost or the Temporal Adaptive Neural Evolutionary Algorithm (TANEA) will become increasingly important to realize their full predictive capabilities (14, 15).

Regarding performance metrics, our models demonstrated exceptional discriminative ability, with an AUROC of 0.934 for XGBoost, which is marginally above the pooled AUC of 0.93 reported in a published meta-analysis and above the results of similar predictive studies such as Chen et al. (AUROC ~ 0.78) (12, 16). Furthermore, our model's sensitivity (0.91) and specificity (0.92) at the optimal threshold compare favorably with the meta-analysis pooled estimates for structured-data models (sensitivity 0.56, specificity 0.95), with markedly higher sensitivity and slightly lower specificity (16).

While our study demonstrates that robust SSI prediction is achievable through the analysis of structured clinical data, these findings must be interpreted with caution due to important methodological constraints, particularly when compared to prior approaches. Previous studies have demonstrated strong performance in SSI identification using ML but with key differences. For instance, one study using NLP achieved a sensitivity and positive predictive value of 97% for SSI detection (17) by leveraging the rich, post-operative data in clinical notes, a data modality our study lacked. In contrast, our predictive model, reliant on pre- and intra-operative structured data, may be inherently limited in complex cases. The absence from our dataset of critical predictors pivotal in oncological surgeries (such as neoadjuvant radiation, chemotherapy, and immunodeficiency) likely diminishes the model's accuracy and generalizability for patients undergoing radical procedures for malignancy. Furthermore, the absence of radiological image variables represents another significant source of unmeasured confounding.
Another study achieved an AUC of 0.86 using preoperative blood tests (18), a more limited but highly standardized dataset. While our model, which incorporated a broader set of operative variables, achieved a higher AUC of 0.934, its performance in the most complex surgical cases remains uncertain due to these data omissions. Therefore, our approach, though promising for general gastrointestinal surgery, should be considered inferior for populations where the excluded variables are key determinants of outcome.

While our models demonstrated excellent discrimination, they were poorly calibrated, systematically overestimating the absolute risk of SSI. This miscalibration necessitates caution in clinical interpretation, as a predicted probability of 30% from our model does not equate to a true 30% risk. However, despite this limitation, decision curve analysis revealed that both the XGBoost model and the simplified risk score provided a superior net benefit across a range of clinically relevant risk thresholds compared to default strategies. This suggests that even with imperfect probability estimation, the model's ability to correctly rank patients by risk offers tangible value for decision-making. In resource-constrained environments, this could rationally guide the targeted intensification of perioperative measures (such as prolonged antibiotic prophylaxis or enhanced post-discharge monitoring) for the highest-risk patients, thereby maximizing the efficient use of limited infection control resources (19). Therefore, the clinical utility of our models is not negated by their poor calibration but is instead contextualized by it; they serve as effective tools for risk stratification rather than for providing precise individual-level prognostic probabilities.

Our analysis of feature importance further corroborated the clinical relevance of our models.
The prominence of well-recognized risk factors (specifically, wound contamination class, operative duration, diabetes mellitus, and ASA classification) exhibits strong concordance with the extant literature, including studies conducted by Bucher et al. and Merath et al. (20, 21). This alignment reinforces the biological and clinical face validity of our selected predictors. The robust performance of these features, even in the absence of textual data, suggests that they encapsulate a fundamental and potent core of SSI risk.

Study limitations

Despite these advantages, our investigation is accompanied by several significant limitations that necessitate consideration for the advancement of this research. Firstly, the single-center, retrospective design inherently restricts the external validity of our results. Although temporal validation yields a more accurate assessment of performance than random partitioning, it does not evaluate the models' efficacy across various healthcare environments characterized by differing patient demographics, surgical methodologies, and documentation standards. Conducting external validation in heterogeneous clinical contexts emerges as a vital subsequent endeavor.

Secondly, we discerned a substantial deficiency in model calibration. All of our models, inclusive of the highest performing XGBoost, exhibited a propensity to overestimate the absolute risk of SSI. This miscalibration implies that while the models excel in stratifying patients according to risk, their predicted probabilities lack reliability in conveying precise individual risk assessments. This phenomenon represents a prevalent yet frequently underreported challenge in the domain of clinical ML that must be addressed prior to the comprehensive integration of models into clinical decision-making processes.

Thirdly, we encountered the traditional trade-off between model efficacy and interpretability.
The XGBoost algorithm, notwithstanding its high accuracy, functions as a "black box," complicating clinicians' ability to comprehend the basis for its predictions. Our endeavor to mitigate this gap by developing a simplified risk score, although successful in improving interpretability, resulted in a significant decline in predictive accuracy (AUROC reduction from 0.934 to 0.841). This underscores a fundamental conundrum within the discipline. Furthermore, our algorithmic exploration was constrained; we did not assess alternative model classes, such as Explainable Boosting Machines (EBMs) or Bayesian Networks, which are specifically engineered to achieve a more advantageous balance among accuracy, interpretability, and inherent calibration.

Finally, our predictive capability was limited by the data accessible within our retrospective archives. We lacked unstructured textual data from clinical notes, which has proven to be a critical differentiator for the top-performing models documented in the literature. Additionally, we were unable to incorporate dynamic laboratory trends, microbiological data such as the history of multi-drug resistant organism (MDRO) colonization, or social determinants of health, all of which are likely to be significant predictors, particularly in contexts with limited resources.

Conclusion

In summary, this study reinforces the significance of ML, particularly ensemble methodologies like XGBoost, in facilitating accurate predictions of SSIs. It illustrates that meticulous model development and validation can be effectively executed in a resource-constrained environment, producing tools that possess the potential to markedly enhance preoperative risk stratification. For clinical practice, our simplified risk score serves as an immediately actionable tool.
For the research community, our findings emphasize that robust prediction is attainable with structured data, while concurrently accentuating the critical significance of external validation, model calibration, and interpretability. Future investigations should therefore concentrate on three pivotal areas: 1) the external validation of these models across diverse healthcare systems, 2) the incorporation of enriched data sources, including clinical notes and longitudinal laboratory data, and 3) the development and application of inherently interpretable and well-calibrated model architectures, such as Explainable Boosting Machines, to reconcile the dichotomy between predictive accuracy and clinical applicability.

Declarations

Acknowledgements

Not applicable.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Ethical Approval

Ethical approval was obtained from the Ibb University Faculty of Medicine Institutional Review Board, with a waiver of informed consent in accordance with the retrospective nature of the analysis. The investigation adhered to the principles of the Declaration of Helsinki and followed TRIPOD guidelines for transparent reporting of multivariable prediction model development and validation.

Competing Interests

All authors declare that they have no competing interests to disclose.

Consent for publication

All authors have read and accepted the final version of the work, and they agree to its publication.

Contributors

Saif G: Investigation, Methodology, Project Administration, Writing – Review & Editing.
Ahmed A and Wadhah E: Conceptualization, Data Curation, Formal Analysis, Writing – Original Draft.
Qasem A and Yasser O: Investigation, Methodology, Project Administration, Writing – Review & Editing.
Saleh A and Mohammed A: Conceptualization, Data Curation, Formal Analysis, Writing – Original Draft.
Faisal A: Supervision, Validation, Visualization, Writing – Review & Editing.
Al-Shehari W: Writing – Review & Editing; corresponding author.

Corresponding author
Correspondence to Wadee Abdullah Al-Shehari

Data availability
The datasets used and analyzed during the current analysis are available from the corresponding author on reasonable request.

References

1. Chang J, Karlsdottir BR, Phillips HL, Loeffler BT, Mott SL, Hrabe JE, et al. Modern Trends in Surgical Site Infection Rates for Colorectal Surgery: A National Surgical Quality Improvement Project Study 2013–2020. Dis Colon Rectum. 2024;67(9):1201–9.
2. Chen KA, Joisa CU, Stem JM, Guillem JG, Gomez SM, Kapadia MR. Improved Prediction of Surgical-Site Infection After Colorectal Surgery Using Machine Learning. Dis Colon Rectum. 2023;66(3):458–66.
3. Agostinho A, Chalot E, Teixeira D, Bosetti D, Buetti N, Catho G, et al. Semi-automated surveillance of surgical site infections using machine learning and rule-based classification models. npj Digit Med. 2025;8(1):617.
4. Costabella F, Patel KB, Adepoju AV, Singh P, Attia Hussein Mahmoud H, Zafar A, et al. Healthcare Cost and Outcomes Associated With Surgical Site Infection and Patient Outcomes in Low- and Middle-Income Countries. Cureus. 2023;15(7):e42493.
5. Broex EC, van Asselt AD, Bruggeman CA, van Tiel FH. Surgical site infections: how high are the costs? J Hosp Infect. 2009;72(3):193–201.
6. Hirani S, Trivedi NA, Chauhan J, Chauhan Y. A study of clinical and economic burden of surgical site infection in patients undergoing caesarian section at a tertiary care teaching hospital in India. PLoS ONE. 2022;17(6):e0269530.
7. Korol E, Johnston K, Waser N, Sifakis F, Jafri HS, Lo M, et al. A systematic review of risk factors associated with surgical site infections among surgical patients. PLoS ONE. 2013;8(12):e83743.
8. Zimlichman E, Henderson D, Tamir O, Franz C, Song P, Yamin CK, et al. Health care-associated infections: a meta-analysis of costs and financial impact on the US health care system. JAMA Intern Med. 2013;173(22):2039–46.
9. Rajkomar A, Dean J, Kohane I. Machine Learning in Medicine. N Engl J Med. 2019;380(14):1347–58.
10. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44–56.
11. Chakradeo K, Huynh I, Balaganeshan SB, Dollerup OL, Gade-Jørgensen H, Laupstad SK, et al. Navigating fairness aspects of clinical prediction models. BMC Med. 2025;23(1):567.
12. Chen W, Lu Z, You L, Zhou L, Xu J, Chen K. Artificial Intelligence-Based Multimodal Risk Assessment Model for Surgical Site Infection (AMRAMS): Development and Validation Study. JMIR Med Inf. 2020;8(6):e18186.
13. Song X, Cosgrove SE, Pass MA, Perl TM. Using hospital claim data to monitor surgical site infections for inpatient procedures. Am J Infect Control. 2008;36(3):S32–6.
14. S C, C A, K S. Advanced predictive disease modeling in biomedical IoT using the temporal adaptive neural evolutionary algorithm. Sci Rep. 2025;15(1):20378.
15. Rafie Z, Talab MS, Koor BEZ, Garavand A, Salehnasab C, Ghaderzadeh M. Leveraging XGBoost and explainable AI for accurate prediction of type 2 diabetes. BMC Public Health. 2025;25(1):3688.
16. Wu G, Khair S, Yang F, Cheligeer C, Southern D, Zhang Z, et al. Performance of machine learning algorithms for surgical site infection case detection and prediction: A systematic review and meta-analysis. Ann Med Surg (Lond). 2022;84:104956.
17. Mamlook REA, Wells LJ, Sawyer R. Machine-learning models for predicting surgical site infections using patient pre-operative risk and surgical procedure factors. Am J Infect Control. 2023;51(5):544–50.
18. Mandagani P, Coleman S, Zahid A, Ehlers AP, Roy SB, De Cock M, editors. Machine learning models for surgical site infection prediction. AMIA KDDM-WG Symposium (American Medical Informatics Association Knowledge Discovery and Data Mining Working Group); 2016.
19. Seymour CW, Cooke CR, Wang Z, Kerr KF, Yealy DM, Angus DC, et al. Improving risk classification of critical illness with biomarkers: a simulation study. J Crit Care. 2013;28(5):541–8.
20. Bucher BT, Shi J, Ferraro JP, Skarda DE, Samore MH, Hurdle JF, et al. Portable Automated Surveillance of Surgical Site Infections Using Natural Language Processing: Development and Validation. Ann Surg. 2020;272(4):629–36.
21. Merath K, Hyer JM, Mehta R, Farooq A, Bagante F, Sahara K, et al. Use of Machine Learning for Prediction of Patient Risk of Postoperative Complications After Liver, Pancreatic, and Colorectal Surgery. J Gastrointest Surg. 2020;24(8):1843–51.

Additional Declarations
No competing interests reported.

Supplementary Files
SUPPLEMENTARY.docx

Editorial history at journal
First submitted to journal 28 Nov, 2025. Submission checks completed, editor assigned, reviewers invited, and reviewers agreed 01 Dec, 2025. Reviews received 12 Dec, 2025. Reviewers agreed and reviews received 20 Feb, 2026. Editorial decision: revision requested 20 Feb, 2026. Journal publication published 16 Mar, 2026 in Patient Safety in Surgery.
The observed 30-day SSI rate was 16.2% (n\u0026thinsp;=\u0026thinsp;85), with similar rates in both the development (16.0%) and test cohorts (17.1%) (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eBaseline Characteristics of the Study Cohort (n\u0026thinsp;=\u0026thinsp;525)\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"4\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eVariable\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eOverall (n\u0026thinsp;=\u0026thinsp;525)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eSSI (n\u0026thinsp;=\u0026thinsp;85)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eNo SSI (n\u0026thinsp;=\u0026thinsp;440)\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAge, mean (SD), years\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e52.4 (15.8)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e56.1 (14.3)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e51.7 
(16.1)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAge\u0026thinsp;\u0026ge;\u0026thinsp;60 years, n (%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e220 (41.9)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e44 (51.8)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e176 (40.0)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eMale gender, n (%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e304 (57.9)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e53 (62.4)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e251 (57.0)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eBMI\u0026thinsp;\u0026ge;\u0026thinsp;30 kg/m\u0026sup2;, n (%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e98 (18.7)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e19 (22.4)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e79 (18.0)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDiabetes mellitus, n (%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e112 (21.3)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e33 (38.8)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e79 (18.0)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eHypertension, n (%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" 
char=\".\" colname=\"c2\"\u003e\u003cp\u003e153 (29.1)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e27 (31.8)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e126 (28.6)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eChronic renal failure, n (%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e24 (4.6)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e6 (7.1)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e18 (4.1)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCurrent smoker, n (%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e116 (22.1)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e21 (24.7)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e95 (21.6)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eEmergency surgery, n (%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e85 (16.2)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e23 (27.1)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e62 (14.1)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eWound contamination: Dirty, n (%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e72 (13.7)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e24 (28.2)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e48 
(10.9)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eOperative duration\u0026thinsp;\u0026gt;\u0026thinsp;180 min, n (%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e108 (20.6)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e29 (34.1)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e79 (18.0)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003ePreop leukocytosis (\u0026gt;\u0026thinsp;12 \u0026times;10⁹/L), n (%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e97 (18.5)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e24 (28.2)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e73 (16.6)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eASA class\u0026thinsp;\u0026ge;\u0026thinsp;3, n (%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e189 (36.0)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e41 (48.2)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e148 (33.6)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSSI within 30 days, n (%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e85 (16.2)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u0026mdash;\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c4\"\u003e\u003cp\u003e\u0026mdash;\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\u003ch2\u003eModel Performance: Discrimination\u003c/h2\u003e\u003cp\u003eOn the held-out 2023 test set (n\u0026thinsp;=\u0026thinsp;105), XGBoost demonstrated the highest discrimination for SSI prediction, achieving an AUROC of 0.934 (95% CI: 0.891\u0026ndash;0.967) and an AUPRC of 0.809. Random Forest yielded comparably strong performance (AUROC: 0.924, AUPRC: 0.787), while the neural network model (AUROC: 0.890, AUPRC: 0.712) and logistic regression (AUROC: 0.868, AUPRC: 0.677) performed less well (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e1\u003c/span\u003eA). In pairwise AUROC comparisons, DeLong's test indicated that XGBoost significantly outperformed both logistic regression (p\u0026thinsp;=\u0026thinsp;0.012) and the neural network (p\u0026thinsp;=\u0026thinsp;0.038), whereas its difference from Random Forest was not statistically significant (p\u0026thinsp;=\u0026thinsp;0.17) (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e1\u003c/span\u003eB). 
The net reclassification improvement (NRI) for XGBoost over logistic regression was 0.21 (p\u0026thinsp;=\u0026thinsp;0.009), indicating meaningful clinical impact (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eModel Performance Metrics on the 2023 Test Set (n\u0026thinsp;=\u0026thinsp;105)\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"7\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eModel\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eAUROC (95% CI)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eAUPRC\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eBrier Score\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eSensitivity\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003eSpecificity\u003c/p\u003e\u003c/th\u003e\u003cth 
align=\"left\" colname=\"c7\"\u003e\u003cp\u003eOptimal Threshold (Youden)\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLogistic Regression\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.868 (0.799\u0026ndash;0.921)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.677\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.098\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.78\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.87\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.18\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eRandom Forest\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.924 (0.871\u0026ndash;0.961)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.787\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.087\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.89\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.90\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.22\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eXGBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.934 (0.891\u0026ndash;0.967)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c3\"\u003e\u003cp\u003e0.809\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.080\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.91\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.92\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.20\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNeural Network\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.890 (0.825\u0026ndash;0.938)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.712\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.093\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.85\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.88\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.19\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSimplified Score\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.841 (0.764\u0026ndash;0.904)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.603\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.104\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.74\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.85\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e3 
points\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\u003ch2\u003eCalibration and Clinical Utility\u003c/h2\u003e\u003cp\u003eAll models demonstrated a tendency to overestimate absolute SSI risk, as reflected by calibration slopes\u0026thinsp;\u0026gt;\u0026thinsp;2.0. Nonetheless, XGBoost exhibited the best calibration (Brier score: 0.080), followed by Random Forest (Brier score: 0.087), neural network (Brier score: 0.093), and logistic regression (Brier score: 0.098). Calibration plots (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e2\u003c/span\u003eA) showed that recalibration using isotonic regression improved alignment between predicted and observed risks, particularly for XGBoost and Random Forest.\u003c/p\u003e\u003cp\u003eDecision curve analysis indicated that the net benefit of XGBoost and Random Forest models exceeded that of both logistic regression and neural network models across clinically relevant SSI risk thresholds (15\u0026ndash;35%), supporting their clinical applicability for risk stratification (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e2\u003c/span\u003eB).\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e\u003ch2\u003eSubgroup Analyses: Fairness and Robustness\u003c/h2\u003e\u003cp\u003eModel performance was consistent across clinically important subgroups (Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). For elderly patients (\u0026ge;\u0026thinsp;60 years), XGBoost achieved an AUROC of 0.967, with Random Forest also performing strongly (AUROC: 0.959). Among diabetic patients, Random Forest exhibited the highest discrimination (AUROC: 0.979), while XGBoost and neural network models maintained AUROCs above 0.91. 
No significant differences in performance were observed by gender or surgical urgency (\u003cb\u003eSupplementary Table S4\u003c/b\u003e).\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eSubgroup Model Performance (AUROC) for XGBoost and Random Forest\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"3\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSubgroup\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eXGBoost AUROC\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eRandom Forest AUROC\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAge\u0026thinsp;\u0026ge;\u0026thinsp;60 years\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.967\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.959\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAge\u0026thinsp;\u0026lt;\u0026thinsp;60 years\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.913\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.902\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colname=\"c1\"\u003e\u003cp\u003eMale\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.921\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.918\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eFemale\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.943\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.926\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDiabetes\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.957\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.979\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNo Diabetes\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.912\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.901\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eEmergency surgery\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.951\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.939\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eElective surgery\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.929\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.917\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colname=\"c1\"\u003e\u003cp\u003eDirty wound class\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.962\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.957\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eA sensitivity analysis excluding patients with missing data (complete case analysis) yielded similar results, supporting the robustness of the model findings.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec18\" class=\"Section2\"\u003e\u003ch2\u003eFeature Importance\u003c/h2\u003e\u003cp\u003eBoth XGBoost and Random Forest models highlighted the following predictors as most influential: wound contamination class, operative duration, diabetes mellitus status, ASA class, and age. Additional important features included preoperative leukocyte count, estimated blood loss, and emergency status. Figure\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e3\u003c/span\u003eA displays the top ten feature importances for the XGBoost model using SHAP analysis.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec19\" class=\"Section2\"\u003e\u003ch2\u003eSimplified Risk Score Development\u003c/h2\u003e\u003cp\u003eA seven-variable risk score was derived from the XGBoost model, comprising wound contamination, operative duration (\u0026gt;\u0026thinsp;180 min), diabetes, ASA class (\u0026ge;\u0026thinsp;3), emergency surgery, age\u0026thinsp;\u0026ge;\u0026thinsp;60 years, and preoperative leukocytosis (\u0026gt;\u0026thinsp;12 \u0026times; 10⁹/L). 
With integer points assigned from logistic regression beta coefficients, the risk score achieved an AUROC of 0.841 on the test set, close to that of logistic regression (0.868) and lower than that of the full XGBoost model (0.934) (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e3\u003c/span\u003eB).\u003c/p\u003e\u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eThis study presents a comprehensive evaluation of machine learning algorithms designed to predict surgical site infection (SSI) following major gastrointestinal surgery in a resource-constrained setting. Notably, the ensemble methods XGBoost and Random Forest demonstrated superior discriminative performance compared with logistic regression and the neural network, with area under the receiver operating characteristic curve (AUROC) values exceeding 0.9 for both ensemble models. These results underscore the potential of such machine learning techniques to identify patients at elevated risk of SSI. The derivation of a simplified risk score from the highest-performing model offers a pragmatic tool for clinical implementation, particularly in environments with limited computational resources. This contribution adds to the growing evidence suggesting that high-performance SSI prediction can be achieved using structured clinical data alone, without the unstructured clinical text and natural language processing that are often not feasible in low-resource settings.\u003c/p\u003e\u003cp\u003eThe superior performance of our ensemble models over logistic regression highlights the value of algorithms capable of capturing complex, non-linear relationships within clinical data. This finding is consistent with investigations such as those conducted by Chen et al. (2020), who similarly identified the efficacy of ensemble methods in the prediction of SSI (\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e). 
Conversely, our findings diverge from alternative studies, notably Song et al., which concluded that SSI identification with administrative data may be inaccurate (\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e). This divergence is likely attributable to methodological variations. Our investigation employed a comprehensive array of clinical and intraoperative variables, the complex interdependencies of which are more effectively captured by tree-based algorithms. In contrast, research that relies on less complex or administrative datasets may not demonstrate the same degree of intricate feature interactions, thus rendering linear models adequate. This observation implies that the most suitable algorithm is contingent upon contextual factors, shaped by the specificity and nature of the predictor variables. One implication is that, as clinical datasets become more extensive and nuanced, advanced non-linear models such as XGBoost or the Temporal Adaptive Neural Evolutionary Algorithm (TANEA) may become increasingly important for realizing their full predictive capabilities (\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eRegarding performance metrics, our models demonstrated exceptional discriminative ability, with an AUROC of 0.934 for XGBoost, which marginally exceeds the pooled AUC of 0.93 reported in a meta-analysis and clearly exceeds the results of similar predictive studies such as Chen et al. (AUROC\u0026thinsp;~\u0026thinsp;0.78) (\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e). 
Furthermore, our model's sensitivity at the optimal threshold (0.91) is notably higher than the meta-analysis pooled estimate for structured-data models (0.56), whereas its specificity (0.92) is slightly lower than the pooled estimate (0.95) (\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e). While our study demonstrates that robust SSI prediction is achievable through the analysis of structured clinical data, these findings must be interpreted with caution due to important methodological constraints, particularly when compared to prior approaches. Previous studies have demonstrated strong performance in SSI identification using ML but with key differences. For instance, one study using NLP achieved a sensitivity and positive predictive value of 97% for SSI detection (\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e) by leveraging the rich, post-operative data in clinical notes, a data modality our study lacked. In contrast, our predictive model, reliant on pre- and intra-operative structured data, may be inherently limited in complex cases. The omission from our dataset of predictors that are pivotal in oncological surgery (such as neoadjuvant radiation, chemotherapy, and immunodeficiency) likely diminishes the model's accuracy and generalizability for patients undergoing radical procedures for malignancy. Furthermore, the lack of radiological image variables represents another significant source of unmeasured confounding. Another study achieved an AUC of 0.86 using preoperative blood tests (\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e), a more limited but highly standardized dataset. While our model, which incorporated a broader set of operative variables, achieved a higher AUC of 0.934, its performance in the most complex surgical cases remains uncertain due to these data omissions. 
Therefore, our approach, though promising for general gastrointestinal surgery, should be considered less reliable for populations in which the excluded variables are key determinants of outcome.\u003c/p\u003e\u003cp\u003eWhile our models demonstrated excellent discrimination, they were poorly calibrated, systematically overestimating the absolute risk of SSI. This miscalibration necessitates caution in clinical interpretation, as a predicted probability of 30% from our model does not equate to a true 30% risk. However, despite this limitation, decision curve analysis revealed that both the XGBoost model and the simplified risk score provided a superior net benefit across a range of clinically relevant risk thresholds compared to default strategies. This suggests that even with imperfect probability estimation, the model's ability to correctly rank patients by risk offers tangible value for decision-making. In resource-constrained environments, this could rationally guide the targeted intensification of perioperative measures (such as prolonged antibiotic prophylaxis or enhanced post-discharge monitoring) for the highest-risk patients, thereby maximizing the efficient use of limited infection control resources (\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e). Therefore, the clinical utility of our models is not negated by their poor calibration but is instead contextualized by it; they serve as effective tools for risk stratification rather than for providing precise individual-level prognostic probabilities.\u003c/p\u003e\u003cp\u003eOur analysis of feature importance further corroborated the clinical relevance of our models. The prominence of well-recognized risk factors (specifically, wound contamination class, operative duration, diabetes mellitus, and ASA classification) exhibits strong concordance with the extant literature, including studies conducted by Bucher et al. and Merath et al. 
(\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e, \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e). This alignment reinforces the biological and clinical face validity of our selected predictors. The robust performance of these features, even in the absence of textual data, suggests that they encapsulate a fundamental and potent core of SSI risk.\u003c/p\u003e\u003cdiv id=\"Sec21\" class=\"Section2\"\u003e\u003ch2\u003eStudy limitations:\u003c/h2\u003e\u003cp\u003eDespite these advantages, our investigation is accompanied by several significant limitations that necessitate consideration for the advancement of this research. Firstly, the single-center, retrospective design inherently restricts the external validity of our results. Although temporal validation yields a more accurate assessment of performance than random partitioning, it does not evaluate the models' efficacy across various healthcare environments characterized by differing patient demographics, surgical methodologies, and documentation standards. Conducting external validation in heterogeneous clinical contexts emerges as a vital subsequent endeavor.\u003c/p\u003e\u003cp\u003eSecondly, we discerned a substantial deficiency in model calibration. All of our models, inclusive of the highest performing XGBoost, exhibited a propensity to overestimate the absolute risk of SSI. This miscalibration implies that while the models excel in stratifying patients according to risk, their predicted probabilities lack reliability in conveying precise individual risk assessments. This phenomenon represents a prevalent yet frequently underreported challenge in the domain of clinical ML that must be addressed prior to the comprehensive integration of models into clinical decision-making processes.\u003c/p\u003e\u003cp\u003eThirdly, we encountered the traditional trade-off between model efficacy and interpretability. 
The XGBoost algorithm, notwithstanding its high accuracy, functions as a \"black box,\" complicating clinicians' ability to comprehend the basis for its predictions. Our endeavor to mitigate this gap by developing a simplified risk score, although successful in improving interpretability, resulted in a significant decline in predictive accuracy (AUROC reduction from 0.934 to 0.841). This underscores a fundamental conundrum within the discipline. Furthermore, our algorithmic exploration was constrained; we did not assess alternative model classes, such as Explainable Boosting Machines (EBMs) or Bayesian Networks, which are specifically engineered to achieve a more advantageous equilibrium between accuracy, interpretability, and inherent calibration.\u003c/p\u003e\u003cp\u003eUltimately, our predictive capability was limited by the data accessible within our retrospective archives. We were deprived of unstructured textual data from clinical notes, which has proven to be a critical differentiator for the top-performing models documented in the literature. Additionally, we were unable to incorporate dynamic laboratory trends, microbiological data such as the history of multi-drug resistant organism (MDRO) colonization, or social determinants of health, all of which are likely to be significant predictors, particularly in contexts with limited resources.\u003c/p\u003e\u003c/div\u003e"},{"header":"Conclusion","content":"\u003cp\u003eIn summary, this study reinforces the significance of ML, particularly ensemble methodologies like XGBoost, in facilitating accurate predictions of SSIs. It illustrates that meticulous model development and validation can be effectively executed in a resource-constrained environment, producing tools that possess the potential to markedly enhance preoperative risk stratification. For clinical practice, our simplified risk score serves as an immediately actionable tool. 
For the research community, our findings emphasize that robust prediction is attainable with structured data, while concurrently accentuating the critical significance of external validation, model calibration, and interpretability. Future investigations should therefore concentrate on three pivotal areas: 1) the external validation of these models across diverse healthcare systems, 2) the incorporation of enriched data sources, including clinical notes and longitudinal laboratory data, and 3) the development and application of inherently interpretable and well-calibrated model architectures, such as Explainable Boosting Machines, to reconcile the dichotomy between predictive accuracy and clinical applicability.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthical Approval:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eEthical approval was obtained from the Ibb University Faculty of Medicine Institutional Review Board, with a waiver of informed consent in accordance with the retrospective nature of the analysis. 
The investigation adhered to the principles of the Declaration of Helsinki and followed the TRIPOD guidelines for transparent reporting of multivariable prediction model development and validation.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting Interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll authors declare that they have no competing interests to disclose.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll authors have read and accepted the final version of the work, and they agree to its publication.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eContributors\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSaif G\u003c/strong\u003e: Investigation, Methodology, Project Administration, Writing \u0026ndash; Review \u0026amp; Editing. \u003cstrong\u003eAhmed A\u003c/strong\u003e and \u003cstrong\u003eWadhah E\u003c/strong\u003e: Conceptualization, Data Curation, Formal Analysis, Writing \u0026ndash; Original Draft. \u003cstrong\u003eQasem A\u003c/strong\u003e and \u003cstrong\u003eYasser O\u003c/strong\u003e: Investigation, Methodology, Project Administration, Writing \u0026ndash; Review \u0026amp; Editing. \u003cstrong\u003eSaleh A\u003c/strong\u003e and \u003cstrong\u003eMohammed A\u003c/strong\u003e: Conceptualization, Data Curation, Formal Analysis, Writing \u0026ndash; Original Draft.\u003cbr\u003e\u003cstrong\u003eFaisal A\u003c/strong\u003e: Supervision, Validation, Visualization, Writing \u0026ndash; Review \u0026amp; Editing. \u003cstrong\u003eAl-Shehari W\u003c/strong\u003e: Writing \u0026ndash; Review \u0026amp; Editing, Corresponding Author.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCorresponding author\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eCorrespondence to\u0026nbsp;Wadee Abdullah Al-Shehari\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability\u003c/strong\u003e\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe 
datasets used and analyzed during the current study are available from the corresponding author on reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eChang J, Karlsdottir BR, Phillips HL, Loeffler BT, Mott SL, Hrabe JE, et al. Modern Trends in Surgical Site Infection Rates for Colorectal Surgery: A National Surgical Quality Improvement Project Study 2013\u0026ndash;2020. Dis Colon Rectum. 2024;67(9):1201\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChen KA, Joisa CU, Stem JM, Guillem JG, Gomez SM, Kapadia MR. Improved Prediction of Surgical-Site Infection After Colorectal Surgery Using Machine Learning. Dis Colon Rectum. 2023;66(3):458\u0026ndash;66.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAgostinho A, Chalot E, Teixeira D, Bosetti D, Buetti N, Catho G, et al. Semi-automated surveillance of surgical site infections using machine learning and rule-based classification models. npj Digit Med. 2025;8(1):617.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCostabella F, Patel KB, Adepoju AV, Singh P, Attia Hussein Mahmoud H, Zafar A, et al. Healthcare Cost and Outcomes Associated With Surgical Site Infection and Patient Outcomes in Low- and Middle-Income Countries. Cureus. 2023;15(7):e42493.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBroex EC, van Asselt AD, Bruggeman CA, van Tiel FH. Surgical site infections: how high are the costs? J Hosp Infect. 2009;72(3):193\u0026ndash;201.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHirani S, Trivedi NA, Chauhan J, Chauhan Y. A study of clinical and economic burden of surgical site infection in patients undergoing caesarian section at a tertiary care teaching hospital in India. PLoS ONE. 2022;17(6):e0269530.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKorol E, Johnston K, Waser N, Sifakis F, Jafri HS, Lo M, et al. 
A systematic review of risk factors associated with surgical site infections among surgical patients. PLoS ONE. 2013;8(12):e83743.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZimlichman E, Henderson D, Tamir O, Franz C, Song P, Yamin CK, et al. Health care-associated infections: a meta-analysis of costs and financial impact on the US health care system. JAMA Intern Med. 2013;173(22):2039\u0026ndash;46.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRajkomar A, Dean J, Kohane I. Machine Learning in Medicine. N Engl J Med. 2019;380(14):1347\u0026ndash;58.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTopol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44\u0026ndash;56.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChakradeo K, Huynh I, Balaganeshan SB, Dollerup OL, Gade-J\u0026oslash;rgensen H, Laupstad SK, et al. Navigating fairness aspects of clinical prediction models. BMC Med. 2025;23(1):567.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChen W, Lu Z, You L, Zhou L, Xu J, Chen K. Artificial Intelligence-Based Multimodal Risk Assessment Model for Surgical Site Infection (AMRAMS): Development and Validation Study. JMIR Med Inf. 2020;8(6):e18186.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSong X, Cosgrove SE, Pass MA, Perl TM. Using hospital claim data to monitor surgical site infections for inpatient procedures. Am J Infect Control. 2008;36(3):S32\u0026ndash;6.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eS C, C A, K S. Advanced predictive disease modeling in biomedical IoT using the temporal adaptive neural evolutionary algorithm. Sci Rep. 2025;15(1):20378.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRafie Z, Talab MS, Koor BEZ, Garavand A, Salehnasab C, Ghaderzadeh M. Leveraging XGBoost and explainable AI for accurate prediction of type 2 diabetes. BMC Public Health. 
2025;25(1):3688.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWu G, Khair S, Yang F, Cheligeer C, Southern D, Zhang Z, et al. Performance of machine learning algorithms for surgical site infection case detection and prediction: A systematic review and meta-analysis. Ann Med Surg (Lond). 2022;84:104956.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMamlook REA, Wells LJ, Sawyer R. Machine-learning models for predicting surgical site infections using patient pre-operative risk and surgical procedure factors. Am J Infect Control. 2023;51(5):544\u0026ndash;50.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMandagani P, Coleman S, Zahid A, Ehlers AP, Roy SB, De Cock M, editors. Machine learning models for surgical site infection prediction. AMIA KDDM-WG Symposium (American Medical Informatics Association Knowledge Discovery and Data Mining Working Group); 2016.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSeymour CW, Cooke CR, Wang Z, Kerr KF, Yealy DM, Angus DC, et al. Improving risk classification of critical illness with biomarkers: a simulation study. J Crit Care. 2013;28(5):541\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBucher BT, Shi J, Ferraro JP, Skarda DE, Samore MH, Hurdle JF, et al. Portable Automated Surveillance of Surgical Site Infections Using Natural Language Processing: Development and Validation. Ann Surg. 2020;272(4):629\u0026ndash;36.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMerath K, Hyer JM, Mehta R, Farooq A, Bagante F, Sahara K, et al. Use of Machine Learning for Prediction of Patient Risk of Postoperative Complications After Liver, Pancreatic, and Colorectal Surgery. J Gastrointest Surg. 
2020;24(8):1843\u0026ndash;51.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"patient-safety-in-surgery","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"psis","sideBox":"Learn more about [Patient Safety in Surgery](http://pssjournal.biomedcentral.com/)","snPcode":"13037","submissionUrl":"https://submission.nature.com/new-submission/13037/3","title":"Patient Safety in Surgery","twitterHandle":"@EMSurgeryBMC","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"BMC/SO AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Machine Learning, Surgical Site Infections, Gastrointestinal Surgery: A Comprehensive","lastPublishedDoi":"10.21203/rs.3.rs-8232610/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8232610/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e\u003cp\u003eSurgical site infections (SSIs) continue to exert a substantial burden on healthcare systems, particularly in resource-limited settings where they contribute to prolonged hospitalizations, escalated costs, and increased patient morbidity. 
The ability to accurately predict SSI risk is essential for implementing targeted prevention strategies and optimizing resource allocation, especially in constrained environments.\u003c/p\u003e\u003ch2\u003eMethods\u003c/h2\u003e\u003cp\u003eWe conducted a retrospective cohort study utilizing data from 525 patients who underwent major gastrointestinal surgery at Ibb University-affiliated hospitals in Yemen between 2018 and 2023. Four machine learning models\u0026mdash;Logistic Regression, Random Forest, XGBoost, and Neural Network\u0026mdash;were developed using 38 preoperative and intraoperative variables. Temporal validation was performed, with data from 2018\u0026ndash;2022 used for model training (n\u0026thinsp;=\u0026thinsp;420) and 2023 data (n\u0026thinsp;=\u0026thinsp;105) reserved for testing. Model performance was evaluated by area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), calibration metrics, and decision curve analysis. Subgroup analyses assessed model fairness across demographic and clinical strata.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e\u003cp\u003eThe observed SSI rate was 16.2%, consistent across both training and test sets. XGBoost achieved the highest predictive performance (AUROC: 0.934; 95% CI: 0.891\u0026ndash;0.967; AUPRC: 0.809), outperforming logistic regression (AUROC: 0.868, p\u0026thinsp;=\u0026thinsp;0.012) and neural network (AUROC: 0.890, p\u0026thinsp;=\u0026thinsp;0.038) models. Random Forest also demonstrated competitive accuracy (AUROC: 0.924; AUPRC: 0.787). Robust performance was maintained across critical subgroups, with XGBoost yielding an AUROC of 0.967 among elderly patients and Random Forest achieving an AUROC of 0.979 among diabetic patients. All models systematically overestimated SSI risk (calibration slopes\u0026thinsp;\u0026gt;\u0026thinsp;2.0), though XGBoost exhibited the best calibration (Brier score: 0.080). 
Decision curve analysis confirmed clinical utility within probability thresholds of 15\u0026ndash;35%.\u003c/p\u003e\u003ch2\u003eConclusion\u003c/h2\u003e\u003cp\u003eMachine learning models, specifically XGBoost and Random Forest, can accurately predict SSI risk following major gastrointestinal surgery in the Yemeni healthcare context. Despite calibration limitations, these models demonstrate strong discriminative ability and clinical utility, supporting their use for risk stratification in resource-limited settings. The development of a simplified risk score offers a pragmatic alternative for implementation in environments with limited technological infrastructure.\u003c/p\u003e","manuscriptTitle":"Machine Learning Prediction of Surgical Site Infections Following Major Gastrointestinal Surgery: A Comprehensive Model Development and Validation Study in Yemeni Patients","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-12-03 11:44:56","doi":"10.21203/rs.3.rs-8232610/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision 
requested","date":"2026-02-20T21:29:56+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-02-20T21:25:13+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"227265540355291645404775460793174367878","date":"2026-02-20T19:56:28+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-12-12T08:00:51+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"224764457176061595590672565625583606600","date":"2025-12-01T17:30:45+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-12-01T16:52:11+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-12-01T16:50:37+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-12-01T16:12:04+00:00","index":"","fulltext":""},{"type":"submitted","content":"Patient Safety in Surgery","date":"2025-11-28T18:01:16+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"patient-safety-in-surgery","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"psis","sideBox":"Learn more about [Patient Safety in Surgery](http://pssjournal.biomedcentral.com/)","snPcode":"13037","submissionUrl":"https://submission.nature.com/new-submission/13037/3","title":"Patient Safety in Surgery","twitterHandle":"@EMSurgeryBMC","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"BMC/SO AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"fb41032e-1ab3-4417-9ab2-744f72fd5efd","owner":[],"postedDate":"December 3rd, 
2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2026-03-23T16:09:33+00:00","versionOfRecord":{"articleIdentity":"rs-8232610","link":"https://doi.org/10.1186/s13037-026-00481-3","journal":{"identity":"patient-safety-in-surgery","isVorOnly":false,"title":"Patient Safety in Surgery"},"publishedOn":"2026-03-16 15:59:40","publishedOnDateReadable":"March 16th, 2026"},"versionCreatedAt":"2025-12-03 11:44:56","video":"","vorDoi":"10.1186/s13037-026-00481-3","vorDoiUrl":"https://doi.org/10.1186/s13037-026-00481-3","workflowStages":[]},"version":"v1","identity":"rs-8232610","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8232610","identity":"rs-8232610","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
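The Discussion argues that decision curve analysis can still show clinical utility when predicted probabilities are miscalibrated. The sketch below illustrates the standard net-benefit calculation behind a decision curve, under stated assumptions: the function names (`net_benefit`, `net_benefit_treat_all`) and the ten-patient toy cohort are hypothetical and are not taken from the study's data or code. It is meant only to show how a model's net benefit at a chosen risk threshold is compared against the default "treat all" and "treat none" strategies.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on patients with predicted risk >= threshold.

    Standard decision-curve formula: TP/n - (FP/n) * (pt / (1 - pt)),
    where pt is the risk threshold.
    """
    n = len(y_true)
    flagged = [(y, p) for y, p in zip(y_true, y_prob) if p >= threshold]
    true_pos = sum(1 for y, _ in flagged if y == 1)
    false_pos = sum(1 for y, _ in flagged if y == 0)
    odds = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * odds


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of intervening on every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


# Hypothetical toy cohort (NOT study data): 2 SSIs among 10 patients.
# The predicted risks are deliberately inflated but still rank the
# two SSI cases above all non-SSI cases.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.90, 0.80, 0.50, 0.40, 0.30, 0.30, 0.20, 0.20, 0.10, 0.10]

threshold = 0.25  # illustrative value inside the 15-35% range mentioned in the abstract
nb_model = net_benefit(y_true, y_prob, threshold)
nb_all = net_benefit_treat_all(y_true, threshold)
nb_none = 0.0  # treating nobody yields zero net benefit by definition

print(f"model: {nb_model:.4f}  treat-all: {nb_all:.4f}  treat-none: {nb_none:.4f}")
```

In this toy cohort the model's net benefit exceeds both default strategies at the chosen threshold even though its risk estimates are inflated. A full decision curve would repeat the calculation across a grid of thresholds.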

