Machine Learning-Based Individualized Prediction: Risk Assessment of Retinopathy in Preterm Infants at High Altitude | Research Square window.SnipcartSettings = { analytics: { enabled: false } }; (function() { var accessVector = localStorage.getItem('access_vector') || ''; window.dataLayer = window.dataLayer || []; if (accessVector) { window.dataLayer.push({ user: { profile: { profileInfo: { snid: accessVector } } } }); } })(); (function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src='https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer','GTM-K279D39R'); Browse Preprints In Review Journals COVID-19 Preprints AJE Video Bytes Research Tools Research Promotion AJE Professional Editing AJE Rubriq About Preprint Platform In Review Editorial Policies Our Team Advisory Board Help Center Sign In Submit a Preprint Cite Share Download PDF Research Article Machine Learning-Based Individualized Prediction: Risk Assessment of Retinopathy in Preterm Infants at High Altitude Yang Yu, Yunjie Zhang, Yuanfang Xin, Nancuo Suo, Xueren Ma, Yumei Guan, and 4 more This is a preprint; it has not been peer reviewed by a journal. https://doi.org/ 10.21203/rs.3.rs-8533267/v1 This work is licensed under a CC BY 4.0 License Status: Published Journal Publication published 07 Apr, 2026 Read the published version in BMC Ophthalmology → Version 1 posted 10 You are reading this latest preprint version Abstract Background Retinopathy of prematurity (ROP) has been one of the main eye troubles leading to childhood blindness. The specific chronic hypoxic environment at high altitude may form a unique risk profile, acting as a potential trigger of the onset and progression of ROP. So far, there is an absence of specific ROP risk predictive model for preterm infants in these areas. Accordingly, this study intended to develop an ROP predictive model at high altitude using machine learning (ML) methods. Methods Through a retrospective collection of the clinical data from 2,138 premature infants who underwent fundus screening at Qinghai Red Cross Hospital between May 2014 and May 2025, this study was conducted with the establishment of a training set (n = 1,470) and a testing set (n = 668) at a 7:3 ratio. Key predictors from 59 candidate variables were screened by employing univariate analysis and LASSO regression. This study continued to construct nine ML models involving logistic regression, decision tree, random forest, XGBoost, LightGBM, support vector machine, Gaussian Naive Bayes, multilayer perceptron, and TabNet. Finally, to evaluate the model performance, another independent testing set was utilized to carry out model training and hyper-parameter optimization were performed using five-fold cross-validation and Bayesian optimization. Results LASSO regression identified 11 key predictors, including perinatal asphyxia, bronchopulmonary dysplasia (BPD), surfactant administration, gestational age, hyperbilirubinemia, respiratory failure, mode of delivery, premature rupture of membranes, intravenous nutritional duration, fasting duration, and total bile acids. The area under the receiver operating characteristic curve (AUC) of all models was greater than 0.82 on the testing set. The AUC of the decision tree model was the highest (0.954, 95% CI: 0.919–0.989), but the random forest model exhibited the optimal comprehensive performance (AUC = 0.933, 95% CI: 0.891–0.974; sensitivity = 0.691; specificity = 0.943; F1 score = 0.631). The integrated model also demonstrated a robust performance (AUC = 0.949). In addition, duration of parenteral nutrition, respiratory failure, and gestational age were identified as the most influential predictors by SHAP analysis. Conclusions This study successfully develops and validates a ML predictive model for ROP in preterm infants at high altitude. With an effective identification of infants at high risk for ROP based on routine clinical indicators, the random forest model demonstrates the optimal overall performance, and hence offers a scientific tool for precision screening and early intervention. Retinopathy of prematurity Machine learning Predictive model LASSO regression Random forest High-altitude regions SHAP analysis Figures Figure 1 Figure 2 Figure 3 Figure 4 1. Introduction Retinopathy of prematurity (ROP) is a retinal neovascularization in premature and low birth weight (BW) infants with abnormal retinal vascular development as the core pathological feature, which may trigger traction retinal detachment, acting as one of the leading causes of childhood blindness [ 1 , 2 ]. In general, retinal vessels are developed normally between 36 and 40 weeks of gestation. However, there may be a disrupted retinal vascularization in premature infants due to their early departure from the uterine environment, forming ischemic and hypoxic areas. It may further trigger abnormal proliferation of retinal neovascularization, hemorrhage, and fibrous tissue proliferation owing to the resultant excessive expression of vascular endothelial growth factor (VEGF), ultimately inducing retinal detachment and permanent blindness. According to a latest systematic review, the overall prevalence of ROP reaches 32% among screened premature infants worldwide, with severe ROP accounting for 7.5% [ 3 ]. Longitudinal study also documents that 50% of untreated threshold ROP cases progress to retinal detachment, 90% of which ultimately progress to blindness [ 3 ]. Moreover, cerebral palsy may have an incidence of 20–30% in children with severe ROP, coupled with a prevalence of 40–50% for cognitive impairment, which are significantly higher than those in ROP-free premature infants [ 4 ]. Nowadays, ROP management relies primarily on regular fundus screening to detect threshold lesions requiring timely laser photocoagulation or anti-VEGF drug therapy [ 5 ]. However, this model faces dual challenges in practice. First, the rate of receiving standardized fundus screening is relatively low for high-risk premature infants in low- and middle-income countries, merely 20–30%, much lower than the 90% set by the World Health Organization. Second, the widely applied screening criteria based on gestational age (GA) and BW provide broad coverage, which, however, produce a substantial screening burden, carrying risks of delayed diagnosis and missed cases, in addition to exacerbating the burden on clinical resources [ 6 – 8 ]. In addition, the risk factors identified for ROP, such as low GA, low BW, and prolonged oxygen therapy, present with relatively limited predictive value for individualized risk stratification [ 9 , 10 ]. It highlights an urgent need of adopting a more precise, data-driven risk-evaluation strategy to efficiently identify high-risk infants, thus optimizing the allocation of limited medical resources and ensuring timely interventions. With the emergence and application of machine learning (ML), we may acquire new opportunities to address all the above challenges. This technology enables the analysis of complex and high-dimensional clinical data, thus facilitating the capture of subtle patterns and interactions that are difficult to recognize by traditional methods, which may benefit the development of more robust ROP predictive models [ 11 – 13 ]. Despite the feasibility of ML validated by preliminary studies, there are still many challenges and unresolved issues in this field. To be specific, for many models constructed based on limited clinical characteristics, there is usually an ignorance of the potential impact of perinatal complications [e.g., bronchopulmonary dysplasia (BPD) and perinatal asphyxia] and metabolic factors. Meanwhile, there is still a lack of sufficient research to support the generalization capabilities of existing models across different populations, especially those who reside in unique environments (e.g., high-altitude hypoxia) [ 14 , 15 ]. The unique exposure to chronic hypoxia in high-altitude areas (e.g., Qinghai and Tibet) may foster distinct ROP risk characteristics, resulting in the uncertainty on the applicability of screening models developed for populations in plain areas. Therefore, to deal with current study gaps and improve health outcomes., it is critical to construct customized ML predictive models specifically for populations chronically exposed to hypoxia at high altitude To fill the aforementioned gaps, this study was conducted to develop and validate a ML-based ROP predictive model based on a retrospective cohort of premature infants at high altitude. Our study indented to: 1) identify independent risk factors for ROP and construct a concise and efficient feature set; and 2) systematically evaluate and compare the predictive performance of multiple ML algorithms on this dataset to determine the optimal approach. This study hypothesized that a ML model integrating multidimensional clinical data with specific environmental exposure data can achieve early and precise identification of infants at high risk for ROP. Our findings are expected to provide a data-driven tool for optimizing screening strategies and enabling timely intervention, thus improving visual prognosis in this population. 2. Materials and Methods Study objects As a retrospective cohort study, it was conducted with the enrollment of preterm infants hospitalized in the Neonatology Department of Qinghai Red Cross Hospital between May 2014 and May 2025. All these infants were incorporated for undergoing standardized ROP fundus screening using the Retcam 3 wide-field digital imaging system. The diagnosis of ROP followed the diagnostic criteria adhering to the Guidelines for Screening ROP in China in 2014 [ 16 ]. Inclusion criteria: 1) GA < 37 weeks; 2) completion of the ROP screening protocol; and 3) complete clinical records. Exclusion criteria: 1) severe congenital deformity or chromosomal abnormalities; 2) other significant ocular pathologies interfering with ROP assessment; 3) missing data for key variables (e.g., oxygen therapy duration, BW); and 4) death or voluntary discharge before screening via initial fundus. Clinical data collection This study retrospectively collected data from the electronic medical record system of Qinghai Red Cross Hospital, with the variables including baseline clinical information (e.g., gender, GA, BW, and Apgar scores), perinatal factors (e.g., premature rupture of membranes, and antenatal steroid use), treatment status (e.g., respiratory support, nutritional strategies, and blood transfusion), related complications involving respiratory (e.g., BPD, and RDS), circulatory (e.g., PDA, and PPHN), neurological (e.g., ICH), infectious (e.g., sepsis), and metabolic disorders, and laboratory parameters [complete blood count, liver and kidney function, inflammatory markers (procalcitonin, PCT), metabolites (lactate, and 25(OH)D 3 ), and blood gas analysis parameters]. Data preprocessing and feature screening Based on ROP status (dependent variable, binary classification), the dataset was stratified and sampled at a 7:3 ratio to form a training set (n = 1,470) and a testing set (n = 668) [ 17 ]. The training set-based feature selection is shown as follows: (1) Univariate analysis was performed on 59 indicators to preliminarily screen potential predictors. Specifically, following the testing for data normality, continuous variables are presented as mean ± standard deviation when normally distributed, and as median (interquartile range) when non-normally distributed. Differences between groups were assessed using independent samples t‑test for normally distributed data and the Mann‑Whitney U test for non‑normally distributed data. Categorical variables were compared using the chi‑square test or Fisher’s exact test, depending on the expected cell counts. (2) Relevant variables were further optimized by integrating clinical evaluation with multilinear tests (VIF < 5). (3) Automated feature selection was performed using Least Absolute Shrinkage and Selection Operator (LASSO) regression. In this analysis, the optimal value of the penalty parameter λ was determined by employing 10-fold cross-validation. The final model was chosen based on the “one standard-deviation rule” (λ. 1se) to enhance the simplicity and generalization of the constructed model under the premise of maintaining performance [ 18 , 19 ]. Missing data imputation and class imbalance handling Missing data imputation A separate handling strategy was adopted to process the missing data in the training and testing sets to prevent data leakage and ensure independence between model training and evaluation. First, variables (including repeated laboratory measures and non-standardized data entries) with a missing rate of > 30% in the training set were excluded from the study. Then, for the remaining variables with a missing rate ≤ 30%, multiple imputation with the Chain Equations (MICE) algorithm [ 20 ] was applied using the R software package MICE (version 3.14.0). For imputation, Predictive Mean Matching (PMM) and logistic regression were respectively used for continuous variables and binary variables. In terms of the selection of PMM, it can preserve the original distribution characteristics of variables; is applicable for non-normally distributed data; and can ensure imputed values within reasonable ranges. As for the setting of critical parameters, the maximum iterations was 50, and random seed was 42 to generate 5 complete datasets. To strictly prevent data leakage, the imputation should follow several steps below: (1) The imputation model parameters of the training set were designed based on the data estimation of the training set; (2) The trained imputation model was applied to the testing set via the newdata parameter of the mice.mids() function; and (3) the analysis results from the five imputed datasets were integrated using the Rubin’s rules. According to the analysis of missing data patterns in the training set, the median missing rate was 8.2% (range: 2.1%–28.7%) and 5.6% (range: 0.3%–22.4%) for the continuous and categorical variables, respectively. with the highest missing rates for PCT (28.7%) and 25(OH)D (24.3%). Class imbalance handling In this study, the class imbalance of ROP samples (11.8% of the positive samples) in the training set was processed by the synthetic minority over-sampling technique (SMOTE) [ 21 ]. SMOTE was strictly applied after feature selection and before model training on the training set. To be specific, the initial step was the extraction of feature subsets correspondent to 11 screened key features (n = 1,470, including 173 ROP positive cases and 1,297 negative cases) using LASSO. Then, SMOTE was performed using Python’s imbalanced-learn library (version 0.10.1) with parameters of sampling_strategy=‘auto’ (1:1 balance), k_neighbors = 5, random_state = 42. After that, minority class samples in feature space were subjected to linear interpolation using SMOTE to generate synthetic samples, with the formula x_new = x_i + λ×(x_neighbor - x_i), where λ ~ U(0,1). The sample size of the training set was increased to 2,594 cases (1,297 ROP-positive and 1,297 ROP-negative cases) after SMOTE processing, achieving a perfect class balance. However, the testing set [n = 668, including 68 (10.2%) ROP-positive cases] retained its original imbalanced distribution to authentically reveal clinical scenarios and ensure ecological validity in model performance evaluation [ 22 ]. This strategy adhered to fundamental ML principles that the testing set can be performed with preprocessing optimization, while the testing set must maintain true distributions, thereby preventing risks of data leakage and performance overestimation. Model evaluation and statistical test Both internal and external validations were employed jointly to comprehensively assess the model performance. Specifically, the internal validation was performed using five-fold cross-validation on the training set. A stratified sampling strategy was adopted in this process, which was implemented via the StratifiedKFold function in the scikit-learn library, with parameters of n_splits = 5, shuffle = True, and random_state = 42. It aimed to ensure the maintenance of the original positive-to-negative sample ratio in each split, and guarantee the reproducibility of the splitting process. Four folds were used for model training and the remaining 1 fold for validation in each round of cross-validation, with 5 times of repeated processing until all subsets had been used as the validation set once. Finally, the average area under the receiver operating characteristic curve (AUC) of all 5 rounds of validation was used for robust estimation of the model performance [ 23 , 24 ]. In particular, this method was highly suitable for addressing the class imbalance (where ROP positive samples accounted for 11% approximately) in this study. It could facilitate an effective reduction of the bias of performance evaluation and enhance the generalization capability of the constructed model. Moreover, the cross-validation results could be used to further guide the Bayesian hyperparameter optimization to inhibit model overfitting. The metrics for model evaluation were multi-dimensional, involving discriminative performance indicators, prediction calibration metrics (i.e., the Brier score) that enabled a quantification of the consistency between the model-predicted probabilities and the actual probabilities, and clinical utility metrics. The net benefit values at different decision thresholds could be calculated through decision curve analysis, thereby achieving an evaluation of the potential value of the model in clinical settings [ 25 – 27 ]. In addition, all quantitative and categorical data were described by appropriate descriptive statistics, with differences in AUC performance across models on the testing set compared for significance using the DeLong test. ML In this study, nine representative ML models covering different modeling approaches were constructed and compared to systematically evaluate and identify the optimal algorithm for the ROP prediction task. The specific models were gradient boosting frameworks (XGBoost, and LightGBM), kernel-based methods (SVM), linear models (logistic regression), tree-based models (decision trees, random forest), probabilistic models (GaussianNB), and deep learning models (MLP, TabNet). To ensure fairness and objectivity of performance comparison, all models were independently trained and tuned on the same training set. In order to avoid overfitting and optimize model performance, this study applied the Bayesian hyperparameter optimization based on five-fold cross-validatio. Specifically and firstly, the Gaussian process was utilized to construct a surrogate model, with the average AUC value from the five-fold cross-validation as the objective function. Secondly, evaluation points with the greatest potential in the parameter space were intelligently determined by functions like expected improvement. The optimization was achieved using the BayesSearchCV from the scikit-optimize library (v0.9.0) [ 28 – 30 ]. Table S2 summarized the specific optimal hyperparameter combinations of all models after Bayesian optimization. In addition, to objectively evaluate the model’s generalization performance, a completely independent testing set was used for the external validation of the final model. This study adopted a comprehensive evaluation metric system, including the AUC and its 95% confidence interval (95% CI), accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score. The systematic modeling framework spanned from data preprocessing, feature engineering, model training, and rigorous validation, which could effectively ensure the reliability of performance comparisons in different models, and guarantee the reproducibility of the entire study process and results [ 31 ]. Data processing and statistical analysis All statistical analyses and predictive model construction were conducted within the specific computational environment. The R software (version 4.2.1) was used for variable selection, univariate analysis, and LASSO regression. In addition, Python libraries (e.g., scikit-learn, XGBoost, and LightGBM) on the Anaconda Navigator platform was accessed to conduct ML model training, optimization, and evaluation. 3. Results Comparison of baseline characteristics To assess the balance of the two datasets after stratified sampling, In, this study systematically compared the baseline characteristic distributions between the training set (n = 1,470) and the testing set (n = 668), as presented in Table 1 . There were no statistically significant inter-group differences for the vast majority of variables (all P > 0.05), confirming the validity of the sampling strategy. Specifically, the training and testing sets were observed with consistent distribution of all variables ultimately included in the predictive model, providing a reliable feature foundation for core predictive tasks. A good inter-group comparability was noticed for the frequency, proportion, central tendency, and dispersion degree of categorical variables (e.g., gender, maternal pregnancy complications, and neonatal complications) and continuous variables (e.g., GA, Apgar score, key laboratory indicators) (all P > 0.05). Meanwhile, some variables excluded from the model (e.g., necrotizing enterocolitis, BW, and oxygen saturation) with statistical differences (all P < 0.05), given that their effect sizes were limited and irrelevant to the model, did not affect the validity of subsequent modeling and validation. Collectively, the inter-set balanced distribution of key predictive variables established a solid basis for robust modeling and generalization capability assessment. Table 1 Baseline characteristics of the training and testing cohorts (key variables). Characteristic Training set (n = 1470) Testing set (n = 668) p value Male, n (%) 829(56.4) 367(54.9) 0.561 Altitude category, n (%) 0.915 Low (< 1500 m) 2(0.1) 1(0.1) Middle (1500ཞ2500 m) 798(54.3) 369(55.2) High (≥ 2500 m) 670(45.6) 298(44.6) ROP, n (%) 173(11.8) 68(10.2) 0.316 Predictors included in the final model Mode of delivery = Cesarean section, n (%) 1000(68) 455(68.1) 1 Premature rupture of membranes, n (%) 510(34.7) 238(35.6) 0.711 Respiratory failure, n (%) 708(48.2) 335(50.1) 0.421 Bronchopulmonary dysplasia, n (%) 208(14.1) 84(12.6) 0.36 Hyperbilirubinemia, n (%) 963(65.5) 428(64.1) 0.55 Perinatal asphyxia, n (%) 283(19.3) 108(16.2) 0.099 Pulmonary surfactant administration, n (%) 344(23.4) 163(24.4) 0.653 Gestational age, weeks, median (IQR) 34.14(32.43, 35.43) 34.29(32.57, 35.43) 0.339 Parenteral nutrition duration, days, median (IQR) 16.00(11.00, 25.00) 15.00(11.00, 23.00) 0.392 Fasting duration, days, median (IQR) 1.00(0.00, 2.00) 1.00(0.00, 2.00) 0.228 Total bile acids, µmol/L, median (IQR) 9.10(6.00, 13.60) 9.35(6.10, 14.30) 0.233 Footnote : Data are presented as n (%) for categorical variables and as median (IQR) for continuous variables (as appropriate). p values were calculated using the chi-square test or Fisher’s exact test for categorical variables and the independent-samples t-test or Mann–Whitney U test for continuous variables, depending on distribution. Abbreviations: ROP, retinopathy of prematurity. To systematically identify potential predictors of ROP, this study continued to compare the baseline characteristics between the ROP lesion group (n=173) and the non-lesion group (n=1,297) within the training set, as shown in Table S1A and Table S1B. Based on the univariate analysis, there existed significant inter-group differences in multiple key indicators(all P <0.05). The analysis of primary risk factors showed significantly lower GA, BW, and Apgar scores in the ROP group (all P <0.001), confirming that preterm birth, low BW, and poor birth status were core ROP risk factors. There were significant differences in complications and therapeutic interventions, with the ROP group exhibiting significantly higher proportion of BPD and neonatal resuscitation ( P <0.001). Patients in the ROP group were also observed with longer oxygen therapy duration, higher inspired oxygen concentrations, and more frequent invasive respiratory support (all P <0.001), supporting an intimate association between oxygen therapy intensity and invasive respiratory support. Laboratory indicators exhibited characteristic patterns, with the ROP group presenting significantly elevated blood lactate and total bile acid (TBA) levels, as well as obviously lower nutritional markers such as albumin and hemoglobin (all P <0.001). Noticeably, 25(OH)D levels were elevated in the ROP group ( P <0.001). In addition, some variables like gender ( P =0.035) were observed with statistical differences, yet with limited clinical significance. But there were no significant inter-group differences in gestational hypertension or patent ductus arteriosus. Collectively, based on the univariate analysis results in Table S1A and Table S1B, this study identified key clinical characteristics closely associated with ROP occurrence. These variables, which were multi-dimensional, involved preterm birth severity, BW, respiratory support intensity, neonatal complications, and metabolic indicators. Significantly, these variables with significant inter-group differences provided potentially candidate features with strong discrimination power for subsequent construction of ML-based predictive models. Feature screening based on LASSO regression Based on the training set (n=1,470), LASSO regression analysis was performed to determine the optimal penalty parameter (λ.1se=0.045) by applying 10-fold cross-validation. It achieved a successful selection of 11 independent predictors of ROP from the 38 initially screened significant variables (Table 2). Consequently, perinatal asphyxia (coefficient: 0.898), pulmonary surfactant administration (coefficient: 0.489), and bronchopulmonary dysplasia (coefficient: 0.404) were strong risk factors for ROP (all P<0.001). These results were strongly consistent with the oxygen-related pathophysiological mechanisms of ROP, collectively supporting the severity of respiratory disease was a key driver of ROP. In contrast, gestational age (coefficient: -0.157) was a remarkable protective factor with an odds ratio (OR) of 0.855 (95% CI: 0.812–0.901), indicating a reduced risk of ROP by approximately 14.5% for each additional week of GA. In addition, it had statistically significant positive association with total bile acids (coefficient: 0.002), despite small coefficients (P=0.013), suggesting the potential of hepatic-biliary metabolic disorder as a weak yet independent risk for ROP. While further investigation should be scheduled to clarify the effects of other protective factors such as mode of delivery (coefficient: -0.200) and respiratory failure (coefficient: -0.347). Altogether, these results confirmed the advantages of LASSO regression in efficient dimensionality reduction and feature screening, which could benefit subsequent construction of a streamlined and effective predictive model. Table 2. Predictors selected by LASSO regression for ROP risk modeling (training set; λ_1se = 0.045). Predictor Type Unit / Coding Coefficient Mode of Delivery Binary Vaginal delivery = 0; Cesarean section = 1 -0.199845 Premature Rupture of Membranes Binary Yes = 1; No = 0 0.129999 Respiratory Failure Binary Yes = 1; No = 0 -0.347105 Bronchopulmonary Dysplasia Binary Yes = 1; No = 0 0.403884 Perinatal Asphyxia Binary Yes = 1; No = 0 0.898120 Hyperbilirubinemia Binary Yes = 1; No = 0 0.323956 Pulmonary Surfactant Administration Binary Yes = 1; No = 0 0.489140 Gestational Age Continuous weeks -0.156957 Parenteral nutrition duration Continuous days 0.013350 Fasting Duration Continuous days 0.100320 Total Bile Acids Continuous μmol/L 0.002007 Footnote: Predictors were selected using LASSO regression at λ_1se. LASSO coefficients are shown for feature selection and do not directly represent odds ratios. Through feature screening based on LASSO regression, the predictive variables were streamlined from the initial 59 to 11 core factors, and model degrees of freedom reduced from 38 to 11. Eventually, it contributed to an effective control of the model complexity and mitigation of overfitting risks, while preserving key predictive information. Clinical evaluation further validated clear clinical validity possessed by all screened variables, with their pathophysiological significance aligning with established ROP pathogenesis. Collectively, our obtained feature set comprehensively covered multiple clinical dimensions of perinatal conditions, complications, therapeutic interventions, and laboratory indicators, laying an ideal feature foundation for the establishment of ML predictive models with high accuracy and strong interpretability. Model performance evaluation In this study, based on the systematic evaluation of these models on an independent testing set (Table 3), we continued to comprehensively compared the discriminative performance, calibration, and clinical utility of nine ML models). All models demonstrated strong discriminatory ability (AUC>0.82), with decision trees (AUC: 0.954), logistic regression (AUC: 0.940), and random forest (AUC: 0.933) validated to have the most outstanding performance. Table 3. Predictive performance of the machine-learning models in the training set (5-fold cross-validation) and the independent testing set. Model AUC (training, 5-fold CV AUC (testing, 95% CI) Accuracy Sensitivity Specificity PPV NPV F1-score Decision tree 0.912 0.954 (0.919–0.989) 0.891 0.794 0.902 0.478 0.975 0.597 Soft-voting ensemble 0.942 0.949 (0.912–0.986) 0.900 0.750 0.917 0.505 0.970 0.604 Logistic regression 0.901 0.940 (0.901–0.980) 0.850 0.824 0.853 0.389 0.977 0.528 Gaussian naive Bayes 0.865 0.937 (0.897–0.978) 0.861 0.765 0.872 0.403 0.970 0.528 RandomForest 0.960 0.933 (0.891–0.974) 0.918 0.691 0.943 0.580 0.964 0.631 Multilayer perceptron 0.920 0.909 (0.861–0.956) 0.906 0.691 0.930 0.528 0.964 0.599 XGBoost 0.972 0.906 (0.857–0.954) 0.916 0.662 0.945 0.577 0.961 0.616 LightGBM 0.979 0.868 (0.812–0.923) 0.925 0.603 0.962 0.641 0.955 0.621 Support vector machine 0.946 0.854 (0.796–0.912) 0.892 0.485 0.938 0.471 0.941 0.478 TabNet 0.996 0.820 (0.757–0.882) 0.886 0.574 0.922 0.453 0.950 0.506 Abbreviations: AUC, area under the receiver operating characteristic curve; CI, confidence interval; PPV, positive predictive value; NPV, negative predictive value. According to the results, the decision tree model had the highest AUC, and outstanding sensitivity (0.794) and negative predictive value (0.975). It was featured by a simple structure and high interpretability, which would be suitable for rare disease screening. Furthermore, as a classic medical predictive model, the logistic regression exhibited the highest sensitivity (0.824) and a negative predictive value of 0.977, providing a reliable performance baseline for model comparison. Additionally, the Random Forest revealed relatively lower sensitivity (0.691) and the highest F1-score (0.631), suggesting an optimal balance between precision and recall; and its high specificity (0.943) would minimize false positives, supporting its applicability for precise interventions in resource-constrained settings. Notably, some ensemble models and deep learning models (e.g., XGBoost, LightGBM, and TabNet) exhibited extremely high AUC (>0.97) on the training set after cross-validation. However, they yielded significant performance degradation on the independent testing set, with the AUC dropping to the range of 0.82–0.91, indicating potential risk of overfitting. In contrast, given their more stable performance across different datasets, decision trees, logistic regression, and random forest had superior generalization and greater potential for clinical employment. By applying a soft voting ensemble strategy, this study integrated the predictive strengths of multiple base models and construct a more robust ROP predictive model. This ensemble method was based on the weighted average of the output probabilities from each base model, with the equation shown as follows: Pfinal (class=i)=Σ(wk * Pk (class=i)) where Pfinal (class=i) represents the final probability of a sample belonging to Class I by the ensemble model; wk reveals the weight coefficient of the k th base model; and Pk (class=i) indicates the probability of a sample belonging to Class I by the k th base model. To reflect their corresponding relative discriminative capabilities, on the basis of the AUC performance of each model on the testing set, the weight coefficients were standardized to the decision tree with 0.337, logistic regression with 0.333, and random forest with 0.330. This ensemble model achieved outstanding overall performance on the independent testing set, with an AUC of 0.949, coupled with a good balance across all evaluation metrics (Table 3), suggesting excellent predictive robustness and clinical applicability potential. Furthermore, in order to validate the statistical significance of performance differences across all models, Delong’s test was applied for pairwise comparisons of AUC values on the testing set, with the results detailed in Table S3. It was found that: (1) There were significant performance gaps between the ensemble model (Voting) and certain base models: To be specific, there were extremely significant AUC differences between the ensemble model and relatively weaker models (e.g., TabNet, SVM, and LightGBM) were (all P <0.001). Taking TabNet as an example, the AUC difference between the ensemble model and TabNet was -0.129 (95% CI: -0.179 to -0.079, P <0.001), highlighting the performance advantage of the ensemble approach. (2) There were no significant differences between the core models and the ensemble model: No statistically significant AUC differences were observed between the three core models (decision tree, logistic regression, and random forest) and the ensemble model, with the difference in the comparison of decision tree, logistic regression, and random forest with Voting as 0.005 ( P =0.292), -0.009 ( P =0.108), and -0.016 ( P =0.003), respectively. (3) The discrimination capabilities among core models were highly similar: There were no significant differences in AUC among the three core models (e.g., decision tree vs. logistic regression, P =0.097), further validating their consistent discrimination performance. Using ROC, calibration and decision curves, this study further comprehensively evaluated the predictive performance and clinical applicability of the models. The ROC curves of each model on the testing set (Figure 1) visually indicated their discriminative capabilities, with superior overall performance identified for curves closer to the upper-left corner. The significant AUC values of some core models (e.g., random forest) were consistent with their high discriminative performance. As shown in Figure 2 regarding the calibration curves for each model, there was a certain degree of systematic probability overestimation, manifesting by a relatively lower localization of these calibration curves to the ideal diagonal line in all models. It might be attributed to the inconsistent prior distributions owing to balanced handling in the training set and low positive proportions in the testing set. The discriminatory performance of core models (e.g., random forest) remained outstanding, despite potential calibration bias, revealing their persistent value in risk ranking. The practical utility of models was subsequently assessed through clinical decision curves, as depicted in Figure 3. The net benefits of some core models (e.g., random forest) were consistently higher than those after two baseline strategies of “all intervention” and “no intervention” across a broad threshold range of 5%–40%, showing strong clinical applicability. Therefore, despite limitations in probability calibration, the risk stratification based on these models can still provide substantial support for clinical decision-making. In accordance with the aforementioned comprehensive analysis, although the decision tree model exhibited slightly higher AUC, the random forest had more balanced performance in terms of the F1-score (0.631) and the specificity (0.943). In addition, the random forest had superior generalization as an ensemble model than that of a single decision tree, with lower risk of overfitting. Therefore, following comprehensive consideration of discriminative capability, robustness, and clinical utility (referring to the DCA curve), the random forest was recommended to be the optimal model for predicting the risk of ROP in this study. Model interpretability analysis This study conducted an interpretability analysis on the random forest model with the optimal performance (AUC=0.933 and F1-score=0.631 for the testing set) via the game theory-based Shapley Additive exPlanations (SHAP) framework, with a purpose to reveal its feature contribution mechanisms and enhancing the clinical comprehensibility of the model. The analysis was implemented to precisely quantify the contribution degree of each feature to individual prediction outcomes by using Python’s shap package (version 0.41.0), with the efficient TreeSHAP algorithm targeted for ensemble tree models applied. Aiming at balancing the computational efficiency and result accuracy, SHAP value approximation was performed by randomly selecting 100 samples from the testing set to enhance analysis feasibility while ensuring statistical reliability. Figure 4a presents the results of the global feature importance analysis based on average absolute SHAP values. The top-6 features contributing the most significantly to the predictions of the random forest model were PN-D, RF, GA, PSA, HB, and PA in order. Moreover, this ranking was highly consistent with the clinical pathophysiological mechanisms of ROP. Consequently, it confirmed that the onset and progression of ROP might be significantly affected by respiratory complication severity, preterm maturity, and metabolic dysfunction. SHAP-based feature impact direction analysis further revealed intrinsic associations between key predictor variables and ROP risk(Figure 4b). Specifically, clinical conditions such as perinatal asphyxia (PA), PSA, and BPD , as well as lower GA and higher TBA levels, were all associated with positive SHAP values, confirming them as independent risk factors for ROP. Conversely, higher GA and lower TBA levels were associated with negative SHAP values, indicating clear protective effects. Critically, the feature importance ranking determined by SHAP analysis was highly consistent with the results of feature screening from LASSO regression (Kendall’s τ=0.89, P <0.001). It in turn methodologically cross-validated the robustness and clinical rationality of feature screening in this study. Discussion In this study, a ML model suitable for predicting the risk of ROP at high altitude was successfully constructed based on clinical data from preterm infants at Qinghai Red Cross Hospital. In terms of the key findings, firstly, 11 independent predictors from 59 candidate features were identified by LASSO regression, with duration of parenteral nutrition, respiratory failure, and low gestational age confirmed as the strongest risk factors for ROP. Second, the random forest model had superior predictive performance (AUC=0.933 for the testing set), with its discriminatory capability significantly superior to traditional screening based on GA and BW. Results in this study both extend and refine relevant evidence reported previously. This study not only reaffirmed the classic finding that low GA and low BW serve as core risk factors for ROP [32, 33], but also further quantified the protective effect of GA through a multivariate regression model (OR=0.855). More importantly, through rigorous feature screening, this study for the first time identified perinatal asphyxia (PA)and BPD as strong predictors independent of traditional risk factors in the population at high altitude. It is is of great pathophysiological significance to reveal the critical mechanistic role of severe neonatal pulmonary disease and accompanying oxygen-dynamic disturbances in ROP pathogenesis [34], thus advancing the academic emphasis on “oxygen therapy management as central” [35] from theoretical to clinically quantifiable assessment [36]. Altogether, there may be a unique ROP pathogenesis, featured by a “cardiopulmonary-retinal axis”, in high-altitude hypoxic environments, providing fresh insights for targeted intervention studies in the future. Nowadays, multiple predictive models have been developed based on routine clinical data, such as BW, GA, postnatal weight gain, days on oxygen, and number of blood transfusions. Common models are ROPScore, PW-ROP, WINROP, G-ROP, and DIGI-ROP [37]. In recent years, with the development of artificial intelligence technologies, the predictive models based on ML (e.g., XGBoost) and deep learning algorithms have shown superior predictive performance. Innovatively, it is the first to reveal that the superior discriminative performance of ML predictive models for ROP risk prediction in preterm infants at high altitude to that of traditional screening criteria. Second, the introduction of the SHAP interpretability framework can also achieve effective alignment between predictive results and clinical knowledge, in addition to revealing the nonlinear influencing patterns of key predictors [38, 39]. It may provide a methodological foundation for transforming complex models into reliable decision-support tools trusted by clinicians. Findings in our study have clear potential for clinical translation. For example, it may benefit clinicians’ early risk stratification management on preterm infants, and intensify clinical monitoring and intervention for high-risk patients (e.g., those complicated with PA or BPD). Eventually, it can optimize the allocation of screened resources, and can also enhance the efficiency of ROP prevention and treatment. However, this study still has several limitations that warrant consideration. First, there might be a risk of selection bias due to the collection of samples from a single medical institution merely. Second, there was a lack of an external validation cohort, requiring further verification of the generalizability of the model across different geographic regions and population characteristics. Third, the data quality in this retrospective study might impact model training and feature screening. Fourth, as an analysis based on clinical macro-level data, this study has not yet conducted biological experiments for a thorough validation of the specific mechanisms by which identified biomarkers (e.g., TBA) contribute to the onset and progression of ROP. Therefore, to systematically enhance the generalization and biological interpretability of the constructed models, multicenter prospective cohort validation is recommended to be conducted with standardized data collection procedures, and molecular biology experiments. In view of the findings and limitations of this study, in the future, relevant investigation can be advanced in three dimensions as follows. Firstly and preferentially, the generalization of models across different populations and settings should be assessed rigorously by large-scale and multicenter prospective cohort validation studies, which is an indispensable step for clinical translation. Second, with further requirement of overcoming the limitations of existing static clinical data, there is a need to explore the integration of continuous dynamic physiological monitoring data (e.g., blood oxygen fluctuations, and heart rate variability) into the model to enhance predictive timeliness and sensitivity. Finally, subsequent studies of mechanisms focusing on identified key risk biomarkers (e.g., TBA) should be conducted by integrating molecular biology techniques, thereby elucidating their pathways in the onset and progression of ROP. Eventually, the precision and individualization of ROP prevention and treatment may be advanced significantly through organic combination of validation, optimization, and mechanism exploration. Conclusion To sum up, this study successfully constructs a risk predictive model for ROP in preterm infants at high altitude using ML techniques. It provides reliable evidence-based support for precision screening and targeted intervention implementation by accurately identifying high-risk populations for ROP using ensemble learning algorithms. Despite inherent limitations in single-center retrospective analysis, this study is the first to systematically construct an ROP predictive model in the high-altitude hypoxic environment.Beyond offering clinicians with a practical risk assessment tool, it also supplies important methodological references and translational insights for ROP prevention and treatment strategy optimization in unique geographical environments. This model holds promise for demonstrating its clinical value in broader populations on the basis of further validation and improvement via multicenter prospective studies. Declarations Funding This study was supported by the Qinghai Province Kunlun Talents Program (High-end Innovation and Entrepreneurship Talent – Leading Talent Project; grant number QHKLYC-GDCXCY-2020-202). The funder was involved in the study design, data collection, analysis, decision to publish, and manuscript review. Conflicts of interest/Competing interests XL, QD, YY, YZ, XY, NS, XM, YG, YM, and YH declare that they have no competing interests. Availability of data and material The datasets used and/or analysed during the current study are not publicly available due to ethical and privacy restrictions, but are available from the corresponding author on reasonable request and with permission of the institutional ethics committee (and, where applicable, a data use agreement). Authors' contributions XL conceived and designed the study and oversaw the overall project. YY developed the research protocol, collected and analyzed the data, and drafted the manuscript. YZ collected the data, performed the statistical analysis, and contributed to manuscript writing. QD reviewed the study protocol, coordinated the study progress, and critically reviewed the manuscript. YM and HY collected the data. NS and XM assisted with data collection and resource allocation. YX and YG assisted in developing the research protocol and contributed to data analysis. All authors read and approved the final manuscript. Ethics approval This study was officially approved by the Ethics Committee of Qinghai Red Cross Hospital (Approval No.: KY-2025-13). All procedures were conducted in accordance with the Declaration of Helsinki and relevant national regulations. Consent to participate Given the retrospective nature of this study, the requirement for informed consent from the parents or legal guardians of the participating infants was waived by the Ethics Committee of Qinghai Red Cross Hospital. This waiver was granted in accordance with Chinese national regulations governing biomedical research involving humans (《涉及人的生物医学研究伦理审查办法》, 2016; 《涉及人的生命科学和医学研究伦理审查办法》, 2023), based on the following justifications: (1) the study involved minimal risk as it was a retrospective analysis of de-identified medical records; (2) obtaining informed cosennt was impracticable due to the large sample size and loss of contact with participants after hospital discharge; and (3) strict measures were implemented to protect patient privacy and data confidentiality, including complete de-identification of all personal information. Consent for publication Not applicable. The manuscript does not contain any individual person’s data in any form (including individual details, images or videos). References Cakir B, et al. Thrombocytopenia is associated with severe retinopathy of prematurity. JCI Insight. 2018;3(19):e124238. Bohley M, et al. A single intravenous injection of cyclosporin A-loaded lipid nanocapsules prevents retinopathy of prematurity. Sci Adv. 2022;8(38):eabo6638. Moin M, et al. Severe ROP rate and assessment of the burden of ROP screening at a single tertiary care public hospital in Pakistan. BMC Ophthalmol. 2025;25(1):594. García H, et al. Global prevalence and severity of retinopathy of prematurity over the last four decades (1985–2021): a systematic review and meta-analysis. Arch Med Res. 2024;55(2):102967. Gundlach BS, et al. Real-world visual outcomes of laser and anti-VEGF treatments for retinopathy of prematurity. Am J Ophthalmol. 2022;238:86–96. Trzcionkowska K, Schalij-Delfos NE, van den Akker-van Marle EME. Cost reduction in screening for retinopathy of prematurity in the Netherlands by comparing different screening strategies. Acta Ophthalmol. 2023;101(1):81–90. Dogra MR, Vinekar A. Role of anti-vascular endothelial growth factor (anti-VEGF) in the treatment of retinopathy of prematurity: a narrative review in the context of middle-income countries. Pediatr Health Med Ther. 2023;14:59–69. Tsai AS, et al. Assessment and management of retinopathy of prematurity in the era of anti-vascular endothelial growth factor (VEGF). Prog Retin Eye Res. 2022;88:101018. Bishnoi K, et al. A narrative review on managing retinopathy of prematurity: insights into pathogenesis, screening, and treatment strategies. Cureus. 2024;16(3):e56168. Shah PK, et al. Retinopathy of prematurity: past, present and future. World J Clin Pediatr. 2016;5(1):35–46. Chen BH. Minimum standards for evaluating machine-learned models of high-dimensional data. Front Aging. 2022;3:901841. Su R, et al. Genomic selection in pig breeding: comparative analysis of machine learning algorithms. Genet Sel Evol. 2025;57(1):13. Ali M, Aittokallio T. Machine learning and feature selection for drug response prediction in precision oncology applications. Biophys Rev. 2019;11(1):31–9. Vannuccini S, et al. Infertility and reproductive disorders: impact of hormonal and inflammatory mechanisms on pregnancy outcome. Hum Reprod Update. 2016;22(1):104–15. Green EA, et al. The role of the interleukin-1 family in complications of prematurity. Int J Mol Sci. 2023;24(3):2815. Fundus Diseases Group in Ophthalmology Branch of Chinese Medical Association. Guidelines for screening retinopathy of prematurity in China in 2014. Chin J Ophthalmol. 2014;50(12):933–5. López-Rueda A, et al. Enhancing mortality prediction in patients with spontaneous intracerebral hemorrhage: radiomics and supervised machine learning on non-contrast computed tomography. Eur J Radiol Open. 2024;13:100618. Saberzadeh-Ardestani B, et al. Immune marker spatial distribution and clinical outcome after PD-1 blockade in mismatch repair-deficient, advanced colorectal carcinomas. Clin Cancer Res. 2023;29(20):4268–77. Yu Z. Data-driven discovery of core sleep biomarkers for predicting early cardiometabolic risk in a healthy population using machine learning. medRxiv. 2025. Chekole B, et al. Survival status and predictors of mortality among HIV-positive children initiated antiretroviral therapy in Bahir Dar town public health facilities Amhara region, Ethiopia, 2020. SAGE Open Med. 2022;10:20503121211069477. Xie C, et al. Effect of machine learning re-sampling techniques for imbalanced datasets in (18)F-FDG PET-based radiomics model on prognostication performance in cohorts of head and neck cancer patients. Eur J Nucl Med Mol Imaging. 2020;47(12):2826–35. Demircioğlu A. Applying oversampling before cross-validation will lead to high bias in radiomics. Sci Rep. 2024;14(1):11563. Lin CY, et al. Machine learning-based prediction of three-year heart failure and mortality after premature ventricular contraction ablation. Diagnostics (Basel). 2025;15(21):10281. Elshewey AM, et al. DDoS classification of network traffic in software defined networking SDN using a hybrid convolutional and gated recurrent neural network. Sci Rep. 2025;15(1):29122. Zhou W, et al. Predicting central lymph node metastasis in papillary thyroid microcarcinoma: a breakthrough with interpretable machine learning. Front Endocrinol (Lausanne). 2025;16:1537386. Zhan M, et al. Application of artificial intelligence in conjunction with clinical laboratory indicators to aid decision-making for surgical or conservative treatment of pediatric intestinal obstruction. World J Pediatr Surg. 2025;8(5):e001079. Han H et al. Development of an interpretable machine learning model to predict short-term bleeding risk in patients receiving dual antithrombotic therapy following cardiac surgery. Int J Clin Pharm. 2025. Heidari P, Milan A. Combining K-fold cross validation with bayesian hyperparameter optimization for accuracy enhancement of land cover and land use classification. Sci Rep. 2025;15(1):39758. Huang H, et al. Predicting rheological properties of asphalt modified with mineral powder: bagging, boosting, and stacking vs. single machine learning models. Mater (Basel). 2025;18(12):2985. Ruiz Sarrias O, et al. Predicting severe haematological toxicity in gastrointestinal cancer patients undergoing 5-FU-based chemotherapy: a Bayesian network approach. Cancers (Basel). 2023;15(17):4278. Kang BY, et al. Serum calcium-based interpretable machine learning model for predicting anastomotic leakage after rectal cancer resection: a multi-center study. World J Gastroenterol. 2025;31(19):105283. Yucel OE, et al. Incidence and risk factors for retinopathy of prematurity in premature, extremely low birth weight and extremely low gestational age infants. BMC Ophthalmol. 2022;22(1):367. Zhang H, et al. Risk factors for retinopathy of prematurity among preterm infants with bronchopulmonary dysplasia. J Matern Fetal Neonatal Med. 2025;38(1):2497058. Wickramasinghe LC, et al. Lung and eye disease develop concurrently in supplemental oxygen-exposed neonatal mice. Am J Pathol. 2020;190(9):1801–12. Rashidian P, Karami S, Salehi SA. A review on retinopathy of prematurity. Med Hypothesis Discov Innov Ophthalmol. 2024;13(4):201–12. Woods J, Biswas S. Retinopathy of prematurity: from oxygen management to molecular manipulation. Mol Cell Pediatr. 2023;10(1):12. Hutchinson AK, et al. Clinical models and algorithms for the prediction of retinopathy of prematurity: a report by the American Academy of Ophthalmology. Ophthalmology. 2016;123(4):804–16. Shin DR, Song IH, Lee SK. Interpretable QSAR modelling for immunotoxicity prediction using enhanced fingerprint and SHAP-based feature selection. SAR QSAR Environ Res. 2025;36(10):955–69. Raptis S, Ilioudis C, Theodorou K. From pixels to prognosis: unveiling radiomics models with SHAP and LIME for enhanced interpretability. Biomed Phys Eng Express. 2024;10(3):035022. Additional Declarations No competing interests reported. Supplementary Files SupplementaryFiles.docx Cite Share Download PDF Status: Published Journal Publication published 07 Apr, 2026 Read the published version in BMC Ophthalmology → Version 1 posted Editorial decision: Revision requested 27 Feb, 2026 Reviews received at journal 14 Feb, 2026 Reviewers agreed at journal 11 Feb, 2026 Reviews received at journal 03 Feb, 2026 Reviewers agreed at journal 15 Jan, 2026 Reviewers invited by journal 13 Jan, 2026 Editor assigned by journal 13 Jan, 2026 Editor invited by journal 12 Jan, 2026 Submission checks completed at journal 09 Jan, 2026 First submitted to journal 09 Jan, 2026 You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-8533267","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":574235010,"identity":"6d695737-2134-4e2c-aa1f-ba8aa737736f","order_by":0,"name":"Yang Yu","email":"","orcid":"","institution":"Qinghai University","correspondingAuthor":false,"prefix":"","firstName":"Yang","middleName":"","lastName":"Yu","suffix":""},{"id":574235011,"identity":"04951b53-841e-4192-a577-4682eea4d114","order_by":1,"name":"Yunjie Zhang","email":"","orcid":"","institution":"Qinghai Red Cross Hospital","correspondingAuthor":false,"prefix":"","firstName":"Yunjie","middleName":"","lastName":"Zhang","suffix":""},{"id":574235012,"identity":"d2fcb7e9-b7df-42f7-b286-c7e967c1778d","order_by":2,"name":"Yuanfang Xin","email":"","orcid":"","institution":"Qinghai Red Cross Hospital","correspondingAuthor":false,"prefix":"","firstName":"Yuanfang","middleName":"","lastName":"Xin","suffix":""},{"id":574235013,"identity":"96ef1123-a7ea-4d5f-b803-9f4b00d35ad3","order_by":3,"name":"Nancuo Suo","email":"","orcid":"","institution":"Qinghai Red Cross Hospital","correspondingAuthor":false,"prefix":"","firstName":"Nancuo","middleName":"","lastName":"Suo","suffix":""},{"id":574235014,"identity":"38384417-c93d-4e11-a656-a180c70d64c1","order_by":4,"name":"Xueren Ma","email":"","orcid":"","institution":"Qinghai Red Cross Hospital","correspondingAuthor":false,"prefix":"","firstName":"Xueren","middleName":"","lastName":"Ma","suffix":""},{"id":574235015,"identity":"032e91bc-f824-47d3-a300-b28dfe6317c9","order_by":5,"name":"Yumei Guan","email":"","orcid":"","institution":"Qinghai Red Cross Hospital","correspondingAuthor":false,"prefix":"","firstName":"Yumei","middleName":"","lastName":"Guan","suffix":""},{"id":574235016,"identity":"43d02de6-a29c-4d07-8b95-59dbbd9d27a5","order_by":6,"name":"Yingying Ma","email":"","orcid":"","institution":"Qinghai Red Cross Hospital","correspondingAuthor":false,"prefix":"","firstName":"Yingying","middleName":"","lastName":"Ma","suffix":""},{"id":574235017,"identity":"d61700d2-730d-427d-98be-ed78540ae952","order_by":7,"name":"Hui Yu","email":"","orcid":"","institution":"Qinghai Red Cross Hospital","correspondingAuthor":false,"prefix":"","firstName":"Hui","middleName":"","lastName":"Yu","suffix":""},{"id":574235018,"identity":"15ad15ba-2787-4854-bba5-406f20decec9","order_by":8,"name":"Qiuxia Dong","email":"","orcid":"","institution":"Qinghai Red Cross Hospital","correspondingAuthor":false,"prefix":"","firstName":"Qiuxia","middleName":"","lastName":"Dong","suffix":""},{"id":574235019,"identity":"b805141c-185f-425c-ba65-87cdf39b15b9","order_by":9,"name":"Xinzhang Li","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA/klEQVRIie3PMUsDMRTA8XcE4vJqHO9A8CukFHoI1X6VyEEnB8cbI4Wb/ADnhxBuEtze8cApeGuhDkrBUVpcb7CtSLfcjYL5E0iG/EgeQCj0BzsWALRfoN5XbT5BpayfyANBodHNTpOSOsjPtifyZFDwRFvTQY5wyDft61mqHp4hdg1qoGi9ufZ9TBou8WP4VH7O3nS+xFRYkdw/+oggxpijakGpNm6J55akGHhJZBk1T7dkHNfFC2oyXWT3iuGrqrkbJ7cF9SHbWZA4qxYyG4HLMCnruXcWpdzoC1u+qBquV5BfTpWa1+uNhxyKze8psn3u796jnhdDoVDo3/UNp5ZXBB0dJIQAAAAASUVORK5CYII=","orcid":"","institution":"Qinghai Red Cross Hospital","correspondingAuthor":true,"prefix":"","firstName":"Xinzhang","middleName":"","lastName":"Li","suffix":""}],"badges":[],"createdAt":"2026-01-06 16:08:19","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-8533267/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8533267/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1186/s12886-026-04798-6","type":"published","date":"2026-04-07T15:57:45+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":100413436,"identity":"d822d722-8c8d-4764-94d9-2ffd5d7e8912","added_by":"auto","created_at":"2026-01-16 13:17:24","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":642979,"visible":true,"origin":"","legend":"","description":"","filename":"MLofROP.docx","url":"https://assets-eu.researchsquare.com/files/rs-8533267/v1/c8c29d431bb660f9bbd42fd3.docx"},{"id":100414096,"identity":"2af77d6f-e4ea-44d1-b1fd-05f35c02825f","added_by":"auto","created_at":"2026-01-16 13:18:48","extension":"json","order_by":1,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":12064,"visible":true,"origin":"","legend":"","description":"","filename":"b24ad371fcb14405ac465f67a15bceb8.json","url":"https://assets-eu.researchsquare.com/files/rs-8533267/v1/b52183eb91067706d903a9f6.json"},{"id":100414028,"identity":"9fd41ed3-41a2-495d-aa6c-5a3b75c5e504","added_by":"auto","created_at":"2026-01-16 13:18:35","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":235983,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryFiles.docx","url":"https://assets-eu.researchsquare.com/files/rs-8533267/v1/69e356cb9004ee5ba9689227.docx"},{"id":100413997,"identity":"df2f6bab-c744-4fde-98f2-aeb30a78fc85","added_by":"auto","created_at":"2026-01-16 13:18:34","extension":"xml","order_by":3,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":125883,"visible":true,"origin":"","legend":"","description":"","filename":"b24ad371fcb14405ac465f67a15bceb81enriched.xml","url":"https://assets-eu.researchsquare.com/files/rs-8533267/v1/1119b6001f15f48e28f6dcc9.xml"},{"id":100414033,"identity":"72d100a7-4e27-42f4-a86c-668a7f992619","added_by":"auto","created_at":"2026-01-16 13:18:35","extension":"png","order_by":4,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":172107,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-8533267/v1/bfe5f69846d0aaf10b90ad47.png"},{"id":100413674,"identity":"39c6067b-4491-4443-82cb-eb91a05e3d24","added_by":"auto","created_at":"2026-01-16 13:17:55","extension":"png","order_by":5,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":154808,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-8533267/v1/e8aa8afe33d10549208ca361.png"},{"id":100414027,"identity":"fc1c1658-14f3-452e-a41a-afb2675aab0e","added_by":"auto","created_at":"2026-01-16 13:18:35","extension":"png","order_by":6,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":113866,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-8533267/v1/96c4f2ad753a53ff818e6f57.png"},{"id":100413904,"identity":"f3bb0491-3359-42f9-81de-0b9f8b5c5f02","added_by":"auto","created_at":"2026-01-16 13:18:22","extension":"jpeg","order_by":7,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":3239330,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage4.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8533267/v1/95e14d3019ee2035906510a1.jpeg"},{"id":100413934,"identity":"199240a3-ba0b-4d92-bdad-1bb74c00eb7a","added_by":"auto","created_at":"2026-01-16 13:18:29","extension":"png","order_by":8,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":33945,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-8533267/v1/6fe7d059fbbd4a1f60a30079.png"},{"id":100413440,"identity":"116fa4e0-6e2e-414e-be4c-a42e976e2eb6","added_by":"auto","created_at":"2026-01-16 13:17:25","extension":"png","order_by":9,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":29532,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-8533267/v1/d88bd564d5db98052e0c3093.png"},{"id":100413726,"identity":"af763785-d83e-4cd2-b132-a21fb137dd9e","added_by":"auto","created_at":"2026-01-16 13:18:04","extension":"png","order_by":10,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":25602,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-8533267/v1/b435ef6f7657e0f705299755.png"},{"id":100413561,"identity":"96a247c0-ac9f-453f-ab0c-1061b5d6fd50","added_by":"auto","created_at":"2026-01-16 13:17:38","extension":"png","order_by":11,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":25669,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-8533267/v1/b1b880ee1e77aead5732acb8.png"},{"id":100413874,"identity":"c4f639f9-2e4e-417d-b3b4-34b227047279","added_by":"auto","created_at":"2026-01-16 13:18:21","extension":"xml","order_by":12,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":122255,"visible":true,"origin":"","legend":"","description":"","filename":"b24ad371fcb14405ac465f67a15bceb81structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-8533267/v1/ef74ba69dde49b23e3007713.xml"},{"id":100414026,"identity":"6ef63d6d-bc4a-4c6b-b148-888e3247092d","added_by":"auto","created_at":"2026-01-16 13:18:35","extension":"html","order_by":13,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":135140,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-8533267/v1/80d633c195168cccb98fa43b.html"},{"id":100413709,"identity":"223c31f4-6c5e-4d7c-bb70-c55bb9a9772a","added_by":"auto","created_at":"2026-01-16 13:18:03","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":150824,"visible":true,"origin":"","legend":"\u003cp\u003eReceiver operating characteristic (ROC) curves of the ten prediction models on the independent testing set (n = 668).Areas under the ROC curve (AUCs) are reported with 95% confidence intervals (CIs): logistic regression, 0.940 (0.917–0.960); decision tree, 0.954 (0.934–0.972); random forest, 0.933 (0.903–0.957); XGBoost, 0.906 (0.862–0.945); support vector machine (SVM), 0.854 (0.799–0.901); Gaussian naïve Bayes, 0.937 (0.915–0.957); multilayer perceptron (MLP), 0.909 (0.868–0.943); TabNet, 0.820 (0.755–0.876); LightGBM, 0.868 (0.808–0.918); and soft-voting ensemble, 0.949 (0.925–0.968).\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-8533267/v1/bbb3c5b450dbf528e7767f00.png"},{"id":100414112,"identity":"b21b85d6-2f21-4c6f-8aeb-9fe55de7982b","added_by":"auto","created_at":"2026-01-16 13:18:50","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":127146,"visible":true,"origin":"","legend":"\u003cp\u003eCalibration curves (reliability diagrams) of the four core models on the independent testing set (n = 668). The dashed diagonal line indicates perfect calibration (predicted probability equals observed probability). Curves show the relationship between the mean predicted probabilities and the corresponding observed event frequencies across probability bins. Models include logistic regression, decision tree, random forest, and the soft-voting ensemble. Overall, the curves deviate from the diagonal, indicating calibration bias in the predicted probabilities on the testing set.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-8533267/v1/062351c91cbf4641de459193.png"},{"id":100413531,"identity":"b80988dc-5b73-40dc-9237-6c85e6649e46","added_by":"auto","created_at":"2026-01-16 13:17:36","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":94398,"visible":true,"origin":"","legend":"\u003cp\u003eDecision curve analysis (DCA) for predicting retinopathy of prematurity (ROP) on the independent testing set (n = 668). Net benefit is plotted against the threshold probability. Curves represent logistic regression, decision tree, random forest, and the soft-voting ensemble, and are compared with the default strategies of treat-all and treat-none. Across clinically relevant thresholds, the models—particularly the random forest—show higher net benefit than the two default strategies, indicating potential clinical utility for risk stratification.\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-8533267/v1/3a901e69709235ba4a644443.png"},{"id":100413637,"identity":"b7ff9775-801e-4258-95e8-adda38f09790","added_by":"auto","created_at":"2026-01-16 13:17:52","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":135011,"visible":true,"origin":"","legend":"\u003cp\u003eSHAP-based interpretation of the optimal random forest model. (a) Global feature importance ranked by the mean absolute SHAP value. Larger values indicate stronger overall contributions to the model output. (b) SHAP summary (beeswarm) plot showing the distribution of SHAP values for each predictor. Each dot represents one subject; color encodes the feature value (blue: low, red: high). Positive SHAP values indicate an increased predicted risk of ROP, whereas negative values indicate a decreased predicted risk.\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-8533267/v1/62cef9f4ca6d19292bba1254.png"},{"id":106808816,"identity":"63151a45-238a-4c56-8964-790ead50e730","added_by":"auto","created_at":"2026-04-13 16:02:30","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1289493,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8533267/v1/a7489c5d-c95c-4bad-b670-34fb219179dd.pdf"},{"id":100414030,"identity":"6a79e69e-7dbc-49b2-9aa1-168159982071","added_by":"auto","created_at":"2026-01-16 13:18:35","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":235983,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryFiles.docx","url":"https://assets-eu.researchsquare.com/files/rs-8533267/v1/b08f801fc92ec5fd31771ca8.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Machine Learning-Based Individualized Prediction: Risk Assessment of Retinopathy in Preterm Infants at High Altitude","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eRetinopathy of prematurity (ROP) is a retinal neovascularization in premature and low birth weight (BW) infants with abnormal retinal vascular development as the core pathological feature, which may trigger traction retinal detachment, acting as one of the leading causes of childhood blindness [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. In general, retinal vessels are developed normally between 36 and 40 weeks of gestation. However, there may be a disrupted retinal vascularization in premature infants due to their early departure from the uterine environment, forming ischemic and hypoxic areas. It may further trigger abnormal proliferation of retinal neovascularization, hemorrhage, and fibrous tissue proliferation owing to the resultant excessive expression of vascular endothelial growth factor (VEGF), ultimately inducing retinal detachment and permanent blindness. According to a latest systematic review, the overall prevalence of ROP reaches 32% among screened premature infants worldwide, with severe ROP accounting for 7.5% [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Longitudinal study also documents that 50% of untreated threshold ROP cases progress to retinal detachment, 90% of which ultimately progress to blindness [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Moreover, cerebral palsy may have an incidence of 20\u0026ndash;30% in children with severe ROP, coupled with a prevalence of 40\u0026ndash;50% for cognitive impairment, which are significantly higher than those in ROP-free premature infants [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eNowadays, ROP management relies primarily on regular fundus screening to detect threshold lesions requiring timely laser photocoagulation or anti-VEGF drug therapy [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. However, this model faces dual challenges in practice. First, the rate of receiving standardized fundus screening is relatively low for high-risk premature infants in low- and middle-income countries, merely 20\u0026ndash;30%, much lower than the 90% set by the World Health Organization. Second, the widely applied screening criteria based on gestational age (GA) and BW provide broad coverage, which, however, produce a substantial screening burden, carrying risks of delayed diagnosis and missed cases, in addition to exacerbating the burden on clinical resources [\u003cspan additionalcitationids=\"CR7\" citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. In addition, the risk factors identified for ROP, such as low GA, low BW, and prolonged oxygen therapy, present with relatively limited predictive value for individualized risk stratification [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. It highlights an urgent need of adopting a more precise, data-driven risk-evaluation strategy to efficiently identify high-risk infants, thus optimizing the allocation of limited medical resources and ensuring timely interventions.\u003c/p\u003e \u003cp\u003eWith the emergence and application of machine learning (ML), we may acquire new opportunities to address all the above challenges. This technology enables the analysis of complex and high-dimensional clinical data, thus facilitating the capture of subtle patterns and interactions that are difficult to recognize by traditional methods, which may benefit the development of more robust ROP predictive models [\u003cspan additionalcitationids=\"CR12\" citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. Despite the feasibility of ML validated by preliminary studies, there are still many challenges and unresolved issues in this field. To be specific, for many models constructed based on limited clinical characteristics, there is usually an ignorance of the potential impact of perinatal complications [e.g., bronchopulmonary dysplasia (BPD) and perinatal asphyxia] and metabolic factors. Meanwhile, there is still a lack of sufficient research to support the generalization capabilities of existing models across different populations, especially those who reside in unique environments (e.g., high-altitude hypoxia) [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. The unique exposure to chronic hypoxia in high-altitude areas (e.g., Qinghai and Tibet) may foster distinct ROP risk characteristics, resulting in the uncertainty on the applicability of screening models developed for populations in plain areas. Therefore, to deal with current study gaps and improve health outcomes., it is critical to construct customized ML predictive models specifically for populations chronically exposed to hypoxia at high altitude\u003c/p\u003e \u003cp\u003eTo fill the aforementioned gaps, this study was conducted to develop and validate a ML-based ROP predictive model based on a retrospective cohort of premature infants at high altitude. Our study indented to: 1) identify independent risk factors for ROP and construct a concise and efficient feature set; and 2) systematically evaluate and compare the predictive performance of multiple ML algorithms on this dataset to determine the optimal approach. This study hypothesized that a ML model integrating multidimensional clinical data with specific environmental exposure data can achieve early and precise identification of infants at high risk for ROP. Our findings are expected to provide a data-driven tool for optimizing screening strategies and enabling timely intervention, thus improving visual prognosis in this population.\u003c/p\u003e"},{"header":"2. Materials and Methods","content":"\u003cp\u003e \u003cb\u003eStudy objects\u003c/b\u003e \u003c/p\u003e \u003cp\u003eAs a retrospective cohort study, it was conducted with the enrollment of preterm infants hospitalized in the Neonatology Department of Qinghai Red Cross Hospital between May 2014 and May 2025. All these infants were incorporated for undergoing standardized ROP fundus screening using the Retcam 3 wide-field digital imaging system. The diagnosis of ROP followed the diagnostic criteria adhering to the Guidelines for Screening ROP in China in 2014 [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. Inclusion criteria: 1) GA\u0026thinsp;\u0026lt;\u0026thinsp;37 weeks; 2) completion of the ROP screening protocol; and 3) complete clinical records. Exclusion criteria: 1) severe congenital deformity or chromosomal abnormalities; 2) other significant ocular pathologies interfering with ROP assessment; 3) missing data for key variables (e.g., oxygen therapy duration, BW); and 4) death or voluntary discharge before screening via initial fundus.\u003c/p\u003e \u003cp\u003e \u003cb\u003eClinical data collection\u003c/b\u003e \u003c/p\u003e \u003cp\u003eThis study retrospectively collected data from the electronic medical record system of Qinghai Red Cross Hospital, with the variables including baseline clinical information (e.g., gender, GA, BW, and Apgar scores), perinatal factors (e.g., premature rupture of membranes, and antenatal steroid use), treatment status (e.g., respiratory support, nutritional strategies, and blood transfusion), related complications involving respiratory (e.g., BPD, and RDS), circulatory (e.g., PDA, and PPHN), neurological (e.g., ICH), infectious (e.g., sepsis), and metabolic disorders, and laboratory parameters [complete blood count, liver and kidney function, inflammatory markers (procalcitonin, PCT), metabolites (lactate, and 25(OH)D\u003csub\u003e3\u003c/sub\u003e), and blood gas analysis parameters].\u003c/p\u003e \u003cp\u003e \u003cb\u003eData preprocessing and feature screening\u003c/b\u003e \u003c/p\u003e \u003cp\u003eBased on ROP status (dependent variable, binary classification), the dataset was stratified and sampled at a 7:3 ratio to form a training set (n\u0026thinsp;=\u0026thinsp;1,470) and a testing set (n\u0026thinsp;=\u0026thinsp;668) [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. The training set-based feature selection is shown as follows:\u003c/p\u003e \u003cp\u003e(1) Univariate analysis was performed on 59 indicators to preliminarily screen potential predictors. Specifically, following the testing for data normality, continuous variables are presented as mean\u0026thinsp;\u0026plusmn;\u0026thinsp;standard deviation when normally distributed, and as median (interquartile range) when non-normally distributed. Differences between groups were assessed using independent samples t‑test for normally distributed data and the Mann‑Whitney U test for non‑normally distributed data. Categorical variables were compared using the chi‑square test or Fisher\u0026rsquo;s exact test, depending on the expected cell counts.\u003c/p\u003e \u003cp\u003e(2) Relevant variables were further optimized by integrating clinical evaluation with multilinear tests (VIF\u0026thinsp;\u0026lt;\u0026thinsp;5).\u003c/p\u003e \u003cp\u003e(3) Automated feature selection was performed using Least Absolute Shrinkage and Selection Operator (LASSO) regression.\u003c/p\u003e \u003cp\u003eIn this analysis, the optimal value of the penalty parameter λ was determined by employing 10-fold cross-validation. The final model was chosen based on the \u0026ldquo;one standard-deviation rule\u0026rdquo; (λ. 1se) to enhance the simplicity and generalization of the constructed model under the premise of maintaining performance [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e, \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003cb\u003eMissing data imputation and class imbalance handling\u003c/b\u003e \u003c/p\u003e \u003cp\u003eMissing data imputation\u003c/p\u003e \u003cp\u003eA separate handling strategy was adopted to process the missing data in the training and testing sets to prevent data leakage and ensure independence between model training and evaluation.\u003c/p\u003e \u003cp\u003eFirst, variables (including repeated laboratory measures and non-standardized data entries) with a missing rate of \u0026gt;\u0026thinsp;30% in the training set were excluded from the study. Then, for the remaining variables with a missing rate\u0026thinsp;\u0026le;\u0026thinsp;30%, multiple imputation with the Chain Equations (MICE) algorithm [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e] was applied using the R software package MICE (version 3.14.0).\u003c/p\u003e \u003cp\u003eFor imputation, Predictive Mean Matching (PMM) and logistic regression were respectively used for continuous variables and binary variables. In terms of the selection of PMM, it can preserve the original distribution characteristics of variables; is applicable for non-normally distributed data; and can ensure imputed values within reasonable ranges. As for the setting of critical parameters, the maximum iterations was 50, and random seed was 42 to generate 5 complete datasets.\u003c/p\u003e \u003cp\u003eTo strictly prevent data leakage, the imputation should follow several steps below: (1) The imputation model parameters of the training set were designed based on the data estimation of the training set; (2) The trained imputation model was applied to the testing set via the newdata parameter of the mice.mids() function; and (3) the analysis results from the five imputed datasets were integrated using the Rubin\u0026rsquo;s rules.\u003c/p\u003e \u003cp\u003eAccording to the analysis of missing data patterns in the training set, the median missing rate was 8.2% (range: 2.1%\u0026ndash;28.7%) and 5.6% (range: 0.3%\u0026ndash;22.4%) for the continuous and categorical variables, respectively. with the highest missing rates for PCT (28.7%) and 25(OH)D (24.3%).\u003c/p\u003e \u003cp\u003eClass imbalance handling\u003c/p\u003e \u003cp\u003eIn this study, the class imbalance of ROP samples (11.8% of the positive samples) in the training set was processed by the synthetic minority over-sampling technique (SMOTE) [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eSMOTE was strictly applied after feature selection and before model training on the training set. To be specific, the initial step was the extraction of feature subsets correspondent to 11 screened key features (n\u0026thinsp;=\u0026thinsp;1,470, including 173 ROP positive cases and 1,297 negative cases) using LASSO. Then, SMOTE was performed using Python\u0026rsquo;s imbalanced-learn library (version 0.10.1) with parameters of sampling_strategy=\u0026lsquo;auto\u0026rsquo; (1:1 balance), k_neighbors\u0026thinsp;=\u0026thinsp;5, random_state\u0026thinsp;=\u0026thinsp;42. After that, minority class samples in feature space were subjected to linear interpolation using SMOTE to generate synthetic samples, with the formula x_new\u0026thinsp;=\u0026thinsp;x_i\u0026thinsp;+\u0026thinsp;λ\u0026times;(x_neighbor - x_i), where λ\u0026thinsp;~\u0026thinsp;U(0,1).\u003c/p\u003e \u003cp\u003eThe sample size of the training set was increased to 2,594 cases (1,297 ROP-positive and 1,297 ROP-negative cases) after SMOTE processing, achieving a perfect class balance. However, the testing set [n\u0026thinsp;=\u0026thinsp;668, including 68 (10.2%) ROP-positive cases] retained its original imbalanced distribution to authentically reveal clinical scenarios and ensure ecological validity in model performance evaluation [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThis strategy adhered to fundamental ML principles that the testing set can be performed with preprocessing optimization, while the testing set must maintain true distributions, thereby preventing risks of data leakage and performance overestimation.\u003c/p\u003e \u003cp\u003e \u003cb\u003eModel evaluation and statistical test\u003c/b\u003e \u003c/p\u003e \u003cp\u003eBoth internal and external validations were employed jointly to comprehensively assess the model performance. Specifically, the internal validation was performed using five-fold cross-validation on the training set. A stratified sampling strategy was adopted in this process, which was implemented via the StratifiedKFold function in the scikit-learn library, with parameters of n_splits\u0026thinsp;=\u0026thinsp;5, shuffle\u0026thinsp;=\u0026thinsp;True, and random_state\u0026thinsp;=\u0026thinsp;42. It aimed to ensure the maintenance of the original positive-to-negative sample ratio in each split, and guarantee the reproducibility of the splitting process. Four folds were used for model training and the remaining 1 fold for validation in each round of cross-validation, with 5 times of repeated processing until all subsets had been used as the validation set once. Finally, the average area under the receiver operating characteristic curve (AUC) of all 5 rounds of validation was used for robust estimation of the model performance [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. In particular, this method was highly suitable for addressing the class imbalance (where ROP positive samples accounted for 11% approximately) in this study. It could facilitate an effective reduction of the bias of performance evaluation and enhance the generalization capability of the constructed model. Moreover, the cross-validation results could be used to further guide the Bayesian hyperparameter optimization to inhibit model overfitting.\u003c/p\u003e \u003cp\u003eThe metrics for model evaluation were multi-dimensional, involving discriminative performance indicators, prediction calibration metrics (i.e., the Brier score) that enabled a quantification of the consistency between the model-predicted probabilities and the actual probabilities, and clinical utility metrics. The net benefit values at different decision thresholds could be calculated through decision curve analysis, thereby achieving an evaluation of the potential value of the model in clinical settings [\u003cspan additionalcitationids=\"CR26\" citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. In addition, all quantitative and categorical data were described by appropriate descriptive statistics, with differences in AUC performance across models on the testing set compared for significance using the DeLong test.\u003c/p\u003e \u003cp\u003e \u003cb\u003eML\u003c/b\u003e \u003c/p\u003e \u003cp\u003eIn this study, nine representative ML models covering different modeling approaches were constructed and compared to systematically evaluate and identify the optimal algorithm for the ROP prediction task. The specific models were gradient boosting frameworks (XGBoost, and LightGBM), kernel-based methods (SVM), linear models (logistic regression), tree-based models (decision trees, random forest), probabilistic models (GaussianNB), and deep learning models (MLP, TabNet). To ensure fairness and objectivity of performance comparison, all models were independently trained and tuned on the same training set.\u003c/p\u003e \u003cp\u003eIn order to avoid overfitting and optimize model performance, this study applied the Bayesian hyperparameter optimization based on five-fold cross-validatio. Specifically and firstly, the Gaussian process was utilized to construct a surrogate model, with the average AUC value from the five-fold cross-validation as the objective function. Secondly, evaluation points with the greatest potential in the parameter space were intelligently determined by functions like expected improvement. The optimization was achieved using the BayesSearchCV from the scikit-optimize library (v0.9.0) [\u003cspan additionalcitationids=\"CR29\" citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]. Table S2 summarized the specific optimal hyperparameter combinations of all models after Bayesian optimization.\u003c/p\u003e \u003cp\u003eIn addition, to objectively evaluate the model\u0026rsquo;s generalization performance, a completely independent testing set was used for the external validation of the final model. This study adopted a comprehensive evaluation metric system, including the AUC and its 95% confidence interval (95% CI), accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score. The systematic modeling framework spanned from data preprocessing, feature engineering, model training, and rigorous validation, which could effectively ensure the reliability of performance comparisons in different models, and guarantee the reproducibility of the entire study process and results [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003cb\u003eData processing and statistical analysis\u003c/b\u003e \u003c/p\u003e \u003cp\u003eAll statistical analyses and predictive model construction were conducted within the specific computational environment. The R software (version 4.2.1) was used for variable selection, univariate analysis, and LASSO regression. In addition, Python libraries (e.g., scikit-learn, XGBoost, and LightGBM) on the Anaconda Navigator platform was accessed to conduct ML model training, optimization, and evaluation.\u003c/p\u003e"},{"header":"3. Results","content":"\u003cp\u003e \u003cb\u003eComparison of baseline characteristics\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTo assess the balance of the two datasets after stratified sampling, In, this study systematically compared the baseline characteristic distributions between the training set (n\u0026thinsp;=\u0026thinsp;1,470) and the testing set (n\u0026thinsp;=\u0026thinsp;668), as presented in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. There were no statistically significant inter-group differences for the vast majority of variables (all \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026gt;\u0026thinsp;0.05), confirming the validity of the sampling strategy. Specifically, the training and testing sets were observed with consistent distribution of all variables ultimately included in the predictive model, providing a reliable feature foundation for core predictive tasks. A good inter-group comparability was noticed for the frequency, proportion, central tendency, and dispersion degree of categorical variables (e.g., gender, maternal pregnancy complications, and neonatal complications) and continuous variables (e.g., GA, Apgar score, key laboratory indicators) (all \u003cem\u003eP\u0026thinsp;\u0026gt;\u003c/em\u003e\u0026thinsp;0.05). Meanwhile, some variables excluded from the model (e.g., necrotizing enterocolitis, BW, and oxygen saturation) with statistical differences (all \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05), given that their effect sizes were limited and irrelevant to the model, did not affect the validity of subsequent modeling and validation. Collectively, the inter-set balanced distribution of key predictive variables established a solid basis for robust modeling and generalization capability assessment.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eBaseline characteristics of the training and testing cohorts (key variables).\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCharacteristic\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTraining set (n\u0026thinsp;=\u0026thinsp;1470)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTesting set (n\u0026thinsp;=\u0026thinsp;668)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003ep value\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMale, n (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e829(56.4)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e367(54.9)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.561\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAltitude category, n (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.915\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLow (\u0026lt;\u0026thinsp;1500 m)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2(0.1)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1(0.1)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMiddle (1500ཞ2500 m)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e798(54.3)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e369(55.2)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHigh (\u0026ge;\u0026thinsp;2500 m)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e670(45.6)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e298(44.6)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eROP, n (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e173(11.8)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e68(10.2)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.316\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"4\" nameend=\"c4\" namest=\"c1\"\u003e \u003cp\u003ePredictors included in the final model\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMode of delivery\u0026thinsp;=\u0026thinsp;Cesarean section, n (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1000(68)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e455(68.1)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePremature rupture of membranes, n (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e510(34.7)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e238(35.6)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.711\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRespiratory failure, n (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e708(48.2)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e335(50.1)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.421\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBronchopulmonary dysplasia, n (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e208(14.1)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e84(12.6)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.36\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHyperbilirubinemia, n (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e963(65.5)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e428(64.1)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.55\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePerinatal asphyxia, n (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e283(19.3)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e108(16.2)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.099\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePulmonary surfactant administration, n (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e344(23.4)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e163(24.4)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.653\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGestational age, weeks, median (IQR)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e34.14(32.43, 35.43)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e34.29(32.57, 35.43)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.339\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eParenteral nutrition duration, days, median (IQR)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e16.00(11.00, 25.00)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e15.00(11.00, 23.00)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.392\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFasting duration, days, median (IQR)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1.00(0.00, 2.00)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.00(0.00, 2.00)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.228\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTotal bile acids, \u0026micro;mol/L, median (IQR)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e9.10(6.00, 13.60)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e9.35(6.10, 14.30)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.233\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"4\"\u003e\u003cb\u003eFootnote\u003c/b\u003e: Data are presented as n (%) for categorical variables and as median (IQR) for continuous variables (as appropriate). p values were calculated using the chi-square test or Fisher\u0026rsquo;s exact test for categorical variables and the independent-samples t-test or Mann\u0026ndash;Whitney U test for continuous variables, depending on distribution.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAbbreviations:\u0026nbsp;\u003c/strong\u003eROP, retinopathy of prematurity.\u003c/p\u003e\n\u003cp\u003eTo systematically identify potential predictors of ROP, this study continued to compare the baseline characteristics between the ROP lesion group (n=173) and the non-lesion group (n=1,297) within the training set, as shown in\u0026nbsp;Table S1A and Table S1B.\u003c/p\u003e\n\u003cp\u003eBased on the univariate analysis, there existed significant inter-group differences in multiple key indicators(all \u003cem\u003eP\u003c/em\u003e\u0026lt;0.05). The analysis of primary risk factors showed significantly lower GA, BW, and Apgar scores in the ROP group (all \u003cem\u003eP\u003c/em\u003e\u0026lt;0.001), confirming that preterm birth, low BW, and poor birth status were core ROP risk factors. There were significant differences in complications and therapeutic interventions, with \u0026nbsp;the ROP group exhibiting significantly higher proportion of BPD and neonatal resuscitation (\u003cem\u003eP\u003c/em\u003e\u0026lt;0.001). Patients in the ROP group were also observed with longer oxygen therapy duration, higher inspired oxygen concentrations, and more frequent invasive respiratory support (all \u003cem\u003eP\u003c/em\u003e\u0026lt;0.001), supporting an intimate association between oxygen therapy intensity and invasive respiratory support. Laboratory indicators exhibited characteristic patterns, with the ROP group presenting significantly elevated blood lactate and total bile acid (TBA) levels, as well as obviously lower nutritional markers such as albumin and hemoglobin (all \u003cem\u003eP\u003c/em\u003e\u0026lt;0.001). Noticeably, 25(OH)D levels were elevated in the ROP group (\u003cem\u003eP\u003c/em\u003e\u0026lt;0.001). In addition, some variables like gender (\u003cem\u003eP\u003c/em\u003e=0.035) were observed with statistical differences, yet with limited clinical significance. But there were no significant inter-group differences in gestational hypertension or patent ductus arteriosus.\u003c/p\u003e\n\u003cp\u003eCollectively, based on the univariate analysis results in Table S1A and Table S1B,\u0026nbsp;this study identified key clinical characteristics closely associated with ROP occurrence. These variables, which were multi-dimensional, involved preterm birth severity, BW, respiratory support intensity, neonatal complications, and metabolic indicators. Significantly, these variables with significant inter-group differences provided potentially candidate features with strong discrimination power for subsequent construction of ML-based predictive models.\u003c/p\u003e\n\u003ch2\u003eFeature screening based on LASSO regression\u003c/h2\u003e\n\u003cp\u003eBased on the training set (n=1,470), LASSO regression analysis was performed to determine the optimal penalty parameter (\u0026lambda;.1se=0.045) by applying 10-fold cross-validation. It achieved a successful selection of 11 independent predictors of ROP from the 38 initially screened significant variables (Table 2). Consequently, perinatal asphyxia (coefficient: 0.898), pulmonary surfactant administration (coefficient: 0.489), and bronchopulmonary dysplasia (coefficient: 0.404) were strong risk factors for ROP (all P\u0026lt;0.001). These results were strongly consistent with the oxygen-related pathophysiological mechanisms of ROP, collectively supporting the severity of respiratory disease was a key driver of ROP. In contrast, gestational age (coefficient: -0.157) was a remarkable protective factor with an odds ratio (OR) of 0.855 (95% CI: 0.812\u0026ndash;0.901), indicating a reduced risk of ROP by approximately 14.5% for each additional week of GA. In addition, it had statistically significant positive association with total bile acids (coefficient: 0.002), despite small coefficients (P=0.013), suggesting the potential of hepatic-biliary metabolic disorder as a weak yet independent risk for ROP. While further investigation should be scheduled to clarify the effects of other protective factors such as mode of delivery (coefficient: -0.200) and respiratory failure (coefficient: -0.347). Altogether, these results confirmed the advantages of LASSO regression in efficient dimensionality reduction and feature screening, which could benefit subsequent construction of a streamlined and effective predictive model.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 2.\u0026nbsp;\u003c/strong\u003ePredictors selected by LASSO regression for ROP risk modeling (training set; \u0026lambda;_1se = 0.045).\u003c/p\u003e\n\u003cdiv\u003e\n \u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"623\"\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 223px;\"\u003e\n \u003cp\u003e\u003cstrong\u003ePredictor\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 88px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eType\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 220px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eUnit / Coding\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 107px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eCoefficient\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 223px;\"\u003e\n \u003cp\u003eMode of Delivery\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 88px;\"\u003e\n \u003cp\u003eBinary\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 220px;\"\u003e\n \u003cp\u003eVaginal delivery = 0; Cesarean section = 1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 107px;\"\u003e\n \u003cp\u003e-0.199845\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 223px;\"\u003e\n \u003cp\u003ePremature Rupture of Membranes\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 88px;\"\u003e\n \u003cp\u003eBinary\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 220px;\"\u003e\n \u003cp\u003eYes = 1; No = 0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 107px;\"\u003e\n \u003cp\u003e0.129999\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 223px;\"\u003e\n \u003cp\u003eRespiratory Failure\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 88px;\"\u003e\n \u003cp\u003eBinary\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 220px;\"\u003e\n \u003cp\u003eYes = 1; No = 0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 107px;\"\u003e\n \u003cp\u003e-0.347105\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 223px;\"\u003e\n \u003cp\u003eBronchopulmonary Dysplasia\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 88px;\"\u003e\n \u003cp\u003eBinary\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 220px;\"\u003e\n \u003cp\u003eYes = 1; No = 0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 107px;\"\u003e\n \u003cp\u003e0.403884\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 223px;\"\u003e\n \u003cp\u003ePerinatal Asphyxia\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 88px;\"\u003e\n \u003cp\u003eBinary\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 220px;\"\u003e\n \u003cp\u003eYes = 1; No = 0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 107px;\"\u003e\n \u003cp\u003e0.898120\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 223px;\"\u003e\n \u003cp\u003eHyperbilirubinemia\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 88px;\"\u003e\n \u003cp\u003eBinary\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 220px;\"\u003e\n \u003cp\u003eYes = 1; No = 0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 107px;\"\u003e\n \u003cp\u003e0.323956\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 223px;\"\u003e\n \u003cp\u003ePulmonary Surfactant Administration\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 88px;\"\u003e\n \u003cp\u003eBinary\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 220px;\"\u003e\n \u003cp\u003eYes = 1; No = 0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 107px;\"\u003e\n \u003cp\u003e0.489140\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 223px;\"\u003e\n \u003cp\u003eGestational Age\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 88px;\"\u003e\n \u003cp\u003eContinuous\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 220px;\"\u003e\n \u003cp\u003eweeks\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 107px;\"\u003e\n \u003cp\u003e-0.156957\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 223px;\"\u003e\n \u003cp\u003eParenteral nutrition duration\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 88px;\"\u003e\n \u003cp\u003eContinuous\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 220px;\"\u003e\n \u003cp\u003edays\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 107px;\"\u003e\n \u003cp\u003e0.013350\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 223px;\"\u003e\n \u003cp\u003eFasting Duration\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 88px;\"\u003e\n \u003cp\u003eContinuous\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 220px;\"\u003e\n \u003cp\u003edays\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 107px;\"\u003e\n \u003cp\u003e0.100320\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 223px;\"\u003e\n \u003cp\u003eTotal Bile Acids\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 88px;\"\u003e\n \u003cp\u003eContinuous\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 220px;\"\u003e\n \u003cp\u003e\u0026mu;mol/L\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 107px;\"\u003e\n \u003cp\u003e0.002007\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003e\u003cstrong\u003eFootnote:\u0026nbsp;\u003c/strong\u003ePredictors were selected using LASSO regression at \u0026lambda;_1se. LASSO coefficients are shown for feature selection and do not directly represent odds ratios.\u003c/p\u003e\n\u003cp\u003eThrough feature screening based on LASSO regression, the predictive variables were streamlined from the initial 59 to 11 core factors, and model degrees of freedom reduced from 38 to 11. Eventually, it contributed to an effective control of the model complexity and mitigation of overfitting risks, while preserving key predictive information. Clinical evaluation further validated clear clinical validity possessed by all screened variables, with their pathophysiological significance aligning with established ROP pathogenesis. Collectively, our obtained feature set comprehensively covered multiple clinical dimensions of perinatal conditions, complications, therapeutic interventions, and laboratory indicators, laying an ideal feature foundation for the establishment of ML predictive models with high accuracy and strong interpretability.\u003c/p\u003e\n\u003ch2\u003eModel performance evaluation\u003c/h2\u003e\n\u003cp\u003eIn this study, based on the systematic evaluation of these models on an independent testing set (Table\u0026nbsp;3), we continued to comprehensively compared the discriminative performance, calibration, and clinical utility of nine ML models). All models demonstrated strong discriminatory ability (AUC\u0026gt;0.82), with decision trees (AUC: 0.954), logistic regression (AUC: 0.940), and random forest (AUC: 0.933) validated to have the most outstanding performance.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 3.\u003c/strong\u003e Predictive performance of the machine-learning models in the training set (5-fold cross-validation) and the independent testing set.\u003c/p\u003e\n\u003cdiv\u003e\n \u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"631\"\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 104px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eModel\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eAUC (training, 5-fold CV\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 98px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eAUC\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e(testing, 95% CI)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eAccuracy\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eSensitivity\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eSpecificity\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e\u003cstrong\u003ePPV\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eNPV\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eF1-score\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 104px;\"\u003e\n \u003cp\u003eDecision tree\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003e0.912\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 98px;\"\u003e\n \u003cp\u003e0.954 (0.919\u0026ndash;0.989)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.891\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e0.794\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.902\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.478\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.975\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003e0.597\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 104px;\"\u003e\n \u003cp\u003eSoft-voting ensemble\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003e0.942\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 98px;\"\u003e\n \u003cp\u003e0.949 (0.912\u0026ndash;0.986)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.900\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e0.750\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.917\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.505\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.970\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003e0.604\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 104px;\"\u003e\n \u003cp\u003eLogistic regression\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003e0.901\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 98px;\"\u003e\n \u003cp\u003e0.940 (0.901\u0026ndash;0.980)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.850\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e0.824\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.853\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.389\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.977\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003e0.528\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 104px;\"\u003e\n \u003cp\u003eGaussian naive Bayes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003e0.865\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 98px;\"\u003e\n \u003cp\u003e0.937 (0.897\u0026ndash;0.978)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.861\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e0.765\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.872\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.403\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.970\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003e0.528\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 104px;\"\u003e\n \u003cp\u003eRandomForest\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003e0.960\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 98px;\"\u003e\n \u003cp\u003e0.933 (0.891\u0026ndash;0.974)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.918\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e0.691\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.943\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.580\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.964\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003e0.631\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 104px;\"\u003e\n \u003cp\u003eMultilayer perceptron\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003e0.920\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 98px;\"\u003e\n \u003cp\u003e0.909 (0.861\u0026ndash;0.956)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.906\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e0.691\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.930\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.528\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.964\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003e0.599\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 104px;\"\u003e\n \u003cp\u003eXGBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003e0.972\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 98px;\"\u003e\n \u003cp\u003e0.906 (0.857\u0026ndash;0.954)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.916\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e0.662\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.945\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.577\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.961\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003e0.616\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 104px;\"\u003e\n \u003cp\u003eLightGBM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003e0.979\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 98px;\"\u003e\n \u003cp\u003e0.868 (0.812\u0026ndash;0.923)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.925\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e0.603\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.962\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.641\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.955\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003e0.621\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 104px;\"\u003e\n \u003cp\u003eSupport vector machine\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003e0.946\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 98px;\"\u003e\n \u003cp\u003e0.854 (0.796\u0026ndash;0.912)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.892\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e0.485\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.938\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.471\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.941\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003e0.478\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 104px;\"\u003e\n \u003cp\u003eTabNet\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003e0.996\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 98px;\"\u003e\n \u003cp\u003e0.820 (0.757\u0026ndash;0.882)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.886\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e0.574\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.922\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.453\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e0.950\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 54px;\"\u003e\n \u003cp\u003e0.506\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003e\u003cstrong\u003eAbbreviations:\u003c/strong\u003e AUC, area under the receiver operating characteristic curve; CI, confidence interval; PPV, positive predictive value; NPV, negative predictive value.\u003c/p\u003e\n\u003cp\u003eAccording to the results, the decision tree model had the highest AUC, and outstanding sensitivity (0.794) and negative predictive value (0.975). It was featured by a simple structure and high interpretability, which would be suitable for rare disease screening. Furthermore, as a classic medical predictive model, the logistic regression exhibited the highest sensitivity (0.824) and a negative predictive value of 0.977, providing a reliable performance baseline for model comparison. Additionally, the Random Forest revealed relatively lower sensitivity (0.691) and the highest F1-score (0.631), suggesting an optimal balance between precision and recall; and its high specificity (0.943) would minimize false positives, supporting its applicability for precise interventions in resource-constrained settings.\u003c/p\u003e\n\u003cp\u003eNotably, some ensemble models and deep learning models (e.g., XGBoost, LightGBM, and TabNet) exhibited extremely high AUC (\u0026gt;0.97) on the training set after cross-validation. However, they yielded significant performance degradation on the independent testing set, with the AUC dropping to the range of 0.82\u0026ndash;0.91, indicating potential risk of overfitting. In contrast, given their more stable performance across different datasets, decision trees, logistic regression, and random forest had superior generalization and greater potential for clinical employment.\u003c/p\u003e\n\u003cp\u003eBy applying a soft voting ensemble strategy, this study integrated the predictive strengths of multiple base models and construct a more robust ROP predictive model. This ensemble method was based on the weighted average of the output probabilities from each base model, with the equation shown as follows:\u003c/p\u003e\n\u003cp\u003ePfinal (class=i)=\u0026Sigma;(wk * Pk (class=i))\u003c/p\u003e\n\u003cp\u003ewhere Pfinal (class=i) represents the final probability of a sample belonging to Class I by the ensemble model; wk reveals the weight coefficient of the k\u003csup\u003eth\u003c/sup\u003e base model; and Pk (class=i) indicates the probability of a sample belonging to Class I by the k\u003csup\u003eth\u003c/sup\u003e base model.\u003c/p\u003e\n\u003cp\u003eTo reflect their corresponding relative discriminative capabilities, on the basis of the AUC performance of each model on the testing set, the weight coefficients were standardized to the decision tree with 0.337, logistic regression with 0.333, and random forest with 0.330. This ensemble model achieved outstanding overall performance on the independent testing set, with an AUC of 0.949, coupled with a good balance across all evaluation metrics (Table 3), suggesting excellent predictive robustness and clinical applicability potential.\u003c/p\u003e\n\u003cp\u003eFurthermore, in order to validate the statistical significance of performance differences across all models, Delong\u0026rsquo;s test was applied for pairwise comparisons of AUC values on the testing set, with the results detailed in Table\u0026nbsp;S3. It was found that:\u003c/p\u003e\n\u003cp\u003e(1) There were significant performance gaps between the ensemble model (Voting) and certain base models:\u003c/p\u003e\n\u003cp\u003eTo be specific, there were extremely significant AUC differences between the ensemble model and relatively weaker models (e.g., TabNet, SVM, and LightGBM) were (all \u003cem\u003eP\u003c/em\u003e\u0026lt;0.001). Taking TabNet as an example, the AUC difference between the ensemble model and TabNet was -0.129 (95% CI: -0.179 to -0.079, \u003cem\u003eP\u003c/em\u003e\u0026lt;0.001), highlighting the performance advantage of the ensemble approach.\u003c/p\u003e\n\u003cp\u003e(2) There were no significant differences between the core models and the ensemble model:\u003c/p\u003e\n\u003cp\u003eNo statistically significant AUC differences were observed between the three core models (decision tree, logistic regression, and random forest) and the ensemble model, with the difference in the comparison of decision tree, logistic regression, and random forest with Voting as 0.005 (\u003cem\u003eP\u003c/em\u003e=0.292), -0.009 (\u003cem\u003eP\u003c/em\u003e=0.108), and -0.016 (\u003cem\u003eP\u003c/em\u003e=0.003), respectively.\u003c/p\u003e\n\u003cp\u003e(3) The discrimination capabilities among core models were highly similar:\u003c/p\u003e\n\u003cp\u003eThere were no significant differences in AUC among the three core models (e.g., decision tree vs. logistic regression, \u003cem\u003eP\u003c/em\u003e=0.097), further validating their consistent discrimination performance.\u003c/p\u003e\n\u003cp\u003eUsing ROC, calibration and decision curves, this study further comprehensively evaluated the predictive performance and clinical applicability of the models. The ROC curves of each model on the testing set (Figure 1) visually indicated their discriminative capabilities, with superior overall performance identified for curves closer to the upper-left corner. The significant AUC values of some core models (e.g., random forest) were consistent with their high discriminative performance. As shown in Figure 2 regarding the calibration curves for each model, there was a certain degree of systematic probability overestimation, manifesting by a relatively lower localization of these calibration curves to the ideal diagonal line in all models. It might be attributed to the inconsistent prior distributions owing to balanced handling in the training set and low positive proportions in the testing set. The discriminatory performance of core models (e.g., random forest) remained outstanding, despite potential calibration bias, revealing their persistent value in risk ranking. The practical utility of models was subsequently assessed through clinical decision curves, as depicted in Figure 3. The net benefits of some core models (e.g., random forest) were consistently higher than those after two baseline strategies of \u0026ldquo;all intervention\u0026rdquo; and \u0026ldquo;no intervention\u0026rdquo; across a broad threshold range of 5%\u0026ndash;40%, showing strong clinical applicability. Therefore, despite limitations in probability calibration, the risk stratification based on these models can still provide substantial support for clinical decision-making.\u003c/p\u003e\n\u003cp\u003eIn accordance with the aforementioned comprehensive analysis, although the decision tree model exhibited slightly higher AUC, the random forest had more balanced performance in terms of the F1-score (0.631) and the specificity (0.943). In addition, the random forest had superior generalization as an ensemble model than that of a single decision tree, with lower risk of overfitting. Therefore, following comprehensive consideration of discriminative capability, robustness, and clinical utility (referring to the DCA curve), the random forest was recommended to be the optimal model for predicting the risk of ROP in this study.\u003c/p\u003e\n\u003ch2\u003eModel interpretability analysis\u003c/h2\u003e\n\u003cp\u003eThis study conducted an interpretability analysis on the random forest model with the optimal performance (AUC=0.933 and F1-score=0.631 for the testing set) via the game theory-based Shapley Additive exPlanations (SHAP) framework, with a purpose to reveal its feature contribution mechanisms and enhancing the clinical comprehensibility of the model. The analysis was implemented to precisely quantify the contribution degree of each feature to individual prediction outcomes by using Python\u0026rsquo;s shap package (version 0.41.0), with the efficient TreeSHAP algorithm targeted for ensemble tree models applied. Aiming at balancing the computational efficiency and result accuracy, SHAP value approximation was performed by randomly selecting 100 samples from the testing set to enhance analysis feasibility while ensuring statistical reliability.\u003c/p\u003e\n\u003cp\u003eFigure 4a presents the results of the global feature importance analysis based on average absolute SHAP values. The top-6 features contributing the most significantly to the predictions of the random forest model were PN-D, RF, GA, PSA, HB, and PA in order. Moreover, this ranking was highly consistent with the clinical pathophysiological mechanisms of ROP. Consequently, it confirmed that the onset and progression of ROP might be significantly affected by respiratory complication severity, preterm maturity, and metabolic dysfunction.\u003c/p\u003e\n\u003cp\u003eSHAP-based feature impact direction analysis further revealed intrinsic associations between key predictor variables and ROP risk(Figure 4b). Specifically, clinical conditions such as perinatal asphyxia (PA), PSA, and BPD , as well as lower GA and higher TBA levels, were all associated with positive SHAP values, confirming them as independent risk factors for ROP. Conversely, higher GA and lower TBA levels were associated with negative SHAP values, indicating clear protective effects.\u003c/p\u003e\n\u003cp\u003eCritically, the feature importance ranking determined by SHAP analysis was highly consistent with the results of feature screening from LASSO regression (Kendall\u0026rsquo;s \u0026tau;=0.89, \u003cem\u003eP\u003c/em\u003e\u0026lt;0.001). It in turn methodologically cross-validated the robustness and clinical rationality of feature screening in this study.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eIn this study, a ML model suitable for predicting the risk of ROP at high altitude was successfully constructed based on clinical data from preterm infants at Qinghai Red Cross Hospital. In terms of the key findings, firstly, 11 independent predictors from 59 candidate features were identified by LASSO regression, with duration of parenteral nutrition, respiratory failure, and low gestational age confirmed as the strongest risk factors for ROP. Second, the random forest model had superior predictive performance (AUC=0.933 for the testing set), with its discriminatory capability significantly superior to traditional screening based on GA and BW.\u003c/p\u003e\n\u003cp\u003eResults in this study both extend and refine relevant evidence reported previously. This study not only reaffirmed the classic finding that low GA and low BW serve as core risk factors for ROP [32, 33], but also further quantified the protective effect of GA through a multivariate regression model (OR=0.855). More importantly, through rigorous feature screening, this study for the first time identified perinatal asphyxia (PA)and BPD as strong predictors independent of traditional risk factors in the population at high altitude. It is is of great pathophysiological significance to reveal the critical mechanistic role of severe neonatal pulmonary disease and accompanying oxygen-dynamic disturbances in ROP pathogenesis [34], thus advancing the academic emphasis on \u0026ldquo;oxygen therapy management as central\u0026rdquo; [35] from theoretical to clinically quantifiable assessment [36]. Altogether, there may be a unique ROP pathogenesis, featured by a \u0026ldquo;cardiopulmonary-retinal axis\u0026rdquo;, in high-altitude hypoxic environments, providing fresh insights for targeted intervention studies in the future.\u003c/p\u003e\n\u003cp\u003eNowadays, multiple predictive models have been developed based on routine clinical data, such as BW, GA, postnatal weight gain, days on oxygen, and number of blood transfusions. Common models are ROPScore, PW-ROP, WINROP, G-ROP, and DIGI-ROP [37]. In recent years, with the development of artificial intelligence technologies, the predictive models based on ML (e.g., XGBoost) and deep learning algorithms have shown superior predictive performance. Innovatively, it is the first to reveal that the superior discriminative performance of ML predictive models for ROP risk prediction in preterm infants at high altitude to that of traditional screening criteria. Second, the introduction of the SHAP interpretability framework can also achieve effective alignment between predictive results and clinical knowledge, in addition to revealing the nonlinear influencing patterns of key predictors [38, 39]. It may provide a methodological foundation for transforming complex models into reliable decision-support tools trusted by clinicians.\u003c/p\u003e\n\u003cp\u003eFindings in our study have clear potential for clinical translation. For example, it may benefit clinicians\u0026rsquo; early risk stratification management on preterm infants, and intensify clinical monitoring and intervention for high-risk patients (e.g., those complicated with PA or BPD). Eventually, it can optimize the allocation of screened resources, and can also enhance the efficiency of ROP prevention and treatment.\u003c/p\u003e\n\u003cp\u003eHowever, this study still has several limitations that warrant consideration. First, there might be a risk of selection bias due to the collection of samples from a single medical institution merely. Second, there was a lack of an external validation cohort, requiring further verification of the generalizability of the model across different geographic regions and population characteristics. Third, the data quality in this retrospective study might impact model training and feature screening. Fourth, as an analysis based on clinical macro-level data, this study has not yet conducted biological experiments for a thorough validation of the specific mechanisms by which identified biomarkers (e.g., TBA) contribute to the onset and progression of ROP. Therefore, to systematically enhance the generalization and biological interpretability of the constructed models, multicenter prospective cohort validation is recommended to be conducted with standardized data collection procedures, and molecular biology experiments.\u003c/p\u003e\n\u003cp\u003eIn view of the findings and limitations of this study, in the future, relevant investigation can be advanced in three dimensions as follows. Firstly and preferentially, the generalization of models across different populations and settings should be assessed rigorously by large-scale and multicenter prospective cohort validation studies, which is an indispensable step for clinical translation. Second, with further requirement of overcoming the limitations of existing static clinical data, there is a need to explore the integration of continuous dynamic physiological monitoring data (e.g., blood oxygen fluctuations, and heart rate variability) into the model to enhance predictive timeliness and sensitivity. Finally, subsequent studies of mechanisms focusing on identified key risk biomarkers (e.g., TBA) should be conducted by integrating molecular biology techniques, thereby elucidating their pathways in the onset and progression of ROP. Eventually, the precision and individualization of ROP prevention and treatment may be advanced significantly through organic combination of validation, optimization, and mechanism exploration.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eTo sum up, this study successfully constructs a risk predictive model for ROP in preterm infants at high altitude using ML techniques. It provides reliable evidence-based support for precision screening and targeted intervention implementation by accurately identifying high-risk populations for ROP using ensemble learning algorithms. Despite inherent limitations in single-center retrospective analysis, this study is the first to systematically construct an ROP predictive model in the high-altitude hypoxic environment.Beyond offering clinicians with a practical risk assessment tool, it also supplies important methodological references and translational insights for ROP prevention and treatment strategy optimization in unique geographical environments. This model holds promise for demonstrating its clinical value in broader populations on the basis of further validation and improvement via multicenter prospective studies.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eFunding\u003c/p\u003e\n\u003cp\u003eThis study was supported by the Qinghai Province Kunlun Talents Program (High-end Innovation and Entrepreneurship Talent\u0026nbsp;\u0026ndash;\u0026nbsp;Leading Talent Project; grant number QHKLYC-GDCXCY-2020-202). The funder was involved in the study design, data collection, analysis, decision to publish, and manuscript review.\u003c/p\u003e\n\u003cp\u003eConflicts of interest/Competing interests\u003c/p\u003e\n\u003cp\u003eXL, QD, YY, YZ, XY, NS, XM, YG, YM, and YH declare that they have no competing interests.\u003c/p\u003e\n\u003cp\u003eAvailability of data and material\u003c/p\u003e\n\u003cp\u003eThe datasets used and/or analysed during the current study are not publicly available due to ethical and privacy restrictions, but are available from the corresponding author on reasonable request and with permission of the institutional ethics committee (and, where applicable, a data use agreement).\u003c/p\u003e\n\u003cp\u003eAuthors\u0026apos; contributions\u003c/p\u003e\n\u003cp\u003eXL conceived and designed the study and oversaw the overall project. YY developed the research protocol, collected and analyzed the data, and drafted the manuscript. YZ collected the data, performed the statistical analysis, and contributed to manuscript writing. QD reviewed the study protocol, coordinated the study progress, and critically reviewed the manuscript. YM and HY collected the data. NS and XM assisted with data collection and resource allocation. YX and YG assisted in developing the research protocol and contributed to data analysis. All authors read and approved the final manuscript.\u003c/p\u003e\n\u003cp\u003eEthics approval\u003c/p\u003e\n\u003cp\u003eThis study was officially approved by the Ethics Committee of Qinghai Red Cross Hospital (Approval No.: KY-2025-13). All procedures were conducted in accordance with the Declaration of Helsinki and relevant national regulations.\u003c/p\u003e\n\u003cp\u003eConsent to participate\u003c/p\u003e\n\u003cp\u003eGiven the retrospective nature of this study, the requirement for informed consent from the parents or legal guardians of the participating infants was waived by the Ethics Committee of Qinghai Red Cross Hospital. This waiver was granted in accordance with Chinese national regulations governing biomedical research involving humans (《涉及人的生物医学研究伦理审查办法》, 2016;\u0026nbsp;《涉及人的生命科学和医学研究伦理审查办法》, 2023), based on the following justifications: (1) the study involved minimal risk as it was a retrospective analysis of de-identified medical records; (2) obtaining informed cosennt was impracticable due to the large sample size and loss of contact with participants after hospital discharge; and (3) strict measures were implemented to protect patient privacy and data confidentiality, including complete de-identification of all personal information.\u003c/p\u003e\n\u003cp\u003eConsent for publication\u003c/p\u003e\n\u003cp\u003eNot applicable. The manuscript does not contain any individual person\u0026rsquo;s data in any form (including individual details, images or videos).\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eCakir B, et al. Thrombocytopenia is associated with severe retinopathy of prematurity. JCI Insight. 2018;3(19):e124238.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBohley M, et al. A single intravenous injection of cyclosporin A-loaded lipid nanocapsules prevents retinopathy of prematurity. Sci Adv. 2022;8(38):eabo6638.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMoin M, et al. Severe ROP rate and assessment of the burden of ROP screening at a single tertiary care public hospital in Pakistan. BMC Ophthalmol. 2025;25(1):594.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGarc\u0026iacute;a H, et al. Global prevalence and severity of retinopathy of prematurity over the last four decades (1985\u0026ndash;2021): a systematic review and meta-analysis. Arch Med Res. 2024;55(2):102967.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGundlach BS, et al. Real-world visual outcomes of laser and anti-VEGF treatments for retinopathy of prematurity. Am J Ophthalmol. 2022;238:86\u0026ndash;96.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTrzcionkowska K, Schalij-Delfos NE, van den Akker-van Marle EME. Cost reduction in screening for retinopathy of prematurity in the Netherlands by comparing different screening strategies. Acta Ophthalmol. 2023;101(1):81\u0026ndash;90.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDogra MR, Vinekar A. Role of anti-vascular endothelial growth factor (anti-VEGF) in the treatment of retinopathy of prematurity: a narrative review in the context of middle-income countries. Pediatr Health Med Ther. 2023;14:59\u0026ndash;69.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTsai AS, et al. Assessment and management of retinopathy of prematurity in the era of anti-vascular endothelial growth factor (VEGF). Prog Retin Eye Res. 2022;88:101018.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBishnoi K, et al. A narrative review on managing retinopathy of prematurity: insights into pathogenesis, screening, and treatment strategies. Cureus. 2024;16(3):e56168.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShah PK, et al. Retinopathy of prematurity: past, present and future. World J Clin Pediatr. 2016;5(1):35\u0026ndash;46.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen BH. Minimum standards for evaluating machine-learned models of high-dimensional data. Front Aging. 2022;3:901841.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSu R, et al. Genomic selection in pig breeding: comparative analysis of machine learning algorithms. Genet Sel Evol. 2025;57(1):13.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAli M, Aittokallio T. Machine learning and feature selection for drug response prediction in precision oncology applications. Biophys Rev. 2019;11(1):31\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVannuccini S, et al. Infertility and reproductive disorders: impact of hormonal and inflammatory mechanisms on pregnancy outcome. Hum Reprod Update. 2016;22(1):104\u0026ndash;15.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGreen EA, et al. The role of the interleukin-1 family in complications of prematurity. Int J Mol Sci. 2023;24(3):2815.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFundus Diseases Group in Ophthalmology Branch of Chinese Medical Association. Guidelines for screening retinopathy of prematurity in China in 2014. Chin J Ophthalmol. 2014;50(12):933\u0026ndash;5.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eL\u0026oacute;pez-Rueda A, et al. Enhancing mortality prediction in patients with spontaneous intracerebral hemorrhage: radiomics and supervised machine learning on non-contrast computed tomography. Eur J Radiol Open. 2024;13:100618.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSaberzadeh-Ardestani B, et al. Immune marker spatial distribution and clinical outcome after PD-1 blockade in mismatch repair-deficient, advanced colorectal carcinomas. Clin Cancer Res. 2023;29(20):4268\u0026ndash;77.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYu Z. Data-driven discovery of core sleep biomarkers for predicting early cardiometabolic risk in a healthy population using machine learning. medRxiv. 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChekole B, et al. Survival status and predictors of mortality among HIV-positive children initiated antiretroviral therapy in Bahir Dar town public health facilities Amhara region, Ethiopia, 2020. SAGE Open Med. 2022;10:20503121211069477.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXie C, et al. Effect of machine learning re-sampling techniques for imbalanced datasets in (18)F-FDG PET-based radiomics model on prognostication performance in cohorts of head and neck cancer patients. Eur J Nucl Med Mol Imaging. 2020;47(12):2826\u0026ndash;35.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDemircioğlu A. Applying oversampling before cross-validation will lead to high bias in radiomics. Sci Rep. 2024;14(1):11563.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLin CY, et al. Machine learning-based prediction of three-year heart failure and mortality after premature ventricular contraction ablation. Diagnostics (Basel). 2025;15(21):10281.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eElshewey AM, et al. DDoS classification of network traffic in software defined networking SDN using a hybrid convolutional and gated recurrent neural network. Sci Rep. 2025;15(1):29122.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhou W, et al. Predicting central lymph node metastasis in papillary thyroid microcarcinoma: a breakthrough with interpretable machine learning. Front Endocrinol (Lausanne). 2025;16:1537386.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhan M, et al. Application of artificial intelligence in conjunction with clinical laboratory indicators to aid decision-making for surgical or conservative treatment of pediatric intestinal obstruction. World J Pediatr Surg. 2025;8(5):e001079.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHan H et al. Development of an interpretable machine learning model to predict short-term bleeding risk in patients receiving dual antithrombotic therapy following cardiac surgery. Int J Clin Pharm. 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHeidari P, Milan A. Combining K-fold cross validation with bayesian hyperparameter optimization for accuracy enhancement of land cover and land use classification. Sci Rep. 2025;15(1):39758.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuang H, et al. Predicting rheological properties of asphalt modified with mineral powder: bagging, boosting, and stacking vs. single machine learning models. Mater (Basel). 2025;18(12):2985.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRuiz Sarrias O, et al. Predicting severe haematological toxicity in gastrointestinal cancer patients undergoing 5-FU-based chemotherapy: a Bayesian network approach. Cancers (Basel). 2023;15(17):4278.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKang BY, et al. Serum calcium-based interpretable machine learning model for predicting anastomotic leakage after rectal cancer resection: a multi-center study. World J Gastroenterol. 2025;31(19):105283.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYucel OE, et al. Incidence and risk factors for retinopathy of prematurity in premature, extremely low birth weight and extremely low gestational age infants. BMC Ophthalmol. 2022;22(1):367.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang H, et al. Risk factors for retinopathy of prematurity among preterm infants with bronchopulmonary dysplasia. J Matern Fetal Neonatal Med. 2025;38(1):2497058.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWickramasinghe LC, et al. Lung and eye disease develop concurrently in supplemental oxygen-exposed neonatal mice. Am J Pathol. 2020;190(9):1801\u0026ndash;12.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRashidian P, Karami S, Salehi SA. A review on retinopathy of prematurity. Med Hypothesis Discov Innov Ophthalmol. 2024;13(4):201\u0026ndash;12.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWoods J, Biswas S. Retinopathy of prematurity: from oxygen management to molecular manipulation. Mol Cell Pediatr. 2023;10(1):12.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHutchinson AK, et al. Clinical models and algorithms for the prediction of retinopathy of prematurity: a report by the American Academy of Ophthalmology. Ophthalmology. 2016;123(4):804\u0026ndash;16.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShin DR, Song IH, Lee SK. Interpretable QSAR modelling for immunotoxicity prediction using enhanced fingerprint and SHAP-based feature selection. SAR QSAR Environ Res. 2025;36(10):955\u0026ndash;69.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRaptis S, Ilioudis C, Theodorou K. From pixels to prognosis: unveiling radiomics models with SHAP and LIME for enhanced interpretability. Biomed Phys Eng Express. 2024;10(3):035022.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"bmc-ophthalmology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"boph","sideBox":"Learn more about [BMC Ophthalmology](http://bmcophthalmol.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/boph","title":"BMC Ophthalmology","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Retinopathy of prematurity, Machine learning, Predictive model, LASSO regression, Random forest, High-altitude regions, SHAP analysis","lastPublishedDoi":"10.21203/rs.3.rs-8533267/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8533267/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cb\u003eBackground\u003c/b\u003e\u003c/p\u003e \u003cp\u003eRetinopathy of prematurity (ROP) has been one of the main eye troubles leading to childhood blindness. The specific chronic hypoxic environment at high altitude may form a unique risk profile, acting as a potential trigger of the onset and progression of ROP. So far, there is an absence of specific ROP risk predictive model for preterm infants in these areas. Accordingly, this study intended to develop an ROP predictive model at high altitude using machine learning (ML) methods.\u003c/p\u003e\u003cp\u003e\u003cb\u003eMethods\u003c/b\u003e\u003c/p\u003e \u003cp\u003eThrough a retrospective collection of the clinical data from 2,138 premature infants who underwent fundus screening at Qinghai Red Cross Hospital between May 2014 and May 2025, this study was conducted with the establishment of a training set (n\u0026thinsp;=\u0026thinsp;1,470) and a testing set (n\u0026thinsp;=\u0026thinsp;668) at a 7:3 ratio. Key predictors from 59 candidate variables were screened by employing univariate analysis and LASSO regression. This study continued to construct nine ML models involving logistic regression, decision tree, random forest, XGBoost, LightGBM, support vector machine, Gaussian Naive Bayes, multilayer perceptron, and TabNet. Finally, to evaluate the model performance, another independent testing set was utilized to carry out model training and hyper-parameter optimization were performed using five-fold cross-validation and Bayesian optimization.\u003c/p\u003e\u003cp\u003e\u003cb\u003eResults\u003c/b\u003e\u003c/p\u003e \u003cp\u003eLASSO regression identified 11 key predictors, including perinatal asphyxia, bronchopulmonary dysplasia (BPD), surfactant administration, gestational age, hyperbilirubinemia, respiratory failure, mode of delivery, premature rupture of membranes, intravenous nutritional duration, fasting duration, and total bile acids. The area under the receiver operating characteristic curve (AUC) of all models was greater than 0.82 on the testing set. The AUC of the decision tree model was the highest (0.954, 95% CI: 0.919\u0026ndash;0.989), but the random forest model exhibited the optimal comprehensive performance (AUC\u0026thinsp;=\u0026thinsp;0.933, 95% CI: 0.891\u0026ndash;0.974; sensitivity\u0026thinsp;=\u0026thinsp;0.691; specificity\u0026thinsp;=\u0026thinsp;0.943; F1 score\u0026thinsp;=\u0026thinsp;0.631). The integrated model also demonstrated a robust performance (AUC\u0026thinsp;=\u0026thinsp;0.949). In addition, duration of parenteral nutrition, respiratory failure, and gestational age were identified as the most influential predictors by SHAP analysis.\u003c/p\u003e\u003cp\u003e\u003cb\u003eConclusions\u003c/b\u003e\u003c/p\u003e \u003cp\u003eThis study successfully develops and validates a ML predictive model for ROP in preterm infants at high altitude. With an effective identification of infants at high risk for ROP based on routine clinical indicators, the random forest model demonstrates the optimal overall performance, and hence offers a scientific tool for precision screening and early intervention.\u003c/p\u003e","manuscriptTitle":"Machine Learning-Based Individualized Prediction: Risk Assessment of Retinopathy in Preterm Infants at High Altitude","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-01-16 11:02:34","doi":"10.21203/rs.3.rs-8533267/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2026-02-27T05:03:54+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-02-14T18:32:52+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"256700239891615585691708327039575665076","date":"2026-02-11T10:48:54+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-02-03T21:38:50+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"295651648131576230940256520891722967258","date":"2026-01-15T19:24:27+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-01-13T05:15:50+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-01-13T05:13:23+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2026-01-12T06:05:41+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-01-09T15:26:15+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC Ophthalmology","date":"2026-01-09T15:18:52+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"bmc-ophthalmology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"boph","sideBox":"Learn more about [BMC Ophthalmology](http://bmcophthalmol.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/boph","title":"BMC Ophthalmology","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"5f304424-766e-43a6-af73-c685b36f108b","owner":[],"postedDate":"January 16th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2026-04-13T16:00:33+00:00","versionOfRecord":{"articleIdentity":"rs-8533267","link":"https://doi.org/10.1186/s12886-026-04798-6","journal":{"identity":"bmc-ophthalmology","isVorOnly":false,"title":"BMC Ophthalmology"},"publishedOn":"2026-04-07 15:57:45","publishedOnDateReadable":"April 7th, 2026"},"versionCreatedAt":"2026-01-16 11:02:34","video":"","vorDoi":"10.1186/s12886-026-04798-6","vorDoiUrl":"https://doi.org/10.1186/s12886-026-04798-6","workflowStages":[]},"version":"v1","identity":"rs-8533267","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8533267","identity":"rs-8533267","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
Text is read by the "Ask this paper" AI Q&A widget below.
Extraction quality varies by source — PMC NXML preserves structure
cleanly, OA-HTML may include some navigation residue, and OA-PDF can
have broken hyphenation. The publisher copy
(via DOI)
is the canonical version.