Lipid profile-based prediction of ischemic heart disease using machine learning: a comparative analysis of classification algorithms

preprint OA: closed
ANAM IJAZ, Sara Aslam, Fatima Taj, Shabana NA

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9464045/v1
This work is licensed under a CC BY 4.0 License
Status: Under Revision
Version 1 posted 15

Abstract

Ischemic heart disease (IHD) is the leading cause of cardiovascular mortality in Pakistan, yet accessible tools for early biochemical risk stratification remain limited.
The present case-control study compared five supervised machine learning (ML) classifiers (Logistic Regression, Decision Tree, Support Vector Machine with Radial Basis Function kernel (SVM-RBF), XGBoost, and LightGBM) for IHD prediction using routinely available lipid profile, anthropometric, and clinical variables in 929 participants (710 IHD cases, 219 healthy controls) recruited from two tertiary care centres in Lahore, Pakistan. Models were trained on a stratified 80% partition and validated on an independent 20% test set and a held-out 10% unseen dataset. LightGBM achieved the best performance, with an accuracy of 0.98 and an AUC-ROC of 0.99 on both the 80% training set and the 20% test set, perfect classification (accuracy 1.00, AUC-ROC 1.00) on the 10% unseen dataset, and a cross-validation accuracy of 0.98 ± 0.02. SVM-RBF demonstrated comparably strong performance (accuracy 0.98, AUC-ROC 0.99). SHAP analysis identified BMI, LDL, VLDL, HDL, and age as the most influential predictors. These findings establish LightGBM as an accurate, interpretable, and generalizable framework for early IHD risk stratification in resource-limited settings.

Subjects: Health sciences/Biomarkers; Health sciences/Cardiology; Biological sciences/Computational biology and bioinformatics; Health sciences/Diseases; Health sciences/Medical research; Health sciences/Risk factors
Keywords: Ischemic heart disease; cardiovascular diseases; machine learning models; Lipid profile; LightGBM

Introduction

Ischemic heart disease (IHD) is defined by decreased cardiac blood flow causing an imbalance between myocardial oxygen supply and demand, manifesting across a spectrum from chronic coronary syndromes to acute atherothrombotic events [1].
IHD remains the leading cause of death globally, responsible for an estimated 9.14 million deaths in 2019, with the highest age-standardized mortality rates in Central Asia, Eastern Europe, and the Middle East [2, 3]. Pakistan bears a disproportionately high burden: the age-standardized cardiovascular disease (CVD) incidence rate of 918.18 per 100,000 population exceeds the global average of 684.33 per 100,000, and IHD accounted for approximately 183,409 deaths in 2019, representing 53.76% of all CVD deaths [4, 5]. Elevated LDL cholesterol and high blood pressure remain the two leading attributable risk factors, underscoring the importance of lipid profile assessment for early IHD detection in the Pakistani population [6].

Dyslipidemia is characterized by elevated total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), triglycerides (TG), and very-low-density lipoprotein (VLDL), alongside reduced high-density lipoprotein cholesterol (HDL-C). These parameters are often early indicators of coronary artery disease [7, 8]. While LDL-C is causally validated as the primary atherogenic driver [9], its predictive value alone is limited, as many IHD cases occur in individuals with normal LDL-C levels [10]. The combined pattern of elevated TG and reduced HDL-C has demonstrated at least equivalent predictive power, highlighting the value of comprehensive lipid panel assessment over single-fraction analysis [11, 12].

Machine learning (ML) has demonstrated substantial superiority over conventional risk scoring tools for cardiovascular risk prediction; commonly applied supervised algorithms include Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and gradient boosting frameworks such as LightGBM and XGBoost [13]. Each of these algorithms operates on distinct mathematical principles, conferring differential strengths in terms of interpretability, handling of non-linear relationships, and robustness to class imbalance [14].
Despite the accumulating evidence supporting the utility of ML in cardiovascular risk prediction, the application of these models specifically to lipid-profile-driven classification of ischemic heart disease patients versus healthy controls remains comparatively underexplored. Traditional tools and laboratory findings fall short of the ideal standard for risk prediction, and the growing application of machine-learning algorithms to IHD-related datasets, including supervised methods such as Decision Tree, XGBoost, SVM, and Logistic Regression, has provided compelling evidence that such methods improve both short- and long-term mortality and morbidity risk estimation in ischemic cardiac conditions [15]. The present study therefore aims to address this gap by comparing multiple ML classification algorithms applied to a clinical lipid profile dataset of IHD patients and healthy controls, in order to identify the most accurate and clinically applicable predictive model for the early biochemical detection of ischemic heart disease.

Methods

Study Design

This was a hospital-based case-control study conducted from February 2024 to July 2024. Data were collected at a single time point; due to the cross-sectional nature of data collection, causality and temporal relationships cannot be inferred. The research was approved by the Research Ethics and Biosafety Committee of the University of the Punjab, Pakistan, which follows the Declaration of Helsinki. Written informed consent was obtained from all participants prior to enrollment.

Patient recruitment and data collection

A minimum sample size of approximately 594 participants was calculated assuming a 17% IHD prevalence, 5% detectable difference, 90% power, and a 5% significance level. The study comprised 710 confirmed IHD cases and 219 healthy controls recruited from two tertiary care centers: the Social Security Hospital (PESSI) and Punjab Institute of Cardiology (PIC), Lahore, Pakistan. Cases and controls were matched for gender and ethnicity.
Diagnosis of IHD was confirmed by an on-duty cardiologist based on positive cardiac echocardiography, electrocardiogram (ECG) findings, clinical assessment, and serum cardiac biomarkers including Troponin T/I. Only recently diagnosed patients who had not yet commenced lipid-lowering or antihypertensive medications were enrolled, to avoid confounding effects of pharmacological treatment on lipid profiles. Participants with pre-existing conditions including liver disease, kidney disease, or malignancy were excluded from both the case and control groups. IHD patients with comorbid diabetes, hypertension, smoking history, and positive family history of IHD were included in the study, as these represent clinically relevant risk factors.

Diabetes was defined as a fasting blood glucose (FBG) of ≥ 126 mg/dL or a 2-hour postprandial blood glucose of ≥ 200 mg/dL, in accordance with WHO diagnostic criteria. Hypertension was classified using the criteria defined in the Seventh Report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure (JNC 7) [16]. Blood pressure was recorded as the mean of two consecutive measurements taken after the participant had been seated and rested for five minutes. Anthropometric measurements including height and weight were recorded in the clinic by trained staff. Body mass index (BMI) was calculated as weight in kilograms divided by the square of height in meters (kg/m²).

Demographic and clinical data were recorded through structured face-to-face interviews and review of participants' medical records. A standardized questionnaire was administered to collect information on age, gender, BMI, smoking status, and history of diabetes and hypertension. Lipid profile values were extracted from laboratory biochemical reports generated at the respective analysis centers.
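The BMI formula, the glucose thresholds for diabetes, and the two-reading blood pressure rule stated above can be written out as a short sketch. The function names and the example participant are hypothetical and are not taken from the study's code.

```python
from typing import Optional


def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI in kg/m^2: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def meets_diabetes_criteria(
    fasting_mg_dl: Optional[float] = None,
    postprandial_2h_mg_dl: Optional[float] = None,
) -> bool:
    """True when either glucose threshold quoted in the text is reached."""
    fasting_hit = fasting_mg_dl is not None and fasting_mg_dl >= 126
    postprandial_hit = (
        postprandial_2h_mg_dl is not None and postprandial_2h_mg_dl >= 200
    )
    return fasting_hit or postprandial_hit


def mean_of_two_readings(first: float, second: float) -> float:
    """Blood pressure recorded as the mean of two consecutive measurements."""
    return (first + second) / 2


# Hypothetical participant, for illustration only.
print(round(body_mass_index(70.0, 1.75), 1))        # 22.9
print(meets_diabetes_criteria(fasting_mg_dl=131))   # True
print(mean_of_two_readings(138, 142))               # 140.0
```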
Prior to biochemical analysis, all samples were prescreened for human immunodeficiency virus (HIV), hepatitis B virus (HBV), hepatitis C virus (HCV), and syphilis, and reactive samples were excluded. Missing values for lipid parameters, accounting for less than 5% of the dataset, were imputed using mean substitution.

Blood Sample Collection

All participants were instructed to fast overnight for a minimum of 10–12 hours prior to blood collection. Venous blood samples were collected in sterile vacutainer tubes under aseptic conditions and allowed to clot at room temperature for 10–20 minutes. Serum was separated by centrifugation at 3000 rpm for 5 min, transferred to labelled Eppendorf tubes, and stored at −20°C until biochemical analysis was performed.

At PESSI, LDL cholesterol was calculated using the Friedewald equation:

LDL = Total Cholesterol − HDL − (TG/5)

where TG/5 represents the estimated VLDL cholesterol concentration. All calculations were performed automatically by the analyzer. This formula is validated for triglyceride concentrations below 400 mg/dL; samples with triglyceride levels exceeding this threshold may carry a degree of estimation error in calculated LDL, which is acknowledged as a limitation.

At IDC Evercare Hospital, local commercial kits were employed for direct quantification of all lipid parameters. Each kit contained two reagents (R1 and R2) that were automatically mixed with serum in the reaction vessels of the analyzer. Following sequential addition of reagents and serum, enzymatic reactions occurred and absorbance was measured spectrophotometrically to quantify each lipid parameter. Results were available within approximately 15 minutes of sample loading. The complete lipid profile for each participant included LDL cholesterol, HDL cholesterol, VLDL cholesterol, total cholesterol (TC), and triglycerides (TG), which served as the primary biochemical variables in subsequent statistical and machine learning analyses.
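The Friedewald estimate and the mean substitution of missing lipid values described in this section can be illustrated as follows. This is a minimal sketch, assuming all concentrations are in mg/dL; the function names, the reliability flag, and the example values are hypothetical and do not come from the analyzer software or the study's code.

```python
from statistics import fmean
from typing import List, Optional, Tuple


def friedewald_ldl(
    total_cholesterol: float,
    hdl: float,
    triglycerides: float,
    tg_limit: float = 400.0,
) -> Tuple[float, bool]:
    """Estimate LDL (mg/dL) as TC - HDL - TG/5.

    The second return value is False when triglycerides reach the limit
    above which the text notes the estimate may carry error.
    """
    vldl_estimate = triglycerides / 5.0      # TG/5 approximates VLDL cholesterol
    ldl = total_cholesterol - hdl - vldl_estimate
    return ldl, triglycerides < tg_limit


def impute_mean(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    column_mean = fmean(observed)
    return [column_mean if v is None else float(v) for v in values]


# Hypothetical values, for illustration only.
print(friedewald_ldl(200, 50, 150))     # (120.0, True)
print(friedewald_ldl(260, 40, 450))     # (130.0, False)
print(impute_mean([100, None, 140]))    # [100.0, 120.0, 140.0]
```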
At IDC Evercare Hospital, LDL was directly measured using enzymatic colorimetric kits. Despite the use of two analytical platforms, both centers employed instruments operating on the same spectrophotometric principle with commercially standardized reagent kits, minimizing systematic bias. The Friedewald equation used at PESSI is internationally validated and widely accepted in clinical lipid research. Pooled analysis was therefore considered appropriate for the purposes of this study.

Statistical analysis

Descriptive and inferential statistics were performed in R (version 4.5.3) using the following packages: readxl, dplyr, gtsummary, flextable, and officer. Continuous variables are expressed as mean (standard deviation, SD) and categorical variables as frequency and percentage. Between-group differences were assessed using Welch's two-sample t-test for continuous variables and Pearson's chi-squared test for categorical variables (Table 1). Univariate and multivariate binary logistic regression analyses were subsequently conducted to identify independent predictors of IHD, with results reported as odds ratios (OR) with 95% confidence intervals (CI) and corresponding p-values. A p-value < 0.05 was considered statistically significant throughout all conventional statistical analyses.

Table 1 Characteristics of the cohort included in the study.
Characteristics   Cases, N = 710¹   Controls, N = 219¹   p-value²
TG                252 (122)         188 (66)             < 0.001
TC                216 (103)         175 (43)             < 0.001
VLDL              83 (84)           38 (13)              < 0.001
LDL               122 (79)          77 (21)              < 0.001
HDL               73 (68)           68 (18)              0.082
BMI               33 (6)            22 (3)               < 0.001
Age               59 (12)           56 (10)              0.002
Gender                                                   0.2
  Female          288 (41%)         100 (46%)
  Male            422 (59%)         119 (54%)
Diabetes          325 (46%)         34 (16%)             < 0.001
Hypertension      446 (63%)         49 (22%)             < 0.001
Smoking           130 (18%)         26 (12%)             0.034
¹ Mean (SD); n (%)
² Welch Two Sample t-test; Pearson's Chi-squared test

All machine learning (ML) analyses were implemented in Python (version 7.5.5, Jupyter Notebook environment) using the Pandas, NumPy, and Keras libraries, supplemented by scikit-learn, XGBoost, LightGBM, and SHAP. The complete analytical workflow is illustrated in Fig. 1. The dataset comprised 929 participants (710 IHD cases, 219 healthy controls) and was partitioned under two independent stratified splitting strategies to evaluate model performance and generalizability. In the primary split, the dataset was divided into a training set (80%) and a test set (20%). To further assess generalizability on completely unseen data, a secondary three-way split was applied, yielding a training set (80%), an internal test set (10%), and a held-out unseen test set (10%). Stratified splitting was applied in both schemes to preserve the case-to-control ratio across all partitions, and a fixed random state (random_state = 42) was used throughout to ensure full reproducibility.

Five supervised classification algorithms were trained and evaluated: Logistic Regression (LR) [18], Decision Tree (DT) [19], Support Vector Machine with a Radial Basis Function kernel (SVM-RBF) [20], Extreme Gradient Boosting (XGBoost, hyperparameter-optimized via grid search) [21], and Light Gradient Boosting Machine (LightGBM) [22]. Feature standardization was applied prior to SVM training; tree-based and ensemble models were trained on unscaled features.
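The two stratified splitting schemes can be illustrated with a dependency-free sketch. The source names scikit-learn and a fixed random state but not the splitting function itself, so the helper below (`stratified_split`, a hypothetical name) is an assumption that shows only the principle: each class is shuffled and cut separately so that every partition keeps the case-to-control ratio. Partition sizes from a library routine may differ by rounding.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions, seed=42):
    """Split sample indices into len(fractions) partitions, class by class."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    partitions = [[] for _ in fractions]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        start, cumulative = 0, 0.0
        for part, fraction in enumerate(fractions):
            cumulative += fraction
            # The last partition takes whatever remains so no sample is lost.
            if part == len(fractions) - 1:
                stop = len(indices)
            else:
                stop = round(cumulative * len(indices))
            partitions[part].extend(indices[start:stop])
            start = stop
    return partitions


# Cohort sizes from the text: 710 cases (label 1) and 219 controls (label 0).
labels = [1] * 710 + [0] * 219

train, test = stratified_split(labels, [0.8, 0.2])
print(len(train), len(test))                      # 743 186
print(sum(labels[i] for i in test))               # 142 cases in the test set

train3, internal_test, unseen = stratified_split(labels, [0.8, 0.1, 0.1])
print(len(train3), len(internal_test), len(unseen))   # 743 93 93
print(sum(labels[i] for i in unseen))             # 71 cases in the unseen set
```

Under these assumptions the partition sizes agree with the supports reported in the tables (743 training records, 142 cases and 44 controls in a 20% partition, 71 cases and 22 controls in a 10% partition).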
Model performance was quantified using precision, recall, F1-score, overall classification accuracy, area under the receiver operating characteristic curve (AUC-ROC), and stratified five-fold cross-validation accuracy (mean ± SD). Confusion matrices and ROC curves were generated for all models on both training and test partitions. Feature importance was assessed for all models using built-in impurity-based importance scores. For the best-performing model, global model interpretability was further examined using SHAP (SHapley Additive exPlanations) values, providing a theoretically grounded, model-agnostic decomposition of individual feature contributions to predictions. A learning curve analysis was additionally conducted for the best-performing model to assess the relationship between training sample size and generalization performance, and to confirm model convergence prior to final evaluation.

Results

Cohort characteristics

A total of 929 participants were included in this study, comprising 710 ischemic heart disease (IHD) cases and 219 controls. Comparative analysis of baseline characteristics revealed significant differences in lipid profile parameters between the two groups. Specifically, triglycerides, total cholesterol, VLDL, and LDL levels were significantly elevated in cases compared to controls (all p < 0.001), indicating a strong association between dyslipidemia and IHD. In contrast, HDL levels did not differ significantly (p = 0.082), suggesting a comparatively weaker discriminative role in this cohort (Table 1). In addition to lipid parameters, cases exhibited a significantly higher mean body mass index (BMI) (33 ± 6 in cases vs. 22 ± 3 in controls, p < 0.001) and were slightly older than controls (59 ± 12 years in cases vs. 56 ± 10 years in controls, p = 0.002). The prevalence of key comorbidities, including diabetes (46% in cases vs. 16% in controls, p < 0.001) and hypertension (63% in cases vs.
22% in controls, p < 0.001), was substantially greater in cases, further reinforcing their established role as risk factors for IHD. Smoking was also more prevalent among cases (p = 0.034), whereas gender distribution did not differ significantly between groups (p = 0.2). Collectively, these findings confirm that individuals with IHD exhibit a significantly adverse cardiometabolic profile (Table 1).

Univariate and multivariate regression analysis (training dataset, 80%)

Multivariate regression analysis identified BMI, VLDL, LDL, and hypertension as significant independent predictors of IHD, with BMI and hypertension demonstrating the strongest associations. Although diabetes and smoking were significant in univariate analysis, they lost statistical significance after adjustment, suggesting potential confounding effects. Female gender showed a protective association in the adjusted model, while age and HDL were not statistically significant predictors (Table 2).

Table 2 Univariate and multivariate regression applied to the training (80%) dataset.
                        Univariate                      Multivariate
Characteristic   N      OR     95% CI        p-value    OR     95% CI        p-value
TG               743    1.01   1.01, 1.01    < 0.001    0.99   0.99, 1.00    0.033
VLDL             743    1.04   1.03, 1.05    < 0.001    1.06   1.04, 1.09    < 0.001
LDL              743    1.02   1.02, 1.03    < 0.001    1.03   1.02, 1.05    < 0.001
HDL              743    1.00   1.00, 1.00    0.3        0.99   0.98, 1.00    0.053
BMI              743    1.81   1.66, 2.01    < 0.001    2.05   1.78, 2.42    < 0.001
Age              743    1.02   1.01, 1.04    0.004      1.02   0.99, 1.06    0.2
Gender           743                                                         0.014
  Male                  —      —                        —      —
  Female                0.83   0.59, 1.16    0.3        0.35   0.14, 0.81
Diabetes         743                                                         0.2
  No                    —      —                        —      —
  Yes                   4.66   3.04, 7.38    < 0.001    1.97   0.77, 5.24
Hypertension     743                                                         < 0.001
  No                    —      —                        —      —
  Yes                   6.31   4.27, 9.53
Smoking                                                                      0.9
  No                    —      —                        —      —
  Yes                   2.08   1.24, 3.70    0.008      1.05   0.33, 3.41
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

Performance of machine learning models on the training dataset (80%)

The predictive performance of five machine learning models (logistic regression, decision tree, support vector machine (SVM) with radial basis function (RBF) kernel, XGBoost, and LightGBM) was first evaluated on the training dataset. Overall, all models demonstrated high classification performance, with accuracy values ranging from 0.95 to 0.98 and AUC-ROC scores of 0.93 or higher (Table 3).

Table 3 Results of ML models applied on training (80%) dataset.
Model                  Class          Precision   Recall   F1-score   Support   Accuracy   AUC-ROC score   Cross-validation accuracy
Logistic Regression    Cases          0.96        0.97     0.96       213       0.95       0.98            0.96 (± 0.03)
                       Control        0.89        0.88     0.89       66
                       Macro avg      0.93        0.92     0.93       279       -          -               -
                       Weighted avg   0.95        0.95     0.95       279       -          -               -
Decision tree          Cases          0.97        0.96     0.96       142       0.95       0.93            0.96 (± 0.02)
                       Control        0.87        0.91     0.89       44
                       Macro avg      0.92        0.93     0.93       186       -          -               -
                       Weighted avg   0.95        0.95     0.95       186       -          -               -
SVM with RBF Kernel    Cases          0.97        1.00     0.99       142       0.98       0.99            0.96 (± 0.03)
                       Control        1.00        0.91     0.95       44
                       Macro avg      0.99        0.95     0.97       186       -          -               -
                       Weighted avg   0.98        0.98     0.98       186       -          -               -
XGBoost (optimized)    Cases          0.97        0.99     0.98       142       0.96       0.99            0.96 (± 0.02)
                       Control        0.95        0.89     0.92       44
                       Macro avg      0.96        0.94     0.95       186       -          -               -
                       Weighted avg   0.96        0.96     0.96       186       -          -               -
LightGBM               Cases          1.00        0.93     0.96       44        0.98       0.99            0.98 (± 0.02)
                       Control        0.98        1.00     0.99       142
                       Macro avg      0.99        0.97     0.98       186       -          -               -
                       Weighted avg   0.98        0.98     0.98       186       -          -               -

Logistic regression achieved an accuracy of 0.95 and an AUC-ROC of 0.98, indicating strong baseline performance, as further supported by its confusion matrix and ROC curve (Fig. 2). The decision tree model yielded a similar accuracy (0.95) but a comparatively lower AUC-ROC (0.93), with its confusion matrix reflecting slightly higher misclassification of controls (Fig. 3). In contrast, the SVM with RBF kernel demonstrated superior classification performance, achieving an accuracy of 0.98 and an AUC-ROC of 0.99, with near-perfect classification observed in the confusion matrix and a highly discriminative ROC curve (Fig. 4). The ensemble-based models further enhanced predictive performance. XGBoost achieved an accuracy of 0.96 and an AUC-ROC of 0.99, with its confusion matrix indicating improved balance between sensitivity and specificity and strong probability separation (Fig. 5).
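The per-class metrics in Table 3 follow directly from a 2×2 confusion matrix, and the AUC-ROC equals the probability that a randomly chosen case receives a higher score than a randomly chosen control. The sketch below shows both calculations. The function names are hypothetical, the confusion-matrix counts are an assumed example chosen to be consistent with the rounded SVM row (they are not taken from the paper's figures), and the score lists are invented.

```python
def binary_report(tp, fn, fp, tn):
    """Per-class precision, recall, F1 and overall accuracy.

    tp: cases predicted as cases       fn: cases predicted as controls
    fp: controls predicted as cases    tn: controls predicted as controls
    """
    def prf(true_pos, false_pos, false_neg):
        predicted = true_pos + false_pos
        actual = true_pos + false_neg
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        return {"precision": precision, "recall": recall, "f1": f1}

    return {
        "cases": prf(tp, fp, fn),
        "controls": prf(tn, fn, fp),   # controls treated as the positive class
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


def auc_roc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Assumed counts: 142 cases all detected, 4 of 44 controls labelled as cases.
report = binary_report(tp=142, fn=0, fp=4, tn=40)
print({k: round(v, 2) for k, v in report["cases"].items()})
# {'precision': 0.97, 'recall': 1.0, 'f1': 0.99}
print({k: round(v, 2) for k, v in report["controls"].items()})
# {'precision': 1.0, 'recall': 0.91, 'f1': 0.95}
print(round(report["accuracy"], 2))                        # 0.98
print(round(auc_roc([0.9, 0.8, 0.4], [0.7, 0.3]), 3))      # 0.833
```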
LightGBM demonstrated one of the best overall performances with an accuracy of 0.98 and an AUC-ROC of 0.99, supported by a well-defined confusion matrix and strong feature importance patterns. Additionally, SHAP summary plots highlighted the relative contribution of features, while the learning curve confirmed model stability and minimal overfitting (Fig. 6). Importantly, LightGBM exhibited the highest cross-validation accuracy (0.98 ± 0.02), indicating robust generalization (Table 3).

Performance on the independent testing dataset (20%)

The generalizability of the models was assessed on an independent testing dataset. The performance trends observed during training were largely preserved (Table 4). SVM (RBF kernel) and LightGBM continued to outperform other models, each achieving an accuracy of 0.98 and an AUC-ROC of 0.99. XGBoost also maintained strong performance (accuracy 0.96, AUC-ROC 0.99).

Table 4 Results of ML models applied on testing (20%) dataset.

Model                  Accuracy   AUC-ROC score   Top features
Logistic Regression    0.95       0.99            -
Decision Tree          0.95       0.93            -
SVM (RBF Kernel)       0.98       0.99            BMI, VLDL, LDL, Hypertension, HDL
XGBoost (optimized)    0.96       0.99            BMI, LDL, Hypertension, VLDL, HDL
LightGBM               0.98       0.99            BMI, HDL, LDL, VLDL, Age

Logistic regression and decision tree models showed stable but comparatively lower performance, both achieving an accuracy of 0.95, with AUC-ROC values of 0.99 and 0.93, respectively. Confusion matrix analysis of LightGBM on the testing dataset demonstrated high true positive and true negative rates, indicating excellent classification balance (Fig. 7). Detailed classification metrics for LightGBM further confirmed high precision, recall, and F1-scores across both classes (Table 5).

Table 5 Parameters of Light GBM applied on testing dataset (20%).
Light GBM          Precision   Recall   F1-score   Support   Accuracy   AUC-ROC score   Cross-validation accuracy
Cases              1.00        0.93     0.96       44        0.98       0.99            0.98 (± 0.02)
Control            0.98        1.00     0.99       142
Macro average      0.99        0.97     0.98       186       -          -               -
Weighted average   0.98        0.98     0.98       186       -          -               -

Feature importance analysis on the testing dataset revealed that BMI, LDL, VLDL, HDL, and hypertension were consistently among the top predictors across models (Table 4). These findings emphasize the dominant contribution of lipid abnormalities and metabolic risk factors in IHD prediction.

Performance on unseen test dataset (10%)

To further evaluate model robustness, performance was assessed on a completely unseen test dataset (Table 6). LightGBM achieved perfect classification performance, with an accuracy of 1.00 and an AUC-ROC of 1.00, indicating exceptional generalizability. This was supported by its confusion matrix, which showed no misclassification (Fig. 8). SVM maintained strong performance with an accuracy of 0.98 and an AUC-ROC of 0.99, followed by XGBoost (accuracy 0.96, AUC-ROC 0.99). Logistic regression and decision tree models demonstrated consistent but lower performance, each achieving an accuracy of 0.95. Detailed classification metrics for LightGBM on the unseen dataset showed high precision (0.97 for cases, 0.95 for controls), recall (0.99 for cases, 0.91 for controls), and F1-scores (0.98 and 0.93, respectively), further supporting its reliability (Table 7).

Table 6 Results of ML models applied on unseen testing (10%) dataset.

Model                  Accuracy   AUC-ROC score
Logistic Regression    0.95       0.98
Decision Tree          0.95       0.93
SVM (RBF Kernel)       0.98       0.99
XGBoost (optimized)    0.96       0.99
LightGBM               1.00       1.00

Table 7 Parameters of Light GBM applied on unseen testing dataset (10%).
Light GBM          Precision   Recall   F1-score   Support   Accuracy   AUC-ROC score   Cross-validation accuracy
Cases              0.97        0.99     0.98       71        0.97       1.00            0.96 (± 0.03)
Control            0.95        0.91     0.93       22
Macro average      0.96        0.95     0.95       93        -          -               -
Weighted average   0.97        0.97     0.97       93        -          -               -

Feature importance and interpretability

Across all models, lipid-related variables (including LDL, VLDL, and HDL) along with BMI emerged as the most influential predictors of IHD. Hypertension and age also contributed significantly. The consistency of these predictors across models reinforces their biological and clinical relevance. SHAP (SHapley Additive exPlanations) analysis of the LightGBM model provided further interpretability, demonstrating that higher BMI and adverse lipid levels were strongly associated with increased predicted risk of IHD (Fig. 6). The SHAP summary plot indicated that BMI and LDL had the largest impact on model output, followed by VLDL, HDL, and hypertension.

Discussion

This study presents a novel and clinically relevant application of machine learning (ML) models for the prediction of ischemic heart disease (IHD) using lipid profile parameters integrated with demographic and anthropometric characteristics. To the best of our knowledge, this is among the first studies to systematically evaluate and compare multiple advanced ML algorithms specifically on a lipidemia-centered dataset for IHD prediction. While previous research has explored AI in cardiovascular disease, most studies have incorporated heterogeneous inputs such as imaging, genomics, or electronic health records, often without isolating the predictive contribution of lipid fractions. In contrast, our work demonstrates that routinely available biochemical markers, combined with basic clinical variables, can achieve highly accurate and generalizable predictions, highlighting both novelty and translational potential.
The observed differences in lipid parameters between cases and controls in our cohort align closely with established epidemiological evidence. Cardiovascular disease is the leading cause of morbidity and mortality worldwide, and dyslipidemia is one of its major risk factors, with LDL cholesterol serving as the key carrier of cholesterol to the arterial wall. Specifically, elevated levels of triglycerides and triglyceride-rich lipoproteins, including VLDL and intermediate-density lipoprotein (IDL), are independently recognized as important cardiovascular risk factors that can traverse the endothelium, accumulate, and promote atherosclerosis progression [23]. These mechanisms directly support the significantly higher TG, VLDL, and LDL values observed in our IHD cases compared to controls (all p < 0.001; Table 1). Further underscoring the atherogenic role of triglyceride-rich lipoproteins, prospective data from the Women's Health Study confirmed that TRL-C and sdLDL-C concentrations were significantly higher in cardiovascular event case groups (myocardial infarction) compared with reference sub-cohorts (hazard ratio: 3.71 [95% CI: 1.59 to 8.63]; p < 0.001), and that their cholesterol content influences atherogenesis independently of LDL cholesterol [24]. Taken together, these findings validate the lipid parameters selected in our feature set as clinically meaningful predictors of IHD.

An important finding of our study is the limited predictive contribution of HDL cholesterol, whose levels did not differ significantly between cases and controls (p = 0.082; Table 1). A high level of prebeta-1 HDL has been associated with ischemic heart disease, suggesting functional impairment of cholesterol efflux from the artery wall and indicating that HDL functionality, rather than its absolute concentration, may be a more relevant determinant of cardiovascular protection [25, 26].
This insight is consistent with our ML feature importance results, in which HDL ranked lower than TG, VLDL, and LDL across models, reinforcing the ability of data-driven approaches to refine traditional lipid risk paradigms.

Beyond lipid parameters, BMI emerged as the most influential predictor across all models in both the training and testing phases (Tables 4 and 6). This finding is strongly supported by the literature. One study reported that BMI is causally associated with atrial fibrillation, heart failure, and ischemic stroke, and that a 5 kg/m² increase in BMI raised the risk of coronary heart disease by approximately 1.23-fold [27]. Furthermore, elevated BMI is associated with increased body fat, which can lead to hypertension, dyslipidemia, and insulin resistance, all of which are major CVD risk factors, with overweight individuals experiencing increased strain on the cardiovascular system [28]. In parallel, a recent ML study of 7,260 participants identified body fat percentage and lipid profiles as significant predictors of CHD, with SHAP analysis confirming the contribution of non-HDL cholesterol and blood pressure [27]. The consistent importance of hypertension in our models (present in 63% of cases vs. 22% of controls, p < 0.001; Table 1) is similarly well established, as hypertension and dyslipidemia are two highly prevalent and modifiable risk factors in stable ischemic heart disease, with multiple lines of evidence demonstrating that lowering blood pressure and LDL cholesterol improves clinical outcomes [29].

Among all algorithms evaluated, LightGBM demonstrated the best overall performance, achieving an accuracy of 0.98 on the 20% test dataset and 1.00 on the unseen 10% dataset, with AUC-ROC values of 0.99 and 1.00, respectively (Tables 4 and 6). These results compare favorably with the broader ML cardiovascular literature.
A recent LightGBM-based framework for coronary heart disease prediction trained on the BRFSS_2015 dataset achieved an accuracy of approximately 90.6% and an AUC of 0.81, significantly outperforming the Framingham Risk Score, which achieved an accuracy of 79.00% and an AUC of 0.70 [30]. Our findings substantially exceed these benchmarks, which may be attributed to the focused selection of lipid and anthropometric predictors and the higher discriminability of our case-control cohort. Advanced classification models like XGBoost and LightGBM have consistently outperformed logistic regression and decision trees for heart disease prediction [31], a pattern confirmed in our own comparative results where LightGBM surpassed all other classifiers across training, testing, and unseen datasets. The high cross-validation accuracy of LightGBM (0.98 ± 0.02; Table 3), combined with its stable learning curves (Fig. 6), indicates minimal overfitting and supports the robustness and reliability of the model.

The SVM model with RBF kernel also demonstrated strong clinical utility, particularly in terms of sensitivity, achieving a recall of 1.00 for IHD cases in the training dataset (Table 3) and an accuracy of 0.98 and AUC of 0.99 on the test dataset (Table 4). This is clinically significant because high sensitivity is essential in screening applications where failure to detect high-risk individuals has serious consequences. These results are consistent with prior studies: an SVM classifier built using the RBF kernel for coronary heart disease classification based on blood pressure and plasma lipid data achieved superior accuracy, sensitivity, and specificity compared to artificial neural networks, linear discriminant analysis, and logistic regression [32]. However, compared to LightGBM, SVM lacks inherent interpretability, which may limit its standalone clinical utility.
To address the interpretability challenge, SHAP (SHapley Additive exPlanations) analysis was applied to the LightGBM model, providing transparent and clinically actionable insights (Fig. 6). Explainable AI (XAI) techniques such as SHAP and LIME have been increasingly integrated as clinical transparency tools, enabling physicians to understand the factors influencing model predictions 31. In our study, SHAP analysis confirmed that BMI, LDL, VLDL, HDL, and age were the most influential predictors, consistent with established biological mechanisms and with the top features identified in Tables 4 and 6. Obesity as measured by BMI contributed significantly to coronary artery disease prediction in SHAP-based analyses, consistent with umbrella reviews and Mendelian randomization studies that confirm obesity as a causal factor for CAD 28. This convergence of data-driven and mechanistic evidence reinforces the translational validity of our model's predictions.

Regarding model generalizability, LightGBM maintained high and stable performance across the training (accuracy 0.98, AUC 0.99), testing (accuracy 0.98, AUC 0.99), and fully unseen (accuracy 1.00, AUC 1.00) datasets, demonstrating that the model did not simply overfit to the training data. Systematic reviews of ML models for cardiovascular disease risk using electronic health records have reported rising interest in the topic, particularly over the past five years, with increasing emphasis on the need for multi-dataset validation to confirm generalizability 31. Our three-dataset validation strategy, comprising training (80%), test (20%), and unseen (10%) datasets, directly addresses this recognized gap and distinguishes our study from many prior works that rely on a single train-test split.
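A stratified three-way partition of this kind can be sketched in a few lines of Python. The text does not state how the 80%, 20%, and 10% partitions relate to one another, so the sketch below makes an explicit assumption: 10% of participants are held out first as the unseen set, and the remaining 90% is divided 80/20 into training and test sets (0.72, 0.18, and 0.10 of the full cohort). The function name `stratified_split` and the seed are hypothetical, and only the class counts (710 cases, 219 controls) come from the study.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions, seed=0):
    """Split sample indices into partitions that preserve class proportions."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    parts = [[] for _ in fractions]
    for indices in by_class.values():
        rng.shuffle(indices)
        start, cumulative = 0, 0.0
        for k, frac in enumerate(fractions):
            cumulative += frac
            # The last partition takes the remainder so each index is used once.
            is_last = k == len(fractions) - 1
            end = len(indices) if is_last else round(cumulative * len(indices))
            parts[k].extend(indices[start:end])
            start = end
    return parts


# Class counts from the study: 710 IHD cases (1) and 219 controls (0).
labels = [1] * 710 + [0] * 219
# ASSUMPTION: 10% unseen hold-out, then an 80/20 split of the remaining 90%.
train_idx, test_idx, unseen_idx = stratified_split(labels, [0.72, 0.18, 0.10], seed=42)
print(len(train_idx), len(test_idx), len(unseen_idx))
```

Because the shuffle is done within each class, every partition keeps roughly the same case-to-control ratio as the full cohort, which matters for an imbalanced dataset like this one.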
Meta-analyses indicate that ensemble and neural-network models often outperform conventional statistical approaches, and several explainable-AI frameworks combining machine learning with SHAP have revealed key predictors, including blood pressure, lipids, and glycated hemoglobin, that drive model decisions 33.

Despite these strengths, certain limitations must be acknowledged. First, the study is based on a dataset drawn from two tertiary care centres in a single city, and external validation in geographically and ethnically diverse populations is necessary to confirm generalizability. Second, while the lipid-centered feature set enhances clinical applicability and reflects routinely collected data, the inclusion of additional biomarkers such as inflammatory markers (e.g., hsCRP), genetic variants, or imaging-derived features may further improve predictive performance. Third, the cross-sectional nature of the data precludes the assessment of longitudinal IHD risk trajectories.

Conclusion

In sum, this study demonstrates that ML models, particularly LightGBM, can achieve highly accurate and generalizable prediction of ischemic heart disease using routinely available lipid, anthropometric, and clinical variables. Together, the superiority of LightGBM over other classifiers, its SHAP-based interpretability, the high recall of SVM for IHD cases, and robust performance across three validation datasets establish a practical and clinically actionable ML-driven framework for early IHD risk stratification.

Declarations

Ethics approval and consent to participate: The research was approved by the Research Ethics and Biosafety Committee of the University of the Punjab, Pakistan, which follows the Declaration of Helsinki. Written informed consent was obtained from all participants prior to enrollment.

Consent for publication: Not applicable.

Competing interests: The authors have no competing interests.
Funding: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author Contribution

A.I.: Designed the study; data collection, data curation, and analysis; wrote and proofread the manuscript. S.A.: Data analysis; prepared tables and figures; wrote and proofread the manuscript. F.T.: Data analysis; prepared tables and figures. S.: Proofread, reviewed, and approved the draft of the manuscript.

Acknowledgements: Not applicable.

Data Availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

References

1. Dharmakumar, R. et al. Reperfused myocardial infarction: the road to CCS classification of acute MI and beyond. JACC Adv. 4, 101528 (2025).
2. Zhou, X. et al. Global trends in ischemic heart disease mortality from 1990 to 2021 and 2036 projections: insights from GBD 2021 data. Glob. Heart 20, 92 (2025).
3. Shu, T. et al. Assessing global, regional, and national time trends and associated risk factors of the mortality in ischemic heart disease through global burden of disease 2019 study: population-based study. JMIR Public Health Surveill. 10, e46821 (2024).
4. Siddiqi, A. K., Mubashir, E., Cheema, A. A. A., Noshab, M. & Naeem, M. Vol. 88, 59–60 (LWW, 2026).
5. Hassan, K. M. et al. Abstract P266: burden and trends of cardiovascular disease and its attributable risk factor in Pakistan from 1990–2019: a benchmarking analysis. Circulation 149, AP266–AP266 (2024).
6. Ullah, S. A., Shah, S. T., Khan, S. & Khalil, A. A. Frequency and pattern of coronary artery disease and its associated risk factors in stable ischemic heart disease patients undergoing coronary angiography: a cross sectional study. J. Khyber Coll. Dent. 10, 1–6 (2020).
7. Labreuche, J., Touboul, P. J. & Amarenco, P. Plasma triglyceride levels and risk of stroke and carotid atherosclerosis: a systematic review of the epidemiological studies. Atherosclerosis 203, 331–345 (2009).
8. Parhofer, K. G. & Laufs, U. Lipid profile and lipoprotein (a) testing. Deutsches Ärzteblatt Int. 120, 582 (2023).
9. Wan, H., Wu, H., Wei, Y., Wang, S. & Ji, Y. Novel lipid profiles and atherosclerotic cardiovascular disease risk: insights from a latent profile analysis. Lipids Health Dis. 24, 71 (2025).
10. Jeppesen, J. Triglycerides, high-density lipoprotein cholesterol, and risk of ischemic heart disease: a view from the Copenhagen Male Study. Metab. Syndr. Relat. Disord. 1, 33–53 (2003).
11. Jeppesen, J., Hein, H. O., Suadicani, P. & Gyntelberg, F. Relation of high TG–low HDL cholesterol and LDL cholesterol to the incidence of ischemic heart disease: an 8-year follow-up in the Copenhagen male study. Arterioscler. Thromb. Vasc. Biol. 17, 1114–1120 (1997).
12. Inyaku, M. et al. Calculated small dense low-density lipoprotein cholesterol level is a predominant predictor for new onset of ischemic heart disease. J. Atheroscler. Thromb. 31, 232–248 (2024).
13. Alsabhan, W. & Alfadhly, A. Effectiveness of machine learning models in diagnosis of heart disease: a comparative study. Sci. Rep. 15, 24568 (2025).
14. Ogunpola, A., Saeed, F., Basurra, S., Albarrak, A. M. & Qasem, S. N. Machine learning-based predictive models for detection of cardiovascular diseases. Diagnostics 14, 144 (2024).
15. Bani Hani, S. H. & Ahmad, M. M. Machine-learning algorithms for ischemic heart disease prediction: a systematic review. Curr. Cardiol. Rev. 19, 87–99 (2023).
16. Chobanian, A. V. et al. Seventh report of the joint national committee on prevention, detection, evaluation, and treatment of high blood pressure. Hypertension 42, 1206–1252 (2003).
17. Johnson, R., McNutt, P., MacMahon, S. & Robson, R. Use of the Friedewald formula to estimate LDL-cholesterol in patients with chronic renal failure on dialysis. Clin. Chem. 43, 2183–2184 (1997).
18. Stojanov, D., Lazarova, E., Veljkova, E., Rubartelli, P. & Giacomini, M. Predicting the outcome of heart failure against chronic-ischemic heart disease in elderly population: machine learning approach based on logistic regression, case to Villa Scassi hospital Genoa, Italy. J. King Saud Univ. Sci. 35, 102573 (2023).
19. Ghiasi, M. M., Zendehboudi, S. & Mohsenipour, A. A. Decision tree-based diagnosis of coronary artery disease: CART model. Comput. Methods Programs Biomed. 192, 105400 (2020).
20. Yang, C., An, B. & Yin, S. in IEEE International Conference on Systems, Man, and Cybernetics (SMC) 3153–3158 (IEEE, 2018).
21. Xu, Y. et al. Predicting ICU mortality in rheumatic heart disease: comparison of XGBoost and logistic regression. Front. Cardiovasc. Med. 9, 847206 (2022).
22. Omotehinwa, T. O., Oyewola, D. O. & Moung, E. G. Optimizing the light gradient-boosting machine algorithm for an efficient early detection of coronary heart disease. Inf. Health 1, 70–81 (2024).
23. Du, Z. & Qin, Y. Dyslipidemia and cardiovascular disease: current knowledge, existing challenges, and new opportunities for management strategies. J. Clin. Med. 12, 363 (2023).
24. Duran, E. K. et al. Triglyceride-rich lipoprotein cholesterol, small dense LDL cholesterol, and incident cardiovascular disease. J. Am. Coll. Cardiol. 75, 2122–2135 (2020).
25. Sethi, A. A. et al. High pre-β1 HDL concentrations and low lecithin: cholesterol acyltransferase activities are strong positive risk markers for ischemic heart disease and independent of HDL-cholesterol. Clin. Chem. 56, 1128–1137 (2010).
26. Kane, J. P. & Malloy, M. J. Prebeta-1 HDL and coronary heart disease. Curr. Opin. Lipidol. 23, 367–371 (2012).
27. Vu, T. et al. Machine learning model for predicting coronary heart disease risk: development and validation using insights from a Japanese population-based study. JMIR Cardio 9, e68066 (2025).
28. Tasmurzayev, N. et al. Interpretable machine learning for coronary artery disease risk stratification: a SHAP-based analysis. Algorithms 18, 697 (2025).
29. Li, J., Xue, Z., Zhang, M., Ji, S. & Lu, F. Machine learning-derived identification of an obesity and lipid metabolism-related genes signature for the diagnosis and molecular typing of acute myocardial infarction. Front. Cardiovasc. Med. 13, 1694872.
30. Deng, L., Lu, K. & Hu, H. An interpretable LightGBM model for predicting coronary heart disease: enhancing clinical decision-making with machine learning. PLoS One 20, e0330377 (2025).
31. Banerjee, T. & Paçal, İ. A systematic review of machine learning in heart disease prediction. Turk. J. Biol. 49, 600–634 (2025).
32. Zhu, Y., Wu, J. & Fang, Y. Study on application of SVM in prediction of coronary heart disease. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi (J. Biomed. Eng.) 30, 1180–1185 (2013).
33. Srinivasan, S. et al. An active learning machine technique based prediction of cardiovascular heart disease from UCI-repository database. Sci. Rep. 13, 13588 (2023).

Supplementary Files: SupplementaryFile.docx

Editorial decision: Revision requested 30 Apr, 2026
Reviews received at journal 29 Apr, 2026
Reviewers agreed at journal 28–29 Apr, 2026
Reviewers invited by journal 28 Apr, 2026
Editor invited by journal 27 Apr, 2026
Editor assigned by journal 20 Apr, 2026
Submission checks completed at journal 20 Apr, 2026
First submitted to journal 19 Apr, 2026
align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDiabetes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e325 (46%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e34 (16%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHypertension\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e446 (63%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e49 (22%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSmoking\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e130 (18%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e26 (12%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.034\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"4\" nameend=\"c4\" namest=\"c1\"\u003e \u003cp\u003e\u003csup\u003e1\u003c/sup\u003eMean (SD); n (%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"4\"\u003e\u003csup\u003e2\u003c/sup\u003eWelch Two Sample t-test; Pearson's Chi-squared test\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eAll machine learning (ML) analyses were implemented in Python (version 7.5.5, Jupyter Notebook environment) using the Pandas, 
NumPy, and Keras libraries, supplemented by scikit-learn, XGBoost, LightGBM, and SHAP. The complete analytical workflow is illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. The dataset comprised 929 participants (710 IHD cases, 219 healthy controls) and was partitioned under two independent stratified splitting strategies to evaluate model performance and generalizability. In the primary split, the dataset was divided into a training set (80%) and a test set (20%). To further assess generalizability on completely unseen data, a secondary three-way split was applied, yielding a training set (80%), an internal test set (10%), and a held-out unseen test set (10%). Stratified splitting was applied in both schemes to preserve the case-to-control ratio across all partitions, and a fixed random state (random_state\u0026thinsp;=\u0026thinsp;42) was used throughout to ensure full reproducibility.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFive supervised classification algorithms were trained and evaluated: Logistic Regression (LR)\u003csup\u003e\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e, Decision Tree (DT)\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e, Support Vector Machine with a Radial Basis Function kernel (SVM-RBF)\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e, Extreme Gradient Boosting (XGBoost, hyperparameter-optimized via grid search)\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e, and Light Gradient Boosting Machine (LightGBM)\u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e. Feature standardization was applied prior to SVM training; tree-based and ensemble models were trained on unscaled features. 
Model performance was quantified using precision, recall, F1-score, overall classification accuracy, area under the receiver operating characteristic curve (AUC-ROC), and stratified five-fold cross-validation accuracy (mean\u0026thinsp;\u0026plusmn;\u0026thinsp;SD). Confusion matrices and ROC curves were generated for all models on both training and test partitions.\u003c/p\u003e \u003cp\u003eFeature importance was assessed for all models using built-in impurity-based importance scores. For the best-performing model, global model interpretability was further examined using SHAP (SHapley Additive exPlanations) values, providing a theoretically grounded, model-agnostic decomposition of individual feature contributions to predictions. A learning curve analysis was additionally conducted for the best-performing model to assess the relationship between training sample size and generalization performance, and to confirm model convergence prior to final evaluation.\u003c/p\u003e \u003c/div\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003eCohort characteristics\u003c/h2\u003e \u003cp\u003eA total of 929 participants were included in this study, comprising 710 ischemic heart disease (IHD) cases and 219 controls. Comparative analysis of baseline characteristics revealed significant differences in lipid profile parameters between the two groups. Specifically, triglycerides, total cholesterol, VLDL, and LDL levels were significantly elevated in cases compared to controls (all \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001), indicating a strong association between dyslipidemia and IHD. 
In contrast, HDL levels did not differ significantly (\u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.082), suggesting a comparatively weaker discriminative role in this cohort (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eIn addition to lipid parameters, cases exhibited a significantly higher mean body mass index (BMI) (33\u0026thinsp;\u0026plusmn;\u0026thinsp;6 in cases vs. 22\u0026thinsp;\u0026plusmn;\u0026thinsp;3 in controls, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001) and were slightly older than controls (59\u0026thinsp;\u0026plusmn;\u0026thinsp;12 years in cases vs. 56\u0026thinsp;\u0026plusmn;\u0026thinsp;10 years in controls, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.002). The prevalence of key comorbidities, including diabetes (46% in cases vs. 16% in controls, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001) and hypertension (63% in cases vs. 22% in controls, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001), was substantially greater in cases, consistent with their established role as risk factors for IHD. Smoking was also more prevalent among cases (18% in cases vs. 12% in controls, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.034), whereas gender distribution did not differ significantly between groups (\u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.2). 
Collectively, these findings indicate that IHD cases in this cohort had a markedly more adverse cardiometabolic profile than controls (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eUnivariate and multivariate regression analysis (training dataset, 80%)\u003c/h2\u003e \u003cp\u003eMultivariate regression analysis identified BMI, VLDL, LDL, and hypertension as significant independent predictors of IHD, with BMI (OR 2.05, 95% CI 1.78 to 2.42) and hypertension (OR 4.54, 95% CI 1.91 to 11.4) demonstrating the strongest associations. Triglycerides also remained statistically significant after adjustment, although the odds ratio was close to unity (OR 0.99, 95% CI 0.99 to 1.00; \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.033). Although diabetes, smoking, and age were significant in univariate analysis, they lost statistical significance after adjustment, suggesting potential confounding effects. Female gender showed a protective association in the adjusted model (OR 0.35, 95% CI 0.14 to 0.81), while HDL was not a statistically significant predictor in either analysis (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eUnivariate and multivariate logistic regression applied to the training (80%) dataset.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"8\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv 
align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e\u003cb\u003eCharacteristic\u003c/b\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c5\" namest=\"c2\"\u003e \u003cp\u003eUnivariate\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c8\" namest=\"c6\"\u003e \u003cp\u003eMultivariate\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eN\u003c/b\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eOR\u003c/b\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e95% CI\u003c/b\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003ep-value\u003c/b\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003eOR\u003c/b\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e95% CI\u003c/b\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003ep-value\u003c/b\u003e\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTG\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e743\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1.01, 1.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e 
\u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.99, 1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e0.033\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVLDL\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e743\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1.03, 1.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.06\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e1.04, 1.09\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLDL\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e743\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1.02, 1.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e 
\u003cp\u003e1.02, 1.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHDL\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e743\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1.00, 1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.98, 1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.053\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBMI\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e743\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1.66, 2.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e2.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e1.78, 2.42\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAge\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c2\"\u003e \u003cp\u003e743\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1.01, 1.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.004\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.99, 1.06\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGender\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e743\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e0.014\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMale\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e 
\u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFemale\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.59, 1.16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.35\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.14, 0.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDiabetes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e743\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e4.66\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e3.04, 7.38\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.77, 5.24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHypertension\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e743\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e6.31\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e4.27, 9.53\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e4.54\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e1.91, 11.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSmoking\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e743\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c8\"\u003e \u003cp\u003e\u0026gt;\u0026thinsp;0.9\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2.08\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1.24, 3.70\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.008\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.33, 3.41\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"8\"\u003eAbbreviations: CI\u0026thinsp;=\u0026thinsp;Confidence Interval, OR\u0026thinsp;=\u0026thinsp;Odds Ratio\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e 
\u003ch2\u003ePerformance of machine learning models on the training dataset (80%)\u003c/h2\u003e \u003cp\u003eThe predictive performance of five machine learning models (logistic regression, decision tree, support vector machine (SVM) with radial basis function (RBF) kernel, XGBoost, and LightGBM) was first evaluated on the training dataset. Overall, all models demonstrated high classification performance, with accuracy ranging from 0.95 to 0.98 and AUC-ROC scores of 0.93 or higher (Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eResults of ML models applied to the training (80%) dataset.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"8\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModels\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" 
colname=\"c3\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eF1-score\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSupport\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eAUC-ROC Score\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eCross-Validation Accuracy\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colspan=\"8\" nameend=\"c8\" namest=\"c1\"\u003e \u003cp\u003eLogistic Regression\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCases\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e213\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.96 (+/- 0.03)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eControl\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e66\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMacro avg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.92\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e279\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWeighted avg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e279\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"8\" nameend=\"c8\" namest=\"c1\"\u003e \u003cp\u003e\u003cb\u003eDecision tree\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003eCases\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e142\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.96 (+/- 0.02)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eControl\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e44\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMacro avg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.92\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e186\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c8\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWeighted avg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e186\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"8\" nameend=\"c8\" namest=\"c1\"\u003e \u003cp\u003e\u003cb\u003eSVM with RBF Kernel\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCases\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e142\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.96 (+/- 0.03)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003eControl\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e44\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMacro avg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e186\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWeighted avg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e186\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e 
\u003ctd align=\"left\" colspan=\"8\" nameend=\"c8\" namest=\"c1\"\u003e \u003cp\u003e\u003cb\u003eXGBoost (optimized)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCases\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e142\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.96 (\u0026plusmn;\u0026thinsp;0.02)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eControl\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.92\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e44\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMacro avg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e 
\u003cp\u003e186\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWeighted avg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e186\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"8\" nameend=\"c8\" namest=\"c1\"\u003e \u003cp\u003e\u003cb\u003eLight GBM\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCases\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e44\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.99\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.98 (\u0026plusmn;\u0026thinsp;0.02)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eControl\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e142\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMacro avg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e186\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWeighted avg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e186\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e 
\u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eLogistic regression achieved an accuracy of 0.95 and an AUC-ROC of 0.98, indicating strong baseline performance, as further supported by its confusion matrix and ROC curve (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). The decision tree model yielded the same accuracy (0.95) but a lower AUC-ROC (0.93), with its confusion matrix reflecting slightly higher misclassification of controls (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). In contrast, the SVM with RBF kernel demonstrated superior classification performance, achieving an accuracy of 0.98 and an AUC-ROC of 0.99, with near-perfect classification observed in the confusion matrix and a highly discriminative ROC curve (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe ensemble-based models also performed strongly. XGBoost achieved an accuracy of 0.96 and an AUC-ROC of 0.99, with its confusion matrix indicating a good balance between sensitivity and specificity and strong probability separation (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e). LightGBM matched the SVM with an accuracy of 0.98 and an AUC-ROC of 0.99, supported by a well-defined confusion matrix and strong feature importance patterns. 
Additionally, SHAP summary plots highlighted the relative contribution of features, while the learning curve confirmed model stability and minimal overfitting (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e). Importantly, LightGBM exhibited the highest cross-validation accuracy (0.98\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02), indicating robust generalization (Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003ePerformance on the independent testing dataset (20%)\u003c/h2\u003e \u003cp\u003eThe generalizability of the models was assessed on an independent testing dataset. The performance trends observed during training were largely preserved (Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). SVM (RBF kernel) and LightGBM continued to outperform other models, each achieving an accuracy of 0.98 and an AUC-ROC of 0.99. 
XGBoost also maintained strong performance (accuracy 0.96, AUC-ROC 0.99).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eResults of ML models applied on testing (20%) dataset.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAUC-ROC score\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eTop features\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eLogistic Regression\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eDecision Tree\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eSVM (RBF Kernel)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eBMI, VLDL, LDL, Hypertension, HDL\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eXGBoost (optimized)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eBMI, LDL, Hypertension, VLDL, HDL\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eLightGBM\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eBMI, HDL, LDL, VLDL, Age\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eLogistic regression and decision tree models showed stable but comparatively lower performance, both achieving an accuracy of 0.95, with AUC-ROC values of 0.99 and 0.93, respectively. 
The confusion matrix for LightGBM on the testing dataset showed high true positive and true negative rates, indicating excellent classification balance (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e). Detailed classification metrics for LightGBM further confirmed high precision, recall, and F1-scores across both classes (Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eClassification metrics of LightGBM on the testing dataset (20%).\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"8\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLightGBM\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" 
colname=\"c4\"\u003e \u003cp\u003eF1-score\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSupport\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eClassification report Accuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eAUC-ROC Score\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eCross-Validation Accuracy\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCases\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e44\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.98(\u0026plusmn;\u0026thinsp;0.02)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eControl\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e142\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003eMacro average\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e186\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWeighted average\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e186\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eFeature importance analysis on the testing dataset showed that BMI, LDL, VLDL, and HDL ranked among the top five predictors in all three models for which rankings were reported, with hypertension (SVM, XGBoost) or age (LightGBM) completing the list (Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). 
These findings emphasize the dominant contribution of lipid abnormalities and metabolic risk factors in IHD prediction.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003ePerformance on unseen test dataset (10%)\u003c/h2\u003e \u003cp\u003eTo further evaluate model robustness, performance was assessed on a completely unseen test dataset (Table\u0026nbsp;\u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e). LightGBM achieved perfect classification performance, with an accuracy of 1.00 and an AUC-ROC of 1.00, indicating exceptional generalizability. This was supported by its confusion matrix, which showed no misclassification (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e). SVM maintained strong performance with an accuracy of 0.98 and an AUC-ROC of 0.99, followed by XGBoost (accuracy 0.96, AUC-ROC 0.99). Logistic regression and decision tree models demonstrated consistent but lower performance, each achieving an accuracy of 0.95. 
Detailed classification metrics for LightGBM on the unseen dataset showed high precision (0.97 for cases, 0.95 for controls), recall (0.99 for cases, 0.91 for controls), and F1-scores (0.98 and 0.93, respectively), further supporting its reliability (Table\u0026nbsp;\u003cspan refid=\"Tab7\" class=\"InternalRef\"\u003e7\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab6\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 6\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eResults of ML models applied on unseen testing (10%) dataset.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAUC-ROC score\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eLogistic Regression\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eDecision Tree\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.95\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eSVM (RBF Kernel)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eXGBoost (optimized)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eLightGBM\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab7\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 7\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eClassification metrics of LightGBM on the unseen testing dataset (10%).\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"8\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" 
class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLight GBM\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eF1-score\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSupport\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eClassification report Accuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eAUC-ROC Score\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eCross-Validation Accuracy\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCases\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e71\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\" morerows=\"1\" 
rowspan=\"2\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e0.96 (+/- 0.03)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eControl\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e22\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMacro average\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWeighted average\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e93\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eFeature importance and interpretability\u003c/h2\u003e \u003cp\u003eAcross all models, lipid-related variables (including LDL, VLDL, and HDL) along with BMI emerged as the most influential predictors of IHD. Hypertension and age also contributed significantly. The consistency of these predictors across models reinforces their biological and clinical relevance.\u003c/p\u003e \u003cp\u003eSHAP (SHapley Additive exPlanations) analysis of the LightGBM model provided further interpretability, demonstrating that higher BMI and adverse lipid levels were strongly associated with increased predicted risk of IHD (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e). The SHAP summary plot indicated that BMI and LDL had the largest impact on model output, followed by VLDL, HDL, and hypertension.\u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eThis study presents a novel and clinically relevant application of machine learning (ML) models for the prediction of ischemic heart disease (IHD) using lipid profile parameters integrated with demographic and anthropometric characteristics. To the best of our knowledge, this is among the first studies to systematically evaluate and compare multiple advanced ML algorithms specifically on a lipidemia-centered dataset for IHD prediction. 
While previous research has explored AI in cardiovascular disease, most studies have incorporated heterogeneous inputs such as imaging, genomics, or electronic health records, often without isolating the predictive contribution of lipid fractions. In contrast, our work demonstrates that routinely available biochemical markers, combined with basic clinical variables, can achieve highly accurate and generalizable predictions, highlighting both novelty and translational potential.\u003c/p\u003e \u003cp\u003eThe observed differences in lipid parameters between cases and controls in our cohort align closely with established epidemiological evidence. Cardiovascular disease is the leading cause of morbidity and mortality worldwide, and dyslipidemia is one of its major risk factors, with LDL cholesterol serving as the key carrier of cholesterol to the arterial wall. Specifically, elevated levels of triglycerides and triglyceride-rich lipoproteins, including VLDL and intermediate-density lipoprotein (IDL), are independently recognized as important cardiovascular risk factors; these particles can traverse the endothelium, accumulate, and promote atherosclerosis progression\u003csup\u003e\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e. These mechanisms are consistent with the significantly higher TG, VLDL, and LDL values observed in our IHD cases compared to controls (all p\u0026thinsp;\u0026lt;\u0026thinsp;0.001; Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). 
Further underscoring the atherogenic role of triglyceride-rich lipoproteins, prospective data from the Women's Health Study confirmed that TRL-C and sdLDL-C concentrations were significantly higher in cardiovascular event case groups (myocardial infarction) compared with reference sub-cohorts (hazard ratio: 3.71 [95% CI: 1.59 to 8.63]; p\u0026thinsp;\u0026lt;\u0026thinsp;0.001), and that their cholesterol content influences atherogenesis independently of LDL cholesterol\u003csup\u003e\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u003c/sup\u003e. Taken together, these findings support the lipid parameters selected in our feature set as clinically meaningful predictors of IHD.\u003c/p\u003e \u003cp\u003eAn important finding of our study is the limited predictive contribution of HDL cholesterol, whose levels did not differ significantly between cases and controls (p\u0026thinsp;=\u0026thinsp;0.082; Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). A high level of prebeta-1 HDL has been associated with ischemic heart disease, which suggests functional impairment of cholesterol efflux from the artery wall and indicates that HDL functionality, rather than its absolute concentration, may be a more relevant determinant of cardiovascular protection\u003csup\u003e\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e,\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e. 
This insight is consistent with our ML feature importance results, in which HDL ranked lower than TG, VLDL, and LDL across models, reinforcing the ability of data-driven approaches to refine traditional lipid risk paradigms.\u003c/p\u003e \u003cp\u003eBeyond lipid parameters, BMI emerged as the most influential predictor across all models in both the training and testing phases (Tables\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e and \u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e). This finding is strongly supported by the literature. One study reported that BMI is causally associated with atrial fibrillation, heart failure, and ischemic stroke, and that a 5 kg/m\u0026sup2; increase in BMI raised the risk of coronary heart disease by approximately 1.23-fold\u003csup\u003e\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e. Furthermore, elevated BMI is associated with increased body fat, which can lead to hypertension, dyslipidemia, and insulin resistance, all of which are major risk factors for CVD, with overweight individuals experiencing increased strain on the cardiovascular system\u003csup\u003e\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e. In parallel, a recent ML study of 7,260 participants identified body fat percentage and lipid profiles as significant predictors of CHD, with SHAP analysis confirming the contribution of non-HDL cholesterol and blood pressure\u003csup\u003e\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e. The consistent importance of hypertension in our models (present in 63% of cases vs. 
22% of controls, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001; Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e) is similarly well established, as hypertension and dyslipidemia are two highly prevalent and modifiable risk factors in stable ischemic heart disease, with multiple lines of evidence demonstrating that lowering blood pressure and LDL cholesterol improves clinical outcomes\u003csup\u003e\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eAmong all algorithms evaluated, LightGBM demonstrated the best overall performance, achieving an accuracy of 0.98 on the 20% test dataset and 1.00 on the unseen 10% dataset, with AUC-ROC values of 0.99 and 1.00, respectively (Tables\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e and \u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e). These results compare favorably with the broader ML cardiovascular literature. A recent LightGBM-based framework for coronary heart disease prediction trained on the BRFSS_2015 dataset achieved an accuracy of approximately 90.6% and an AUC of 0.81, significantly outperforming the Framingham Risk Score, which achieved an accuracy of 79.00% and an AUC of 70.00%\u003csup\u003e30\u003c/sup\u003e. Our results substantially exceed these benchmarks, which may be attributed to the focused selection of lipid and anthropometric predictors and the higher discriminability of our case-control cohort. Advanced classification models like XGBoost and LightGBM have consistently outperformed logistic regression and decision trees for heart disease prediction\u003csup\u003e\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u003c/sup\u003e, a pattern confirmed in our own comparative results where LightGBM surpassed all other classifiers across training, testing, and unseen datasets. 
The high cross-validation accuracy of LightGBM (0.98\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02; Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e), combined with its stable learning curves (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e), indicates minimal overfitting and supports the robustness and reliability of the model.\u003c/p\u003e \u003cp\u003eThe SVM model with RBF kernel also demonstrated strong clinical utility, particularly in terms of sensitivity, achieving a recall of 1.00 for IHD cases in the training dataset (Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e) and an accuracy of 0.98 and AUC of 0.99 on the test dataset (Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). This is clinically significant because high sensitivity is essential in screening applications where failure to detect high-risk individuals has serious consequences. These results are consistent with prior studies: an SVM classifier built using the RBF kernel for coronary heart disease classification based on blood pressure and plasma lipid data achieved superior accuracy, sensitivity, and specificity compared to artificial neural networks, linear discriminant analysis, and logistic regression\u003csup\u003e\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e\u003c/sup\u003e. However, compared to LightGBM, SVM lacks inherent interpretability, which may limit its standalone clinical utility.\u003c/p\u003e \u003cp\u003eTo address the interpretability challenge, SHAP (SHapley Additive exPlanations) analysis was applied to the LightGBM model, providing transparent and clinically actionable insights (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e). 
Explainable AI (XAI) techniques such as SHAP and LIME have been increasingly integrated as clinical transparency tools, enabling physicians to understand the factors influencing model predictions\u003csup\u003e\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u003c/sup\u003e. In our study, SHAP analysis confirmed that BMI, LDL, VLDL, HDL, and age were the most influential predictors, consistent with established biological mechanisms and aligning with the top features identified in Tables\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e and \u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e. Obesity as measured by BMI has also contributed significantly to coronary artery disease prediction in other SHAP-based analyses, consistent with umbrella reviews and Mendelian randomization studies that confirm obesity as a causal factor for CAD\u003csup\u003e\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e. This convergence of data-driven and mechanistic evidence reinforces the translational validity of our model's predictions.\u003c/p\u003e \u003cp\u003eRegarding model generalizability, LightGBM maintained high and stable performance across training (accuracy 0.98, AUC 0.99), testing (accuracy 0.98, AUC 0.99), and the fully unseen dataset (accuracy 1.00, AUC 1.00), suggesting that the model did not simply overfit to the training data. Systematic reviews of ML models for cardiovascular disease risk using electronic health records have reported a rising interest in related papers, particularly throughout the past five years, with increasing emphasis on the need for multi-dataset validation to confirm generalizability\u003csup\u003e\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u003c/sup\u003e. 
Our three-dataset validation strategy, comprising training (80%), test (20%), and unseen (10%) datasets, directly addresses this recognized gap, distinguishing our study from many prior works that rely on a single train-test split. Meta-analyses indicate that ensemble and neural-network models often outperform conventional statistical approaches, and several explainable-AI frameworks combining machine learning with SHAP have revealed key predictors including blood pressure, lipids, and glycated hemoglobin that drive model decisions\u003csup\u003e\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eDespite these strengths, certain limitations must be acknowledged. First, the study is based on a single institutional dataset, and external validation in geographically and ethnically diverse populations is necessary to confirm generalizability. Second, while the lipid-centered feature set enhances clinical applicability and reflects routinely collected data, the inclusion of additional biomarkers such as inflammatory markers (e.g., hsCRP), genetic variants, or imaging-derived features may further improve predictive performance. Third, the cross-sectional nature of the data precludes the assessment of longitudinal IHD risk trajectories.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eIn sum, this study demonstrates that ML models, particularly LightGBM, can achieve highly accurate and generalizable prediction of ischemic heart disease using routinely available lipid, anthropometric, and clinical variables. 
Together, the superiority of LightGBM over other classifiers, its SHAP-based interpretability, the high recall of SVM for IHD cases, and robust performance across three validation datasets establish a practical and clinically actionable ML-driven framework for early IHD risk stratification.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003cstrong\u003eEthics approval and consent to participate:\u003c/strong\u003e \u003cp\u003eThe research was approved by the Research Ethics and Biosafety Committee of the University of the Punjab, Pakistan, which follows the Declaration of Helsinki. Written informed consent was obtained from all participants prior to enrollment.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eConsent for publication:\u003c/strong\u003e \u003cp\u003eNot applicable.\u003c/p\u003e \u003c/p\u003e\u003cp\u003e \u003ch2\u003eCompeting interests:\u003c/h2\u003e \u003cp\u003eThe authors have no competing interests.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding:\u003c/h2\u003e \u003cp\u003eThis research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eA.I.: Designed the study, data collection, data curation and analysis, wrote and proofread the manuscript. S.A.: Data analysis, prepared tables and figures, and wrote and proofread the manuscript. F.T.: Data analysis, prepared tables and figures. S.: Proofread, reviewed, and approved the draft of the manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgements:\u003c/h2\u003e \u003cp\u003eNot applicable.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eDharmakumar, R. et al. 
Reperfused myocardial infarction: the road to CCS classification of acute MI and beyond. \u003cem\u003eJACC: Adv.\u003c/em\u003e \u003cb\u003e4\u003c/b\u003e, 101528 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhou, X. et al. Global trends in ischemic heart disease mortality from 1990 to 2021 and 2036 projections: insights from GBD 2021 data. \u003cem\u003eGlobal Heart\u003c/em\u003e. \u003cb\u003e20\u003c/b\u003e, 92 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShu, T. et al. Assessing global, regional, and national time trends and associated risk factors of the mortality in ischemic heart disease through global burden of disease 2019 study: population-based study. \u003cem\u003eJMIR public. health surveillance\u003c/em\u003e. \u003cb\u003e10\u003c/b\u003e, e46821 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSiddiqi, A. K., Mubashir, E., Cheema, A. A. A., Noshab, M. \u0026amp; Naeem, M. Vol. 88 59\u0026ndash;60 (LWW, (2026).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHassan, K. M. et al. Abstract P266: burden and trends of cardiovascular disease and its attributable risk factor in Pakistan from 1990\u0026ndash;2019: a benchmarking analysis. \u003cem\u003eCirculation\u003c/em\u003e \u003cb\u003e149\u003c/b\u003e, AP266\u0026ndash;AP266 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUllah, S. A., Shah, S. T., Khan, S. \u0026amp; Khalil, A. A. Frequency and Pattern of Coronary Artery Disease and Its Associated Risk Factors in Stable Ischemic Heart Disease Patients Undergoing Coronary Angiography: A Cross Sectional Study. \u003cem\u003eJ. Khyber Coll. Dent.\u003c/em\u003e \u003cb\u003e10\u003c/b\u003e, 1\u0026ndash;6 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLabreuche, J., Touboul, P. J. \u0026amp; Amarenco, P. 
Plasma triglyceride levels and risk of stroke and carotid atherosclerosis: a systematic review of the epidemiological studies. \u003cem\u003eAtherosclerosis\u003c/em\u003e \u003cb\u003e203\u003c/b\u003e, 331\u0026ndash;345 (2009).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eParhofer, K. G. \u0026amp; Laufs, U. Lipid profile and lipoprotein (a) testing. \u003cem\u003eDeutsches \u0026Auml;rzteblatt Int.\u003c/em\u003e \u003cb\u003e120\u003c/b\u003e, 582 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWan, H., Wu, H., Wei, Y., Wang, S. \u0026amp; Ji, Y. Novel lipid profiles and atherosclerotic cardiovascular disease risk: insights from a latent profile analysis. \u003cem\u003eLipids Health Dis.\u003c/em\u003e \u003cb\u003e24\u003c/b\u003e, 71 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJeppesen, J. \u0026amp; Triglycerides high-density lipoprotein cholesterol, and risk of ischemic heart disease: a view from the Copenhagen Male Study. \u003cem\u003eMetab. Syndr. Relat. Disord.\u003c/em\u003e \u003cb\u003e1\u003c/b\u003e, 33\u0026ndash;53 (2003).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJeppesen, J., Hein, H. O., Suadicani, P. \u0026amp; Gyntelberg, F. Relation of high TG\u0026ndash;low HDL cholesterol and LDL cholesterol to the incidence of ischemic heart disease: an 8-year follow-up in the Copenhagen male study. \u003cem\u003eArterioscler. Thromb. Vasc. Biol.\u003c/em\u003e \u003cb\u003e17\u003c/b\u003e, 1114\u0026ndash;1120 (1997).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eInyaku, M. et al. Calculated small dense low-density lipoprotein cholesterol level is a predominant predictor for new onset of ischemic heart disease. \u003cem\u003eJ. Atheroscler. Thromb.\u003c/em\u003e \u003cb\u003e31\u003c/b\u003e, 232\u0026ndash;248 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlsabhan, W. \u0026amp; Alfadhly, A. 
Effectiveness of machine learning models in diagnosis of heart disease: a comparative study. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e15\u003c/b\u003e, 24568 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOgunpola, A., Saeed, F., Basurra, S., Albarrak, A. M. \u0026amp; Qasem, S. N. Machine learning-based predictive models for detection of cardiovascular diseases. \u003cem\u003eDiagnostics\u003c/em\u003e \u003cb\u003e14\u003c/b\u003e, 144 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBani Hani, S. H. \u0026amp; Ahmad, M. M. Machine-learning algorithms for ischemic heart disease prediction: a systematic review. \u003cem\u003eCurr. Cardiol. Rev.\u003c/em\u003e \u003cb\u003e19\u003c/b\u003e, 87\u0026ndash;99 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChobanian, A. V. et al. Seventh report of the joint national committee on prevention, detection, evaluation, and treatment of high blood pressure. \u003cem\u003ehypertension\u003c/em\u003e 42, 1206\u0026ndash;1252 (2003).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJohnson, R., McNutt, P., MacMahon, S. \u0026amp; Robson, R. Use of the Friedewald formula to estimate LDL-cholesterol in patients with chronic renal failure on dialysis. \u003cem\u003eClin. Chem.\u003c/em\u003e \u003cb\u003e43\u003c/b\u003e, 2183\u0026ndash;2184 (1997).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStojanov, D., Lazarova, E., Veljkova, E., Rubartelli, P. \u0026amp; Giacomini, M. Predicting the outcome of heart failure against chronic-ischemic heart disease in elderly population\u0026ndash;Machine learning approach based on logistic regression, case to Villa Scassi hospital Genoa, Italy. \u003cem\u003eJ. King Saud University-Science\u003c/em\u003e. \u003cb\u003e35\u003c/b\u003e, 102573 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGhiasi, M. M., Zendehboudi, S. \u0026amp; Mohsenipour, A. A. 
Decision tree-based diagnosis of coronary artery disease: CART model. \u003cem\u003eComput. Methods Programs Biomed.\u003c/em\u003e \u003cb\u003e192\u003c/b\u003e, 105400 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYang, C., An, B. \u0026amp; Yin, S. in \u003cem\u003eIEEE International Conference on Systems, Man, and Cybernetics (SMC).\u003c/em\u003e 3153\u0026ndash;3158 (IEEE, 2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXu, Y. et al. Predicting ICU mortality in rheumatic heart disease: comparison of XGBoost and logistic regression. \u003cem\u003eFront. Cardiovasc. Med.\u003c/em\u003e \u003cb\u003e9\u003c/b\u003e, 847206 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOmotehinwa, T. O., Oyewola, D. O. \u0026amp; Moung, E. G. Optimizing the light gradient-boosting machine algorithm for an efficient early detection of coronary heart disease. \u003cem\u003eInf. Health\u003c/em\u003e. \u003cb\u003e1\u003c/b\u003e, 70\u0026ndash;81 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDu, Z. \u0026amp; Qin, Y. Dyslipidemia and cardiovascular disease: current knowledge, existing challenges, and new opportunities for management strategies. \u003cem\u003eJ. Clin. Med.\u003c/em\u003e \u003cb\u003e12\u003c/b\u003e, 363 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDuran, E. K. et al. Triglyceride-rich lipoprotein cholesterol, small dense LDL cholesterol, and incident cardiovascular disease. \u003cem\u003eJ. Am. Coll. Cardiol.\u003c/em\u003e \u003cb\u003e75\u003c/b\u003e, 2122\u0026ndash;2135 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSethi, A. A. et al. High pre-β1 HDL concentrations and low lecithin: cholesterol acyltransferase activities are strong positive risk markers for ischemic heart disease and independent of HDL-cholesterol. \u003cem\u003eClin. 
Chem.\u003c/em\u003e \u003cb\u003e56\u003c/b\u003e, 1128\u0026ndash;1137 (2010).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKane, J. P. \u0026amp; Malloy, M. J. Prebeta-1 HDL and coronary heart disease. \u003cem\u003eCurr. Opin. Lipidol.\u003c/em\u003e \u003cb\u003e23\u003c/b\u003e, 367\u0026ndash;371 (2012).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVu, T. et al. Machine learning model for predicting coronary heart disease risk: Development and validation using insights from a Japanese population\u0026ndash;based study. \u003cem\u003eJMIR cardio\u003c/em\u003e. \u003cb\u003e9\u003c/b\u003e, e68066 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTasmurzayev, N. et al. Interpretable Machine Learning for Coronary Artery Disease Risk Stratification: A SHAP-Based Analysis. \u003cem\u003eAlgorithms\u003c/em\u003e \u003cb\u003e18\u003c/b\u003e, 697 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, J., Xue, Z., Zhang, M., Ji, S. \u0026amp; Lu, F. Machine learning-derived identification of an obesity and lipid metabolism-related genes signature for the diagnosis and molecular typing of acute myocardial infarction. \u003cem\u003eFrontiers Cardiovasc. Medicine\u003c/em\u003e \u003cb\u003e13\u003c/b\u003e, 1694872 .\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDeng, L., Lu, K. \u0026amp; Hu, H. An interpretable LightGBM model for predicting coronary heart disease: Enhancing clinical decision-making with machine learning. \u003cem\u003ePlos one\u003c/em\u003e. \u003cb\u003e20\u003c/b\u003e, e0330377 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBanerjee, T. \u0026amp; Pa\u0026ccedil;al, İ. A systematic review of machine learning in heart disease prediction. \u003cem\u003eTurkish J. Biology\u003c/em\u003e. \u003cb\u003e49\u003c/b\u003e, 600\u0026ndash;634 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhu, Y., Wu, J. 
\u0026amp; Fang, Y. Study on application of SVM in prediction of coronary heart disease. \u003cem\u003eSheng Wu Yi Xue Gong. Cheng Xue Za Zhi= J. Biomedical Engineering= Shengwu Yixue Gongchengxue Zazhi\u003c/em\u003e. \u003cb\u003e30\u003c/b\u003e, 1180\u0026ndash;1185 (2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSrinivasan, S. et al. An active learning machine technique based prediction of cardiovascular heart disease from UCI-repository database. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e13\u003c/b\u003e, 13588 (2023).\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Ischemic heart disease, cardiovascular diseases, machine learning models, Lipid profile, LightGBM.","lastPublishedDoi":"10.21203/rs.3.rs-9464045/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9464045/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eIschemic heart disease (IHD) is the leading cause of cardiovascular mortality in Pakistan, yet 
accessible tools for early biochemical risk stratification remain limited. The present case-control study compared five supervised machine learning (ML) classifiers (Logistic Regression, Decision Tree, Support Vector Machine with Radial Basis Function kernel (SVM-RBF), XGBoost, and LightGBM) for IHD prediction using routinely available lipid profiles, anthropometric, and clinical variables in 929 participants (710 IHD cases, 219 healthy controls) recruited from two tertiary care centres in Lahore, Pakistan. Models were trained on a stratified 80% partition and validated on an independent 20% test set and a held-out 10% unseen dataset. LightGBM achieved the best performance, with an accuracy of 0.98 and AUC-ROC of 0.99 on both 80% train set and 20% test set, and perfect classification (accuracy 1.00, AUC-ROC 1.00) on the unseen dataset (10%), with a cross-validation accuracy of 0.98\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02. SVM-RBF demonstrated comparably strong performance (accuracy 0.98, AUC-ROC 0.99). SHAP analysis identified BMI, LDL, VLDL, HDL, and age as the most influential predictors. 
These findings establish LightGBM as an accurate, interpretable, and generalizable framework for early IHD risk stratification in resource-limited settings.

Version 1 posted April 29th, 2026 · DOI: 10.21203/rs.3.rs-9464045/v1 · Submitted to Scientific Reports on 2026-04-19 · Status: in revision (revision requested 2026-04-30)

