Machine learning models for predicting severe clinical events in hospitalized patients with coronary artery disease | Research Square

Research Article

Hao Liu, Meijun Liu, Xinmiao Guan, Feng Cao, Changhao Liang, Zhongwen Qi, and 10 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-8368403/v1 This work is licensed under a CC BY 4.0 License. Status: Under Revision. Version 1 posted 12

Abstract

Background and purpose: Coronary artery disease (CAD) represents the leading cause of mortality on a global scale, with severe clinical events such as resuscitation or death occurring frequently during the course of hospitalisation. The utility of existing predictive models may be constrained by their incomplete utilisation of the depth of electronic medical records (EMRs), which could limit their effectiveness and scope.
This study aims to develop and validate interpretable risk prediction models for severe clinical events in hospitalized patients with coronary artery disease, enhancing clinical decision-making and patient management.

Methods: We conducted a retrospective study using EMRs from CAD patients admitted to Xiyuan Hospital between 2016 and 2024. The dataset includes structured data and unstructured data extracted from EMRs via natural language processing (NLP). We developed five machine learning (ML) models: Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Gaussian Naive Bayes (GNB), and Deep Neural Network (DNN). Model performance was evaluated by the area under the curve (AUC), sensitivity, specificity, and F1 score. SHapley Additive exPlanations (SHAP) were used to interpret model predictions.

Results: Of the 6,971 patients included, 268 (3.84%) experienced severe clinical events during hospitalization. The DNN model demonstrated the best performance, with an AUC of 0.995 (95% CI: 0.985–0.999). The SHAP analysis showed that the most important predictor was an admission principal diagnosis of acute CAD, followed by the presence of urinary occult blood and the mental state of the patient.

Conclusion: Using NLP and ML models to integrate data from EMRs enables early warning of severe clinical events in hospitalized CAD patients. The interpretable prediction models developed in this study can assist clinicians in more accurately predicting severe clinical events, thereby enhancing clinical decision-making and patient management.

Keywords: Coronary artery disease, Machine learning, Risk prediction models, Electronic medical records, SHAP

1.
Introduction Coronary artery disease (CAD) is a form of heart disease characterized by the accumulation of atherosclerotic plaques in the epicardial coronary arteries, which leads to obstructive or non-obstructive lesions and subsequent myocardial damage[ 1 , 2 ]. The dynamic nature of the CAD process leads to varied clinical manifestations, which can be characterized as acute coronary syndromes (ACS) or chronic coronary syndromes (CCS). Its age-standardized Disability-Adjusted Life Years (DALYs) are 2,275.9 per 100,000, the highest of any disease worldwide[ 3 ]. Research from the World Health Organization (WHO) indicates that CAD is the leading cause of mortality worldwide, accounting for 13% of all deaths[ 4 ]. Severe clinical events, such as resuscitation or death, occur frequently during the hospitalization of patients with coronary artery disease. Accurately predicting these events, identifying risk factors, and enhancing management of high-risk patients are essential to improve patient prognosis[ 5 ]. However, the effectiveness of automated short-term risk of CAD prediction in hospital settings, particularly among low-risk individuals identified by conventional clinical guidelines, remains to be explored[ 6 ]. Electronic medical records (EMRs) encompass vast quantities of real-world patient data collected during routine clinical practice. They provide a rich resource of structured and unstructured medical data that can be used to capture the core characteristics of several diseases, making them ideal for clinical risk prediction and stratification. Although existing studies provide some insights, in-depth research utilizing EMRs remains insufficient[ 7 ]. A prediction score, developed using system-wide EMRs and machine learning (ML), can reveal residual disease risks not detected by conventional tools, which typically rely on a limited set of traditional risk factors[ 6 ]. 
Furthermore, utilizing unstructured data in EMRs—such as progress notes, nursing notes, chief complaints, and discharge summaries—is a complex and time-consuming process[ 8 ]. Natural Language Processing (NLP) merges linguistics with artificial intelligence to enable machines to understand and interpret text. It is being widely applied to extract insights from hidden information in clinical notes[ 9 ]. The application of NLP techniques to unstructured clinical text has the potential to enhance the performance of clinical prediction models[ 10 , 11 ]. Indeed, NLP has been adopted to computationally extract clinical information from EMRs for a wide range of applications ranging from advancing EMR-based clinical research[ 12 , 13 ] to supporting clinical decision-making[ 14 , 15 ]. Therefore, this study proposes to develop predictive models using five distinct ML techniques, drawing on EMRs data and NLP. The aim is to predict severe clinical events during the hospitalization of patients with CAD, thereby enabling clinicians to identify high-risk patients earlier and implement targeted interventions that may enhance clinical outcomes. 2 Methods 2.1 Data sources and Study population This retrospective study utilized a real-world dataset obtained from the EMRs of Xiyuan Hospital, stored in the China Traditional Chinese Medicine Cardiovascular Bank. The dataset encompasses routine medical information from 2,302,360 outpatient and inpatient visits for 317,625 patients with cardiovascular disease, recorded between 2016 and 2024 at Xiyuan Hospital. The collected EMRs include demographic information, diagnoses, admission notes, medical orders, medical examinations, disease course notes and discharge summaries all of which can be analyzed to develop the prediction model. All inpatients with a principal diagnosis of CAD were included in our analysis; in other words, these patients were admitted for CAD rather than for other conditions. 
An all-comers design was used, with no exclusion criteria, to reflect real-world consultations. If a patient had multiple hospitalizations, only the most recent one was included. The selection of the population and the development, evaluation, and interpretation of the machine learning model are illustrated in Fig. 1 .

2.2 Features and outcomes

A total of 133 patient features were extracted for the development of the ML models, covering nearly all information in the EMRs. These variables encompass daily indicators and characteristics relevant to the hospitalization of CAD patients, so a prediction model built on them should be more practical in routine use. Specifically, the extracted variables include demographic information, diagnoses, current and past medical histories, physical examinations, admission and discharge diagnoses, examination and test reports, admission and discharge records, prescriptions, course texts, and records of adverse events. For a more detailed list of variables and their availability, see Table S1 .

The primary outcome of the study was severe clinical events, defined as resuscitation or death during hospitalization. Additionally, in alignment with Chinese cultural perspectives, cases where resuscitation was not performed because family members opted to discontinue treatment were also classified as severe clinical events.

2.3 NLP for Data extraction

(1) Data preprocessing: The EMRs were first divided into paragraphs in order to extract the pertinent text, including the chief complaint, the patient's current and past medical history, and personal history. The medical record text was then subjected to word segmentation, stopword removal and part-of-speech tagging. Finally, data cleaning and standardization were conducted to ensure data quality.
(2) Model training: The BERT (Bidirectional Encoder Representations from Transformers) pre-trained model was employed to perform semantic comprehension and feature extraction of textual data, including lexical features, syntactic features and semantic features.

(3) Information Extraction: A Named Entity Recognition (NER) model was employed to identify and classify medical entities in unstructured text, including disease names, symptoms, medical history and diagnoses. A relationship extraction algorithm was implemented to determine the relationships between entities in the text.

(4) Post-processing: The identified medical entities were post-processed by removing duplicate entities and merging adjacent entities, with the objective of enhancing the accuracy of the results.

2.4 Data preprocessing and feature selection

The patient ID number was used as the unique identifier. We excluded features with more than 30% missing data across patients, and subsequently excluded patients who had more than 30% missing data across the remaining features[ 16 ]. Outliers inflate the variability of a dataset, and data normalization is sensitive to their presence. The interquartile range (IQR) was therefore used to detect and remove outliers. The continuous variables were normalized using the MinMaxScaler before imputation, according to the following equation:

$$X^{*} = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}}$$

where X_i is the original value and X_min and X_max are the minimum and maximum of the variable. Normalization compresses the range of the data points, making variables measured on different scales comparable. After normalization, all continuous variables in the dataset were scaled to a range between 0 and 1. Unordered (nominal) multi-category variables were one-hot encoded, with each category mapped to a separate binary vector.
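The preprocessing steps described so far in this subsection (IQR-based outlier removal, min-max scaling and one-hot encoding) can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' pipeline: the fence multiplier k = 1.5 is a common convention that the text does not specify, the pulse-rate values are invented, and the helper names `iqr_bounds`, `min_max_scale` and `one_hot` are hypothetical. In practice the scaling step would be done with scikit-learn's MinMaxScaler, as the text states.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the fences Q1 - k*IQR and Q3 + k*IQR (k = 1.5 is an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def min_max_scale(values):
    """Apply X* = (X_i - X_min) / (X_max - X_min) to every value."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(value, categories):
    """Map a nominal value to a binary vector containing a single 1."""
    return [1 if value == c else 0 for c in categories]

# Invented pulse rates (beats per minute); 190 plays the role of an outlier.
pulse = [60, 62, 65, 68, 70, 72, 75, 78, 80, 190]
low, high = iqr_bounds(pulse)
kept = [v for v in pulse if low <= v <= high]
scaled = min_max_scale(kept)

# Category names borrowed from the "mental state" variable in Table 1.
mental_states = ["well", "general", "lassitude", "anxiety"]
encoded = one_hot("lassitude", mental_states)

print(kept)
print([round(s, 2) for s in scaled])
print(encoded)
```

Removing the outlier before scaling matters here: if 190 were kept, it would become X_max and squeeze all plausible values into the lower part of the 0 to 1 range.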
One-hot encoding prevents the introduction of erroneous numerical relationships between categories, thereby avoiding inaccurate algorithmic predictions based on such relationships. However, it can significantly increase the dimensionality of the feature space when a variable has many categories, potentially leading to computational complexity and overfitting issues.

Multiple imputation was used for the remaining variables, each of which had less than 30% missing data. Missing data is a critical factor that may introduce biases in ML modeling. Multiple imputation models each feature as a function of the other features and is considered one of the best available methods[ 17 – 19 ]. It is a widely accepted approach that effectively handles missing values in both continuous and categorical variables[ 20 ].

Because the training set contained only 189 positive samples, we screened the 133 variables in the training set using one-way ANOVA tests, Lasso regression analyses, and collinearity diagnostics to select independent predictors, with the aim of including about 18 variables in the models (roughly ten events per variable). All eligible variables were tested for collinearity using the variance inflation factor before being included in the final multivariable models.

2.5 Machine learning models

Because the ratio of positives to negatives was highly imbalanced, simple random splitting of the data into training and validation sets may not be appropriate. Instead, we divided the dataset into training (70%) and testing (30%) subsets using stratified sampling to maintain the proportional representation of positives and negatives, thereby ensuring an unbiased assessment of model performance. The training dataset was used for hyperparameter tuning and model training through 10-fold cross-validation, while the test dataset was reserved solely for the final evaluation of the models' performance.
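The stratified 70/30 split and the 10-fold cross-validation described above can be illustrated with a minimal standard-library sketch. It is an assumption-laden illustration rather than the authors' code: the functions `stratified_split` and `stratified_folds` are hypothetical, the labels are synthetic and only reproduce the cohort's class balance (268 events among 6,971 patients), and the rounding used here will not necessarily reproduce the exact set sizes reported in the paper. Libraries such as scikit-learn provide equivalent functionality through `train_test_split(..., stratify=y)` and `StratifiedKFold`.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), sampling each class separately
    so that both parts keep the original class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

def stratified_folds(labels, indices, k=10, seed=0):
    """Deal the indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return folds

# Synthetic labels with the cohort's class balance (1 = severe event).
labels = [1] * 268 + [0] * 6703
train_idx, test_idx = stratified_split(labels, test_fraction=0.3, seed=42)
folds = stratified_folds(labels, train_idx, k=10, seed=42)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
events_per_fold = [sum(labels[i] for i in fold) for fold in folds]

print(f"train: n={len(train_idx)}, event rate={train_rate:.4f}")
print(f"test:  n={len(test_idx)}, event rate={test_rate:.4f}")
print("events per fold:", events_per_fold)
```

With an event rate below 4%, an unstratified 10-fold split could leave some folds with very few positives; dealing each class out separately keeps the number of events per fold nearly constant.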
Five ML approaches were used to train predictive models for severe clinical events during the hospitalization of patients with CAD: logistic regression (LR), random forest (RF), Gaussian naive Bayes (GNB), extreme gradient boosting (XGBoost), and deep neural networks (DNN).

2.6 Model evaluation

The discrimination of the models was evaluated using the area under the receiver operating characteristic curve (ROC-AUC). Confidence intervals (CIs) for the ROC-AUC were estimated from 1,000 bootstrap replicates, each generated with a unique random seed, using the normal bootstrap method. Additionally, we evaluated the models' performance using several metrics: Sensitivity, Specificity, Youden's J Index, Accuracy, Positive Predictive Value (PPV), Negative Predictive Value (NPV), F1 Score, and Matthews Correlation Coefficient (MCC). Detailed formulas and explanations for these metrics are available in Table S2 .

2.7 Model Interpretation

Interpreting predictions from black-box machine learning models presents a significant challenge for health professionals. To address this, we employed the SHapley Additive exPlanation (SHAP) method to identify the contribution of each attribute in our prediction model, enhancing the interpretability of the generated models. SHAP values offer a consistent and accurate way to assign contribution values to each feature within each model, which we illustrated through a graphical summary of the predictors [ 21 ]. As a game-theoretic approach to model interpretability, SHAP scores provide insights into the global structure of the model by synthesizing local explanations for each prediction.

2.8 Statistical analysis

Statistical analysis was performed using Python (version 3.10.0). Continuous variables were assessed for normality using the Kolmogorov-Smirnov test. Normally distributed variables were described by their mean and standard deviation, and differences between the two groups were compared using the t-test.
Variables not following a normal distribution were described using the median and interquartile range (Q25, Q75), and differences between groups were assessed using the Mann-Whitney U test. Categorical data were presented as absolute numbers and percentages, n (%), and differences between groups were analyzed using Pearson's chi-square test, continuity-corrected chi-square test, or Fisher's exact test as appropriate.

3 Results

3.1 Patient Characteristics

A total of 8,126 individuals were hospitalized for CAD. Of these, 1,155 were excluded because they had more than 30% missing data, leaving 6,971 participants in the study (Fig. 1 ). Among them, 268 (3.84%) experienced severe clinical events during hospitalization (hereafter referred to as the “events group”), and 6,703 (96.16%) did not (hereafter referred to as the “no events group”). A total of 171 variables were extracted, and 38 variables with more than 30% missing values were excluded. Table S1 lists all extracted variables and their missing rates. Detailed baseline characteristics for both groups can be found in Table 1 . For biochemical test indicators, including cardiac, liver, and renal function tests from blood and urine samples, refer to Table S3 .

As shown in Table 1 , there was no significant difference between the two groups in the median length of hospital stay (8 days in each group) or in gender composition. In the events group, higher percentages of dyspnea, insomnia, vomiting, coughing or expectoration, hypokalemia, hypoproteinemia, chronic renal insufficiency, pneumonia, electrolyte imbalance, hyponatremia, and acute exacerbation of chronic cardiac insufficiency were observed (Table 1 ). In addition, myoglobin, creatine kinase isoenzymes, white blood cell counts, blood sodium, blood uric acid, and fasting blood glucose levels were higher in the events group than in the no events group ( Table S3 ).
The CAD admission diagnoses in our database were grouped into three main categories according to MeSH categories[ 22 ]: chronic CAD, acute CAD, and angina pectoris ( Table S4 ). Acute CAD was the most common admission diagnosis type in the events group (55.22%), whereas chronic CAD was the most common in the no events group (70.03%).

Table 1 Baseline clinical characteristics of patients with coronary artery disease

| Variables | Events group (n = 268) a | No events group (n = 6703) a | P-value b |
|---|---|---|---|
| Demographic information | | | |
| Age, years | 76.50 (68.00, 84.00) | 69.00 (61.00, 77.00) | 0.001 |
| Gender | | | 0.180 |
| female | 106 (39.6%) | 2929 (43.7%) | |
| male | 162 (60.4%) | 3774 (56.3%) | |
| Pulse rate (beats/min) | 78.12 (72.46, 87.91) | 73.13 (68.59, 78.50) | 0.001 |
| Systolic blood pressure, noninvasive (mmHg) | 128.00 (115.33, 142.33) | 132.07 (122.00, 143.67) | 0.001 |
| Diastolic blood pressure, noninvasive (mmHg) | 72.83 (64.00, 80.75) | 76.00 (69.73, 83.00) | 0.001 |
| Length of hospitalization (days) | 8.00 (7.00, 12.00) | 8.00 (6.00, 11.00) | 0.068 |
| CAD admission diagnosis types | | | 0.001 |
| Chronic CAD | 106 (39.55%) | 4694 (70.03%) | |
| Acute myocardial infarction | 148 (55.22%) | 598 (8.92%) | |
| Angina pectoris | 14 (5.22%) | 1411 (21.05%) | |
| Symptoms at the moment of admission | | | |
| Dyspnea | 77 (28.73%) | 653 (9.74%) | 0.001 |
| Insomnia | 14 (5.22%) | 125 (1.86%) | 0.001 |
| Nausea | 29 (10.82%) | 537 (8.01%) | 0.099 |
| Vomiting | 40 (14.93%) | 593 (8.85%) | 0.001 |
| Abdominal distension | 17 (6.34%) | 291 (4.34%) | 0.118 |
| Coughing or expectoration | 171 (63.81%) | 979 (14.61%) | 0.001 |
| Orthopnea | 24 (8.96%) | 117 (1.75%) | 0.001 |
| Duration of chief complaint (chest pain, chest tightness, precordial pain), years | 2.00 (0.10, 6.33) | 4.00 (1.00, 8.90) | 0.001 |
| Personal history | | | |
| Drinking history | 33 (12.31%) | 1373 (20.48%) | 0.001 |
| Smoking history | 71 (26.49%) | 1912 (28.52%) | 0.470 |
| Mental state | | | 0.001 |
| Well | 123 (45.90%) | 5423 (80.90%) | |
| General | 8 (2.99%) | 202 (3.01%) | |
| Lassitude | 137 (51.12%) | 1074 (16.02%) | |
| Anxiety | 0 (0.00%) | 4 (0.06%) | |
| Tongue c | | | |
| Tongue color | | | |
| Dark | 146 (54.48%) | 4047 (60.38%) | 0.053 |
| Red | 159 (59.33%) | 4438 (66.21%) | 0.020 |
| Light | 110 (41.04%) | 2420 (36.10%) | 0.099 |
| White | 3 (1.12%) | 42 (0.63%) | 0.549 |
| Purple | 8 (2.99%) | 169 (2.52%) | 0.636 |
| Tongue coating | | | 0.001 |
| White | 92 (34.33%) | 2930 (43.71%) | 0.002 |
| Yellow | 86 (32.09%) | 2123 (31.67%) | 0.886 |
| Thick | 25 (9.33%) | 569 (8.49%) | 0.629 |
| Greasy | 108 (40.30%) | 3479 (51.90%) | 0.001 |
| Less | 19 (7.09%) | 240 (3.58%) | 0.003 |
| Large fat tongues | 4 (1.49%) | 80 (1.19%) | 0.877 |
| Pulse signals c | | | 0.001 |
| Wiry pulse | 122 (45.52%) | 3833 (57.18%) | 0.001 |
| Thin pulse | 76 (28.36%) | 1476 (22.02%) | 0.014 |
| Slippery pulse | 88 (32.84%) | 2726 (40.67%) | 0.010 |
| Sunken pulse | 68 (25.37%) | 1540 (22.97%) | 0.361 |
| Rapid pulse | 23 (8.58%) | 364 (5.43%) | 0.027 |
| Slow pulse | 6 (2.24%) | 233 (3.48%) | 0.275 |
| Weak pulse | 26 (9.70%) | 356 (5.31%) | 0.002 |
| Intermittent pulse | 11 (4.10%) | 107 (1.60%) | 0.002 |
| Rough pulse | 10 (3.73%) | 234 (3.49%) | 0.834 |
| Co-morbidity | | | |
| Hypokalemia | 53 (19.78%) | 512 (7.64%) | 0.001 |
| Hypoproteinemia | 58 (21.64%) | 154 (2.30%) | 0.001 |
| Chronic renal insufficiency | 49 (18.28%) | 474 (7.07%) | 0.001 |
| Pneumonia | 89 (33.21%) | 363 (5.42%) | 0.001 |
| Electrolyte imbalance | 32 (11.94%) | 73 (1.09%) | 0.001 |
| Hyponatremia | 34 (12.69%) | 99 (1.48%) | 0.001 |
| Cardiac insufficiency | 117 (43.66%) | 1741 (25.97%) | 0.001 |
| Acute exacerbation of chronic cardiac insufficiency | 60 (22.39%) | 456 (6.80%) | 0.001 |
| Hypertension classification | | | 0.001 |
| No Hypertension | 105 (39.18%) | 1883 (28.09%) | |
| Stage 1 Hypertension | 6 (2.24%) | 482 (7.19%) | |
| Stage 2 Hypertension | 46 (17.16%) | 1617 (24.12%) | |
| Stage 3 Hypertension | 111 (41.42%) | 2721 (40.59%) | |
| Cardiac function grading (NYHA) | | | 0.001 |
| NYHA-0 | 204 (76.12%) | 5311 (79.23%) | |
| NYHA-1 | 4 (1.49%) | 92 (1.37%) | |
| NYHA-2 | 8 (2.99%) | 540 (8.06%) | |
| NYHA2-3 | 21 (7.84%) | 587 (8.76%) | |
| NYHA-3 | 19 (7.09%) | 131 (1.95%) | |
| NYHA-4 | 12 (4.48%) | 42 (0.63%) | |

a : Normally distributed data are expressed as “mean ± standard deviation”; non-normally distributed data as “median (25th percentile, 75th percentile)”; categorical data as “n (%)”.
b : Statistical tests include: t-test, Mann-Whitney U test, Chi-squared tests (basic Chi-squared, Pearson Chi-squared, continuity correction Chi-squared), and Fisher's exact test.
c : Traditional Chinese medicine signs and symptoms.
Abbreviations: CAD, Coronary artery disease; NYHA, New York Heart Association Functional Classification.

3.2 Model Development and Evaluation

Finally, 18 of the 133 variables were selected by one-way ANOVA tests, Lasso regression analyses, and collinearity diagnostics. The variance inflation factor (VIF) for each of these 18 variables was below 10, indicating acceptably low multicollinearity. Details of the Lasso regression analysis are provided in Figure S1 A and S1B . The collinearity diagnostics are available in Table S5 .

We employed five widely used machine learning algorithms to construct the predictive models: Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Gaussian Naive Bayes (GNB), and Deep Neural Networks (DNN). The characteristics and parameters of these algorithms are summarized in Table S6 .

Table 2 compares the performance of the five models in the training and validation sets. The DNN showed the best overall performance. It achieved the highest AUC in the validation set (0.995; Fig. 2 ). In addition, in the validation set it had a sensitivity of 0.873, a specificity of 0.999, a Youden index of 0.872, an accuracy of 0.994, a Positive Predictive Value (PPV) of 0.972, a Matthews Correlation Coefficient (MCC) of 0.918, and a Negative Predictive Value (NPV) of 0.995. These were the highest values among the five models, except for specificity and PPV, where RF reached 1.000 but with a sensitivity of only 0.038. In contrast, the AUC of the other four models in the validation set ranged from 0.877 to 0.896 (Fig. 2 ). Although their Specificity, Accuracy, and NPV were all high, their Sensitivity, Youden's J Index, PPV, F1 Score, and MCC were low, which may reflect the class imbalance in our data: positive samples were far fewer than negative samples.
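To make the relationship between these metrics concrete, the sketch below derives all of them from a single 2x2 confusion matrix. The counts are an assumption for illustration only: the paper does not report a confusion matrix, and TP = 69, FN = 10, FP = 2, TN = 2011 were chosen by us because they sum to the validation set size (2,092) and, after rounding to three decimals, coincide with the DNN validation figures quoted above. The function name `binary_metrics` is hypothetical.

```python
from math import sqrt

def binary_metrics(tp, fn, fp, tn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    ppv = tp / (tp + fp)                    # positive predictive value
    npv = tn / (tn + fn)                    # negative predictive value
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden_j": sensitivity + specificity - 1,
        "accuracy": accuracy,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
        "mcc": mcc,
    }

# Illustrative counts only (inferred, NOT reported by the authors).
metrics = binary_metrics(tp=69, fn=10, fp=2, tn=2011)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The same arithmetic shows why accuracy, specificity and NPV are weak discriminators between models at a 3.84% event rate: a model that flags almost no one (such as RF, with a sensitivity of 0.038) still scores above 0.96 on all three, while Youden's J, F1 and MCC drop sharply.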
Table 2 Model performance in the training and validation sets

| Model | AUC (95% CI) | Accuracy | Sensitivity | Specificity | Youden's J statistic | PPV | NPV | F1 Score | MCC |
|---|---|---|---|---|---|---|---|---|---|
| Training set (n = 4879) | | | | | | | | | |
| LR | 0.898 (0.874–0.919) | 0.963 | 0.169 | 0.995 | 0.164 | 0.571 | 0.967 | 0.258 | 0.278 |
| RF | 0.897 (0.872–0.922) | 0.962 | 0.005 | 1.000 | 0.005 | 1.000 | 0.962 | 0.011 | 0.071 |
| XGBoost | 0.996 (0.992–0.998) | 0.990 | 0.751 | 1.000 | 0.751 | 1.000 | 0.990 | 0.858 | 0.863 |
| GNB | 0.871 (0.842–0.897) | 0.892 | 0.619 | 0.903 | 0.522 | 0.204 | 0.983 | 0.321 | 0.319 |
| DNN | 0.987 (0.980–0.993) | 0.990 | 0.810 | 0.997 | 0.806 | 0.911 | 0.992 | 0.857 | 0.853 |
| Validation set (n = 2092) | | | | | | | | | |
| LR | 0.877 (0.829–0.919) | 0.966 | 0.241 | 0.994 | 0.235 | 0.613 | 0.971 | 0.368 | 0.385 |
| RF | 0.896 (0.855–0.932) | 0.964 | 0.038 | 1.000 | 0.038 | 1.000 | 0.964 | 0.073 | 0.191 |
| XGBoost | 0.888 (0.854–0.920) | 0.964 | 0.317 | 0.990 | 0.306 | 0.544 | 0.974 | 0.400 | 0.398 |
| GNB | 0.884 (0.841–0.921) | 0.892 | 0.671 | 0.900 | 0.571 | 0.209 | 0.986 | 0.331 | 0.327 |
| DNN | 0.995 (0.985–0.999) | 0.994 | 0.873 | 0.999 | 0.872 | 0.972 | 0.995 | 0.920 | 0.918 |

Abbreviations: AUC, area under the curve; CI, confidence interval; LR, logistic regression; RF, random forest; GNB, Gaussian naive Bayes; XGBoost, extreme gradient boosting; DNN, deep neural network; NPV, negative predictive value; PPV, positive predictive value; MCC, Matthews correlation coefficient.

3.3 Identification of Important Risk Factors Contributing to the Model

The SHAP algorithm was used to assess the importance of each predictor variable in the DNN model's predictions. The variable importance plot, arranged in descending order, is shown in Figure 3 . The CAD admission diagnosis type emerged as the most predictive variable across all prediction horizons, followed closely by urinary occult blood, mental state (well, general, lassitude, anxiety), flushed tongue, and hypoproteinemia. The importance of each predictor in the remaining three models was also calculated using the SHAP algorithm, with the results presented in descending order in Figure S2A-C .
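The idea behind SHAP values can be illustrated with an exact Shapley computation on a very small model. The sketch below is a teaching example under explicit assumptions: it does not use the `shap` library, the toy risk function `toy_risk` and its coefficients are invented and have nothing to do with the authors' DNN, and "absent" features are replaced by a single baseline value rather than averaged over a background dataset as SHAP implementations typically do. Exact enumeration is feasible only for a handful of features, which is why practical tools rely on approximations.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for instance x relative to a baseline input.
    Features outside a coalition are set to their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight of coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

def toy_risk(z):
    """Invented risk score with one interaction term (illustration only)."""
    acute_cad, occult_blood, lassitude = z
    return (0.02 + 0.30 * acute_cad + 0.10 * occult_blood
            + 0.15 * acute_cad * lassitude)

feature_names = ["acute CAD", "urinary occult blood", "lassitude"]
x = [1, 1, 1]         # hypothetical patient with all three findings
baseline = [0, 0, 0]  # reference patient with none of them

phi = shapley_values(toy_risk, x, baseline)
for name, value in zip(feature_names, phi):
    print(f"{name}: {value:+.3f}")
print("sum of contributions:", round(sum(phi), 3))
print("prediction difference:", round(toy_risk(x) - toy_risk(baseline), 3))
```

Two properties are visible in the output. The contributions add up to the difference between the patient's prediction and the baseline prediction, and the interaction term is shared equally between the two features involved, so lassitude receives credit even though it has no effect on its own in this toy model.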
To discern the positive and negative relationships between predictors and the target outcome, SHAP values were used to identify risk factors for severe clinical events. As illustrated in Figure 4 , values to the left of zero on the horizontal axis (ranging from 0 to -1) indicate a decreasing probability of the predicted event, whereas values to the right (ranging from 0 to 1) indicate an increasing probability. The color coding of the dots represents the raw values of the features: red for high values and blue for low values. Notably, most samples with a diagnosis of non-acute coronary artery disease cluster to the left of zero, suggesting a lower likelihood of in-hospital major events for these patients. Conversely, patients diagnosed with acute coronary artery disease typically cluster on the right, indicating a higher likelihood of experiencing an in-hospital major event.

4 Discussion

4.1 Principal Findings

In this retrospective cohort study, we examined real-world clinical data from the EMRs of 6,971 patients with CAD to develop and validate five interpretable ML models aimed at predicting severe in-hospital clinical events. The XGBoost, RF, GNB and LR models demonstrated comparable AUCs; however, the DNN model exhibited superior performance, achieving an AUC of 0.995 (0.985–0.999), a specificity of 0.999, a sensitivity of 0.873, a Youden's index of 0.872, an F1 score of 0.920, and an MCC of 0.918. This model demonstrated favorable discrimination, calibration, and clinical applicability in both the training cohort and the validation cohort. To facilitate model interpretability, SHAP was used, identifying an admission diagnosis type of acute CAD, urinary occult blood, mental state (well, general, lassitude, or anxiety), flushed tongue, and hypoproteinemia as key predictors of the risk of severe clinical events in this patient population.
Some studies have investigated in-hospital risk in AMI or after percutaneous coronary intervention (PCI), but, to the best of our knowledge, our study is among the first to apply ML techniques to predict severe in-hospital clinical events across a general population of patients with CAD. We developed an interpretable prediction model (DNN) with good predictive capacity, which makes it possible to understand the contributing factors and the path to a rational decision. The predictors used are clinically plausible and readily available in most health services.

4.2 Relationship with previous studies

Several prognostic risk prediction models for CAD have been developed, primarily focusing on the in-hospital prognosis of AMI or postoperative risks of PCI (such as mortality[23–27, 31], acute kidney injury[28], heart failure[29], bleeding[30], and adverse events during hospitalization[32]). Predictions of in-hospital prognosis for CAD in general were uncommon; we found only one study, which predicted in-hospital mortality for CAD patients with concurrent chronic kidney disease[33]. Additionally, several studies have addressed long-term outcomes, such as readmission for CAD[34, 35] and one-year mortality for AMI[36]. Building on the foundation of these studies, our inclusion criteria for CAD patients are broader, encompassing all forms of CAD: chronic CAD, angina, and AMI. Similar ideas have been explored before, such as modifying the GRACE risk score to support its potential applicability in different subgroups of ACS[37]. Our inclusion of a broader population with coronary heart disease, along with outcomes that align closely with clinical practice, enhances the pragmatic utility of our prediction model. Like this study, these studies predominantly used ML techniques. The commonly used ML models include XGBoost, LR, RF, Support Vector Machine (SVM), and GNB.
4.3 Implications for future research The development and validation of an ML-based predictive model for severe clinical events in patients with CAD using EMRs opens several avenues for future research that could further refine and enhance the utility of predictive models in clinical settings. (1) Expansion to other clinical conditions: While this study focused on coronary artery disease, the methodology could be adapted for other chronic diseases that require continuous monitoring and management, such as diabetes and chronic kidney disease. This could help in developing comprehensive models that can predict multiple adverse events across different disease spectrums. (2) Exploration of unstructured data: This study utilized NLP to extract data from unstructured sources within EMRs. Future research could delve deeper into more advanced NLP techniques and other forms of unstructured data, such as images (X-rays, electrocardiography) and notes from newer digital communication tools (telehealth transcripts), to enrich the data inputs into the ML models. (3) Multi-centric validation studies: The model developed in this study was validated using data from a single center. Conducting multi-centric studies would help validate the model across different populations and healthcare settings, enhancing its generalizability and robustness. (4) Integration with clinical decision support systems: Research into how these predictive models can be effectively integrated into existing clinical decision support systems could be crucial. This includes understanding the workflow of healthcare professionals and ensuring that the model's predictions are delivered in a way that seamlessly fits into clinical decision-making processes without disrupting them. (5) Advanced ML techniques: Exploring more advanced ML techniques, such as deep learning and reinforcement learning, could provide new ways of modeling complex clinical data and interactions. 
These methods might uncover patterns not visible with current techniques and improve prediction accuracy. 4.4 Strengths and limitations First, we constructed our predictive models using routine clinical data from EMRs. This approach efficiently leverages real-world information contained in EMRs to develop more practical and applicable risk prediction models. EMRs contain almost all clinical features about patients, including large amounts of structured data and unstructured data collected during regular clinical practice [38, 39]. Second, we employed NLP to extract data from EMRs. While structured clinical data, such as age, vital signs, and laboratory results, come in a fixed format that simplifies preprocessing, in contrast, clinical notes present unique challenges. These notes are often unstructured, containing abbreviations, grammatical errors, and misspellings, and use distinct clinical language and idioms that pose difficulties for health information research[40, 41]. NLP, which merges linguistics and artificial intelligence to enable machine understanding and interpretation of text, is increasingly being used to extract and analyze hidden information in clinical notes[9]. It has been widely adopted for computationally extracting clinical information from EMRs, with applications that range from enhancing EMR-based clinical research to supporting clinical decision-making[14, 15]. Third, we utilized ML to construct risk prediction models. In EMR-based health systems, patients accrue millions of heterogeneous clinical data points longitudinally, presenting significant challenges for analysis and interpretation without ML technologies[42–45]. Traditional risk prediction models, which rely on statistical methods, often struggle with issues such as variable correlation, heterogeneity, nonlinearity, and overfitting, particularly in complex data sets with numerous features[46]. 
Unlike traditional statistical methods that start with a predefined model and use data as input, ML adopts a data-driven approach: it generalizes a model from the data itself, enhancing its applicability to new data [47]. ML methods can identify patterns in large data sets characterized by multidimensional and nonlinear relationships among clinical features, thereby predicting various outcomes more effectively [48]. Although much of this data exists as unstructured text, ML techniques make it possible to apply algorithms that classify features or predict events from clinical notes [49]. These approaches often achieve higher sensitivity and specificity in identifying at-risk patients than methods relying on structured data [50, 51]. Consequently, ML has become a crucial component in prevention, diagnosis, treatment, and clinical decision support. Specifically, ML models applied to EMRs have demonstrated superior performance to traditional survival models in predicting mortality among patients with CAD [52].

However, this study is subject to several limitations. First, due to constraints inherent to the study's database, external validation of the model was not feasible. Internal validation was used instead, although external validation would provide more robust evidence of the model's applicability. Second, the diagnosis of coronary artery disease in the study population was based on clinical assessment rather than the gold standard of coronary angiography, introducing some diagnostic uncertainty. Third, the sample size was not calculated specifically for the prediction model, and only 189 of the 268 events in the study population fell in the training set. Consequently, the final model incorporated only 18 variables, significantly fewer than the total number collected.
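The third limitation can be made concrete with a quick events-per-variable (EPV) check. The sketch below uses only counts reported in this paper (6,971 patients, 268 events, 189 training-set events, 18 final variables). The threshold of 10 events per variable is a common rule of thumb assumed here for illustration, not a criterion the study states, and the function names are hypothetical.

```python
# Counts taken from the paper; the EPV >= 10 threshold is an assumed
# rule of thumb, not something the study reports using.
TOTAL_PATIENTS = 6971
TOTAL_EVENTS = 268
TRAIN_EVENTS = 189
FINAL_PREDICTORS = 18


def events_per_variable(events: int, predictors: int) -> float:
    """Return the number of outcome events available per predictor."""
    if predictors <= 0:
        raise ValueError("predictors must be positive")
    return events / predictors


def max_predictors(events: int, min_epv: float = 10.0) -> int:
    """Largest predictor count that keeps EPV at or above min_epv."""
    return int(events // min_epv)


event_rate = TOTAL_EVENTS / TOTAL_PATIENTS
epv = events_per_variable(TRAIN_EVENTS, FINAL_PREDICTORS)

print(f"event rate: {event_rate:.2%}")
print(f"EPV in training set: {epv:.1f}")
print(f"max predictors at EPV >= 10: {max_predictors(TRAIN_EVENTS)}")
```

Under that assumed threshold, 189 training events support at most 18 predictors, which is consistent with the size of the final model, although the paper does not say that this rule was the reason for the choice.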
5 Conclusions

The interpretable predictive model helps physicians more accurately predict severe clinical events in patients with coronary artery disease, thereby enhancing clinical decision-making and patient management. In addition, the interpretable framework can increase the transparency of the model and help physicians judge the reliability of its predictions. Our machine learning algorithms, especially the DNN, can predict, with high sensitivity and specificity, the impending occurrence of severe in-hospital clinical events in patients with coronary artery disease. Algorithm-generated predictive alerts modestly impacted clinical measures. The next steps include describing clinical perception of this tool and optimizing algorithm design and delivery.

Abbreviations

CAD: coronary artery disease
EMRs: electronic medical records
NLP: natural language processing
ML: machine learning
LR: logistic regression
RF: random forest
XGBoost: extreme gradient boosting
GNB: Gaussian naive Bayes
DNN: deep neural network
AUC: area under the curve
SHAP: SHapley Additive exPlanations
ACS: acute coronary syndromes
CCS: chronic coronary syndromes
DALYs: disability-adjusted life years
WHO: World Health Organization
IQR: interquartile range
CIs: confidence intervals
PPV: positive predictive value
NPV: negative predictive value
MCC: Matthews correlation coefficient

Declarations

Data Availability Statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Ethics statement
This research was reviewed and approved by the institutional review board of Xiyuan Hospital of China Academy of Chinese Medical Sciences (registration number 2024XLA222-2). This study was conducted in accordance with the Declaration of Helsinki. Informed consent was waived by the review board.

Author contributions
H.L., M.L., and Z.Q. contributed to the study design.
X.G., F.C., C.L., M.L., J.H., and J.Z. participated in the collection, analysis, or interpretation of data. J.Z., D.Z., L.L., and H.X. performed the statistical analysis. F.X., Y.F., and J.X. critically revised the article. All authors have read and approved the manuscript.

Funding
This work was supported by the Beijing Traditional Chinese Medicine Science and Technology Development Funding Program (No. BJZYZD-2023-04) and the Beijing Natural Science Foundation-Haidian Original Innovation Joint Fund Key Research Project (L242058).

Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgements
The authors would like to thank the Cardiovascular Disease Program of the China Center for Evidence-Based Medicine in Traditional Chinese Medicine for providing the electronic medical record data used in this study. We also extend our gratitude to Beijing Yikang Medical Technology Co., Ltd., for their support with natural language processing, statistical analysis, and machine learning modeling.

Supplementary Material
Supplementary material associated with this article can be found, in the online version, at doi: . Supplemental Tables S1–S6. Supplemental Figures S1–S2.

References

1. Knuuti J, Wijns W, Saraste A, Capodanno D, Barbato E, Funck-Brentano C, et al. 2019 ESC Guidelines for the diagnosis and management of chronic coronary syndromes. Eur Heart J. 2020;41:407–77. https://doi.org/10.1093/eurheartj/ehz425
2. Lawton JS, Tamis-Holland JE, Bangalore S, Bates ER, Beckie TM, Bischoff JM, et al.
2021 ACC/AHA/SCAI Guideline for Coronary Artery Revascularization: Executive Summary: A Report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. Circulation. 2022;145:e4–17. https://doi.org/10.1161/CIR.0000000000001039
3. Mensah GA, Fuster V, Murray CJL, Roth GA, Global Burden of Cardiovascular Diseases and Risks Collaborators. Global Burden of Cardiovascular Diseases and Risks, 1990–2022. J Am Coll Cardiol. 2023;82:2350–473. https://doi.org/10.1016/j.jacc.2023.11.007
4. The top 10 causes of death. https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death. Accessed 30 Aug 2024.
5. Song L, Li Y, Nie S, Feng Z, Liu Y, Ding F, et al. Using machine learning to predict adverse events in acute coronary syndrome: A retrospective study. Clin Cardiol. 2023;46:1594–602. https://doi.org/10.1002/clc.24127
6. Bs BOP. Coronary Risk Estimation Based on Clinical Data in Electronic Health Records. 2022;79.
7. Forrest IS, Petrazzini BO, Duffy Á, Park JK, Marquez-Luna C, Jordan DM, et al. Machine learning-based marker for coronary artery disease: derivation and validation in two longitudinal cohorts. Lancet. 2023;401:215–25. https://doi.org/10.1016/S0140-6736(22)02079-7
8. Yan MY, Gustad LT, Nytrø Ø. Sepsis prediction, early detection, and identification using clinical text for machine learning: a systematic review. J Am Med Inf Assoc. 2021;29:559–75. https://doi.org/10.1093/jamia/ocab236
9. Kreimeyer K, Foster M, Pandey A, Arya N, Halford G, Jones SF, et al. Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review. J Biomed Inform. 2017;73:14. https://doi.org/10.1016/j.jbi.2017.07.012
10. Sung S-F, Chen C-H, Pan R-C, Hu Y-H, Jeng J-S. Natural Language Processing Enhances Prediction of Functional Outcome After Acute Ischemic Stroke. J Am Heart Assoc. 2021;10:e023486. https://doi.org/10.1161/JAHA.121.023486
11. Weissman GE, Hubbard RA, Ungar LH, Harhay MO, Greene CS, Himes BE, et al. Inclusion of Unstructured Clinical Text Improves Early Prediction of Death or Prolonged ICU Stay. Crit Care Med. 2018;46:1125–32. https://doi.org/10.1097/CCM.0000000000003148
12. Liu F, Weng C, Yu H. Natural Language Processing, Electronic Health Records, and Clinical Research. In: Richesson RL, Andrews JE, editors. Clinical Research Informatics. London: Springer; 2012. pp. 293–310. https://doi.org/10.1007/978-1-84882-448-5_16
13. Fu S, Leung LY, Wang Y, Raulli A-O, Kallmes DF, Kinsman KA, et al. Natural Language Processing for the Identification of Silent Brain Infarcts From Neuroimaging Reports. JMIR Med Inf. 2019;7. https://doi.org/10.2196/12109
14. Harkema H, Chapman WW, Saul M, Dellon ES, Schoen RE, Mehrotra A. Focus on clinical and translational research: Developing a natural language processing application for measuring the quality of colonoscopy procedures. J Am Med Inf Assoc. 2011;18(Suppl 1):i150. https://doi.org/10.1136/amiajnl-2011-000431
15. Demner-Fushman D, Chapman WW, McDonald CJ. What can Natural Language Processing do for Clinical Decision Support? J Biomed Inform. 2009;42:760. https://doi.org/10.1016/j.jbi.2009.08.007
16. Vaid A, Chan L, Chaudhary K, Jaladanki SK, Paranjpe I, Russak A, et al. Predictive Approaches for Acute Dialysis Requirement and Death in COVID-19. Clin J Am Soc Nephrol. 2021;16:1158–68. https://doi.org/10.2215/CJN.17311120
17. Harel O, Mitchell EM, Perkins NJ, Cole SR, Tchetgen Tchetgen EJ, Sun B, et al. Multiple Imputation for Incomplete Data in Epidemiologic Studies. Am J Epidemiol. 2018;187:576–84. https://doi.org/10.1093/aje/kwx349
18. Chang C, Deng Y, Jiang X, Long Q. Multiple imputation for analysis of incomplete data in distributed health data networks. Nat Commun. 2020;11:5467. https://doi.org/10.1038/s41467-020-19270-2
19. van Ginkel JR, Linting M, Rippe RCA, van der Voort A.
Rebutting Existing Misconceptions About Multiple Imputation as a Method for Handling Missing Data. J Pers Assess. 2020;102:297–308. https://doi.org/10.1080/00223891.2018.1530680
20. Stekhoven DJ, Bühlmann P. MissForest–non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012;28:112–8. https://doi.org/10.1093/bioinformatics/btr597
21. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc.; 2017. pp. 4768–77.
22. MeSH-NCBI. https://www.ncbi.nlm.nih.gov/mesh. Accessed 28 Oct 2024.
23. Ke J, Chen Y, Wang X, Wu Z, Zhang Q, Lian Y, et al. Machine learning-based in-hospital mortality prediction models for patients with acute coronary syndrome. Am J Emerg Med. 2022;53:127–34. https://doi.org/10.1016/j.ajem.2021.12.070
24. Khera R, Haimovich J, Hurley NC, McNamara R, Spertus JA, Desai N, et al. Use of Machine Learning Models to Predict Death After Acute Myocardial Infarction. JAMA Cardiol. 2021;6:633–41. https://doi.org/10.1001/jamacardio.2021.0122
25. Oliveira M, Seringa J, Pinto FJ, Henriques R, Magalhães T. Machine learning prediction of mortality in Acute Myocardial Infarction. BMC Med Inf Decis Mak. 2023;23:70. https://doi.org/10.1186/s12911-023-02168-6
26. Zhu X, Xie B, Chen Y, Zeng H, Hu J. Machine learning in the prediction of in-hospital mortality in patients with first acute myocardial infarction. Clin Chim Acta. 2024;554:117776. https://doi.org/10.1016/j.cca.2024.117776
27. Zhao J, Zhao P, Li C, Hou Y. Optimized Machine Learning Models to Predict In-Hospital Mortality for Patients with ST-Segment Elevation Myocardial Infarction. Ther Clin Risk Manag. 2021;17:951–61. https://doi.org/10.2147/TCRM.S321799
28. Song L, Li Y, Nie S, Feng Z, Liu Y, Ding F, et al. Using machine learning to predict adverse events in acute coronary syndrome: A retrospective study. Clin Cardiol. 2023;46:1594–602.
https://doi.org/10.1002/clc.24127
29. Chen S, Pan X, Mo J, Wang B. Establishment and validation of a prediction nomogram for heart failure risk in patients with acute myocardial infarction during hospitalization. BMC Cardiovasc Disord. 2023;23:619. https://doi.org/10.1186/s12872-023-03665-2
30. Zhao X, Wang J, Yang J, Chen T, Song Y, Li X, et al. Machine learning for prediction of bleeding in acute myocardial infarction patients after percutaneous coronary intervention. Ther Adv Chronic Dis. 2023;14:20406223231158561. https://doi.org/10.1177/20406223231158561
31. Al’Aref SJ, Singh G, van Rosendael AR, Kolli KK, Ma X, Maliakal G, et al. Determinants of In-Hospital Mortality After Percutaneous Coronary Intervention: A Machine Learning Approach. J Am Heart Assoc. 2019;8:e011160. https://doi.org/10.1161/JAHA.118.011160
32. Niimi N, Shiraishi Y, Sawano M, Ikemura N, Inohara T, Ueda I, et al. Machine learning models for prediction of adverse events after percutaneous coronary intervention. Sci Rep. 2022;12:6262. https://doi.org/10.1038/s41598-022-10346-1
33. Ye Z, An S, Gao Y, Xie E, Zhao X, Guo Z, et al. The prediction of in-hospital mortality in chronic kidney disease patients with coronary artery disease using machine learning models. Eur J Med Res. 2023;28:33. https://doi.org/10.1186/s40001-023-00995-x
34. Ermak AD, Gavrilov DV, Novitskiy RE, Gusev AV, Andreychenko AE. Development, evaluation and validation of machine learning models to predict hospitalizations of patients with coronary artery disease within the next 12 months. Int J Med Inf. 2024;188:105476. https://doi.org/10.1016/j.ijmedinf.2024.105476
35. Zhang Y, Zhu X, Gao F, Yang S. Systematic Review and Critical Appraisal of Prediction Models for Readmission in Coronary Artery Disease Patients: Assessing Current Efficacy and Future Directions. Risk Manag Healthc Policy. 2024;17:549–57. https://doi.org/10.2147/RMHP.S451436
36. Lee HC, Park JS, Choe JC, Ahn JH, Lee HW, Oh J-H, et al.
Prediction of 1-Year Mortality from Acute Myocardial Infarction Using Machine Learning. Am J Cardiol. 2020;133:23–31. https://doi.org/10.1016/j.amjcard.2020.07.048
37. Georgiopoulos G, Kraler S, Mueller-Hennessen M, Delialis D, Mavraganis G, Sopova K, et al. Modification of the GRACE Risk Score for Risk Prediction in Patients With Acute Coronary Syndromes. JAMA Cardiol. 2023;8:946–56. https://doi.org/10.1001/jamacardio.2023.2741
38. Hossain E, Rana R, Higgins N, Soar J, Barua PD, Pisani AR, et al. Natural Language Processing in Electronic Health Records in relation to healthcare decision-making: A systematic review. Comput Biol Med. 2023;155:106649. https://doi.org/10.1016/j.compbiomed.2023.106649
39. Hu D, Li S, Zhang H, Wu N, Lu X. Using Natural Language Processing and Machine Learning to Preoperatively Predict Lymph Node Metastasis for Non-Small Cell Lung Cancer With Electronic Medical Records: Development and Validation Study. JMIR Med Inf. 2022;10:e35475. https://doi.org/10.2196/35475
40. Ulrich CM, Grady C, Demiris G, Richmond TS. The Competing Demands of Patient Privacy and Clinical Research. Ethics Hum Res. 2021;43:25–31. https://doi.org/10.1002/eahr.500076
41. Choudhary A, Choudhary A, Suman S. NLP Applications for Big Data Analytics Within Healthcare. In: Mishra S, Tripathy HK, Mallick P, Shaalan K, editors. Augmented Intelligence in Healthcare: A Pragmatic and Integrated Analysis. Singapore: Springer Nature; 2022. pp. 237–57. https://doi.org/10.1007/978-981-19-1076-0_13
42. Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, et al. Scalable and accurate deep learning with electronic health records. NPJ Digit Med. 2018;1:18. https://doi.org/10.1038/s41746-018-0029-1
43. Rajkomar A, Dean J, Kohane I. Machine Learning in Medicine. N Engl J Med. 2019;380:1347–58. https://doi.org/10.1056/NEJMra1814259
44. Li L, Cheng W-Y, Glicksberg BS, Gottesman O, Tamler R, Chen R, et al. Identification of type 2 diabetes subgroups through topological analysis of patient similarity.
Sci Transl Med. 2015;7:311ra174. https://doi.org/10.1126/scitranslmed.aaa9364
45. Obermeyer Z, Lee TH. Lost in Thought - The Limits of the Human Mind and the Future of Medicine. N Engl J Med. 2017;377:1209–11. https://doi.org/10.1056/NEJMp1705348
46. Zhuang X-D, Tian T, Liao L-Z, Dong Y-H, Zhou H-J, Zhang S-Z, et al. Deep Phenotyping and Prediction of Long-term Cardiovascular Disease: Optimized by Machine Learning. Can J Cardiol. 2022;38:774–82. https://doi.org/10.1016/j.cjca.2022.02.008
47. Wiens J, Shenoy ES. Machine Learning for Healthcare: On the Verge of a Major Shift in Healthcare Epidemiology. Clin Infect Dis. 2018;66:149–53. https://doi.org/10.1093/cid/cix731
48. Motwani M, Dey D, Berman DS, Germano G, Achenbach S, Al-Mallah MH, et al. Machine learning for prediction of all-cause mortality in patients with suspected coronary artery disease: a 5-year multicentre prospective registry analysis. Eur Heart J. 2017;38:500–7. https://doi.org/10.1093/eurheartj/ehw188
49. Ribelles N, Jerez JM, Rodriguez-Brazzarola P, Jimenez B, Diaz-Redondo T, Mesa H, et al. Machine learning and natural language processing (NLP) approach to predict early progression to first-line treatment in real-world hormone receptor-positive (HR+)/HER2-negative advanced breast cancer patients. Eur J Cancer. 2021;144:224–31. https://doi.org/10.1016/j.ejca.2020.11.030
50. Ling AY, Kurian AW, Caswell-Jin JL, Sledge GW, Shah NH, Tamang SR. Using natural language processing to construct a metastatic breast cancer cohort from linked cancer registry and electronic medical records data. JAMIA Open. 2019;2:528–37. https://doi.org/10.1093/jamiaopen/ooz040
51. Carrell DS, Halgrim S, Tran D-T, Buist DSM, Chubak J, Chapman WW, et al. Using natural language processing to improve efficiency of manual chart abstraction in research: the case of breast cancer recurrence. Am J Epidemiol. 2014;179:749–58. https://doi.org/10.1093/aje/kwt441
52. Steele AJ, Denaxas SC, Shah AD, Hemingway H, Luscombe NM.
Machine learning models in electronic health records can outperform conventional survival models for predicting patient mortality in coronary artery disease. PLoS ONE. 2018;13:e0202344. https://doi.org/10.1371/journal.pone.0202344

Additional Declarations
No competing interests reported.

Supplementary Files
SupplementaryMaterial.docx

Status: Under Revision
Editorial decision: Revision requested, 12 Feb, 2026
Reviews received at journal: 29 Jan, 2026
Reviewers agreed at journal: 27 Jan, 2026
Reviews received at journal: 25 Jan, 2026
Reviewers agreed at journal: 24 Jan, 2026
Reviewers agreed at journal: 11 Jan, 2026
Reviewers agreed at journal: 08 Jan, 2026
Reviewers invited by journal: 07 Jan, 2026
Editor invited by journal: 22 Dec, 2025
Editor assigned by journal: 19 Dec, 2025
Submission checks completed at journal: 19 Dec, 2025
First submitted to journal: 15 Dec, 2025
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-8368403","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":572696056,"identity":"72482ba5-48c8-4c70-a6d7-a9031326d1ee","order_by":0,"name":"Hao Liu","email":"","orcid":"","institution":"Centre for Evidence-Based Chinese Medicine, Beijing University of Chinese Medicine","correspondingAuthor":false,"prefix":"","firstName":"Hao","middleName":"","lastName":"Liu","suffix":""},{"id":572696059,"identity":"75edf0db-f232-4a28-b2f1-926621382f4e","order_by":1,"name":"Meijun Liu","email":"","orcid":"","institution":"Centre for Evidence-Based Chinese Medicine, Beijing University of Chinese Medicine","correspondingAuthor":false,"prefix":"","firstName":"Meijun","middleName":"","lastName":"Liu","suffix":""},{"id":572696061,"identity":"c9204528-44c2-4341-8b56-10c5d6051842","order_by":2,"name":"Xinmiao Guan","email":"","orcid":"","institution":"Centre for Evidence-Based Chinese Medicine, Beijing University of Chinese Medicine","correspondingAuthor":false,"prefix":"","firstName":"Xinmiao","middleName":"","lastName":"Guan","suffix":""},{"id":572696063,"identity":"f7d2fbdc-9396-4c83-8187-c8d9e73f7ca2","order_by":3,"name":"Feng Cao","email":"","orcid":"","institution":"Centre for Evidence-Based Chinese Medicine, Beijing University of Chinese Medicine","correspondingAuthor":false,"prefix":"","firstName":"Feng","middleName":"","lastName":"Cao","suffix":""},{"id":572696065,"identity":"401aa92c-ee41-4f2f-a869-d27818c47443","order_by":4,"name":"Changhao Liang","email":"","orcid":"","institution":"Centre for Evidence-Based Chinese 
Medicine, Beijing University of Chinese Medicine","correspondingAuthor":false,"prefix":"","firstName":"Changhao","middleName":"","lastName":"Liang","suffix":""},{"id":572696066,"identity":"73b24c68-c970-40f4-bd13-2d4b2640ffb5","order_by":5,"name":"Zhongwen Qi","email":"","orcid":"","institution":"Xiyuan Hospital of China Academy of Chinese Medical Sciences","correspondingAuthor":false,"prefix":"","firstName":"Zhongwen","middleName":"","lastName":"Qi","suffix":""},{"id":572696067,"identity":"abec13d6-7a36-4532-87e6-1ad58a9a97ae","order_by":6,"name":"Jiaqi Hui","email":"","orcid":"","institution":"Xiyuan Hospital of China Academy of Chinese Medical Sciences","correspondingAuthor":false,"prefix":"","firstName":"Jiaqi","middleName":"","lastName":"Hui","suffix":""},{"id":572696068,"identity":"ca702a6d-e5c8-42ba-a7f2-4d592538d424","order_by":7,"name":"Junnan Zhao","email":"","orcid":"","institution":"Xiyuan Hospital of China Academy of Chinese Medical Sciences","correspondingAuthor":false,"prefix":"","firstName":"Junnan","middleName":"","lastName":"Zhao","suffix":""},{"id":572696069,"identity":"163b5186-bf6b-4124-bb94-543758651fe2","order_by":8,"name":"Jingli Xing","email":"","orcid":"","institution":"Centre for Evidence-Based Chinese Medicine, Beijing University of Chinese Medicine","correspondingAuthor":false,"prefix":"","firstName":"Jingli","middleName":"","lastName":"Xing","suffix":""},{"id":572696070,"identity":"ef294483-e83f-4b9c-a07b-644b6c78ffee","order_by":9,"name":"Jianguo Zhou","email":"","orcid":"","institution":"Beijing Yikang Medical Technology Co., Ltd","correspondingAuthor":false,"prefix":"","firstName":"Jianguo","middleName":"","lastName":"Zhou","suffix":""},{"id":572696071,"identity":"79ece32f-db72-4ff4-ae55-9fccd0eeb3ce","order_by":10,"name":"Dong Zhang","email":"","orcid":"","institution":"Beijing Yikang Medical Technology Co., 
Ltd","correspondingAuthor":false,"prefix":"","firstName":"Dong","middleName":"","lastName":"Zhang","suffix":""},{"id":572696072,"identity":"162eca13-08cf-4189-8f62-245e261edddb","order_by":11,"name":"Lei Liu","email":"","orcid":"","institution":"Beijing Yikang Medical Technology Co., Ltd","correspondingAuthor":false,"prefix":"","firstName":"Lei","middleName":"","lastName":"Liu","suffix":""},{"id":572696073,"identity":"cb035abf-b894-4ee7-acb1-efec391ce99f","order_by":12,"name":"Xiaoliang Hao","email":"","orcid":"","institution":"Beijing Yikang Medical Technology Co., Ltd","correspondingAuthor":false,"prefix":"","firstName":"Xiaoliang","middleName":"","lastName":"Hao","suffix":""},{"id":572696074,"identity":"5113a24c-50c7-4b1d-b31f-eaa0051a7b01","order_by":13,"name":"Minjing Luo","email":"","orcid":"","institution":"Centre for Evidence-Based Chinese Medicine, Beijing University of Chinese Medicine","correspondingAuthor":false,"prefix":"","firstName":"Minjing","middleName":"","lastName":"Luo","suffix":""},{"id":572696075,"identity":"575cbc1f-1401-4890-9cfb-3d31786f5942","order_by":14,"name":"Fengqin Xu","email":"","orcid":"","institution":"Xiyuan Hospital of China Academy of Chinese Medical Sciences","correspondingAuthor":false,"prefix":"","firstName":"Fengqin","middleName":"","lastName":"Xu","suffix":""},{"id":572696076,"identity":"90133e34-5f87-4313-84b7-4e00cbcfb686","order_by":15,"name":"Yutong Fei","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAxklEQVRIiWNgGAWjYBAC9gbGBgaGigMQHg8xWngOALUcOEOaFiBxsI0kLezNbdIf591J7J92gPHB2zYGeXOCWngOtkkc3PYsccbtBGbDuW0MhjsbCGixl0gEaTmcuEE6gU2at40hweAAIVvkHwK1zAFrYf9NnBYJRqCWBogtzMRp4Ulstjhz7JnxjNuJzZJzzkkYbiCohf34wxsVNXdk+2cnH/zwpsxGnqAtQMAiAaFByYBBgrB6IGD+QJSyUTAKRsEoGLkAABMcRUub8frxAAAAAElFTkSuQmCC","orcid":"","institution":"Centre for Evidence-Based Chinese Medicine, Beijing University of Chinese 
Medicine","correspondingAuthor":true,"prefix":"","firstName":"Yutong","middleName":"","lastName":"Fei","suffix":""}],"badges":[],"createdAt":"2025-12-15 16:24:00","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-8368403/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8368403/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":100037128,"identity":"c0988ada-1560-4679-b65d-956420f88f74","added_by":"auto","created_at":"2026-01-12 10:27:25","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":567811,"visible":true,"origin":"","legend":"","description":"","filename":"BMCmanuscript3.docx","url":"https://assets-eu.researchsquare.com/files/rs-8368403/v1/c9540d527524b0f8d19cb246.docx"},{"id":100363515,"identity":"6a04383b-cdb3-4213-9ae9-2ab9a521f5ac","added_by":"auto","created_at":"2026-01-16 07:49:57","extension":"json","order_by":1,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":15283,"visible":true,"origin":"","legend":"","description":"","filename":"44528e979e7147fdaf94dd8746e38a7c.json","url":"https://assets-eu.researchsquare.com/files/rs-8368403/v1/95ad33d7899540e6efaf1c08.json"},{"id":100362711,"identity":"2b7e1bde-23d1-43fe-9d2e-231465141e7f","added_by":"auto","created_at":"2026-01-16 07:47:56","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":931737,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryMaterial.docx","url":"https://assets-eu.researchsquare.com/files/rs-8368403/v1/fd2e0aa96f1ac3efbb4aa0fe.docx"},{"id":100037131,"identity":"4e637bc6-37a7-4ef5-95e0-a625cc77ac30","added_by":"auto","created_at":"2026-01-12 
10:27:26","extension":"xml","order_by":3,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":173120,"visible":true,"origin":"","legend":"","description":"","filename":"44528e979e7147fdaf94dd8746e38a7c1enriched.xml","url":"https://assets-eu.researchsquare.com/files/rs-8368403/v1/5abb68aab15e6f93cbcdd9e4.xml"},{"id":100037121,"identity":"41c46430-9563-4f19-b165-03bd7c2455f5","added_by":"auto","created_at":"2026-01-12 10:27:25","extension":"jpeg","order_by":4,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":348992,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8368403/v1/de6a26ef12f65d4ba8691f1d.jpeg"},{"id":100363611,"identity":"f4e6dd6f-fa8a-467e-aa34-bade93bf1cc6","added_by":"auto","created_at":"2026-01-16 07:50:44","extension":"png","order_by":8,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":47111,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-8368403/v1/8e64754c84a969f1a1f35abb.png"},{"id":100362863,"identity":"5e86b42a-4eb3-4f95-87f9-1d6766ff3370","added_by":"auto","created_at":"2026-01-16 07:48:10","extension":"png","order_by":9,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":19992,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-8368403/v1/7ad5e6bfe589f1d7d0caac94.png"},{"id":100363155,"identity":"a4fb00d5-8db4-4566-b679-b5d58a364460","added_by":"auto","created_at":"2026-01-16 
07:49:00","extension":"png","order_by":10,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":27586,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-8368403/v1/d4990274741d3f7bf8601e3b.png"},{"id":100363682,"identity":"78ab1ee8-27c4-4ab7-9f07-40cd30c5ad35","added_by":"auto","created_at":"2026-01-16 07:51:13","extension":"png","order_by":11,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":38259,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-8368403/v1/d20fd40f90e22f6ecfe87b20.png"},{"id":100037133,"identity":"414a83a8-dee5-489a-9208-d20ecb519dcd","added_by":"auto","created_at":"2026-01-12 10:27:26","extension":"xml","order_by":12,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":172045,"visible":true,"origin":"","legend":"","description":"","filename":"44528e979e7147fdaf94dd8746e38a7c1structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-8368403/v1/d3c6606278637cc50a922324.xml"},{"id":100037132,"identity":"d5616b53-ed7a-44eb-b060-fa6a4bcc24ee","added_by":"auto","created_at":"2026-01-12 10:27:26","extension":"html","order_by":13,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":188435,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-8368403/v1/ebabed7f7b3bd4e6e7eeb58e.html"},{"id":100037118,"identity":"13c2f608-ba6b-4fea-a86b-3b743e266d40","added_by":"auto","created_at":"2026-01-12 10:27:25","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":200904,"visible":true,"origin":"","legend":"\u003cp\u003eFlowchart of the study 
design.\u003c/p\u003e","description":"","filename":"floatimage1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-8368403/v1/3922177d34dbd6d8eff509b1.jpg"},{"id":100037119,"identity":"56bf719f-fe8a-4ced-aa08-b55d0ae6df7e","added_by":"auto","created_at":"2026-01-12 10:27:25","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":63706,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of area under the ROC for machine learning models to identify severe clinical events in-hospital for patients with coronary artery disease\u003c/p\u003e\n\u003cp\u003e(A) Training set; (B) Validation set.\u003c/p\u003e\n\u003cp\u003eAbbreviations: AUC, area under the curve; LR, logistic regression; RF, random forest; GNB, Gaussian naive Bayes; XGB, extreme gradient boosting; DNN, deep neural network; TPR, True Positive Rate; FPR, False Positive Rate.\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-8368403/v1/f08a166bf1b89431227952a1.png"},{"id":100361941,"identity":"9cbe5eca-964b-4e99-82cd-419a97937085","added_by":"auto","created_at":"2026-01-16 07:45:57","extension":"jpeg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":107511,"visible":true,"origin":"","legend":"\u003cp\u003eMean(|SHAP value|) for DNN (average impact on model output magnitude)\u003c/p\u003e","description":"","filename":"floatimage3.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8368403/v1/641f9d81d95e631f8783e21b.jpeg"},{"id":100362806,"identity":"60f3f04f-c1dd-44c7-b169-eed21260892b","added_by":"auto","created_at":"2026-01-16 07:48:06","extension":"jpeg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":130803,"visible":true,"origin":"","legend":"\u003cp\u003eSHAP values for DNN (impact on model 
output)\u003c/p\u003e","description":"","filename":"floatimage4.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8368403/v1/f28977c51b80d10ec7060639.jpeg"},{"id":100381363,"identity":"3678ca21-45f0-4b26-b565-632c315de063","added_by":"auto","created_at":"2026-01-16 10:38:39","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1858069,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8368403/v1/758f17ca-96b3-4de3-9ffc-751fe519206f.pdf"},{"id":100037123,"identity":"c64841ba-291f-4480-92dc-87c20c7a1397","added_by":"auto","created_at":"2026-01-12 10:27:25","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":931737,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryMaterial.docx","url":"https://assets-eu.researchsquare.com/files/rs-8368403/v1/32a471d94f151ee4622ea6b5.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Machine learning models for predicting severe clinical events in hospitalized patients with coronary artery disease","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eCoronary artery disease (CAD) is a form of heart disease characterized by the accumulation of atherosclerotic plaques in the epicardial coronary arteries, which leads to obstructive or non-obstructive lesions and subsequent myocardial damage[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. The dynamic nature of the CAD process leads to varied clinical manifestations, which can be characterized as acute coronary syndromes (ACS) or chronic coronary syndromes (CCS). 
The age-standardized Disability-Adjusted Life Years (DALYs) for CAD are 2,275.9 per 100,000, the highest of any disease worldwide[\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Research from the World Health Organization (WHO) indicates that CAD is the leading cause of mortality worldwide, accounting for 13% of all deaths[\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eSevere clinical events, such as resuscitation or death, occur frequently during the hospitalization of patients with coronary artery disease. Accurately predicting these events, identifying risk factors, and improving the management of high-risk patients are essential to improve patient prognosis[\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. However, the effectiveness of automated short-term risk prediction for CAD in hospital settings, particularly among individuals classified as low-risk by conventional clinical guidelines, remains to be explored[\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eElectronic medical records (EMRs) encompass vast quantities of real-world patient data collected during routine clinical practice. They provide a rich resource of structured and unstructured medical data that can be used to capture the core characteristics of several diseases, making them well suited to clinical risk prediction and stratification. Although existing studies provide some insights, in-depth research utilizing EMRs remains insufficient[\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. A prediction score developed using system-wide EMRs and machine learning (ML) can reveal residual disease risks not detected by conventional tools, which typically rely on a limited set of traditional risk factors[\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. 
Furthermore, utilizing unstructured data in EMRs\u0026mdash;such as progress notes, nursing notes, chief complaints, and discharge summaries\u0026mdash;is a complex and time-consuming process[\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. Natural Language Processing (NLP) merges linguistics with artificial intelligence to enable machines to understand and interpret text. It is being widely applied to extract insights from hidden information in clinical notes[\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. The application of NLP techniques to unstructured clinical text has the potential to enhance the performance of clinical prediction models[\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. Indeed, NLP has been adopted to computationally extract clinical information from EMRs for a wide range of applications ranging from advancing EMR-based clinical research[\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e] to supporting clinical decision-making[\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eTherefore, this study proposes to develop predictive models using five distinct ML techniques, drawing on EMRs data and NLP. 
The aim is to predict severe clinical events during the hospitalization of patients with CAD, thereby enabling clinicians to identify high-risk patients earlier and implement targeted interventions that may enhance clinical outcomes.\u003c/p\u003e"},{"header":"2 Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e\u003cb\u003e2.1\u003c/b\u003e Data sources and Study population\u003c/h2\u003e \u003cp\u003eThis retrospective study utilized a real-world dataset obtained from the EMRs of Xiyuan Hospital, stored in the China Traditional Chinese Medicine Cardiovascular Bank. The dataset encompasses routine medical information from 2,302,360 outpatient and inpatient visits for 317,625 patients with cardiovascular disease, recorded between 2016 and 2024 at Xiyuan Hospital. The collected EMRs include demographic information, diagnoses, admission notes, medical orders, medical examinations, disease course notes, and discharge summaries, all of which can be analyzed to develop the prediction model.\u003c/p\u003e \u003cp\u003eAll inpatients with a principal diagnosis of CAD were included in our analysis; in other words, these patients were admitted for CAD rather than for other conditions. An all-comers design was used, with no exclusion criteria, to reflect real-world consultations. If a patient had multiple hospitalizations, only the most recent one was included. The selection of the population and the development, evaluation, and interpretation of the machine learning models are illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e\u003cb\u003e2.2\u003c/b\u003e Features and outcomes\u003c/h2\u003e \u003cp\u003ePatient features extracted for the development of the ML models include 133 variables, covering nearly all information in the EMRs. 
These variables encompass daily indicators and characteristics relevant to the hospitalization of CAD patients. Because they are routinely collected, a prediction model built on them should be more practical to apply. Specifically, the extracted variables include demographic information, diagnoses, both current and past medical histories, physical examinations, admission and discharge diagnoses, examination and test reports, admission and discharge records, prescriptions, course texts, and records of adverse events. For a more detailed list of variables and their availability, see \u003cb\u003eTable \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e\u003c/b\u003e.\u003c/p\u003e \u003cp\u003eThe primary outcome of the study was severe clinical events, defined as occurrences of resuscitation or death during hospitalization. Additionally, in alignment with Chinese cultural perspectives, cases where resuscitation was not performed because family members opted to discontinue treatment were also classified as severe clinical events.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e\u003cb\u003e2.3\u003c/b\u003e NLP for Data extraction\u003c/h2\u003e \u003cp\u003e(1) Data preprocessing: The initial step was to divide the EMRs into paragraphs in order to extract the pertinent text, including the chief complaint, the patient's current and past medical history, their personal history, and so forth. Subsequently, the medical record text was subjected to word segmentation, stopword removal and part-of-speech tagging. Data cleaning and standardization were then conducted to safeguard data quality. (2) Model training: The BERT (Bidirectional Encoder Representations from Transformers) pre-trained model was employed to perform semantic comprehension and feature extraction of textual data, including lexical features, syntactic features and semantic features. 
(3) Information Extraction: A Named Entity Recognition (NER) model was employed to identify and classify medical entities in unstructured text, including disease names, symptoms, medical history and diagnoses. A relationship extraction algorithm was implemented to determine the relationships between entities in the text. (4) Post-processing: The post-processing of the identified medical entities comprised the removal of duplicate entities and the merging of adjacent entities, with the objective of enhancing the accuracy of the results.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e\u003cb\u003e2.4\u003c/b\u003e Data preprocessing and feature selection\u003c/h2\u003e \u003cp\u003eThe patient ID number was used as the unique identifier. We excluded features exhibiting more than 30% missing data across patients, and subsequently excluded patients who had more than 30% missing data across the remaining features[\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. Outliers inflate the variability of a dataset, and min-max normalization is sensitive to their presence. Therefore, the interquartile range (IQR) was used to detect and remove outliers. The continuous variables were normalized using the MinMaxScaler before imputation, according to the following equation:\u003c/p\u003e \u003cp\u003eX\u003csup\u003e*\u003c/sup\u003e = (X\u003csub\u003ei\u003c/sub\u003e - X\u003csub\u003emin\u003c/sub\u003e) / (X\u003csub\u003emax\u003c/sub\u003e - X\u003csub\u003emin\u003c/sub\u003e)\u003c/p\u003e \u003cp\u003eNormalization is a strategy that compresses the range of data points, effectively making them more uniform and decreasing the variation between them. After normalization, all continuous variables in the dataset were scaled to a range between 0 and 1. Unordered (nominal) multi-categorical variables were one-hot encoded, with each category mapped to a separate binary indicator. 
This approach prevents the introduction of erroneous numerical relationships between categories, thereby avoiding inaccurate algorithmic predictions based on such relationships. However, one-hot encoding can significantly increase the dimensionality of the feature space when dealing with a large number of categories, potentially leading to computational complexity and overfitting issues.\u003c/p\u003e \u003cp\u003eMultiple imputation was used for the remaining missing data (less than 30% missing). Missing data is a critical factor that may introduce biases in ML modeling. Multiple imputation, which models each feature as a function of the other features, is widely regarded as an effective method[\u003cspan additionalcitationids=\"CR18\" citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. It handles missing values in both continuous and categorical variables[\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eGiven that there were 189 positive samples in the training set, we planned to conduct one-way ANOVA tests, Lasso regression analyses, and covariate diagnostics for the 133 variables in the training set to screen for independent variables. Ultimately, we expected to include 18 variables in our model (approximately 10 events per variable). All eligible variables were tested for collinearity using the variance inflation factor before being included in the final multivariable models.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e\u003cb\u003e2.5\u003c/b\u003e Machine learning models\u003c/h2\u003e \u003cp\u003eDue to the marked imbalance in the ratio of positives to negatives, a simple random split of the data into training and validation sets may not be appropriate. 
Instead, we divided the dataset into training (70%) and testing (30%) subsets using stratified sampling to maintain the proportional representation of positives and negatives, thereby supporting a less biased assessment of model performance. The training dataset was used for model training and hyperparameter tuning through 10-fold cross-validation, while the test dataset was reserved solely for the final evaluation of model performance.\u003c/p\u003e \u003cp\u003eFive ML approaches were used to train predictive models for severe clinical events during the hospitalization of patients with CAD. The methods included logistic regression (LR), random forest (RF), Gaussian naive Bayes (GNB), extreme gradient boosting (XGBoost), and deep neural networks (DNN).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e\u003cb\u003e2.6\u003c/b\u003e Model evaluation\u003c/h2\u003e \u003cp\u003eThe discrimination of the models was evaluated using the area under the receiver operating characteristic curve (ROC-AUC). Confidence intervals (CIs) for the ROC-AUC were estimated with 1,000 bootstrap replicates, each generated using a unique random seed by the normal bootstrap method. Additionally, we evaluated each model's performance using several metrics: Sensitivity, Specificity, Youden's J Index, Accuracy, Positive Predictive Value (PPV), Negative Predictive Value (NPV), F1 Score, and Matthews Correlation Coefficient (MCC). Detailed formulas and explanations for these metrics are available in \u003cb\u003eTable S2\u003c/b\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e\u003cb\u003e2.7\u003c/b\u003e Model Interpretation\u003c/h2\u003e \u003cp\u003eInterpreting predictions from black-box machine learning models presents a significant challenge for health professionals. 
To address this, we employed the SHapley Additive exPlanations (SHAP) method to identify the contribution of each attribute in our prediction model, enhancing the interpretability of the generated models. SHAP values offer a consistent and accurate way to assign contribution values to each feature within each model, which we illustrated through a graphical summary of the predictors[\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. As a game-theoretic approach to model interpretability, SHAP values provide insights into the global structure of the model by synthesizing local explanations for each prediction.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e\u003cb\u003e2.8\u003c/b\u003e Statistical analysis\u003c/h2\u003e \u003cp\u003eStatistical analysis was performed using Python (version 3.10.0). Continuous variables were assessed for normality using the Kolmogorov-Smirnov test. Variables that conformed to normality were described by their mean and standard deviation, and differences between the two groups were compared using the t-test. Variables not following a normal distribution were described using the median and interquartile range (Q25, Q75), and differences between groups were assessed using the Mann-Whitney U test. Categorical data were presented as absolute numbers and percentages, n (%), and differences between groups were analyzed using Pearson's chi-square test, continuity-corrected chi-square test, or Fisher's exact test as appropriate.\u003c/p\u003e \u003c/div\u003e"},{"header":"3 Results","content":"\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e\u003cb\u003e3.1\u003c/b\u003e Patient Characteristics\u003c/h2\u003e \u003cp\u003eA total of 8,126 individuals were hospitalized for CAD. 
Of these, 1,155 were excluded due to having more than 30% missing data, leaving 6,971 participants in the study (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Among them, 268 (3.84%) experienced severe clinical events during hospitalization (hereafter referred to as the \u0026ldquo;events group\u0026rdquo;), and 6,703 (96.16%) did not (hereafter referred to as the \u0026ldquo;no events group\u0026rdquo;).\u003c/p\u003e \u003cp\u003eA total of 171 variables were extracted, and 38 variables with more than 30% missing values were excluded, leaving 133 variables. \u003cb\u003eTable \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e\u003c/b\u003e lists all extracted variables and their missing rates. Detailed baseline characteristics for both groups can be found in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. For biochemical test indicators, including cardiac, liver, and renal function tests from blood and urine samples, refer to \u003cb\u003eTable S3\u003c/b\u003e. As shown in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, there was no significant difference between the two groups in the median length of hospital stay (8 days in each) or in gender composition. In the events group, higher percentages of patients had dyspnea, insomnia, vomiting, coughing or expectoration, hypokalemia, hypoproteinemia, chronic renal insufficiency, pneumonia, electrolyte imbalance, hyponatremia, and acute exacerbation of chronic cardiac insufficiency (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). 
In addition, myoglobin, creatine kinase isoenzymes, white blood cell counts, blood sodium, blood uric acid, and fasting blood glucose levels were higher in the events group than in the no events group (\u003cb\u003eTable S3\u003c/b\u003e).\u003c/p\u003e \u003cp\u003eThe CAD admission diagnosis types in our database were grouped into three main types according to MeSH categories[\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]: chronic CAD, acute CAD, and angina pectoris (\u003cb\u003eTable S4\u003c/b\u003e). Acute CAD was the most common admission diagnosis type in the events group (55.22%); in contrast, chronic CAD was the most common in the no events group (70.03%).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eBaseline clinical characteristics of patients with coronary artery disease\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVariables\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEvents group (n\u0026thinsp;=\u0026thinsp;268)\u003csup\u003ea\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNo events group 
(n\u0026thinsp;=\u0026thinsp;6703)\u003csup\u003ea\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003eP-\u003c/em\u003evalue\u003csup\u003eb\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eDemographic information\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAge, years\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e76.50(68.00, 84.00)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e69.00(61.00, 77.00)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGender\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.180\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003efemale\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e106 (39.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2929 (43.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003emale\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e162 (60.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3774 (56.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePulse rate, (Freq/Min)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e78.12(72.46, 87.91)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e73.13(68.59, 78.50)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSystolic blood pressure, noninvasive, (mmHg)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e128.00(115.33, 142.33)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e132.07(122.00, 143.67)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDiastolic blood pressure, noninvasive, (mmHg)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e72.83(64.00, 80.75)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e76.00(69.73, 83.00)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLength of hospitalization, (days)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e8.00(7.00, 12.00)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e 
\u003cp\u003e8.00(6.00, 11.00)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.068\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eCAD admission diagnosis types\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eChronic CAD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e106(39.55%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e4694(70.03%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAcute myocardial infarction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e148(55.22%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e598(8.92%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAngina pectoris\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e14(5.22%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1411(21.05%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eSymptoms at the moment of admission\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDyspnea\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e77(28.73%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e653(9.74%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eInsomnia\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e14(5.22%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e125(1.86%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNausea\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e29(10.82%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e537(8.01%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.099\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVomiting\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e40(14.93%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e593(8.85%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAbdominal distension\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e17(6.34%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e291(4.34%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.118\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCoughing or expectoration\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e171(63.81%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e979(14.61%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOrthopnea\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e24(8.96%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e117(1.75%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDuration of chest pain, chest tightness, or precordial pain (chief complaint), years\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2.00(0.10, 6.33)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e4.00(1.00, 8.90)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003ePersonal history\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDrinking history\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e33(12.31%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1373(20.48%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSmoking history\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e71(26.49%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1912(28.52%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.470\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eMental state\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWell\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e123(45.90%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e5423(80.90%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGeneral\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e8(2.99%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e 
\u003cp\u003e202(3.01%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLassitude\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e137(51.12%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1074(16.02%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAnxiety\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0(0.00%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e4(0.06%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e \u003cp\u003e\u003cb\u003eTongue\u003c/b\u003e\u003csup\u003ec\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTongue color\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDark\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e146(54.48%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e4047(60.38%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e 
\u003cp\u003e0.053\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRed\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e159(59.33%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e4438(66.21%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.020\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLight\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e110(41.04%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2420(36.10%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.099\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWhite\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e3(1.12%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e42(0.63%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.549\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePurple\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e8(2.99%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e169(2.52%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.636\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTongue coating\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWhite\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e92(34.33%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2930(43.71%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.002\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYellow\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e86(32.09%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2123(31.67%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.886\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eThick\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e25(9.33%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e569(8.49%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.629\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGreasy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e108(40.30%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3479(51.90%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLess\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e19(7.09%)\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e240(3.58%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.003\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLarge fat tongues\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e4(1.49%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e80(1.19%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.877\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"3\" nameend=\"c3\" namest=\"c1\"\u003e \u003cp\u003e\u003cb\u003ePulse signals\u003c/b\u003e \u003csup\u003ec\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWiry pulse\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e122(45.52%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3833(57.18%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eThin pulse\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e76(28.36%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1476(22.02%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.014\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSlippery pulse\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e88(32.84%)\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2726(40.67%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.010\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSunken pulse\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e68(25.37%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1540(22.97%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.361\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRapid pulse\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e23(8.58%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e364(5.43%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.027\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSlow pulse\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e6(2.24%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e233(3.48%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.275\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWeak pulse\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e26(9.70%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e356(5.31%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.002\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003eIntermittent pulse\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e11(4.10%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e107(1.60%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.002\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRough pulse\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e10(3.73%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e234(3.49%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.834\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eCo-morbidity\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHypokalemia\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e53(19.78%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e512(7.64%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHypoproteinemia\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e58(21.64%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e154(2.30%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eChronic renal insufficiency\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e49(18.28%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e474(7.07%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePneumonia\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e89(33.21%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e363(5.42%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eElectrolyte imbalance\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e32(11.94%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e73(1.09%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHyponatremia\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e34(12.69%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e99(1.48%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCardiac insufficiency\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e117(43.66%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c3\"\u003e \u003cp\u003e1741(25.97%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAcute exacerbation of chronic cardiac insufficiency\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e60(22.39%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e456(6.80%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eHypertension classification\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNo Hypertension\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e105(39.18%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1883(28.09%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eStage 1 Hypertension\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e6(2.24%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e482(7.19%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eStage 2 Hypertension\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e46(17.16%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1617(24.12%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eStage 3 Hypertension\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e111(41.42%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2721(40.59%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eCardiac function grading (NYHA)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNYHA-0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e204(76.12%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e5311(79.23%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNYHA-1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e4(1.49%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e92(1.37%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNYHA-2\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e8(2.99%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e540(8.06%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNYHA2-3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e21(7.84%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e587(8.76%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNYHA-3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e19(7.09%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e131(1.95%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNYHA-4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e12(4.48%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e42(0.63%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"4\" nameend=\"c4\" namest=\"c1\"\u003e \u003cp\u003e\u003csup\u003e\u003cb\u003ea\u003c/b\u003e\u003c/sup\u003e : Normally distributed data are expressed as \u0026ldquo;mean\u0026thinsp;\u0026plusmn;\u0026thinsp;standard deviation\u0026rdquo;; non-normally distributed data as \u0026ldquo;median (25th percentile, 75th percentile)\u0026rdquo;; and categorical data as \u0026ldquo;n (%)\u0026rdquo;.\u003c/p\u003e 
\u003cp\u003e\u003csup\u003e\u003cb\u003eb\u003c/b\u003e\u003c/sup\u003e : Statistical tests included the t-test, Mann-Whitney U test, Chi-squared tests (basic Chi-squared, Pearson Chi-squared, continuity correction Chi-squared), and Fisher's exact test.\u003c/p\u003e \u003cp\u003e\u003csup\u003e\u003cb\u003ec\u003c/b\u003e\u003c/sup\u003e : Traditional Chinese medicine signs and symptoms.\u003c/p\u003e \u003cp\u003eAbbreviations: CAD, coronary artery disease; NYHA, New York Heart Association Functional Classification.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e\u003cb\u003e3.2\u003c/b\u003e Model Development and Evaluation\u003c/h2\u003e \u003cp\u003eIn total, 18 of the 133 candidate variables were retained after one-way ANOVA tests, Lasso regression analysis, and collinearity diagnostics. The variance inflation factor (VIF) for each of these 18 variables was below 10, indicating acceptably low multicollinearity. Details of the Lasso regression analysis are provided in \u003cb\u003eFigure \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003eA and S1B\u003c/b\u003e. The collinearity diagnostics are available in \u003cb\u003eTable S5\u003c/b\u003e.\u003c/p\u003e \u003cp\u003eWe employed five widely used machine learning algorithms to construct the predictive models: Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Gaussian Naive Bayes (GNB), and Deep Neural Network (DNN). The characteristics and parameters of these algorithms are summarized in \u003cb\u003eTable S6\u003c/b\u003e. Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e compares the performance of the five models in the training and validation sets. 
The DNN showed the best overall performance, leading on most evaluation metrics. It achieved the highest AUC in the validation set, 0.995 (95% CI: 0.985\u0026ndash;0.999) (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). In the validation set it also achieved a sensitivity of 0.873, a specificity of 0.999, a Youden index of 0.872, an accuracy of 0.994, a positive predictive value (PPV) of 0.972, a negative predictive value (NPV) of 0.995, an F1 score of 0.920, and a Matthews correlation coefficient (MCC) of 0.918. These were the highest values among the five models except for specificity and PPV, for which RF reached 1.000 while identifying very few positive cases (sensitivity 0.038). In contrast, the AUCs of the other four models in the validation set ranged from 0.877 to 0.896 (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). Although their specificity, accuracy, and NPV were high, their sensitivity, Youden index, PPV, F1 score, and MCC were generally low. This may reflect class imbalance in our data: positive samples were far fewer than negative samples.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eModel performance in the training and validation sets\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"10\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" 
colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAUC (95% CI)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eSensitivity\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSpecificity\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eYouden's J statistic\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003ePPV\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eNPV\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003eF1 Score\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c10\"\u003e \u003cp\u003eMCC\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e \u003cp\u003e\u003cb\u003eTraining set\u003c/b\u003e\u003c/p\u003e \u003cp\u003e\u003cb\u003e(n\u0026thinsp;=\u0026thinsp;4879)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.898(0.874\u0026ndash;0.919)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.963\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.169\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.995\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.164\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.571\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.967\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.258\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.278\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.897(0.872\u0026ndash;0.922)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.962\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.005\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.005\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e 
\u003cp\u003e1.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.962\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.011\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.071\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eXGBoost\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.996(0.992\u0026ndash;0.998)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.990\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.751\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.751\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e1.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.990\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.858\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.863\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGNB\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.871(0.842\u0026ndash;0.897)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.892\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.619\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.903\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.522\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e 
\u003cp\u003e0.204\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.983\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.321\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.319\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.987(0.980\u0026ndash;0.993)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.990\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.810\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.997\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.806\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.911\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.992\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.857\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.853\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e \u003cp\u003e\u003cb\u003eValidation set\u003c/b\u003e\u003c/p\u003e \u003cp\u003e\u003cb\u003e(n\u0026thinsp;=\u0026thinsp;2092)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.877(0.829\u0026ndash;0.919)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.966\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.241\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.994\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.235\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.613\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.971\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.368\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.385\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.896(0.855\u0026ndash;0.932)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.964\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.038\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e1.000\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.038\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e1.000\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e 
\u003cp\u003e0.964\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.073\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.191\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eXGBoost\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.888(0.854\u0026ndash;0.920)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.964\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.317\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.990\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.306\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.544\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.974\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.400\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.398\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGNB\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.884(0.841\u0026ndash;0.921)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.892\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.671\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.900\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.571\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.209\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e 
\u003cp\u003e0.986\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.331\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.327\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e0.995(0.985\u0026ndash;0.999)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.994\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.873\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.999\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e0.872\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.972\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e0.995\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.920\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e\u003cb\u003e0.918\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"10\"\u003eAbbreviations: AUC, area under the curve; CI, confidence interval; LR, logistic regression; RF, random forest; GNB, Gaussian naive Bayes; XGBoost, extreme gradient boosting; DNN, deep neural network; NPV, negative predictive value; PPV, positive predictive value; MCC, Matthews correlation coefficient.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e\u003ch2\u003e3.3 
Identification of Important Risk Factors Contributing to the Model\u003c/h2\u003e\n\u003cp\u003eThe SHAP algorithm was used to assess the importance of each predictor variable in the DNN model\u0026apos;s predictions. The variable importance plot, arranged in descending order, is shown in\u0026nbsp;\u003cstrong\u003eFigure 3\u003c/strong\u003e. The CAD admission diagnosis type emerged as the most predictive variable across all prediction horizons, followed closely by urinary occult blood, mental state (well, general, lassitude, anxiety), flushed tongue, and hypoproteinemia. The importance of each predictor in the remaining three models was also calculated using the SHAP algorithm, with the results presented in descending order in\u0026nbsp;\u003cstrong\u003eFigure S2A-C\u003c/strong\u003e. To discern the positive and negative relationships between predictors and the target outcome, SHAP values were employed to identify risk factors for severe clinical events. As illustrated in\u0026nbsp;\u003cstrong\u003eFigure\u0026nbsp;4\u003c/strong\u003e, values to the left of zero on the horizontal axis (ranging from 0 to -1) indicate a decreasing probability of the predicted event, whereas values to the right (ranging from 0 to 1) indicate an increasing probability. The color coding of the dots represents the raw values of the features: red for high values and blue for low values. It is notable that most samples with a diagnosis of non-acute coronary artery disease cluster to the left of zero, suggesting a lower likelihood of in-hospital major events for these patients. 
Conversely, patients diagnosed with acute coronary artery disease typically cluster on the right, indicating a higher likelihood of experiencing an in-hospital major event.\u003c/p\u003e"},{"header":"4 Discussion","content":"\u003ch2\u003e4.1 Principal Findings\u003c/h2\u003e\n\u003cp\u003eIn this retrospective cohort study, we examined real-world clinical data from 6,971 EMRs of patients with CAD to develop and validate five interpretable ML models aimed at predicting severe in-hospital clinical events. The XGBoost, RF, GNB, and LR models demonstrated comparable AUCs (0.877\u0026ndash;0.896); however, the DNN model exhibited superior performance, achieving an AUC of 0.995 (95% CI: 0.985\u0026ndash;0.999), a specificity of 0.999, a sensitivity of 0.873, a Youden\u0026apos;s index of 0.872, an F1 score of 0.920, and an MCC of 0.918. This model demonstrated favorable discrimination, calibration, and clinical applicability in both the training cohort and the validation cohort. To facilitate model interpretability, SHAP was used, identifying the admission diagnosis type of acute CAD, urinary occult blood, mental state (ranging from well, general, to lassitude and anxiety), flushed tongue, and hypoproteinemia as key predictors impacting the risk of severe clinical events in this patient population. Some studies have investigated in-hospital risk in patients with AMI or after percutaneous coronary intervention (PCI), but, to the best of our knowledge, our study is among the first to apply ML techniques to predict severe in-hospital clinical events across a general population of patients with CAD. We developed an interpretable prediction model (DNN) with good predictive capacity, which makes it possible to understand the contributing factors and the reasoning behind its predictions. 
The predictors used are clinically plausible and readily available in most health services.\u003c/p\u003e\n\u003ch2\u003e4.2\u0026nbsp;Relationship with previous studies\u003c/h2\u003e\n\u003cp\u003eSeveral prognostic risk prediction models for CAD have been developed, primarily focusing on the in-hospital prognosis of AMI or postoperative risks of PCI (such as mortality[23\u0026ndash;27], acute kidney injury[28], heart failure[29], bleeding[30], mortality[31], adverse events during hospitalization[32]). Predictions of in-hospital prognosis for CAD were uncommon; we found only one study, which predicted in-hospital mortality for CAD patients with concurrent chronic kidney disease[33]. Additionally, several studies have addressed long-term outcomes, such as readmission for CAD[34, 35] and one-year mortality for AMI[36]. Building on the foundation of these studies, our inclusion criteria for CAD patients are broader, encompassing all forms of CAD: chronic CAD, angina, and AMI. Similar ideas have been explored before, such as modifying the GRACE risk score to support its potential applicability in different subgroups of ACS[37]. Our inclusion of a broader population with coronary heart disease, along with outcomes that align closely with clinical practice, enhances the pragmatic utility of our prediction model. As in this study, the models employed in these studies predominantly use ML techniques. The commonly used ML models include XGBoost, LR, RF, Support Vector Machine (SVM), and GNB.\u003c/p\u003e\n\u003ch2\u003e4.3\u0026nbsp;Implications for future research\u003c/h2\u003e\n\u003cp\u003eThe development and validation of an ML-based predictive model for severe clinical events in patients with CAD using EMRs opens several avenues for future research that could further refine and enhance the utility of predictive models in clinical settings. 
(1) Expansion to other clinical conditions: While this study focused on coronary artery disease, the methodology could be adapted for other chronic diseases that require continuous monitoring and management, such as diabetes and chronic kidney disease. This could help in developing comprehensive models that can predict multiple adverse events across different disease spectrums. (2) Exploration of unstructured data: This study utilized NLP to extract data from unstructured sources within EMRs. Future research could delve deeper into more advanced NLP techniques and other forms of unstructured data, such as images (X-rays, electrocardiography) and notes from newer digital communication tools (telehealth transcripts), to enrich the data inputs into the ML models. (3) Multi-centric validation studies: The model developed in this study was validated using data from a single center. Conducting multi-centric studies would help validate the model across different populations and healthcare settings, enhancing its generalizability and robustness. (4) Integration with clinical decision support systems: Research into how these predictive models can be effectively integrated into existing clinical decision support systems could be crucial. This includes understanding the workflow of healthcare professionals and ensuring that the model\u0026apos;s predictions are delivered in a way that seamlessly fits into clinical decision-making processes without disrupting them. (5) Advanced ML techniques: Exploring more advanced ML techniques, such as deep learning and reinforcement learning, could provide new ways of modeling complex clinical data and interactions. These methods might uncover patterns not visible with current techniques and improve prediction accuracy.\u003c/p\u003e\n\u003ch2\u003e4.4\u0026nbsp;Strengths and limitations\u003c/h2\u003e\n\u003cp\u003eFirst, we constructed our predictive models using routine clinical data from EMRs. 
This approach efficiently leverages real-world information contained in EMRs to develop more practical and applicable risk prediction models.\u0026nbsp;EMRs contain almost all clinical features of patients, including large amounts of structured data and unstructured data collected during regular clinical practice\u0026nbsp;[38, 39].\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eSecond, we employed NLP to extract data from EMRs. Structured clinical data, such as age, vital signs, and laboratory results, come in a fixed format that simplifies preprocessing; clinical\u0026nbsp;notes, in contrast, present unique challenges. These notes are often unstructured, containing abbreviations, grammatical errors, and misspellings, and use distinct clinical language and idioms that pose difficulties for health information research[40, 41].\u0026nbsp;NLP, which merges linguistics and artificial intelligence to enable machine understanding and interpretation of text, is increasingly being used to extract and analyze hidden information in clinical notes[9].\u0026nbsp;It has been widely adopted for computationally extracting clinical information from EMRs, with applications that range from enhancing EMR-based clinical research to supporting clinical decision-making[14, 15].\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThird, we utilized ML to construct risk prediction models. In EMR-based health systems, patients accrue millions of heterogeneous clinical data points longitudinally, presenting significant challenges for analysis and interpretation without ML technologies[42\u0026ndash;45]. Traditional risk prediction models, which rely on statistical methods, often struggle with issues such as variable correlation, heterogeneity, nonlinearity, and overfitting, particularly in complex data sets with numerous features[46]. Unlike traditional statistical methods that start with a predefined model and use data as input, ML adopts a data-driven approach. 
This method generalizes a model from the data itself, enhancing its applicability to new data[47]. ML methods are capable of identifying patterns in large data sets characterized by multidimensional and nonlinear relationships among clinical features, thereby predicting various outcomes more effectively[48]. Although much of this data exists in unstructured text form, ML techniques facilitate the application of algorithms that classify features or predict events from clinical notes[49]. These approaches often achieve higher sensitivity and specificity in identifying at-risk patients than methods relying on structured data[50, 51]. Consequently, ML has become a crucial component in the prevention, diagnosis, treatment, and support of clinical decisions. Specifically, ML models in EMRs have demonstrated superior performance to traditional survival models in predicting mortality among patients with CAD[52].\u003c/p\u003e\n\u003cp\u003eHowever, this study is subject to several limitations. First, due to constraints inherent to the study\u0026apos;s database, external validation was not feasible. Instead, internal validation methods were employed, although external validation would provide a more robust proof of the model\u0026apos;s applicability. Second, the diagnosis of coronary artery disease in the study population was based on clinical assessments rather than the gold standard of coronary angiography, introducing some diagnostic uncertainty. Third, although the sample size was not calculated specifically for the prediction model, the number of positive cases in the training set was 189 out of 268 occurrences in the study group. 
Consequently, the final model incorporated only 18 variables, significantly fewer than the total number collected.\u0026nbsp;\u003c/p\u003e"},{"header":"5 Conclusions","content":"\u003cp\u003eThe interpretable predictive model helps physicians more accurately predict severe clinical events in patients with coronary heart disease, thereby enhancing clinical decision-making and patient management. In addition, the interpretable framework can increase the transparency of the model and help physicians understand the reliability of the predictive model. Our machine learning algorithms, especially the DNN, can predict, with high sensitivity and specificity, the impending occurrence of in-hospital severe clinical events in patients with coronary artery disease. Algorithm-generated predictive alerts modestly impacted clinical measures. The next steps include describing clinical perception of this tool and optimizing algorithm design and delivery.\u003c/p\u003e"},{"header":"Abbreviations","content":"\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25.7143%;\"\u003e\n \u003cp\u003eCAD\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74.2857%;\"\u003e\n \u003cp\u003eCoronary artery disease\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25.7143%;\"\u003e\n \u003cp\u003eEMRs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74.2857%;\"\u003e\n \u003cp\u003eelectronic medical records\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25.7143%;\"\u003e\n \u003cp\u003eNLP\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74.2857%;\"\u003e\n \u003cp\u003enatural language processing\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 
25.7143%;\"\u003e\n \u003cp\u003eML\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74.2857%;\"\u003e\n \u003cp\u003emachine learning\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25.7143%;\"\u003e\n \u003cp\u003eLR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74.2857%;\"\u003e\n \u003cp\u003eLogistic Regression\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25.7143%;\"\u003e\n \u003cp\u003eRF\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74.2857%;\"\u003e\n \u003cp\u003eRandom Forest\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25.7143%;\"\u003e\n \u003cp\u003eXGBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74.2857%;\"\u003e\n \u003cp\u003eExtreme Gradient Boosting\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25.7143%;\"\u003e\n \u003cp\u003eGNB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74.2857%;\"\u003e\n \u003cp\u003eGaussian Naive Bayes\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25.7143%;\"\u003e\n \u003cp\u003eDNN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74.2857%;\"\u003e\n \u003cp\u003eDeep Neural Network\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25.7143%;\"\u003e\n \u003cp\u003eAUC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74.2857%;\"\u003e\n \u003cp\u003earea under the curve\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25.7143%;\"\u003e\n \u003cp\u003eSHAP\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74.2857%;\"\u003e\n \u003cp\u003eSHapley Additive exPlanations\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25.7143%;\"\u003e\n \u003cp\u003eACS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
valign=\"top\" style=\"width: 74.2857%;\"\u003e\n \u003cp\u003eacute coronary syndromes\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25.7143%;\"\u003e\n \u003cp\u003eCCS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74.2857%;\"\u003e\n \u003cp\u003echronic coronary syndromes\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25.7143%;\"\u003e\n \u003cp\u003eDALYs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74.2857%;\"\u003e\n \u003cp\u003eDisability-Adjusted Life Years\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25.7143%;\"\u003e\n \u003cp\u003eWHO\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74.2857%;\"\u003e\n \u003cp\u003eWorld Health Organization\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25.7143%;\"\u003e\n \u003cp\u003eIQR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74.2857%;\"\u003e\n \u003cp\u003einterquartile range\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25.7143%;\"\u003e\n \u003cp\u003eCIs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74.2857%;\"\u003e\n \u003cp\u003eConfidence intervals\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25.7143%;\"\u003e\n \u003cp\u003ePPV\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74.2857%;\"\u003e\n \u003cp\u003ePositive Predictive Value\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25.7143%;\"\u003e\n \u003cp\u003eNPV\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74.2857%;\"\u003e\n 
\u003cp\u003eNegative Predictive Value\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25.7143%;\"\u003e\n \u003cp\u003eMCC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74.2857%;\"\u003e\n \u003cp\u003eMatthews Correlation Coefficient\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e"},{"header":"Declarations","content":"\u003cp\u003eData Availability Statement\u003c/p\u003e\n\u003cp\u003eThe original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.\u003c/p\u003e\n\u003cp\u003eEthics statement\u003c/p\u003e\n\u003cp\u003eThis research was reviewed and approved by the institutional review board of Xiyuan Hospital of China Academy of Chinese Medical Sciences (registration number 2024XLA222-2). This study was conducted in accordance with the Declaration of Helsinki. Informed consent was waived by the review board.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAuthor contributions\u003c/p\u003e\n\u003cp\u003eH.L., M.L., and Z.Q. contributed to the study design. X.G., F.C., C.L., M.L., J.H., and J.Z. participated in the collection, analysis, or interpretation of data. J.Z., D.Z., L.L., and H.X. performed the statistical analysis. F.X., Y.F., and J.X. critically revised the article. 
All authors have read and approved the manuscript.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eFunding\u003c/p\u003e\n\u003cp\u003eThis work was supported by the program of Beijing Traditional Chinese Medicine Science and Technology Development Funding Program (No.BJZYZD-2023-04) and Beijing Natural Science Foundation-Haidian Original Innovation Joint Fund Key Research Project (L242058).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eConflict of interest\u003c/p\u003e\n\u003cp\u003eThe authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors would like to thank the Cardiovascular Disease Program of the China Center for Evidence-Based Medicine in Traditional Chinese Medicine for providing electronic medical record data utilized in this study. We also extend our gratitude to Beijing Yikang Medical Technology Co., Ltd., for their support with natural language processing, statistical analysis, and machine learning modeling.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSupplementary Material\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSupplementary material associated with this article can be found, in the online version, at doi: \u0026nbsp; . \u003cstrong\u003eSupplemental Table S1 - S6\u003c/strong\u003e.\u0026nbsp;\u003cstrong\u003eSupplemental Figure S1- S2\u003c/strong\u003e.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eKnuuti J, Wijns W, Saraste A, Capodanno D, Barbato E, Funck-Brentano C, et al. 
2019 ESC Guidelines for the diagnosis and management of chronic coronary syndromes. Eur Heart J. 2020;41:407\u0026ndash;77. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/eurheartj/ehz425\u003c/span\u003e\u003cspan address=\"10.1093/eurheartj/ehz425\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLawton JS, Tamis-Holland JE, Bangalore S, Bates ER, Beckie TM, Bischoff JM, et al. 2021 ACC/AHA/SCAI Guideline for Coronary Artery Revascularization: Executive Summary: A Report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. Circulation. 2022;145:e4\u0026ndash;17. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1161/CIR.0000000000001039\u003c/span\u003e\u003cspan address=\"10.1161/CIR.0000000000001039\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMensah GA, Fuster V, Murray CJL, Roth GA. Global Burden of Cardiovascular Diseases and Risks Collaborators. Global Burden of Cardiovascular Diseases and Risks, 1990\u0026ndash;2022. J Am Coll Cardiol. 2023;82:2350\u0026ndash;473. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.jacc.2023.11.007\u003c/span\u003e\u003cspan address=\"10.1016/j.jacc.2023.11.007\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThe top 10 causes of death. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death\u003c/span\u003e\u003cspan address=\"https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 30 Aug 2024.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSong L, Li Y, Nie S, Feng Z, Liu Y, Ding F, et al. Using machine learning to predict adverse events in acute coronary syndrome: A retrospective study. Clin Cardiol. 2023;46:1594\u0026ndash;602. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/clc.24127\u003c/span\u003e\u003cspan address=\"10.1002/clc.24127\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBs BOP. Coronary Risk Estimation Based on Clinical Data in Electronic Health Records. 2022;79.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eForrest IS, Petrazzini BO, Duffy \u0026Aacute;, Park JK, Marquez-Luna C, Jordan DM, et al. Machine learning-based marker for coronary artery disease: derivation and validation in two longitudinal cohorts. Lancet. 2023;401:215\u0026ndash;25. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/S0140-6736(22)02079-7\u003c/span\u003e\u003cspan address=\"10.1016/S0140-6736(22)02079-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYan MY, Gustad LT, Nytr\u0026oslash; \u0026Oslash;. Sepsis prediction, early detection, and identification using clinical text for machine learning: a systematic review. J Am Med Inf Assoc. 2021;29:559\u0026ndash;75. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/jamia/ocab236\u003c/span\u003e\u003cspan address=\"10.1093/jamia/ocab236\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKreimeyer K, Foster M, Pandey A, Arya N, Halford G, Jones SF, et al. Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review. J Biomed Inform. 2017;73:14. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.jbi.2017.07.012\u003c/span\u003e\u003cspan address=\"10.1016/j.jbi.2017.07.012\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSung S-F, Chen C-H, Pan R-C, Hu Y-H, Jeng J-S. Natural Language Processing Enhances Prediction of Functional Outcome After Acute Ischemic Stroke. J Am Heart Assoc. 2021;10:e023486. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1161/JAHA.121.023486\u003c/span\u003e\u003cspan address=\"10.1161/JAHA.121.023486\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWeissman GE, Hubbard RA, Ungar LH, Harhay MO, Greene CS, Himes BE, et al. Inclusion of Unstructured Clinical Text Improves Early Prediction of Death or Prolonged ICU Stay. Crit Care Med. 2018;46:1125\u0026ndash;32. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1097/CCM.0000000000003148\u003c/span\u003e\u003cspan address=\"10.1097/CCM.0000000000003148\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu F, Weng C, Yu H. 
Natural Language Processing, Electronic Health Records, and Clinical Research. In: Richesson RL, Andrews JE, editors. Clinical Research Informatics. London: Springer; 2012. pp. 293\u0026ndash;310. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/978-1-84882-448-5_16\u003c/span\u003e\u003cspan address=\"10.1007/978-1-84882-448-5_16\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFu S, Leung LY, Wang Y, Raulli A-O, Kallmes DF, Kinsman KA, et al. Natural Language Processing for the Identification of Silent Brain Infarcts From Neuroimaging Reports. JMIR Med Inf. 2019;7. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2196/12109\u003c/span\u003e\u003cspan address=\"10.2196/12109\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHarkema H, Chapman WW, Saul M, Dellon ES, Schoen RE, Mehrotra A. Focus on clinical and translational research: Developing a natural language processing application for measuring the quality of colonoscopy procedures. J Am Med Inf Association: JAMIA. 2011;18(Suppl 1):i150. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1136/amiajnl-2011-000431\u003c/span\u003e\u003cspan address=\"10.1136/amiajnl-2011-000431\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDemner-Fushman D, Chapman WW, McDonald CJ. What can Natural Language Processing do for Clinical Decision Support? J Biomed Inform. 2009;42:760. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.jbi.2009.08.007\u003c/span\u003e\u003cspan address=\"10.1016/j.jbi.2009.08.007\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVaid A, Chan L, Chaudhary K, Jaladanki SK, Paranjpe I, Russak A, et al. Predictive Approaches for Acute Dialysis Requirement and Death in COVID-19. Clin J Am Soc Nephrol. 2021;16:1158\u0026ndash;68. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2215/CJN.17311120\u003c/span\u003e\u003cspan address=\"10.2215/CJN.17311120\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHarel O, Mitchell EM, Perkins NJ, Cole SR, Tchetgen Tchetgen EJ, Sun B, et al. Multiple Imputation for Incomplete Data in Epidemiologic Studies. Am J Epidemiol. 2018;187:576\u0026ndash;84. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/aje/kwx349\u003c/span\u003e\u003cspan address=\"10.1093/aje/kwx349\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChang C, Deng Y, Jiang X, Long Q. Multiple imputation for analysis of incomplete data in distributed health data networks. Nat Commun. 2020;11:5467. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41467-020-19270-2\u003c/span\u003e\u003cspan address=\"10.1038/s41467-020-19270-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003evan Ginkel JR, Linting M, Rippe RCA, van der Voort A. Rebutting Existing Misconceptions About Multiple Imputation as a Method for Handling Missing Data. J Pers Assess. 
2020;102:297\u0026ndash;308. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1080/00223891.2018.1530680\u003c/span\u003e\u003cspan address=\"10.1080/00223891.2018.1530680\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStekhoven DJ, B\u0026uuml;hlmann P. MissForest\u0026ndash;non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012;28:112\u0026ndash;8. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/bioinformatics/btr597\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/btr597\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLundberg SM, Lee S-I. A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc.; 2017. pp. 4768\u0026ndash;77.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMeSH-NCBI. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.ncbi.nlm.nih.gov/mesh\u003c/span\u003e\u003cspan address=\"https://www.ncbi.nlm.nih.gov/mesh\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 28 Oct 2024.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKe J, Chen Y, Wang X, Wu Z, Zhang Q, Lian Y, et al. Machine learning-based in-hospital mortality prediction models for patients with acute coronary syndrome. Am J Emerg Med. 2022;53:127\u0026ndash;34. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ajem.2021.12.070\u003c/span\u003e\u003cspan address=\"10.1016/j.ajem.2021.12.070\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKhera R, Haimovich J, Hurley NC, McNamara R, Spertus JA, Desai N, et al. Use of Machine Learning Models to Predict Death After Acute Myocardial Infarction. JAMA Cardiol. 2021;6:633\u0026ndash;41. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1001/jamacardio.2021.0122\u003c/span\u003e\u003cspan address=\"10.1001/jamacardio.2021.0122\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOliveira M, Seringa J, Pinto FJ, Henriques R, Magalh\u0026atilde;es T. Machine learning prediction of mortality in Acute Myocardial Infarction. BMC Med Inf Decis Mak. 2023;23:70. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s12911-023-02168-6\u003c/span\u003e\u003cspan address=\"10.1186/s12911-023-02168-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhu X, Xie B, Chen Y, Zeng H, Hu J. Machine learning in the prediction of in-hospital mortality in patients with first acute myocardial infarction. Clin Chim Acta. 2024;554:117776. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.cca.2024.117776\u003c/span\u003e\u003cspan address=\"10.1016/j.cca.2024.117776\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhao J, Zhao P, Li C, Hou Y. 
Optimized Machine Learning Models to Predict In-Hospital Mortality for Patients with ST-Segment Elevation Myocardial Infarction. Ther Clin Risk Manag. 2021;17:951\u0026ndash;61. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2147/TCRM.S321799\u003c/span\u003e\u003cspan address=\"10.2147/TCRM.S321799\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSong L, Li Y, Nie S, Feng Z, Liu Y, Ding F, et al. Using machine learning to predict adverse events in acute coronary syndrome: A retrospective study. Clin Cardiol. 2023;46:1594\u0026ndash;602. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/clc.24127\u003c/span\u003e\u003cspan address=\"10.1002/clc.24127\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen S, Pan X, Mo J, Wang B. Establishment and validation of a prediction nomogram for heart failure risk in patients with acute myocardial infarction during hospitalization. BMC Cardiovasc Disord. 2023;23:619. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s12872-023-03665-2\u003c/span\u003e\u003cspan address=\"10.1186/s12872-023-03665-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhao X, Wang J, Yang J, Chen T, Song Y, Li X, et al. Machine learning for prediction of bleeding in acute myocardial infarction patients after percutaneous coronary intervention. Ther Adv Chronic Dis. 2023;14:20406223231158561. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1177/20406223231158561\u003c/span\u003e\u003cspan address=\"10.1177/20406223231158561\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAl\u0026rsquo;Aref SJ, Singh G, van Rosendael AR, Kolli KK, Ma X, Maliakal G, et al. Determinants of In-Hospital Mortality After Percutaneous Coronary Intervention: A Machine Learning Approach. J Am Heart Assoc. 2019;8:e011160. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1161/JAHA.118.011160\u003c/span\u003e\u003cspan address=\"10.1161/JAHA.118.011160\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNiimi N, Shiraishi Y, Sawano M, Ikemura N, Inohara T, Ueda I, et al. Machine learning models for prediction of adverse events after percutaneous coronary intervention. Sci Rep-uk. 2022;12:6262. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41598-022-10346-1\u003c/span\u003e\u003cspan address=\"10.1038/s41598-022-10346-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYe Z, An S, Gao Y, Xie E, Zhao X, Guo Z, et al. The prediction of in-hospital mortality in chronic kidney disease patients with coronary artery disease using machine learning models. Eur J Med Res. 2023;28:33. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s40001-023-00995-x\u003c/span\u003e\u003cspan address=\"10.1186/s40001-023-00995-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eErmak AD, Gavrilov DV, Novitskiy RE, Gusev AV, Andreychenko AE. 
Development, evaluation and validation of machine learning models to predict hospitalizations of patients with coronary artery disease within the next 12 months. Int J Med Inf. 2024;188:105476. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ijmedinf.2024.105476\u003c/span\u003e\u003cspan address=\"10.1016/j.ijmedinf.2024.105476\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang Y, Zhu X, Gao F, Yang S. Systematic Review and Critical Appraisal of Prediction Models for Readmission in Coronary Artery Disease Patients: Assessing Current Efficacy and Future Directions. Risk Manag Healthc P. 2024;17:549\u0026ndash;57. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2147/RMHP.S451436\u003c/span\u003e\u003cspan address=\"10.2147/RMHP.S451436\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLee HC, Park JS, Choe JC, Ahn JH, Lee HW, Oh J-H, et al. Prediction of 1-Year Mortality from Acute Myocardial Infarction Using Machine Learning. Am J Cardiol. 2020;133:23\u0026ndash;31. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.amjcard.2020.07.048\u003c/span\u003e\u003cspan address=\"10.1016/j.amjcard.2020.07.048\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGeorgiopoulos G, Kraler S, Mueller-Hennessen M, Delialis D, Mavraganis G, Sopova K, et al. Modification of the GRACE Risk Score for Risk Prediction in Patients With Acute Coronary Syndromes. JAMA Cardiol. 2023;8:946\u0026ndash;56. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1001/jamacardio.2023.2741\u003c/span\u003e\u003cspan address=\"10.1001/jamacardio.2023.2741\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHossain E, Rana R, Higgins N, Soar J, Barua PD, Pisani AR, et al. Natural Language Processing in Electronic Health Records in relation to healthcare decision-making: A systematic review. Comput Biol Med. 2023;155:106649. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.compbiomed.2023.106649\u003c/span\u003e\u003cspan address=\"10.1016/j.compbiomed.2023.106649\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHu D, Li S, Zhang H, Wu N, Lu X. Using Natural Language Processing and Machine Learning to Preoperatively Predict Lymph Node Metastasis for Non-Small Cell Lung Cancer With Electronic Medical Records: Development and Validation Study. JMIR Med Inf. 2022;10:e35475. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2196/35475\u003c/span\u003e\u003cspan address=\"10.2196/35475\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUlrich CM, Grady C, Demiris G, Richmond TS. The Competing Demands of Patient Privacy and Clinical Research. Ethics Hum Res. 2021;43:25\u0026ndash;31. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/eahr.500076\u003c/span\u003e\u003cspan address=\"10.1002/eahr.500076\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChoudhary A, Choudhary A, Suman S. 
NLP Applications for Big Data Analytics Within Healthcare. In: Mishra S, Tripathy HK, Mallick P, Shaalan K, editors. Augmented Intelligence in Healthcare: A Pragmatic and Integrated Analysis. Singapore: Springer Nature; 2022. pp. 237\u0026ndash;57. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/978-981-19-1076-0_13\u003c/span\u003e\u003cspan address=\"10.1007/978-981-19-1076-0_13\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, et al. Scalable and accurate deep learning with electronic health records. NPJ Digit Med. 2018;1:18. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41746-018-0029-1\u003c/span\u003e\u003cspan address=\"10.1038/s41746-018-0029-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRajkomar A, Dean J, Kohane I. Machine Learning in Medicine. N Engl J Med. 2019;380:1347\u0026ndash;58. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1056/NEJMra1814259\u003c/span\u003e\u003cspan address=\"10.1056/NEJMra1814259\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi L, Cheng W-Y, Glicksberg BS, Gottesman O, Tamler R, Chen R, et al. Identification of type 2 diabetes subgroups through topological analysis of patient similarity. Sci Transl Med. 2015;7:311ra174. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1126/scitranslmed.aaa9364\u003c/span\u003e\u003cspan address=\"10.1126/scitranslmed.aaa9364\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eObermeyer Z, Lee TH. Lost in Thought - The Limits of the Human Mind and the Future of Medicine. N Engl J Med. 2017;377:1209\u0026ndash;11. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1056/NEJMp1705348\u003c/span\u003e\u003cspan address=\"10.1056/NEJMp1705348\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhuang X-D, Tian T, Liao L-Z, Dong Y-H, Zhou H-J, Zhang S-Z, et al. Deep Phenotyping and Prediction of Long-term Cardiovascular Disease: Optimized by Machine Learning. Can J Cardiol. 2022;38:774\u0026ndash;82. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.cjca.2022.02.008\u003c/span\u003e\u003cspan address=\"10.1016/j.cjca.2022.02.008\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWiens J, Shenoy ES. Machine Learning for Healthcare: On the Verge of a Major Shift in Healthcare Epidemiology. Clin Infect Dis. 2018;66:149\u0026ndash;53. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/cid/cix731\u003c/span\u003e\u003cspan address=\"10.1093/cid/cix731\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMotwani M, Dey D, Berman DS, Germano G, Achenbach S, Al-Mallah MH, et al. 
Machine learning for prediction of all-cause mortality in patients with suspected coronary artery disease: a 5-year multicentre prospective registry analysis. Eur Heart J. 2017;38:500\u0026ndash;7. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/eurheartj/ehw188\u003c/span\u003e\u003cspan address=\"10.1093/eurheartj/ehw188\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRibelles N, Jerez JM, Rodriguez-Brazzarola P, Jimenez B, Diaz-Redondo T, Mesa H, et al. Machine learning and natural language processing (NLP) approach to predict early progression to first-line treatment in real-world hormone receptor-positive (HR+)/HER2-negative advanced breast cancer patients. Eur J Cancer. 2021;144:224\u0026ndash;31. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ejca.2020.11.030\u003c/span\u003e\u003cspan address=\"10.1016/j.ejca.2020.11.030\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLing AY, Kurian AW, Caswell-Jin JL, Sledge GW, Shah NH, Tamang SR. Using natural language processing to construct a metastatic breast cancer cohort from linked cancer registry and electronic medical records data. JAMIA Open. 2019;2:528\u0026ndash;37. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/jamiaopen/ooz040\u003c/span\u003e\u003cspan address=\"10.1093/jamiaopen/ooz040\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCarrell DS, Halgrim S, Tran D-T, Buist DSM, Chubak J, Chapman WW, et al. Using natural language processing to improve efficiency of manual chart abstraction in research: the case of breast cancer recurrence. Am J Epidemiol. 
2014;179:749\u0026ndash;58. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/aje/kwt441\u003c/span\u003e\u003cspan address=\"10.1093/aje/kwt441\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSteele AJ, Denaxas SC, Shah AD, Hemingway H, Luscombe NM. Machine learning models in electronic health records can outperform conventional survival models for predicting patient mortality in coronary artery disease. PLoS ONE. 2018;13:e0202344. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1371/journal.pone.0202344\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0202344\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"bmc-medical-informatics-and-decision-making","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"midm","sideBox":"Learn more about [BMC Medical Informatics and Decision Making](http://bmcmedinformdecismak.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/midm/default.aspx","title":"BMC Medical Informatics and Decision Making","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Coronary artery disease, Machine learning, Risk prediction models, Electronic medical records, SHAP","lastPublishedDoi":"10.21203/rs.3.rs-8368403/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8368403/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground and purpose:\u003c/h2\u003e \u003cp\u003eCoronary artery disease (CAD) represents the leading cause of mortality on a global scale, with severe clinical events such as resuscitation or death occurring frequently during the course of hospitalisation. The utility of existing predictive models may be constrained by their incomplete utilisation of the depth of electronic medical records (EMRs), which could limit their effectiveness and scope. This study aims to develop and validate interpretable risk prediction models to predict severe clinical events in hospitalized patients with coronary artery disease, enhancing clinical decision-making and patient management.\u003c/p\u003e\u003ch2\u003eMethods:\u003c/h2\u003e \u003cp\u003eWe conducted a retrospective study using EMRs from CAD patients admitted to Xiyuan Hospital between 2016 and 2024. The dataset includes structured and unstructured data extracted via natural language processing (NLP) from EMRs. 
We developed five machine learning (ML) models: Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Gaussian Naive Bayes (GNB), and Deep Neural Network (DNN). Discrimination was evaluated by the area under the curve (AUC), sensitivity, specificity, and F1 score. SHapley Additive exPlanations (SHAP) were used to interpret model predictions.\u003c/p\u003e\u003ch2\u003eResults:\u003c/h2\u003e \u003cp\u003eOf the 6,971 patients included, 268 (3.84%) experienced severe clinical events during hospitalization. The DNN model demonstrated the best performance, with an AUC of 0.995 (95% CI: 0.985\u0026ndash;0.999). The SHAP analysis demonstrated that the most significant predictors were the admission principal diagnoses of acute CAD, followed by the presence of urinary occult blood and the mental state of the patient.\u003c/p\u003e\u003ch2\u003eConclusion:\u003c/h2\u003e \u003cp\u003eUsing NLP and ML models to integrate data from EMRs enables early warning of severe clinical events in hospitalized CAD patients. 
The interpretable prediction models developed in this study can assist clinicians in more accurately predicting severe clinical events, thereby enhancing clinical decision-making and patient management.\u003c/p\u003e","manuscriptTitle":"Machine learning models for predicting severe clinical events in hospitalized patients with coronary artery disease","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-01-12 10:27:21","doi":"10.21203/rs.3.rs-8368403/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2026-02-12T09:43:56+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-01-30T03:35:18+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"310455348532606442415292186113326555240","date":"2026-01-27T07:55:07+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-01-25T20:57:59+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"162874743609996738749277236125619682591","date":"2026-01-25T03:09:03+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"93923512662612776778004870274538464254","date":"2026-01-11T08:31:38+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"39169652883971789256863880093484269833","date":"2026-01-08T07:02:28+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-01-08T04:59:38+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-12-22T18:48:39+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-12-19T07:01:25+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-12-19T06:58:13+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC Medical Informatics and Decision 
Making","date":"2025-12-15T16:06:03+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"bmc-medical-informatics-and-decision-making","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"midm","sideBox":"Learn more about [BMC Medical Informatics and Decision Making](http://bmcmedinformdecismak.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/midm/default.aspx","title":"BMC Medical Informatics and Decision Making","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"92155ffe-b1b2-49d9-9a18-4c98508ebd89","owner":[],"postedDate":"January 12th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"in-revision","subjectAreas":[],"tags":[],"updatedAt":"2026-02-12T09:54:45+00:00","versionOfRecord":[],"versionCreatedAt":"2026-01-12 10:27:21","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8368403","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8368403","identity":"rs-8368403","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}