Early differentiation between paroxysmal and persistent atrial fibrillation based on interpretable machine learning: a multicenter retrospective study

doi:10.21203/rs.3.rs-7302454/v1


2025 · doi:10.21203/rs.3.rs-7302454/v1




Research Article

Sijin Li, Yuqi Zhang, Weijie Wu, Guang Li, Tucheng Huang, Kuan Zeng, and 3 more

This is a preprint (Version 1); it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-7302454/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. The journal version was published on 06 Feb, 2026 in BioData Mining.

Abstract

Aims
Atrial fibrillation (AF) is a common arrhythmia associated with increased risks of stroke and heart failure. Early differentiation between paroxysmal and persistent AF at first diagnosis is critical for guiding treatment decisions.
This study aimed to develop an interpretable machine learning model based on structured electronic health records (EHR) to distinguish AF subtypes and identify key contributing factors.

Methods and results
In this multicenter retrospective cohort study, data were collected from three tertiary hospitals in China between January 2013 and January 2023. A total of 11,986 patients with suspected AF were screened, of whom 4185 patients with first-diagnosed AF were included (paroxysmal: 2565 [61.3%]; persistent: 1620 [38.7%]). Structured EHR variables, including clinical demographics, serological indicators, and echocardiographic parameters, were extracted. Variable selection was performed using Spearman correlation and least absolute shrinkage and selection operator regression. Five machine learning algorithms were trained at one center and externally validated at the other two. The CatBoost model achieved the best performance. In the larger external validation cohort, it reached an area under the receiver operating characteristic curve of 0.876 (95% CI: 0.871–0.880) and an accuracy of 0.808 (95% CI: 0.803–0.816), with a sensitivity of 0.802 and a specificity of 0.811. Shapley additive explanations (SHAP) were used to interpret model outputs and identify the variables most associated with AF subtype classification.

Conclusion
This multicenter study demonstrates that interpretable machine learning models based on structured EHR data can accurately distinguish paroxysmal from persistent AF at first diagnosis. The proposed model may facilitate early subtype-specific risk stratification and personalized treatment. This could improve outcomes and reduce disparities in AF care across settings with different levels of medical resources.
Keywords: Paroxysmal atrial fibrillation; Persistent atrial fibrillation; Subtype classification; Machine learning; Multicenter retrospective study

Introduction
Atrial fibrillation (AF) is the most common sustained cardiac arrhythmia. It is associated with increased risks of stroke, death, heart failure, hospitalization, and cognitive decline 1–6. Early detection of AF opens a critical window for initiating appropriate therapies, modifying risk factors, and ensuring closer follow-up of high-risk individuals to prevent AF-related morbidity and mortality 7–9. According to the latest European Society of Cardiology (ESC) and American Heart Association (AHA) guidelines, most AF screening is opportunistic 10–12. Such screening can easily miss patients with paroxysmal or short-duration persistent AF 13, 14.

Current ESC/AHA guidelines emphasize that paroxysmal and persistent AF call for distinct treatment strategies and carry divergent long-term prognoses 10, 15, 16. Patients with persistent AF are more likely to experience adverse cardiovascular events, such as stroke and heart failure with preserved ejection fraction 11, 17–21. Notably, paroxysmal and persistent AF account for approximately 38.9% and 39.2%, respectively, of all outpatient AF cases in China 22. Early differentiation between the two subtypes therefore has strong clinical value.

Most existing studies distinguish AF subtypes by analyzing AF episodes in ECG signals. They do not integrate other sources of information, such as medical history, serological tests, and echocardiography 23, 24. Some researchers have noted this gap, but their work has been limited to small, single-center samples 25. Consequently, the differentiation of paroxysmal from persistent AF using routinely collected, multidimensional clinical data has received little attention.
Recent advances in artificial intelligence (AI) and machine learning (ML) make it possible to develop new tools for risk factor screening and more personalized treatment 26–28. In particular, ML models that leverage electronic health record (EHR) data are increasingly applied in cardiovascular medicine 29. As shown in Fig. 1A, we collected patient data from three tertiary hospitals in China. We then developed an interpretable ML model that distinguishes paroxysmal from persistent AF in patients with first-diagnosed AF. We also identified novel AF-related risk factors and implemented an online calculator to facilitate broad application of our findings in Chinese and wider Asian populations. By enabling timely and precise identification of AF subtypes, our approach aims to inform clinical decision-making, enhance preventive strategies, and pave the way for personalized care in AF management.

Methods

Datasets
This was a multicenter retrospective study. Data were obtained from the EHRs of three tertiary medical institutions in different regions of China:
- Sun Yat-sen Memorial Hospital of Sun Yat-sen University (SYSMH)
- Tungwah Hospital of Sun Yat-sen University (DH)
- Dongguan Songshan Lake Tungwah Hospital (SSH)

Patient enrolment and data collection
Patients were enrolled based on the following criteria:
(1) Hospitalization between January 2013 and January 2023.
(2) Documented AF rhythm before or during hospitalization, confirmed by surface electrocardiography, 24-hour Holter monitoring, or pacemaker memory recordings. All patients had a first-time diagnosis of AF, although the AF subtype at the time of initial diagnosis was not specified.
(3) AF was confirmed as the discharge diagnosis and classified as paroxysmal or persistent AF according to the International Classification of Diseases, 10th Revision (ICD-10). Paroxysmal AF was coded as I48.x02 and persistent AF as I48.x00x007. Paroxysmal AF was defined as self-terminating episodes lasting less than 7 days. Persistent AF was defined as episodes sustained for longer than 7 days and requiring medical intervention 10, 11.
(4) Complete and detailed medical history records were available.
Patients were excluded if they met any of the following conditions:
(1) History of rheumatic heart disease, congenital heart disease, primary valvular heart disease, cardiomyopathy, pericardial disease, cor pulmonale, malignancy, or recent major surgery.
(2) Systemic diseases potentially affecting cardiac structure or function, including acute myocardial infarction, thyroid dysfunction (hyperthyroidism or hypothyroidism), amyloidosis, pheochromocytoma, systemic lupus erythematosus, or severe infections.
(3) Severe hepatic dysfunction, defined as alanine aminotransferase (ALT) above three times the upper limit of normal together with an ALT/aspartate aminotransferase (AST) ratio greater than 1.
(4) Severe renal insufficiency, defined as an estimated glomerular filtration rate (eGFR) below 30 ml/min/1.73 m².
(5) Incomplete clinical data, defined as absence of more than half of the required clinical variables or missing transthoracic echocardiography results.
All procedures were performed in accordance with the ethical standards of the responsible committee on human experimentation in China and with the Declaration of Helsinki (1975). The study was reviewed and approved by the Ethics Committees of SYSMH, DH, and SSH (approval numbers SYSKY-2024-004-01, DHKY-2025-003-01, and SDHKY-2025-002-01, respectively).
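Exclusion criteria (3) to (5) are mechanical enough to express as a screening rule. The Python sketch below is an illustration only and not the authors' pipeline. The `Candidate` record and its field names are hypothetical. Collapsing criteria (1) and (2) into a single precomputed flag is also an assumption. The set of "required clinical variables" is whatever the caller places in `clinical`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Candidate:
    """One screened patient. All field names are hypothetical."""
    alt: float                        # alanine aminotransferase, U/L
    ast: float                        # aspartate aminotransferase, U/L
    alt_uln: float                    # laboratory upper limit of normal for ALT, U/L
    egfr: float                       # estimated GFR, ml/min/1.73 m^2
    has_tte: bool                     # transthoracic echocardiography available
    history_exclusion: bool = False   # exclusion criteria (1)-(2), assumed coded upstream
    clinical: Dict[str, Optional[float]] = field(default_factory=dict)  # None = missing


def is_eligible(p: Candidate) -> bool:
    """Return False if any of exclusion criteria (1)-(5) applies."""
    # (1)-(2): excluded cardiac or systemic conditions from the medical history
    if p.history_exclusion:
        return False
    # (3): ALT > 3 x ULN together with ALT/AST > 1 (written as ALT > AST, valid for AST > 0)
    if p.alt > 3 * p.alt_uln and p.alt > p.ast:
        return False
    # (4): severe renal insufficiency
    if p.egfr < 30:
        return False
    # (5): no TTE, or more than half of the required clinical variables missing
    n_missing = sum(value is None for value in p.clinical.values())
    if not p.has_tte or n_missing > len(p.clinical) / 2:
        return False
    return True
```

Criteria (1) and (2) depend on coded or free-text diagnoses. In practice they would be resolved from the medical history before this check runs.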
Selection of variables
We collected demographic data, medication use, serological indicators, and baseline echocardiographic data for all included subjects, totaling 50 variables. Spearman correlation analysis was first performed on all collected variables to identify those correlated with AF subtype, with P < 0.05 considered statistically significant. Based on our previous study 25, several additional variables considered to be associated with AF subtype were added before further screening. The candidate variables were then screened by recursive feature elimination (RFE) using a gradient boosting tree and a random forest. Combining correlation-based filtering with model-based elimination was intended to assess variable importance from complementary perspectives and to yield a compact variable set for model construction.

Machine Learning Algorithms
We conducted experiments using five widely adopted boosting algorithms, each with distinct strengths. Using several algorithms increased the diversity and robustness of the analysis.
- LightGBM (Light Gradient Boosting Machine) is a highly efficient gradient boosting framework. It is known for fast training and low memory usage, owing to its histogram-based approach to decision tree learning.
- AdaBoost (Adaptive Boosting) is one of the earliest boosting algorithms. It re-weights misclassified instances in each iteration so that subsequent learners focus on difficult cases.
- GradientBoost is a classical gradient boosting algorithm that minimizes a loss function by iteratively fitting decision trees.
- XGBoost (Extreme Gradient Boosting) is an efficient and scalable variant of gradient boosting, known for its speed and accuracy.
- CatBoost (Categorical Boosting) is a gradient boosting algorithm that handles categorical features natively, without explicit preprocessing.
Using several algorithms allowed us to compare their performance on our datasets and identify which performed best for this task. Performance was evaluated using five-fold cross-validation.

SHAP Interpretable Analysis for Machine Learning
SHAP (SHapley Additive exPlanations) 30 is a game-theoretic method that quantifies each feature's contribution to a model's prediction. By computing Shapley values, SHAP decomposes the model's output into additive contributions from each feature. This both aids interpretation and identifies the features with the greatest impact on predictions. SHAP is particularly useful in our study because it moves beyond the "black box" nature of ML models and reveals the factors driving their predictions.

Statistical Analysis
Normally distributed continuous variables were expressed as mean ± standard deviation (SD), and non-normally distributed continuous variables as median (interquartile range, IQR). Normality was assessed using the Shapiro–Wilk test, and continuous variables were compared using the Mann–Whitney U test. Categorical variables were expressed as counts and percentages and compared using the χ² test. Two-tailed P < 0.05 was considered statistically significant.
The ability of the ML models to discriminate between paroxysmal and persistent AF was evaluated using the following indicators: area under the receiver operating characteristic curve (AUC), sensitivity (SEN), specificity (SPE), accuracy (ACC), precision (PRE), recall, and F1 score. Restricted cubic spline (RCS) curves with 4 knots were used to examine nonlinear relationships between independent variables and outcomes 31, 32. SPSS Statistics version 26.0 and Python 3.7.6 were used for statistical analysis and graphics, and P < 0.05 was considered statistically significant.

Results

Variable selection results
All variables collected in this study are described in Supplementary Table 1. After screening, 10 variables were included, falling into three categories:
- Demographic data: systolic blood pressure (SBP).
- Echocardiographic parameters: left atrial diameter (LA) and left ventricular ejection fraction (LVEF). These were measured at baseline by routine transthoracic echocardiography (TTE) performed by a certified cardiologist and retrieved from the EHR.
- Serological parameters: white blood cell count (WBC), neutrophil count (NC), hemoglobin (Hb), N-terminal pro-brain natriuretic peptide (NT-proBNP), uric acid (UA), the ratio of low-density lipoprotein cholesterol to high-density lipoprotein cholesterol (LDL-C/HDL-C), and the atherogenic index of plasma (AIP).
AIP is the logarithmically transformed ratio of triglycerides (TG) to HDL-C, both in molar concentration (mmol/L): AIP = log10(TG/HDL-C) 33. All serological indicators were obtained from the first peripheral blood sample collected at baseline. The distributions of these 10 variables and their correlations with AF subtype in the three independent centers are shown in Fig. 2A1–A10 and B1–B3.
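The restricted cubic spline analysis described in the Statistical Analysis uses 4 knots. A minimal Python sketch of the corresponding basis in the truncated-power form follows. Two choices in it are assumptions, because the paper states neither. The first is placing the knots at the 5th, 35th, 65th and 95th percentiles, a conventional choice for four knots. The second is scaling the nonlinear terms by the squared knot range.

```python
from typing import List, Sequence


def quantile(sorted_x: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between order statistics (NumPy's default rule)."""
    pos = q * (len(sorted_x) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])


def default_knots(x: Sequence[float]) -> List[float]:
    """Four knots at the 5th, 35th, 65th and 95th percentiles of x (assumed placement)."""
    s = sorted(x)
    return [quantile(s, q) for q in (0.05, 0.35, 0.65, 0.95)]


def rcs_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Restricted cubic spline basis [x, s_1, ..., s_(k-2)] in truncated-power form.

    Each s_j is zero below the first knot, piecewise cubic between knots,
    and linear above the last knot.
    """
    t = list(knots)
    k = len(t)

    def cube_plus(u: float) -> float:
        return max(u, 0.0) ** 3  # truncated cubic (u)_+^3

    scale = (t[-1] - t[0]) ** 2  # assumed normalisation: keeps s_j in the units of x
    last_gap = t[k - 1] - t[k - 2]
    row = [x]
    for j in range(k - 2):
        s_j = (cube_plus(x - t[j])
               - cube_plus(x - t[k - 2]) * (t[k - 1] - t[j]) / last_gap
               + cube_plus(x - t[k - 1]) * (t[k - 2] - t[j]) / last_gap)
        row.append(s_j / scale)
    return row
```

With four knots, each continuous predictor (for example LA) expands into three columns: x, s1, and s2. Because the spline is linear beyond the outer knots, tails are stable. Testing whether the coefficients of s1 and s2 are jointly zero is one common way to assess nonlinearity.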
Baseline characteristics of participants
Initially, 11,986 patients were screened at DH, SYSMH, and SSH (6436, 4581, and 969 patients, respectively; Fig. 1B). After applying the inclusion and exclusion criteria, 4185 patients were enrolled: 2248 at DH, 1599 at SYSMH, and 338 at SSH. Of these, 2565 (61.29%) had paroxysmal AF and 1620 (38.71%) had persistent AF. The proportions of paroxysmal and persistent AF were roughly 60% and 40% in each center (1361 and 887 in DH, 1019 and 580 in SYSMH, 187 and 151 in SSH). The baseline characteristics of all variables included in the model are shown in Table 1.

Table 1 Baseline characteristics of participants
Variables | Total (n = 4185) | DH (n = 2248) | SYSMH (n = 1599) | SSH (n = 338) | P value
Age (years) | 68 (59, 77) | 69 (59, 78) | 67 (60, 75) | 69 (58, 78) | < 0.001*
Gender, n (%) | | | | | 0.797
  Male | 2468 (58.97) | 1320 (58.71) | 943 (58.97) | 205 (60.65) |
  Female | 1717 (41.03) | 928 (41.29) | 656 (41.03) | 133 (39.35) |
SBP (mmHg) | 131 (117, 147) | 135 (118, 150) | 127 (116, 142) | 132 (118, 147) | < 0.001*
WBC (10^9/L) | 7.08 (5.73, 9.09) | 7.43 (5.84, 9.74) | 6.72 (5.67, 8.24) | 7.17 (5.51, 9.19) | < 0.001*
NC (10^9/L) | 4.53 (3.36, 6.45) | 4.86 (3.35, 7.23) | 4.21 (3.35, 5.50) | 5.07 (3.17, 7.34) | < 0.001*
Hb (g/L) | 132 (118, 146) | 131 (115, 145) | 135 (123, 147) | 130 (117, 145) | < 0.001*
AIP | 0.032 (-0.156, 0.226) | 0.014 (-0.180, 0.217) | 0.055 (-0.128, 0.241) | 0.009 (-0.173, 0.213) | < 0.001*
LDL-C/HDL-C | 2.353 (1.733, 3.099) | 2.256 (1.637, 3.045) | 2.521 (1.944, 3.203) | 2.055 (1.464, 2.792) | < 0.001*
NT-proBNP (pg/ml) | 792 (209, 2337) | 1144 (312, 3053) | 470 (128, 1190) | 1223 (324, 3469) | < 0.001*
UA (µmol/L) | 384 (309, 468) | 381 (301, 473) | 391 (320, 468) | 364 (289, 446) | < 0.001*
LA (mm) | 37 (32, 43) | 36 (31, 42) | 38 (35, 43) | 40 (35, 45) | < 0.001*
LVEF (%) | 64 (57, 68) | 62 (55, 67) | 66 (61, 70) | 59 (50, 65) | < 0.001*
Diagnosis, n (%) | | | | | 0.008*
  0 | 2565 (61.29) | 1361 (60.54) | 1019 (63.73) | 187 (55.33) |
  1 | 1620 (38.71) | 887 (39.46) | 580 (36.27) | 151 (44.67) |
Values are presented as n (%) or median (interquartile range, IQR); *P < 0.05.
Abbreviations: DH, Donghua Hospital of Sun Yat-sen University; SYSMH, Sun Yat-sen Memorial Hospital of Sun Yat-sen University; SSH, Dongguan Songshan Lake Donghua Hospital; SBP, systolic blood pressure; WBC, white blood cell count; NC, neutrophil count; Hb, hemoglobin; AIP, atherogenic index of plasma; LDL-C, low-density lipoprotein cholesterol; HDL-C, high-density lipoprotein cholesterol; NT-proBNP, N-terminal pro-brain natriuretic peptide; UA, uric acid; LA, left atrial diameter; LVEF, left ventricular ejection fraction.
Diagnosis: 0 = paroxysmal AF, 1 = persistent AF.

Results of AF subtype prediction model
As described in the Methods, five ML algorithms were used to build the models. The DH dataset, which had the largest sample size, served as the primary dataset for five-fold cross-validation. As shown in Fig. 1C, in each fold the DH dataset was split into a training set and an internal validation set at a ratio of 8:2. The SYSMH and SSH datasets served as independent external validation sets. Repeating the split across folds reduces the influence of any single data partition and makes the results more reliable. The model output was the predicted AF subtype, which was compared with the subtype recorded in the EHR discharge diagnosis. The evaluation indicators showed good predictive performance and stable generalizability across centers (Table 2).
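All the indicators in Table 2 can be computed from true labels and predicted probabilities. The sketch below takes persistent AF (label 1) as the positive class, following the coding in Table 1. It applies a decision threshold of 0.5, which is an assumption because the operating threshold is not reported. AUC is computed through its rank interpretation: the probability that a randomly chosen persistent case scores higher than a randomly chosen paroxysmal case, with ties counted as one half.

```python
from typing import Dict, Sequence


def rank_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as P(score of a random persistent case > score of a random paroxysmal case); ties count 1/2."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binary_metrics(y_true: Sequence[int], y_score: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Table 2-style metrics with persistent AF (label 1) as the positive class.

    The 0.5 threshold is an assumed default; the paper does not report one.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    sen = tp / (tp + fn) if tp + fn else 0.0  # sensitivity, identical to recall
    spe = tn / (tn + fp) if tn + fp else 0.0
    pre = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * pre * sen / (pre + sen) if pre + sen else 0.0
    return {
        "ACC": (tp + tn) / len(pairs),
        "Precision": pre,
        "Recall": sen,
        "AUC": rank_auc(y_true, y_score),
        "F1": f1,
        "SEN": sen,
        "SPE": spe,
    }
```

Sensitivity and recall are the same quantity, TP / (TP + FN). This is why the Recall and SEN columns of Table 2 are identical. The pairwise AUC loop is quadratic in cohort size but adequate for cohorts of a few thousand patients. Library routines such as scikit-learn's `roc_auc_score` compute the same value more efficiently.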
Table 2 Model performance indicators in the three centers
Center | Model | ACC | Precision | Recall | AUC | F1 score | SEN | SPE
DH | CatBoost | 0.789 (0.766–0.825) | 0.755 (0.700–0.812) | 0.694 (0.656–0.727) | 0.861 (0.835–0.898) | 0.722 (0.701–0.764) | 0.694 (0.656–0.727) | 0.852 (0.798–0.892)
DH | GradientBoost | 0.791 (0.770–0.823) | 0.753 (0.727–0.803) | 0.702 (0.648–0.736) | 0.859 (0.838–0.894) | 0.726 (0.698–0.765) | 0.702 (0.648–0.736) | 0.849 (0.821–0.885)
DH | LightGBM | 0.780 (0.766–0.815) | 0.738 (0.702–0.793) | 0.688 (0.645–0.721) | 0.855 (0.835–0.889) | 0.711 (0.689–0.754) | 0.688 (0.645–0.721) | 0.840 (0.809–0.879)
DH | XGBoost | 0.774 (0.756–0.896) | 0.727 (0.679–0.754) | 0.688 (0.644–0.720) | 0.846 (0.827–0.875) | 0.706 (0.683–0.733) | 0.688 (0.644–0.720) | 0.831 (0.797–0.852)
DH | AdaBoost | 0.783 (0.768–0.803) | 0.740 (0.716–0.769) | 0.697 (0.661–0.726) | 0.845 (0.826–0.875) | 0.717 (0.695–0.739) | 0.697 (0.661–0.726) | 0.840 (0.811–0.863)
SSH | CatBoost | 0.748 (0.735–0.762) | 0.681 (0.669–0.690) | 0.834 (0.817–0.867) | 0.833 (0.828–0.840) | 0.750 (0.736–0.767) | 0.834 (0.817–0.867) | 0.677 (0.665–0.696)
SSH | GradientBoost | 0.734 (0.725–0.744) | 0.670 (0.663–0.677) | 0.813 (0.792–0.837) | 0.825 (0.820–0.831) | 0.734 (0.725–0.748) | 0.813 (0.792–0.837) | 0.668 (0.650–0.685)
SSH | LightGBM | 0.737 (0.731–0.755) | 0.674 (0.662–0.690) | 0.812 (0.792–0.836) | 0.821 (0.814–0.830) | 0.737 (0.728–0.755) | 0.812 (0.792–0.836) | 0.676 (0.651–0.691)
SSH | XGBoost | 0.728 (0.708–0.746) | 0.666 (0.649–0.681) | 0.804 (0.773–0.829) | 0.821 (0.806–0.830) | 0.728 (0.706–0.747) | 0.804 (0.773–0.829) | 0.666 (0.644–0.681)
SSH | AdaBoost | 0.741 (0.725–0.764) | 0.677 (0.657–0.699) | 0.817 (0.786–0.842) | 0.819 (0.808–0.830) | 0.740 (0.722–0.764) | 0.817 (0.786–0.842) | 0.678 (0.642–0.701)
SYSMH | CatBoost | 0.808 (0.803–0.816) | 0.707 (0.698–0.719) | 0.802 (0.788–0.812) | 0.876 (0.871–0.880) | 0.752 (0.747–0.761) | 0.802 (0.788–0.812) | 0.811 (0.802–0.820)
SYSMH | GradientBoost | 0.802 (0.794–0.810) | 0.702 (0.688–0.708) | 0.790 (0.769–0.809) | 0.872 (0.869–0.873) | 0.743 (0.733–0.755) | 0.790 (0.769–0.809) | 0.809 (0.795–0.814)
SYSMH | LightGBM | 0.807 (0.801–0.811) | 0.706 (0.695–0.721) | 0.801 (0.779–0.820) | 0.875 (0.873–0.877) | 0.750 (0.744–0.755) | 0.801 (0.779–0.820) | 0.810 (0.797–0.828)
SYSMH | XGBoost | 0.800 (0.795–0.807) | 0.697 (0.685–0.704) | 0.793 (0.772–0.810) | 0.873 (0.869–0.879) | 0.742 (0.734–0.751) | 0.793 (0.772–0.810) | 0.804 (0.789–0.811)
SYSMH | AdaBoost | 0.787 (0.775–0.795) | 0.678 (0.666–0.686) | 0.787 (0.762–0.803) | 0.860 (0.853–0.866) | 0.728 (0.711–0.740) | 0.787 (0.762–0.803) | 0.787 (0.782–0.791)
Values are point estimates with 95% CIs in parentheses.
Abbreviations: DH, Donghua Hospital of Sun Yat-sen University (internal validation); SYSMH, Sun Yat-sen Memorial Hospital of Sun Yat-sen University (external validation); SSH, Dongguan Songshan Lake Donghua Hospital (external validation); ACC, accuracy; AUC, area under the curve; CI, confidence interval; SEN, sensitivity; SPE, specificity.

Internal validation on the DH dataset:
- Accuracy: GradientBoost was highest (0.791, 95% CI: 0.770–0.823), followed by CatBoost (0.789, 95% CI: 0.766–0.825).
- AUC: CatBoost was highest (0.861, 95% CI: 0.835–0.898), slightly above GradientBoost (0.859, 95% CI: 0.838–0.894).
- Sensitivity and specificity: the highest values were achieved by GradientBoost (0.702, 95% CI: 0.648–0.736) and CatBoost (0.852, 95% CI: 0.798–0.892), respectively.

External validation on the SSH and SYSMH datasets: CatBoost performed best on all indicators except specificity in SSH, where AdaBoost was highest (0.678, 95% CI: 0.642–0.701). In SYSMH, CatBoost achieved the highest accuracy (0.808, 95% CI: 0.803–0.816), AUC (0.876, 95% CI: 0.871–0.880), sensitivity (0.802, 95% CI: 0.788–0.812), and specificity (0.811, 95% CI: 0.802–0.820).

The AUCs of the five algorithms in each center and in each fold of the five-fold cross-validation are summarized in Fig. 2C1–C3. The per-fold AUC results for all centers are plotted in Supplementary Fig. 1.

Interpretation of AF subtype prediction model
Fig. 3A ranks the variables in descending order of mean absolute SHAP value, reflecting their impact on the model output. The five variables that most strongly affect AF subtype classification are LA, NT-proBNP, Hb, LVEF, and UA.

Figure 3B gives a more intuitive view of the relationship between these variables and AF subtype, and LA, LVEF, and UA show consistent patterns. For example, LA displays a gradient from blue to red with a distinct color boundary near a SHAP value of 0. Lower LA values push the model toward paroxysmal AF, whereas higher LA values push it toward persistent AF.

The remaining panels give complementary views:
- Fig. 3C shows the impact of all variables on the classification of every sample. Red indicates a positive contribution (toward persistent AF) and blue a negative contribution (toward paroxysmal AF).
- Fig. 3D illustrates the contribution of each variable to the prediction for a single sample.
- Fig. 3E shows how the model output changes as each of the five most influential variables varies. Compared with Fig. 2B, this view shows each variable's trend more clearly; for example, as LA increases, its contribution shifts the prediction toward persistent AF.

In the model for early differentiation of AF subtypes, the top five variables were LA, NT-proBNP, LVEF, UA, and SBP. Pairwise interactions among these variables are shown as scatter plots in Supplementary Fig. 2. As shown in Supplementary Fig. 3, we also examined the relationship between these five variables and AF subtype in the three independent centers using restricted cubic spline analysis.

Performance and explanation of AF subtype prediction models in subgroups
Participants in each center were divided into six subgroups: men younger than 60 years, women younger than 60 years, men aged 60–64 years, women aged 60–64 years, men aged 65 years and older, and women aged 65 years and older. The model achieved good predictive performance across subgroups.

AUCs for men and women in each age subgroup across centers are shown in Supplementary Fig. 4. Panels A–F are ordered by age, and centers 1–3 correspond to DH, SSH, and SYSMH, respectively. These results and the values of the other evaluation indicators are given in Supplementary Tables 2–4. We also used SHAP to visualize model interpretability in each subgroup; the SHAP plots for all subgroups are summarized in Supplementary Fig. 5. In every age subgroup, the two most influential factors were LA and NT-proBNP, consistent with the overall model.

Discussion
This study integrated EHR data (clinical history, serum indicators, and echocardiography) from three independent large tertiary hospitals in China. Using these data, we established an interpretable ML model that distinguishes paroxysmal from persistent AF early. We identified new AF subtype-related factors, which provide a basis for further mechanistic research. They also underpin an online web calculator (http://123.56.120.106:8000/), which estimates whether a patient with first-diagnosed AF will have paroxysmal or persistent AF in the long term and gives the corresponding probability.
In addition, we evaluated the variables affecting classification in different age and gender subgroups. This confirmed their importance and specificity for early differentiation between paroxysmal and persistent AF.

AF is a common cause of stroke, heart failure, cardiovascular death, and dementia 8, 34–36. The global burden of AF is rising 37, and patients in the Asia-Pacific region account for the majority of AF patients worldwide. Within that region, China ranks fifth in estimated prevalence and first in absolute prevalence 38. China and the wider Asia-Pacific region face challenges including large disparities in access to healthcare and in the availability of diagnostic technology 38, 39. The model and online calculator developed in this study may help narrow disparities in healthcare access between regions. They may also improve the long-term prognosis of AF patients in settings where diagnostic technology is insufficient.

We identified 10 variables that distinguish paroxysmal from persistent AF early, partially consistent with our previous findings 25. Among the key variables identified by SHAP, LA was the most important. Previous studies have found that LA is among the most important indicators for predicting new-onset AF and for early differentiation of AF subtypes 21, 40, 41. LA enlargement is also associated with progression from paroxysmal to persistent AF 42. Our results support this view in the overall population and in different subgroups. Recent studies suggest that the underlying mechanisms may be atrial cardiomyopathy and atrial fibrosis 43, 44, with left atrial diameter being a clinical manifestation of these processes.

Another key variable is UA: our results suggest that patients with higher UA levels are more likely to be diagnosed with persistent AF.
This finding is consistent with a recently published review 45. UA levels differ significantly between paroxysmal and persistent AF 46, and a dose-response relationship has been observed in populations with different disease backgrounds 47. High UA may increase the risk of AF indirectly through cardiovascular disease. It may also act directly through oxidative stress, inflammation, insulin resistance, and activation of the renin-angiotensin-aldosterone system. These pathways ultimately lead to electrical remodeling, changes in the autonomic nervous system, abnormal Ca2+ handling, and atrial remodeling 48–50. The number of people with high UA is increasing year by year worldwide 51, and high UA affects about 14.0% or more of adults in China 52. The relationship between UA and AF subtype therefore warrants focused research.

Our study also confirmed the importance of SBP in differentiating AF subtypes. This is consistent with a recent study based on the China Cardiovascular Care Quality Improvement-Atrial Fibrillation project 53. Among hospitalized AF patients, higher blood pressure on admission was associated with an increased risk of stroke/transient ischemic attack and heart failure (HF). Lower blood pressure, in contrast, was associated with an increased risk of HF and all-cause mortality. Patients with lower SBP are more likely to have persistent AF, which carries a greater risk of HF. Further research is needed to clarify the relationships among blood pressure, AF type, and prognosis.

Studies have shown clear gender differences in the epidemiology and risk factors of AF 54–56, and the association between increasing age and AF progression has been confirmed 42. We therefore conducted subgroup analyses combining gender and age.
The age cut-off points were derived from the newly released Chinese AF diagnosis and management guidelines, which are based on Asian research evidence 57–60. Similar model performance across subgroups confirmed the stability of the model. Between-group comparisons also yielded some informative findings. Among men, the highest AUC was in the subgroup younger than 60 years; among women, it was in the subgroup aged 60–64 years. Our model thus differentiated AF subtypes accurately in younger age groups. A recent study found that younger age at first AF diagnosis is associated with a higher risk of stroke 61. Our findings may therefore support early anticoagulation decisions in younger patients.

Nevertheless, this study has some limitations. First, we only distinguished between two AF subtypes, whereas the latest ACC guidelines propose four stages of AF evolution 62. Larger samples covering more AF subtypes, with long-term follow-up, are needed to achieve accurate multiclass classification and progression prediction of AF subtypes 29. Second, although the patients came from different provinces, all three centers were in China. More centers in other regions and countries are needed to broaden the sample. Finally, as centers and participants are added, the sample sizes of different centers will also need to be balanced.

Conclusion
In this study, we screened 11,986 patients across three tertiary hospitals in China and developed an interpretable ML model that distinguishes paroxysmal from persistent AF in patients with first-diagnosed AF. We also identified novel AF-related risk factors and implemented an online calculator to facilitate broad application of our findings in Chinese and wider Asian populations.
By enabling timely and precise identification of AF subtypes, our approach aims to inform clinical decision-making, strengthen preventive strategies, and pave the way for personalized care in AF management.

Abbreviations

ACC, accuracy; AF, atrial fibrillation; AIP, atherogenic index of plasma; AUC, area under the curve; AV, aortic valve flow velocity; BNP, brain natriuretic peptide; CI, confidence interval; DBP, diastolic blood pressure; EHR, electronic health records; ESC, European Society of Cardiology; Hb, hemoglobin; HDL-C, high-density lipoprotein cholesterol; HF, heart failure; ICD, International Classification of Diseases; IQR, interquartile range; LA, left atrial diameter; LASSO, least absolute shrinkage and selection operator; LDL-C, low-density lipoprotein cholesterol; LVDd, left ventricular end-diastolic diameter; LVEF, left ventricular ejection fraction; ML, machine learning; NT-proBNP, N-terminal pro-brain natriuretic peptide; RCS, restricted cubic spline; RFE, recursive feature elimination; ROC, receiver operating characteristic; SBP, systolic blood pressure; SD, standard deviation; SEN, sensitivity; SHAP, SHapley Additive exPlanations; SPE, specificity; TTE, transthoracic echocardiography; UA, uric acid.

Declarations

Conflict of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Consent for publication
Not applicable.

Ethics approval and consent to participate
This study was reviewed and approved by the Ethics Committees of Sun Yat-sen Memorial Hospital of Sun Yat-sen University (SYSKY-2024-004-01), Tungwah Hospital of Sun Yat-sen University (DHKY-2025-003-01), and Dongguan Songshan Lake Tungwah Hospital (SDHKY-2025-002-01). The study adhered to the tenets of the Declaration of Helsinki.

Funding
This study was partially supported by the National Natural Science Foundation of China (62176016, 72274127), the National Key R&D Program of China (No. 2021YFB2104800), the Guizhou Province Science and Technology Project "Research on Q&A Interactive Virtual Digital People for Intelligent Medical Treatment in Information Innovation Environment" (Qiankehe [2024] General 058), the Capital Health Development Research Project (2022-2-2013), the Haidian Innovation and Translation Program of Peking University Third Hospital (HDCXZHKC2023203), and the project "Research on the Decision Support System for Urban and Park Carbon Emissions Empowered by Digital Technology: A Special Study on the Monitoring and Identification of Heavy Truck Beidou Carbon Emission Reductions" to Chao Tong; by a grant from the Key Laboratory of Coronary Intraluminal Imaging and Functional Analysis of Dongguan City to Heng Li; and by the Shenzhen Medical Research Fund (B2302020), the National Natural Science Foundation of China (82330021, 82270771), the Shenzhen Science and Technology Program (KCXFZ20211020163801002, ZDSYS20220606100801004, SGDX20230116092459009), the Shenzhen Key Medical Discipline Construction Fund (SZXK002), the Futian District Public Health Scientific Research Project of Shenzhen (FTWS2022001), the Chinese Association of Integrative Medicine-Shanghai Hutchison Pharmaceuticals Fund (HMPE202202), and the China Heart House-Chinese Cardiovascular Association HX Fund (2022-CCA-HX-090) to Hui Huang.

Author Contribution
Chao Tong, Heng Li and Hui Huang are the guarantors of the study. All authors (Sijin Li, Yuqi Zhang, Weijie Wu, Guang Li, Tucheng Huang, Chao Tong, Heng Li and Hui Huang) were involved in the conceptualization and design of the study. Sijin Li and Yuqi Zhang conducted the experiments. Data cleaning was performed by Weijie Wu, Guang Li and Tucheng Huang. Analysis and interpretation were performed by Sijin Li and Yuqi Zhang under the supervision and with the support of Chao Tong, Heng Li and Hui Huang. The article was drafted by Sijin Li and Yuqi Zhang.
All authors (Sijin Li, Yuqi Zhang, Weijie Wu, Guang Li, Tucheng Huang, Chao Tong, Heng Li and Hui Huang) revised the article and contributed to its intellectual content. All authors (Sijin Li, Yuqi Zhang, Weijie Wu, Guang Li, Tucheng Huang, Chao Tong, Heng Li and Hui Huang) approved the final version of the article, including the authorship list.

Data availability
Some or all data sets generated and/or analyzed during the present study are not publicly available but are available from the corresponding author on reasonable request.

References
1. Tsao CW, Aday AW, Almarzooq ZI, et al. Heart Disease and Stroke Statistics-2023 Update: A Report From the American Heart Association. Circulation. 2023;147(8):e93–621.
2. Petzl AM, Jabbour G, Cadrin-Tourigny J, et al. Innovative approaches to atrial fibrillation prediction: should polygenic scores and machine learning be implemented in clinical practice? Europace. 2024;26(8).
3. Papanastasiou CA, Theochari CA, Zareifopoulos N, et al. Atrial Fibrillation Is Associated with Cognitive Impairment, All-Cause Dementia, Vascular Dementia, and Alzheimer's Disease: a Systematic Review and Meta-Analysis. J Gen Intern Med. 2021;36(10):3122–35.
4. Koh YH, Lew LZW, Franke KB, et al. Predictive role of atrial fibrillation in cognitive decline: a systematic review and meta-analysis of 2.8 million individuals. Europace. 2022;24(8):1229–39.
5. Qin D, Mansour MC, Ruskin JN, Heist EK. Atrial Fibrillation-Mediated Cardiomyopathy. Circ Arrhythm Electrophysiol. 2019;12(12):e007809.
6. Rienstra M, Tzeis S, Bunting KV, et al. Spotlight on the 2024 ESC/EACTS management of atrial fibrillation guidelines: 10 novel key aspects. Europace. 2024;26(12).
7. Mairesse GH, Moran P, Van Gelder IC, et al. Screening for atrial fibrillation: a European Heart Rhythm Association (EHRA) consensus document endorsed by the Heart Rhythm Society (HRS), Asia Pacific Heart Rhythm Society (APHRS), and Sociedad Latinoamericana de Estimulacion Cardiaca y Electrofisiologia (SOLAECE). Europace. 2017;19(10):1589–623.
8. Rivard L, Friberg L, Conen D, et al. Atrial Fibrillation and Dementia: A Report From the AF-SCREEN International Collaboration. Circulation. 2022;145(5):392–409.
9. Kirchhof P, Camm AJ, Goette A, et al. Early Rhythm-Control Therapy in Patients with Atrial Fibrillation. N Engl J Med. 2020;383(14):1305–16.
10. Van Gelder IC, Rienstra M, Bunting KV, et al. 2024 ESC Guidelines for the management of atrial fibrillation developed in collaboration with the European Association for Cardio-Thoracic Surgery (EACTS). Eur Heart J. 2024;45(36):3314–414.
11. Joglar JA, Chung MK, Armbruster AL, et al. 2023 ACC/AHA/ACCP/HRS Guideline for the Diagnosis and Management of Atrial Fibrillation: A Report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. Circulation. 2024;149(1):e1–156.
12. Writing Committee Members, Joglar JA, Chung MK, et al. 2023 ACC/AHA/ACCP/HRS Guideline for the Diagnosis and Management of Atrial Fibrillation: A Report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. J Am Coll Cardiol. 2024;83(1):109–279.
13. Lubitz SA, Atlas SJ, Ashburner JM, et al. Screening for Atrial Fibrillation in Older Adults at Primary Care Visits: VITAL-AF Randomized Controlled Trial. Circulation. 2022;145(13):946–54.
14. Uittenbogaart SB, Verbiest-van Gurp N, Lucassen WAM, et al. Opportunistic screening versus usual care for detection of atrial fibrillation in primary care: cluster randomised controlled trial. BMJ. 2020;370:m3208.
15. Tzeis S, Gerstenfeld EP, Kalman J, et al. 2024 European Heart Rhythm Association/Heart Rhythm Society/Asia Pacific Heart Rhythm Society/Latin American Heart Rhythm Society expert consensus statement on catheter and surgical ablation of atrial fibrillation. Europace. 2024;26(4).
16. Natale A, Mohanty S, Sanders P, et al. Catheter ablation for atrial fibrillation: indications and future perspective. Eur Heart J. 2024;45(41):4383–98.
17. Nattel S, Harada M. Atrial remodeling and atrial fibrillation: recent advances and translational perspectives. J Am Coll Cardiol. 2014;63(22):2335–45.
18. Park JS, Cho I, Kim D, et al. Differentiating Left Atrial Pressure Responses in Paroxysmal and Persistent Atrial Fibrillation: Implications for Diagnosing Heart Failure With Preserved Ejection Fraction and Managing Atrial Fibrillation. J Am Heart Assoc. 2024;13(17):e035246.
19. Vanassche T, Lauw MN, Eikelboom JW, et al. Risk of ischaemic stroke according to pattern of atrial fibrillation: analysis of 6563 aspirin-treated patients in ACTIVE-A and AVERROES. Eur Heart J. 2015;36(5):281–7a.
20. Chiang CE, Naditch-Brule L, Murin J, et al. Distribution and risk profile of paroxysmal, persistent, and permanent atrial fibrillation in routine clinical practice: insight from the real-life global survey evaluating patients with atrial fibrillation international registry. Circ Arrhythm Electrophysiol. 2012;5(4):632–9.
21. Ogawa H, An Y, Ikeda S, et al. Progression From Paroxysmal to Sustained Atrial Fibrillation Is Associated With Increased Adverse Events. Stroke. 2018;49(10):2301–8.
22. Shi S, Tang Y, Zhao Q, et al. Prevalence and risk of atrial fibrillation in China: A national cross-sectional epidemiological study. Lancet Reg Health West Pac. 2022;23:100439.
23. Gavidia M, Zhu H, Montanari AN, et al. Early warning of atrial fibrillation using deep learning. Patterns (N Y). 2024;5(6):100970.
24. Petmezas G, Haris K, Stefanopoulos L, et al. Automated atrial fibrillation detection using a hybrid CNN-LSTM network on imbalanced ECG datasets. Biomed Signal Process Control. 2021;63:102194.
25. Zhang Y, Li S, Mai P, et al. A machine learning-based model for predicting paroxysmal and persistent atrial fibrillation based on EHR. BMC Med Inform Decis Mak. 2025;25(1):51.
26. Daneshvar N, Pandita D, Erickson S, et al. Artificial Intelligence in the Provision of Health Care: An American College of Physicians Policy Position Paper. Ann Intern Med. 2024;177(7):964–7.
27. Rose C, Chen JH. Learning from the EHR to implement AI in healthcare. NPJ Digit Med. 2024;7(1):330.
28. Zhang Y, Li S, Wu W, et al. Machine-learning-based models to predict cardiovascular risk using oculomics and clinic variables in KNHANES. BioData Min. 2024;17(1):12.
29. Quer G, Topol EJ. The potential for large language models to transform cardiovascular medicine. Lancet Digit Health. 2024;6(10):e767–71.
30. Lundberg SM, Erion G, Chen H, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell. 2020;2(1):56–67.
31. Gupta S, Glezerman IG, Hirsch JS, et al. Derivation and external validation of a simple risk score for predicting severe acute kidney injury after intravenous cisplatin: cohort study. BMJ. 2024;384:e077169.
32. Yang J, Wang T, Li K, Wang Y. Associations between per- and polyfluoroalkyl chemicals and abdominal aortic calcification in middle-aged and older adults. J Adv Res. 2024.
33. Assempoor R, Daneshvar MS, Taghvaei A, et al. Atherogenic index of plasma and coronary artery disease: a systematic review and meta-analysis of observational studies. Cardiovasc Diabetol. 2025;24(1):35.
34. Hindricks G, Potpara T, Dagres N, et al. 2020 ESC Guidelines for the diagnosis and management of atrial fibrillation developed in collaboration with the European Association for Cardio-Thoracic Surgery (EACTS): The Task Force for the diagnosis and management of atrial fibrillation of the European Society of Cardiology (ESC) Developed with the special contribution of the European Heart Rhythm Association (EHRA) of the ESC. Eur Heart J. 2021;42(5):373–498.
35. Fabritz L, Crijns H, Guasch E, et al. Dynamic risk assessment to improve quality of care in patients with atrial fibrillation: the 7th AFNET/EHRA Consensus Conference. Europace. 2021;23(3):329–44.
36. Kim D, Yang PS, Sung JH, et al. Less dementia after catheter ablation for atrial fibrillation: a nationwide cohort study. Eur Heart J. 2020;41(47):4483–93.
37. Tan S, Zhou J, Veang T, Lin Q, Liu Q. Global, regional, and national burden of atrial fibrillation and atrial flutter from 1990 to 2021: sex differences and global burden projections to 2046: a systematic analysis of the Global Burden of Disease Study 2021. Europace. 2025;27(2).
38. Wong CX, Tse HF, Choi EK, et al. The burden of atrial fibrillation in the Asia-Pacific region. Nat Rev Cardiol. 2024;21(12):841–3.
39. Liu H, Chen M. Atrial Fibrillation Screening in Asia: Balancing Costs and Benefits for Optimal Outcomes. JACC Asia. 2025;5(1):172–3.
40. Lu R, Lumish HS, Hasegawa K, et al. Prediction of new-onset atrial fibrillation in patients with hypertrophic cardiomyopathy using machine learning. Eur J Heart Fail. 2025;27(2):275–84.
41. Jabbour G, Nolin-Lapalme A, Tastet O, et al. Prediction of incident atrial fibrillation using deep learning, clinical models, and polygenic scores. Eur Heart J. 2024;45(46):4920–34.
42. Padfield GJ, Steinberg C, Swampillai J, et al. Progression of paroxysmal to persistent atrial fibrillation: 10-year follow-up in the Canadian Registry of Atrial Fibrillation. Heart Rhythm. 2017;14(6):801–7.
43. Choi SH, Jurgens SJ, Xiao L, et al. Sequencing in over 50,000 cases identifies coding and structural variation underlying atrial fibrillation risk. Nat Genet. 2025;57(3):548–62.
44. Schotten U, Goette A, Verheule S. Translation of pathophysiological mechanisms of atrial fibrosis into new diagnostic and therapeutic approaches. Nat Rev Cardiol. 2025;22(4):225–40.
45. Lu Y, Sun Y, Cai L, et al. Non-traditional risk factors for atrial fibrillation: epidemiology, mechanisms, and strategies. Eur Heart J. 2025;46(9):784–804.
46. Wang X, Hou Y, Wang X, et al. Relationship between serum uric acid levels and different types of atrial fibrillation: An updated meta-analysis. Nutr Metab Cardiovasc Dis. 2021;31(10):2756–65.
47. Ding M, Viet NN, Gigante B, Lind V, Hammar N, Modig K. Elevated Uric Acid Is Associated With New-Onset Atrial Fibrillation: Results From the Swedish AMORIS Cohort. J Am Heart Assoc. 2023;12(3):e027089.
48. Masi S, Pugliese NR, Taddei S. The difficult relationship between uric acid and cardiovascular disease. Eur Heart J. 2019;40(36):3055–7.
49. Liu CH, Huang SC, Yin CH, et al. Atrial Fibrillation Risk and Urate-Lowering Therapy in Patients with Gout: A Cohort Study Using a Clinical Database. Biomedicines. 2022;11(1).
50. Yu W, Cheng JD. Uric Acid and Cardiovascular Disease: An Update From Molecular Mechanism to Clinical Perspective. Front Pharmacol. 2020;11:582680.
51. Dehlin M, Jacobsson L, Roddy E. Global epidemiology of gout: prevalence, incidence, treatment patterns and risk factors. Nat Rev Rheumatol. 2020;16(7):380–90.
52. Du L, Zong Y, Li H, et al. Hyperuricemia and its related diseases: mechanisms and advances in therapy. Signal Transduct Target Ther. 2024;9(1):212.
53. Sun Z, Hao Y, Liu J, et al. Blood pressure and in-hospital outcomes in patients hospitalized with atrial fibrillation: findings from the CCC-AF project. Hypertens Res. 2025.
54. Kjerpeseth LJ, Igland J, Selmer R, et al. Prevalence and incidence rates of atrial fibrillation in Norway 2004–2014. Heart. 2021;107(3):201–7.
55. Al-Khayatt BM, Salciccioli JD, Marshall DC, Krahn AD, Shalhoub J, Sikkel MB. Paradoxical impact of socioeconomic factors on outcome of atrial fibrillation in Europe: trends in incidence and mortality from atrial fibrillation. Eur Heart J. 2021;42(8):847–57.
56. Sharashova E, Gerdts E, Ball J, et al. Long-term pulse pressure trajectories and risk of incident atrial fibrillation: the Tromso Study. Eur Heart J. 2025.
57. Ma C, Wu S, Liu S, Han Y. Chinese guidelines for the diagnosis and management of atrial fibrillation. Pacing Clin Electrophysiol. 2024;47(6):714–70.
58. Kim TH, Yang PS, Yu HT, et al. Age Threshold for Ischemic Stroke Risk in Atrial Fibrillation. Stroke. 2018;49(8):1872–9.
59. Li YG, Lee SR, Choi EK, Lip GY. Stroke Prevention in Atrial Fibrillation: Focus on Asian Patients. Korean Circ J. 2018;48(8):665–84.
60. Choi SY, Kim MH, Lee KM, et al. Age-Dependent Anticoagulant Therapy for Atrial Fibrillation Patients with Intermediate Risk of Ischemic Stroke: A Nationwide Population-Based Study. Thromb Haemost. 2021;121(9):1151–60.
61. Cheng YJ, Deng H, Wei HQ, et al. Association Between Age at Diagnosis of Atrial Fibrillation and Subsequent Risk of Ischemic Stroke. J Am Heart Assoc. 2025;14(4):e038367.
62. Ko D, Chung MK, Evans PT, Benjamin EJ, Helm RH. Atrial Fibrillation: A Review. JAMA. 2025;333(4):329–42.

Additional Declarations
No competing interests reported.

Supplementary Files
Supplementary.docx

Editorial timeline: Editorial decision: revision requested, 16 Jan, 2026; reviews received at journal, 16 Jan, 2026; reviewers agreed at journal, 09 Oct, 2025; reviewers invited by journal, 02 Sep, 2025; editor assigned by journal, 08 Aug, 2025; submission checks completed at journal, 08 Aug, 2025; first submitted to journal, 05 Aug, 2025.
© Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-7302454","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":509156020,"identity":"4f8b4f9e-2fac-47ec-8f41-e7ee9aa1fc82","order_by":0,"name":"Sijin Li","email":"","orcid":"","institution":"Sun Yat-sen University","correspondingAuthor":false,"prefix":"","firstName":"Sijin","middleName":"","lastName":"Li","suffix":""},{"id":509156021,"identity":"4681827e-400b-4067-a031-c8928022990c","order_by":1,"name":"Yuqi Zhang","email":"","orcid":"","institution":"Beihang University","correspondingAuthor":false,"prefix":"","firstName":"Yuqi","middleName":"","lastName":"Zhang","suffix":""},{"id":509156022,"identity":"8e66ce69-241b-41bd-a65d-21b3bdfa466f","order_by":2,"name":"Weijie Wu","email":"","orcid":"","institution":"Dongguan Songshan Lake Tungwah Hospital","correspondingAuthor":false,"prefix":"","firstName":"Weijie","middleName":"","lastName":"Wu","suffix":""},{"id":509156024,"identity":"102e983b-7fc7-4ebd-b45b-3b926b3505e3","order_by":3,"name":"Guang Li","email":"","orcid":"","institution":"Dongguan Songshan Lake Tungwah Hospital","correspondingAuthor":false,"prefix":"","firstName":"Guang","middleName":"","lastName":"Li","suffix":""},{"id":509156026,"identity":"72f88ee6-68fb-46cc-bed0-f2377e11418f","order_by":4,"name":"Tucheng Huang","email":"","orcid":"","institution":"Sun Yat-sen Memorial Hospital of Sun Yat-sen 
University","correspondingAuthor":false,"prefix":"","firstName":"Tucheng","middleName":"","lastName":"Huang","suffix":""},{"id":509156028,"identity":"c3f7b743-1405-4262-bc44-37135784c814","order_by":5,"name":"Kuan Zeng","email":"","orcid":"","institution":"Sun Yat-sen University","correspondingAuthor":false,"prefix":"","firstName":"Kuan","middleName":"","lastName":"Zeng","suffix":""},{"id":509156029,"identity":"812a4011-f055-4085-8709-3f0850d4ab37","order_by":6,"name":"Chao Tong","email":"","orcid":"","institution":"Beihang University","correspondingAuthor":false,"prefix":"","firstName":"Chao","middleName":"","lastName":"Tong","suffix":""},{"id":509156030,"identity":"e334836a-e97e-48af-a470-543c970517da","order_by":7,"name":"Heng Li","email":"","orcid":"","institution":"Dongguan Songshan Lake Tungwah Hospital","correspondingAuthor":false,"prefix":"","firstName":"Heng","middleName":"","lastName":"Li","suffix":""},{"id":509156032,"identity":"b518d81b-51e8-448b-8ddd-b5d9a55fdd04","order_by":8,"name":"Hui Huang","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABAklEQVRIiWNgGAWjYBACAxDB2GAjByQbD8BEJYjQkmYMIknScjixAUgTp8VcIvnZw6870tLXth8G2vLnsL3BAeaDt3kY7PJwabGckWZuLHvGJnfbmcSGA4xthxM3HGBLtuZhSC7G6bAbCWbSkm1pudsOgLQ0HE4wOMBjJs3DcADsVOxa0r8BtRxONzv/EOYw/m8EtOSYSX5sO5xgdgNoCwPbYcYNB3jY8Gs586ZMmrEtzXDbDaAtiW3piTMPsxlbzjFIxq3lePo2yZ9tNvJm59MfPvjwx9qe73jzwxtvKuxwagEBZh4YK4GhGcgFG4VHPRAw/kCw6/ArHQWjYBSMghEJAAe7YmNCvQwXAAAAAElFTkSuQmCC","orcid":"","institution":"Joint Laboratory of Guangdong-Hong Kong-Macao Universities for Nutritional Metabolism and Precise Prevention and Control of Major Chronic Diseases, Sun Yat-sen University","correspondingAuthor":true,"prefix":"","firstName":"Hui","middleName":"","lastName":"Huang","suffix":""}],"badges":[],"createdAt":"2025-08-05 
15:53:05","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7302454/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7302454/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1186/s13040-026-00525-5","type":"published","date":"2026-02-06T15:58:42+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":90908840,"identity":"9e0c9b24-3b44-4fdd-b3a6-eec9c15eae41","added_by":"auto","created_at":"2025-09-09 13:28:07","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":1207371,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eCentral illustration and flow chart of the study design.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFigure 1 is the central illustration of this manuscript. Figure 1A briefly summarizes the main work of this study, which is to use machine learning methods to explore variables that are strongly correlated with the atrial fibrillation subtype of first-visit patients and achieve accurate and rapid prediction in advance. The input of the model includes ten text variables that have been screened by high-throughput screening, and the output of the model is various evaluation indicators and the online calculator we developed. Figure 1B represents that we collected a total of 11,986 patients with first-visit atrial fibrillation and no specific classification from three medical centers in China during 2013-2023. By setting strict inclusion and exclusion criteria, 4,185 patients were finally included. Figure 1C elaborated on the experimental process of the final included patients. We set different centers as training sets, validation sets, and independent external validation sets. 
After randomly dividing the samples of the DH center into training sets and validation sets in a ratio of 8:2, we performed a five-fold cross-validation, and each validation was accompanied by the two centers of SYSMH and SSH as independent external validation sets.\u003c/p\u003e","description":"","filename":"F1Centralillustration.jpg","url":"https://assets-eu.researchsquare.com/files/rs-7302454/v1/a61c00febc0f31dd4594236d.jpg"},{"id":90907118,"identity":"c8dc3b04-cd65-4bf0-9385-7030cab0cfbb","added_by":"auto","created_at":"2025-09-09 13:20:08","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":58671707,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eDistribution and correlation between variables in every independent center with AUCs.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFigure 2 combines the distribution of the 10 variables we finally used to build the model in the three centers and their correlations with AUCs in three centers. Figure 2-A1 shows the distribution of left atrial size in DH (center 1), SYSMH (center 2), and SSH (center 3). Similarly, Figures 2-A2 to 2-A10 are the distributions of BNP, EF, UA, SBP, HB, LDL-C/HDL-C, WBC, AIP, and NC in three centers. Figure 2-B1 illustrates the correlation between the variables and atrial fibrillation diagnosis in the DH dataset. The values of the variables in the figure represent the correlation. A positive value indicates a positive correlation, while a negative value indicates a negative correlation. The larger the absolute value, the stronger the correlation, regardless of whether it is positive or negative. Similarly. B2-B3 respectively reflect the correlation between the variables in the SSH and SYSMH datasets. Figure 2-C shows the fit of different algorithms used in the model in the three centers, and the AUC value of each algorithm is shown in the figure. 
C1 is DH, C2 is SSH, and C3 is SYSMH.\u003c/p\u003e","description":"","filename":"F2variablecorrelationinthreecenterswithAUCs.png","url":"https://assets-eu.researchsquare.com/files/rs-7302454/v1/08508b429e5667b898650ee8.png"},{"id":90907117,"identity":"730c8daa-d0ea-49b2-9748-a84429061517","added_by":"auto","created_at":"2025-09-09 13:20:07","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":29730670,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eInteractions between the top five important variables ranked by SHAP method.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFigure 3 shows the ranking of important features by the CatBoost model and identification of variables predictive of AF diagnostic subtype paroxysmal or persistent using the SHAP algorithm. Figure 3A shows the variables sorted by their mean absolute SHAP value. A high value means a high impact on the model output. Figure 3B is a heatmap of the SHAP values measured using the training dataset of the CatBoost model, indicating the contribution of each feature to the model prediction. Figure 3C shows the impact of all variables on sample classification across all samples, with red indicating a positive effect (persistent AF) on the model’s prediction and blue indicating a negative effect (paroxysmal AF). Figure 3D further shows the application of the SHAP algorithm in a single sample. Different variables with different SHAP values jointly affect the prediction results of the sample. Figure 3E combines the force plots for the top five variables identified by the SHAP algorithm; from top to bottom: LA, NT-proBNP, LVEF, UA, SBP. 
Figure 3E shows how the top five variables most influential for distinguishing AF subtypes affect the model’s output as each variable changes.\u003c/p\u003e","description":"","filename":"F3SHAPofmodelandvariablesforceplot.png","url":"https://assets-eu.researchsquare.com/files/rs-7302454/v1/d9602748680799e487d05f83.png"},{"id":90907114,"identity":"7e6fb054-7955-4cce-a6d3-3f07e9893e81","added_by":"auto","created_at":"2025-09-09 13:20:07","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":15496804,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementary.docx","url":"https://assets-eu.researchsquare.com/files/rs-7302454/v1/77354b92b275ab6059e250eb.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Early differentiation between paroxysmal and persistent atrial fibrillation based on interpretable machine learning: a multicenter retrospective study","fulltext":[{"header":"Introduction","content":"\u003cp\u003eAtrial fibrillation (AF) is the most common sustained cardiac arrhythmia and is associated with an increased risk of stroke, death, heart failure, hospitalization, and cognitive decline\u003csup\u003e\u003cspan additionalcitationids=\"CR2 CR3 CR4 CR5\" citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e. Early detection of AF offers a critical period for initiating appropriate therapies, modifying risk factors, and ensuring closer follow-up of high-risk individuals to prevent AF-related morbidity and mortality\u003csup\u003e\u003cspan additionalcitationids=\"CR8\" citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e. 
Most AF screening is opportunistic, according to the latest European Society of Cardiology (ESC) and American Heart Association (AHA) guidelines\u003csup\u003e\u003cspan additionalcitationids=\"CR11\" citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e. And screening also has shortcomings, such as easily overlooking patients with paroxysmal or short-term persistent AF\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e,\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e. Current ESC/AHA guidelines emphasize that paroxysmal and persistent AF exhibit distinct treatment strategies and divergent long-term clinical prognoses\u003csup\u003e\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e,\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e,\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e. Patients with persistent AF are more likely to have adverse cardiovascular events, such as stroke and heart failure with preserved ejection fraction\u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e,\u003cspan additionalcitationids=\"CR18 CR19 CR20\" citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e. Notably, paroxysmal and persistent AF account for approximately 38.9% and 39.2%, respectively, of all outpatient AF cases in China\u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e. Therefore, early differentiation between paroxysmal and persistent AF has strong clinical application value. 
Currently, most researchers have paid attention to distinguishing AF subtypes by analyzing the AF attack phenomenon in ECG signals without evaluating from multiple angles, including previous medical history, serological tests, and cardiac ultrasound results\u003csup\u003e\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e,\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u003c/sup\u003e. Although this has been noted by some researchers, it has been limited to explorations of small samples of people in single centers\u003csup\u003e\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e. Consequently, there is a lack of attention to the differentiation of paroxysmal and persistent AF.\u003c/p\u003e\u003cp\u003eRecent technological advances in artificial intelligence (AI) and machine learning (ML) provide opportunities to develop new tools to enable screening in risk factors and achieve more personalized treatment\u003csup\u003e\u003cspan additionalcitationids=\"CR27\" citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e. In particular, ML models leveraging electronic health record (EHR) data have seen increasing application in cardiovascular medicine\u003csup\u003e\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u003c/sup\u003e. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e1\u003c/span\u003eA, in this study, we collected patient data from three tertiary hospitals in China and developed an interpretable ML model capable of accurately distinguishing paroxysmal AF from persistent AF of first-diagnosed AF patients. 
Additionally, we identified novel AF-related risk factors and implemented an online calculator to facilitate broader application of our findings in Chinese and other Asian populations. By enabling timely and precise identification of AF subtypes, our approach aims to inform clinical decision-making, enhance preventive strategies, and support personalized care in AF management.\u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003e\u003cb\u003eDatasets\u003c/b\u003e\u003c/p\u003e\u003cp\u003eThis was a multicenter retrospective study. Data were obtained from the EHRs of three tertiary medical institutions in different regions of China: Sun Yat-sen Memorial Hospital of Sun Yat-sen University (SYSMH), Tungwah Hospital of Sun Yat-sen University (DH), and Dongguan Songshan Lake Tungwah Hospital (SSH).\u003c/p\u003e\u003cp\u003e\u003cb\u003ePatient enrolment and data collection\u003c/b\u003e\u003c/p\u003e\u003cp\u003ePatients were enrolled if they met the following criteria: (1) Hospitalization between January 2013 and January 2023. (2) Documented AF rhythm before or during hospitalization, confirmed by surface electrocardiography, 24-hour Holter monitoring, or pacemaker memory recordings. All patients had a first-time diagnosis of AF, although the AF subtype at the time of initial diagnosis was not specified. (3) AF confirmed as the discharge diagnosis and classified as paroxysmal or persistent AF according to the International Classification of Diseases, 10th Revision (ICD-10). Specifically, paroxysmal AF was coded as I48.x02 and persistent AF as I48.x00x007. 
Paroxysmal AF was defined as self-terminating episodes lasting less than 7 days, and persistent AF as sustained episodes lasting longer than 7 days and requiring medical intervention\u003csup\u003e\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e,\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e. (4) Complete and detailed medical history records were available.\u003c/p\u003e\u003cp\u003ePatients were excluded if they met any of the following conditions: (1) History of rheumatic heart disease, congenital heart disease, primary valvular heart disease, cardiomyopathy, pericardial disease, cor pulmonale, malignancy, or recent major surgery. (2) Conditions potentially affecting cardiac structure or function, including acute myocardial infarction, thyroid dysfunction (hyperthyroidism or hypothyroidism), amyloidosis, pheochromocytoma, systemic lupus erythematosus, or severe infections. (3) Severe hepatic dysfunction, defined as alanine aminotransferase (ALT) levels exceeding three times the upper limit of normal and an ALT/aspartate aminotransferase (AST) ratio greater than 1. (4) Severe renal insufficiency, defined as an estimated glomerular filtration rate (eGFR) of less than 30 ml/min/1.73 m\u0026sup2;. (5) Incomplete clinical data, defined as the absence of more than half of the required clinical variables or missing transthoracic echocardiography results.\u003c/p\u003e\u003cp\u003eAll procedures in this study were performed in accordance with the ethical standards of the responsible committee on human experimentation in China and with the Declaration of Helsinki of 1975. 
This study was reviewed and approved by the Ethics Committees of SYSMH, DH, and SSH (SYSKY-2024-004-01, DHKY-2025-003-01, and SDHKY-2025-002-01, respectively).\u003c/p\u003e\u003cp\u003e\u003cb\u003eSelection of variables\u003c/b\u003e\u003c/p\u003e\u003cp\u003eWe collected demographic data, medication use, serological indicators, and baseline echocardiographic data for all included participants, totaling 50 variables. We first performed Spearman correlation analysis on all collected variables to identify those correlated with the AF subtype, with P\u0026thinsp;\u0026lt;\u0026thinsp;0.05 considered statistically significant. Based on our previous work, we added several variables considered to be associated with the AF subtype before further screening\u003csup\u003e\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e. We then refined the variable set by recursive feature elimination (RFE) using gradient boosting tree and random forest estimators. These steps were intended to assess variable importance from complementary perspectives before model construction.\u003c/p\u003e\u003cp\u003e\u003cb\u003eMachine Learning Algorithms\u003c/b\u003e\u003c/p\u003e\u003cp\u003eIn this study, we evaluated five widely used boosting algorithms: LightGBM (Light Gradient Boosting Machine), AdaBoost (Adaptive Boosting), GradientBoost, XGBoost (Extreme Gradient Boosting), and CatBoost (Categorical Boosting). Each algorithm has distinct strengths, which adds diversity to the analysis. LightGBM is an efficient gradient boosting framework known for fast training and low memory usage, owing to its histogram-based decision tree learning. AdaBoost is one of the earliest boosting algorithms. 
Its core mechanism is to re-weight misclassified instances at each iteration so that subsequent learners focus on harder cases. GradientBoost is a classical gradient boosting algorithm that minimizes a loss function by iteratively adding decision trees fitted to the negative gradient of the loss. XGBoost is a scalable, regularized implementation of gradient boosting known for both speed and accuracy. CatBoost is a gradient boosting algorithm that handles categorical features natively, without explicit preprocessing. Using several algorithms allowed us to compare their performance on our datasets and identify which performed best for this task; each was evaluated with five-fold cross-validation.\u003c/p\u003e\u003cp\u003e\u003cb\u003eSHAP Interpretable Analysis for Machine Learning\u003c/b\u003e\u003c/p\u003e\u003cp\u003eSHAP (SHapley Additive exPlanations)\u003csup\u003e\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u003c/sup\u003e is a game-theoretic method that quantifies each feature's contribution to a model's prediction. Beyond making model decisions more transparent, it helps reveal how the model works and how features relate to predictions. By computing Shapley values, SHAP decomposes the model\u0026rsquo;s output into additive contributions from each feature, which aids interpretation and identifies the features that most strongly influence the outcome. 
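To make the additive decomposition concrete, the following is a minimal brute-force sketch of exact Shapley values for a single prediction. It is not the SHAP library, which uses faster algorithms (for example TreeExplainer for tree ensembles); it assumes, for illustration only, that a feature outside a coalition is replaced by a fixed baseline value, and the names `exact_shapley` and `toy_model` are hypothetical. The cost grows as 2^n in the number of features, so this is only a way to see the definition at work.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of model f at input x (brute force, O(2^n)).

    Assumption for illustration: features outside a coalition are set to
    the corresponding baseline value before calling f.
    """
    n = len(x)

    def value(coalition):
        # Input seen by the model when only `coalition` takes values from x.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model with one main effect and one interaction.
def toy_model(z):
    return 2.0 * z[0] + 3.0 * z[1] * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
print([round(v, 6) for v in phi])  # [2.0, 1.5, 1.5]
print(round(sum(phi), 6), toy_model(x) - toy_model(baseline))  # 5.0 5.0
```

The main effect is credited entirely to the first feature, the interaction is split equally between the two features that form it, and the contributions sum to the difference between the prediction and the baseline prediction (the "additive" property SHAP relies on).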
SHAP is particularly useful in our research because it lets us look beyond the \"black box\" nature of ML models and understand what drives their predictions.\u003c/p\u003e\u003cdiv id=\"Sec2\" class=\"Section2\"\u003e\u003ch2\u003eStatistical Analysis\u003c/h2\u003e\u003cp\u003eNormally distributed continuous variables were expressed as mean\u0026thinsp;\u0026plusmn;\u0026thinsp;standard deviation (SD), and non-normally distributed continuous variables as median (interquartile range, IQR). Normality was assessed using the Shapiro\u0026ndash;Wilk test, and continuous variables were compared using the Mann\u0026ndash;Whitney U test. Categorical variables were expressed as counts and percentages and compared using the \u0026chi;\u003csup\u003e2\u003c/sup\u003e test. Two-tailed P\u0026thinsp;\u0026lt;\u0026thinsp;0.05 was considered statistically significant. We used the area under the receiver operating characteristic curve (AUC), sensitivity (SEN), specificity (SPE), accuracy (ACC), precision (PRE), recall, and F1 score to evaluate the ability of the ML models to discriminate between paroxysmal and persistent AF. Restricted cubic spline (RCS) curves with 4 knots were used to test for nonlinear relationships between independent variables and outcomes\u003csup\u003e\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e,\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e\u003c/sup\u003e. 
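Restricted cubic splines are usually implemented as extra columns in a design matrix. The sketch below assumes Harrell's parameterization (cubic between knots, constrained to be linear beyond the outer knots) and, as is common for 4 knots, knots at the 5th, 35th, 65th, and 95th percentiles; the paper does not state its knot placement, and `rcs_basis` is a hypothetical helper. Nonlinearity would then typically be tested by fitting a model (for example logistic regression) on these columns and testing whether the coefficients of the nonlinear columns are jointly zero.

```python
import numpy as np


def rcs_basis(x, knots):
    """Restricted cubic spline design matrix (Harrell's parameterization).

    Returns an array of shape (len(x), k - 1): the linear term x followed by
    k - 2 nonlinear terms. Every column is linear beyond the outer knots.
    """
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    k = t.size
    if k < 3:
        raise ValueError("at least 3 knots are required")

    def pos_cube(u):
        # Truncated power function (u)_+^3
        return np.where(u > 0.0, u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2  # keeps nonlinear columns on roughly x's scale
    columns = [x]
    for j in range(k - 2):
        term = (
            pos_cube(x - t[j])
            - pos_cube(x - t[-2]) * (t[-1] - t[j]) / (t[-1] - t[-2])
            + pos_cube(x - t[-1]) * (t[-2] - t[j]) / (t[-1] - t[-2])
        )
        columns.append(term / scale)
    return np.column_stack(columns)


# Usage with synthetic data (not study data): 4 knots at assumed percentiles.
rng = np.random.default_rng(0)
la = rng.normal(38.0, 6.0, size=500)  # e.g. a left-atrial-diameter-like variable
knots = np.percentile(la, [5, 35, 65, 95])
X = rcs_basis(la, knots)
print(X.shape)  # (500, 3)
```

With 4 knots the spline adds two nonlinear columns to the linear term, so the test for nonlinearity has 2 degrees of freedom.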
SPSS Statistics version 26.0 and Python 3.7.6 were used for statistical analysis and graphics.\u003c/p\u003e\u003c/div\u003e"},{"header":"Results","content":"\u003cp\u003e\u003cb\u003eVariable selection results\u003c/b\u003e\u003c/p\u003e\u003cp\u003eAll variables collected in this study are described in Supplementary Table\u0026nbsp;1. After screening, 10 variables were included, falling into three categories: demographic data, echocardiographic parameters, and serological indicators. The demographic category included systolic blood pressure (SBP). Echocardiographic parameters included left atrial diameter (LA) and left ventricular ejection fraction (LVEF), measured by routine transthoracic echocardiography (TTE) performed by a certified cardiologist at baseline and retrieved from the EHR. Serological parameters included white blood cell count (WBC), neutrophil count (NC), hemoglobin (Hb), N-terminal pro-brain natriuretic peptide (NT-proBNP), uric acid (UA), the ratio of low-density lipoprotein cholesterol to high-density lipoprotein cholesterol (LDL-C/HDL-C), and the atherogenic index of plasma (AIP). AIP is the base-10 logarithm of the molar ratio of triglycerides (TG) to HDL-C, both in mmol/L\u003csup\u003e33\u003c/sup\u003e: AIP\u0026thinsp;=\u0026thinsp;log\u003csub\u003e10\u003c/sub\u003e(TG/HDL-C). All serological indicators were obtained from the first peripheral blood sample collected at baseline. 
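The two derived lipid indices above can be computed as follows. This is a minimal sketch: the helper names are hypothetical, and the mg/dL conversion factors (88.57 for triglycerides, 38.67 for cholesterol) are standard clinical chemistry values rather than details reported by this study. The conversion matters for AIP because the index is defined on molar concentrations; applying the formula to mg/dL values without converting would shift every AIP by about log10(88.57/38.67) ≈ 0.36. The LDL-C/HDL-C ratio needs no conversion as long as both values share a unit.

```python
import math

# Standard conversion factors (mg/dL per mmol/L).
TG_MGDL_PER_MMOLL = 88.57     # triglycerides
CHOL_MGDL_PER_MMOLL = 38.67   # cholesterol (HDL-C, LDL-C)


def atherogenic_index(tg, hdl_c, unit="mmol/L"):
    """AIP = log10(TG / HDL-C), with both concentrations in mmol/L."""
    if unit == "mg/dL":
        tg = tg / TG_MGDL_PER_MMOLL
        hdl_c = hdl_c / CHOL_MGDL_PER_MMOLL
    elif unit != "mmol/L":
        raise ValueError("unit must be 'mmol/L' or 'mg/dL'")
    if tg <= 0 or hdl_c <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(tg / hdl_c)


def ldl_hdl_ratio(ldl_c, hdl_c):
    """LDL-C/HDL-C; unit-free as long as both values use the same unit."""
    return ldl_c / hdl_c


print(round(atherogenic_index(1.5, 1.2), 3))  # 0.097
print(round(ldl_hdl_ratio(2.8, 1.2), 3))      # 2.333
```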
The distributions of these 10 variables and their correlations with the AF subtype in the three centers are shown in Fig.\u0026nbsp;2A1\u0026ndash;A10 and B1\u0026ndash;B3.\u003c/p\u003e\u003cp\u003e\u003cb\u003eBaseline characteristics of participants\u003c/b\u003e\u003c/p\u003e\u003cp\u003eInitially, 11,986 patients were screened from DH (n\u0026thinsp;=\u0026thinsp;6436), SYSMH (n\u0026thinsp;=\u0026thinsp;4581), and SSH (n\u0026thinsp;=\u0026thinsp;969) (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e1\u003c/span\u003eB). After applying the inclusion and exclusion criteria, 4185 patients were enrolled: 2248 in DH, 1599 in SYSMH, and 338 in SSH. Of these, 2565 (61.29%) had paroxysmal AF and 1620 (38.71%) had persistent AF. The proportions of paroxysmal and persistent AF were broadly similar across the three centers, at roughly 60% and 40% (1361 and 887 in DH, 1019 and 580 in SYSMH, 187 and 151 in SSH). The baseline characteristics of all variables included in the model are shown in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eBaseline characteristics of participants\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"6\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" 
colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eVariables\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eTotal\u003c/p\u003e\u003cp\u003e(n\u0026thinsp;=\u0026thinsp;4185)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eDH\u003c/p\u003e\u003cp\u003e(n\u0026thinsp;=\u0026thinsp;2248)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eSYSMH\u003c/p\u003e\u003cp\u003e(n\u0026thinsp;=\u0026thinsp;1599)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eSSH\u003c/p\u003e\u003cp\u003e(n\u0026thinsp;=\u0026thinsp;338)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003ep-Value\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAge (years)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e68 (59, 77)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e69 (59, 78)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e67 (60, 75)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e69 (58, 78)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001*\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eGender (n, %)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.797\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eMale (%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e2468 (58.97)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e1320 (58.71)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e943 (58.97)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e205 (60.65)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eFemale (%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e1717 (41.03)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e928 (41.29)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e656 (41.03)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e133 (39.35)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSBP (mmHg)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e131 (117, 147)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e135 (118, 150)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e127 (116, 142)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e132 (118, 147)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001*\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colname=\"c1\"\u003e\u003cp\u003eWBC (10\u003csup\u003e9\u003c/sup\u003e/L)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e7.08 (5.73, 9.09)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e7.43 (5.84, 9.74)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e6.72 (5.67, 8.24)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e7.17 (5.51, 9.19)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001*\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNC (10\u003csup\u003e9\u003c/sup\u003e/L)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e4.53 (3.36, 6.45)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e4.86 (3.35, 7.23)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e4.21 (3.35, 5.50)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e5.07 (3.17, 7.34)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001*\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eHb (g/L)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e132 (118, 146)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e131 (115, 145)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e135 (123, 147)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e130 (117, 145)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001*\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd 
align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAIP\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.032 (-0.156, 0.226)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.014 (-0.180, 0.217)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.055 (-0.128, 0.241)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.009 (-0.173, 0.213)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001*\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLDL-C/HDL-C\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e2.353 (1.733, 3.099)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e2.256 (1.637, 3.045)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e2.521 (1.944, 3.203)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e2.055 (1.464, 2.792)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001*\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNT-proBNP (pg/ml)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e792 (209, 2337)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e1144 (312, 3053)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e470 (128, 1190)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e1223 (324, 3469)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001*\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colname=\"c1\"\u003e\u003cp\u003eUA (\u0026micro;mol/L)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e384 (309, 468)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e381 (301, 473)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e391 (320, 468)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e364 (289, 446)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001*\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLA (mm)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e37 (32, 43)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e36 (31, 42)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e38 (35, 43)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e40 (35, 45)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001*\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLVEF (%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e64 (57, 68)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e62 (55, 67)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e66 (61, 70)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e59 (50, 65)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.001*\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDiagnose (n, %)\u003c/p\u003e\u003c/td\u003e\u003ctd 
align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.008*\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e0\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e2565 (61.29)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e1361 (60.54)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e1019 (63.73)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e187 (55.33)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e1620 (38.71)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e887 (39.46)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e580 (36.27)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e151 (44.67)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colspan=\"6\" nameend=\"c6\" namest=\"c1\"\u003e\u003cp\u003eValues are presented as n (%) or median (interquartile range, IQR), as appropriate. 
DH, Donghua (Tungwah) Hospital of Sun Yat-sen University; SYSMH, Sun Yat-sen Memorial Hospital of Sun Yat-sen University; SSH, Dongguan Songshan Lake Donghua (Tungwah) Hospital; SBP, systolic blood pressure; WBC, white blood cell count; NC, neutrophil count; Hb, hemoglobin; AIP, atherogenic index of plasma; LDL-C, low-density lipoprotein cholesterol; HDL-C, high-density lipoprotein cholesterol; NT-proBNP, N-terminal pro-brain natriuretic peptide; UA, uric acid; LA, left atrial diameter; LVEF, left ventricular ejection fraction; Diagnose, 0\u0026thinsp;=\u0026thinsp;paroxysmal AF, 1\u0026thinsp;=\u0026thinsp;persistent AF.\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003cb\u003eResults of AF subtype prediction model\u003c/b\u003e\u003c/p\u003e\u003cp\u003eAs described in the Methods, models were built with five boosting algorithms. The DH dataset, which had the largest sample size, served as the primary dataset for five-fold cross-validation. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e1\u003c/span\u003eC, in each fold the DH dataset was split 8:2 into a training set and an internal validation set, and the resulting model was also externally validated on the independent SYSMH and SSH datasets. This strategy reduces the dependence of the results on any single data partition. The model output was the predicted AF subtype, which was compared with the discharge diagnosis recorded in the EHR. 
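The validation scheme described above can be sketched as follows. This is an illustrative sketch, not the authors' code: scikit-learn's `GradientBoostingClassifier` stands in for any of the five algorithms, the data are synthetic, stratified folds and a 0.5 probability threshold for the threshold-dependent metrics are assumptions, and the function and cohort names are hypothetical. Five-fold splitting yields the 8:2 training/internal-validation ratio in each fold, and every fold's model is also scored on the external cohorts. Sensitivity and recall are the same quantity, TP / (TP + FN), which is why the SEN and Recall columns of Table 2 coincide.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold


def binary_metrics(y_true, y_prob, threshold=0.5):
    """ACC, PRE, SEN (= recall), SPE, F1 and AUC; class 1 = persistent AF."""
    y_pred = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    pre = tp / (tp + fp) if tp + fp else 0.0
    sen = tp / (tp + fn) if tp + fn else 0.0
    spe = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * pre * sen / (pre + sen) if pre + sen else 0.0
    return {"ACC": (tp + tn) / len(y_true), "PRE": pre, "SEN": sen,
            "SPE": spe, "F1": f1, "AUC": roc_auc_score(y_true, y_prob)}


def cv_with_external(X_dev, y_dev, external, n_splits=5, seed=0):
    """Five-fold CV on a development cohort; each fold's model is also
    scored on every external cohort."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {"internal": [], **{name: [] for name in external}}
    for train_idx, val_idx in skf.split(X_dev, y_dev):
        model = GradientBoostingClassifier(random_state=seed)
        model.fit(X_dev[train_idx], y_dev[train_idx])
        p_val = model.predict_proba(X_dev[val_idx])[:, 1]
        results["internal"].append(binary_metrics(y_dev[val_idx], p_val))
        for name, (X_ext, y_ext) in external.items():
            p_ext = model.predict_proba(X_ext)[:, 1]
            results[name].append(binary_metrics(y_ext, p_ext))
    return results


# Synthetic stand-in for three cohorts with 10 predictors (not study data).
X, y = make_classification(n_samples=900, n_features=10, n_informative=5,
                           weights=[0.6, 0.4], random_state=0)
X_dev, y_dev = X[:600], y[:600]
external = {"ext_A": (X[600:750], y[600:750]), "ext_B": (X[750:], y[750:])}
res = cv_with_external(X_dev, y_dev, external)
for cohort, folds in res.items():
    print(cohort, round(float(np.mean([m["AUC"] for m in folds])), 3))
```

Summarizing each metric across the five folds (for example as a mean with its range) gives one row per model and cohort, which is the shape of Table 2.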
Across the evaluation metrics, the models showed good predictive performance and stable generalizability across centers (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eModel performance metrics across centers\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"8\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eModel\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eACC\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003ePrecision\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eRecall\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eAUC\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003eF1 Score\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" 
colname=\"c7\"\u003e\u003cp\u003eSEN\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c8\"\u003e\u003cp\u003eSPE\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003cth align=\"left\" colspan=\"8\" nameend=\"c8\" namest=\"c1\"\u003e\u003cp\u003eDH\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCatBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.789\u003c/p\u003e\u003cp\u003e(0.766\u0026ndash;0.825)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.755\u003c/p\u003e\u003cp\u003e(0.700-0.812)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.694\u003c/p\u003e\u003cp\u003e(0.656\u0026ndash;0.727)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.861\u003c/p\u003e\u003cp\u003e(0.835\u0026ndash;0.898)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.722\u003c/p\u003e\u003cp\u003e(0.701\u0026ndash;0.764)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.694\u003c/p\u003e\u003cp\u003e(0.656\u0026ndash;0.727)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.852\u003c/p\u003e\u003cp\u003e(0.798\u0026ndash;0.892)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eGradientBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.791\u003c/p\u003e\u003cp\u003e(0.770\u0026ndash;0.823)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.753\u003c/p\u003e\u003cp\u003e(0.727\u0026ndash;0.803)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c4\"\u003e\u003cp\u003e0.702\u003c/p\u003e\u003cp\u003e(0.648\u0026ndash;0.736)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.859\u003c/p\u003e\u003cp\u003e(0.838\u0026ndash;0.894)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.726\u003c/p\u003e\u003cp\u003e(0.698\u0026ndash;0.765)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.702\u003c/p\u003e\u003cp\u003e(0.648\u0026ndash;0.736)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.849\u003c/p\u003e\u003cp\u003e(0.821\u0026ndash;0.885)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLightGBM\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.780\u003c/p\u003e\u003cp\u003e(0.766\u0026ndash;0.815)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.738\u003c/p\u003e\u003cp\u003e(0.702\u0026ndash;0.793)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.688\u003c/p\u003e\u003cp\u003e(0.645\u0026ndash;0.721)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.855\u003c/p\u003e\u003cp\u003e(0.835\u0026ndash;0.889)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.711\u003c/p\u003e\u003cp\u003e(0.689\u0026ndash;0.754)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.688\u003c/p\u003e\u003cp\u003e(0.645\u0026ndash;0.721)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.840\u003c/p\u003e\u003cp\u003e(0.809\u0026ndash;0.879)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eXGBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c2\"\u003e\u003cp\u003e0.774\u003c/p\u003e\u003cp\u003e(0.756\u0026ndash;0.896)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.727\u003c/p\u003e\u003cp\u003e(0.679\u0026ndash;0.754)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.688\u003c/p\u003e\u003cp\u003e(0.644\u0026ndash;0.720)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.846\u003c/p\u003e\u003cp\u003e(0.827\u0026ndash;0.875)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.706\u003c/p\u003e\u003cp\u003e(0.683\u0026ndash;0.733)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.688\u003c/p\u003e\u003cp\u003e(0.644\u0026ndash;0.720)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.831\u003c/p\u003e\u003cp\u003e(0.797\u0026ndash;0.852)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAdaBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.783\u003c/p\u003e\u003cp\u003e(0.768\u0026ndash;0.803)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.740\u003c/p\u003e\u003cp\u003e(0.716\u0026ndash;0.769)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.697\u003c/p\u003e\u003cp\u003e(0.661\u0026ndash;0.726)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.845\u003c/p\u003e\u003cp\u003e(0.826\u0026ndash;0.875)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.717\u003c/p\u003e\u003cp\u003e(0.695\u0026ndash;0.739)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.697\u003c/p\u003e\u003cp\u003e(0.661\u0026ndash;0.726)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c8\"\u003e\u003cp\u003e0.840\u003c/p\u003e\u003cp\u003e(0.811\u0026ndash;0.863)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colspan=\"8\" nameend=\"c8\" namest=\"c1\"\u003e\u003cp\u003e\u003cb\u003eSSH\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCatBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.748\u003c/p\u003e\u003cp\u003e(0.735\u0026ndash;0.762)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.681\u003c/p\u003e\u003cp\u003e(0.669\u0026ndash;0.690)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.834\u003c/p\u003e\u003cp\u003e(0.817\u0026ndash;0.867)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.833\u003c/p\u003e\u003cp\u003e(0.828\u0026ndash;0.840)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.750\u003c/p\u003e\u003cp\u003e(0.736\u0026ndash;0.767)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.834\u003c/p\u003e\u003cp\u003e(0.817\u0026ndash;0.867)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.677\u003c/p\u003e\u003cp\u003e(0.665\u0026ndash;0.696)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eGradientBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.734\u003c/p\u003e\u003cp\u003e(0.725\u0026ndash;0.744)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.670\u003c/p\u003e\u003cp\u003e(0.663\u0026ndash;0.677)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.813\u003c/p\u003e\u003cp\u003e(0.792\u0026ndash;0.837)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c5\"\u003e\u003cp\u003e0.825\u003c/p\u003e\u003cp\u003e(0.820\u0026ndash;0.831)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.734\u003c/p\u003e\u003cp\u003e(0.725\u0026ndash;0.748)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.813\u003c/p\u003e\u003cp\u003e(0.792\u0026ndash;0.837)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.668\u003c/p\u003e\u003cp\u003e(0.650\u0026ndash;0.685)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLightGBM\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.737\u003c/p\u003e\u003cp\u003e(0.731\u0026ndash;0.755)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.674\u003c/p\u003e\u003cp\u003e(0.662\u0026ndash;0.690)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.812\u003c/p\u003e\u003cp\u003e(0.792\u0026ndash;0.836)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.821\u003c/p\u003e\u003cp\u003e(0.814\u0026ndash;0.830)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.737\u003c/p\u003e\u003cp\u003e(0.728\u0026ndash;0.755)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.812\u003c/p\u003e\u003cp\u003e(0.792\u0026ndash;0.836)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.676\u003c/p\u003e\u003cp\u003e(0.651\u0026ndash;0.691)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eXGBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.728\u003c/p\u003e\u003cp\u003e(0.708\u0026ndash;0.746)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c3\"\u003e\u003cp\u003e0.666\u003c/p\u003e\u003cp\u003e(0.649\u0026ndash;0.681)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.804\u003c/p\u003e\u003cp\u003e(0.773\u0026ndash;0.829)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.821\u003c/p\u003e\u003cp\u003e(0.806\u0026ndash;0.830)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.728\u003c/p\u003e\u003cp\u003e(0.706\u0026ndash;0.747)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.804\u003c/p\u003e\u003cp\u003e(0.773\u0026ndash;0.829)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.666\u003c/p\u003e\u003cp\u003e(0.644\u0026ndash;0.681)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAdaBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.741\u003c/p\u003e\u003cp\u003e(0.725\u0026ndash;0.764)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.677\u003c/p\u003e\u003cp\u003e(0.657\u0026ndash;0.699)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.817\u003c/p\u003e\u003cp\u003e(0.786\u0026ndash;0.842)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.819\u003c/p\u003e\u003cp\u003e(0.808\u0026ndash;0.830)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.740\u003c/p\u003e\u003cp\u003e(0.722\u0026ndash;0.764)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.817\u003c/p\u003e\u003cp\u003e(0.786\u0026ndash;0.842)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.678\u003c/p\u003e\u003cp\u003e(0.642\u0026ndash;0.701)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colspan=\"8\" nameend=\"c8\" namest=\"c1\"\u003e\u003cp\u003e\u003cb\u003eSYSMH\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCatBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.808\u003c/p\u003e\u003cp\u003e(0.803\u0026ndash;0.816)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.707\u003c/p\u003e\u003cp\u003e(0.698\u0026ndash;0.719)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.802\u003c/p\u003e\u003cp\u003e(0.788\u0026ndash;0.812)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.876\u003c/p\u003e\u003cp\u003e(0.871\u0026ndash;0.880)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.752\u003c/p\u003e\u003cp\u003e(0.747\u0026ndash;0.761)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.802\u003c/p\u003e\u003cp\u003e(0.788\u0026ndash;0.812)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.811\u003c/p\u003e\u003cp\u003e(0.802\u0026ndash;0.820)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eGradientBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.802\u003c/p\u003e\u003cp\u003e(0.794\u0026ndash;0.810)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.702\u003c/p\u003e\u003cp\u003e(0.688\u0026ndash;0.708)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.790\u003c/p\u003e\u003cp\u003e(0.769\u0026ndash;0.809)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.872\u003c/p\u003e\u003cp\u003e(0.869\u0026ndash;0.873)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c6\"\u003e\u003cp\u003e0.743\u003c/p\u003e\u003cp\u003e(0.733\u0026ndash;0.755)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.790\u003c/p\u003e\u003cp\u003e(0.769\u0026ndash;0.809)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.809\u003c/p\u003e\u003cp\u003e(0.795\u0026ndash;0.814)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLightGBM\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.807\u003c/p\u003e\u003cp\u003e(0.801\u0026ndash;0.811)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.706\u003c/p\u003e\u003cp\u003e(0.695\u0026ndash;0.721)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.801\u003c/p\u003e\u003cp\u003e(0.779\u0026ndash;0.820)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.875\u003c/p\u003e\u003cp\u003e(0.873\u0026ndash;0.877)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.750\u003c/p\u003e\u003cp\u003e(0.744\u0026ndash;0.755)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.801\u003c/p\u003e\u003cp\u003e(0.779\u0026ndash;0.820)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.810\u003c/p\u003e\u003cp\u003e(0.797\u0026ndash;0.828)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eXGBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.800\u003c/p\u003e\u003cp\u003e(0.795\u0026ndash;0.807)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.697\u003c/p\u003e\u003cp\u003e(0.685\u0026ndash;0.704)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c4\"\u003e\u003cp\u003e0.793\u003c/p\u003e\u003cp\u003e(0.772\u0026ndash;0.810)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.873\u003c/p\u003e\u003cp\u003e(0.869\u0026ndash;0.879)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.742\u003c/p\u003e\u003cp\u003e(0.734\u0026ndash;0.751)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.793\u003c/p\u003e\u003cp\u003e(0.772\u0026ndash;0.810)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.804\u003c/p\u003e\u003cp\u003e(0.789\u0026ndash;0.811)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAdaBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.787\u003c/p\u003e\u003cp\u003e(0.775\u0026ndash;0.795)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.678\u003c/p\u003e\u003cp\u003e(0.666\u0026ndash;0.686)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.787\u003c/p\u003e\u003cp\u003e(0.762\u0026ndash;0.803)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.860\u003c/p\u003e\u003cp\u003e(0.853\u0026ndash;0.866)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.728\u003c/p\u003e\u003cp\u003e(0.711\u0026ndash;0.740)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.787\u003c/p\u003e\u003cp\u003e(0.762\u0026ndash;0.803)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.787\u003c/p\u003e\u003cp\u003e(0.782\u0026ndash;0.791)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colspan=\"8\" nameend=\"c8\" namest=\"c1\"\u003e\u003cp\u003eDH, Donghua Hospital of Sun Yat-sen University; SYSMH, Sun Yat-sen Memorial 
Hospital of Sun Yat-sen University; SSH, Dongguan Songshan Lake Donghua Hospital; ACC, accuracy; AUC, area under the curve; CI, confidence interval; SEN, sensitivity; SPE, specificity.\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eAmong the overall models built on the DH dataset, GradientBoost had the highest accuracy, 0.791 (95% CI: 0.770\u0026ndash;0.823), followed by CatBoost at 0.789 (95% CI: 0.766\u0026ndash;0.825). CatBoost had the highest AUC, 0.861 (95% CI: 0.835\u0026ndash;0.898), slightly above that of GradientBoost, 0.859 (95% CI: 0.838\u0026ndash;0.894). The highest sensitivity was achieved by GradientBoost (0.702, 95% CI: 0.648\u0026ndash;0.736) and the highest specificity by CatBoost (0.852, 95% CI: 0.798\u0026ndash;0.892). In the two independent validation sets, SSH and SYSMH, CatBoost performed best on every metric except specificity at SSH, where AdaBoost was marginally higher (0.678, 95% CI: 0.642\u0026ndash;0.701, vs. 0.677 for CatBoost). At SYSMH, CatBoost achieved an accuracy of 0.808 (95% CI: 0.803\u0026ndash;0.816), an AUC of 0.876 (95% CI: 0.871\u0026ndash;0.880), a sensitivity of 0.802 (95% CI: 0.788\u0026ndash;0.812), and a specificity of 0.811 (95% CI: 0.802\u0026ndash;0.820). The per-center AUCs and the per-fold results of five-fold cross-validation for the five algorithms are summarized in Fig.\u0026nbsp;2C1\u0026ndash;C3. 
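The accuracy, sensitivity, specificity, and AUC values with 95% CIs reported above can be computed along the lines of the sketch below. It is a minimal illustration under stated assumptions, not the authors' code: the section does not say how the CIs were derived (they may come from the cross-validation folds), so a percentile bootstrap over patients is assumed, together with a 0.5 probability threshold and the coding 1 = persistent AF, 0 = paroxysmal AF. The function names are hypothetical.

```python
import random


def confusion_metrics(y_true, y_pred):
    # Assumed coding: 1 = persistent AF (positive class), 0 = paroxysmal AF
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    sen = tp / (tp + fn) if tp + fn else float('nan')  # sensitivity (recall)
    spe = tn / (tn + fp) if tn + fp else float('nan')  # specificity
    return acc, sen, spe


def auc(y_true, scores):
    # Mann-Whitney form of the ROC AUC: chance that a random positive case
    # scores higher than a random negative case (ties count as one half)
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, scores, metric, n_boot=1000, alpha=0.05, seed=0):
    # Percentile bootstrap: resample patients with replacement and recompute
    # the metric; resamples containing only one class are redrawn
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) != n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if len(set(yt)) == 1:
            continue
        stats.append(metric(yt, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(round(alpha / 2 * n_boot))]
    hi = stats[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lo, hi


y = [0, 0, 1, 1, 0, 1, 0, 1]
s = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]
pred = [1 if v >= 0.5 else 0 for v in s]
print(confusion_metrics(y, pred))  # (0.75, 0.75, 0.75)
print(auc(y, s))  # 0.875
print(bootstrap_ci(y, s, auc))
```

Passing a threshold-based function as `metric`, for example `lambda yt, st: confusion_metrics(yt, [1 if v >= 0.5 else 0 for v in st])[1]`, gives a bootstrap CI for sensitivity in the same way.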
We also plotted the per-fold AUCs from five-fold cross-validation for all centers in Supplementary Fig.\u0026nbsp;1.\u003c/p\u003e\u003cp\u003e\u003cb\u003eInterpretation of AF subtype prediction model\u003c/b\u003e\u003c/p\u003e\u003cp\u003eIn Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e3\u003c/span\u003eA, the variables are ranked in descending order of their absolute SHAP values, which reflect their impact on the model output. The five variables with the greatest influence on the classification of AF type are LA, NT-proBNP, Hb, LVEF, and UA. Figure\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e3\u003c/span\u003eB shows more directly how these variables relate to AF type. LA, LVEF, and UA show consistent patterns. For LA, for example, the points shift gradually from blue to red, with a distinct color boundary near a SHAP value of 0: lower LA values push the model toward predicting paroxysmal AF, whereas higher LA values push it toward persistent AF. Figure\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e3\u003c/span\u003eC shows the contribution of every variable for every sample, with red indicating a positive effect on the model\u0026rsquo;s prediction (toward persistent AF) and blue a negative effect (toward paroxysmal AF). Figure\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e3\u003c/span\u003eD uses SHAP to show the contribution of each variable to the model\u0026rsquo;s prediction for a single sample. 
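The additive idea behind the per-sample SHAP explanation in Fig. 3D (each variable's SHAP value pushes the prediction up or down from a base value, and the contributions sum to the model output) can be checked on a toy model with exact Shapley values. This is a minimal sketch, not the authors' pipeline: the three-input model and the all-zero baseline are hypothetical, and a feature outside a coalition is simply set to that single baseline value. The shap library instead averages over a background dataset, and for tree ensembles such as CatBoost it uses the much faster TreeSHAP algorithm rather than enumerating every subset as done here.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    # Exact Shapley values for model f at point x. A feature outside the
    # coalition takes its baseline value (a simplification of SHAP, which
    # averages over a background dataset).
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [x[j] if j in subset or j == i else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_model(z):
    # Hypothetical score: two linear terms plus an interaction of inputs 0 and 2
    return 2 * z[0] + z[1] + z[0] * z[2]


x = [1.0, 2.0, 3.0]
base = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, base)
print([round(v, 6) for v in phi])  # [3.5, 2.0, 1.5]
# Efficiency (additivity): base value + sum of contributions = prediction
print(round(toy_model(base) + sum(phi), 6), toy_model(x))  # 7.0 7.0
```

The interaction term is split equally between the two inputs that form it, which is why input 0 receives 2 + 1.5 and input 2 receives 1.5.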
Figure\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e3\u003c/span\u003eE shows how the model output changes as each of the five most influential variables varies. Compared with Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e2\u003c/span\u003eB, this view shows the trend of each variable more clearly; for example, as LA increases, its contribution shifts the model\u0026rsquo;s prediction toward persistent AF.\u003c/p\u003e\u003cp\u003eIn the model for early differentiation of paroxysmal from persistent AF, the top five variables were LA, NT-proBNP, LVEF, UA, and SBP. We further examined the pairwise interactions among these variables and plotted them as scatter plots (Supplementary Fig.\u0026nbsp;2). We also used restricted cubic spline analysis to explore the relationship between each of these five variables and AF subtype in the three independent centers (Supplementary Fig.\u0026nbsp;3).\u003c/p\u003e\u003cp\u003e\u003cb\u003ePerformance and explanation of AF subtype prediction models in subgroups\u003c/b\u003e\u003c/p\u003e\u003cp\u003eWe divided the participants at each center into six groups: men younger than 60 years, women younger than 60 years, men aged 60\u0026ndash;64 years, women aged 60\u0026ndash;64 years, men aged 65 years or older, and women aged 65 years or older. The model performed well across these subgroups. Supplementary Fig.\u0026nbsp;4 shows the AUCs at each center, grouped by sex within each age subgroup; panels A\u0026ndash;F are ordered by age, and centers 1\u0026ndash;3 are DH, SSH, and SYSMH, respectively. 
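The restricted cubic spline analysis mentioned above lets a regression (typically logistic, for a binary outcome such as AF subtype) capture a non-linear relation between a continuous variable like LA and the outcome, by entering spline basis columns in place of the raw variable. A minimal sketch of the commonly used Harrell parameterization follows. The section does not report the number or placement of knots, so the knots below (in practice often set at fixed percentiles of the variable) and the function name rcs_basis are assumptions.

```python
def rcs_basis(x, knots):
    # Restricted cubic spline basis (Harrell's form) for one value x.
    # Cubic between the knots, constrained to be linear beyond the outer knots.
    k = len(knots)
    t_first, t_pen, t_last = knots[0], knots[-2], knots[-1]
    scale = (t_last - t_first) ** 2  # normalisation keeps columns in the units of x

    def pos_cube(u):
        # truncated power function (u)_+^3
        return max(u, 0.0) ** 3

    columns = [x]  # the linear term
    for j in range(k - 2):
        tj = knots[j]
        term = (pos_cube(x - tj)
                - pos_cube(x - t_pen) * (t_last - tj) / (t_last - t_pen)
                + pos_cube(x - t_last) * (t_pen - tj) / (t_last - t_pen))
        columns.append(term / scale)
    return columns  # k - 1 columns in total


knots = [30.0, 36.0, 40.0, 46.0]  # hypothetical knot positions, e.g. LA in mm
for value in (25.0, 38.0, 50.0):
    print(value, [round(c, 4) for c in rcs_basis(value, knots)])
```

Each row of basis columns would then enter the logistic model in place of the single variable, and the fitted combination traces the non-linear association curve.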
These results and the values of the other evaluation metrics are given in Supplementary Tables\u0026nbsp;2\u0026ndash;4. We again used SHAP to interpret the model; the SHAP plots for all subgroups are summarized in Supplementary Fig.\u0026nbsp;5 of the Supplementary Materials. In every subgroup, the two most influential variables were LA and NT-proBNP, as in the overall model.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eThis study integrated EHR data (clinical history, serum indicators, and cardiac ultrasound) from three large independent tertiary hospitals in China to establish an interpretable machine learning model that accurately distinguishes paroxysmal from persistent AF at first diagnosis. We identified new AF subtype-related factors, which provide a basis for further mechanistic research and underpin an online web calculator (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://123.56.120.106:8000/\u003c/span\u003e\u003cspan address=\"http://123.56.120.106:8000/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) that predicts whether a patient with first-diagnosed AF will have paroxysmal or persistent AF in the long term and reports the corresponding probability. 
In addition, we evaluated the variables driving the prediction in different age and sex subgroups, confirming the importance and specificity of these variables for early differentiation between paroxysmal and persistent AF.\u003c/p\u003e\u003cp\u003eAF is a common cause of stroke, heart failure, cardiovascular death, and dementia\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e,\u003cspan additionalcitationids=\"CR35\" citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u003c/sup\u003e. The global burden of AF is rising\u003csup\u003e\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e\u003c/sup\u003e; patients in the Asia-Pacific region account for the majority of AF patients worldwide, and within that region China ranks fifth in estimated prevalence and first in absolute number of patients\u003csup\u003e\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e\u003c/sup\u003e. China and the wider Asia-Pacific region face challenges that include large disparities in access to healthcare and in the availability of diagnostic technology\u003csup\u003e\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e,\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u003c/sup\u003e. The model and online calculator developed in this study may help narrow these regional disparities in healthcare access and improve the long-term prognosis of AF patients where diagnostic technology is insufficiently available.\u003c/p\u003e\u003cp\u003eThis study identified 10 variables that accurately distinguish paroxysmal from persistent AF at an early stage, partly overlapping with our previous findings\u003csup\u003e\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e. 
Among the key variables identified by the SHAP method, LA was the most important. Previous studies have found LA to be one of the most important indicators for predicting new-onset AF and for early differentiation of AF subtypes\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e,\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e,\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e\u003c/sup\u003e, and an increase in LA is associated with the progression of AF from paroxysmal to persistent\u003csup\u003e\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e. Our results support this view in both the overall population and the subgroups. Recent studies suggest that atrial cardiomyopathy and atrial fibrosis may be the underlying pathogenic mechanisms\u003csup\u003e\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e,\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e\u003c/sup\u003e, with increased left atrial diameter as their clinical manifestation.\u003c/p\u003e\u003cp\u003eAnother key variable is UA: our results suggest that patients with higher UA levels are more likely to be diagnosed with persistent AF, consistent with a recently published review\u003csup\u003e\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e\u003c/sup\u003e. UA levels differ significantly between paroxysmal and persistent AF\u003csup\u003e\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e\u003c/sup\u003e, and a dose-response relationship has been observed in people with different underlying diseases\u003csup\u003e\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e\u003c/sup\u003e. 
Mechanistically, high UA may not only increase the risk of AF indirectly through cardiovascular disease but also promote AF directly through oxidative stress, inflammation, insulin resistance, and activation of the renin-angiotensin-aldosterone system, ultimately leading to electrical remodeling, changes in the autonomic nervous system, abnormal Ca\u003csup\u003e2+\u003c/sup\u003e handling, and atrial remodeling\u003csup\u003e\u003cspan additionalcitationids=\"CR49\" citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e\u003c/sup\u003e. Hyperuricemia is becoming more common worldwide year by year\u003csup\u003e\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e\u003c/sup\u003e and affects approximately 14.0% of adults in China, possibly more\u003csup\u003e\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e\u003c/sup\u003e. The relationship between UA and AF subtype therefore warrants further research.\u003c/p\u003e\u003cp\u003eOur study also confirmed the importance of SBP in differentiating AF subtypes, consistent with a recent study from the China Cardiovascular Care Quality Improvement-Atrial Fibrillation project\u003csup\u003e\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e\u003c/sup\u003e. In AF patients, higher blood pressure on hospital admission has been associated with an increased risk of stroke/transient ischemic attack and heart failure (HF), whereas lower blood pressure has been associated with an increased risk of HF and all-cause mortality. Patients with lower SBP are more likely to have persistent AF, which carries a greater risk of HF. 
Further research is needed to clarify how blood pressure relates to AF type and prognosis.\u003c/p\u003e\u003cp\u003eStudies have shown clear sex differences in the epidemiology and risk factors of AF\u003csup\u003e\u003cspan additionalcitationids=\"CR55\" citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e\u003c/sup\u003e, and the association between increasing age and AF progression has been confirmed\u003csup\u003e\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e. We therefore conducted subgroup analyses combining sex and age, with age cut-off points taken from the newly released Chinese AF Diagnosis and Management Guidelines, which are based on Asian research evidence\u003csup\u003e\u003cspan additionalcitationids=\"CR58 CR59\" citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e\u003c/sup\u003e. Model performance was similar across subgroups, supporting the stability of the model. Comparisons between subgroups were also informative: among men, the AUC was highest in the subgroup aged\u0026thinsp;\u0026lt;\u0026thinsp;60 years, and among women, in the subgroup aged 60\u0026ndash;64 years. Our model also differentiated AF subtypes accurately in younger age groups. Because a recent study found that younger age at first AF diagnosis is associated with a higher risk of stroke\u003csup\u003e\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e\u003c/sup\u003e, this may support earlier anticoagulation decisions for younger patients.\u003c/p\u003e\u003cp\u003eThis study has several limitations. 
First, we distinguished only two AF subtypes, whereas the latest American College of Cardiology (ACC) guidelines propose four stages of AF evolution\u003csup\u003e\u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e62\u003c/span\u003e\u003c/sup\u003e. Larger cohorts covering more subtypes, with long-term follow-up, are needed for accurate multi-class classification and prediction of AF subtype progression\u003csup\u003e\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u003c/sup\u003e. Second, although the patients came from different provinces, all three centers were in China; centers in other regions and countries should be added in the future to broaden the sample. Third, as centers and participants are added, sample sizes will also need to be balanced across centers.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eIn this study, we collected data from 11,986 patients across three tertiary hospitals in China and developed an interpretable machine learning model that accurately distinguishes paroxysmal from persistent AF in patients with first-diagnosed AF. Additionally, we identified novel AF-related risk factors and implemented an online calculator to facilitate the broad application of our findings in Chinese and wider Asian populations. 
By enabling timely and precise identification of AF subtypes, our approach aims to inform clinical decision-making, strengthen preventive strategies, and pave the way for personalized care in AF management.\u003c/p\u003e"},{"header":"Abbreviations","content":"\u003cdiv class=\"DefinitionList\"\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eACC\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003eaccuracy\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eAF\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003eatrial fibrillation\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eAIP\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003eatherogenic index of plasma\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eAUC\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003earea under the curve\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eAV\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003eaortic valve flow velocity\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eBNP\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003ebrain natriuretic peptide\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eCI\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003econfidence interval\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eDBP\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003ediastolic blood pressure\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv 
class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eEHR\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003eelectronic health records\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eESC\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003eEuropean Society of Cardiology\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eHb\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003ehemoglobin\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eHDL-C\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003ehigh-density lipoprotein cholesterol\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eHF\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003eheart failure\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eICD\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003eInternational Classification of Diseases\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eIQR\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003einterquartile range\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eLA\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003eleft atrial diameter\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eLASSO\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003eleast absolute shrinkage and selection operator\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv 
class=\"Term\"\u003eLDL-C\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003elow-density lipoprotein cholesterol\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eLVDd\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003eleft ventricular end-diastolic diameter\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eLVEF\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003eleft ventricular ejection fraction\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eML\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003emachine learning\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eNT-proBNP\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003eN-terminal pro-brain natriuretic peptide\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eRCS\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003erestricted cubic spline\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eRFE\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003erecursive feature elimination\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eROC\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003ereceiver operating characteristic\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eSBP\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003esystolic blood pressure\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv 
class=\"Term\"\u003eSD\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003estandard deviation\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eSEN\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003esensitivity\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eSHAP\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003eSHapley Additive exPlanations\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eSPE\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003especificity\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eTTE\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003etransthoracic echocardiography\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv class=\"DefinitionListEntry\"\u003e\u003cdiv class=\"Term\"\u003eUA\u003c/div\u003e\u003cdiv class=\"Description\"\u003e\u003cp\u003euric acid\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003c/div\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003ch2\u003eConflict of Interest\u003c/h2\u003e\u003cp\u003eThe authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.\u003c/p\u003e\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003cp\u003eNot applicable.\u003c/p\u003e\u003c/p\u003e\u003cp\u003e\u003ch2\u003eEthics approval and consent to participate\u003c/h2\u003e\u003cp\u003eThis study was reviewed and approved by the ethics committees of Sun Yat-sen Memorial Hospital of Sun Yat-sen University (SYSKY-2024-004-01), Tungwah Hospital of Sun Yat-sen University (DHKY-2025-003-01), and Dongguan 
Songshan Lake Tungwah Hospital (SDHKY-2025-002-01). The study adhered to the tenets of the Declaration of Helsinki.\u003c/p\u003e\u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e\u003cp\u003eThis study was partially supported by the National Natural Science Foundation of China (62176016, 72274127), the National Key R\u0026amp;D Program of China (No. 2021YFB2104800), the Guizhou Province Science and Technology Project: Research on Q\u0026amp;A Interactive Virtual Digital People for Intelligent Medical Treatment in Information Innovation Environment (supported by Qiankehe[2024] General 058), the Capital Health Development Research Project (2022-2-2013), the Haidian innovation and translation program from Peking University Third Hospital (HDCXZHKC2023203), and the project Research on the Decision Support System for Urban and Park Carbon Emissions Empowered by Digital Technology - A Special Study on the Monitoring and Identification of Heavy Truck Beidou Carbon Emission Reductions to Chao Tong. It was also supported by a grant from the Key Laboratory of Coronary Intraluminal Imaging and Functional Analysis of Dongguan City to Heng Li, and by the Shenzhen Medical Research Fund (B2302020), the National Natural Science Foundation of China (82330021, 82270771), the Shenzhen Science and Technology Program (KCXFZ20211020163801002, ZDSYS20220606100801004, SGDX20230116092459009), the Shenzhen Key Medical Discipline Construction Fund (SZXK002), the Futian District Public Health Scientific Research Project of Shenzhen (FTWS2022001), the Chinese Association of Integrative Medicine-Shanghai Hutchison Pharmaceuticals Fund (HMPE202202), and the China Heart House-Chinese Cardiovascular Association HX fund (2022-CCA-HX-090) to Hui Huang.\u003c/p\u003e\u003ch2\u003eAuthor Contributions\u003c/h2\u003e\u003cp\u003eChao Tong, Heng Li and Hui Huang are the guarantors of the study. All authors (Sijin Li, Yuqi Zhang, Weijie Wu, Guang Li, Tucheng Huang, Chao Tong, Heng Li and Hui Huang) were involved in the conceptualization and design of the study. 
Sijin Li and Yuqi Zhang were responsible for the experiments. Data cleaning was performed by Weijie Wu, Guang Li and Tucheng Huang. Analysis and interpretation were performed by Sijin Li and Yuqi Zhang under the supervision and with the support of Chao Tong, Heng Li and Hui Huang. The article was drafted by Sijin Li and Yuqi Zhang. All authors (Sijin Li, Yuqi Zhang, Weijie Wu, Guang Li, Tucheng Huang, Chao Tong, Heng Li and Hui Huang) revised and contributed to the intellectual content of the article. All authors (Sijin Li, Yuqi Zhang, Weijie Wu, Guang Li, Tucheng Huang, Chao Tong, Heng Li and Hui Huang) approved the final version of the article, including the authorship list.\u003c/p\u003e\u003ch2\u003eData availability\u003c/h2\u003e\u003cp\u003eSome or all datasets generated and/or analyzed during the present study are not publicly available but are available from the corresponding author on reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eTsao CW, Aday AW, Almarzooq ZI, et al. Heart Disease and Stroke Statistics-2023 Update: A Report From the American Heart Association. Circulation. 2023;147(8):e93\u0026ndash;621.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePetzl AM, Jabbour G, Cadrin-Tourigny J, et al. Innovative approaches to atrial fibrillation prediction: should polygenic scores and machine learning be implemented in clinical practice? Europace. 2024;26(8).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePapanastasiou CA, Theochari CA, Zareifopoulos N, et al. Atrial Fibrillation Is Associated with Cognitive Impairment, All-Cause Dementia, Vascular Dementia, and Alzheimer's Disease: a Systematic Review and Meta-Analysis. J Gen Intern Med. 2021;36(10):3122\u0026ndash;35.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKoh YH, Lew LZW, Franke KB, et al. 
Predictive role of atrial fibrillation in cognitive decline: a systematic review and meta-analysis of 2.8 million individuals. Europace. 2022;24(8):1229\u0026ndash;39.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eQin D, Mansour MC, Ruskin JN, Heist EK. Atrial Fibrillation-Mediated Cardiomyopathy. Circ Arrhythm Electrophysiol. 2019;12(12):e007809.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRienstra M, Tzeis S, Bunting KV, et al. Spotlight on the 2024 ESC/EACTS management of atrial fibrillation guidelines: 10 novel key aspects. Europace. 2024;26(12).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMairesse GH, Moran P, Van Gelder IC, et al. Screening for atrial fibrillation: a European Heart Rhythm Association (EHRA) consensus document endorsed by the Heart Rhythm Society (HRS), Asia Pacific Heart Rhythm Society (APHRS), and Sociedad Latinoamericana de Estimulacion Cardiaca y Electrofisiologia (SOLAECE). Europace. 2017;19(10):1589\u0026ndash;623.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRivard L, Friberg L, Conen D, et al. Atrial Fibrillation and Dementia: A Report From the AF-SCREEN International Collaboration. Circulation. 2022;145(5):392\u0026ndash;409.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKirchhof P, Camm AJ, Goette A, et al. Early Rhythm-Control Therapy in Patients with Atrial Fibrillation. N Engl J Med. 2020;383(14):1305\u0026ndash;16.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVan Gelder IC, Rienstra M, Bunting KV, et al. 2024 ESC Guidelines for the management of atrial fibrillation developed in collaboration with the European Association for Cardio-Thoracic Surgery (EACTS). Eur Heart J. 2024;45(36):3314\u0026ndash;414.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJoglar JA, Chung MK, Armbruster AL, et al. 
2023 ACC/AHA/ACCP/HRS Guideline for the Diagnosis and Management of Atrial Fibrillation: A Report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. Circulation. 2024;149(1):e1\u0026ndash;156.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWriting Committee Members, Joglar JA, Chung MK, et al. 2023 ACC/AHA/ACCP/HRS Guideline for the Diagnosis and Management of Atrial Fibrillation: A Report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. J Am Coll Cardiol. 2024;83(1):109\u0026ndash;279.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLubitz SA, Atlas SJ, Ashburner JM, et al. Screening for Atrial Fibrillation in Older Adults at Primary Care Visits: VITAL-AF Randomized Controlled Trial. Circulation. 2022;145(13):946\u0026ndash;54.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eUittenbogaart SB, Verbiest-van Gurp N, Lucassen WAM, et al. Opportunistic screening versus usual care for detection of atrial fibrillation in primary care: cluster randomised controlled trial. BMJ. 2020;370:m3208.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTzeis S, Gerstenfeld EP, Kalman J, et al. 2024 European Heart Rhythm Association/Heart Rhythm Society/Asia Pacific Heart Rhythm Society/Latin American Heart Rhythm Society expert consensus statement on catheter and surgical ablation of atrial fibrillation. Europace. 2024;26(4).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eNatale A, Mohanty S, Sanders P, et al. Catheter ablation for atrial fibrillation: indications and future perspective. Eur Heart J. 2024;45(41):4383\u0026ndash;98.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eNattel S, Harada M. Atrial remodeling and atrial fibrillation: recent advances and translational perspectives. J Am Coll Cardiol. 
2014;63(22):2335\u0026ndash;45.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePark JS, Cho I, Kim D, et al. Differentiating Left Atrial Pressure Responses in Paroxysmal and Persistent Atrial Fibrillation: Implications for Diagnosing Heart Failure With Preserved Ejection Fraction and Managing Atrial Fibrillation. J Am Heart Assoc. 2024;13(17):e035246.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVanassche T, Lauw MN, Eikelboom JW, et al. Risk of ischaemic stroke according to pattern of atrial fibrillation: analysis of 6563 aspirin-treated patients in ACTIVE-A and AVERROES. Eur Heart J. 2015;36(5):281\u0026ndash;7a.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChiang CE, Naditch-Brule L, Murin J, et al. Distribution and risk profile of paroxysmal, persistent, and permanent atrial fibrillation in routine clinical practice: insight from the real-life global survey evaluating patients with atrial fibrillation international registry. Circ Arrhythm Electrophysiol. 2012;5(4):632\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eOgawa H, An Y, Ikeda S, et al. Progression From Paroxysmal to Sustained Atrial Fibrillation Is Associated With Increased Adverse Events. Stroke. 2018;49(10):2301\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eShi S, Tang Y, Zhao Q, et al. Prevalence and risk of atrial fibrillation in China: A national cross-sectional epidemiological study. Lancet Reg Health West Pac. 2022;23:100439.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGavidia M, Zhu H, Montanari AN, et al. Early warning of atrial fibrillation using deep learning. Patterns (N Y). 2024;5(6):100970.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePetmezas G, Haris K, Stefanopoulos L, et al. Automated atrial fibrillation detection using a hybrid CNN-LSTM network on imbalanced ECG datasets. Biomed Signal Process Control. 
2021;63:102194.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhang Y, Li S, Mai P, et al. A machine learning-based model for predicting paroxysmal and persistent atrial fibrillation based on EHR. BMC Med Inform Decis Mak. 2025;25(1):51.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDaneshvar N, Pandita D, Erickson S, et al. Artificial Intelligence in the Provision of Health Care: An American College of Physicians Policy Position Paper. Ann Intern Med. 2024;177(7):964\u0026ndash;7.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRose C, Chen JH. Learning from the EHR to implement AI in healthcare. NPJ Digit Med. 2024;7(1):330.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhang Y, Li S, Wu W, et al. Machine-learning-based models to predict cardiovascular risk using oculomics and clinic variables in KNHANES. BioData Min. 2024;17(1):12.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eQuer G, Topol EJ. The potential for large language models to transform cardiovascular medicine. Lancet Digit Health. 2024;6(10):e767\u0026ndash;71.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLundberg SM, Erion G, Chen H, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell. 2020;2(1):56\u0026ndash;67.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGupta S, Glezerman IG, Hirsch JS, et al. Derivation and external validation of a simple risk score for predicting severe acute kidney injury after intravenous cisplatin: cohort study. BMJ. 2024;384:e077169.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYang J, Wang T, Li K, Wang Y. Associations between per- and polyfluoroalkyl chemicals and abdominal aortic calcification in middle-aged and older adults. J Adv Res. 2024.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAssempoor R, Daneshvar MS, Taghvaei A, et al. 
Atherogenic index of plasma and coronary artery disease: a systematic review and meta-analysis of observational studies. Cardiovasc Diabetol. 2025;24(1):35.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHindricks G, Potpara T, Dagres N, et al. 2020 ESC Guidelines for the diagnosis and management of atrial fibrillation developed in collaboration with the European Association for Cardio-Thoracic Surgery (EACTS): The Task Force for the diagnosis and management of atrial fibrillation of the European Society of Cardiology (ESC) Developed with the special contribution of the European Heart Rhythm Association (EHRA) of the ESC. Eur Heart J. 2021;42(5):373\u0026ndash;498.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eFabritz L, Crijns H, Guasch E, et al. Dynamic risk assessment to improve quality of care in patients with atrial fibrillation: the 7th AFNET/EHRA Consensus Conference. Europace. 2021;23(3):329\u0026ndash;44.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKim D, Yang PS, Sung JH, et al. Less dementia after catheter ablation for atrial fibrillation: a nationwide cohort study. Eur Heart J. 2020;41(47):4483\u0026ndash;93.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTan S, Zhou J, Veang T, Lin Q, Liu Q. Global, regional, and national burden of atrial fibrillation and atrial flutter from 1990 to 2021: sex differences and global burden projections to 2046: a systematic analysis of the Global Burden of Disease Study 2021. Europace. 2025;27(2).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWong CX, Tse HF, Choi EK, et al. The burden of atrial fibrillation in the Asia-Pacific region. Nat Rev Cardiol. 2024;21(12):841\u0026ndash;3.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiu H, Chen M. Atrial Fibrillation Screening in Asia: Balancing Costs and Benefits for Optimal Outcomes. JACC Asia. 
2025;5(1):172\u0026ndash;3.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLu R, Lumish HS, Hasegawa K, et al. Prediction of new-onset atrial fibrillation in patients with hypertrophic cardiomyopathy using machine learning. Eur J Heart Fail. 2025;27(2):275\u0026ndash;84.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJabbour G, Nolin-Lapalme A, Tastet O, et al. Prediction of incident atrial fibrillation using deep learning, clinical models, and polygenic scores. Eur Heart J. 2024;45(46):4920\u0026ndash;34.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePadfield GJ, Steinberg C, Swampillai J, et al. Progression of paroxysmal to persistent atrial fibrillation: 10-year follow-up in the Canadian Registry of Atrial Fibrillation. Heart Rhythm. 2017;14(6):801\u0026ndash;7.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChoi SH, Jurgens SJ, Xiao L, et al. Sequencing in over 50,000 cases identifies coding and structural variation underlying atrial fibrillation risk. Nat Genet. 2025;57(3):548\u0026ndash;62.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSchotten U, Goette A, Verheule S. Translation of pathophysiological mechanisms of atrial fibrosis into new diagnostic and therapeutic approaches. Nat Rev Cardiol. 2025;22(4):225\u0026ndash;40.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLu Y, Sun Y, Cai L, et al. Non-traditional risk factors for atrial fibrillation: epidemiology, mechanisms, and strategies. Eur Heart J. 2025;46(9):784\u0026ndash;804.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWang X, Hou Y, Wang X, et al. Relationship between serum uric acid levels and different types of atrial fibrillation: An updated meta-analysis. Nutr Metab Cardiovasc Dis. 2021;31(10):2756\u0026ndash;65.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDing M, Viet NN, Gigante B, Lind V, Hammar N, Modig K. 
Elevated Uric Acid Is Associated With New-Onset Atrial Fibrillation: Results From the Swedish AMORIS Cohort. J Am Heart Assoc. 2023;12(3):e027089.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMasi S, Pugliese NR, Taddei S. The difficult relationship between uric acid and cardiovascular disease. Eur Heart J. 2019;40(36):3055\u0026ndash;7.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiu CH, Huang SC, Yin CH, et al. Atrial Fibrillation Risk and Urate-Lowering Therapy in Patients with Gout: A Cohort Study Using a Clinical Database. Biomedicines. 2022;11(1).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYu W, Cheng JD. Uric Acid and Cardiovascular Disease: An Update From Molecular Mechanism to Clinical Perspective. Front Pharmacol. 2020;11:582680.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDehlin M, Jacobsson L, Roddy E. Global epidemiology of gout: prevalence, incidence, treatment patterns and risk factors. Nat Rev Rheumatol. 2020;16(7):380\u0026ndash;90.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDu L, Zong Y, Li H, et al. Hyperuricemia and its related diseases: mechanisms and advances in therapy. Signal Transduct Target Ther. 2024;9(1):212.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSun Z, Hao Y, Liu J, et al. Blood pressure and in-hospital outcomes in patients hospitalized with atrial fibrillation: findings from the CCC-AF project. Hypertens Res. 2025.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKjerpeseth LJ, Igland J, Selmer R, et al. Prevalence and incidence rates of atrial fibrillation in Norway 2004\u0026ndash;2014. Heart. 2021;107(3):201\u0026ndash;7.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAl-Khayatt BM, Salciccioli JD, Marshall DC, Krahn AD, Shalhoub J, Sikkel MB. Paradoxical impact of socioeconomic factors on outcome of atrial fibrillation in Europe: trends in incidence and mortality from atrial fibrillation. Eur Heart J. 
2021;42(8):847\u0026ndash;57.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSharashova E, Gerdts E, Ball J, et al. Long-term pulse pressure trajectories and risk of incident atrial fibrillation: the Tromso Study. Eur Heart J. 2025.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMa C, Wu S, Liu S, Han Y. Chinese guidelines for the diagnosis and management of atrial fibrillation. Pacing Clin Electrophysiol. 2024;47(6):714\u0026ndash;70.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKim TH, Yang PS, Yu HT, et al. Age Threshold for Ischemic Stroke Risk in Atrial Fibrillation. Stroke. 2018;49(8):1872\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLi YG, Lee SR, Choi EK, Lip GY. Stroke Prevention in Atrial Fibrillation: Focus on Asian Patients. Korean Circ J. 2018;48(8):665\u0026ndash;84.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChoi SY, Kim MH, Lee KM, et al. Age-Dependent Anticoagulant Therapy for Atrial Fibrillation Patients with Intermediate Risk of Ischemic Stroke: A Nationwide Population-Based Study. Thromb Haemost. 2021;121(9):1151\u0026ndash;60.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCheng YJ, Deng H, Wei HQ, et al. Association Between Age at Diagnosis of Atrial Fibrillation and Subsequent Risk of Ischemic Stroke. J Am Heart Assoc. 2025;14(4):e038367.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKo D, Chung MK, Evans PT, Benjamin EJ, Helm RH. Atrial Fibrillation: A Review. JAMA. 
2025;333(4):329\u0026ndash;42.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":false,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"biodata-mining","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"bidm","sideBox":"Learn more about [BioData Mining](http://biodatamining.biomedcentral.com/)","snPcode":"13040","submissionUrl":"https://submission.nature.com/new-submission/13040/3","title":"BioData Mining","twitterHandle":"@BioMedCentral","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"BMC/SO AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Paroxysmal atrial fibrillation, Persistent atrial fibrillation, Subtype classification, Machine learning, Multicenter retrospective study","lastPublishedDoi":"10.21203/rs.3.rs-7302454/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7302454/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cb\u003eAims\u003c/b\u003e\u003c/p\u003e\u003cp\u003eAtrial fibrillation (AF) is a common arrhythmia associated with increased risks of stroke and heart failure. Early differentiation between paroxysmal and persistent AF at first diagnosis is critical for guiding treatment decisions. 
This study aimed to develop an interpretable machine learning model based on structured electronic health records (EHR) to distinguish AF subtypes and identify key contributing factors.\u003c/p\u003e\u003cp\u003e\u003cb\u003eMethods and results\u003c/b\u003e\u003c/p\u003e\u003cp\u003eIn this multicenter, retrospective cohort study, data were collected from three tertiary hospitals in China between January 2013 and January 2023. A total of 11,986 patients with suspected AF were screened, of whom 4185 patients with first-diagnosed AF were included (paroxysmal: 2565 [61.3%]; persistent: 1620 [38.7%]). Structured EHR variables, including clinical demographics, serological indicators, and echocardiographic parameters, were extracted. Variable selection was performed using Spearman correlation and least absolute shrinkage and selection operator (LASSO) regression. Three machine learning algorithms were trained and externally validated. The CatBoost model achieved the best performance, with an area under the receiver operating characteristic curve of 0.876 (95% CI: 0.871\u0026ndash;0.880) and accuracy of 0.808 (95% CI: 0.803\u0026ndash;0.816). Sensitivity and specificity ranged from 0.802 to 0.811. Shapley additive explanations (SHAP) were used to interpret model outputs and identify the variables most strongly associated with AF subtype classification.\u003c/p\u003e\u003cp\u003e\u003cb\u003eConclusion\u003c/b\u003e\u003c/p\u003e\u003cp\u003eThis multicenter study demonstrates that interpretable machine learning models based on structured EHR data can accurately distinguish paroxysmal from persistent AF at first diagnosis. 
The proposed model may facilitate early subtype-specific risk stratification and personalized treatment, potentially improving outcomes and reducing disparities in AF care across different medical conditions.\u003c/p\u003e","manuscriptTitle":"Early differentiation between paroxysmal and persistent atrial fibrillation based on interpretable machine learning: a multicenter retrospective study","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-09-09 13:20:02","doi":"10.21203/rs.3.rs-7302454/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2026-01-16T21:51:32+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-01-16T21:50:41+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"217816317559526093989216347582739998680","date":"2025-10-09T22:59:09+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-09-02T14:02:55+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-08-08T19:43:16+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-08-08T07:59:26+00:00","index":"","fulltext":""},{"type":"submitted","content":"BioData Mining","date":"2025-08-05T15:37:28+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"biodata-mining","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"bidm","sideBox":"Learn more about [BioData Mining](http://biodatamining.biomedcentral.com/)","snPcode":"13040","submissionUrl":"https://submission.nature.com/new-submission/13040/3","title":"BioData Mining","twitterHandle":"@BioMedCentral","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"BMC/SO 
AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"b8b33675-029c-49d4-95b0-29c250c3ddc7","owner":[],"postedDate":"September 9th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2026-02-09T16:02:11+00:00","versionOfRecord":{"articleIdentity":"rs-7302454","link":"https://doi.org/10.1186/s13040-026-00525-5","journal":{"identity":"biodata-mining","isVorOnly":false,"title":"BioData Mining"},"publishedOn":"2026-02-06 15:58:42","publishedOnDateReadable":"February 6th, 2026"},"versionCreatedAt":"2025-09-09 13:20:02","video":"","vorDoi":"10.1186/s13040-026-00525-5","vorDoiUrl":"https://doi.org/10.1186/s13040-026-00525-5","workflowStages":[]},"version":"v1","identity":"rs-7302454","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7302454","identity":"rs-7302454","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
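The abstract reports the CatBoost model's AUC, accuracy, sensitivity and specificity with 95% confidence intervals but does not say how those intervals were obtained. As an illustration only, the sketch below assumes a percentile bootstrap that resamples patients, treats persistent AF as the positive class (label 1) and uses a 0.5 decision threshold. These choices, and the function names `average_ranks`, `roc_auc`, `sensitivity_specificity` and `bootstrap_ci`, are hypothetical and not taken from the authors' code. It uses only the Python standard library and computes the AUC through its Mann-Whitney rank form, which equals the area under the empirical ROC curve.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Assumed label convention for this sketch: 1 = persistent AF, 0 = paroxysmal AF.


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Grow the tie group while the next value in sorted order is equal.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC from the Mann-Whitney U statistic; tied scores count one half."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case of each class")
    ranks = average_ranks(y_score)
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def sensitivity_specificity(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity and specificity when a score >= threshold is called positive."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case of each class")
    return tp / (tp + fn), tn / (tn + fp)


def bootstrap_ci(
    y_true: Sequence[int],
    y_score: Sequence[float],
    metric: Callable[[Sequence[int], Sequence[float]], float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Point estimate and percentile-bootstrap (1 - alpha) interval over patients."""
    point = metric(y_true, y_score)  # fails early if the full sample is one-class
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_true = [y_true[i] for i in idx]
        if len(set(resampled_true)) < 2:
            continue  # one-class resample: metric undefined, so draw again
        stats.append(metric(resampled_true, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return point, lower, upper


if __name__ == "__main__":
    # Synthetic toy data only; it says nothing about the study cohort.
    rng = random.Random(42)
    labels = [rng.randint(0, 1) for _ in range(200)]
    scores = [min(1.0, max(0.0, rng.gauss(0.35 + 0.3 * y, 0.15))) for y in labels]
    auc, lo, hi = bootstrap_ci(labels, scores, roc_auc, n_boot=500)
    sens, spec = sensitivity_specificity(labels, scores)
    print(f"AUC {auc:.3f} (95% CI {lo:.3f} to {hi:.3f}); SEN {sens:.3f}, SPE {spec:.3f}")
```

In practice the scores would be out-of-sample predicted probabilities on the external validation cohort. For cohorts of several thousand patients, a vectorized routine such as scikit-learn's `roc_auc_score` applied to NumPy-resampled indices would be faster than this pure-Python loop. Sensitivity or specificity can be bootstrapped the same way by passing a wrapper such as `lambda yt, ys: sensitivity_specificity(yt, ys)[0]` as `metric`.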
