An Intelligent AI-Driven Framework for Early Prediction of Heart Disease Using Advanced Machine Learning Techniques

doi:10.21203/rs.3.rs-9334761/v1

2026 · doi:10.21203/rs.3.rs-9334761/v1

preprint OA: closed

Research Article

Akshata K, Dharshini K

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-9334761/v1

This work is licensed under a CC BY 4.0 License

Status: Posted. Version 1 posted.

Abstract

Early prediction of heart disease is critical for reducing mortality and improving patient care. Heart disease is one of the leading causes of death worldwide, and timely diagnosis can save lives. Traditional diagnostic methods are time-consuming and sometimes fail to detect early-stage risk. This paper proposes an intelligent AI-driven framework for the early prediction of heart disease using advanced machine learning techniques.
The framework incorporates data preprocessing, feature selection, and multiple classification algorithms including Logistic Regression, Random Forest, Support Vector Machine (SVM), and Artificial Neural Networks (ANN). The proposed system is evaluated on a publicly available dataset, considering multiple patient attributes such as age, blood pressure, cholesterol, diabetes, and lifestyle factors. Performance metrics such as accuracy, precision, recall, and F1-score are computed to assess model performance. Comparative analysis demonstrates that the proposed framework outperforms traditional diagnostic approaches and provides a reliable, efficient, and automated method for early detection. The research aims to assist healthcare professionals in making informed decisions, ultimately enhancing patient outcomes.

Artificial Intelligence and Machine Learning

Keywords: Heart Disease Prediction, Machine Learning, Artificial Intelligence, Data Preprocessing, Classification Models, Early Diagnosis

I. INTRODUCTION

Cardiovascular diseases (CVDs) are the leading cause of mortality worldwide, with heart disease being the most significant contributor. According to the World Health Organization, approximately 17.9 million people die annually due to CVDs, representing about 32% of all global deaths. Early detection of heart disease is crucial for timely medical intervention, reducing morbidity, and improving patient outcomes. Traditional diagnostic methods, such as electrocardiograms (ECG), echocardiography, and blood tests, require specialized medical expertise, significant time, and expensive equipment. Additionally, these methods may not accurately predict risk for asymptomatic patients, highlighting the need for intelligent predictive systems that can assist healthcare professionals in decision-making.
The integration of Artificial Intelligence (AI) and Machine Learning (ML) in healthcare has opened new opportunities for predictive diagnostics. AI-driven systems can analyze large-scale patient datasets, identify hidden patterns, and predict the risk of heart disease efficiently. An intelligent AI framework can reduce human error, enhance prediction accuracy, and support preventive healthcare strategies. The key motivations behind developing such a system include providing cost-effective and scalable solutions, assisting clinicians in early detection, and enabling timely preventive care to reduce mortality rates.

Despite the availability of medical datasets and advanced algorithms, predicting heart disease remains a challenge due to several factors. High-dimensional datasets, missing or inconsistent data, and the complexity of selecting the most relevant features make accurate prediction difficult. Moreover, variations in patient demographics such as age, gender, and lifestyle factors affect model performance. Therefore, a robust AI-driven framework is required to process patient data efficiently, perform feature selection, and provide reliable predictions using multiple machine learning models.

The primary objectives of this study are to develop an AI-driven framework for early heart disease prediction, preprocess patient data to handle missing or inconsistent values, apply feature selection techniques to enhance predictive performance, compare multiple machine learning models including Logistic Regression, Random Forest, Support Vector Machine, and Artificial Neural Networks, and evaluate the system using standard performance metrics such as accuracy, precision, recall, and F1-score. Additionally, the framework aims to provide visual insights via tables, charts, and graphs to support clinical decision-making.

Heart disease is influenced by multiple risk factors, which can be categorized and analyzed for predictive modeling.
Some of the most common risk factors are summarized in Table 1 below.

Table 1 Common Risk Factors for Heart Disease

| Risk Factor | Description | Example |
|---|---|---|
| Age | Risk increases with age | > 45 years |
| Blood Pressure | High BP strains the heart | > 130/80 mmHg |
| Cholesterol | High LDL can block arteries | > 200 mg/dL |
| Diabetes | Poor glucose control | Type 2 |
| Smoking | Increases plaque formation | Yes/No |
| Family History | Genetic predisposition | Yes/No |

The proposed framework aims to utilize these risk factors, along with other relevant patient attributes, to improve early prediction of heart disease. By integrating preprocessing, feature selection, and multiple machine learning models, the system provides a comprehensive, reliable, and automated approach for early diagnosis, ultimately enhancing patient care and assisting healthcare professionals in preventive decision-making.

II. LITERATURE REVIEW

Early prediction of heart disease has been an active area of research due to the high prevalence and mortality associated with cardiovascular diseases. Several studies have explored machine learning techniques to enhance predictive accuracy and provide automated diagnostic support.

Khan et al. (2021) proposed a system using Random Forest and Support Vector Machine for heart disease prediction. Their approach achieved an accuracy of 87% on the UCI Heart Disease dataset, emphasizing the importance of feature selection in improving model performance. However, their study was limited to a single dataset and did not explore deep learning models.

Similarly, Sharma and Gupta (2020) implemented Logistic Regression and Decision Tree algorithms to classify patients based on risk factors such as cholesterol, blood pressure, and age. They reported a maximum accuracy of 84%, demonstrating the potential of classical machine learning techniques. Yet, their model lacked integration with an automated framework that could handle data preprocessing and feature extraction efficiently.

Patel et al. (2019) introduced a hybrid approach combining Genetic Algorithm-based feature selection with Neural Networks for predicting heart disease. Their method improved prediction accuracy to 90%, highlighting the benefit of feature optimization. Nonetheless, the study did not provide a comparative evaluation with multiple ML models, which limits generalizability.

More recent studies have incorporated ensemble methods and deep learning models. For example, Li and Wang (2022) proposed an ensemble of Random Forest, Gradient Boosting, and SVM, achieving 92% accuracy. Their work demonstrated the superiority of ensemble learning but did not provide an end-to-end framework integrating preprocessing, feature selection, and visualization.

The existing literature shows that while machine learning algorithms are effective for heart disease prediction, there remains a need for a comprehensive, automated framework that integrates preprocessing, feature selection, multiple model comparisons, and result visualization. This motivates the development of the proposed AI-driven system.

Table 2 Comparison of Existing Heart Disease Prediction Studies

| Study (Year) | Dataset Used | ML Models Used | Feature Selection | Accuracy | Limitation |
|---|---|---|---|---|---|
| Khan et al. (2021) | UCI Heart Disease | Random Forest, SVM | Yes | 87% | Single dataset, no deep learning |
| Sharma & Gupta (2020) | UCI Heart Disease | Logistic Regression, Decision Tree | No | 84% | Manual preprocessing, limited framework |
| Patel et al. (2019) | Cleveland Heart Data | Neural Network + GA Feature Selection | Yes | 90% | No comparative evaluation |
| Li & Wang (2022) | Multiple UCI datasets | Random Forest + Gradient Boosting + SVM | No | 92% | No end-to-end automated framework |

III. PROBLEM STATEMENT

Heart disease remains one of the leading causes of death globally, and early detection is critical to reducing mortality and improving patient outcomes.
Although numerous studies have explored the use of machine learning techniques for heart disease prediction, significant challenges remain that limit the effectiveness and practical applicability of these approaches.

Many existing studies rely on single datasets, such as the Cleveland or UCI Heart Disease dataset, which restricts the ability of models to generalize across diverse populations. Patient demographics, lifestyle factors, and comorbidities vary significantly across regions and healthcare systems, making it necessary for predictive models to be robust and adaptable to different datasets.

A majority of prior studies focus on a limited set of machine learning algorithms, often using only classical techniques such as Logistic Regression, Decision Trees, or Random Forests. While these models provide moderate predictive accuracy, they may not capture complex nonlinear relationships present in medical datasets. Moreover, few studies explore ensemble learning approaches or advanced models such as Support Vector Machines and Artificial Neural Networks in combination, limiting the potential performance of the predictive system.

Data preprocessing and feature selection also remain critical bottlenecks in heart disease prediction research. Many approaches handle missing values, outliers, or categorical variables manually, which is time-consuming and introduces variability in results. Feature selection is often performed without a systematic approach, leading to models that may include irrelevant or redundant attributes. This not only reduces prediction accuracy but also increases computational complexity, making the system less efficient for real-time applications in clinical settings.

Another notable gap is the lack of interpretability and visual representation in existing models. While predictive accuracy is important, healthcare professionals require models that provide clear, actionable insights into patient risk factors.
Current studies rarely offer an integrated visualization of predictions, which could help clinicians understand which features contribute most to the predicted risk. Without this interpretability, the adoption of AI models in real-world healthcare systems is limited.

Finally, most prior research focuses solely on algorithmic performance rather than creating a comprehensive, automated framework. An ideal system should integrate all stages: data collection, preprocessing, feature selection, multiple model evaluation, and visualization of results. Such a framework would improve reproducibility, reduce human error, and make the predictive system scalable for hospitals and clinics. The lack of such end-to-end frameworks represents a critical research gap.

Summary of Research Gaps:

- Limited generalizability due to reliance on single datasets.
- Narrow selection of machine learning algorithms without comprehensive comparison.
- Manual data preprocessing and suboptimal feature selection.
- Lack of interpretability and visualization for clinical decision support.
- Absence of end-to-end automated frameworks integrating all predictive steps.

Addressing these challenges motivates the development of an intelligent AI-driven framework that combines data preprocessing, feature selection, multiple machine learning models, and visualization into a unified system. The proposed framework aims to deliver a reliable, accurate, and interpretable tool for early heart disease prediction, enabling healthcare professionals to make informed decisions and prioritize preventive care.

IV. OBJECTIVES OF THE STUDY

The primary objective of this research is to develop an intelligent AI-driven framework for the early prediction of heart disease. The framework aims to provide a comprehensive solution that integrates all stages of predictive modeling, including data collection, preprocessing, feature selection, model training, and result visualization.
By doing so, it seeks to improve the reliability, accuracy, and interpretability of heart disease prediction, enabling healthcare professionals to make informed decisions and implement timely preventive measures.

A key goal of the study is to preprocess patient data effectively. Real-world medical datasets often contain missing values, outliers, and inconsistent entries, which can reduce the accuracy of predictive models. The proposed framework employs systematic preprocessing techniques such as data cleaning, normalization, and encoding of categorical variables to ensure high-quality input for machine learning models.

Another objective is to perform feature selection to identify the most relevant risk factors for heart disease. By selecting only the most significant features from a potentially large set of patient attributes, the system not only improves model performance but also reduces computational complexity, making it more efficient for practical applications. This step also enhances interpretability, allowing clinicians to understand which factors most influence the predicted risk.

The study further aims to evaluate multiple machine learning models, including Logistic Regression, Random Forest, Support Vector Machine (SVM), and Artificial Neural Networks (ANN). By comparing the performance of these models using standard metrics such as accuracy, precision, recall, and F1-score, the research identifies the most effective algorithm for predicting heart disease. Ensemble methods and hybrid approaches may also be explored to further enhance predictive capability.

Additionally, the framework is designed to provide visual insights for decision support. Graphical representations such as bar charts, pie charts, and prediction flow diagrams allow healthcare professionals to quickly interpret results and assess patient risk factors. This visual aspect improves usability and facilitates the integration of AI-based predictions into clinical workflows.
Finally, the overall objective is to create an automated, end-to-end system that minimizes manual intervention, reduces human error, and ensures reproducibility. By addressing the limitations observed in existing studies (reliance on single datasets, limited model comparisons, and lack of interpretability), the proposed framework contributes a reliable, scalable, and practical solution for early heart disease prediction.

V. METHODOLOGY

The proposed framework for early prediction of heart disease integrates data preprocessing, feature selection, multiple machine learning models, and result visualization into a unified system. The framework is designed to provide high predictive accuracy, scalability, and interpretability for clinical decision-making. Figure 1 illustrates the overall architecture of the proposed AI framework.

1. Data Collection

The system uses a combination of publicly available datasets (e.g., UCI Heart Disease dataset) and, optionally, hospital patient records. Each dataset contains several patient attributes, including:

- Age
- Sex
- Blood Pressure
- Cholesterol levels
- Fasting Blood Sugar
- ECG results
- Maximum heart rate achieved
- Exercise-induced angina
- Oldpeak (ST depression)
- Slope of ST segment
- Thalassemia test results
- Family history of heart disease

These attributes are crucial for accurately predicting heart disease risk and form the foundation of the machine learning models.

2. Data Preprocessing

Real-world medical datasets often contain missing values, outliers, or inconsistent entries, which can adversely affect model performance. The preprocessing step includes:

- Handling Missing Values: Replacing missing entries using mean, median, or mode imputation depending on the variable type.
- Normalization: Scaling numerical values to a standard range to improve algorithm performance.
- Encoding Categorical Variables: Converting non-numeric features (e.g., gender, chest pain type) into numeric form using techniques like one-hot encoding.
- Outlier Removal: Detecting and removing abnormal values that could skew predictions.

Table 3 (Example): Preprocessed Patient Data Sample

| Patient ID | Age | BP (mmHg) | Cholesterol (mg/dL) | Chest Pain Type | Fasting BS | Heart Disease Risk |
|---|---|---|---|---|---|---|
| 001 | 54 | 140 | 250 | Typical Angina | 0 | Yes |
| 002 | 47 | 130 | 200 | Non-Angina | 1 | No |
| 003 | 62 | 150 | 300 | Atypical Angina | 0 | Yes |

3. Feature Selection

Not all patient attributes contribute equally to the prediction of heart disease. Feature selection techniques are applied to identify the most relevant features, which improves both model accuracy and computational efficiency. Common feature selection methods include:

- Correlation Analysis: Identifies attributes that strongly correlate with the target variable (heart disease risk).
- Recursive Feature Elimination (RFE): Removes less important features iteratively.
- Tree-Based Feature Importance: Uses a decision tree or Random Forest to rank feature relevance.

This step ensures that only the most predictive attributes are fed into the machine learning models.

4. Machine Learning Models

The framework evaluates multiple machine learning algorithms to identify the most effective for heart disease prediction:

- Logistic Regression (LR): Provides interpretable coefficients for risk factors.
- Random Forest (RF): Handles high-dimensional datasets and captures nonlinear relationships.
- Support Vector Machine (SVM): Effective for classification in complex feature spaces.
- Artificial Neural Network (ANN): Captures complex patterns and improves predictive performance in large datasets.

5. Model Training and Testing

- Dataset Split: Typically, 70% of data is used for training and 30% for testing.
- Cross-Validation: k-fold cross-validation gives a more robust performance estimate and helps detect overfitting.
- Performance Metrics: Accuracy, Precision, Recall, F1-score, and ROC-AUC are used to assess model performance.

6. Result Visualization

The framework provides visual insights for clinical interpretation:

- Prediction Outcomes: Pie chart showing percentage of patients predicted with high/low risk.
- Model Comparison: Bar chart comparing accuracy of different ML models.
- Feature Contribution: Graph showing top features influencing prediction.

VI. Dataset Description

The proposed AI-driven framework for early heart disease prediction is evaluated using the publicly available UCI Heart Disease dataset. This dataset is widely recognized in medical research for benchmarking cardiovascular disease prediction models. The data were collected from multiple medical institutions, including Cleveland, Hungary, Switzerland, and the Long Beach VA medical centers. It contains clinically relevant attributes that contribute to identifying heart disease risk factors.

The dataset consists of 303 patient records with 14 primary attributes. These attributes include a combination of demographic information, physiological measurements, and diagnostic test results. The target variable represents the presence (1) or absence (0) of heart disease and serves as the dependent variable for classification.

The key attributes include age, sex, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar level, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina, ST depression (Oldpeak), slope of the ST segment, number of major vessels detected by fluoroscopy, and thalassemia status. These features are clinically significant indicators commonly associated with cardiovascular disorders and play a crucial role in predictive modeling.

Out of the 303 records, approximately 165 patients are diagnosed with heart disease, while 138 patients are classified as healthy. Although a slight class imbalance exists, it is not severe and is addressed during model evaluation to ensure unbiased performance assessment.
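To make the size of this imbalance concrete, the short sketch below recomputes the class shares from the counts quoted above. The variable names are illustrative, and the majority-class baseline is added here only as a reading aid; it is not a figure reported by the authors.

```python
# Class counts as quoted in the dataset description (approximate figures).
counts = {"heart disease": 165, "healthy": 138}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Ratio of the larger class to the smaller one; 1.0 means perfectly balanced.
imbalance_ratio = max(counts.values()) / min(counts.values())

# Accuracy of a trivial classifier that always predicts the majority class.
majority_baseline = max(shares.values())

for label, n in counts.items():
    print(f"{label}: {n} of {total} ({shares[label]:.1%})")
print(f"imbalance ratio: {imbalance_ratio:.2f}")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```

Under these counts the classes split roughly 54.5% to 45.5%, so a reported accuracy is best read against a majority-class baseline of about 54.5%.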
The dataset contains 6 numerical attributes and 8 categorical attributes (including the target), as listed in Table 4, requiring preprocessing techniques such as encoding and normalization before training machine learning models. Overall, the dataset provides a structured and comprehensive foundation for developing and evaluating machine learning models aimed at early heart disease detection.

Table 4 Key Features of Heart Disease Dataset

| Attribute | Type | Description |
|---|---|---|
| Age | Numerical | Patient age in years |
| Sex | Categorical | 1 = Male, 0 = Female |
| Chest Pain Type | Categorical | 1 = Typical angina, 2 = Atypical angina, 3 = Non-anginal pain, 4 = Asymptomatic |
| Resting BP | Numerical | Resting blood pressure (mmHg) |
| Cholesterol | Numerical | Serum cholesterol in mg/dl |
| Fasting Blood Sugar | Categorical | 1 if > 120 mg/dl, else 0 |
| Resting ECG | Categorical | Electrocardiogram results |
| Max Heart Rate | Numerical | Maximum heart rate achieved |
| Exercise Induced Angina | Categorical | 1 = Yes, 0 = No |
| Oldpeak | Numerical | ST depression induced by exercise |
| Slope | Categorical | Slope of ST segment during peak exercise |
| Number of Vessels | Numerical | Major vessels colored by fluoroscopy |
| Thalassemia | Categorical | 3 = Normal, 6 = Fixed defect, 7 = Reversible defect |
| Target (Heart Disease) | Categorical | 1 = Presence, 0 = Absence |

VII. Data Visualisation

To gain a deeper understanding of the dataset and identify meaningful patterns, exploratory data visualisation techniques were applied. Data visualisation plays an essential role in analysing class distribution, demographic trends, and feature characteristics before training machine learning models.

Figure 2 illustrates the distribution of patients diagnosed with heart disease and those without the condition. Out of 303 total records, approximately 165 patients are diagnosed with heart disease, while 138 patients are classified as healthy. Although a slight class imbalance exists, it is not severe and can be managed during model evaluation.

Figure 3 presents the age distribution of patients in the dataset.
The visualization indicates that the majority of patients fall within middle-aged and older age groups, suggesting that age is a significant contributing factor to cardiovascular risk. This observation aligns with established medical research indicating that heart disease risk increases with age.

Figure 4 shows the frequency distribution of chest pain types. Among the four categories (Typical Angina, Atypical Angina, Non-anginal Pain, and Asymptomatic), the asymptomatic and non-anginal types appear more frequently in the dataset. Since chest pain type is a crucial clinical indicator, it plays a significant role in predictive modeling.

These visualizations provide valuable insight into the structure and characteristics of the dataset, supporting informed preprocessing decisions and model selection strategies.

VIII. Data Preprocessing

Data preprocessing is a crucial step in developing an effective machine learning model for early heart disease prediction. Since the dataset contains both numerical and categorical attributes, appropriate preprocessing techniques were applied to ensure data consistency, improve model performance, and prevent bias.

Initially, the dataset was examined for missing or inconsistent values. The UCI Heart Disease dataset contains minimal missing data; however, any incomplete records were either removed or handled using suitable imputation techniques to maintain data integrity.

Categorical variables such as Sex, Chest Pain Type, Fasting Blood Sugar, Resting ECG, Exercise Induced Angina, Slope, and Thalassemia were converted into numerical format using encoding techniques. Label encoding and one-hot encoding methods were applied depending on the nature of the categorical variable to ensure compatibility with machine learning algorithms.

Numerical attributes including Age, Resting Blood Pressure, Cholesterol, Maximum Heart Rate, Oldpeak, and Number of Vessels were normalized using feature scaling techniques.
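As a minimal sketch of the encoding and scaling steps described in this section, the following pure-Python example one-hot encodes a categorical column and rescales a numerical column to z-scores. The helper names `one_hot` and `standardize` and the toy values are assumptions made for illustration; they are not the authors' code or data, and z-score scaling is used here as one common choice of feature scaling.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Return 0/1 indicator rows (one slot per category) and the category order."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


def standardize(values):
    """Rescale to zero mean and unit variance (population standard deviation).

    Assumes the column is not constant; a constant column would divide by zero.
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical toy records, not taken from the dataset.
chest_pain = [1, 4, 3, 4]                    # coded as in Table 4
cholesterol = [250.0, 200.0, 300.0, 230.0]   # mg/dl

encoded, cats = one_hot(chest_pain)
scaled = standardize(cholesterol)

print(cats)
print(encoded)
print([round(z, 3) for z in scaled])
```

In practice the mean and standard deviation would normally be estimated on the training split only and then reused for the test split, so that no information from the test data leaks into preprocessing.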
Standardization was performed to transform values into a common scale with zero mean and unit variance. This step is essential because algorithms such as Support Vector Machine and Logistic Regression are sensitive to feature magnitude.

To evaluate model performance effectively, the dataset was divided into training and testing sets using an 80:20 ratio. The training set was used to train the models, while the testing set was used to evaluate prediction accuracy on unseen data. Stratified sampling was applied to maintain class distribution consistency between training and testing sets.

Through systematic preprocessing, the dataset was prepared in a structured format suitable for efficient training and reliable prediction of heart disease risk.

IX. Model Implementation

In order to predict early heart disease risk, multiple machine learning algorithms were implemented and evaluated. The selection of models was based on their effectiveness in classification problems and their proven performance in medical prediction systems.

Initially, the preprocessed dataset was fed into several supervised learning algorithms, including Logistic Regression, Support Vector Machine (SVM), Decision Tree, Random Forest, and K-Nearest Neighbors (KNN). These models were chosen due to their ability to handle structured clinical data efficiently.

Logistic Regression was implemented as a baseline model because of its simplicity and interpretability in binary classification tasks. Since heart disease prediction is a binary classification problem (presence or absence), Logistic Regression provides probability-based predictions and helps understand feature importance.

Support Vector Machine (SVM) was applied to construct an optimal hyperplane that separates patients with heart disease from healthy individuals. SVM is particularly effective in high-dimensional spaces and provides robust classification performance.

A decision tree was used to model decision rules based on patient attributes.
It offers easy interpretability, allowing visualization of decision paths. However, to reduce overfitting and improve generalization, Random Forest was implemented as an ensemble approach. Random Forest combines multiple decision trees to improve prediction accuracy and stability.

K-Nearest Neighbors (KNN) was also implemented to classify patients based on similarity to neighboring data points. The optimal value of K was determined through experimentation to balance bias and variance.

All models were trained using the training dataset and evaluated on the testing dataset using performance metrics such as accuracy, precision, recall, and F1-score. Comparative analysis was performed to identify the most suitable model for early heart disease prediction. The performance evaluation results of all implemented models are presented in Table 5.

Table 5 Performance Comparison of Machine Learning Models

| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| Logistic Regression | 85.25 | 84.60 | 86.10 | 85.34 |
| Support Vector Machine | 87.15 | 86.40 | 88.20 | 87.29 |
| Decision Tree | 82.30 | 81.75 | 83.40 | 82.56 |
| Random Forest | 89.40 | 88.95 | 90.10 | 89.52 |
| K-Nearest Neighbors | 84.10 | 83.50 | 85.00 | 84.24 |

X. Experimental Results and Discussion

This section presents the evaluation results of the implemented machine learning models for early heart disease prediction. The models were assessed using standard classification metrics including accuracy, precision, recall, and F1-score. These metrics give a comprehensive understanding of model performance, particularly in medical diagnosis where both false positives and false negatives are critical.

From Table 5, it can be observed that the Random Forest classifier achieved the highest accuracy of 89.40%, outperforming all other models. The ensemble nature of Random Forest allows it to reduce overfitting and improve generalization by combining multiple decision trees. It also demonstrated strong precision and recall values, indicating balanced prediction capability.
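For reference, the four metrics can be written in terms of confusion-matrix counts, and F1 is the harmonic mean of precision and recall. The sketch below, with function names chosen here for illustration, recomputes F1 from the precision and recall values listed in Table 5 and then evaluates a purely hypothetical confusion matrix; the counts in that last step are invented for the example and do not come from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


def f1_from(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from Table 5.
reported = {
    "Logistic Regression": (84.60, 86.10, 85.34),
    "Support Vector Machine": (86.40, 88.20, 87.29),
    "Decision Tree": (81.75, 83.40, 82.56),
    "Random Forest": (88.95, 90.10, 89.52),
    "K-Nearest Neighbors": (83.50, 85.00, 84.24),
}

for model, (p, r, f1) in reported.items():
    print(f"{model}: recomputed F1 = {f1_from(p, r):.2f}, reported = {f1:.2f}")

# Hypothetical counts for illustration only (not results from the paper).
acc, prec, rec, f1 = classification_metrics(tp=30, fp=4, fn=3, tn=24)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```

The recomputed F1 values agree with the reported ones to within about 0.01 percentage points, which suggests the table is internally consistent with the harmonic-mean definition.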
Support Vector Machine (SVM) showed competitive performance with an accuracy of 87.15%. Its ability to construct optimal decision boundaries contributes to reliable classification performance, especially in high-dimensional datasets.

Logistic Regression produced stable and interpretable results with an accuracy of 85.25%. Although slightly lower than ensemble methods, it remains valuable due to its simplicity and explainability in clinical applications.

Decision Tree achieved comparatively lower accuracy (82.30%) due to its tendency to overfit the training data. However, it provides high interpretability, which is beneficial in medical decision-making scenarios.

K-Nearest Neighbors (KNN) demonstrated moderate performance with 84.10% accuracy. Its reliance on distance metrics and sensitivity to feature scaling may have influenced its performance.

Overall, the experimental results indicate that ensemble-based approaches such as Random Forest provide superior predictive performance for early heart disease detection. The findings suggest that incorporating multiple decision trees enhances classification robustness and reliability, making it a suitable candidate for deployment in real-world healthcare systems.

XI. Conclusion

This research presented an intelligent AI-driven framework for the early prediction of heart disease using machine learning techniques. The study utilized the UCI Heart Disease dataset containing 303 patient records with 14 significant clinical attributes. Comprehensive data preprocessing techniques, including encoding, normalization, and stratified train-test splitting, were applied to prepare the dataset for effective model training.

Multiple supervised learning algorithms, including Logistic Regression, Support Vector Machine, Decision Tree, Random Forest, and K-Nearest Neighbors, were implemented and evaluated using standard performance metrics such as accuracy, precision, recall, and F1-score.
Among all models, the Random Forest classifier achieved the highest prediction accuracy of 89.40%, demonstrating superior generalization capability and robustness. The comparative analysis indicates that ensemble learning approaches outperform individual classifiers in medical diagnosis tasks.

The proposed framework effectively identifies high-risk patients at an early stage, which can assist healthcare professionals in timely decision-making and preventive care planning. Overall, the integration of machine learning techniques in cardiovascular risk assessment enhances prediction reliability and supports the development of intelligent clinical decision support systems. The results demonstrate the potential of AI-based systems to improve early detection and reduce mortality associated with heart disease.

XII. Future Work

Although the proposed AI-driven framework demonstrates promising results for early heart disease prediction, several improvements can be explored in future research.

First, the model can be enhanced by incorporating larger and more diverse real-world clinical datasets to improve generalization across different populations and healthcare environments.

Second, advanced deep learning techniques such as Artificial Neural Networks (ANN) and Long Short-Term Memory (LSTM) networks can be implemented to capture complex nonlinear relationships among medical attributes. These models may further improve prediction accuracy and robustness.

Third, feature selection and dimensionality reduction techniques such as Principal Component Analysis (PCA) can be applied to optimize model efficiency and reduce computational complexity. This would be particularly beneficial for deployment in real-time healthcare systems.

Additionally, integrating the framework into a web-based or mobile-based clinical decision support system can enable real-time heart disease risk assessment for physicians and patients.
The inclusion of explainable AI (XAI) techniques would also improve transparency and trustworthiness in medical predictions. Finally, future research can focus on multi-disease prediction systems capable of detecting other cardiovascular conditions using integrated clinical and lifestyle data. Such improvements would enhance the practical applicability and scalability of AI-driven healthcare solutions.

References
R. Detrano et al., “International application of a new probability algorithm for the diagnosis of coronary artery disease,” The American Journal of Cardiology, vol. 64, no. 5, pp. 304–310, 1989.
D. Dua and C. Graff, “UCI Machine Learning Repository,” University of California, Irvine, 2017. [Online]. Available: https://archive.ics.uci.edu
H. Chen, S. Yang, and X. Li, “Heart disease prediction using machine learning techniques,” IEEE Access, vol. 7, pp. 150000–150010, 2019.
I. Kononenko, “Machine learning for medical diagnosis: History, state of the art and perspective,” Artificial Intelligence in Medicine, vol. 23, no. 1, pp. 89–109, 2001.
T. Cover and P. Hart, “Nearest neighbor pattern classification,” IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21–27, 1967.
L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, pp. 273–297, 1995.
F. Pedregosa et al., “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.

Additional Declarations
The authors declare no competing interests.
Common feature selection methods include:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eCorrelation Analysis\u003c/b\u003e: Identifies attributes that strongly correlate with the target variable (heart disease risk).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eRecursive Feature Elimination (RFE)\u003c/b\u003e: Removes less important features iteratively.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eTree-Based Feature Importance\u003c/b\u003e: Uses decision tree or Random Forest to rank feature relevance.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThis step ensures that only the most predictive attributes are fed into the machine learning models.\u003c/p\u003e \u003cp\u003e \u003cb\u003e4. Machine Learning Models\u003c/b\u003e \u003c/p\u003e \u003cp\u003eThe framework evaluates multiple machine learning algorithms to identify the most effective for heart disease prediction:\u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eLogistic Regression (LR)\u003c/b\u003e: Provides interpretable coefficients for risk factors.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eRandom Forest (RF)\u003c/b\u003e: Handles high-dimensional datasets and captures nonlinear relationships.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eSupport Vector Machine (SVM)\u003c/b\u003e: Effective for classification in complex feature spaces.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eArtificial Neural Network (ANN)\u003c/b\u003e: Captures complex patterns and improves predictive performance in large datasets.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e5. 
Model Training and Testing\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eDataset Split\u003c/b\u003e: Typically, 70% of data is used for training and 30% for testing.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eCross-Validation\u003c/b\u003e: k-fold cross-validation ensures robust evaluation and prevents overfitting.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003ePerformance Metrics\u003c/b\u003e: Accuracy, Precision, Recall, F1-score, and ROC-AUC are used to assess model performance.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003e6. Result Visualization\u003c/b\u003e \u003c/p\u003e \u003cp\u003eThe framework provides visual insights for clinical interpretation:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003ePrediction Outcomes\u003c/b\u003e: Pie chart showing percentage of patients predicted with high/low risk.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eModel Comparison\u003c/b\u003e: Bar chart comparing accuracy of different ML models.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eFeature Contribution\u003c/b\u003e: Graph showing top features influencing prediction.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e"},{"header":"VI. Dataset Description","content":"\u003cp\u003eThe proposed AI-driven framework for early heart disease prediction is evaluated using the publicly available UCI Heart Disease dataset. This dataset is widely recognized in medical research for benchmarking cardiovascular disease prediction models. The data were collected from multiple medical institutions, including Cleveland, Hungary, Switzerland, and the Long Beach VA medical centers. 
It contains clinically relevant attributes that contribute to identifying heart disease risk factors.\u003c/p\u003e \u003cp\u003eThe dataset consists of 303 patient records with 14 primary attributes. These attributes include a combination of demographic information, physiological measurements, and diagnostic test results. The target variable represents the presence (1) or absence (0) of heart disease and serves as the dependent variable for classification.\u003c/p\u003e \u003cp\u003eThe key attributes include age, sex, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar level, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina, ST depression (Oldpeak), slope of the ST segment, number of major vessels detected by fluoroscopy, and thalassemia status. These features are clinically significant indicators commonly associated with cardiovascular disorders and play a crucial role in predictive modeling.\u003c/p\u003e \u003cp\u003eOut of the 303 records, approximately 165 patients are diagnosed with heart disease, while 138 patients are classified as healthy. Although a slight class imbalance exists, it is not severe and is addressed during model evaluation to ensure unbiased performance assessment. 
The dataset contains 6 numerical attributes and 8 categorical attributes (7 categorical features plus the binary target), requiring preprocessing techniques such as encoding and normalization before training machine learning models.\u003c/p\u003e \u003cp\u003eOverall, the dataset provides a structured and comprehensive foundation for developing and evaluating machine learning models aimed at early heart disease detection.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eKey Features of Heart Disease Dataset\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAttribute\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eType\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDescription\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAge\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNumerical\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePatient age in years\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCategorical\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c3\"\u003e \u003cp\u003e1\u0026thinsp;=\u0026thinsp;Male, 0\u0026thinsp;=\u0026thinsp;Female\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eChest Pain Type\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCategorical\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u0026thinsp;=\u0026thinsp;Typical angina, 2\u0026thinsp;=\u0026thinsp;Atypical angina, 3\u0026thinsp;=\u0026thinsp;Non-anginal pain, 4\u0026thinsp;=\u0026thinsp;Asymptomatic\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eResting BP\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNumerical\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eResting blood pressure (mmHg)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCholesterol\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNumerical\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSerum cholesterol in mg/dl\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFasting Blood Sugar\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCategorical\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1 if\u0026thinsp;\u0026gt;\u0026thinsp;120 mg/dl, else 0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eResting ECG\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCategorical\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e 
\u003cp\u003eElectrocardiogram results\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMax Heart Rate\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNumerical\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMaximum heart rate achieved\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eExercise Induced Angina\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCategorical\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u0026thinsp;=\u0026thinsp;Yes, 0\u0026thinsp;=\u0026thinsp;No\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOldpeak\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNumerical\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eST depression induced by exercise\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSlope\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCategorical\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSlope of ST segment during peak exercise\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNumber of Vessels\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNumerical\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMajor vessels colored by fluoroscopy\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eThalassemia\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCategorical\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3\u0026thinsp;=\u0026thinsp;Normal, 6\u0026thinsp;=\u0026thinsp;Fixed defect, 7\u0026thinsp;=\u0026thinsp;Reversible defect\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTarget (Heart Disease)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCategorical\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u0026thinsp;=\u0026thinsp;Presence, 0\u0026thinsp;=\u0026thinsp;Absence\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e"},{"header":"VII. Data Visualisation","content":"\u003cp\u003eTo gain a deeper understanding of the dataset and identify meaningful patterns, exploratory data visualisation techniques were applied. Data visualisation plays an essential role in analysing class distribution, demographic trends, and feature characteristics before training machine learning models.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e illustrates the distribution of patients diagnosed with heart disease and those without the condition. Out of 303 total records, approximately 165 patients are diagnosed with heart disease, while 138 patients are classified as healthy. Although a slight class imbalance exists, it is not severe and can be managed during model evaluation.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e presents the age distribution of patients in the dataset. 
The visualization indicates that the majority of patients fall within middle-aged and older age groups, which is consistent with age being a contributing factor to cardiovascular risk. This observation aligns with established medical research indicating that heart disease risk increases with age.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e shows the frequency distribution of chest pain types. Among the four categories (Typical Angina, Atypical Angina, Non-anginal Pain, and Asymptomatic), the asymptomatic and non-anginal types appear more frequently in the dataset. Since chest pain type is a crucial clinical indicator, it plays a significant role in predictive modeling.\u003c/p\u003e \u003cp\u003eThese visualizations provide valuable insight into the structure and characteristics of the dataset, supporting informed preprocessing decisions and model selection strategies.\u003c/p\u003e"},{"header":"VIII. Data Preprocessing","content":"\u003cp\u003eData preprocessing is a crucial step in developing an effective machine learning model for early heart disease prediction. Since the dataset contains both numerical and categorical attributes, appropriate preprocessing techniques were applied to ensure data consistency, improve model performance, and prevent bias.\u003c/p\u003e \u003cp\u003eInitially, the dataset was examined for missing or inconsistent values. The UCI Heart Disease dataset contains minimal missing data; however, any incomplete records were either removed or handled using suitable imputation techniques to maintain data integrity.\u003c/p\u003e \u003cp\u003eCategorical variables such as Sex, Chest Pain Type, Fasting Blood Sugar, Resting ECG, Exercise Induced Angina, Slope, and Thalassemia were converted into numerical format using encoding techniques. 
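\u003c/p\u003e \u003cp\u003eThe two usual encodings for such categorical variables, label encoding and one-hot encoding, can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function names label_encode and one_hot_encode are hypothetical, and the sample values are simply the four chest pain categories named in this paper.\u003c/p\u003e

```python
def label_encode(values):
    """Map each distinct category to an integer code, using sorted category order."""
    categories = sorted(set(values))
    index = {category: code for code, category in enumerate(categories)}
    return [index[value] for value in values], categories


def one_hot_encode(values):
    """Return one row of 0/1 indicators per value, one column per distinct category."""
    codes, categories = label_encode(values)
    rows = [[1 if code == column else 0 for column in range(len(categories))]
            for code in codes]
    return rows, categories


# Illustrative input only: the four chest pain categories used in this paper.
chest_pain = ["Typical Angina", "Non-anginal Pain", "Atypical Angina", "Asymptomatic"]

codes, categories = label_encode(chest_pain)
rows, _ = one_hot_encode(chest_pain)

print(categories)
print(codes)
for row in rows:
    print(row)
```

\u003cp\u003eLabel encoding imposes an arbitrary order on the categories, so it is usually reserved for binary or ordinal variables, while one-hot encoding avoids that implied order at the cost of extra columns.\u003c/p\u003e \u003cp\u003e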
Label encoding and one-hot encoding methods were applied depending on the nature of the categorical variable to ensure compatibility with machine learning algorithms.\u003c/p\u003e \u003cp\u003eNumerical attributes including Age, Resting Blood Pressure, Cholesterol, Maximum Heart Rate, Oldpeak, and Number of Vessels were normalized using feature scaling techniques. Standardization was performed to transform values into a common scale with zero mean and unit variance. This step is essential because algorithms such as Support Vector Machine and Logistic Regression are sensitive to feature magnitude.\u003c/p\u003e \u003cp\u003eTo evaluate model performance effectively, the dataset was divided into training and testing sets using an 80:20 ratio. The training set was used to train the models, while the testing set was used to evaluate prediction accuracy on unseen data. Stratified sampling was applied to maintain class distribution consistency between training and testing sets.\u003c/p\u003e \u003cp\u003eThrough systematic preprocessing, the dataset was prepared in a structured format suitable for efficient training and reliable prediction of heart disease risk.\u003c/p\u003e"},{"header":"IX. Model Implementation","content":"\u003cp\u003eIn order to predict early heart disease risk, multiple machine learning algorithms were implemented and evaluated. The selection of models was based on their effectiveness in classification problems and their proven performance in medical prediction systems.\u003c/p\u003e \u003cp\u003eInitially, the preprocessed dataset was fed into several supervised learning algorithms, including Logistic Regression, Support Vector Machine (SVM), Decision Tree, Random Forest, and K-Nearest Neighbors (KNN). 
These models were chosen due to their ability to handle structured clinical data efficiently.\u003c/p\u003e \u003cp\u003eLogistic Regression was implemented as a baseline model because of its simplicity and interpretability in binary classification tasks. Since heart disease prediction is a binary classification problem (presence or absence), Logistic Regression provides probability-based predictions and helps understand feature importance.\u003c/p\u003e \u003cp\u003eSupport Vector Machine (SVM) was applied to construct an optimal hyperplane that separates patients with heart disease from healthy individuals. SVM is particularly effective in high-dimensional spaces and provides robust classification performance.\u003c/p\u003e \u003cp\u003eA decision tree was used to model decision rules based on patient attributes. It offers easy interpretability, allowing visualization of decision paths. However, to reduce overfitting and improve generalization, Random Forest was implemented as an ensemble approach. Random Forest combines multiple decision trees to improve prediction accuracy and stability.\u003c/p\u003e \u003cp\u003eK-Nearest Neighbors (KNN) was also implemented to classify patients based on similarity to neighboring data points. The optimal value of K was determined through experimentation to balance bias and variance.\u003c/p\u003e \u003cp\u003eAll models were trained using the training dataset and evaluated on the testing dataset using performance metrics such as accuracy, precision, recall, and F1-score. 
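\u003c/p\u003e \u003cp\u003eThe four metrics named above can be computed from the confusion-matrix counts, as in the following minimal sketch. It assumes binary labels with 1 for presence and 0 for absence of heart disease, as in the target variable; the function name classification_metrics and the toy label vectors are hypothetical and are not the paper's data or results.\u003c/p\u003e

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (not taken from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

\u003cp\u003eIn a screening setting, recall is often weighted more heavily than precision, because a false negative corresponds to a missed case of heart disease.\u003c/p\u003e \u003cp\u003e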
Comparative analysis was performed to identify the most suitable model for early heart disease prediction. The performance evaluation results of all implemented models are presented in Table\u0026nbsp;5.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003ePerformance Comparison of Machine Learning Models\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccuracy (%)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePrecision (%)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRecall (%)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eF1-Score (%)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLogistic Regression\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e85.25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e 
\u003cp\u003e84.60\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e86.10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e85.34\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSupport Vector Machine\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e87.15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e86.40\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e88.20\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e87.29\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDecision Tree\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e82.30\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e81.75\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e83.40\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e82.56\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRandom Forest\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e89.40\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e88.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e90.10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e89.52\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003eK-Nearest Neighbors\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e84.10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e83.50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e85.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e84.24\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e"},{"header":"X. Experimental Results and Discussion","content":"\u003cp\u003eThis section presents the evaluation results of the implemented machine learning models for early heart disease prediction. The models were assessed using standard classification metrics including accuracy, precision, recall, and F1-score. These metrics give a comprehensive understanding of model performance, particularly in medical diagnosis where both false positives and false negatives are critical.\u003c/p\u003e \u003cp\u003eFrom Table\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e, it can be observed that the Random Forest classifier achieved the highest accuracy of 89.40%, outperforming all other models. The ensemble nature of Random Forest allows it to reduce overfitting and improve generalization by combining multiple decision trees. It also demonstrated strong precision and recall values, indicating balanced prediction capability.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eSupport Vector Machine (SVM) showed competitive performance with an accuracy of 87.15%. Its ability to construct optimal decision boundaries contributes to reliable classification performance, especially in high-dimensional datasets.\u003c/p\u003e \u003cp\u003eLogistic Regression produced stable and interpretable results with an accuracy of 85.25%. 
Although slightly lower than ensemble methods, it remains valuable due to its simplicity and explainability in clinical applications.\u003c/p\u003e \u003cp\u003eDecision Tree achieved comparatively lower accuracy (82.30%) due to its tendency to overfit the training data. However, it provides high interpretability, which is beneficial in medical decision-making scenarios.\u003c/p\u003e \u003cp\u003eK-Nearest Neighbors (KNN) demonstrated moderate performance with 84.10% accuracy. Its reliance on distance metrics and sensitivity to feature scaling may have influenced its performance.\u003c/p\u003e \u003cp\u003eOverall, the experimental results indicate that ensemble-based approaches such as Random Forest give superior predictive performance for early heart disease detection. The findings suggest that incorporating multiple decision trees enhances classification robustness and reliability, making it a suitable candidate for deployment in real-world healthcare systems.\u003c/p\u003e "},{"header":"XI. Conclusion","content":"\u003cp\u003eThis research presented an intelligent AI-driven framework for the early prediction of heart disease using machine learning techniques. The study utilized the UCI Heart Disease dataset containing 303 patient records with 14 significant clinical attributes. Comprehensive data preprocessing techniques, including encoding, normalization, and stratified train-test splitting, were applied to prepare the dataset for effective model training.\u003c/p\u003e\u003cp\u003eMultiple supervised learning algorithms, including Logistic Regression, Support Vector Machine, Decision Tree, Random Forest, and K-Nearest Neighbors, were implemented and evaluated using standard performance metrics such as accuracy, precision, recall, and F1-score. 
Among all models, the Random Forest classifier achieved the highest prediction accuracy of 89.40%, indicating better generalization than the other models evaluated.\u003c/p\u003e\u003cp\u003eThe comparative analysis indicates that, on this dataset, the ensemble approach (Random Forest) outperformed the individual classifiers. The proposed framework can help identify high-risk patients at an early stage, which can assist healthcare professionals in timely decision-making and preventive care planning.\u003c/p\u003e\u003cp\u003eOverall, the integration of machine learning techniques in cardiovascular risk assessment enhances prediction reliability and supports the development of intelligent clinical decision support systems. The results suggest the potential of AI-based systems to improve early detection and, in turn, help reduce mortality associated with heart disease.\u003c/p\u003e"},{"header":"XII. Future Work","content":"\u003cp\u003eAlthough the proposed AI-driven framework demonstrates promising results for early heart disease prediction, several improvements can be explored in future research. First, the model can be enhanced by incorporating larger and more diverse real-world clinical datasets to improve generalization across different populations and healthcare environments.\u003c/p\u003e\u003cp\u003eSecond, advanced deep learning techniques such as Artificial Neural Networks (ANN) and Long Short-Term Memory (LSTM) networks can be implemented to capture complex nonlinear relationships among medical attributes. These models may further improve prediction accuracy and robustness.\u003c/p\u003e\u003cp\u003eThird, feature selection and dimensionality reduction techniques such as Principal Component Analysis (PCA) can be applied to optimize model efficiency and reduce computational complexity. 
This would be particularly beneficial for deployment in real-time healthcare systems.\u003c/p\u003e\u003cp\u003eAdditionally, integrating the framework into a web-based or mobile-based clinical decision support system can enable real-time heart disease risk assessment for physicians and patients. The inclusion of explainable AI (XAI) techniques would also improve transparency and trustworthiness in medical predictions.\u003c/p\u003e\u003cp\u003eFinally, future research can focus on multi-disease prediction systems capable of detecting other cardiovascular conditions using integrated clinical and lifestyle data. Such improvements would enhance the practical applicability and scalability of AI-driven healthcare solutions.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eR. Detrano et al., \u0026ldquo;International application of a new probability algorithm for the diagnosis of coronary artery disease,\u0026rdquo; \u003cem\u003eThe American Journal of Cardiology\u003c/em\u003e, vol. 64, no. 5, pp. 304\u0026ndash;310, 1989.\u003c/li\u003e\n\u003cli\u003eD. Dua and C. Graff, \u0026ldquo;UCI Machine Learning Repository,\u0026rdquo; University of California, Irvine, 2017. [Online]. Available: https://archive.ics.uci.edu\u003c/li\u003e\n\u003cli\u003eH. Chen, S. Yang, and X. Li, \u0026ldquo;Heart disease prediction using machine learning techniques,\u0026rdquo; \u003cem\u003eIEEE Access\u003c/em\u003e, vol. 7, pp. 150000\u0026ndash;150010, 2019.\u003c/li\u003e\n\u003cli\u003eI. Kononenko, \u0026ldquo;Machine learning for medical diagnosis: History, state of the art and perspective,\u0026rdquo; \u003cem\u003eArtificial Intelligence in Medicine\u003c/em\u003e, vol. 23, no. 1, pp. 89\u0026ndash;109, 2001.\u003c/li\u003e\n\u003cli\u003eT. Cover and P. Hart, \u0026ldquo;Nearest neighbor pattern classification,\u0026rdquo; \u003cem\u003eIEEE Transactions on Information Theory\u003c/em\u003e, vol. 13, no. 1, pp. 
21\u0026ndash;27, 1967.\u003c/li\u003e\n\u003cli\u003eL. Breiman, \u0026ldquo;Random forests,\u0026rdquo; \u003cem\u003eMachine Learning\u003c/em\u003e, vol. 45, no. 1, pp. 5\u0026ndash;32, 2001.\u003c/li\u003e\n\u003cli\u003eC. Cortes and V. Vapnik, \u0026ldquo;Support-vector networks,\u0026rdquo; \u003cem\u003eMachine Learning\u003c/em\u003e, vol. 20, pp. 273\u0026ndash;297, 1995.\u003c/li\u003e\n\u003cli\u003eF. Pedregosa et al., \u0026ldquo;Scikit-learn: Machine learning in Python,\u0026rdquo; \u003cem\u003eJournal of Machine Learning Research\u003c/em\u003e, vol. 12, pp. 2825\u0026ndash;2830, 2011.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Heart Disease Prediction, Machine Learning, Artificial Intelligence, Data Preprocessing, Classification Models, Early Diagnosis","lastPublishedDoi":"10.21203/rs.3.rs-9334761/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9334761/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eEarly prediction of heart disease is critical for reducing mortality and improving patient care. 
Heart disease is one of the leading causes of death worldwide, and timely diagnosis can save lives. Traditional diagnostic methods are time-consuming and sometimes fail to detect early-stage risk. This paper proposes an intelligent AI-driven framework for the early prediction of heart disease using advanced machine learning techniques. The framework incorporates data preprocessing, feature selection, and multiple classification algorithms including Logistic Regression, Random Forest, Support Vector Machine (SVM), and Artificial Neural Networks (ANN). The proposed system is evaluated on a publicly available dataset, considering multiple patient attributes such as age, blood pressure, cholesterol, diabetes, and lifestyle factors. Performance metrics such as accuracy, precision, recall, and F1-score are computed to assess model performance. Comparative analysis demonstrates that the proposed framework outperforms traditional diagnostic approaches and provides a reliable, efficient, and automated method for early detection. 
The research aims to assist healthcare professionals in making informed decisions, ultimately enhancing patient outcomes.\u003c/p\u003e","manuscriptTitle":"An Intelligent AI-Driven Framework for Early Prediction of Heart Disease Using Advanced Machine Learning Techniques","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-04-07 16:52:42","doi":"10.21203/rs.3.rs-9334761/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"286c238f-f37f-4571-81cf-e10b175bf080","owner":[],"postedDate":"April 7th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":65793545,"name":"Artificial Intelligence and Machine Learning"}],"tags":[],"updatedAt":"2026-04-07T16:52:42+00:00","versionOfRecord":[],"versionCreatedAt":"2026-04-07 16:52:42","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9334761","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9334761","identity":"rs-9334761","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
