Comprehensive Analysis of Machine Learning Algorithms for Predicting Heart Disease Across Genders: A Balanced Approach to Model Selection and Performance Evaluation

doi:10.21203/rs.3.rs-6398275/v1

2025 · doi:10.21203/rs.3.rs-6398275/v1

preprint OA: closed

Research Article

Charu Kaushik (corresponding author), Kamlesh Sharma
Manav Rachna International Institute of Research and Studies

This is a preprint; it has not been peer reviewed by a journal.
Posted on Research Square, April 11th, 2025 (Version 1). License: CC BY 4.0.

Abstract

Background
Heart disease affects men and women differently in terms of symptoms, risk factors, and recovery, and it remains the leading cause of death worldwide. This study compares heart attack characteristics between genders using machine learning approaches. To avoid bias and ensure equitable model evaluation, a well-balanced dataset that includes lifestyle factors, clinical records, and demographic information is essential.

Methods
Numerous machine learning models, including Random Forest, Decision Trees, Support Vector Machines (SVM), and Logistic Regression, were evaluated. Trying out several models aids in identifying the best method for heart disease prediction. Performance metrics such as accuracy, precision, recall, F1 score, and the AUC-ROC curve were employed to evaluate the effectiveness of the models.

Results
The results demonstrated that the female dataset performed better than the male dataset across all models, particularly in K-Nearest Neighbour, Naïve Bayes, and Logistic Regression. The male dataset exhibited poorer accuracy, especially in Naïve Bayes and Extreme Gradient Boost. The StackingCVClassifier, which combines several models, improved predictive accuracy, achieving 92.31% accuracy for the female dataset compared to 82.76% for the male dataset, with fewer misclassified samples.

Conclusions
The female dataset is a more reliable model for predicting heart disease, demonstrating higher accuracy and fewer misclassified samples. The male dataset requires further optimization, particularly in models like Naïve Bayes and Extreme Gradient Boost. Combining multiple models through the StackingCVClassifier enhances predictive accuracy, highlighting the importance of leveraging individual model strengths.

Subject area: Cardiac & Cardiovascular Systems
Keywords: Heart disease, Machine learning, Prediction, Gender differences, Balanced dataset, Lifestyle factors, Clinical records
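The abstract names accuracy, precision, recall, F1 score, and AUC-ROC as the evaluation metrics and reports results separately for the female and male datasets. The preprint text available here includes no code, so the following is only a minimal sketch of how such a per-group evaluation could be computed. The function names (`binary_metrics`, `roc_auc`, `metrics_by_group`) and the toy labels are hypothetical illustrations and are not taken from the study or its data.

```python
from collections import Counter


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 for 0/1 labels (1 = disease present)."""
    counts = Counter(zip(y_true, y_pred))
    tp, fp = counts[(1, 1)], counts[(0, 1)]
    fn, tn = counts[(1, 0)], counts[(0, 0)]
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
        "misclassified": fp + fn,
    }


def roc_auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def metrics_by_group(groups, y_true, y_pred):
    """Compute the metrics separately for each subgroup label."""
    report = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        report[g] = binary_metrics(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
    return report


# Toy, made-up labels used only to exercise the functions (not study data).
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
report = metrics_by_group(groups, y_true, y_pred)
```

Reporting the misclassified count next to accuracy, as the abstract does, is useful because a given accuracy gap can correspond to only a handful of samples when the per-group test sets are small.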
