Optimizing Machine Learning for Diabetes Detection: Addressing Class Imbalance with SMOTE and Random Forest Ensemble Learning

Preprint posted on Authorea, 25 February 2025 (Version 1, latest version). This is a preprint and has not been peer reviewed. Data may be preliminary.
Authors: Kotadi Chinnaiah (ORCID 0000-0002-7382-9931), Devidas Kanchetti, and Rajesh Munirathnam
DOI: https://doi.org/10.22541/au.174048501.13296000/v1
Usage: 279 views, 113 downloads

Abstract

Diabetes is a widespread metabolic disorder with serious health consequences, including cardiovascular disease, kidney failure, and neuropathy. Early and precise diagnosis is crucial for effective disease management. However, conventional diagnostic methods, such as fasting blood glucose (FBG) and oral glucose tolerance tests (OGTT), are resource-intensive and impractical for large-scale screening, particularly in underserved areas.
Machine learning (ML) techniques have shown great promise in predictive healthcare; however, their performance is often compromised by class imbalance in medical datasets, which leads to biased models and suboptimal detection of diabetic cases.

To address this challenge, this study explores the integration of the Synthetic Minority Over-sampling Technique (SMOTE) with a Random Forest (RF) classifier to enhance diabetes prediction. SMOTE generates synthetic minority-class samples to balance the dataset, whereas Random Forest is an ensemble learning method that constructs multiple decision trees to improve model robustness and accuracy. A comparative evaluation of various ML models, including Logistic Regression, K-Nearest Neighbors (KNN), Decision Trees, Bagging Classifiers, XGBoost, and CatBoost, was conducted on a diabetes dataset. Among these, the SMOTE-enhanced Random Forest model performed best, achieving the highest recall (0.875) and F1-score (0.843) and substantially improving the identification of diabetic patients. The innovation of this approach lies in the strategic fusion of data augmentation and ensemble learning, which mitigates class imbalance and improves predictive reliability. By minimizing false negatives, the framework ensures that more at-risk individuals are correctly detected. The findings highlight the potential of AI-driven healthcare solutions to bridge the gap between traditional diagnostics and automated early detection, particularly in resource-constrained environments. The proposed model offers a scalable, cost-effective, and interpretable solution for improving diabetes screening and contributes to the advancement of AI-powered preventive medicine.

Supplementary material: diabeticsn.docx (1.18 MB)
Version history: Version 1, 25 February 2025
Copyright: This work is licensed under a Non Exclusive No Reuse License.
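The SMOTE-plus-Random-Forest comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes scikit-learn and NumPy, uses a hypothetical `smote` helper (a compact re-implementation of SMOTE's nearest-neighbour interpolation, not the imbalanced-learn implementation), substitutes synthetic data from `make_classification` for the paper's diabetes dataset, and picks illustrative hyperparameters. XGBoost and CatBoost are left out so that only scikit-learn is required. The one design choice worth copying is that oversampling is applied to each training fold only, so recall and F1 are measured on test folds that keep the real class imbalance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def smote(X, y, minority=1, k=5, seed=0):
    """Hypothetical helper: balance a binary dataset with SMOTE-style samples.

    Each synthetic point lies on the segment between a random minority sample
    and one of its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = int(np.sum(y != minority)) - len(X_min)  # assumes two classes
    if n_new <= 0:
        return X, y
    # Ask for k + 1 neighbours because each point is its own nearest neighbour.
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_new)
    neighbour = idx[base, rng.integers(1, k + 1, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    X_new = X_min[base] + gap * (X_min[neighbour] - X_min[base])
    y_new = np.full(n_new, minority, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


# Factories so every fold gets a fresh, unfitted model (illustrative settings).
MODELS = {
    "LogisticRegression": lambda: make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": lambda: make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "DecisionTree": lambda: DecisionTreeClassifier(random_state=0),
    "Bagging": lambda: BaggingClassifier(random_state=0),
    "RandomForest": lambda: RandomForestClassifier(n_estimators=200, random_state=0),
}


def evaluate(models, X, y, n_splits=5, use_smote=True, seed=0):
    """Mean recall and F1 for the positive class under stratified k-fold CV."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {name: {"recall": [], "f1": []} for name in models}
    for train, test in cv.split(X, y):
        X_tr, y_tr = X[train], y[train]
        if use_smote:
            # Oversample the training fold only; the test fold keeps its real imbalance.
            X_tr, y_tr = smote(X_tr, y_tr, seed=seed)
        for name, make_model in models.items():
            pred = make_model().fit(X_tr, y_tr).predict(X[test])
            scores[name]["recall"].append(recall_score(y[test], pred))
            scores[name]["f1"].append(f1_score(y[test], pred, zero_division=0))
    return {name: {m: float(np.mean(v)) for m, v in s.items()} for name, s in scores.items()}


if __name__ == "__main__":
    # Synthetic stand-in with about 10% positives; NOT the paper's diabetes dataset.
    X, y = make_classification(n_samples=1000, n_features=8, weights=[0.9], random_state=0)
    for smoted in (False, True):
        print("with SMOTE" if smoted else "without SMOTE")
        for name, s in evaluate(MODELS, X, y, use_smote=smoted).items():
            print(f"  {name:18s} recall={s['recall']:.3f}  f1={s['f1']:.3f}")
```

Running the script prints mean recall and F1 for each model with and without oversampling; the numbers depend on the synthetic data and will not reproduce the paper's results. With real clinical features measured on very different scales, standardizing before the neighbour search is advisable, and imbalanced-learn's `SMOTE` used inside its own `Pipeline` offers a maintained implementation of the same resample-inside-the-fold pattern.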
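The reported recall and F1-score can be read together. Assuming both figures refer to the diabetic (positive) class on the same held-out evaluation set, and that F1 is the usual harmonic mean of precision P and recall R, rearranging the F1 definition gives the implied precision. This is a back-of-the-envelope derivation, not a figure reported in the paper.

```latex
\begin{aligned}
R &= \frac{TP}{TP + FN} = 0.875
  \quad\Rightarrow\quad \frac{FN}{TP + FN} = 1 - R = 0.125, \\
F_1 &= \frac{2PR}{P + R}
  \quad\Rightarrow\quad P = \frac{F_1 R}{2R - F_1}
  = \frac{0.843 \times 0.875}{2(0.875) - 0.843}
  = \frac{0.737625}{0.907} \approx 0.813.
\end{aligned}
```

Under these assumptions, about one in eight diabetic cases in the evaluation set would still be missed, and roughly four in five positive predictions would be correct. Because the reported values are rounded to three decimals, the implied precision is only approximate.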
Keywords: class imbalance; diabetes prediction; machine learning; random forest classifier; SMOTE

Affiliations: Kotadi Chinnaiah (ORCID 0000-0002-7382-9931), Devidas Kanchetti, and Rajesh Munirathnam, G H Raisoni College of Engineering

Citation: Kotadi Chinnaiah, Devidas Kanchetti, Rajesh Munirathnam. Optimizing Machine Learning for Diabetes Detection: Addressing Class Imbalance with SMOTE and Random Forest Ensemble Learning. Authorea. 25 February 2025. DOI: https://doi.org/10.22541/au.174048501.13296000/v1