Ensemble Machine Learning for Enhanced Diabetes Detection Using CTGAN-Balanced Data

Mohammad Reza Abbaszadeh Bavil Soflaei (ORCID: 0009-0000-8473-9854) and Karim Samadzamini
Affiliation (both authors): University College of Nabi Akram

Authorea preprint, Version 1, 9 January 2025
DOI: https://doi.org/10.22541/au.173638341.18317097/v1
This is a preprint and has not been peer reviewed. Data may be preliminary.
License: Non Exclusive No Reuse License
Keywords: class imbalance, CTGAN, diabetes detection, ensemble learning, machine learning
Supplementary material: manuscript.docx (622.26 KB)
Article usage: 294 views, 121 downloads

Citation: Mohammad Reza Abbaszadeh Bavil Soflaei, Karim Samadzamini. Ensemble Machine Learning for Enhanced Diabetes Detection Using CTGAN-Balanced Data. Authorea. 09 January 2025. DOI: https://doi.org/10.22541/au.173638341.18317097/v1

Abstract

Diabetes is a pervasive chronic disease characterized by insufficient insulin production or by the body's inefficient use of insulin. Given its rising global prevalence and severe consequences such as blindness, kidney failure, and stroke, timely detection is paramount. This paper introduces a machine learning framework for diabetes detection, focusing on a benchmark dataset in the field, the Pima Indians Diabetes Database. The dataset's inherent challenges are addressed as follows: class imbalance with a Conditional Tabular Generative Adversarial Network (CTGAN), and missing values with pre-processing methods. The study also employs an ensemble approach: four base models, Random Forest (RF), Logistic Regression (LR), Gaussian Naive Bayes (GNB), and K-Nearest Neighbors (KNN), are trained on the balanced dataset and combined through stacking with an Extreme Gradient Boosting (XGB) meta-classifier. The resulting ensemble achieves 96% accuracy on the test set, whereas the standalone models reach 85% accuracy on average. This work highlights the effectiveness of ensemble techniques and data synthesis in improving diabetes prediction, and underscores the importance of early detection in mitigating the global impact of this life-threatening disease.
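The stacking design summarized in the abstract can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration under stated assumptions, not the authors' code: synthetic data from `make_classification` (768 rows, 8 features, roughly a 65/35 class split) stands in for the Pima Indians Diabetes Database, the CTGAN balancing and missing-value pre-processing steps are omitted, all hyperparameters are arbitrary choices, and scikit-learn's `GradientBoostingClassifier` is swapped in for the XGBoost meta-classifier to avoid an extra dependency (`xgboost.XGBClassifier` could be passed as `final_estimator` instead).

```python
# Sketch only: RF, LR, GNB and KNN base models stacked under a boosted
# meta-classifier. Assumptions: synthetic data replaces the Pima dataset,
# no CTGAN balancing or imputation is performed, and GradientBoostingClassifier
# substitutes for the XGBoost meta-classifier named in the abstract.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 768 rows, 8 features, imbalanced roughly 65/35
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           weights=[0.65, 0.35], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# The four base model types listed in the abstract; scaling is added for the
# distance- and gradient-based models (an assumption, not from the source)
base_models = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("gnb", GaussianNB()),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
]

# cv=5: the meta-classifier is fitted on out-of-fold base-model probabilities
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                               random_state=0),
    cv=5,
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)

# Compare the stacked ensemble with each standalone base model on held-out data
test_acc = accuracy_score(y_test, stack.predict(X_test))
print(f"stacked test accuracy: {test_acc:.3f}")
for name, model in stack.named_estimators_.items():
    print(f"{name} test accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")
```

Setting `stack_method="predict_proba"` feeds the meta-classifier class probabilities rather than hard labels, and `cv=5` means those probabilities come from out-of-fold predictions, so the meta-learner is not fitted on base-model outputs for rows the base models were trained on. The per-model printout mirrors the abstract's comparison of standalone models against the ensemble; accuracies obtained on this synthetic data say nothing about the paper's reported 85% and 96%.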
Cited by

Prokash Gogoi, J. Arul Valan. A multiclass machine learning framework for chronic kidney disease staging using CTGAN-based synthetic data augmentation and explainable AI. Computer Methods in Biomechanics and Biomedical Engineering, 1-18, 2026. https://doi.org/10.1080/10255842.2025.2610677

Mohammad Reza Abbaszadeh Bavil Soflaei, Karim Samadzamini, Arash Salehpour. Beyond imbalance: advancing breast cancer diagnosis with synthetic data and ML modeling. Network Modeling Analysis in Health Informatics and Bioinformatics, 14(1), 2025. https://doi.org/10.1007/s13721-025-00512-6