A Heterogeneity-Aware Privacy-Preserving Federated Learning Framework Using Ensemble Clustering for Healthcare Applications
Research Article | Research Square
Surendiran Muthukumar, Deepalakshmi Kumar, Ranjit Panigrahi, Paolo Barsocchi, and Akash Kumar Bhoi
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8191856/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1

Abstract
Data heterogeneity remains one of the most significant challenges in federated learning (FL), impacting model performance, convergence, and scalability. This issue is especially critical in healthcare, where data is distributed across multiple institutions, devices, and geographical regions, and privacy preservation is paramount.
In this study, we propose a novel hybrid algorithm, Dynamic-Cluster Personalized Federated Learning (DCP-FL), which integrates Federated Averaging (FedAvg), Personalized Federated Averaging (p-FedAvg), and dynamic clustering techniques. DCP-FL enables clients to maintain personalized models while contributing to a shared global model, achieving a balance between generalization and personalization. The algorithm clusters clients based on the similarity of their model updates, allowing for targeted aggregation that mitigates the effects of non-IID data distributions. We evaluate DCP-FL using the cardiovascular disease (CVD) dataset and the Breast Cancer Wisconsin (Diagnostic) dataset under realistic non-IID settings in a simulated FL environment using the Flower framework. Experimental results show that DCP-FL achieves 86.8% global model accuracy, 84.5% local model accuracy, and convergence in 35 communication rounds, outperforming FedAvg, p-FedAvg, FedNova and FedClust in both performance and convergence speed. While the approach incurs slightly higher communication costs, the accuracy gains justify the trade-off. These results demonstrate the potential of DCP-FL for privacy-preserving, heterogeneity-aware model training in healthcare and other domains.

Keywords: Federated Learning, Data Heterogeneity, Clustering, Personalized Models, Healthcare Informatics, Non-IID Data, Privacy Preservation

Additional Declarations: No competing interests reported.
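The abstract describes clustering clients by the similarity of their model updates and then aggregating within each cluster while still contributing to a shared global model. The sketch below illustrates that general idea only; it is not the authors' DCP-FL implementation and does not use the Flower framework. Cosine similarity, the greedy threshold rule, the `threshold` value, and every function and client name are assumptions introduced for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two flattened update vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_average(vectors: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> Vector:
    """FedAvg-style average: each vector is weighted by its client's sample count."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(dim)]


def cluster_by_update_similarity(updates: Dict[str, Vector],
                                 threshold: float = 0.5) -> List[List[str]]:
    """Greedy clustering (an assumed rule, not taken from the paper).

    A client joins the first cluster whose first member has an update with
    cosine similarity >= threshold; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for client_id, update in updates.items():
        for members in clusters:
            if cosine_similarity(update, updates[members[0]]) >= threshold:
                members.append(client_id)
                break
        else:
            clusters.append([client_id])
    return clusters


def aggregate_round(updates: Dict[str, Vector],
                    num_samples: Dict[str, int],
                    threshold: float = 0.5
                    ) -> Tuple[List[List[str]], Dict[str, Vector], Vector]:
    """Return the clusters, a per-client cluster-level update, and a global update."""
    clusters = cluster_by_update_similarity(updates, threshold)
    cluster_updates: Dict[str, Vector] = {}
    for members in clusters:
        avg = weighted_average([updates[c] for c in members],
                               [num_samples[c] for c in members])
        for c in members:
            cluster_updates[c] = avg  # shared by every client in the cluster
    ids = list(updates)
    global_update = weighted_average([updates[c] for c in ids],
                                     [num_samples[c] for c in ids])
    return clusters, cluster_updates, global_update


# Hypothetical toy data: two clients with similar updates, one that differs.
updates = {
    "hospital_a": [1.0, 0.1],
    "hospital_b": [0.9, 0.2],
    "clinic_c": [-1.0, 0.05],
}
num_samples = {"hospital_a": 100, "hospital_b": 300, "clinic_c": 200}

clusters, cluster_updates, global_update = aggregate_round(updates, num_samples)
print(clusters)
print(cluster_updates["hospital_a"])
print(global_update)
```

With this toy data the two hospitals fall into one cluster and the clinic into another, so the clinic's dissimilar update does not dilute the hospitals' cluster-level average, while all three clients still contribute to the global average. A real system would compare full model-weight deltas and might re-cluster every round.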

Editorial timeline (Journal of Big Data; Version 1 posted April 22nd, 2026):
Reviews received at journal: 16 Apr, 2026
Reviewers agreed at journal: 16 Apr, 2026
Reviewers agreed at journal: 15 Apr, 2026
Reviews received at journal: 14 Apr, 2026
Reviewers agreed at journal: 14 Apr, 2026
Reviewers invited by journal: 13 Apr, 2026
Submission checks completed at journal: 07 Feb, 2026
First submitted to journal: 03 Feb, 2026

Author affiliations:
Surendiran Muthukumar, Kalasalingam Academy of Research and Education
Deepalakshmi Kumar, Kalasalingam Academy of Research and Education
Ranjit Panigrahi, Amrita School of Artificial Intelligence
Paolo Barsocchi (corresponding author), Institute of Information Science and Technologies
Akash Kumar Bhoi, Graphic Era University