FedXGB-OptDP: A Privacy-Optimised Federated XGBoost Framework with Differential Privacy for IID and Non-IID healthcare data

Article
B. SASIREKHA, C. GUNAVATHI
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8425166/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1 posted 27 Feb, 2026

Abstract
The rapid growth of sensitive healthcare data creates a significant need for machine learning systems that provide accurate predictions while safeguarding patient privacy. As data volumes grow, current privacy-preserving federated tree models face significant computational expense, inadequate noise allocation methods, and accuracy losses when trading off privacy against utility in both IID and non-IID scenarios.
To overcome these challenges, a privacy-focused extension of the Federated XGBoost architecture, FedXGB-OptDP, has been developed. It integrates Depth-Adaptive Differential privacy (DAD), Noise-Aware Regularisation (NAR), and a hybrid optimisation strategy that combines a Genetic Algorithm (GA) with Bayesian TPE search. The DAD-NAR component adaptively allocates the privacy budget across tree depths, adds calibrated Laplace noise, and applies noise-aware node dropout, which keeps the model stable throughout training while safeguarding privacy. Each client runs GA-driven federated feature selection combined with TPE-based hyperparameter optimisation, enabling efficient learning while maintaining data privacy. Global aggregation is achieved through consensus-driven feature voting and weighted averaging of hyperparameters, eliminating the need for complex cryptographic techniques such as Homomorphic Encryption (HE) or Secure Multi-Party Computation (SMPC). Experiments on five datasets across both IID and Non-IID configurations demonstrate that the model consistently reaches high performance levels (up to 95–96%) while ensuring robust privacy safeguards. It exceeds the performance of centralised XGBoost and prominent federated baselines, including PrivaTree, FedXHDP, and FedBoost. Overall, the results show that adaptive differential privacy, when integrated with optimisation, substantially improves the trade-off between privacy and utility, as well as the reliability and scalability of federated decision-tree models. It therefore provides a practicable, efficient, and highly precise solution for privacy-preserving collaborative learning in real-world environments.
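
The abstract names three mechanisms without giving their formulas: a per-depth privacy budget with calibrated Laplace noise, consensus-driven feature voting, and weighted averaging of hyperparameters. The following is a minimal illustrative sketch of those ideas, not the authors' implementation. Everything specific in it is an assumption: the geometric `decay` weighting across depths, the majority threshold `min_fraction`, weighting clients by sample count, and all function names are hypothetical and do not come from the preprint.

```python
import random
from collections import Counter


def depth_budgets(total_epsilon, max_depth, decay=0.5):
    """Split a total privacy budget across tree depths 0..max_depth-1.

    Assumption: geometric weights, so shallower levels get a larger share
    when decay < 1. The preprint's actual allocation rule is not given.
    """
    if total_epsilon <= 0 or max_depth <= 0:
        raise ValueError("total_epsilon and max_depth must be positive")
    weights = [decay ** d for d in range(max_depth)]
    norm = sum(weights)
    return [total_epsilon * w / norm for w in weights]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_statistic(value, sensitivity, epsilon, rng):
    """Laplace mechanism: noise scale is sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return value + laplace_noise(sensitivity / epsilon, rng)


def vote_features(client_selections, min_fraction=0.5):
    """Keep features selected by at least min_fraction of the clients.

    Assumption: a simple fractional-majority rule stands in for the
    paper's consensus-driven voting.
    """
    votes = Counter(f for sel in client_selections for f in set(sel))
    threshold = min_fraction * len(client_selections)
    return sorted(f for f, count in votes.items() if count >= threshold)


def average_hyperparameters(client_params, client_sizes):
    """Average numeric hyperparameters, weighted by client sample counts.

    Assumption: sample counts are the weights; the paper may use others.
    """
    total = sum(client_sizes)
    keys = client_params[0].keys()
    return {
        k: sum(p[k] * n for p, n in zip(client_params, client_sizes)) / total
        for k in keys
    }
```

In this sketch the per-depth budgets sum to the total, which matches the usual sequential-composition accounting across tree levels. A server would call `vote_features` and `average_hyperparameters` on the lists and dictionaries each client reports, so no raw records leave the clients.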

Subject areas: Health sciences/Health care; Physical sciences/Mathematics and computing
Keywords: Machine Learning, Federated Learning, Privacy, XGBoost, Optimisation techniques, Independent and Identically Distributed data (IID) and Non-IID data, Depth Adaptive Differential privacy, Noise Aware Regularisation, Genetic Algorithm, Tree Structured Parzen Estimator

Additional Declarations
No competing interests reported.

Editorial decision: Revision requested, 25 Mar, 2026
Reviews received at journal, 23 Mar, 2026
Reviews received at journal, 21 Mar, 2026
Reviews received at journal, 17 Mar, 2026
Reviews received at journal, 09 Mar, 2026
Reviews received at journal, 28 Feb, 2026
Reviewers agreed at journal, 27 Feb, 2026
Reviewers agreed at journal, 27 Feb, 2026
Reviewers agreed at journal, 27 Feb, 2026
Reviews received at journal, 26 Feb, 2026
Reviewers agreed at journal, 26 Feb, 2026
Reviewers agreed at journal, 25 Feb, 2026
Reviewers agreed at journal, 25 Feb, 2026
Reviewers invited by journal, 25 Feb, 2026
Editor invited by journal, 26 Dec, 2025
Editor assigned by journal, 24 Dec, 2025
Submission checks completed at journal, 24 Dec, 2025
First submitted to journal, 22 Dec, 2025

Author affiliations: B. SASIREKHA, Vellore Institute of Technology; C. GUNAVATHI (corresponding author), Vellore Institute of Technology
Journal: Scientific Reports
Posted: February 27th, 2026
