Comparative Analysis of Anomaly Detection Methods in Medical Data Using a Multi-metric Evaluation Framework

Nataliya Boyko

Article · Research Square preprint
This is a preprint; it has not been peer reviewed by a journal.
DOI: https://doi.org/10.21203/rs.3.rs-9125432/v1
License: CC BY 4.0
Status: Under Review · Version 1 (latest preprint version)

Editorial timeline:
Reviewers agreed at journal: 04 May, 2026
Reviews received at journal: 30 Apr, 2026
Reviewers agreed at journal: 13 Apr, 2026
Reviewers agreed at journal: 30 Mar, 2026
Reviewers invited by journal: 30 Mar, 2026
Editor invited by journal: 30 Mar, 2026
Editor assigned by journal: 24 Mar, 2026
Submission checks completed at journal: 24 Mar, 2026
First submitted to journal: 14 Mar, 2026

Subject areas: Biological sciences/Computational biology and bioinformatics; Health sciences/Diseases; Health sciences/Health care; Physical sciences/Mathematics and computing; Health sciences/Medical research

Additional Declarations: No competing interests reported.

Abstract

Context. Medical datasets used for tuberculosis treatment monitoring often contain heterogeneous, noisy, or incomplete records resulting from data entry errors, measurement inconsistencies, or clinically rare cases. Such anomalies distort statistical patterns and significantly reduce the reliability of predictive analytics and clinical decision-support systems. Effective anomaly detection is therefore necessary to ensure the correctness of medical conclusions and treatment recommendations.

Objective. The objective of this study is to develop and evaluate a methodological framework for detecting anomalies in tuberculosis treatment data that increases the accuracy, robustness, and interpretability of machine learning models used in clinical analytics.

Method. Four anomaly detection algorithms based on different theoretical principles were analyzed: Isolation Forest (isolation-based), Local Outlier Factor (density-based), One-Class Support Vector Machine (kernel-based), and Elliptic Envelope (covariance-based). Each method was trained and optimized using hyperparameter tuning. Their performance was evaluated within a multi-metric assessment framework including Precision, Recall, F1-score, ROC-AUC, PR-AUC, and Matthews Correlation Coefficient. In addition, a stacking ensemble with LGBM and Logistic Regression as meta-models was proposed. Statistical significance was verified using bootstrapping and Mann–Whitney U testing.

Results. After optimization, the Elliptic Envelope method demonstrated the most balanced performance among individual models (F1 = 0.8182, ROC-AUC = 0.9609, MCC = 0.8048). However, the proposed stacking ensemble significantly outperformed all standalone methods, achieving F1 = 0.96, Recall = 1.0, PR-AUC = 0.98, and MCC = 0.95. Statistical testing confirmed that the improvement provided by the ensemble is significant at α = 0.05 (p < 0.001). Robustness analysis further demonstrated stable ensemble performance under up to 15% artificial noise distortion.

Conclusions. The study shows that anomaly detection effectiveness in medical data depends on the structural characteristics of the feature space, and no single method is universally optimal. The proposed stacking ensemble provides a statistically validated and robust improvement over individual models and is recommended for integration into clinical decision-support systems for tuberculosis monitoring. Future research will focus on extending the approach to multimodal medical datasets and implementing cost-sensitive anomaly evaluation strategies.

Keywords: anomaly detection, tuberculosis treatment monitoring, ensemble learning, Elliptic Envelope, Isolation Forest, Local Outlier Factor, One-Class SVM, LGBM, medical decision support, robust analytics
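The comparison described in the Method paragraph of the abstract can be illustrated with a minimal sketch. It assumes the scikit-learn implementations of the four detectors and uses synthetic data, because the tuberculosis dataset, the tuned hyperparameters, and the stacking ensemble are not given on this page. The contamination rate, the feature count, and the `evaluate` helper are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.metrics import (average_precision_score, f1_score,
                             matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

# Synthetic stand-in data (assumption): 475 normal records, 25 scattered anomalies.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(475, 4)),
               rng.uniform(-8.0, 8.0, size=(25, 4))])
y = np.r_[np.zeros(475, dtype=int), np.ones(25, dtype=int)]  # 1 = anomaly

CONTAMINATION = 0.05  # assumed anomaly share, not taken from the paper

detectors = {
    "Isolation Forest": IsolationForest(contamination=CONTAMINATION, random_state=0),
    "Local Outlier Factor": LocalOutlierFactor(n_neighbors=20, contamination=CONTAMINATION),
    "One-Class SVM": OneClassSVM(kernel="rbf", gamma="scale", nu=CONTAMINATION),
    "Elliptic Envelope": EllipticEnvelope(contamination=CONTAMINATION, random_state=0),
}


def evaluate(model):
    """Fit one detector on X and return the six metrics listed in the abstract."""
    if isinstance(model, LocalOutlierFactor):
        # In its default (non-novelty) mode LOF only exposes fit_predict.
        labels = model.fit_predict(X)
        scores = -model.negative_outlier_factor_
    else:
        labels = model.fit(X).predict(X)
        scores = -model.decision_function(X)
    y_pred = (labels == -1).astype(int)  # scikit-learn marks outliers as -1
    return {
        "precision": precision_score(y, y_pred, zero_division=0),
        "recall": recall_score(y, y_pred, zero_division=0),
        "f1": f1_score(y, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y, scores),  # higher score = more anomalous
        "pr_auc": average_precision_score(y, scores),
        "mcc": matthews_corrcoef(y, y_pred),
    }


results = {name: evaluate(model) for name, model in detectors.items()}

for name, metrics in results.items():
    row = "  ".join(f"{key}={value:.3f}" for key, value in metrics.items())
    print(f"{name:<22}{row}")
```

The threshold-based metrics (Precision, Recall, F1, MCC) use each detector's hard labels, while ROC-AUC and PR-AUC use the continuous anomaly scores; reporting both kinds is what makes the comparison multi-metric. Numbers produced on this synthetic data are not comparable to those reported in the abstract.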
Author affiliation: Stepan Gzhytskyi National University of Veterinary Medicine and Biotechnologies Lviv (Nataliya Boyko, corresponding author)
Journal: Scientific Reports
Posted: April 1st, 2026
© Research Square 2026 | ISSN 2693-5015 (online)


Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00