A Machine Learning Model Based on CBC-Derived Parameters to Distinguish Benign from Malignant Lymphoproliferative Disorders

preprint OA: closed
Full text · extracted from preprint-html
Research Article · Research Square (ISSN 2693-5015, online)

Jing Jing¹, Xiaoyan Hao¹, Yanjun Diao, Xiang Cheng, Xiaoxia Gao, Bin Huang, Yun Yang, Enliang Hu, Yuan Zhao, Jingyuan Jia, Jiayun Liu (corresponding author)

Affiliations
First Affiliated Hospital of Air Force Medical University: Jing Jing, Xiaoyan Hao, Yanjun Diao, Enliang Hu, Yuan Zhao, Jingyuan Jia, Jiayun Liu
Shenzhen Mindray Bio-Medical Electronics Co. Ltd: Xiang Cheng, Yun Yang
Northwest Women’s and Children’s hospital: Xiaoxia Gao, Bin Huang

This is a preprint; it has not been peer reviewed by a journal.
DOI: https://doi.org/10.21203/rs.3.rs-7283529/v1
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Status: Posted. Version 1, posted September 3rd, 2025.

Abstract

Objective
Infectious mononucleosis (IM) and malignant lymphoproliferative disorders often present with similar initial symptoms and signs. The purpose of this study is to develop a machine learning-based model to distinguish IM from lymphoid hematologic malignancies using both routine and research parameters derived from complete blood count (CBC) analysis.

Methods
This multicenter model development and validation study used data from three independent institutions. Patients with a final confirmed diagnosis of IM, acute lymphoblastic leukemia (ALL), or chronic lymphoproliferative disorders (CLPD) were included. A total of 24 routine and 21 report-derived parameters from the CBC at initial presentation to our institution were collected. Nine candidate biomarkers and five machine learning classifiers were employed to construct predictive models from the training data. The models were validated using an independent test dataset. Model performance was assessed by calculating the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), Matthews correlation coefficient (MCC), and Cohen’s Kappa coefficient. The diagnostic performance of all five models was evaluated using both internal ROC analysis and external validation datasets.

Results
A total of 114 patients with IM, 108 with hematologic diseases, and 150 healthy controls were included in the study. In both the IM versus hematologic disease classification model and the IM versus healthy control model, all methods except the decision tree (DT) achieved good evaluation indicators, most of them above 87% in the validation cohorts, suggesting reliable diagnostic capability. In the classification model distinguishing healthy individuals from patients with hematologic diseases, the XGBoost algorithm showed stable and high performance across the training, validation, and test sets. The receiver operating characteristic (ROC) curves confirmed that XGBoost was the most effective model; its AUC was 0.995 in the test sets.

Conclusion
The XGBoost model demonstrated the most satisfactory performance. Machine learning algorithms show promise for clinical implementation, and the proposed model may aid in the early identification of IM and malignant lymphoid disorders with overlapping initial presentations. This provides valuable assistance for clinicians to intervene early and improve prognosis.

Keywords: machine learning, complete blood count, routine and research parameters, chronic lymphoproliferative disorders, ALL, IM

Additional Declarations
No competing interests reported.
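
The Methods paragraph of the abstract names eight performance measures but does not say how they were computed. As a reading aid, the following is a minimal sketch of their standard definitions for a binary classifier, assuming a 2x2 confusion matrix and per-patient scores. The function names `binary_metrics` and `roc_auc` and the example counts are hypothetical illustrations; they are not taken from the preprint and do not reproduce its results.

```python
import math


def _ratio(num, den):
    """Divide, returning NaN instead of raising when the denominator is zero."""
    return num / den if den else float("nan")


def binary_metrics(tp, fp, tn, fn):
    """Threshold-based metrics from one 2x2 confusion matrix (hypothetical helper)."""
    n = tp + fp + tn + fn
    accuracy = _ratio(tp + tn, n)
    # MCC: correlation between predicted and true labels, in [-1, 1].
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _ratio(tp * tn - fp * fn, mcc_den)
    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    p_e = _ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = _ratio(accuracy - p_e, 1 - p_e)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": accuracy,
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "mcc": mcc,
        "kappa": kappa,
    }


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half. This pairwise form is O(P * N), which is fine
    for cohorts of a few hundred patients.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return _ratio(wins, len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up counts for illustration only; not data from the study.
    for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.3f}")
    print("auc:", roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Sensitivity, specificity, PPV, and NPV each depend on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds. That is one reason a model can report an AUC near 1 while its threshold-based indicators are lower.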

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00