Construction of Prediction Model for Severe Pneumonia in Children Based on Machine Learning

Shuai Yu, Zhengfeng Xue, Qing Liu, Yaya Ren, Xue Zhou, Yuanxia Li

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7575261/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review. Version 1 posted April 7th, 2026.

Abstract

Background: Globally, pneumonia is the principal infectious disease and the leading cause of death in children under 5 years of age. Because children's respiratory and immune systems are immature and young children have difficulty expressing their symptoms, severe pneumonia (SP) is prone to delayed diagnosis and treatment. SP can therefore cause a range of systemic complications, seriously threatening children's health and increasing the social and economic burden.
Current clinical tools for assessing pneumonia severity in children are limited in sensitivity, specificity, and inter-observer consistency, and artificial intelligence research in pediatric pneumonia lags well behind that in adult pneumonia.

Objective: This study aimed to develop a machine learning-based prediction model for the early identification of, and intervention in, severe pneumonia in children to support clinical decision-making.

Methods: A retrospective analysis was conducted on 360 pneumonia cases admitted to the Affiliated Hospital of Yan'an University between August 2023 and August 2024. The cases were categorized into severe (n=160) and mild (n=200) groups according to disease severity. Independent risk factors were identified through univariate and multivariate logistic regression analyses. Seven machine learning algorithms, namely CatBoost, XGBoost, LightGBM, support vector machine (SVM), k-nearest neighbors (KNN), logistic regression (LR), and Gaussian naive Bayes (GNB), were used to construct prediction models. The dataset was randomly split into training (70%) and test (30%) sets for model development and evaluation. Model performance was assessed by accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic curve (AUC). SHapley Additive exPlanations (SHAP) values were used to interpret and visualize the optimal model.

Results: Multivariate analysis identified fever duration (days), abdominal pain, elevated C-reactive protein (CRP), elevated plasma D-dimer, and plastic sputum thrombus formation as independent risk factors for severe pneumonia in children (all P<.05). Among the seven machine learning models, the SVM performed best on the test set, with an AUC of .906, accuracy of .843, and F1 score of .817. SHAP analysis indicated that the number of days with fever was the most influential feature in the model's predictions, followed by D-dimer and CRP levels.
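The Results report test-set accuracy, precision, recall, F1 score, and AUC. As a hedged illustration of how these metrics are commonly defined, the dependency-free sketch below computes them from labels and predicted probabilities. It assumes severe pneumonia is coded as the positive class (1), mild as 0, and a 0.5 probability threshold; the function names and toy data are hypothetical, and the source does not describe the authors' actual evaluation code.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating 1 (severe pneumonia) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-dependent metrics computed from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random severe case is scored higher than a
    random mild case (Mann-Whitney formulation); tied scores count as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 3 severe (1) and 3 mild (0) cases with predicted probabilities.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
    print(classification_metrics(y_true, y_pred))
    print(round(roc_auc(y_true, scores), 3))
```

On this toy data, accuracy, precision, recall, and F1 all equal 2/3 at the 0.5 threshold, while the AUC is 8/9 (about 0.889). The split reflects a design point worth keeping in mind when reading the Results: the first four metrics depend on the chosen threshold, whereas AUC summarizes how well the model ranks severe above mild cases across all thresholds.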
Conclusion: The SVM model, built on fever duration, abdominal pain, CRP, D-dimer, and plastic sputum thrombus, effectively predicts the risk of severe pneumonia in children. The model is also interpretable through the SHAP framework, which facilitates the early identification of high-risk children. However, the model still requires external validation on multi-center, large-sample data.

Subject areas: Biological sciences/Computational biology and bioinformatics; Health sciences/Diseases; Health sciences/Health care; Health sciences/Medical research; Health sciences/Risk factors

Keywords: Severe pneumonia in children, Machine learning, Prediction model, Support Vector Machine (SVM), Risk factors, SHAP analysis

Declarations: No competing interests reported.

Editorial timeline:
First submitted to journal: 09 Sep, 2025
Submission checks completed at journal: 10 Sep, 2025
Editor assigned by journal: 11 Sep, 2025
Editor invited by journal: 12 Sep, 2025
Reviewers invited by journal: 01 Apr, 2026
Reviewers agreed at journal: 06 Apr, 2026
Affiliation (all authors): Affiliated Hospital of Yan'an University. Corresponding author: Yuanxia Li. Submitted to: Scientific Reports.
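The Results and Conclusion rely on SHAP to rank feature influence (fever days first, then D-dimer and CRP). To illustrate the quantity SHAP estimates, the sketch below computes exact Shapley values by enumerating every coalition of features, assuming a single all-zero reference patient and a hypothetical toy risk function with made-up coefficients. It is not the authors' SVM, and the `shap` library's explainers approximate this quantity (typically against a background dataset) rather than running this brute-force loop.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict

Features = Dict[str, float]


def shapley_values(
    model: Callable[[Features], float], x: Features, baseline: Features
) -> Features:
    """Exact Shapley value of each feature of `x` relative to `baseline`."""
    names = list(x)
    n = len(names)

    def value(coalition: frozenset) -> float:
        # Features in the coalition keep the patient's values; all others are
        # set to the (hypothetical) reference patient's values.
        z = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return model(z)

    phi: Features = {}
    for i in names:
        others = [f for f in names if f != i]
        total = 0.0
        for k in range(n):  # k = size of the coalition S that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_risk(z: Features) -> float:
    """Hypothetical risk score with made-up coefficients (NOT the authors' model)."""
    return (
        0.3 * z["fever_days"]
        + 0.2 * z["d_dimer"]
        + 0.1 * z["crp"]
        + 0.05 * z["fever_days"] * z["d_dimer"]  # interaction term
    )


if __name__ == "__main__":
    patient = {"fever_days": 5.0, "d_dimer": 2.0, "crp": 3.0}
    reference = {"fever_days": 0.0, "d_dimer": 0.0, "crp": 0.0}
    phi = shapley_values(toy_risk, patient, reference)
    # Print features from most to least influential (by absolute contribution).
    for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
        print(f"{name}: {contribution:+.3f}")
```

For this toy patient the attributions are about 1.75 (fever_days), 0.65 (d_dimer), and 0.30 (crp): each linear term is credited to its own feature and the interaction term is split equally between fever_days and d_dimer. They sum to toy_risk(patient) − toy_risk(reference) = 2.7, the additivity property that makes SHAP plots readable. Exhaustive enumeration costs on the order of 2^n model calls, which is trivial for five predictors but is why practical SHAP tools use approximations for larger feature sets.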
