Implementing Symptom-Based Predictive Models for Early Diagnosis of Pediatric Respiratory Viral Infections

Research Square preprint (Article)
Antoni Soriano-Arandes, Cristina Andrés, Aida Perramon-Malavez, and 27 more
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-5490724/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Introduction: Respiratory viral infections, including SARS-CoV-2, respiratory syncytial virus (RSV), influenza, rhinovirus, and adenovirus, are major causes of acute respiratory illness (ARI) in children. Symptom-based predictive models are valuable tools for expediting diagnoses, particularly in primary care settings. This study assessed the effectiveness of machine learning-based models in estimating infection probabilities for these common pediatric respiratory viruses using symptom data.
Methods: Data were collected from 868 children with ARI symptoms evaluated across 14 primary care centers, members of COPEDICAT (Coronavirus Pediatria Catalunya), from October 2021 to October 2023. Random Forest and Boosting models with 10-fold cross-validation were used, applying SMOTE-NC to address class imbalance. Model performance was evaluated via area under the curve (AUC), sensitivity, and specificity, and SHapley Additive exPlanations (SHAP) values were used to assess feature importance.

Results: The model performed best for RSV (AUC: 0.81, sensitivity: 0.64, specificity: 0.77) and effectively ruled out SARS-CoV-2 based on the absence of symptoms such as crackles and wheezing. Predictive performance was lower for non-enveloped viruses like rhinovirus and adenovirus, due to their nonspecific symptom profiles. SHAP analysis identified key symptom patterns for each virus.

Conclusions: The study demonstrated that symptom-based predictive models effectively identify pediatric respiratory infections, with notable accuracy for RSV, SARS-CoV-2 and influenza.

Subject areas: Health sciences/Health care; Health sciences/Health care/Paediatrics; Health sciences/Health care/Paediatrics/Paediatric research; Biological sciences/Computational biology and bioinformatics/Machine learning

Keywords: pediatric care, symptom-based predictive modeling, respiratory virus infections, acute respiratory infection, triage

Introduction

Respiratory viral infections are a predominant cause of acute respiratory infections (ARI), particularly during peak seasons when several pathogens circulate simultaneously [1]. This is particularly significant in children, whose underdeveloped immune systems lead to specific vulnerabilities and responses to infections [2]. ARIs remain the most common cause of illness, hospitalization, and mortality in the pediatric population, leading to a considerable economic burden on families and society [3].
Among the most prevalent viruses affecting the pediatric population are SARS-CoV-2, respiratory syncytial virus (RSV), influenza virus, rhinovirus, and adenovirus. These viral infections often exhibit overlapping seasonal patterns and symptoms, such as fever, cough, or wheezing, making clinical diagnosis particularly challenging if targeted testing is not performed [4].

During the COVID-19 pandemic, the strain placed on healthcare systems made it especially difficult to provide comprehensive diagnostic testing for all symptomatic patients [5]. Children were particularly impacted, as they often exhibited mild or atypical symptoms [6]. In this context, given the similar clinical presentation of different respiratory viruses, symptom-based predictive models emerged as a promising approach to expedite the diagnostic process in the clinical setting. These models function as pre-diagnostic tools, helping clinicians assess the likelihood of specific infections even before confirmatory testing is available [7]. By predicting the most likely pathogen based on symptom presentation, such models enable informed clinical decisions, even in the absence of immediate diagnostic confirmation [8,9]. Nevertheless, these tools are mainly designed for use in emergency settings rather than primary care [10], where immediate and accurate identification of the causative pathogen can improve patient outcomes and reduce healthcare costs [11].

This study aims to assess the effectiveness of a previously validated symptom-based predictive model for pediatric SARS-CoV-2 [12] in determining the risk of infection for five common pediatric respiratory viruses (SARS-CoV-2, RSV, influenza, rhinovirus, and adenovirus). Additionally, it explores the distinct symptom signatures associated with each virus, providing insights into their diagnostic utility.
Methods

Study design and sample collection

The study was conducted across 14 primary care centers within the COPEDICAT (Coronavirus Pediatria Catalunya) network from October 2021 to October 2023. Following a structured sampling protocol, we randomly enrolled the first two patients with suspected ARI per week from each center and obtained a mid-turbinate swab sample from each participant. Samples were sent to the reference laboratory (Hospital Vall d’Hebron) exclusively for research purposes. To increase the sample size, we also included children who tested positive for influenza A-B, RSV, SARS-CoV-2, rhinovirus, or adenovirus by rapid diagnostic testing at primary care centers.

Inclusion criteria

The study included children based on three clinical criteria: (1) bronchiolitis, defined as the first episode of respiratory infection in children under 2 years, presenting with fever, rhinorrhea, cough, and crackles and/or wheezing on lung auscultation; (2) fever, defined as a temperature of 38 °C or higher lasting more than 24 hours in previously healthy children aged 3 months to 2 years, with a stable general condition and normal physical examination; and (3) influenza-like illness, indicated by a fever of 38 °C or higher, along with one or more of the following symptoms: myalgia, headache, general malaise, gastrointestinal symptoms, cough, rhinorrhea and/or odynophagia.

Ethical Considerations

The study was approved by the ethics committee of the coordinating center (reference number PR(AMI)40/2021). All procedures followed the ethical standards outlined in the Declaration of Helsinki and Good Clinical Practice guidelines. Informed consent was obtained from the legal guardian(s) of all subjects included in the study.

Data collection and processing

Clinical signs and symptoms were collected from each participant's medical records into a secure, study-specific database (REDCap®).
Data were then curated to ensure accuracy, including correcting inconsistencies and addressing missing data with automated machine learning techniques. Symptoms with less than 25% missing values were imputed, while those exceeding this threshold were removed (Figure S1). The imputation used scikit-learn's IterativeImputer, iterating over three rounds with five neighbors, assuming feature ordinality, to enhance data completeness. A composite binary variable indicated the presence of a specific respiratory virus (SARS-CoV-2, RSV, influenza A-B, rhinovirus, adenovirus), marking patients positive if they had a confirmatory PCR, antigen, or lab test. Influenza strains were consolidated into “Flu (A+B)” due to limited samples, and each virus was modeled independently due to the low prevalence of co-infections (Figure S2).

Modeling approach

For each virus, the dataset was divided into training and testing sets using 10-fold stratified cross-validation to preserve the proportion of positive and negative cases. Two machine learning models, Random Forest and Boosting, were employed to predict the likelihood of a positive diagnosis. The data were split into two subsets: 70% for training or derivation and 30% for validation. To address class imbalance, the SMOTE-NC algorithm was applied during training, balancing both categorical and numerical features without affecting the test set. Age was calculated by subtracting the birth date from the date of symptom onset. Each model underwent hyperparameter tuning via grid search, selecting the parameters that gave the highest AUC (area under the ROC curve) on the test set. Metrics such as AUC, accuracy, kappa, sensitivity, specificity, positive/negative predictive value, prevalence, detection rate, detection prevalence, and balanced accuracy were calculated for each viral infection, as detailed in the Supplementary Material.
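The stratified splitting described above can be sketched in a few lines. This is a minimal, standard-library illustration and not the authors' pipeline: the function names `stratified_folds` and `train_test_splits` and the toy labels are hypothetical. It shows how each fold keeps roughly the same share of positive cases, and where a resampling step such as SMOTE-NC would sit (on the training indices only).

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds so class proportions stay similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Deal indices round-robin, continuing where the previous class
        # stopped, so that overall fold sizes stay even.
        for j, idx in enumerate(idxs):
            folds[(offset + j) % k].append(idx)
        offset += len(idxs)
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = stratified_folds(labels, k, seed)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        # Any class balancing (e.g. oversampling the minority class) would be
        # applied to `train` here, leaving `test` untouched.
        yield train, test


# Hypothetical toy data: 13 positive and 87 negative cases.
labels = [1] * 13 + [0] * 87
splits = list(train_test_splits(labels, k=10))
positives_per_test_fold = [sum(labels[i] for i in test) for _, test in splits]
print(positives_per_test_fold)
```

With 13 positives spread over 10 folds, every test fold receives one or two positive cases, which is the property stratification is meant to guarantee.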
SHAP value analysis

Following model training and selection, SHAP (SHapley Additive exPlanations) values were calculated for each model to assess the marginal contribution of individual symptoms to the prediction of a positive outcome. For each virus, SHAP bar plots and beeswarm plots were generated to visualize the importance of each feature in predicting the viral infection.

Software and statistical analysis

All data analysis and machine learning processes were performed using R software (version 4.4.1). A significance level of 5% was used for all statistical tests.

Results

Dataset of the Symptoms and Respiratory Viral Infections Present in the Population

We included in the analyses detailed clinical data from 868 children with a viral ARI. Among confirmed respiratory viral infections, influenza (A + B) and RSV were most prevalent, with 39.10% (n=389) and 15.48% (n=154) positive cases, respectively. In contrast, rhinovirus and SARS-CoV-2 were less common, affecting 10.35% and 9.55% of the population, respectively. The range of signs and symptoms and infection statuses used for training and evaluating the predictive model is summarized in Table S1. Fever (90.05%), cough (82.11%), and nasal congestion (81.21%) were widely reported, whereas more severe clinical presentations such as seizures (0.20%), apnea or shock (0.00%) were rare or not identified. This symptom distribution informs the predictive model, although less frequent infections contribute less to overall model performance.

Symptom-Based Model Performance Accuracy in the Detection of Respiratory Viruses

The symptom-based model for detecting respiratory viruses (Table 1) demonstrated solid performance across the different pathogens studied, with RSV achieving the highest AUC (0.81), reflecting a strong balance between sensitivity (0.64) and specificity (0.77).
RSV also had the highest balanced accuracy (0.71), reflecting a superior diagnostic capability compared to the other viruses. COVID-19 and influenza also showed solid results, with AUCs of 0.71 and 0.70, respectively, translating to balanced accuracy scores of 0.64 for COVID-19 and 0.65 for influenza, indicating acceptable and stable diagnostic performance for both infections. Although the specificity for COVID-19 (0.64) is lower than that for RSV (0.77), with sensitivity equal at 0.64, the model still achieves suitable diagnostic precision. Furthermore, while the positive predictive value for COVID-19 is low (0.12) due to its lower prevalence, this is balanced by a high negative predictive value of 0.96, which is clinically significant for ruling out the disease.

Conversely, rhinovirus and adenovirus presented lower AUCs (0.62 and 0.69, respectively), suggesting moderate diagnostic capability for these infections. Although their sensitivities are lower (0.50 and 0.57, respectively), the model achieved somewhat higher specificity (0.62 for rhinovirus and 0.69 for adenovirus), indicating that it is more accurate in identifying negative cases of these viruses, particularly adenovirus. The balanced accuracy for these viruses (0.56 for rhinovirus and 0.63 for adenovirus) demonstrates that, while the model is less robust for these infections, it still provides valuable diagnostic information for clinical decision-making.

Virus-Specific Predictive Symptoms for Respiratory Infections

The analysis of virus-specific predictive clinical presentation provided insights into how different signs and symptoms correlated with the diagnosis of each respiratory virus. As shown in Figure 1A, the mean SHAP values demonstrated that the most significant symptoms associated with the absence of SARS-CoV-2 infection were crackles (0.0558), cough (0.0532), fever >39 °C (0.0342), abdominal pain (0.0253), and conjunctivitis (0.0193).
Other symptoms, such as wheezing (0.0150), croup (0.0124), and diarrhea (0.0111), were also linked to the absence of SARS-CoV-2 infection, but with less intensity. Interestingly, low-grade fever (37–38 °C) showed only a moderate association with SARS-CoV-2 ARI (0.0162) but emerged as the key positive indicator among all signs and symptoms analyzed (Figures 1A and 1B).

Wheezing (0.0876) and crackles (0.0742) emerged as the strongest predictors of RSV ARI, followed by low-grade fever (37–38 °C) (0.0455) (Figures 2A and 2B). In contrast, high fever (>39 °C) was the principal symptom associated with the absence of RSV (0.0462) (Figures 2A and 2B).

Regarding influenza (A+B), fatigue was identified as the symptom most strongly associated with influenza diagnosis (0.0700), followed by cough (0.0315) and high fever (>39 °C) (0.0246) (Figures 3A and 3B). However, wheezing (0.0657), low-grade fever (37–38 °C) (0.0523), crackles (0.0436), and conjunctivitis (0.0275) were associated with the absence of influenza ARI (Figures 3A and 3B).

For the prediction of rhinovirus ARI, cough (0.0545), nasal congestion (0.0524) and, to a lesser extent, low-grade fever (37–38 °C) (0.0020) were the most relevant symptoms (Figure 4). Conversely, high fever (>39 °C) (0.0815), fatigue (0.0582), vomiting (0.0354), and crackles (0.0190) pointed away from a diagnosis of rhinovirus ARI, as shown in Figures 4A and 4B.

Adenovirus symptom analysis revealed that high fever (>39 °C) was the only symptom that exhibited a high SHAP value (0.0486) together with a significant number of patients showing this symptom, as represented by the high density of black dots (Figure 5). Strikingly, the model stands out for its potential to rule out the presence of the virus: fever (37–38 °C) (0.0754), crackles (0.0572), vomiting (0.0540), fever (38–39 °C) (0.0521), fatigue (0.0422) and wheezing (0.0322) all showed high SHAP values associated with the absence of adenovirus (Figure 5).
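To make the notion of a symptom's "marginal contribution" more concrete, the following sketch computes exact Shapley values for a tiny, entirely hypothetical risk score with three symptoms. It illustrates the game-theoretic definition that SHAP builds on; it is not the SHAP library, and the score, weights, and feature names are invented for illustration and do not come from the study's models.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values for value_fn(frozenset_of_present_features)."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for r in range(n):
            for subset in combinations(others, r):
                s = frozenset(subset)
                # Weight of a coalition of size r that excludes feature f.
                weight = factorial(r) * factorial(n - r - 1) / factorial(n)
                # Marginal contribution of adding f to this coalition.
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_score(present):
    """Hypothetical risk score; all numbers are made up for illustration."""
    score = 0.10
    if "wheezing" in present:
        score += 0.30
    if "crackles" in present:
        score += 0.20
    if "wheezing" in present and "crackles" in present:
        score += 0.10  # interaction term, split between the two symptoms
    if "high_fever" in present:
        score -= 0.15  # pushes the score toward "virus absent"
    return score


features = ["wheezing", "crackles", "high_fever"]
phi = shapley_values(toy_score, features)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

In this toy setting a positive value pushes the score toward a positive diagnosis and a negative value pushes it away, and the values sum to the difference between the score with all symptoms present and the score with none. Summary plots typically rank features by the magnitude of such contributions averaged over patients.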
Discussion

This study highlights the utility of symptom-based predictive models in pediatric primary care for diagnosing specific respiratory viral infections, with the potential to facilitate their targeted management and improve clinical outcomes overall [7,11]. Such models are especially valuable during outbreaks or novel virus scenarios, where rapid symptom assessment can guide timely interventions and reduce reliance on confirmatory tests [14].

In this context, integrating a symptom-based predictive model into routine clinical practice appears highly promising [7]. In pediatric populations, where symptoms are often ambiguous and may evolve rapidly [7], such a tool can support healthcare providers, mainly in primary care centers, in making prompt, data-driven decisions [13]. ARIs account for 50% of pediatric consultations and result in 1.2 million hospital admissions annually [15]; the resulting hospitalization and intensive care demands impose a substantial economic burden on healthcare systems. The development of predictive models focused on early and accurate diagnosis is therefore crucial, as they can enhance management, reduce costs, and improve patient care [11]. Our model addresses this need by enabling timely identification of infections and demonstrating notable accuracy in detecting key pathogens responsible for ARI in children, especially in settings where access to highly sensitive or rapid diagnostic tests is limited.

Remarkably, the model demonstrated effectiveness in identifying infections not only through specific signs and symptoms but also by recognizing the lack of certain symptoms, an aspect that is equally valuable for clinical decision-making. For RSV, the model achieved an AUC of 0.81, with sensitivity at 0.64 and specificity at 0.77, showing a strong balance. This suggests it effectively detects RSV when wheezing is present or high fever is absent.
The high AUC reflects the model's capability to distinguish RSV from other viral infections based on symptom patterns, offering reliable diagnostic utility in primary care settings. Additionally, the model displayed solid accuracy in detecting influenza, achieving an AUC of 0.70, further demonstrating its effectiveness in identifying common pathogens causing ARIs. In SARS-CoV-2 infections, the absence of symptoms such as crackles or wheezing was found to be a significant predictor of non-infection.

The model’s performance was also influenced by the clinical presentation of certain viruses, showing better predictive capabilities for those with more robust manifestations, such as enveloped viruses like influenza, SARS-CoV-2, and RSV, which affect both the upper and lower respiratory tract [16,17]. These viruses are often associated with more severe clinical outcomes, such as those related to pneumopathies [16,17], and the model demonstrated particular accuracy in detecting these viruses due to their distinct clinical signatures. The ability of our predictive model to accurately detect viruses that cause more severe symptoms is a significant achievement, as these pathogens incur higher economic costs. This is especially relevant for RSV, which is associated with annual hospitalization expenses of up to €87.1 million in Spain alone [18]. Additionally, accurate detection of viral infections is particularly relevant, as the emergence of SARS-CoV-2 has not only increased healthcare system costs but also disrupted the seasonal patterns of respiratory pathogens, making the diagnosis of viral ARIs challenging [19,20].

Conversely, non-enveloped viruses like rhinovirus and adenovirus exhibited lower AUC values of 0.62 and 0.69, respectively. These results reflect that these non-enveloped viruses tend to cause milder, more localized infections, often confined to the upper respiratory tract [21,22].
Their symptoms are less specific, typically presenting as nasal congestion, pharyngitis, and cough [22,23]. As a result, the model displayed lower diagnostic accuracy for this group of viruses, relying on the absence of severe symptoms, such as high fever or systemic manifestations, to make predictions. By assessing both specific and nonspecific symptoms, the model provides clinicians with valuable insights for tailoring more effective interventions [24,25].

Building on this foundation, this predictive model demonstrates significant potential in early diagnosis of pediatric respiratory viral infections by analyzing symptom patterns. Its dual approach, leveraging both the presence and absence of symptoms, enhances diagnostic accuracy for infections like SARS-CoV-2, RSV, and influenza, which often require urgent intervention. This capability becomes even more important when considering future emergent viruses, where symptomatology will initially be the primary tool for diagnosis before specific tests are developed and deployed, as was seen during the initial stages of the COVID-19 pandemic. Moreover, the model’s application during high-demand periods could mitigate healthcare system overload by reducing unnecessary testing and hospitalizations.

However, the model also presented limitations, such as reduced accuracy for viruses with mild symptoms, like rhinovirus or adenovirus, due to nonspecific clinical presentations. Its performance may also have been limited by the sample size analyzed, preventing the model from offering accurate predictions across different age groups. The model also depends on accurate symptom reporting, which can vary due to clinician judgment and patient descriptions, especially in pediatric cases, potentially affecting predictive accuracy. In addition, the model did not consider the possibility of viral co-detections, which may have been associated with overlapping signs and symptoms.
Thus, while the model supports clinical decision-making, confirmatory testing remains necessary to ensure accurate diagnoses.

In summary, the model effectively balances sensitivity and specificity for pathogens like SARS-CoV-2, RSV, and influenza, making it a valuable diagnostic tool for pediatric care. Its adaptability to different clinical environments, including resource-limited primary care settings, further underscores its practical utility. Additionally, the model’s reliance on readily available clinical data makes it an accessible and scalable solution for improving diagnostic accuracy. Its incorporation into routine practice could not only strengthen healthcare responses, but also contribute to improved readiness for future pandemics.

Declarations

Competing Interests: The authors declare no competing interests.

Funding: This study was funded by the Fundació La Marató tv3 (grant number 202134-30-31).

Authors’ Contributions: C.P. and A.S.-A. were the principal investigators of the project, responsible for the study design and conceptualization. The primary care pediatricians, co-authors of the article, provided the essential clinical data for the analysis. A.A. and C.A. processed the samples to identify respiratory viruses. All authors contributed to the manuscript's preparation and approved the final version for submission.

Acknowledgments: To all the children and their families who contributed samples and clinical data. To Datexbio (Laura Muñoz and Marc Pastor) for their thorough contributions to data analysis and clinical predictive model building.

Data availability: The datasets created and/or analyzed in this study can be obtained from the corresponding author upon reasonable request.

References

1. Litwin, C. M. & Bosley, J. G. Seasonality and prevalence of respiratory pathogens detected by multiplex PCR at a tertiary care medical center. Arch. Virol. 159, 65–72 (2014).
2. Pieren, D. K. J., Boer, M. C. & de Wit, J. The adaptive immune system in early life: The shift makes it count. Front. Immunol. 13, 1031924 (2022).
3. Zhu, G. et al. Epidemiological characteristics of four common respiratory viral infections in children. Virol. J. 18, 10 (2021).
4. Debiaggi, M., Canducci, F., Ceresola, E. R. & Clementi, M. The role of infections and coinfections with newly identified and emerging respiratory viruses in children. Virol. J. 9, 247 (2012).
5. Filip, R., Gheorghita Puscaselu, R., Anchidin-Norocel, L., Dimian, M. & Savage, W. K. Global challenges to public health care systems during the COVID-19 pandemic: A review of pandemic measures and problems. J. Pers. Med. 12, 1295 (2022).
6. Goyal, R. & Sharma, R. Pediatric COVID: How is it different from adults?
7. Shearah, Z., Ullah, Z. & Fakieh, B. Intelligent framework for early detection of severe pediatric diseases from mild symptoms. Diagnostics 13, 3204 (2023).
8. Ramírez Varela, A. et al. Prediction of SARS-CoV-2 infection with a symptoms-based model to aid public health decision making in Latin America and other low and middle income settings. Prev. Med. Rep. 27, 101798 (2022).
9. Tso, C. F., Lam, C., Calvert, J. & Mao, Q. Machine learning early prediction of respiratory syncytial virus in pediatric hospitalized patients. Front. Pediatr. 10, 886212 (2022).
10. Goto, T., Camargo, C. A. Jr, Faridi, M. K., Freishtat, R. J. & Hasegawa, K. Machine learning-based prediction of clinical outcomes for children during emergency department triage. JAMA Netw. Open 2, e186937 (2019).
11. Barenfanger, J., Drake, C., Leon, N., Mueller, T. & Troutt, T. Clinical and financial benefits of rapid detection of respiratory viruses: An outcomes study. J. Clin. Microbiol. 38, 2824–2828 (2000).
12. Antoñanzas, J. M. et al. Symptom-based predictive model of COVID-19 disease in children. Viruses 14, 63 (2021).
13. Ramgopal, S., Sanchez-Pinto, L. N., Horvat, C. M., Carroll, M. S., Luo, Y. & Florin, T. A. Artificial intelligence-based clinical decision support in pediatrics. Pediatr. Res. 93, 334–341 (2023).
14. Petric, M., Comanor, L. & Petti, C. A. Role of the laboratory in diagnosis of influenza during seasonal epidemics and potential pandemics. J. Infect. Dis. 194, S98–S110 (2006).
15. Dagne, H., Andualem, Z., Dagnew, B. & Taddese, A. A. Acute respiratory infection and its associated factors among children under-five years attending pediatrics ward at University of Gondar Comprehensive Specialized Hospital, Northwest Ethiopia: Institution-based cross-sectional study. BMC Pediatr. 20, 93 (2020).
16. de Gabory, L., Alharbi, A., Kérimian, M. & Lafon, M. E. The influenza virus, SARS-CoV-2, and the airways: Clarification for the otorhinolaryngologist. Eur. Ann. Otorhinolaryngol. Head Neck Dis. 137, 291–296 (2020).
17. Lee, C. Y. F. et al. Respiratory syncytial virus prevention: A new era of vaccines. Cureus 15, e45012 (2023).
18. Gea-Izquierdo, E., Gil-Prieto, R., Hernández-Barrera, V. & Gil-de-Miguel, Á. Respiratory syncytial virus-associated hospitalization in children aged <2 years in Spain from 2018 to 2021. Hum. Vaccin. Immunother. 19, 2231818 (2023).
19. Carrera-Hueso, F. J. et al. Hospitalization budget impact during the COVID-19 pandemic in Spain. Health Econ. Rev. 11, 43 (2021).
20. Brañas, P. et al. Dynamics of respiratory viruses other than SARS-CoV-2 during the COVID-19 pandemic in Madrid, Spain. Influenza Other Respir. Viruses 17, e13199 (2023).
21. Esneau, C., Bartlett, N. & Bochkov, Y. Rhinovirus structure, replication, and classification. In Rhinovirus Infections (Elsevier, 2019).
22. Lynch, J. P. III & Kajon, A. E. Adenovirus: Epidemiology, global spread of novel serotypes, and advances in treatment and prevention. Semin. Respir. Crit. Care Med. 37, 586–602 (2016).
23. Vandini, S., Biagi, C., Fischer, M. & Lanari, M. Impact of rhinovirus infections in children. Viruses 11, 521 (2019).
24. Challener, D. W., Dowdy, S. C. & O’Horo, J. C. Analytics and prediction modeling during the COVID-19 pandemic. Mayo Clin. Proc. 95, S8–S10 (2020).
25. Feng, T. et al. Machine learning-based clinical decision support for infection risk prediction. Front. Med. 10, 1213411 (2023).

Table

Table 1. Parameters evaluated by the model for each specific viral infection.

Metric                      SARS-CoV-2   RSV    Influenza   Rhinovirus   Adenovirus
AUC                         0.71         0.81   0.70        0.62         0.69
Accuracy                    0.64         0.76   0.63        0.61         0.68
Kappa                       0.09         0.27   0.27        0.04         0.13
Sensitivity                 0.64         0.64   0.70        0.50         0.57
Specificity                 0.64         0.77   0.59        0.62         0.69
Positive Predictive Value   0.12         0.29   0.50        0.09         0.17
Negative Predictive Value   0.96         0.94   0.77        0.94         0.94
Prevalence                  0.08         0.13   0.37        0.07         0.10
Detection Rate              0.05         0.08   0.26        0.03         0.06
Detection Prevalence        0.38         0.28   0.52        0.39         0.33
Balanced Accuracy           0.64         0.71   0.65        0.56         0.63

Additional Declarations: No competing interests reported.

Supplementary Files: Supplementarymaterials29112024.docx
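As a reading aid for Table 1, the sketch below shows how positive and negative predictive values follow from sensitivity, specificity, and prevalence via Bayes' rule, and how balanced accuracy is the mean of sensitivity and specificity. The inputs are the rounded figures from Table 1; the function names are hypothetical, and the paper's own values come from its cross-validated models, so small differences due to rounding and fold averaging are expected.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


# (sensitivity, specificity, prevalence) as rounded in Table 1.
table1 = {
    "SARS-CoV-2": (0.64, 0.64, 0.08),
    "RSV": (0.64, 0.77, 0.13),
    "Influenza": (0.70, 0.59, 0.37),
    "Rhinovirus": (0.50, 0.62, 0.07),
    "Adenovirus": (0.57, 0.69, 0.10),
}

results = {}
for virus, (sens, spec, prev) in table1.items():
    ppv, npv = predictive_values(sens, spec, prev)
    results[virus] = (ppv, npv, balanced_accuracy(sens, spec))
    print(f"{virus}: PPV={ppv:.2f} NPV={npv:.2f} BA={results[virus][2]:.3f}")
```

The calculation makes the prevalence effect visible: with identical sensitivity (0.64), the implied positive predictive value is far lower for SARS-CoV-2 (prevalence 0.08) than for influenza (prevalence 0.37), while the negative predictive value stays high for the low-prevalence viruses.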
© Research Square 2026 | ISSN 2693-5015 (online)
Horta","correspondingAuthor":false,"prefix":"","firstName":"Mònica","middleName":"","lastName":"Vila","suffix":""},{"id":403304714,"identity":"40e84536-a086-499a-a3ba-43cd2e278a4b","order_by":25,"name":"Jorgina Vila","email":"","orcid":"","institution":"Vall d'Hebron Hospital Universitari","correspondingAuthor":false,"prefix":"","firstName":"Jorgina","middleName":"","lastName":"Vila","suffix":""},{"id":403304715,"identity":"bf4b2ee9-98c1-4291-96a4-351f0f8b4d63","order_by":26,"name":"Asunción Mejias","email":"","orcid":"","institution":"St. Jude Children's Research Hospital","correspondingAuthor":false,"prefix":"","firstName":"Asunción","middleName":"","lastName":"Mejias","suffix":""},{"id":403304716,"identity":"76254107-60ba-479e-84b3-759b847f6b7a","order_by":27,"name":"Andrés Anton","email":"","orcid":"","institution":"Vall d'Hebron Hospital Universitari","correspondingAuthor":false,"prefix":"","firstName":"Andrés","middleName":"","lastName":"Anton","suffix":""},{"id":403304717,"identity":"289901e7-71f6-49e6-bbac-147159aae9c0","order_by":28,"name":"Pere Soler-Palacin","email":"","orcid":"","institution":"Vall d'Hebron Hospital Universitari","correspondingAuthor":false,"prefix":"","firstName":"Pere","middleName":"","lastName":"Soler-Palacin","suffix":""},{"id":403304718,"identity":"4dfcae79-6b45-4b8d-b2d6-97402645f419","order_by":29,"name":"Clara Prats","email":"","orcid":"","institution":"Universitat Politècnica de Catalunya","correspondingAuthor":false,"prefix":"","firstName":"Clara","middleName":"","lastName":"Prats","suffix":""}],"badges":[],"createdAt":"2024-11-20 12:08:23","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-5490724/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-5490724/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":74294114,"identity":"04c41342-92a8-4439-89d5-30dd0998516a","added_by":"auto","created_at":"2025-01-20 17:43:50","extension":"png","order_by":1,"title":"Figure 
1","display":"","copyAsset":false,"role":"figure","size":890252,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSHAP values for SARS-CoV-2 prediction. A. Mean SHAP value barplot. B. SHAP Beeswarm plot. \u003c/strong\u003eBlack points represent the presence of a symptom associated with the disease, and grey points represent their absence. Higher positive SHAP values indicate an increased probability of disease, illustrating the relationship between specific symptoms and the likelihood of SARS-CoV-2 infection.\u003c/p\u003e","description":"","filename":"Figure1.png","url":"https://assets-eu.researchsquare.com/files/rs-5490724/v1/4ae710e6f80e5b1a545ff03d.png"},{"id":74294116,"identity":"44a4c545-e63a-485d-a816-22fc43984c52","added_by":"auto","created_at":"2025-01-20 17:43:50","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":1016074,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSHAP values for RSV prediction. A. Mean SHAP value barplot. B. SHAP Beeswarm plot. \u003c/strong\u003eBlack points represent the presence of a symptom associated with the disease, and grey points represent their absence. Higher positive SHAP values indicate an increased probability of disease, illustrating the relationship between specific symptoms and the likelihood of RSV infection.\u003c/p\u003e","description":"","filename":"Figure2.png","url":"https://assets-eu.researchsquare.com/files/rs-5490724/v1/756814937bbdef0d5878e03e.png"},{"id":74294118,"identity":"c511ed88-02c9-404e-a3a6-97f85bafcc2d","added_by":"auto","created_at":"2025-01-20 17:43:50","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":933640,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSHAP values for Influenza (A+B) prediction. A. Mean SHAP value barplot. B. SHAP Beeswarm plot. 
\u003c/strong\u003eBlack points represent the presence of a symptom associated with the disease, and grey points represent their absence. Higher positive SHAP values indicate an increased probability of disease, illustrating the relationship between specific symptoms and the likelihood of Influenza (A+B) infection.\u003c/p\u003e","description":"","filename":"Figure3.png","url":"https://assets-eu.researchsquare.com/files/rs-5490724/v1/59187de220381f168175374b.png"},{"id":74294122,"identity":"94d64c74-d02c-40ed-bedd-62fe3c56f16f","added_by":"auto","created_at":"2025-01-20 17:43:50","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":895789,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSHAP values for Rhinovirus prediction. A. Mean SHAP value barplot. B. SHAP Beeswarm plot. \u003c/strong\u003eBlack points represent the presence of a symptom associated with the disease, and grey points represent their absence. Higher positive SHAP values indicate an increased probability of disease, illustrating the relationship between specific symptoms and the likelihood of Rhinovirus infection.\u003c/p\u003e","description":"","filename":"Figure4.png","url":"https://assets-eu.researchsquare.com/files/rs-5490724/v1/88896a93f20de7ed068571e4.png"},{"id":74294120,"identity":"0e2a85b2-187c-4a8c-8b5d-6c1f7c955dae","added_by":"auto","created_at":"2025-01-20 17:43:50","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":889157,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSHAP values for Adenovirus prediction. A. Mean SHAP value barplot. B. SHAP Beeswarm plot. \u003c/strong\u003eBlack points represent the presence of a symptom associated with the disease, and grey points represent their absence. 
Higher positive SHAP values indicate an increased probability of disease, illustrating the relationship between specific symptoms and the likelihood of Adenovirus infection.\u003c/p\u003e","description":"","filename":"Figure5.png","url":"https://assets-eu.researchsquare.com/files/rs-5490724/v1/6af9ecd2dec3b45047c13029.png"},{"id":74295668,"identity":"2209d1dd-e2ee-4409-8b7c-6ca35a0b05ae","added_by":"auto","created_at":"2025-01-20 18:07:54","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":5429267,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5490724/v1/3b5636d3-eeaa-4e18-ac83-f76ee49f976c.pdf"},{"id":74294113,"identity":"b8f208a7-6b4d-452e-8ecc-ce8f4c2986f1","added_by":"auto","created_at":"2025-01-20 17:43:50","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":67674,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarymaterials29112024.docx","url":"https://assets-eu.researchsquare.com/files/rs-5490724/v1/7c6aecd707f5abf2e0e898ef.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Implementing Symptom-Based Predictive Models for Early Diagnosis of Pediatric Respiratory Viral Infections","fulltext":[{"header":"Introduction ","content":"\u003cp\u003eRespiratory viral infections are a predominant cause of acute respiratory infections (ARI), particularly during peak seasons when several pathogens circulate simultaneously [1]. This is remarkably significant in children, whose underdeveloped immune systems lead to specific vulnerabilities and responses to infections [2]. ARIs remain the most common cause of illnesses, hospitalizations, and mortality in the pediatric population, leading to a considerable economic burden on families and society [3]. 
Among the most prevalent viruses affecting the pediatric population are SARS-CoV-2, respiratory syncytial virus (RSV), influenza virus, rhinovirus, and adenovirus. These viral infections often exhibit overlapping seasonal patterns and symptoms – such as fever, cough, or wheezing – making clinical diagnosis particularly challenging if targeted testing is not performed [4].\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eDuring the COVID-19 pandemic, the strain placed on healthcare systems made it especially difficult to provide comprehensive diagnostic testing for all symptomatic patients [5].\u0026nbsp;Children were particularly impacted as they often exhibited mild or atypical symptoms\u0026nbsp;[6]. In this context, given the similar clinical presentation of different respiratory viruses, symptom-based predictive models emerged as a promising approach to expedite the diagnostic process in the clinical setting. These models function as pre-diagnostic tools, helping clinicians assess the likelihood of specific infections even before confirmatory testing is available [7]. By predicting the most likely pathogen based on symptom presentation, such models enable informed clinical decisions, even in the absence of immediate diagnostic confirmation [8,9]. Nevertheless, these tools are mainly designed for use in emergency settings rather than primary care [10], where immediate and accurate identification of the causative pathogen can improve patient outcomes and reduce healthcare costs [11].\u003c/p\u003e\n\u003cp\u003eThis study aims to assess the effectiveness of a previously validated symptom-based predictive model for pediatric SARS-CoV-2 [12] in determining the risk of infection for five common pediatric respiratory viruses (SARS-CoV-2, RSV, influenza, rhinovirus, and adenovirus). 
Additionally, it explores the distinct symptom signatures associated with each virus, providing insights into their diagnostic utility.\u0026nbsp;\u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003e\u003cem\u003eStudy design and sample collection\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eThe study was conducted across 14 primary care centers within the COPEDICAT (Coronavirus Pediatria Catalunya) network from October 2021 to October 2023. Following a structured sampling protocol, we randomly enrolled the first two patients with suspected ARI per week from each center and obtained a mid-turbinate swab sample from each participant. Samples were sent to the reference laboratory (Hospital Vall d’Hebron) exclusively for research purposes. To increase the sample size, we also included children who tested positive for influenza A-B, RSV, SARS-CoV-2, rhinovirus, or adenovirus by rapid diagnostic testing at primary care centers.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eInclusion criteria\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eThe study included children based on three clinical criteria: (1) bronchiolitis, defined as the first episode of respiratory infection in children under 2 years, presenting with fever, rhinorrhea, cough, and crackles and/or wheezing on lung auscultation; (2) fever, defined as a temperature of 38ºC or higher lasting more than 24 hours in previously healthy children aged 3 months to 2 years, with a stable general condition and normal physical examination; and (3) influenza-like illness, indicated by a fever of 38ºC or higher, along with one or more of the following symptoms: myalgia, headache, general malaise, gastrointestinal symptoms, cough, rhinorrhea and/or odynophagia.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eEthical Considerations\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eThe study was approved by the ethics committee of the coordinating center (file number PR(AMI)40/2021).\u0026nbsp;All procedures followed the ethical 
standards outlined in the Declaration of Helsinki and Good Clinical Practice guidelines. Informed consent was obtained from the legal guardian(s) of all subjects included in the study.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eData collection and processing\u0026nbsp;\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eClinical signs and symptoms were collected from each participant's medical records into a purpose-built secure database (REDCap®). Data were then curated to ensure accuracy, including correcting inconsistencies and addressing missing data with automated machine learning techniques. Symptoms with less than 25% missing values were imputed, while those exceeding this threshold were removed (\u003cstrong\u003eFigure S1\u003c/strong\u003e). The imputation used scikit-learn's IterativeImputer, iterating over three rounds with five neighbors, assuming feature ordinality, to enhance data completeness. A composite binary variable indicated the presence of a specific respiratory virus (SARS-CoV-2, RSV, influenza A-B, rhinovirus, adenovirus), marking patients positive if they had a confirmatory PCR, antigen, or lab test. Influenza strains were consolidated into “Flu (A+B)” due to limited samples, and each virus was modeled independently due to the low prevalence of co-infections (\u003cstrong\u003eFigure S2\u003c/strong\u003e).\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eModeling approach\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eFor each virus, the dataset was divided into training and testing sets using 10-fold stratified cross-validation to preserve the proportion of positive and negative cases. Two machine learning models, Random Forest and Boosting, were employed to predict the likelihood of a positive diagnosis. The data were split into two subsets: 30% for training or derivation and 70% for validation. 
To address class imbalance, the SMOTE-NC algorithm was applied during training, balancing both categorical and numerical features without affecting the test set. Age was calculated by subtracting the birth date from the date of symptom onset. Each model underwent hyperparameter tuning via grid search, optimizing parameters based on the highest AUC (area under the ROC curve) on the test set for enhanced performance. Metrics such as AUC, accuracy, kappa, sensitivity, specificity, positive/negative predictive value, prevalence, detection rate, detection prevalence, and balanced accuracy were calculated for each viral infection, as detailed in the Supplementary Material.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eSHAP value analysis\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eFollowing model training and selection, SHAP (Shapley Additive Explanations) values were calculated for each model to assess the marginal contribution of individual symptoms to the prediction of a positive outcome. For each virus, SHAP bar plots and beeswarm plots were generated to visualize the importance of each feature in predicting the viral infection.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eSoftware and statistical analysis\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eAll data analysis and machine learning processes were performed using R software (version 4.4.1). A significance level of 5% was used for all statistical tests.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e\u003cstrong\u003e\u003cem\u003eDataset of the Symptoms and Respiratory Viral Infections Present in the Population\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe analyses included detailed clinical data from 868 children with a viral ARI. Among confirmed respiratory viral infections, influenza (A + B) and RSV were most prevalent, with 39.10% (n=389) and 15.48% (n=154) positive cases, respectively. 
In contrast, rhinovirus and SARS-CoV-2 were less common, affecting 10.35% and 9.55% of the population, respectively. The range of signs and symptoms and infection statuses used for training and evaluating the predictive model is summarized in \u003cstrong\u003eTable S1\u003c/strong\u003e. Fever (90.05%), cough (82.11%), and nasal congestion (81.21%) were widely reported, whereas more severe clinical presentations such as seizures (0.20%), apnea or shock (0.00%) were rare or not identified. This symptom distribution informs the predictive model, although less frequent infections contribute less to overall model performance.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eSymptom-Based Model Performance Accuracy in the Detection of Respiratory Viruses\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe symptom-based model for detecting respiratory viruses (\u003cstrong\u003eTable 1\u003c/strong\u003e) demonstrated solid performance across the different pathogens studied, with RSV achieving the highest AUC (0.81), reflecting a strong balance between sensitivity (0.64) and specificity (0.77). RSV also had the highest balanced accuracy (0.71), reflecting a superior diagnostic capability compared to the other viruses. COVID-19 and influenza also showed solid results, with AUCs of 0.71 and 0.70, respectively, translating to balanced accuracy scores of 0.64 for COVID-19 and 0.65 for influenza, indicating acceptable and stable diagnostic performance for both infections. Although the sensitivity and specificity for COVID-19 (both at 0.64) are not as high as those for RSV, the model still achieves suitable diagnostic precision. 
Furthermore, while the positive predictive value for COVID-19 is low (0.12) due to its lower prevalence, this is balanced by a high negative predictive value of 0.96, which is clinically significant for ruling out the disease.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eConversely, rhinovirus and adenovirus presented lower AUCs (0.62 and 0.69, respectively), suggesting moderate diagnostic capability for these infections. Although their sensitivities are lower (0.50 and 0.57, respectively), the model achieved higher specificity (0.62 for rhinovirus and 0.69 for adenovirus), indicating that it is better at identifying negative cases of these viruses, particularly adenovirus. The balanced accuracy for these viruses (0.56 for rhinovirus and 0.63 for adenovirus) demonstrates that, while the model is less robust for these infections, it still provides valuable diagnostic information for clinical decision-making.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eVirus-Specific Predictive Symptoms for Respiratory Infections\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe analysis of virus-specific predictive clinical presentation provided insights into how different signs and symptoms correlated with the diagnosis of each respiratory virus.\u003c/p\u003e\n\u003cp\u003eAs shown in \u003cstrong\u003eFigure 1A\u003c/strong\u003e, the mean SHAP values demonstrated that the most significant symptoms associated with the absence of SARS-CoV-2 infection were crackles (0.0558), cough (0.0532), fever \u0026gt;39ºC (0.0342), abdominal pain (0.0253), and conjunctivitis (0.0193). Other symptoms, such as wheezing (0.0150), croup (0.0124), and diarrhea (0.0111), were also linked to the absence of SARS-CoV-2 infection, but with less intensity. 
Interestingly, low-grade fever (37-38ºC) showed only a moderate association with SARS-CoV-2 ARI (0.0162) but emerged as the key indicator among all signs and symptoms analyzed (\u003cstrong\u003eFigure 1A and 1B\u003c/strong\u003e).\u003c/p\u003e\n\u003cp\u003eWheezing (0.0876) and crackles (0.0742) emerged as the strongest predictors of RSV ARI, followed by low-grade fever (37-38ºC) (0.0455) (\u003cstrong\u003eFigures 2A and 2B\u003c/strong\u003e). In contrast, high fever (\u0026gt;39ºC) was the principal symptom related to the absence of RSV (0.0462) (\u003cstrong\u003eFigures 2A and 2B\u003c/strong\u003e).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eRegarding influenza (A+B), fatigue was identified as the symptom most strongly associated with influenza diagnosis (0.0700), followed by cough (0.0315) and high fever (\u0026gt;39ºC) (0.0246) (\u003cstrong\u003eFigures 3A and 3B\u003c/strong\u003e). However, wheezing (0.0657), low-grade fever (37-38ºC) (0.0523), crackles (0.0436), and conjunctivitis (0.0275) were associated with the absence of influenza ARI (\u003cstrong\u003eFigures 3A and 3B\u003c/strong\u003e).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eFor the prediction of rhinovirus ARI, cough (0.0545), nasal congestion (0.0524) and, to a lesser extent, low-grade fever (37-38ºC) (0.0020) were the most relevant symptoms (\u003cstrong\u003eFigure 4\u003c/strong\u003e). Conversely, high fever (\u0026gt;39ºC) (0.0815), fatigue (0.0582), vomiting (0.0354), and crackles (0.0190) pointed away from a diagnosis of rhinovirus ARI, as shown in \u003cstrong\u003eFigures 4A and 4B\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003eAdenovirus symptom analysis revealed that high fever (\u0026gt;39ºC) was the only symptom that exhibited a high SHAP value (0.0486) and was present in a large number of patients, as represented by the high density of black dots (\u003cstrong\u003eFigure 5\u003c/strong\u003e). 
Strikingly, the model stands out for its potential to rule out adenovirus, as shown by the high SHAP values linking fever (37-38ºC) (0.0754), crackles (0.0572), vomiting (0.0540), fever (38-39ºC) (0.0521), fatigue (0.0422), and wheezing (0.0322) to the absence of adenovirus (\u003cstrong\u003eFigure 5\u003c/strong\u003e).\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eThis study highlights the utility of symptom-based predictive models in pediatric primary care for diagnosing specific respiratory viral infections, with the potential to facilitate their targeted management and improve clinical outcomes [7,11]. Such models are especially valuable during outbreaks or novel virus scenarios, where rapid symptom assessment can guide timely interventions and reduce reliance on confirmatory tests [14].\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn this context, integrating a symptom-based predictive model into routine clinical practice appears highly promising [7]. In pediatric populations, where symptoms are often ambiguous and may evolve rapidly [7], such a tool can support healthcare providers, mainly in primary care centers, in making prompt, data-driven decisions [13]. Given that ARIs account for 50% of pediatric consultations and result in 1.2 million hospital admissions annually [15], the high rate of hospitalizations and intensive care demands for these patients impose a substantial economic burden on healthcare systems. The development of predictive models focused on early and accurate diagnosis is therefore crucial, as they can enhance management, reduce costs, and improve patient care [11]. 
Our model addresses this need by enabling timely identification of infections and demonstrating notable accuracy in detecting key pathogens responsible for ARI in children, especially in settings where access to highly sensitive or rapid diagnostic tests is limited.\u003c/p\u003e\n\u003cp\u003eRemarkably, the model demonstrated effectiveness in identifying infections not only through specific signs and symptoms but also by recognizing the lack of certain symptoms, an aspect that is equally valuable for clinical decision-making. For RSV, the model achieved an AUC of 0.81, with sensitivity at 0.64 and specificity at 0.77, showing a strong balance. This suggests it effectively detects RSV when symptoms like wheezing or lack of high fever are present. The high AUC reflects the model's capability to distinguish RSV from other viral infections based on symptom patterns, offering reliable diagnostic utility in primary care settings. Additionally, the model displayed solid accuracy in detecting influenza, achieving an AUC of 0.70, further demonstrating its effectiveness in identifying common pathogens causing ARIs. In SARS-CoV-2 infections, the absence of symptoms such as crackles or wheezing was found to be a significant predictor of non-infection.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe model’s performance was also influenced by the clinical presentation of certain viruses, showing better predictive capabilities for those with more robust manifestations, such as enveloped viruses like influenza, SARS-CoV-2, and RSV, which affect both the upper and lower respiratory tract [16,17]. 
These viruses are often associated with more severe clinical outcomes, such as those related to pneumopathies [16,17], and the model demonstrated particular accuracy in detecting these viruses due to their distinct clinical signatures.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe ability of our predictive model to accurately detect viruses that cause more severe symptoms is a significant achievement, as these pathogens incur higher economic costs. This is especially relevant for RSV, which is associated with annual hospitalization expenses of up to €87.1 million in Spain alone [18]. Additionally, accurate detection of viral infections is particularly relevant, as the emergence of SARS-CoV-2 has not only increased healthcare system costs but also disrupted the seasonal patterns of respiratory pathogens, making the diagnosis of viral ARIs challenging [19,20].\u003c/p\u003e\n\u003cp\u003eConversely, non-enveloped viruses like rhinovirus and adenovirus exhibited lower AUC values of 0.62 and 0.69, respectively. These results reflect that these non-enveloped viruses tend to cause milder, more localized infections, often confined to the upper respiratory tract [21,22].\u0026nbsp;Their symptoms are less specific, typically presenting as nasal congestion, pharyngitis, and cough [22,23]. As a result, the model displayed lower diagnostic accuracy for this group of viruses, relying on the absence of severe symptoms, such as high fever or systemic manifestations, to make predictions. By assessing both specific and nonspecific symptoms, the model provides clinicians with valuable insights for tailoring more effective interventions [24,25].\u003c/p\u003e\n\u003cp\u003eBuilding on this foundation, this predictive model demonstrates significant potential in early diagnosis of pediatric respiratory viral infections by analyzing symptom patterns. 
Its dual approach - leveraging both the presence and absence of symptoms - enhances diagnostic accuracy for infections like SARS-CoV-2, RSV, and influenza, which often require urgent intervention. This capability becomes even more important when considering future emergent viruses, where symptomatology will initially be the primary tool for diagnosis before specific tests are developed and deployed, as was seen during the initial stages of the COVID-19 pandemic. Moreover, the model’s application during high-demand periods could mitigate healthcare system overload by reducing unnecessary testing and hospitalizations.\u003c/p\u003e\n\u003cp\u003eHowever, the model also presented limitations, such as the reduced accuracy for viruses with mild symptoms, like rhinovirus or adenovirus, due to nonspecific clinical presentations.\u0026nbsp;Its performance may also have been limited by the sample size analyzed, preventing the model from offering accurate predictions across different age groups. The model also depends on accurate symptom reporting, which can vary due to clinician judgment and patient descriptions, especially in pediatric cases, potentially affecting predictive accuracy. In addition, the model did not consider the possibility of viral co-detections which may have been associated with overlapping signs and symptoms. Thus, while the model supports clinical decision-making, confirmatory testing remains necessary to ensure accurate diagnoses.\u003c/p\u003e\n\u003cp\u003eIn summary, the model effectively balances sensitivity and specificity for pathogens like SARS-CoV-2, RSV, and influenza, making it a valuable diagnostic tool for pediatric care. Its adaptability to different clinical environments, including resource-limited primary care settings, further underscores its practical utility. Additionally, the model’s reliance on readily available clinical data makes it an accessible and scalable solution for improving diagnostic accuracy. 
Its incorporation into routine practice could not only strengthen healthcare responses, but also contribute to improved readiness for future pandemics.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eCompeting Interests:\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study was funded by the Fundació La Marató tv3 (grant number 202134-30-31).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors’ Contributions:\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eC.P. and A.S.-A. were the principal investigators of the project, responsible for the study design and conceptualization. The primary care pediatricians, co-authors of the article, provided the essential clinical data for the analysis. A.A. and C.A. processed the samples to identify respiratory viruses. All authors contributed to the manuscript's preparation and approved the final version for submission.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgments:\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe thank all the children and their families who contributed samples and clinical data, and Datexbio (Laura Muñoz and Marc Pastor) for their thorough contributions to data analysis and clinical predictive model building.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe datasets created and/or analyzed in this study can be obtained from the corresponding author upon reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n \u003cli\u003eLitwin, C. M. \u0026amp; Bosley, J. G. Seasonality and prevalence of respiratory pathogens detected by multiplex PCR at a tertiary care medical center. \u003cem\u003eArch. 
Virol.\u003c/em\u003e\u003cstrong\u003e159\u003c/strong\u003e, 65\u0026ndash;72 (2014).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003ePieren, D. K. J., Boer, M. C. \u0026amp; de Wit, J. The adaptive immune system in early life: The shift makes it count. \u003cem\u003eFront. Immunol.\u003c/em\u003e\u003cstrong\u003e13\u003c/strong\u003e, 1031924 (2022).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eZhu, G. et al. Epidemiological characteristics of four common respiratory viral infections in children. \u003cem\u003eVirol. J.\u003c/em\u003e\u003cstrong\u003e18\u003c/strong\u003e, 10 (2021).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eDebiaggi, M., Canducci, F., Ceresola, E. R. \u0026amp; Clementi, M. The role of infections and coinfections with newly identified and emerging respiratory viruses in children. \u003cem\u003eVirol. J.\u003c/em\u003e\u003cstrong\u003e9\u003c/strong\u003e, 247 (2012).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eFilip, R., Gheorghita Puscaselu, R., Anchidin-Norocel, L., Dimian, M. \u0026amp; Savage, W. K. Global challenges to public health care systems during the COVID-19 pandemic: A review of pandemic measures and problems. \u003cem\u003eJ. Pers. Med.\u003c/em\u003e\u003cstrong\u003e12\u003c/strong\u003e, 1295 (2022).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eGoyal, R. \u0026amp; Sharma, R. Pediatric COVID: How is it different from adults?\u003c/li\u003e\n \u003cli\u003eShearah, Z., Ullah, Z. \u0026amp; Fakieh, B. Intelligent framework for early detection of severe pediatric diseases from mild symptoms.\u0026nbsp;\u003cem\u003eDiagnostics\u003c/em\u003e\u003cstrong\u003e13\u003c/strong\u003e, 3204 (2023).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eRam\u0026iacute;rez Varela, A. et al. Prediction of SARS-CoV-2 infection with a symptoms-based model to aid public health decision making in Latin America and other low and middle income settings. \u003cem\u003ePrev. Med. 
Rep.\u003c/em\u003e\u003cstrong\u003e27\u003c/strong\u003e, 101798 (2022).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eTso, C. F., Lam, C., Calvert, J. \u0026amp; Mao, Q. Machine learning early prediction of respiratory syncytial virus in pediatric hospitalized patients. \u003cem\u003eFront. Pediatr.\u003c/em\u003e\u003cstrong\u003e10\u003c/strong\u003e, 886212 (2022).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eGoto, T., Camargo, C. A. Jr, Faridi, M. K., Freishtat, R. J. \u0026amp; Hasegawa, K. Machine learning-based prediction of clinical outcomes for children during emergency department triage. \u003cem\u003eJAMA Netw. Open\u003c/em\u003e\u003cstrong\u003e2\u003c/strong\u003e, e186937 (2019).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eBarenfanger, J., Drake, C., Leon, N., Mueller, T. \u0026amp; Troutt, T. Clinical and financial benefits of rapid detection of respiratory viruses: An outcomes study. \u003cem\u003eJ. Clin. Microbiol.\u003c/em\u003e\u003cstrong\u003e38\u003c/strong\u003e, 2824\u0026ndash;2828 (2000).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eAnto\u0026ntilde;anzas, J. M. et al. Symptom-based predictive model of COVID-19 disease in children. \u003cem\u003eViruses\u003c/em\u003e\u003cstrong\u003e14\u003c/strong\u003e, 63 (2021).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eRamgopal, S., Sanchez-Pinto, L. N., Horvat, C. M., Carroll, M. S., Luo, Y. \u0026amp; Florin, T. A. Artificial intelligence-based clinical decision support in pediatrics. \u003cem\u003ePediatr. Res.\u003c/em\u003e\u003cstrong\u003e93\u003c/strong\u003e, 334\u0026ndash;341 (2023).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003ePetric, M., Comanor, L. \u0026amp; Petti, C. A. Role of the laboratory in diagnosis of influenza during seasonal epidemics and potential pandemics. \u003cem\u003eJ. Infect. Dis.\u003c/em\u003e\u003cstrong\u003e194\u003c/strong\u003e, S98\u0026ndash;S110 (2006).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eDagne, H., Andualem, Z., Dagnew, B. \u0026amp; Taddese, A. A. 
Acute respiratory infection and its associated factors among children under-five years attending pediatrics ward at University of Gondar Comprehensive Specialized Hospital, Northwest Ethiopia: Institution-based cross-sectional study. \u003cem\u003eBMC Pediatr.\u003c/em\u003e\u003cstrong\u003e20\u003c/strong\u003e, 93 (2020).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003ede Gabory, L., Alharbi, A., K\u0026eacute;rimian, M. \u0026amp; Lafon, M. E. The influenza virus, SARS-CoV-2, and the airways: Clarification for the otorhinolaryngologist. \u003cem\u003eEur. Ann. Otorhinolaryngol. Head Neck Dis.\u003c/em\u003e\u003cstrong\u003e137\u003c/strong\u003e, 291\u0026ndash;296 (2020).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eLee, C. Y. F. et al. Respiratory syncytial virus prevention: A new era of vaccines. \u003cem\u003eCureus\u003c/em\u003e\u003cstrong\u003e15\u003c/strong\u003e, e45012 (2023).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eGea-Izquierdo, E., Gil-Prieto, R., Hern\u0026aacute;ndez-Barrera, V. \u0026amp; Gil-de-Miguel, \u0026Aacute;. Respiratory syncytial virus-associated hospitalization in children aged \u0026lt;2 years in Spain from 2018 to 2021. \u003cem\u003eHum. Vaccin. Immunother.\u003c/em\u003e\u003cstrong\u003e19\u003c/strong\u003e, 2231818 (2023).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eCarrera-Hueso, F. J. et al. Hospitalization budget impact during the COVID-19 pandemic in Spain. \u003cem\u003eHealth Econ. Rev.\u003c/em\u003e\u003cstrong\u003e11\u003c/strong\u003e, 43 (2021).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eBra\u0026ntilde;as, P. et al. Dynamics of respiratory viruses other than SARS-CoV-2 during the COVID-19 pandemic in Madrid, Spain. \u003cem\u003eInfluenza Other Respir. Viruses\u003c/em\u003e\u003cstrong\u003e17\u003c/strong\u003e, e13199 (2023).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eEsneau, C., Bartlett, N. \u0026amp; Bochkov, Y. Rhinovirus structure, replication, and classification. 
In \u003cem\u003eRhinovirus Infections\u003c/em\u003e (Elsevier, 2019).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eLynch, J. P. III \u0026amp; Kajon, A. E. Adenovirus: Epidemiology, global spread of novel serotypes, and advances in treatment and prevention. \u003cem\u003eSemin. Respir. Crit. Care Med.\u003c/em\u003e\u003cstrong\u003e37\u003c/strong\u003e, 586\u0026ndash;602 (2016).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eVandini, S., Biagi, C., Fischer, M. \u0026amp; Lanari, M. Impact of rhinovirus infections in children. \u003cem\u003eViruses\u003c/em\u003e\u003cstrong\u003e11\u003c/strong\u003e, 521 (2019).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eChallener, D. W., Dowdy, S. C. \u0026amp; O\u0026rsquo;Horo, J. C. Analytics and prediction modeling during the COVID-19 pandemic. \u003cem\u003eMayo Clin. Proc.\u003c/em\u003e\u003cstrong\u003e95\u003c/strong\u003e, S8\u0026ndash;S10 (2020).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eFeng, T. et al. Machine learning-based clinical decision support for infection risk prediction. \u003cem\u003eFront. Med.\u003c/em\u003e\u003cstrong\u003e10\u003c/strong\u003e, 1213411 (2023). \u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Table","content":"\u003ctable border=\"0\" cellpadding=\"0\" width=\"101%\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"6\" style=\"width: 99px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eTable 1. 
Performance metrics of the model for each viral infection.\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 26px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMetric\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eSARS-CoV-2\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eRSV\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eInfluenza\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eRhinovirus\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 16px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eAdenovirus\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 26px;\"\u003e\n \u003cp\u003e\u003cem\u003eAUC\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.71\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e0.81\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.70\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.62\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 16px;\"\u003e\n \u003cp\u003e0.69\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 26px;\"\u003e\n \u003cp\u003e\u003cem\u003eAccuracy\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.64\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e0.76\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.63\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.61\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 16px;\"\u003e\n \u003cp\u003e0.68\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 26px;\"\u003e\n \u003cp\u003e\u003cem\u003eKappa\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.09\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e0.27\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.27\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.04\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 16px;\"\u003e\n \u003cp\u003e0.13\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 26px;\"\u003e\n \u003cp\u003e\u003cem\u003eSensitivity\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.64\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e0.64\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.70\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.50\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 16px;\"\u003e\n \u003cp\u003e0.57\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 26px;\"\u003e\n \u003cp\u003e\u003cem\u003eSpecificity\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.64\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e0.77\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.59\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.62\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 
16px;\"\u003e\n \u003cp\u003e0.69\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 26px;\"\u003e\n \u003cp\u003e\u003cem\u003ePositive Predictive Value\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e0.29\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.50\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.09\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 16px;\"\u003e\n \u003cp\u003e0.17\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 26px;\"\u003e\n \u003cp\u003e\u003cem\u003eNegative Predictive Value\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.96\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e0.94\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.77\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.94\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 16px;\"\u003e\n \u003cp\u003e0.94\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 26px;\"\u003e\n \u003cp\u003e\u003cem\u003ePrevalence\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.08\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e0.13\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.37\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.07\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 16px;\"\u003e\n \u003cp\u003e0.10\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n 
\u003ctd style=\"width: 26px;\"\u003e\n \u003cp\u003e\u003cem\u003eDetection Rate\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.05\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e0.08\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.26\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.03\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 16px;\"\u003e\n \u003cp\u003e0.06\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 26px;\"\u003e\n \u003cp\u003eDetection Prevalence\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.38\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e0.28\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.52\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.39\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 16px;\"\u003e\n \u003cp\u003e0.33\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 26px;\"\u003e\n \u003cp\u003eBalanced Accuracy\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.64\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 8px;\"\u003e\n \u003cp\u003e0.71\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.65\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 15px;\"\u003e\n \u003cp\u003e0.56\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 16px;\"\u003e\n \u003cp\u003e0.63\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n 
\u003c/tbody\u003e\n\u003c/table\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"pediatric care, symptom-based predictive modeling, respiratory virus infections, acute respiratory infection, triage","lastPublishedDoi":"10.21203/rs.3.rs-5490724/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5490724/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cstrong\u003eIntroduction: \u003c/strong\u003eRespiratory viral infections, including SARS-CoV-2, respiratory syncytial virus (RSV), influenza, rhinovirus, and adenovirus, are major causes of acute respiratory illness (ARI) in children. Symptom-based predictive models are valuable tools for expediting diagnoses, particularly in primary care settings. This study assessed the effectiveness of machine learning-based models in estimating infection probabilities for these common pediatric respiratory viruses using symptom data.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMethods:\u003c/strong\u003e Data were collected from 868 children with ARI symptoms evaluated across 14 primary care centers, members of COPEDICAT (Coronavirus Pediatria Catalunya), from October 2021 to October 2023. Random Forest and Boosting models with 10-fold cross-validation were used, applying SMOTE-NC to address class imbalance. 
Model performance was evaluated via area under the curve (AUC), sensitivity, specificity, and Shapley Additive exPlanations (SHAP) values for feature importance.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eResults:\u003c/strong\u003e The model performed best for RSV (AUC: 0.81, sensitivity: 0.64, specificity: 0.77) and effectively ruled out SARS-CoV-2 based on symptom absence, such as crackles and wheezing. Predictive performance was lower for non-enveloped viruses like rhinovirus and adenovirus, due to their nonspecific symptom profiles. SHAP analysis identified key symptoms patterns for each virus.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConclusions:\u003c/strong\u003e The study demonstrated that symptom-based predictive models effectively identify pediatric respiratory infections, with notable accuracy for RSV, SARS-CoV-2 and influenza.\u003c/p\u003e","manuscriptTitle":"Implementing Symptom-Based Predictive Models for Early Diagnosis of Pediatric Respiratory Viral Infections","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-01-20 17:43:45","doi":"10.21203/rs.3.rs-5490724/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"37d8b39e-5307-4c95-add3-0a531b07e3ce","owner":[],"postedDate":"January 20th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":42999178,"name":"Health sciences/Health care"},{"id":42999179,"name":"Health sciences/Health care/Paediatrics"},{"id":42999180,"name":"Health sciences/Health care/Paediatrics/Paediatric research"},{"id":42999181,"name":"Biological sciences/Computational biology and bioinformatics/Machine learning"}],"tags":[],"updatedAt":"2025-01-20T17:43:45+00:00","versionOfRecord":[],"versionCreatedAt":"2025-01-20 17:43:45","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-5490724","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5490724","identity":"rs-5490724","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}