Parameter Adjustment for Mechanistic Epidemiological Models of COVID-19: Controlling for the Impact of Metro Area Crowding

preprint OA: closed CC-BY-4.0
Research Article

Michael M Thomas, Zhangding Liu, Neda Mohammadi, John E. Taylor

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7312496/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract
Timely understanding and accurate prediction of affected areas during novel disease outbreaks like COVID-19 are essential for the implementation of emergency response activities. Often, mechanistic models are used to evaluate the outcomes of such response decisions. However, it can be difficult to estimate key features of the disease when an outbreak that is becoming a pandemic is still in its early stages.
Using compartmental model-controlled neural networks, this study creates transmission parameters for mechanistic models that are adjusted for urban crowding features such as household size and public transportation use. The results showed that adjusted parameters can significantly improve the accuracy of the SIR model, thus helping to guide resource allocation and inform policy decisions.

Keywords: Artificial Intelligence and Machine Learning; Mechanistic Epidemiological Models; SIR Model; Neural Network; Deep Learning in Public Health

INTRODUCTION
Novel infectious diseases require a quick public health response, but building models that demonstrate their spread is challenging, especially in the early stages when data are limited. Typically, epidemiological models are calibrated based on the disease's spread within specific communities to inform rapid policy decisions (Holmdahl & Buckee 2020). However, a model built by these methods may only apply to the source population where an epidemic has already broken out. Model parameters like the basic reproduction number \(R_0\) and contact rate \(\beta\) depend on the model's structure and assumptions (Delamater et al. 2019). These parameters change over time and space, reflecting variations in human interactions. Consequently, it is inaccurate to use the value of \(R_0\) beyond the initial outbreak region (Holmdahl & Buckee 2020; Ridenhour et al. 2018). A modeling procedure that considers properties of human-human interactions can therefore help address these generalizability issues.

COVID-19 is an example of a pandemic that has seriously affected human lives and has caused over 1.1 million deaths in the U.S. (Dong et al. 2023). Early in the pandemic, extensive efforts focused on understanding disease spread to allocate resources, guide policy, and predict peak case times in different metropolitan areas (Holmdahl & Buckee 2020; Institute for Health Metrics and Evaluation (IHME) 2020).
Other diseases, for example tuberculosis, have been linked to increased transmission through urban crowding (Rojas-Bolivar et al. 2021). Evidence has shown that urban crowding may also influence the spread of COVID-19 (Thomas et al. 2022). While epidemiologists have applied parameter adjustments to mitigate confounder effects in non-communicable disease analyses, this approach has not been used in mechanistic models (Szklo & Nieto 2007).

This study seeks to generate more accurate descriptions of the early spread of COVID-19 in a community that has only a limited number of cases for reference. The goal is to develop a framework that takes data from areas with relevant differences in urbanization covariates and adjusts it for use in the target community. A machine learning model was trained on data from four metropolitan statistical areas (MSAs) in the U.S. that had the earliest recorded epidemics of COVID-19, and tested on data from the Richmond, VA MSA to illustrate changes in an SIR model when the transmission parameter is informed by the model-adjusted value rather than by the value calibrated directly from case counts.

MATERIALS AND METHODS
Data. The COVID-19 data were collected from the Johns Hopkins Center for Systems Science and Engineering data repository (Dong et al. 2023). The American Community Survey (ACS) 5-Year estimates were used to estimate the urban density variables (not including mass transportation adoption), and the U.S. Census was used to estimate populations of counties (U.S. Census Bureau 2019). The proportion of respondents stating that they ride buses to work weekly or more often, taken from the 2017 National Household Travel Survey (NHTS) (U.S. Dept of Transportation 2017), and Google Mobility data were also used to measure mass transportation adoption behaviors in different locations in the U.S. (Google 2022; Buehler & Pucher 2017). Google Mobility Reports were also incorporated to provide a time-varying measure of population mobility (Google 2022).
ACS variables representing household crowding, the percent of the population below the poverty line, and the educational attainment of the population were included to characterize other facets of the population. These variables account for other factors relevant to disease spread, such as the number of individuals in essential jobs and household crowding (Rojas-Bolivar et al. 2021). Variables were aggregated to the Census Bureau's MSA level.

Daily county-level data for the U.S. were collected, smoothed, and aggregated for the period January 22, 2020 to May 1, 2020. The data used in the experiment to test the methodology covered two-week periods beginning on the first day an area recorded more than 10 cases (Financial Times 2020). This time period was chosen to include the early stages of the pandemic, before mobility habits changed, as well as the effects on transmission immediately following the lockdown policies. The experiment in this study focuses on four source cities and one target city, comprising 7,440 cases. Four cities were chosen because this was the largest number of cities with two weeks of available case data and established epidemics at the point when the epidemic began in the target city, which had the latest start date.

Models. In traditional SIR models, "S" stands for Susceptible individuals, "I" stands for those Infected, and "R" stands for the Recovered or Removed population (Dimitrov & Meyers 2010). Parameters are typically derived from observed case counts (incidence) and recovery or mortality data, using a curve-fitting method for calibration. This study used an SIR model to demonstrate how compartment dynamics vary when adjustments are made specifically to the transmission rate parameter. The \(\beta\) and \(\gamma\) values represent the rates at which individuals move from the S to the I compartment and from the I to the R compartment, respectively.
For example, a high value of \(\beta\) represents a high flow of individuals from the susceptible population to the infected population, meaning the disease is very infectious.

Adjusting estimates is often necessary when utilizing calibrated epidemiologic models. In dense urban centers, incorporating information about common confounding variables from other cities might improve predictions. In this context, we entered the relevant density factors listed above into a neural network model. Using the estimated transmission parameters from this model, we could adjust the incidence estimate. In our neural network model, COVID-19 calibrated parameters in U.S. Metropolitan Statistical Areas are modeled as \(\hat{\beta}_i = \hat{f}(\boldsymbol{w}, \boldsymbol{x}_i)\), where \(\hat{\beta}_i\) is the epidemiological parameter of interest in the SIR model, \(\hat{f}\) is the trained neural network linking the input covariates to the predicted parameters, \(\boldsymbol{w}\) are the weights used in \(\hat{f}\)'s weighted sums that feed each transformation unit, and \(\boldsymbol{x}_i\) are the input feature values. The subscript \(i\) indicates which of the \(n\) training cities the feature vector belongs to.
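The SIR dynamics and curve-fitting calibration described in this subsection can be sketched roughly as follows. This is a minimal illustration, assuming a standard daily discrete-time SIR update with frequency-dependent transmission; the paper does not print its exact difference equations or calibration routine, so the update rule, the grid search, and every name and number here (`sir_step`, `simulate_sir`, `calibrate_beta`, the one-million population, `gamma = 0.10`) are hypothetical choices, not the authors' implementation.

```python
def sir_step(s, i, r, beta, gamma):
    """Advance the S, I, R compartments by one day (assumed update rule)."""
    n = s + i + r
    new_infections = beta * s * i / n   # flow from S to I, scaled by beta
    new_removals = gamma * i            # flow from I to R, scaled by gamma
    return s - new_infections, i + new_infections - new_removals, r + new_removals


def simulate_sir(s0, i0, r0, beta, gamma, days):
    """Return the infected trajectory I[0..days] for fixed beta and gamma."""
    s, i, r = float(s0), float(i0), float(r0)
    infected = [i]
    for _ in range(days):
        s, i, r = sir_step(s, i, r, beta, gamma)
        infected.append(i)
    return infected


def calibrate_beta(observed, s0, i0, r0, gamma, grid):
    """Pick the beta on a grid that minimizes squared error to observed infections."""
    def sse(beta):
        simulated = simulate_sir(s0, i0, r0, beta, gamma, len(observed) - 1)
        return sum((a - b) ** 2 for a, b in zip(simulated, observed))
    return min(grid, key=sse)


# Hypothetical city: 1,000,000 residents, 10 initial cases, 14-day window.
# The "observed" series is synthetic, generated with a known beta of 0.40.
observed = simulate_sir(999_990, 10, 0, beta=0.40, gamma=0.10, days=14)
grid = [k / 100 for k in range(10, 81)]   # candidate betas 0.10 ... 0.80
beta_hat = calibrate_beta(observed, 999_990, 10, 0, gamma=0.10, grid=grid)
print("calibrated beta:", beta_hat)
```

Because the synthetic series is generated by the same model, the grid search recovers the generating value; with real case counts the fit would be approximate, and a continuous optimizer could replace the grid.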
The loss function used for training the neural net is \(\mathcal{L} = \sum_{i=1}^{n} (\hat{\beta}_i - \beta_i)^2\), and the final prediction of daily cases is generated using the adjusted value: \(\hat{I} = SIR(S_i[0], I_i[0], R_i[0], \hat{\boldsymbol{\beta}}, \boldsymbol{\gamma})\).

In this case \(SIR\) is the system of difference equations used to produce the calibrated values of \(\beta\); the hat indicates that this parameter vector holds the adjusted values, which are produced using the weight parameters from \(\hat{f}\) combined with the crowding data from the city. In this way, the overall process for estimation is:

1. Calibrate \(\beta\) for \(n\) cities with established epidemics.
2. Train \(\hat{f}\) by minimizing the squared error between \(\hat{\beta}_i\) and \(\beta_i\).
3. Using the model from step 2, place \(\hat{\beta}\) in the \(SIR\) function to produce an adjusted \(I\).

Neural Network. We collected two-week periods of COVID-19 data, transportation data, socioeconomic data, and household crowding data for four source cities and one target city to serve as the training and test samples. Each of the variables was normalized and then randomly divided into training and validation sets at a ratio of 8:2. The Adam optimizer was used to select the weights of the neural net. Hyperparameter tuning determined the following values: the batch size was set to 1, the learning rate was set to 1.16e-05, hidden layer 1 was set to 8 neurons, and hidden layer 2 was set to 5 neurons. The activation function was sigmoid and the number of epochs was 400. We also used dropout and early stopping during training to prevent overfitting and to improve generalization.
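Step 2 of the procedure above can be sketched as a small feed-forward network trained on the squared-error loss. This is a rough standard-library illustration, not the authors' code: it keeps the reported layer sizes (8 and 5 sigmoid units) and batch size of 1, but it swaps Adam for plain stochastic gradient descent, omits dropout and early stopping, and uses a larger illustrative learning rate of 0.02 instead of the reported 1.16e-05. The covariate values, calibrated betas, and function names are all hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    # One row per unit: n_in weights followed by a bias term.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def layer_forward(layer, inputs, use_sigmoid):
    outputs = []
    for row in layer:
        z = row[-1] + sum(w * x for w, x in zip(row, inputs))
        outputs.append(sigmoid(z) if use_sigmoid else z)
    return outputs


def forward(net, x):
    h1 = layer_forward(net[0], x, True)
    h2 = layer_forward(net[1], h1, True)
    beta_hat = layer_forward(net[2], h2, False)[0]   # linear output unit
    return h1, h2, beta_hat


def sgd_step(net, x, beta, lr):
    """One plain-SGD update on the loss (beta_hat - beta)^2 for a single city."""
    h1, h2, beta_hat = forward(net, x)
    d_out = [2.0 * (beta_hat - beta)]
    # Backpropagate through the sigmoid layers (derivative is h * (1 - h)).
    d_h2 = [sum(d * row[j] for d, row in zip(d_out, net[2])) * h2[j] * (1 - h2[j])
            for j in range(len(h2))]
    d_h1 = [sum(d * row[j] for d, row in zip(d_h2, net[1])) * h1[j] * (1 - h1[j])
            for j in range(len(h1))]
    for layer, deltas, inputs in ((net[2], d_out, h2), (net[1], d_h2, h1), (net[0], d_h1, x)):
        for row, d in zip(layer, deltas):
            for k, value in enumerate(inputs):
                row[k] -= lr * d * value
            row[-1] -= lr * d


def loss(net, xs, betas):
    return sum((forward(net, x)[2] - b) ** 2 for x, b in zip(xs, betas))


rng = random.Random(0)
# Hypothetical normalized covariates per source city:
# [transit use, household crowding, poverty rate]
xs = [[0.9, 0.8, 0.4], [1.0, 1.0, 0.6], [0.2, 0.3, 0.1], [0.5, 0.4, 0.3]]
betas = [0.45, 0.50, 0.35, 0.40]          # hypothetical calibrated betas
net = [init_layer(3, 8, rng), init_layer(8, 5, rng), init_layer(5, 1, rng)]

initial_loss = loss(net, xs, betas)
for epoch in range(400):
    order = list(range(len(xs)))
    rng.shuffle(order)
    for idx in order:                      # batch size 1
        sgd_step(net, xs[idx], betas[idx], lr=0.02)
final_loss = loss(net, xs, betas)

target_x = [0.3, 0.35, 0.5]                # hypothetical target-city covariates
beta_adj = forward(net, target_x)[2]
print(initial_loss, final_loss, beta_adj)
```

In step 3, a value such as `beta_adj` would be passed to the SIR difference equations in place of a locally calibrated \(\beta\). With only four training cities, as in the study, a network like this can easily overfit, which is presumably why the authors report using dropout and early stopping.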
RESULTS
The model was trained on four metropolitan statistical areas (Boston, New York, San Jose, Seattle), which were selected because they had over 10 daily cases of COVID-19 two weeks before the latest epidemic start date in the US MSA data, that of Richmond, VA. These four cities' calibrated epidemic parameters were used to characterize the association between mass transportation adoption in the area and COVID-19 incidence; this is analogous to a situation in which Richmond had no local data on COVID-19 cases and would have to use information from other cities to inform mechanistic models. Richmond data were entered into the trained model to produce epidemiologic parameters, which were then entered into an SIR model to produce COVID-19 case counts, as shown in Fig. 1(a). RMSE (Root Mean Squared Error) was then used to evaluate the performance of the two methods: the standard SIR model's Prediction and our adjusted SIR model's Prediction_Adj. The purpose of the model in this context was to reduce the magnitude of the raw estimates by taking into account transit usage and other relevant covariates in the four index cities. The experiment was run 100 times to describe the variance of the predictions, which are stochastic because of the neural net's weight estimation. Figure 1(b) plots the results of the 100 runs of training and testing the model; the RMSE of the adjusted model (right) was generally lower than the RMSE of the traditional model approach (left). An example of the two kinds of outputs is shown in Fig. 1(a); these figures show that the adjusted \(\beta\) value generates better model results than the unadjusted results shown by the blue solid line.

DISCUSSION
The framework in this study adjusted infection parameters to prevent inflation attributable to mass transportation networks and other constructs related to urban density.
Instead of estimating the SIR model's \(\beta\) infection rate parameter using raw case data, the values were adjusted using a neural network to remove the effects of the transportation network from the transmission probability estimates. The adjusted values were used to plot an SIR model, which demonstrates a more conservative estimate of the rate at which the infection compartment grows. The deviation from the observed cases at the end of the actual data is likely a result of a global decrease in COVID-19 spread caused by national lockdowns.

In the initial stages of a pandemic, it is crucial to avoid common miscalculations in epidemiological models (Holmdahl & Buckee 2020). Increasing the accuracy of compartmental epidemiological modeling methods is challenging when little is known about a novel disease and limited data have been published. One significant challenge is the transferability of crucial disease transmission parameters. These parameters, initially derived from index populations and their unique contact patterns, may not be generally applicable. This study presents an opportunity to estimate disease spread more accurately. Estimating the impact of these effects, coupled with effective visualizations and communication strategies, is anticipated to foster public trust by providing epidemic models developed for specific populations. Such an approach is important for informed decision-making during the critical initial phase of an epidemic.

CONCLUSION
In this research, we explored refining disease transmission rate parameters in compartmental models through a neural network. Our findings suggest that by factoring in aspects such as individual transportation usage, we can achieve a more accurate depiction of disease spread. Integrating these approaches could significantly improve the effectiveness of interventions and resource distribution during possible future public health crises, supporting a more targeted and efficient response.
Declarations

ACKNOWLEDGEMENTS
This material is based upon work supported by the National Science Foundation (NSF) under Grant No. 1837021. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

References
Buehler R, Pucher J (2017) Trends in Walking and Cycling Safety: Recent Evidence From High-Income Countries, With a Focus on the United States and Germany. Am J Public Health 107(2):281–287. https://doi.org/10.2105/AJPH.2016.303546
Dimitrov NB, Meyers LA (2010) Mathematical approaches to infectious disease prediction and control. Risk and optimization in an uncertain world. INFORMS, pp 1–25
Delamater PL, Street EJ, Leslie TF, Yang YT, Jacobsen KH (2019) Complexity of the basic reproduction number (R0). Emerg Infect Dis 25(1):1
Dong E, Du H, Gardner L (2023) An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis 20(5):533–534
Financial Times (2020) Financial Times Covid-19 Tracker. Financial Times
Google (2022) COVID-19 Community Mobility Reports.
Holmdahl I, Buckee C (2020) Wrong but Useful — What Covid-19 Epidemiologic Models Can and Cannot Tell Us. N Engl J Med 383(4):303–305
Institute for Health Metrics and Evaluation (IHME) (2020) COVID-19 Projections. IHME, University of Washington
Ridenhour B, Kowalik JM, Shay DK (2018) Unraveling R0: Considerations for public health applications. Am J Public Health 108(S6):S445–S454
Rojas-Bolivar D, Intimayta-Escalante C, Cardenas-Jara A, Jandarov R, Huaman MA (2021) COVID-19 case fatality rate and tuberculosis in a metropolitan setting. J Med Virol
Szklo M, Nieto FJ (2007) Stratification and Adjustment: Multivariate Analysis in Epidemiology. Epidemiology: Beyond the Basics (Second Edition). Jones & Bartlett
Thomas MM, Mohammadi N, Taylor JE (2022) Investigating the association between mass transit adoption and COVID-19 infections in US metropolitan areas. Sci Total Environ 811:152284
U.S. Census Bureau (2019) American Community Survey 5-year Data API (2009–2019). https://www.census.gov/data/developers/data-sets/acs-5year.html
U.S. Dept of Transportation (2017) 2017 National Household Travel Survey. nhts.ornl.gov

Additional Declarations
The authors declare no competing interests.
Figures

Figure 1. (a) SIR compartmental models governed by a system of difference equations. The dotted lines represent models using the adjusted estimated beta values; the blue lines represent a model generated with the mean of the calibrated beta values. (b) Comparison of RMSE for one instance of the unadjusted model (left) and the 100-run mean and standard deviation of the adjusted models' RMSE (right).
Model parameters like the Basic Reproduction Number \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\text{R}}_{0}\\)\u003c/span\u003e\u003c/span\u003e and contact rate \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\beta\\:\\)\u003c/span\u003e\u003c/span\u003e depend on the model's structure and assumptions (Delamater et al. \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). These parameters change over time and space, reflecting variations in different human interactions. Consequently, it\u0026rsquo;s inaccurate to use the value of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{R}_{0}\\)\u003c/span\u003e\u003c/span\u003e beyond the initial outbreak region (Holmdahl \u0026amp; Buckee \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Ridenhour et al. \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). Therefore, a modeling procedure that considers properties of human-human interactions can assuage these generalizability issues.\u003c/p\u003e\u003cp\u003eCOVID-19 is an example of a pandemic that has seriously affected human lives and has caused over 1.1\u0026nbsp;million deaths in the U.S. (Dong et al. \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Early in the pandemic, extensive efforts focused on understanding disease spread to allocate resources, guide policy, and predict peak case times in different metropolitan areas (Holmdahl \u0026amp; Buckee \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Institute for Health Metrics and Evaluation (IHME) \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Other diseases, for example, tuberculosis, have been linked to increased transmission through urban crowding (Rojas-Bolivar et al. 
\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Evidence has shown urban crowding may influence spread (Thomas et al. 2021). While epidemiologists have applied parameter adjustments to mitigate confounder effects in non-communicable disease analyses, this approach has not been used in mechanistic models (Szklo \u0026amp; Nieto \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2007\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eThis study seeks to generate more accurate descriptions of the spread of COVID-19 in the early stage of a community with a limited number of cases for reference. The goal is to develop a framework that takes data from areas with relevant differences in urbanization covariates and adjusts it for use in the target community. A machine learning model was trained by four metropolitan statistical areas (MSA) in the U.S. which had the earliest recorded epidemics of COVID-19, and tested on Richmond, VA\u0026rsquo;s MSA data to illustrate changes in an SIR model when the transmission parameter is informed by the model-adjusted value instead of the calibrated model case count outputs.\u003c/p\u003e"},{"header":"MATERIALS AND METHODS","content":"\u003cp\u003e\u003cb\u003eData.\u003c/b\u003e The COVID-19 data were collected from the Johns Hopkins Center for Systems Science and Engineering data repository (Dong et al. \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). The American Community Survey (ACS) 5-Year estimates were used to estimate the urban density variables (not including mass transportation adoption) and the U.S. Census was used to estimate populations of counties (U.S. Census Bureau \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). Proportion of respondents stating that they ride buses to work weekly or more from the 2017 National Household Travel Survey (NHTS) (U.S. 
Dept of Transportation \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2017\u003c/span\u003e) and Google Mobility data were also used to measure the mass transportation adoption behaviors in different locations in the U.S. (Google \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Buehler \u0026amp; Pucher \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). Google Mobility Reports were also incorporated to provide a time-varying measure of population mobility (Google \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). ACS variables representing household crowding, the percent of population below the poverty line, and the percent of population educational attainment were included to characterize other facets of the population. These variables were included to take into account other relevant factors to disease spread such as the number of individuals in essential jobs and household crowding (Rojas-Bolivar et al. \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Variables were aggregated to the Census Bureau\u0026rsquo;s MSA.\u003c/p\u003e\u003cp\u003eDaily county-level data for the U.S. were collected, smoothed, and aggregated for the time period January 22, 2020 to May 1, 2020. The data used in the experiment to test the methodology included two-week periods beginning when the relevant area had its first day with more than 10 cases (Financial Times \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). This time period was chosen to include early stages of the pandemic before mobility habits changed and the effects of transmission immediately following the lockdown policies. The experiment included in this study focuses on four source cities and one target city, including 7,440 cases. 
Four cities were chosen because this was the largest number of cities with two weeks of available case data and established epidemics at the point in time when the last target city\u0026rsquo;s epidemic began.\u003c/p\u003e\u003cp\u003e\u003cb\u003eModels.\u003c/b\u003e In traditional SIR models, \"S\" stands for Susceptible individuals, \"I\" stands for those Infected, and \"R\" stands for the Recovered or Removed population (Dimitrov \u0026amp; Meyers \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2010\u003c/span\u003e). Parameters are typically derived from observed case counts (incidence) and recovery or mortality data, using a curve-fitting method for calibration. This study used a SIR model to demonstrate how compartment dynamics vary when adjustments are made to the transmission rate parameter specifically. The \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\beta\\:\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\gamma\\:\\)\u003c/span\u003e\u003c/span\u003e values represent the rate of transmission of individuals from the S to the I and from the I to the R compartments respectively. For example, a high value of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\beta\\:\\)\u003c/span\u003e\u003c/span\u003e represents a high flow of individuals from the susceptible population to the infected population, meaning the disease is very infectious.\u003c/p\u003e\u003cp\u003eAdjusting estimates is often necessary when utilizing calibrated epidemiologic models. In dense urban centers, incorporating information about common confounding variables from other cities might improve predictions. In this context, we entered relevant density factors listed above into a neural network model. Using the estimated transmission parameters from this model, we could adjust the incidence estimate. 
In our neural network model, COVID-19 calibrated parameters in U.S. Metropolitan Statistical Areas are modeled as \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\widehat{{{\\beta\\:}}_{\\text{i}}}=\\widehat{f}\\left(\\varvec{w},{\\varvec{x}}_{\\varvec{i}}\\right)\\)\u003c/span\u003e\u003c/span\u003e, where \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\widehat{{{\\beta\\:}}_{\\text{i}}}\\)\u003c/span\u003e\u003c/span\u003e is the epidemiological parameter of interest in the SIR model, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\widehat{f}\\)\u003c/span\u003e\u003c/span\u003e is the trained neural network linking the input covariates to the predicted parameters, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\varvec{w}\\)\u003c/span\u003e\u003c/span\u003e are the weights used in \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\widehat{f}\\)\u003c/span\u003e\u003c/span\u003e\u0026rsquo;s summation to feed to the transformation unit, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\varvec{x}}_{\\varvec{i}}\\)\u003c/span\u003e\u003c/span\u003e are the input feature values. 
The subscript \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:i\\)\u003c/span\u003e\u003c/span\u003e indicates which one of the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:n\\)\u003c/span\u003e\u003c/span\u003e training cities corresponds to the feature vectors.\u003c/p\u003e\u003cp\u003eThe loss function used for training the neural net is \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\mathcal{L}={\\sum\\:}_{i=1}^{n}{\\left({\\widehat{\\:\\beta\\:}}_{\\varvec{i}}-{{\\beta\\:}}_{\\text{i}}\\right)}^{2}\\)\u003c/span\u003e\u003c/span\u003e and the final prediction of daily cases is generated using the adjusted value\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\:\\widehat{I}=\\:SIR({S}_{i}\\left[0\\right],{I}_{i}\\left[0\\right],{R}_{i}\\left[0\\right],\\widehat{\\varvec{\\beta\\:}},\\varvec{\\gamma\\:}).\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e\u003cp\u003eIn this case \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:SIR\\)\u003c/span\u003e\u003c/span\u003e is the system of difference equations producing the calibrated values of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\beta\\:\\)\u003c/span\u003e\u003c/span\u003e; the hat represents that this vector of parameters represents the \u003cem\u003eadjusted\u003c/em\u003e counts which are produced using the weight parameters from \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\widehat{f}\\)\u003c/span\u003e\u003c/span\u003e combined with the crowding data from the city. 
In this way, the overall process for estimation is:\u003c/p\u003e\u003cp\u003e\u003col\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eCalibrate \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\beta\\:}\\)\u003c/span\u003e\u003c/span\u003e for \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:n\\)\u003c/span\u003e\u003c/span\u003e cities with established epidemics\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eTrain \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\widehat{f}\\)\u003c/span\u003e\u003c/span\u003e by minimizing the squared error between \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\widehat{{\\beta\\:}}}_{i}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{{\\beta\\:}}_{\\text{i}}\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eUsing the model from step 2, place \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\widehat{{\\beta\\:}}\\)\u003c/span\u003e\u003c/span\u003e in the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:SIR\\)\u003c/span\u003e\u003c/span\u003e function to produce an adjusted \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:I.\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003c/ol\u003e\u003c/p\u003e\u003cp\u003e\u003cb\u003eNeural Network.\u003c/b\u003e We collected two-week periods of COVID-19 data, transportation data, socioeconomic data, and household crowding data for four source cities and one target city to serve as the training and test samples. 
Each variable was normalized, and the data were then randomly divided into training and validation sets at a ratio of 8:2. The Adam optimizer was used to select the weights of the neural net. Hyperparameter tuning determined the following values: the sample batch size was set to 1, the learning rate was set to 1.16e-05, the size of hidden layer 1 was set to 8 neurons, and the size of hidden layer 2 was set to 5 neurons. The activation function was sigmoid, and the number of training epochs was set to 400. We also used dropout and early stopping during the training process to prevent model overfitting and to improve generalization.\u003c/p\u003e"},{"header":"RESULTS","content":"\u003cp\u003eThe model was trained on four metropolitan statistical areas (Boston, New York, San Jose, and Seattle), which were selected because they had over 10 daily cases of COVID-19 two weeks before the latest epidemic start date in the US MSA data, that of Richmond, VA. These four cities\u0026rsquo; calibrated epidemic parameters were used to characterize the association between mass transportation adoption in the area and COVID-19 incidence; this is analogous to a situation in which Richmond had no local data on COVID-19 cases and would have to use information from other cities to inform mechanistic models. Richmond data were entered into the trained model to produce epidemiologic parameters, which were then entered into an SIR model to produce COVID-19 case counts as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e(a). The root mean squared error (RMSE) was then used to evaluate the performance of the two methods: the standard SIR model\u0026rsquo;s Prediction and our adjusted SIR model\u0026rsquo;s Prediction_Adj. The purpose of the model in this context was to reduce the magnitude of raw estimates by accounting for transit usage and other relevant covariates in the four index cities. 
This experiment was run 100 times to describe the variance of the predictions, which are stochastic because of the neural net\u0026rsquo;s weight estimation. Figure\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e(b) plots the results of the 100 runs of training and testing the model; the RMSE of the adjusted model (right) was generally lower than the RMSE of the traditional model approach (left). An example of the two kinds of output is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e(a); these figures show that the adjusted \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\beta\\:\\:\\)\u003c/span\u003e\u003c/span\u003e value generates better model results than the unadjusted value, whose results are shown by the blue solid line.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e"},{"header":"DISCUSSION","content":"\u003cp\u003eThe framework in this study adjusted infection parameters to prevent inflation attributable to mass transportation networks and other constructs related to urban density. Instead of estimating the SIR model\u0026rsquo;s β infection rate parameter using raw case data, the values were adjusted using a neural network to account for the effects of the transportation network on the transmission probability estimates. The adjusted values were used to run an SIR model, which yields a more conservative estimate of the rate at which the infection compartment grows. The deviation from the cases at the end of the actual data is likely a result of a global decrease in COVID-19 spread from national lockdowns.\u003c/p\u003e\u003cp\u003eIn the initial stages of a pandemic, it is crucial to avoid common miscalculations in epidemiological models (Holmdahl \u0026amp; Buckee \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). 
Increasing the accuracy of compartmental epidemiological modeling methods is challenging when little is known about a novel disease and limited data have been published. One significant challenge is the transferability of crucial disease transmission parameters. These parameters, initially derived from index populations and their unique contact patterns, may not be generally applicable. This study presents an opportunity to more accurately estimate disease spread. Estimating the impact of these effects and developing epidemic models for specific populations, coupled with effective visualizations and communication strategies, is anticipated to foster public trust. Such an approach is important for informed decision-making during the critical initial phase of an epidemic.\u003c/p\u003e"},{"header":"CONCLUSION","content":"\u003cp\u003eIn this research, we explored refining disease transmission rate parameters in compartmental models using a neural network. Our findings suggest that by factoring in aspects such as individual transportation usage, we can achieve a more accurate depiction of disease spread. By integrating these approaches, we can significantly improve the effectiveness of interventions and resource distribution during possible future public health crises, ensuring a more targeted and efficient response.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eACKNOWLEDGEMENTS\u003c/h2\u003e\u003cp\u003eThis material is based upon work supported by the National Science Foundation (NSF) under Grant No. 1837021. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eBuehler R, Pucher J (2017) Trends in Walking and Cycling Safety: Recent Evidence From High-Income Countries, With a Focus on the United States and Germany. 
Am J Public Health 107(2):281\u0026ndash;287. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2105/AJPH.2016.303546\u003c/span\u003e\u003cspan address=\"10.2105/AJPH.2016.303546\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDimitrov NB, Meyers LA (2010) Mathematical approaches to infectious disease prediction and control. Risk and optimization in an uncertain world. INFORMS, pp 1\u0026ndash;25\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDelamater PL, Street EJ, Leslie TF, Yang YT, Jacobsen KH (2019) Complexity of the basic reproduction number (R0). Emerg Infect Dis 25(1):1\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDong E, Du H, Gardner L (2023) An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis 20(5):533\u0026ndash;534\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eFinancial Times (2020) Financial Times Covid-19 Tracker. Financial Times\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGoogle (2022) \u003cem\u003eCOVID-19 Community Mobility Reports\u003c/em\u003e. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e\u003c/span\u003e\u003cspan address=\"http://www.google.com/covid19/mobility/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHolmdahl I, Buckee C (2020) Wrong but Useful \u0026mdash; What Covid-19 Epidemiologic Models Can and Cannot Tell Us. N Engl J Med 383(4):303\u0026ndash;305\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eInstitute for Health Metrics and Evaluation (IHME) (2020) COVID-19 Projections. 
IHME, University of Washington\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRidenhour B, Kowalik JM, Shay DK (2018) Unraveling R0: Considerations for public health applications. Am J Public Health 108(S6):S445\u0026ndash;S454\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRojas-Bolivar D, Intimayta-Escalante C, Cardenas-Jara A, Jandarov R, Huaman MA (2021) COVID-19 case fatality rate and tuberculosis in a metropolitan setting. J Med Virol\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSzklo M, Nieto FJ (2007) Stratification and Adjustment: Multivariate Analysis in Epidemiology. Epidemiology: Beyond the Basics (Second Edition). Jones \u0026amp; Bartlett\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eThomas MM, Mohammadi N, Taylor JE (2022) Investigating the association between mass transit adoption and COVID-19 infections in US metropolitan areas. Sci Total Environ 811:152284\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eU.S. Census Bureau (2019) American Community Survey 5-year Data API (2009\u0026ndash;2019). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.census.gov/data/developers/data-sets/acs-5year.html\u003c/span\u003e\u003cspan address=\"https://www.census.gov/data/developers/data-sets/acs-5year.html\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eU.S. Dept of Transportation (2017) \u003cem\u003e2017 National Household Travel Survey\u003c/em\u003e. 
nhts.ornl.gov\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"Georgia Institute of Technology","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Mechanistic Epidemiological Models, SIR Model, Neural Network, Deep Learning in Public Health","lastPublishedDoi":"10.21203/rs.3.rs-7312496/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7312496/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eTimely understanding and accurate prediction of affected areas during novel disease outbreaks like COVID-19 is essential for the implementation of emergency response activities. Often, mechanistic models are used to evaluate the outcomes of these decisions. However, it can be difficult to estimate key features of the disease when an outbreak becoming a pandemic is in its early stages. Using compartmental model-controlled neural networks, this study creates transmission parameters for mechanistic models that are adjusted for these urban crowding features such as household size and public transportation use. 
The results showed that adjusted parameters can significantly improve the accuracy of the SIR model, thus helping guide resource allocation and make policy decisions.\u003c/p\u003e","manuscriptTitle":"Parameter Adjustment for Mechanistic Epidemiological Models of COVID-19: Controlling for the Impact of Metro Area Crowding","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-08-08 05:33:04","doi":"10.21203/rs.3.rs-7312496/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"a9f30445-eebf-4406-98fe-0ba85bfd0a82","owner":[],"postedDate":"August 8th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":52769864,"name":"Artificial Intelligence and Machine Learning"}],"tags":[],"updatedAt":"2025-08-08T05:33:04+00:00","versionOfRecord":[],"versionCreatedAt":"2025-08-08 05:33:04","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-7312496","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7312496","identity":"rs-7312496","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
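
The final step of the estimation procedure described in the Methods (placing an adjusted transmission rate into the SIR function and comparing predictions by RMSE) can be illustrated with a minimal sketch. The paper does not spell out its difference equations in this section, so the standard frequency-dependent discrete-time SIR form is assumed here. The function names `sir` and `rmse`, the population size, and every numeric value below are hypothetical and chosen only for illustration; they are not the authors' calibrated or adjusted parameters.

```python
import numpy as np


def sir(s0, i0, r0, beta, gamma, days):
    """Discrete-time SIR (assumed standard form); returns infected counts.

    beta and gamma may be scalars or length-`days` sequences of daily values.
    The returned array has length days + 1 and starts at the initial value i0.
    """
    n = s0 + i0 + r0  # closed population, no births or deaths
    beta = np.broadcast_to(np.asarray(beta, dtype=float), (days,))
    gamma = np.broadcast_to(np.asarray(gamma, dtype=float), (days,))
    s, i, r = float(s0), float(i0), float(r0)
    infected = [i]
    for t in range(days):
        new_infections = beta[t] * s * i / n
        new_recoveries = gamma[t] * i
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        infected.append(i)
    return np.array(infected)


def rmse(predicted, observed):
    """Root mean squared error between two equal-length series."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


# Hypothetical values: an unadjusted beta and a smaller, adjusted beta_hat.
raw = sir(999_990, 10, 0, beta=0.40, gamma=0.10, days=14)
adjusted = sir(999_990, 10, 0, beta=0.30, gamma=0.10, days=14)
```

Given an observed two-week case series for the target city, `rmse(raw, observed)` and `rmse(adjusted, observed)` would give the two quantities compared in the Results. In this sketch the smaller transmission rate simply produces slower growth of the infected compartment.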
