Crowd signals: Early detection of disease outbreaks using real-time healthcare occupancy data

Vanderson Souza Sampaio, Jose Araujo, Juan Silva, Marcelo Bragatte, and 10 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-7754752/v1. This work is licensed under a CC BY 4.0 License. Status: Under Review. Version 1 posted.

Abstract

Early detection of disease outbreaks is critical for effective public health response, yet traditional surveillance systems often suffer from delayed reporting. Here, we investigate whether real-time occupancy data from healthcare facilities can act as an early warning indicator of possible outbreak activity. We analyzed occupancy trends from 17 emergency care units in the São Paulo metropolitan area and compared them with national surveillance data for infectious diseases, including SARS-CoV-2 and dengue virus.
Dynamic time warping and Granger causality tests demonstrated that occupancy patterns anticipate infection dynamics with a mean lead time of three weeks. Early warning signals of three epidemiological events were identified as deviations from average occupancy. Local indicators of spatial association revealed persistent overcrowding hotspots in later outbreak stages, highlighting regions where sustained healthcare monitoring and surveillance remain necessary. These findings demonstrate the potential of privacy-safe passive occupancy data to support timely epidemic surveillance.

Health sciences/Diseases/Infectious diseases; Health sciences/Risk factors

Real-time occupancy data; early warning; surveillance; z-based epidemic volatility index; health monitoring

Introduction

Global health crises trigger widespread social, economic, and political disruptions, with healthcare facility overcrowding standing as a key pressure point for system resilience [1]. Beyond the direct burden on care delivery, overcrowding is associated with operational inefficiencies, negative patient outcomes, prolonged wait times, and reduced bed availability [2, 3]. Environmental hazards such as heatwaves, wildfires, and floods have been recognized as important sources of pressure on health systems. These non-infectious drivers of overcrowding highlight the need for sensitive surveillance approaches that can monitor fluctuations in healthcare facility capacity [4–6]. Recently, electronic health records were used to estimate facility occupancy, but limitations in accessibility and the interval between data collection and analysis constrain their utility for rapid epidemiological response [7]. While pathogen-specific data remains essential for diagnosis and response, planning requires time [8].
From an operational perspective, sensitive early signals of anomalous healthcare usage can provide timely insights to support rapid decision-making and resource allocation [9]. Therefore, the ability to detect early warnings of disease outbreaks based on healthcare unit occupancy represents a potential innovation that could add a new layer to epidemic intelligence and significantly enhance public health surveillance [10].

Health systems worldwide use multiple surveillance strategies to track infectious diseases. These include passive notification systems for notifiable diseases such as COVID-19, dengue, and yellow fever [11]; sentinel surveillance from selected healthcare facilities; and syndromic and laboratory-based surveillance [12]. However, while these methods are well-established, they often suffer from limitations such as delayed reporting, high infrastructure demands, and lack of real-time availability [13]. Currently, many digital surveillance tools incorporate artificial intelligence to mine open-source data from social media or news, yet these approaches can suffer from noise and limited clinical relevance [14]. Despite advances, most official notification systems still require several days to identify and report outbreaks, causing delays that can undermine timely interventions [15]. While surveillance systems exist to track infectious diseases, few tools effectively monitor real-time stress on healthcare infrastructure. In this context, there is a pressing need for complementary systems that harness alternative data streams to enhance the speed, sensitivity, and scalability of epidemic detection.

In this study, we present a novel method that leverages real-time occupancy data from healthcare facilities, made available by Google Maps™, to detect early warning signals of potential outbreak activity. This data, derived from anonymized and aggregated user location metrics, provides a dynamic proxy for occupancy levels in clinical settings.
We hypothesized that sudden increases in facility occupancy are associated with intensified pathogen circulation, which would make occupancy a powerful source for early outbreak detection. We tested this hypothesis by analyzing occupancy patterns from 17 public urgent care units in the São Paulo metropolitan area. By applying Dynamic Time Warping (DTW) and Granger Causality analyses, we demonstrated that occupancy trends precede confirmed cases of respiratory pathogens and dengue virus (DENV) infections by up to five weeks. Unsupervised spatial analysis uncovered persistent high levels of healthcare unit occupancy in the later stages of the epidemiological event, pointing to areas that continued to experience system strain and required ongoing monitoring. Our findings show that the use of privacy-safe passive mobility data can offer a timely, low-cost, and scalable solution to enhance epidemic surveillance and public health responses.

Results

Healthcare occupancy patterns reflect epidemiological events timing

Across seven cities in the São Paulo metropolitan area, aggregated occupancy data from 17 healthcare units reflected population-level dynamics from July 15, 2023, to October 12, 2024 (Fig. 1A and Supplementary Data 1). Temporal relationships between weekly occupancy averages and laboratory-based surveillance data (Fig. 1B) revealed whether historical trends in one time series exhibited lagged associations with the other, using time-lagged modeling approaches that account for directional dependencies (Fig. 1C). Laboratory surveillance data included SARS-CoV-2, DENV, and other respiratory viruses of public health importance (RV: SARS-CoV-2, Influenza virus A/B, and respiratory syncytial virus). The temporal distribution of occupancy levels revealed notably coherent dynamics across the healthcare units, despite their independent selection and heterogeneous contexts (Fig. 2A).
Three distinct periods concentrated the highest occupancy levels across units: from July to October 2023, from December 2023 to May 2024, and from August to September 2024. The occupancy average trajectory highlights the collective shifts and further supports the similar patterns among the units (Fig. 2A, lower panel). These periods of elevated occupancy partially overlap with potential outbreak intervals (Fig. 2B-C), suggesting consistent temporal patterns in occupancy surges that may reflect broader epidemiological dynamics. Despite using occupancy averages, waves of overcrowding were consistently observed across the 17 monitored units, underscoring the synchronized signal that can be used to track health system pressures as they evolve.

Next, we compared the temporal dynamics of healthcare unit occupancy with the laboratory-based surveillance indicators (Fig. 2B-C). Notably, the increasing occupancy levels during the first and second waves tend to anticipate the peaks of respiratory pathogen outbreak progression by up to five weeks. Although the third wave demonstrated a reduced lead time for anticipating the pathogen surge, a substantial real-time increase in occupancy levels was observed following the increase in laboratory-confirmed cases. Even in scenarios where early anticipation is limited, such occupancy-based signals remain valuable for situational awareness and can support timely decision-making by healthcare managers. This reduced anticipation window is partially explained by the decline in DENV circulation during the second half of 2024, followed by a sudden rise in SARS-CoV-2 cases in August 2024 (Fig. 2B-C). The transition between these two epidemiological phases created a pronounced valley in overall laboratory-based indicators around June and July.
High occupancy averages indicate elevated laboratory-confirmed infections

To quantify temporal similarity between healthcare occupancy and confirmed pathogen-specific epidemiological events, we applied DTW on scaled frequency over all 66 weeks of data. Lower DTW distances indicate greater temporal similarity after allowing for local shifts in time, making it a suitable metric for comparing epidemic signals with potential lags or phase differences. SARS-CoV-2 and RV test positivity rate indicators showed the strongest temporal concordance with occupancy patterns, as reflected in their lower warping distances of 0.89 and 0.97, respectively. In contrast, DENV-related series showed higher distances, indicating phase shifts or misalignment relative to occupancy surges. This distance was primarily driven by a single pronounced peak in dengue virus activity during the second half of the analyzed period. Because DTW was calculated over the full timeline, the abrupt and concentrated nature of the DENV wave reduced the overall similarity with the occupancy trends.

As the observed alignments from the DTW analysis demonstrated a lag structure, particularly the consistent 5-week offsets between the peaks (Fig. 2B-C), we next evaluated whether changes in occupancy could statistically inform subsequent trends in pathogen-specific signals. To capture time-lagged influences across distinct epidemic dynamics, we opted to segment the data into three partially overlapping lag-aware periods: a first period from July to December 2023, a second from October 2023 to June 2024, and a third from June 2024 to October 2024. The overlapping lag-aware periods account for the temporal lags specific to each pathogen while enabling clearer attribution of occupancy changes to individual epidemic signals. Directional dependencies between time series were further assessed using the Granger causality test applied to differenced and detrended data.
Significant associations (p < 0.05) were identified across all three shifted periods (Table 1). In the first lag-aware period (July–December 2023), we detected significant associations for all three pathogen groups, with RV-related indicators contributing the highest number of causal relationships (n = 5), followed by DENV (n = 2) and SARS-CoV-2 (n = 2). There was a mean lag of 3 weeks across these associations, supporting the findings from the DTW analysis. During the second lag-aware period (October 2023–June 2024), DENV emerged as the dominant driver of occupancy variation, with five highly significant associations (p < 0.01) and lags ranging from one to five weeks. No significant associations were observed for RV or SARS-CoV-2 during this period, suggesting that dengue activity alone accounted for occupancy increases during this phase. In the third lag-aware period (June–October 2024), signal strength was lower overall, but short-term directional associations (lag = 1) were identified for both SARS-CoV-2 and RV indicators with marginal significance (p = 0.03 and p = 0.04, respectively). These wave-specific results reinforce the notion that healthcare occupancy responds to different pathogens and further demonstrate its potential as a lag-sensitive indicator of epidemic burden.

Early warning of epidemic burden using occupancy-derived volatility metrics

On the basis of this temporal lag structure, we applied a 42-day (6-week) moving average to healthcare occupancy data to calculate the z-scores (Fig. 4A). This window was selected based on the consistent 5-week lead time between occupancy data and pathogen-specific signals, with an additional week added to enhance sensitivity and minimize the influence of short-term noise. Weekly deviations (standard deviations, SD) from the moving average were defined as the indicator of system pressure.
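The z-score calculation described above can be illustrated with a minimal sketch. It assumes a trailing 42-day window that includes the current day, the sample standard deviation, and a simple mean as the weekly summary; none of these details are stated in the text, so they are labeled as assumptions in the code. The 0.65 cutoff and the low/moderate/high labels follow the thresholds reported in the Methods. The function names are hypothetical and are not taken from the authors' repository.

```python
import math
from statistics import mean, stdev

WINDOW_DAYS = 42       # 6-week window reported in the text
HIGH_THRESHOLD = 0.65  # high-volatility cutoff reported in the Methods


def rolling_z_scores(daily_occupancy, window=WINDOW_DAYS):
    """Return one z-score per day, or None until a full window exists.

    Assumption: the window is trailing and includes the current day.
    """
    scores = []
    for i, value in enumerate(daily_occupancy):
        if i + 1 < window:
            scores.append(None)
            continue
        recent = daily_occupancy[i + 1 - window : i + 1]
        sd = stdev(recent)  # assumption: sample standard deviation
        scores.append((value - mean(recent)) / sd if sd > 0 else 0.0)
    return scores


def weekly_mean(z_scores):
    """Summarize a week of daily z-scores (assumption: plain mean)."""
    valid = [z for z in z_scores if z is not None]
    return mean(valid) if valid else None


def classify(z):
    """Map a weekly z-score to the volatility label used in the Methods."""
    if z < 0:
        return "low"       # green
    if z <= HIGH_THRESHOLD:
        return "moderate"  # yellow
    return "high"          # red


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

Calling `normal_cdf(HIGH_THRESHOLD)` gives the share of observations expected below the cutoff under a normal distribution, which is the quantity the Methods use to justify the threshold.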
During the initial phase of each occupancy wave, the volatility signals intensify as weekly values rise above the moving average. This signal converges at the peak of the wave, where short-term and long-term trends align. The opposite occurs in the later stages of the detected events, where weekly occupancy values begin to decline while the moving average remains elevated, resulting in a signal indicative of decreasing healthcare system usage. Three distinct anomalies were evident in the deviations from the mean, beginning in August 2023, December 2023, and August 2024 (Fig. 4A).

Although aggregated trends across all 17 units provide an overview of occupancy patterns in the analyzed region, the same volatility index can be disaggregated to understand unit-specific dynamics. When comparing the initial stage (August 26, 2023; Fig. 4B-D) against the later stage (November 04, 2023; Fig. 4E-G) of an outbreak, we identified distinct spatial and temporal patterns of unit-level occupancy. At the earlier time point, all units showed positive deviations from the moving average, including 13 with high-intensity alerts (red) and four with moderate deviations (yellow), with most facilities operating near or above 50% occupancy. By the later time point, 10 units exhibited occupancy declines (green bars), while the remaining 7 continued to show signs of strain. Importantly, 16 of the 17 units recorded occupancy rates below 50%, suggesting a system-wide reduction in demand.

Localized areas of persistent pressure were identified using the Local Indicators of Spatial Association (LISA) test, which identified clusters of units with similar occupancy patterns. Four units (8, 9, 10, and 15) formed a significant hotspot (p-value = 0.03) during the later phase of the outbreak (Fig. 4G). Mapping these hotspots exposed regions with abnormally high occupancy during both non-outbreak (Supp. Figure 1A) and outbreak periods (Supp. Figure 1B, Supplementary Data 2).
These findings demonstrate that, even as overall demand decreases, spatial heterogeneity in healthcare burden persists, and they highlight the importance of unit-level monitoring for targeted public health interventions.

Discussion

While many disease forecasting models aim to predict pathogen-specific trends, each disease follows its own transmission dynamics, often limiting the generalizability of such approaches. However, from a surveillance management perspective, the priority is to identify anomalous activity early enough to allocate resources and mitigate pressure on the healthcare system. Our study with sixty weeks of data during 2023–2024 proposes an innovative approach to track real-time healthcare occupancy and highlight periods of health system pressure. We demonstrated the use of occupancy data as a complementary system that harnesses alternative data streams to enhance the speed, sensitivity, and scalability of epidemic surveillance.

Regardless of the disease etiology, the occupancy signal preceded the peak of cases of pathogens of major public health importance by up to five weeks, as indicated by the first occupancy wave. This temporal relationship was observed consistently across multiple epidemic waves and for distinct pathogen infections, reinforcing the idea of using occupancy trends for epidemiological purposes. Although infections caused by viral pathogens tend to follow well-defined seasonal patterns, multiple factors, such as climate change [16] and human behavioral shifts like lockdowns [17], can interfere with transmission dynamics and either advance or delay expected epidemiological events. For this reason, it is necessary to use a method sensitive enough to capture nuanced variations in health care systems. In parallel, there is an increasing demand from the Ministry of Health (MoH) to support efficient early warning systems to enhance evidence-informed epidemic management [18].
Our work offers a scalable approach that indicates outbreak activity as a complement to the existing surveillance methods. During public health emergencies, this occupancy-based method can be used as an operational trigger for further epidemiological investigations to allocate resources for pathogen identification in a timely manner. This capability supports the development of an adaptive surveillance system that can detect emerging threats faster and aligns with the Brazilian MoH's goal of improving the responsiveness of Brazil's public health surveillance infrastructure.

The temporal associations observed between occupancy data and laboratory-based indicators offer strong support for the use of healthcare facility activity as a leading signal of epidemic burden. Importantly, the occupancy signal demonstrated a consistent ability to reflect underlying infection dynamics, whether through anticipation or real-time alignment, as seen in the third wave. This underscores the broader utility of occupancy trends not only for forecasting but also for near real-time situational awareness, particularly when traditional surveillance systems may lag or underperform. The sensitivity of this signal to both arboviral and respiratory pathogens, regardless of their transmission mechanisms or seasonality, underscores its integrative value for public health surveillance. Moreover, the detection of statistically significant lagged associations across multiple periods through Granger causality analysis supports the robustness and reproducibility of this approach in diverse epidemic contexts.

Because the occupancy signal is sensitive to healthcare system burden regardless of disease etiology, it showed a 5-week lead between the first peak of occupancy and the peak of the second SARS-CoV-2 wave (Fig. 2B). However, during the decline of this SARS-CoV-2 wave, occupancy levels did not decrease in parallel, likely due to the simultaneous increase in dengue virus (DENV) infections.
Between 2023 and 2024, a total of 3.79 million dengue cases were reported in Brazil, with 52% occurring in the Southeast region [19]. The state of São Paulo, the most densely populated area in the country, accounted for a significant share of this burden with 2,745 cases per 100,000 inhabitants. This represented the highest dengue burden during the period analyzed. Notably, the occupancy levels remained sensitive throughout, capturing the distinct dynamics of both SARS-CoV-2 and DENV waves despite their differing seasonality, further underscoring their value as a potential integrative and timely indicator of epidemic pressure.

The integration of a volatility-based metric with spatial association analysis provides an additional layer of resolution for identifying anomalous patterns in healthcare system usage. By capturing deviations from expected occupancy levels at both regional and unit-specific scales, this method adapts dynamically to reveal localized system pressures that may signal emerging outbreaks before they escalate to broader system-wide impact. Notably, occupancy surges in just a few units within a single municipality can already serve as actionable warning signals for local health administrators. These early alerts offer not only operational value but also strategic insights into structural disparities in healthcare access. Persistent high-occupancy hotspots revealed areas of concentrated demand and potential gaps in the equitable distribution of healthcare services. Recognizing these spatial inequities is crucial for guiding the reallocation of resources, including the redistribution of human resources and medical supplies to the most affected units. In this way, the volatility index, combined with spatial analysis, can support health system equity goals by identifying zones where population needs are highest and response capacity may be insufficient.
This strengthens the capacity for targeted interventions, enabling managers to balance efficiency with fairness in epidemic preparedness and response.

Despite the strengths of this approach, it has limitations. Notably, the Google Maps™ occupancy data lacks full methodological transparency and may be subject to representativeness bias depending on the type of health care unit, user demographics, and regional coverage. Therefore, while this method offers valuable insights into the health care system burden, it is not intended to work in isolation. Its full potential is realized when integrated with other surveillance approaches such as syndromic, laboratory, and genomic surveillance. These classical epidemiological approaches provide pathogen-specific and population-level contextual information. Together, these complementary tools can enhance the resolution of health surveillance and public health policies.

Conclusions

This study presented an innovative, low-cost, and easy-to-implement method with the capability to detect epidemiological events up to five weeks in advance compared to laboratory test data. Our approach provides valuable insights that can support public health managers in making data-driven decisions and implementing effective control measures. The geospatial visualization and comparative analysis between outbreak and non-outbreak periods emphasize the importance of continuous surveillance and efficient health service management to prevent system overburden and ensure quality care for the population. Future improvements should focus on expanding the epidemiological data pool and incorporating seasonal variables for an even more robust approach. These findings underscore the relevance of real-time data monitoring as a critical component of surveillance systems capable of timely epidemic response.
Methods

Data source and selection criteria

Real-time occupancy percentage values were obtained hourly from Google Maps™ for 17 emergency care units in seven cities in the São Paulo metropolitan area between July 15, 2023, and October 12, 2024. During the collection process, we ensured that no personal information was obtained and all data were aggregated to ensure safety compliance. These units were randomly selected to represent different neighborhoods in the São Paulo area, which also serves as a primary hub for the state. The use of metropolitan area data as representative of broader regional and state patterns is supported by several studies that have demonstrated significant associations between metropolitan and state-level epidemiological indicators [20–22]. Five epidemiological weeks (2023-08-19, 2023-11-25, 2023-12-02, 2023-12-09, 2023-12-30) were excluded due to consecutive days with missing data. In total, 66 weeks were available for subsequent analysis. For weeks with only a single missing hourly record, numerical data interpolation (method='linear') was applied. These non-consecutive missing data points typically resulted from interruptions in internet connectivity.

Diagnostic test results for SARS-CoV-2, influenza virus (A and B), respiratory syncytial virus, and dengue virus were obtained from different sources. First, anonymized diagnostic test data were obtained from seven private laboratories that compose a prospective pathogen monitoring initiative coordinated by the Instituto Todos pela Saúde (ITpS, All for Health Institute) [23]. These diagnostic data are mainly derived from symptomatic individuals seeking assistance in private healthcare facilities in São Paulo state. In this case, randomly selected individuals, including symptomatic and asymptomatic individuals, were tested.
Second, test results were obtained from the government epidemiological surveillance information system SIVEP-SRAG from Open Datasus, which monitors cases of Severe Acute Respiratory Infections (SARI). SIVEP-SRAG is a compulsory notification repository for COVID-19 and other respiratory diseases from both public and private healthcare systems in Brazil. Third, DENV case data were sourced from InfoDengue [24], a surveillance platform that issues arbovirus alerts. In this study, we included only confirmed cases with pathogen detection, represented by the number of positive results from antigen or RT-qPCR tests from the three data sources. All datasets used in this study (ITpS, SIVEP-SRAG, and InfoDengue) are from public repositories that do not provide sensitive patient data; therefore, the analyses based on them follow open data principles and do not require ethics committee approval in Brazil. Real-time occupancy and laboratory data were consolidated for downstream epidemiological analyses (Supplementary Data 3).

Time series similarity assessment

Hourly data from each healthcare facility were first aggregated into daily averages. Subsequently, the weekly occupancy values were calculated as the average of the daily values of each epidemiological week. We separated the laboratory-based surveillance data by source and into three main groups: (i) SARS-CoV-2, (ii) DENV, and (iii) a respiratory panel of important pathogens for Brazilian public health, including SARS-CoV-2, respiratory syncytial virus (RSV), and Influenza A and B. These diagnostic test results were consolidated weekly. For disease case counts from SIVEP-SRAG and InfoDengue public repositories, we used the number of positive tests. For infection rates, we calculated the positivity rate by dividing the number of positive tests by the total number of tests and multiplying by 100 to obtain the percentage.
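The aggregation steps above (hourly to daily, daily to weekly, and the positivity rate) can be sketched as follows. This is an illustrative outline using only the Python standard library, not the authors' code. It assumes epidemiological weeks run from Sunday to Saturday and are labeled by their Saturday, which matches how the excluded weeks are dated in the text but is not stated explicitly. All function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def daily_means(hourly_records):
    """Average hourly (timestamp, occupancy %) pairs into one value per day."""
    by_day = defaultdict(list)
    for timestamp, occupancy in hourly_records:
        by_day[timestamp.date()].append(occupancy)
    return {day: mean(values) for day, values in by_day.items()}


def week_ending_saturday(day):
    """Label a date by the Saturday closing its week (assumed Sunday-Saturday)."""
    # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    return day + timedelta(days=(5 - day.weekday()) % 7)


def weekly_means(daily_values):
    """Average the daily values that fall in each epidemiological week."""
    by_week = defaultdict(list)
    for day, value in daily_values.items():
        by_week[week_ending_saturday(day)].append(value)
    return {week: mean(values) for week, values in sorted(by_week.items())}


def positivity_rate(positive_tests, total_tests):
    """Positive tests as a percentage of all tests; None if no tests were run."""
    if total_tests == 0:
        return None
    return 100.0 * positive_tests / total_tests
```

Averaging daily values first, rather than pooling all hourly readings of a week, keeps a day with many readings from outweighing a day with few.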
Dynamic time warping from the dtaidistance library [v.2.3.12] [25] was applied to assess the temporal alignment between the occupancy data and the laboratory-based surveillance data. DTW returns a distance of zero when comparing a time series with itself, and the distance increases as temporal misalignment or shape differences grow. This makes it suitable for capturing non-linear shifts and local distortions in timing between epidemic indicators. To ensure comparability of temporal patterns rather than absolute magnitudes, each time series was normalized independently by linearly rescaling its minimum and maximum values to a 0–1 range using scikit-learn [26]. Given the multiple pathogen time series analyzed, we adjusted the occupancy wave periods according to the observed temporal lags in pathogen circulation to ensure evaluation of the complete trend period. The partial overlaps introduced by this adjustment were identified using DTW, which aligns temporal patterns across datasets and highlights phase shifts between signals.

To evaluate whether temporal changes could consistently explain the relationship between emergency care unit occupancy and seasonal variations in pathogen circulation, we applied the Granger causality test. The Augmented Dickey-Fuller (ADF) test was used to evaluate stationarity, and first-order differencing was applied when necessary. We analyzed three distinct temporal waves defined based on epidemiological weeks. For each wave, we tested the causality between occupancy percentage changes and pathogen-specific indicators using the statsmodels [v.0.14.4] [27] implementation of the Granger causality test, considering lags of up to five weeks due to data availability constraints on each wave. All statistical estimates, such as lag-specific statistics and p-values, were exported for further interpretation.
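To make the normalization and DTW steps concrete, here is a small self-contained sketch in plain Python. It is not the dtaidistance implementation used in the study. The cost convention (squared pointwise differences, square root of the accumulated cost, no warping window) is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def min_max_scale(series):
    """Linearly rescale a series so its minimum maps to 0 and its maximum to 1."""
    low, high = min(series), max(series)
    if high == low:
        return [0.0 for _ in series]
    return [(value - low) / (high - low) for value in series]


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Assumption: squared pointwise cost, square root of the accumulated total.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return math.sqrt(cost[n][m])


def euclidean_distance(a, b):
    """Pointwise distance without warping, for comparison."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

If two scaled series have the same shape but one is shifted by a week, `dtw_distance` stays near zero while `euclidean_distance` does not, which is why DTW suits signals that lead or lag one another.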
Z-based epidemic volatility index

Next, to monitor fluctuations in emergency care unit occupancy, we adapted the epidemic volatility index [28], an early warning algorithm designed to detect emerging epidemic waves by assessing relative changes in the standard deviation of case counts over time. In our adaptation, hourly occupancy data were aggregated into daily averages. For each facility, we computed a 42-day moving average and standard deviation, using a window length set one week longer than the maximum lag tested in the Granger causality analyses to ensure short-term trends were represented. Z-scores, or standard deviations from the mean, were calculated by subtracting the moving average from the daily occupancy percentage and dividing by the corresponding standard deviation. These daily z-scores were then aggregated into weekly summaries aligned with epidemiological weeks.

To classify the intensity of occupancy fluctuations, we applied predefined thresholds to the weekly z-scores. Values below 0 were labeled as low volatility (green), values between 0 and 0.65 as moderate volatility (yellow), and values above 0.65 as high volatility (red). We selected this threshold for high volatility as it represents a moderate deviation (approximately two-thirds of a standard deviation) from the moving average over a 42-day (six-week) window. Under a normal distribution, about 74% of observations fall below 0.65 standard deviations, making this threshold sensitive enough to detect atypical increases in occupancy while limiting false positives from routine variability. This choice also accounts for the limited range of percentage data, which reduces variability compared to absolute case counts.

Geospatial statistics information

Next, to evaluate spatial relationships between healthcare units, we computed pairwise distances using the Haversine formula to generate a geographic distance matrix (here in kilometers) across each pair of units (Supp. Figure 2).
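As an illustration of the pairwise distance step, the sketch below builds a Haversine distance matrix in kilometers. The Earth radius constant is an assumption (a commonly used mean radius), and the function names are hypothetical rather than taken from the study's code.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumption: mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def distance_matrix(coordinates):
    """Symmetric matrix of pairwise distances for a list of (lat, lon) pairs."""
    return [
        [haversine_km(lat1, lon1, lat2, lon2) for lat2, lon2 in coordinates]
        for lat1, lon1 in coordinates
    ]
```

A matrix like this can then be used to find each unit's nearest neighbors when building spatial weights.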
The Haversine formula calculates great-circle distances based on latitude and longitude coordinates while accounting for the Earth's curvature. To identify localized patterns of weekly occupancy, we applied the Local Indicators of Spatial Association (LISA) test to average occupancy rates vs geographical distance. The LISA method detects spatial clusters of similar or dissimilar values by comparing each unit's occupancy value with those of its nearest neighbors. For this, we used a spatial weights matrix based on the nearest neighbors (k = 4), with units considered significant at a p-value < 0.05. Geospatial maps and metrics were visualized using GeoPandas [v.1.0.1] and Matplotlib [v.3.10.0].

Declarations

Acknowledgements

We acknowledge Google Maps™ for providing aggregate occupancy data essential to this study. We are grateful to the Brazilian Ministry of Health for public access to SIVEP-SRAG and InfoDengue surveillance platforms. The findings and conclusions are solely those of the authors and do not necessarily reflect the official position of the funding institution.

Authorship information

J.D.A., J.C.S.S., and V.S.S. contributed to the conceptualization and methodology of the study. E.R.S., J.D.A., and J.C.S.S. were responsible for software development and data curation. Validation was performed by J.D.A., J.C.S.S., M.A.S.B., E.C.S., H.I.N., C.S.L., J.R.R.P., G.O.P., and V.S.S. Formal analysis and investigation were carried out by J.D.A., J.C.S.S., and V.S.S. A.F.B. and M.S. provided resources. Data visualization was conducted by I.N.S., J.D.A., J.C.S.S., and A.F.B. The original draft of the manuscript was written by J.D.A., J.C.S.S., and V.S.S., and review and editing were performed by J.D.A., J.C.S.S., M.A.S.B., C.S.L., J.R.R.P., A.F.B., and V.S.S. Supervision was provided by J.D.A., J.C.S.S., E.C.S., H.I.N., G.O.P., J.K., M.S., A.F.B., and V.S. Project administration was conducted by J.D.A., J.C.S.S., G.O.P., J.K., M.S., and V.S.
Funding acquisition was carried out by J.D.A., J.C.S.S., G.O.P., J.K., M.S., and V.S.S.

Data and code availability

The consolidated datasets and the complete Python code used to perform the analyses in this work are available at https://github.com/InstitutoTodosPelaSaude/paper_occupancy_detecta.

Declaration of Interests

The authors declare no competing interests.

References

1. Massuda, A., Hone, T., Leles, F. A. G., de Castro, M. C. & Atun, R. The Brazilian health system at crossroads: progress, crisis and resilience. BMJ Glob Health 3, e000829 (2018).
2. Campillo-Funollet, E. et al. Predicting and forecasting the impact of local outbreaks of COVID-19: use of SEIR-D quantitative epidemiological modelling for healthcare demand and capacity. Int. J. Epidemiol. 50, 1103–1113 (2021).
3. Sartini, M. et al. Overcrowding in Emergency Department: Causes, Consequences, and Solutions-A Narrative Review. Healthcare (Basel) 10, (2022).
4. Alho, A. M., Oliveira, A. P., Viegas, S. & Nogueira, P. Effect of heatwaves on daily hospital admissions in Portugal, 2000-18: an observational study. Lancet Planet. Health 8, e318–e326 (2024).
5. Requia, W. J., Amini, H., Mukherjee, R., Gold, D. R. & Schwartz, J. D. Health impacts of wildfire-related air pollution in Brazil: a nationwide study of more than 2 million hospital admissions between 2008 and 2018. Nat. Commun. 12, 6555 (2021).
6. Aggarwal, S., Hu, J. K., Sullivan, J. A., Parks, R. M. & Nethery, R. C. Severe flooding and cause-specific hospitalisation among older adults in the USA: a retrospective matched cohort analysis. The Lancet Planetary Health 9, 101268 (2025).
7. Marzano, L. et al. Diagnosing an overcrowded emergency department from its Electronic Health Records. Sci. Rep. 14, 9955 (2024).
8. Pellis, L. et al. Challenges in control of COVID-19: short doubling time and long delay to effect of interventions. Philos. Trans. R. Soc. Lond. B Biol. Sci. 376, 20200264 (2021).
9. Khanizadeh, F. et al. Smart data-driven medical decisions through collective and individual anomaly detection in healthcare time series. Int. J. Med. Inform. 194, 105696 (2025).
10. Swaan, C., van den Broek, A., Kretzschmar, M. & Richardus, J. H. Timeliness of notification systems for infectious diseases: A systematic literature review. PLoS ONE 13, e0198845 (2018).
11. Coelho Neto, G. C. & Chioro, A. [After all, how many nationwide Health Information Systems are there in Brazil?]. Cad. Saude Publica 37, e00182119 (2021).
12. Abat, C., Chaudet, H., Rolain, J.-M., Colson, P. & Raoult, D. Traditional and syndromic surveillance of infectious diseases and pathogens. Int. J. Infect. Dis. 48, 22–28 (2016).
13. Murray, J. & Cohen, A. L. Infectious Disease Surveillance. in International encyclopedia of public health 222–229 (Elsevier, 2017). doi:10.1016/B978-0-12-803678-5.00517-8.
14. MacIntyre, C. R. et al. Artificial intelligence in public health: the potential of epidemic early warning systems. J. Int. Med. Res. 51, 3000605231159335 (2023).
15. Bastos, L. S. et al. A modelling approach for correcting reporting delays in disease surveillance data. Stat. Med. 38, 4363–4377 (2019).
16. Ruan, W., Liang, Y., Sun, Z. & An, X. Climate warming and influenza dynamics: the modulating effects of seasonal temperature increases on epidemic patterns. npj Climate and Atmospheric Science (2025).
17. Chen, Z. et al. COVID-19 pandemic interventions reshaped the global dispersal of seasonal influenza viruses. Science 386, eadq3003 (2024).
18. Brasil. Ministério da Saúde. Demandas de pesquisas para apoio à gestão da Secretaria de Vigilância em Saúde e Ambiente [recurso eletrônico]. Ministério da Saúde, Secretaria de Vigilância em Saúde e Ambiente. Departamento de Ações Estratégicas de Epidemiologia e Vigilância em Saúde e Ambiente. (2025).
19. SINAN-Dengue. SINANWEB - Dengue. https://portalsinan.saude.gov.br/dengue (2025).
20. Nicolelis, M. A. L., Raimundo, R. L. G., Peixoto, P. S. & Andreazzi, C. S. The impact of super-spreader cities, highways, and intensive care availability in the early stages of the COVID-19 epidemic in Brazil. Sci. Rep. 11, 13001 (2021).
21. Cousins, H. C., Cousins, C. C., Harris, A. & Pasquale, L. R. Regional Infoveillance of COVID-19 Case Rates: Analysis of Search-Engine Query Patterns. J. Med. Internet Res. 22, e19483 (2020).
22. Fortaleza, C. M. C. B. et al. The use of health geography modeling to understand early dispersion of COVID-19 in São Paulo, Brazil. PLoS ONE 16, e0245051 (2021).
23. ITpS. Instituto Todos pela Saúde. (2025).
24. Codeco, C. et al. Infodengue: A nowcasting system for the surveillance of arboviruses in Brazil. Revue d’Épidémiologie et de Santé Publique 66, S386 (2018).
25. Meert, W. et al. DTAIDistance. Zenodo (2020) doi:10.5281/zenodo.7158824.
26. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V. & Thirion, B. Scikit-learn: Machine Learning in Python. The Journal of Machine Learning Research (2011).
27. Johansen, S. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. (Oxford University Press, 1995). doi:10.1093/0198774508.001.0001.
28. Kostoulas, P. et al. The epidemic volatility index, a novel early warning tool for identifying new waves in an epidemic. Sci. Rep. 11, 23775 (2021).

Table

Table 1 is available in the Supplementary Files section.

Additional Declarations

There is NO Competing Interest.

Supplementary Files

SuppData1.tsv: Supplementary Data 1
SuppData2.tsv: Supplementary Data 2
SuppData3.txt: Supplementary Data 3
SuppFigure1.png: Supplementary Figure 1
SuppFigure2.png: Supplementary Figure 2
Table1.xlsx: Table 1
We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-7754752","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":523692920,"identity":"9fd235a6-27a4-41c4-88cc-33e3b62ae2f5","order_by":0,"name":"Vanderson Souza Sampaio","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA2klEQVRIiWNgGAWjYJACCQjFfADEliFFC1sCiM1DihYeAzBJULnB8bMHb3zcUydnzn/m86sbNRY8DOyHj27Aq+VMXrLljGeHjS1n5G6zzjkGdBhPWtoNfFokG3LMpHkOHEjccIN3m3EOG1CLBI8Zfi39b8yk/xyoS9xw/swz45x/RGjhlwDawnCAOXHDgRzmx7ltRGl5Y2zZc+CwscGNNDPm3D4JHjZCfmHjzzG88eNAnZzB+cOPP+d8q5PjZz98DK8WFO3gCGIjVjkIMH8gRfUoGAWjYBSMHAAALIJGrK/x5l4AAAAASUVORK5CYII=","orcid":"https://orcid.org/0000-0001-7307-8851","institution":"Instituto Todos pela Saúde","correspondingAuthor":true,"prefix":"","firstName":"Vanderson","middleName":"Souza","lastName":"Sampaio","suffix":""},{"id":523692921,"identity":"b2e3ba1c-32f7-4223-88aa-947b5b35c62f","order_by":1,"name":"Jose Araujo","email":"","orcid":"","institution":"Instituto Todos pela Saude ITpS, Sao Paulo, Brazil","correspondingAuthor":false,"prefix":"","firstName":"Jose","middleName":"","lastName":"Araujo","suffix":""},{"id":523692922,"identity":"a0abb905-b55c-468b-a8c5-1127c431791d","order_by":2,"name":"Juan Silva","email":"","orcid":"","institution":"Instituto Todos pela 
Saude ITpS, Sao Paulo, Brazil","correspondingAuthor":false,"prefix":"","firstName":"Juan","middleName":"","lastName":"Silva","suffix":""},{"id":523692923,"identity":"6ae2f562-8e4a-454a-b03b-ee04c87c9016","order_by":3,"name":"Marcelo Bragatte","email":"","orcid":"https://orcid.org/0000-0001-6031-4755","institution":"Instituto Todos pela Saude ITpS, Sao Paulo, Brazil","correspondingAuthor":false,"prefix":"","firstName":"Marcelo","middleName":"","lastName":"Bragatte","suffix":""},{"id":523692924,"identity":"86905d9a-17e0-40e1-bbfe-04071ee1fb23","order_by":4,"name":"Erick Sousa","email":"","orcid":"","institution":"Instituto Todos pela Saude ITpS, Sao Paulo, Brazil","correspondingAuthor":false,"prefix":"","firstName":"Erick","middleName":"","lastName":"Sousa","suffix":""},{"id":523692925,"identity":"9e580feb-8864-4af7-b43c-bb28fb162210","order_by":5,"name":"Isaac Schrarstzhaupt","email":"","orcid":"https://orcid.org/0000-0002-4451-3612","institution":"Faculdade de Saude Publica, Universidade de Sao Paulo, Sao Paulo, Brazil","correspondingAuthor":false,"prefix":"","firstName":"Isaac","middleName":"","lastName":"Schrarstzhaupt","suffix":""},{"id":523692926,"identity":"d9926989-f2a6-4e87-927b-87bcf2a15be8","order_by":6,"name":"Ester sabino","email":"","orcid":"https://orcid.org/0000-0003-2623-5126","institution":"USP","correspondingAuthor":false,"prefix":"","firstName":"Ester","middleName":"","lastName":"sabino","suffix":""},{"id":523692927,"identity":"b45285e1-d975-470f-81ba-5ce019c31c9b","order_by":7,"name":"Helder Nakaya","email":"","orcid":"https://orcid.org/0000-0001-5297-9108","institution":"Hospital Israelita Albert Einstein","correspondingAuthor":false,"prefix":"","firstName":"Helder","middleName":"","lastName":"Nakaya","suffix":""},{"id":523692928,"identity":"1e96e216-5c8b-4b3d-928b-5c4f4cafb713","order_by":8,"name":"Carolina Lazari","email":"","orcid":"","institution":"Division of Clinical Pathology, Department of Pathology, Hospital das Clinicas HCFMUSP, 
Faculty of Medicine, University of Sao Paulo, Sao Paulo, SP, Brazil","correspondingAuthor":false,"prefix":"","firstName":"Carolina","middleName":"","lastName":"Lazari","suffix":""},{"id":523692929,"identity":"a6388eab-a807-40f8-baf7-5ef44f23123a","order_by":9,"name":"Joao Renato Rebello Pinho","email":"","orcid":"https://orcid.org/0000-0003-3999-0489","institution":"Hospital Albert Einstein","correspondingAuthor":false,"prefix":"","firstName":"Joao","middleName":"Renato Rebello","lastName":"Pinho","suffix":""},{"id":523692930,"identity":"5fee4d7f-5028-43ca-8a3e-81227574b9fb","order_by":10,"name":"Gerson Penna","email":"","orcid":"","institution":"Instituto Todos pela Saude ITpS, Sao Paulo, Brazil","correspondingAuthor":false,"prefix":"","firstName":"Gerson","middleName":"","lastName":"Penna","suffix":""},{"id":523692931,"identity":"2600aa92-a6da-4ebc-8d67-4bdc31f4d11e","order_by":11,"name":"Jorge Kalil","email":"","orcid":"","institution":"University of Sao Paulo","correspondingAuthor":false,"prefix":"","firstName":"Jorge","middleName":"","lastName":"Kalil","suffix":""},{"id":523692932,"identity":"d0b5786e-91d9-45b2-a115-c70c0e3cbf75","order_by":12,"name":"Mariangela Simao","email":"","orcid":"","institution":"Ministry of Health of Brazil, Secretariat of Health and Environmental Surveillance (SVSA), Brasilia, DF, Brazil","correspondingAuthor":false,"prefix":"","firstName":"Mariangela","middleName":"","lastName":"Simao","suffix":""},{"id":523692933,"identity":"69c4a063-ede4-4084-b212-51bcf3ab44c7","order_by":13,"name":"Anderson Brito","email":"","orcid":"","institution":"Instituto Todos pela Saude ITpS, Sao Paulo, Brazil","correspondingAuthor":false,"prefix":"","firstName":"Anderson","middleName":"","lastName":"Brito","suffix":""}],"badges":[],"createdAt":"2025-09-30 
20:35:08","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7754752/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7754752/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":94845193,"identity":"ddedafb1-ed6b-4c99-8f06-0f151b2ff9b8","added_by":"auto","created_at":"2025-10-31 10:07:43","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":1565749,"visible":true,"origin":"","legend":"\u003cp\u003eLegend not included with this version.\u003c/p\u003e","description":"","filename":"Figure1.png","url":"https://assets-eu.researchsquare.com/files/rs-7754752/v1/1cbfdd134ab91ed374b68915.png"},{"id":94845190,"identity":"d0dcaea0-4841-4dc7-941d-9dd51645e4a3","added_by":"auto","created_at":"2025-10-31 10:07:43","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":1453526,"visible":true,"origin":"","legend":"\u003cp\u003eLegend not included with this version.\u003c/p\u003e","description":"","filename":"Figure2.png","url":"https://assets-eu.researchsquare.com/files/rs-7754752/v1/859aeab3cf6337dbfeacdf3f.png"},{"id":94984948,"identity":"87e2ffc8-368e-46f2-b289-7df1073ed1ba","added_by":"auto","created_at":"2025-11-03 06:56:59","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":1716602,"visible":true,"origin":"","legend":"\u003cp\u003eLegend not included with this version.\u003c/p\u003e","description":"","filename":"Figure3.png","url":"https://assets-eu.researchsquare.com/files/rs-7754752/v1/db6afcc6fa188f0d885d9ec3.png"},{"id":94985505,"identity":"e585d8a7-d081-4476-9335-1c6640b9553a","added_by":"auto","created_at":"2025-11-03 06:58:18","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":3930631,"visible":true,"origin":"","legend":"\u003cp\u003eLegend not included with this 
version.\u003c/p\u003e","description":"","filename":"Figure4.png","url":"https://assets-eu.researchsquare.com/files/rs-7754752/v1/0190a49fb580cc77a45abaa6.png"},{"id":94990376,"identity":"e3a67beb-3cda-42ff-ace5-f8fa1c8829cf","added_by":"auto","created_at":"2025-11-03 07:16:41","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":8136998,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7754752/v1/013a42cf-9854-4460-a918-dee9f5fc7c5c.pdf"},{"id":94984797,"identity":"72e4b754-0383-429e-875a-c0951a5b9dea","added_by":"auto","created_at":"2025-11-03 06:56:17","extension":"tsv","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":3472,"visible":true,"origin":"","legend":"Supplementary Data 1","description":"","filename":"SuppData1.tsv","url":"https://assets-eu.researchsquare.com/files/rs-7754752/v1/7fa54594d550510daf30d2a6.tsv"},{"id":94845188,"identity":"fd3034f8-69ae-45fc-84e0-8e3d6e771975","added_by":"auto","created_at":"2025-10-31 10:07:43","extension":"tsv","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":4517,"visible":true,"origin":"","legend":"Supplementary Data 2","description":"","filename":"SuppData2.tsv","url":"https://assets-eu.researchsquare.com/files/rs-7754752/v1/950250342b413d7230a0e8be.tsv"},{"id":94984771,"identity":"73f12d88-d3d6-4278-aa3a-5c78fb38fbd4","added_by":"auto","created_at":"2025-11-03 06:56:07","extension":"txt","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":6455,"visible":true,"origin":"","legend":"Supplementary Data 3","description":"","filename":"SuppData3.txt","url":"https://assets-eu.researchsquare.com/files/rs-7754752/v1/e1317acb6f9f7a3b9b02a198.txt"},{"id":94985084,"identity":"446dfd1d-a6f3-48d9-9bef-4bc64b22d806","added_by":"auto","created_at":"2025-11-03 
06:57:24","extension":"png","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":420535,"visible":true,"origin":"","legend":"\u003cp\u003eSupplementary Figure 1\u003c/p\u003e","description":"","filename":"SuppFigure1.png","url":"https://assets-eu.researchsquare.com/files/rs-7754752/v1/422fb690c985e96230434ab6.png"},{"id":94845196,"identity":"03514d7d-454a-4401-9746-1f3e180d21eb","added_by":"auto","created_at":"2025-10-31 10:07:43","extension":"png","order_by":5,"title":"","display":"","copyAsset":false,"role":"supplement","size":535598,"visible":true,"origin":"","legend":"\u003cp\u003eSupplementary Figure 2\u003c/p\u003e","description":"","filename":"SuppFigure2.png","url":"https://assets-eu.researchsquare.com/files/rs-7754752/v1/b5a5d844b56e168ac175c332.png"},{"id":94845194,"identity":"39a1aadc-71ac-4ad2-b770-6d4f4548ecce","added_by":"auto","created_at":"2025-10-31 10:07:43","extension":"xlsx","order_by":6,"title":"","display":"","copyAsset":false,"role":"supplement","size":49194,"visible":true,"origin":"","legend":"Table 1","description":"","filename":"Table1.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-7754752/v1/4fa023f71b3e977282909ea0.xlsx"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing Interest.","formattedTitle":"Crowd signals: Early detection of disease outbreaks using real-time healthcare occupancy data","fulltext":[{"header":"Introduction","content":"\u003cp\u003eGlobal health crises trigger widespread social, economic, and political disruptions, with healthcare facility overcrowding standing as a key pressure point for system resilience \u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u003c/sup\u003e. 
Beyond the direct burden on care delivery, overcrowding is associated with operational inefficiencies, negative patient outcomes, prolonged wait times, and reduced bed availability \u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e,\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e. Environmental hazards such as heatwaves, wildfires, and floods have been recognized as important sources of pressure on health systems. These non-infectious drivers of overcrowding highlight the need for sensitive surveillance approaches that can monitor fluctuations in healthcare facility capacity \u003csup\u003e\u003cspan additionalcitationids=\"CR5\" citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e. Recently, electronic health records were used to estimate facility occupancy, but limitations in accessibility and the interval between data collection and analysis constrain their utility for rapid epidemiological response \u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u003c/sup\u003e. While pathogen-specific data remains essential for diagnosis and response, planning requires time \u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e. From an operational perspective, sensitive early signals of anomalous healthcare usage can provide timely insights to support rapid decision-making and resource allocation \u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e. 
Therefore, the ability to detect early warnings of disease outbreaks based on healthcare unit occupancy represents a potential innovation that could add a new layer to epidemic intelligence and significantly enhance public health surveillance \u003csup\u003e\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003eHealth systems worldwide use multiple surveillance strategies to track infectious diseases. These include passive notification systems for notifiable diseases such as COVID-19, dengue, and yellow fever \u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e; sentinel surveillance from selected healthcare facilities, syndromic and laboratory-based surveillance \u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e. However, while these methods are well-established, they often suffer from limitations such as delayed reporting, high infrastructure demands, and lack of real-time availability \u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e. Currently, many digital surveillance tools incorporate artificial intelligence to mine open-source data from social media or news, yet these approaches can suffer from noise and limited clinical relevance \u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e. Despite advances, most official notification systems still require several days to identify and report outbreaks, causing delays that can undermine timely interventions \u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e. While surveillance systems exist to track infectious diseases, few tools effectively monitor real-time stress on healthcare infrastructure. 
In this context, there is a pressing need for complementary systems that harness alternative data streams to enhance the speed, sensitivity, and scalability of epidemic detection.\u003c/p\u003e\u003cp\u003eIn this study, we present a novel method that leverages real-time occupancy data from healthcare facilities, made available by Google Maps\u0026trade;, to detect early warning signals of potential outbreak activity. This data, derived from anonymized and aggregated user location metrics, provides a dynamic proxy for occupancy levels in clinical settings. Sudden increases in facility occupancy were associated with intensified pathogen circulation, making this a powerful source for early outbreak detection. We tested this hypothesis by analyzing occupancy patterns from 17 public urgent care units in the S\u0026atilde;o Paulo metropolitan area. By applying Dynamic Time Warping (DTW) and Granger Causality analyses, we demonstrated that occupancy trends are associated with confirmed cases of respiratory pathogens and dengue virus (DENV) infections by up to five weeks. Unsupervised spatial analysis uncovered persistent high levels of healthcare unit occupancy in the later stages of the epidemiological event, pointing to areas that continued to experience system strain and required ongoing monitoring. Our findings show that the use of privacy-safe passive mobility data can offer a timely, low-cost, and scalable solution to enhance epidemic surveillance and public health responses.\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003eHealthcare occupancy patterns reflect epidemiological events timing\u003c/h2\u003e\u003cp\u003eAcross seven cities in the S\u0026atilde;o Paulo metropolitan area, aggregated occupancy data from 17 healthcare units reflected population-level dynamics from July 15, 2023, to October 12, 2024 (Fig.\u0026nbsp;1A and Supplementary Data 1). 
Temporal relationships between weekly occupancy averages and laboratory-based surveillance data (Fig.\u0026nbsp;1B) revealed whether historical trends in one time series exhibited lagged associations with the other, using time-lagged modeling approaches that account for directional dependencies (Fig.\u0026nbsp;1C). Laboratory surveillance data included SARS-CoV-2, DENV, and other respiratory viruses of public health importance (RV: SARS-CoV-2, Influenza virus A/B, and respiratory syncytial virus).\u003c/p\u003e\u003cp\u003eTemporal distribution of occupancy levels revealed notably coherent dynamics across the healthcare units, despite their independent selection and heterogeneous contexts (Fig.\u0026nbsp;2A). Three distinct periods concentrated the highest occupancy levels across units: from July to October 2023, from December 2023 to May 2024, and from August to September 2024. The occupancy average trajectory highlights the collective shifts and further supports the similar patterns among the units (Fig.\u0026nbsp;2A, lower panel). These periods of elevated occupancy partially overlap with potential outbreak intervals (Fig.\u0026nbsp;2B-C), suggesting consistent temporal patterns in occupancy surges that may reflect broader epidemiological dynamics. Despite using occupancy averages, waves of overcrowding were consistently observed across the 17 monitored units, underscoring the synchronized signal that can be used to track health system pressures as they evolve.\u003c/p\u003e\u003cp\u003eNext, we associated the temporal dynamics of healthcare unit occupancy alongside the laboratory-based surveillance indicators (Fig.\u0026nbsp;2B-C). Notably, the increasing occupancy levels during the first and second waves tend to anticipate the peaks of respiratory pathogen outbreak progression by up to five weeks. 
Although the third wave demonstrated a reduced lead time for anticipating the pathogen surge, a substantial real-time increase in occupancy levels was observed following the increase in laboratory-confirmed cases. Even in scenarios where early anticipation is limited, such occupancy-based signals remain valuable for situational awareness and can support timely decision-making by healthcare managers. Also, this reduced anticipation window is partially explained by the decline in DENV circulation during the second semester of 2024, followed by a sudden rise in SARS-CoV-2 cases in August 2024 (Fig.\u0026nbsp;2B-C). The transition between these two epidemiological phases created a pronounced valley in overall laboratory-based indicators around June and July.\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003eHigh occupancy averages indicate elevated laboratory-confirmed infections\u003c/h3\u003e\n\u003cp\u003eTo quantify temporal similarity between healthcare occupancy and confirmed pathogen-specific epidemiological events, we applied DTW on scaled frequency over all 66 weeks of data. Lower DTW distances indicate greater temporal similarity after allowing for local shifts in time, making it a suitable metric for comparing epidemic signals with potential lags or phase differences. SARS-CoV-2 and RV test positivity rate indicators showed the strongest temporal concordance with occupancy patterns, as reflected in their lower warping distances with 0.89 and 0.97, respectively. In contrast, DENV-related series showed higher distances, indicating phase shifts or misalignment relative to occupancy surges. Such distance was primarily driven by a single pronounced peak in dengue virus activity during the second half of the analyzed period. 
Because DTW was calculated over the full timeline, this abrupt and concentrated nature of the DENV wave reduced the overall similarity with the occupancy trends.\u003c/p\u003e\u003cp\u003eAs the observed alignments from the DTW analysis demonstrated a lag structure, particularly the consistent 5-week offsets between the peaks (Fig.\u0026nbsp;2B-C), we next evaluated whether changes in occupancy could statistically inform subsequent trends in pathogen-specific signals. To capture time-lagged influences across distinct epidemic dynamics, we opted to segment the data into three partially overlapping lag-aware periods: a first period from July to December 2023, a second from October 2023 to June 2024, and a third period from June 2023 to October 2024. The overlapping lag-aware periods account for the temporal lags specific to each pathogen while enabling clearer attribution of occupancy changes to individual epidemic signals. Directional dependencies between time series were further assessed using the Granger causality test applied to differenced and detrended data. Significant associations (p\u0026thinsp;\u0026lt;\u0026thinsp;0.05) were identified across all three shifted periods (Table\u0026nbsp;1). In the first lag-aware period (July\u0026ndash;December 2023), we detected significant associations for all three pathogen groups, with RV-related indicators contributing the highest number of causal relationships (n\u0026thinsp;=\u0026thinsp;5), followed by DENV (n\u0026thinsp;=\u0026thinsp;2) and SARS-CoV-2 (n\u0026thinsp;=\u0026thinsp;2). There was a mean lag of 3 weeks across these associations, supporting the findings from the DTW analysis. During the second lag-aware period (October 2023\u0026ndash;June 2024), DENV emerged as the dominant driver of occupancy variation, with five highly significant associations (p\u0026thinsp;\u0026lt;\u0026thinsp;0.01) and lags ranging from one to five weeks. 
No significant associations were observed for RV or SARS-CoV-2 during this period, suggesting that dengue activity alone accounted for occupancy increases during this phase. In the third lag-aware period (June\u0026ndash;October 2024), signal strength was lower overall, but short-term directional associations (lag\u0026thinsp;=\u0026thinsp;1) were identified for both SARS-CoV-2 and RV indicators with marginal significance (p\u0026thinsp;=\u0026thinsp;0.03 and p\u0026thinsp;=\u0026thinsp;0.04, respectively). These wave-specific results reinforce the notion that healthcare occupancy responds to different pathogens and further demonstrate its potential as a lag-sensitive indicator of epidemic burden.\u003c/p\u003e\n\u003ch3\u003eEarly warning of epidemic burden using occupancy-derived volatility metrics\u003c/h3\u003e\n\u003cp\u003eOn the basis of this temporal lag structure, we applied a 42-day (6-week) moving average to healthcare occupancy data to calculate the z-scores (Fig.\u0026nbsp;4A). This window was selected based on the consistent 5-week lead time between occupancy data and pathogen-specific signals, with an additional week added to enhance sensitivity and minimize the influence of short-term noise. Weekly deviations (standard deviations, SD) from the moving average were defined as the indicator of system pressure. During the initial phase of each occupancy wave, the volatility signals intensify as weekly values rise above the moving average. This signal converges at the peak of the wave, where short-term and long-term trends align. The opposite occurs in the later stages of the detected events, where weekly occupancy values begin to decline while the moving average remains elevated, resulting in a signal indicative of decreasing healthcare system usage. 
Three distinct anomalies were evident in the deviations from the mean, beginning in August 2023, December 2023, and August 2024 (Fig.\u0026nbsp;4A).\u003c/p\u003e\u003cp\u003eAlthough aggregated trends across all 17 units provide an overview of occupancy patterns in the analyzed region, the same volatility index can be disaggregated to understand unit-specific dynamics. When comparing the initial (August 26, 2023 - Fig.\u0026nbsp;4B-D) against the later stages (November 04, 2023 - Fig.\u0026nbsp;4E-G) of an outbreak, we identified distinct spatial and temporal in-unit occupancies. At the earlier time point, all units showed positive deviations from the moving average, including 13 with high-intensity alerts (red) and four with moderate deviations (yellow), with most facilities operating near or above 50% occupancy. By the later time point, 10 units exhibited occupancy declines (green bars), while the remaining 7 continued to show signs of strain. Importantly, 16 of the 17 units recorded occupancy rates below 50%, suggesting a system-wide reduction in demand.\u003c/p\u003e\u003cp\u003eLocalized areas of persistent pressure were identified using the Local Indicators of Spatial Association (LISA) test, which identified clusters of units with similar occupancy patterns. Four units (8, 9, 10, and 15) revealed a significant hotspot (p-value\u0026thinsp;=\u0026thinsp;0.03) during the later phase of the outbreak (Fig.\u0026nbsp;4G). Mapping these hotspots exposed regions with abnormally high occupancy during both non-outbreak (Supp. Figure\u0026nbsp;1A) and outbreak periods (Supp. Figure\u0026nbsp;1B, Supplementary Data 2). 
These findings demonstrate that, even as overall demand decreases, spatial heterogeneity in healthcare burden persists and highlight the importance of unit-level monitoring for targeted public health interventions.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eWhile many disease forecasting models aim to predict pathogen-specific trends, each disease follows its own transmission dynamics, often limiting the generalizability of such approaches. However, from a surveillance management perspective, the priority is to identify anomalous activity early enough to allocate resources and mitigate the healthcare systemic pressure. Our study with sixty weeks of data during 2023\u0026ndash;2024 proposes an innovative approach to track real-time healthcare occupancy and highlight periods of health system pressure. We demonstrated the use of occupancy data as a complementary system that harnesses alternative data streams to enhance the speed, sensitivity, and scalability of epidemic surveillance. Regardless of the disease etiology, the occupancy signal preceded the peak of cases of pathogens of major public health importance by up to five weeks, as indicated by the first occupancy wave. This temporal relationship was observed consistently across multiple epidemic waves and for distinct pathogen infections, reinforcing the idea of using occupancy trends for epidemiological purposes.\u003c/p\u003e\u003cp\u003eAlthough infections caused by viral pathogens tend to follow well-defined seasonal patterns, multiple factors can interfere with transmission dynamics and either advance or delay expected epidemiological events, such as climate change \u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e and human behavioral shifts like lockdowns \u003csup\u003e\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u003c/sup\u003e. 
For this reason, it is necessary to use a method sensitive enough to capture nuanced variations in health care systems. In parallel, there is an increasing demand from the Ministry of Health (MoH) to support efficient early warning systems to enhance evidence-informed epidemic management \u003csup\u003e\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e. Our work offers a scalable approach that indicates outbreak activity as a complement to the existing surveillance methods. During public health emergencies, this occupancy-based method can be used as an operational trigger for further epidemiological investigations to allocate resources for pathogen identification in a timely manner. This capability supports the development of an adaptive surveillance system that can detect emerging threats faster and aligns with the Brazilian MoH's goal of improving the responsiveness of Brazil\u0026rsquo;s public health surveillance infrastructure.\u003c/p\u003e\u003cp\u003eThe temporal associations observed between occupancy data and laboratory-based indicators offer strong support for the use of healthcare facility activity as a leading signal of epidemic burden. Importantly, the occupancy signal demonstrated a consistent ability to reflect underlying infection dynamics, whether through anticipation or real-time alignment, as seen in the third wave. This underscores the broader utility of occupancy trends not only for forecasting but also for near real-time situational awareness, particularly when traditional surveillance systems may lag or underperform. The sensitivity of this signal to both arboviral and respiratory pathogens, regardless of their transmission mechanisms or seasonality, underscores its integrative value for public health surveillance. 
Moreover, the detection of statistically significant lagged associations across multiple periods through Granger causality analysis supports the robustness and reproducibility of this approach in diverse epidemic contexts.\u003c/p\u003e\u003cp\u003eBecause the occupancy signal is sensitive to healthcare system burden regardless of disease etiology, its first peak preceded the peak of the second SARS-CoV-2 wave by five weeks (Fig.\u0026nbsp;2B). However, during the decline of this SARS-CoV-2 wave, occupancy levels did not decrease in parallel, likely due to the simultaneous increase in dengue virus (DENV) infections. Between 2023 and 2024, a total of 3.79\u0026nbsp;million dengue cases were reported in Brazil, with 52% occurring in the Southeast region \u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e. The state of S\u0026atilde;o Paulo, the most densely populated area in the country, accounted for a significant share of this burden with 2,745 cases per 100,000 inhabitants. This represented the highest dengue burden during the period analyzed. Notably, the occupancy levels remained sensitive throughout, capturing the distinct dynamics of both SARS-CoV-2 and DENV waves despite their differing seasonality, further underscoring their value as a potential integrative and timely indicator of epidemic pressure.\u003c/p\u003e\u003cp\u003eThe integration of a volatility-based metric with spatial association analysis provides an additional layer of resolution for identifying anomalous patterns in healthcare system usage. By capturing deviations from expected occupancy levels at both regional and unit-specific scales, this method adapts dynamically to reveal localized system pressures that may signal emerging outbreaks before they escalate to broader system-wide impact. 
Notably, occupancy surges in just a few units within a single municipality can already serve as actionable warning signals for local health administrators. These early alerts offer not only operational value but also strategic insights into structural disparities in healthcare access. Persistent high-occupancy hotspots revealed areas of concentrated demand and potential gaps in the equitable distribution of healthcare services. Recognizing these spatial inequities is crucial for guiding the reallocation of resources, including the redistribution of human resources and medical supplies to the most affected units. In this way, the volatility index, combined with spatial analysis, can support health system equity goals by identifying zones where population needs are highest and response capacity may be insufficient. This strengthens the capacity for targeted interventions, enabling managers to balance efficiency with fairness in epidemic preparedness and response.\u003c/p\u003e\u003cp\u003eDespite the strengths of this approach, it has limitations. Notably, the Google Maps\u0026trade; occupancy data lacks full detailed methodological transparency and may be subject to representativeness bias depending on the type of health care unit, user demographics, and regional coverage. Therefore, while this method offers valuable insights into the health care system burden, it is not intended to work in isolation. Its full potential is realized when integrated with other surveillance approaches such as syndromic, laboratory, and genomic surveillance. These classical epidemiological approaches provide pathogen-specific and population-level contextual information. 
Together, these complementary tools can enhance the resolution of health surveillance and public health policies.\u003c/p\u003e"},{"header":"Conclusions","content":"\u003cp\u003eThis study presented an innovative, low-cost, and easy-to-implement method capable of detecting epidemiological events up to five weeks ahead of laboratory test data. Our approach provides valuable insights that can support public health managers in making data-driven decisions and implementing effective control measures. The geospatial visualization and comparative analysis between outbreak and non-outbreak periods emphasize the importance of continuous surveillance and efficient health service management to prevent system overburden and ensure quality care for the population. Future improvements should focus on expanding the epidemiological data pool and incorporating seasonal variables for an even more robust approach. These findings underscore the relevance of real-time data monitoring as a critical component of surveillance systems capable of timely epidemic response.\u003c/p\u003e\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003cdiv id=\"Sec9\" class=\"Section3\"\u003e\u003c/div\u003e\u003c/div\u003e\n\n"},{"header":"Methods","content":"\u003ch2\u003eData source and selection criteria\u003c/h2\u003e\u003cp\u003eReal-time occupancy percentage values were obtained hourly from Google Maps™ for 17 emergency care units in seven cities of the São Paulo metropolitan area between July 15, 2023, and October 12, 2024. During the collection process, we ensured that no personal information was obtained and all data were aggregated to ensure safety compliance. These units were randomly selected to represent different neighborhoods in the São Paulo area, which also serves as a primary hub for the state. 
The use of metropolitan area data as representative of broader regional and state patterns is supported by several studies that have demonstrated significant associations between metropolitan and state-level epidemiological indicators \u003csup\u003e\u003cspan additionalcitationids=\"CR21\" citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e–\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e. Five epidemiological weeks (2023-08-19, 2023-11-25, 2023-12-02, 2023-12-09, 2023-12-30) were excluded due to consecutive days with missing data. In total, 66 weeks were available for subsequent analysis. For weeks with only a single missing hourly record, numerical data interpolation (method=’linear’) was applied. These non-consecutive missing data points typically resulted from interruptions in internet connectivity.\u003c/p\u003e\u003cp\u003eDiagnostic test results for SARS-CoV-2, influenza virus (A and B), respiratory syncytial virus, and dengue virus were obtained from different sources. First, anonymized diagnostic test data were obtained from seven private laboratories that compose a prospective pathogen monitoring initiative coordinated by the Instituto Todos pela Saúde (ITpS) - All for Health Institute \u003csup\u003e\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e. These diagnostic data are mainly derived from symptomatic individuals seeking assistance in private healthcare facilities in São Paulo state. In this case, randomly selected individuals, including symptomatic and asymptomatic individuals, were tested. Second, test results were obtained from the government epidemiological surveillance information system SIVEP-SRAG from Open Datasus, which monitors cases of Severe Acute Respiratory Infections (SARI). SIVEP-SRAG is a compulsory notification repository for COVID-19 and other respiratory diseases from both public and private healthcare systems in Brazil. 
Third, DENV case data were sourced from InfoDengue \u003csup\u003e\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u003c/sup\u003e, a surveillance platform that issues arbovirus alerts. In this study, we included only confirmed cases with pathogen detection, represented by the number of positive results from antigen or RT-qPCR tests from the three data sources. All datasets used in this study (ITpS, SIVEP-SRAG, and InfoDengue) are from public repositories that do not provide sensitive patient data; therefore, the analyses based on them follow open data principles and do not require ethics committee approval in Brazil. Real-time occupancy and laboratory data were consolidated for downstream epidemiological analyses (Supplementary Data 3).\u003c/p\u003e\u003ch3\u003eTime series similarity assessment\u003c/h3\u003e\u003cp\u003eHourly data from each healthcare facility were first aggregated into daily averages. Subsequently, the weekly occupancy values were calculated as the average of the daily values of each epidemiological week. We separated the laboratory-based surveillance data by source and into three main groups: (i) SARS-CoV-2, (ii) DENV, and (iii) a respiratory panel of important pathogens for Brazilian public health, including SARS-CoV-2, respiratory syncytial virus (RSV), Influenza A and B. These diagnostic test results were consolidated weekly. For disease case counts from SIVEP-SRAG and InfoDengue public repositories, we used the number of positive tests. 
For infection rates, we calculated the positivity rate by dividing the number of positive tests by the total number of tests and multiplying by 100 to obtain the percentage.\u003c/p\u003e\u003cp\u003eDynamic time warping from the dtaidistance library [v.2.3.12] \u003csup\u003e\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e was applied to assess the temporal alignment between the occupancy data and the laboratory-based surveillance data. By default, DTW returns a distance of zero when comparing a time series with itself, and increases as temporal misalignment or shape differences grow. This makes it suitable for capturing non-linear shifts and local distortions in timing between epidemic indicators. To ensure comparability of temporal patterns rather than absolute magnitudes, each time series was normalized independently by linearly rescaling its minimum and maximum values to a 0–1 range using scikit-learn \u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e. Given the multiple pathogen time series analyzed, we adjusted the occupancy wave periods according to the observed temporal lags in pathogen circulation to ensure evaluation of the complete trend period. The partial overlaps introduced by this adjustment were identified using DTW, which aligns temporal patterns across datasets and highlights phase shifts between signals.\u003c/p\u003e\u003cp\u003eTo evaluate whether temporal changes could consistently explain the relationship between emergency care unit occupancy and seasonal variations in pathogen circulation, we applied the Granger causality test. The Augmented Dickey-Fuller (ADF) test was used to evaluate stationarity, and first-order differencing was applied when necessary. We analyzed three distinct temporal waves defined based on epidemiological weeks. 
For each wave, we tested the causality between occupancy percentage changes and pathogen-specific indicators using the statsmodels [v.0.14.4] \u003csup\u003e27\u003c/sup\u003e implementation of the Granger causality test, with lags tested up to five weeks due to data availability constraints on each wave. All statistical estimates, such as lag-specific statistics and p-values, were exported for further interpretation.\u003c/p\u003e\u003ch2\u003eZ-based epidemic volatility index\u003c/h2\u003e\u003cp\u003eNext, to monitor fluctuations in emergency care unit occupancy, we adapted the epidemic volatility index \u003csup\u003e\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e, an early warning algorithm designed to detect emerging epidemic waves by assessing relative changes in the standard deviation of case counts over time. In our adaptation, hourly occupancy data were aggregated into daily averages. For each facility, we computed a 42-day moving average and standard deviation, using a window length set one week longer than the maximum lag tested in the Granger causality analyses to ensure short-term trends were represented. Z-scores (the number of standard deviations from the mean) were calculated by subtracting the moving average from the daily occupancy percentage and dividing by the corresponding standard deviation. These daily z-scores were then aggregated into weekly summaries aligned with epidemiological weeks. To classify the intensity of occupancy fluctuations, we applied predefined thresholds to the weekly z-scores. Values below 0 were labeled as low volatility (green), values between 0 and 0.65 as moderate volatility (yellow), and values above 0.65 as high volatility (red). We selected this threshold for high volatility as it represents a moderate deviation, approximately two-thirds of a standard deviation, from the moving average over a 42-day (six-week) window. 
Under a normal distribution, about 74% of observations fall below 0.65 standard deviations above the mean, making this threshold sensitive enough to detect atypical increases in occupancy while limiting false positives from routine variability. This choice also accounts for the limited range of percentage data, which reduces variability compared to absolute case counts.\u003c/p\u003e\u003ch2\u003eGeospatial statistics information\u003c/h2\u003e\u003cp\u003eNext, to evaluate spatial relationships between healthcare units, we computed pairwise distances using the Haversine formula to generate a geographic distance matrix (in kilometers) between each pair of units (Supp. Figure\u0026nbsp;2). This formula calculates great-circle distances based on latitude and longitude coordinates while accounting for the Earth’s curvature. To identify localized patterns of weekly occupancy, we applied the Local Indicators of Spatial Association (LISA) test to average occupancy rates vs geographical distance. The LISA method detects spatial clusters of similar or dissimilar values by comparing each unit’s occupancy value with those of its nearest neighbors. For this, we used a spatial weights matrix based on the four nearest neighbors (k = 4), and units with a p-value \u0026lt; 0.05 were considered significant. Geospatial maps and metrics were visualized using GeoPandas [v.1.0.1] and Matplotlib [v.3.10.0].\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eAcknowledgements\u003c/p\u003e\n\u003cp\u003eWe acknowledge Google Maps\u0026trade; for providing aggregate occupancy data essential to this study. We are grateful to the Brazilian Ministry of Health for public access to SIVEP-SRAG and InfoDengue surveillance platforms. The findings and conclusions are solely those of the authors and do not necessarily reflect the official position of the funding institution.\u003c/p\u003e\n\u003cp\u003eAuthorship information\u003c/p\u003e\n\u003cp\u003eJ.D.A., J.C.S.S., and V.S.S. 
contributed to the conceptualization and methodology of the study. E.R.S., J.D.A., and J.C.S.S. were responsible for software development and data curation. Validation was performed by J.D.A., J.C.S.S., M.A.S.B., E.C.S., H.I.N., C.S.L., J.R.R.P., G.O.P. and V.S.S. Formal analysis and investigation were carried out by J.D.A., J.C.S.S. and V.S.S. A.F.B. and M.S. provided resources. Data visualization was conducted by I.N.S., J.D.A., J.C.S.S., and A.F.B. The original draft of the manuscript was written by J.D.A., J.C.S.S., and V.S.S., and review and editing were performed by J.D.A., J.C.S.S., M.A.S.B., C.S.L., J.R.R.P., A.F.B., and V.S.S. Supervision was provided by J.D.A., J.C.S.S., E.C.S., H.I.N., G.O.P., J.K., M.S., A.F.B., and V.S. Project administration was conducted by J.D.A., J.C.S.S., G.O.P., J.K., M.S., and V.S. Funding acquisition was carried out by J.D.A., J.C.S.S., G.O.P., J.K., M.S. and V.S.S.\u003c/p\u003e\n\u003cp\u003eData and code availability\u003c/p\u003e\n\u003cp\u003eThe consolidated datasets and the complete Python codes to perform the analyses of this work are available at https://github.com/InstitutoTodosPelaSaude/paper_occupancy_detecta.\u003c/p\u003e\n\u003cp\u003eDeclaration of Interests\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eMassuda, A., Hone, T., Leles, F. A. G., de Castro, M. C. \u0026amp; Atun, R. The Brazilian health system at crossroads: progress, crisis and resilience. \u003cem\u003eBMJ Glob Health\u003c/em\u003e\u003cstrong\u003e3\u003c/strong\u003e, e000829 (2018).\u003c/li\u003e\n\u003cli\u003eCampillo-Funollet, E. \u003cem\u003eet al.\u003c/em\u003e Predicting and forecasting the impact of local outbreaks of COVID-19: use of SEIR-D quantitative epidemiological modelling for healthcare demand and capacity. \u003cem\u003eInt. J. 
Epidemiol.\u003c/em\u003e\u003cstrong\u003e50\u003c/strong\u003e, 1103\u0026ndash;1113 (2021).\u003c/li\u003e\n\u003cli\u003eSartini, M. \u003cem\u003eet al.\u003c/em\u003e Overcrowding in Emergency Department: Causes, Consequences, and Solutions-A Narrative Review. \u003cem\u003eHealthcare (Basel)\u003c/em\u003e\u003cstrong\u003e10\u003c/strong\u003e, (2022).\u003c/li\u003e\n\u003cli\u003eAlho, A. M., Oliveira, A. P., Viegas, S. \u0026amp; Nogueira, P. Effect of heatwaves on daily hospital admissions in Portugal, 2000-18: an observational study. \u003cem\u003eLancet Planet. Health\u003c/em\u003e\u003cstrong\u003e8\u003c/strong\u003e, e318\u0026ndash;e326 (2024).\u003c/li\u003e\n\u003cli\u003eRequia, W. J., Amini, H., Mukherjee, R., Gold, D. R. \u0026amp; Schwartz, J. D. Health impacts of wildfire-related air pollution in Brazil: a nationwide study of more than 2 million hospital admissions between 2008 and 2018. \u003cem\u003eNat. Commun.\u003c/em\u003e\u003cstrong\u003e12\u003c/strong\u003e, 6555 (2021).\u003c/li\u003e\n\u003cli\u003eAggarwal, S., Hu, J. K., Sullivan, J. A., Parks, R. M. \u0026amp; Nethery, R. C. Severe flooding and cause-specific hospitalisation among older adults in the USA: a retrospective matched cohort analysis. \u003cem\u003eThe Lancet Planetary Health\u003c/em\u003e\u003cstrong\u003e9\u003c/strong\u003e, 101268 (2025).\u003c/li\u003e\n\u003cli\u003eMarzano, L. \u003cem\u003eet al.\u003c/em\u003e Diagnosing an overcrowded emergency department from its Electronic Health Records. \u003cem\u003eSci. Rep.\u003c/em\u003e\u003cstrong\u003e14\u003c/strong\u003e, 9955 (2024).\u003c/li\u003e\n\u003cli\u003ePellis, L. \u003cem\u003eet al.\u003c/em\u003e Challenges in control of COVID-19: short doubling time and long delay to effect of interventions. \u003cem\u003ePhilos. Trans. R. Soc. Lond. B Biol. Sci.\u003c/em\u003e\u003cstrong\u003e376\u003c/strong\u003e, 20200264 (2021).\u003c/li\u003e\n\u003cli\u003eKhanizadeh, F. 
\u003cem\u003eet al.\u003c/em\u003e Smart data-driven medical decisions through collective and individual anomaly detection in healthcare time series. \u003cem\u003eInt. J. Med. Inform.\u003c/em\u003e\u003cstrong\u003e194\u003c/strong\u003e, 105696 (2025).\u003c/li\u003e\n\u003cli\u003eSwaan, C., van den Broek, A., Kretzschmar, M. \u0026amp; Richardus, J. H. Timeliness of notification systems for infectious diseases: A systematic literature review. \u003cem\u003ePLoS ONE\u003c/em\u003e\u003cstrong\u003e13\u003c/strong\u003e, e0198845 (2018).\u003c/li\u003e\n\u003cli\u003eCoelho Neto, G. C. \u0026amp; Chioro, A. [After all, how many nationwide Health Information Systems are there in Brazil?]. \u003cem\u003eCad. Saude Publica\u003c/em\u003e\u003cstrong\u003e37\u003c/strong\u003e, e00182119 (2021).\u003c/li\u003e\n\u003cli\u003eAbat, C., Chaudet, H., Rolain, J.-M., Colson, P. \u0026amp; Raoult, D. Traditional and syndromic surveillance of infectious diseases and pathogens. \u003cem\u003eInt. J. Infect. Dis.\u003c/em\u003e\u003cstrong\u003e48\u003c/strong\u003e, 22\u0026ndash;28 (2016).\u003c/li\u003e\n\u003cli\u003eMurray, J. \u0026amp; Cohen, A. L. Infectious Disease Surveillance. in \u003cem\u003eInternational encyclopedia of public health\u003c/em\u003e 222\u0026ndash;229 (Elsevier, 2017). doi:10.1016/B978-0-12-803678-5.00517-8.\u003c/li\u003e\n\u003cli\u003eMacIntyre, C. R. \u003cem\u003eet al.\u003c/em\u003e Artificial intelligence in public health: the potential of epidemic early warning systems. \u003cem\u003eJ. Int. Med. Res.\u003c/em\u003e\u003cstrong\u003e51\u003c/strong\u003e, 3000605231159335 (2023).\u003c/li\u003e\n\u003cli\u003eBastos, L. S. \u003cem\u003eet al.\u003c/em\u003e A modelling approach for correcting reporting delays in disease surveillance data. \u003cem\u003eStat. Med.\u003c/em\u003e\u003cstrong\u003e38\u003c/strong\u003e, 4363\u0026ndash;4377 (2019).\u003c/li\u003e\n\u003cli\u003eRuan, W., Liang, Y., Sun, Z. \u0026amp; An, X. 
Climate warming and influenza dynamics: the modulating effects of seasonal temperature increases on epidemic patterns. \u003cem\u003enpj Climate and Atmospheric Science\u003c/em\u003e (2025).\u003c/li\u003e\n\u003cli\u003eChen, Z. \u003cem\u003eet al.\u003c/em\u003e COVID-19 pandemic interventions reshaped the global dispersal of seasonal influenza viruses. \u003cem\u003eScience\u003c/em\u003e\u003cstrong\u003e386\u003c/strong\u003e, eadq3003 (2024).\u003c/li\u003e\n\u003cli\u003eBrasil. Minist\u0026eacute;rio da Sa\u0026uacute;de. Demandas de pesquisas para apoio \u0026agrave; gest\u0026atilde;o da Secretaria de Vigil\u0026acirc;ncia em Sa\u0026uacute;de e Ambiente [recurso eletr\u0026ocirc;nico]. \u003cem\u003eMinist\u0026eacute;rio da Sa\u0026uacute;de, Secretaria de Vigil\u0026acirc;ncia em Sa\u0026uacute;de e Ambiente. Departamento de A\u0026ccedil;\u0026otilde;es Estrat\u0026eacute;gicas de Epidemiologia e Vigil\u0026acirc;ncia em Sa\u0026uacute;de e Ambiente.\u003c/em\u003e (2025).\u003c/li\u003e\n\u003cli\u003eSINAN-Dengue. SINANWEB - Dengue. https://portalsinan.saude.gov.br/dengue (2025).\u003c/li\u003e\n\u003cli\u003eNicolelis, M. A. L., Raimundo, R. L. G., Peixoto, P. S. \u0026amp; Andreazzi, C. S. The impact of super-spreader cities, highways, and intensive care availability in the early stages of the COVID-19 epidemic in Brazil. \u003cem\u003eSci. Rep.\u003c/em\u003e\u003cstrong\u003e11\u003c/strong\u003e, 13001 (2021).\u003c/li\u003e\n\u003cli\u003eCousins, H. C., Cousins, C. C., Harris, A. \u0026amp; Pasquale, L. R. Regional Infoveillance of COVID-19 Case Rates: Analysis of Search-Engine Query Patterns. \u003cem\u003eJ. Med. Internet Res.\u003c/em\u003e\u003cstrong\u003e22\u003c/strong\u003e, e19483 (2020).\u003c/li\u003e\n\u003cli\u003eFortaleza, C. M. C. B. \u003cem\u003eet al.\u003c/em\u003e The use of health geography modeling to understand early dispersion of COVID-19 in S\u0026atilde;o Paulo, Brazil. 
\u003cem\u003ePLoS ONE\u003c/em\u003e\u003cstrong\u003e16\u003c/strong\u003e, e0245051 (2021).\u003c/li\u003e\n\u003cli\u003eITpS. Instituto Todos pela Sa\u0026uacute;de. (2025).\u003c/li\u003e\n\u003cli\u003eCodeco, C. \u003cem\u003eet al.\u003c/em\u003e Infodengue: A nowcasting system for the surveillance of arboviruses in Brazil. \u003cem\u003eRevue d\u0026rsquo;\u0026Eacute;pid\u0026eacute;miologie et de Sant\u0026eacute; Publique\u003c/em\u003e\u003cstrong\u003e66\u003c/strong\u003e, S386 (2018).\u003c/li\u003e\n\u003cli\u003eMeert, W. \u003cem\u003eet al.\u003c/em\u003e DTAIDistance. \u003cem\u003eZenodo\u003c/em\u003e (2020) doi:10.5281/zenodo.7158824.\u003c/li\u003e\n\u003cli\u003ePedregosa, F., Varoquaux, G., Gramfort, A., Michel, V. \u0026amp; Thirion, B. Scikit-learn: Machine Learning in Python. \u003cem\u003eThe Journal of Machine Learning Research\u003c/em\u003e (2011).\u003c/li\u003e\n\u003cli\u003eJohansen, S. \u003cem\u003eLikelihood-Based Inference in Cointegrated Vector Autoregressive Models\u003c/em\u003e. (Oxford University Press, 1995). doi:10.1093/0198774508.001.0001.\u003c/li\u003e\n\u003cli\u003eKostoulas, P. \u003cem\u003eet al.\u003c/em\u003e The epidemic volatility index, a novel early warning tool for identifying new waves in an epidemic. \u003cem\u003eSci. 
Rep.\u003c/em\u003e\u003cstrong\u003e11\u003c/strong\u003e, 23775 (2021).\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Table","content":"\u003cp\u003eTable 1 is available in the Supplementary Files section\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"nature-portfolio","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Nature Portfolio","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"ejp","reportingPortfolio":"","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"Real-time occupancy data, early warning, surveillance, z-based epidemic volatility index, health monitoring","lastPublishedDoi":"10.21203/rs.3.rs-7754752/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7754752/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eEarly detection of disease outbreaks is critical for effective public health response, yet traditional surveillance systems often suffer from delayed reporting. Here, we investigate whether real-time occupancy data from healthcare facilities can act as an early warning indicator of possible outbreak activity. We analyzed occupancy trends from 17 emergency care units in the S\u0026atilde;o Paulo metropolitan area and compared them with national surveillance data for infectious diseases, including SARS-CoV-2 and dengue virus. Dynamic time warping and Granger causality tests demonstrated that occupancy patterns anticipate infection dynamics with a mean lead time of three weeks. Early warning signals of three epidemiological events were identified as deviations from average occupancy. Local indicators of spatial association revealed persistent overcrowding hotspots in later outbreak stages, highlighting regions where sustained healthcare monitoring and surveillance remain necessary. 
These findings demonstrate the potential of privacy-safe passive occupancy data to support timely epidemic surveillance.\u003c/p\u003e","manuscriptTitle":"Crowd signals: Early detection of disease outbreaks using real-time healthcare occupancy data","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-10-31 10:07:38","doi":"10.21203/rs.3.rs-7754752/v1","editorialEvents":[],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"communications-medicine","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"commsmed","sideBox":"Learn more about [Communications Medicine](http://www.nature.com/commsmed)","snPcode":"43856","submissionUrl":"https://mts-commsmed.nature.com/cgi-bin/main.plex","title":"Communications Medicine","twitterHandle":"@commsmedicine","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Communications Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"56c19ae6-fbe4-412a-93d0-ac47a8e1db2d","owner":[],"postedDate":"October 31st, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":55662400,"name":"Health sciences/Diseases/Infectious diseases"},{"id":55662401,"name":"Health sciences/Risk factors"}],"tags":[],"updatedAt":"2025-10-31T10:07:38+00:00","versionOfRecord":[],"versionCreatedAt":"2025-10-31 10:07:38","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-7754752","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7754752","identity":"rs-7754752","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
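The z-based epidemic volatility index described in the Methods (42-day moving window, weekly aggregation, thresholds at 0 and 0.65) can be illustrated with a minimal sketch. This is not the authors' code, which is available in their repository; it is an illustration under stated assumptions. The function names are hypothetical, the window is assumed to be trailing and to include the current day, the sample standard deviation is assumed, weeks are approximated as consecutive 7-day blocks rather than true epidemiological weeks, and weekly aggregation is assumed to be a mean.

```python
from statistics import mean, stdev

WINDOW_DAYS = 42       # six-week moving window, as stated in the Methods
HIGH_THRESHOLD = 0.65  # weekly z-score above this is "high", as stated in the Methods


def rolling_z_scores(daily_occupancy, window=WINDOW_DAYS):
    """Return one z-score per day, or None until a full window is available.

    Assumption: trailing window that includes the current day.
    """
    z_scores = []
    for i in range(len(daily_occupancy)):
        if i + 1 < window:
            z_scores.append(None)
            continue
        values = daily_occupancy[i + 1 - window : i + 1]
        mu = mean(values)
        sigma = stdev(values)  # sample standard deviation (assumption)
        # A flat window has no spread; treat the deviation as zero.
        z_scores.append(0.0 if sigma == 0 else (daily_occupancy[i] - mu) / sigma)
    return z_scores


def weekly_means(z_scores, days_per_week=7):
    """Average daily z-scores over consecutive 7-day blocks, skipping None.

    Assumption: 7-day blocks stand in for epidemiological weeks.
    """
    weeks = []
    for start in range(0, len(z_scores), days_per_week):
        block = [z for z in z_scores[start : start + days_per_week] if z is not None]
        weeks.append(mean(block) if block else None)
    return weeks


def classify(weekly_z):
    """Map a weekly z-score to the three volatility classes from the Methods."""
    if weekly_z is None:
        return "no data"
    if weekly_z < 0:
        return "low"       # green
    if weekly_z <= HIGH_THRESHOLD:
        return "moderate"  # yellow
    return "high"          # red
```

For example, a synthetic series of seven stable weeks followed by one week at 70% occupancy, passed through `rolling_z_scores`, `weekly_means`, and `classify`, labels the final week as high volatility. Weeks that fall entirely before the first full 42-day window carry no label, which is a consequence of the trailing-window assumption.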