Projected typical allergic diseases prevalence under changing environments based on multiple machine learning models
Research Article (Research Square preprint)
Fengxia Hu, Yizhou Li, Xiaoyu Zhang, Qian Wang, Jin Zhang, Junqin Liang
This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-6938034/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 11 Feb, 2026 in International Journal of Biometeorology.
Abstract
Timely understanding the prevalence of allergic skin diseases (ASD) and allergic nasopharyngeal disease (AND) is essential for effective public health planning and resource allocation. However, accurately predicting ASD and AND poses a significant challenge due to the complex interplay of environmental and individual factors.
A machine learning-based scheme was proposed for predicting the prevalence of ASD and AND using environmental and hydrological data (n = 85). Predictive accuracy varied significantly across algorithms. For ASD, decision tree regression (DTR) demonstrated the best performance; for AND, the ridge regression (RR) model yielded the best results. Based on Urumqi's 2022 population, the projected peak number of individuals with ASD is expected to rise by 215,000, 243,200, and 275,600 under the three climate change scenarios, respectively, compared to January 2015. For AND, the projected peak increases are expected to be 38,900, 35,700, and 56,300, respectively. Environmental factors exhibit significant correlations with the prevalence of ASD and AND, with minimum temperature identified as the most influential factor affecting both conditions. Machine learning models that incorporate these environmental variables were shown to predict the prevalence of both conditions effectively. Based on the models' projections under three climate change scenarios, a significant increase in the prevalence of ASD and AND in Urumqi is expected from 2015 to 2099. This trend underscores the potential impact of climate change on public health in the region, highlighting the need for proactive measures to address these emerging challenges.
Keywords: allergic diseases, changing environments, machine learning, Shared Socioeconomic Pathways (SSPs), geographical detector
1. Introduction
In recent years, allergic diseases have emerged as a significant and escalating public health concern worldwide (Conway et al., 2024; Wang et al., 2023; Xu and Li, 2024). As two of the most prevalent allergic diseases, allergic skin diseases (ASD) and allergic nasopharyngeal disease (AND) can significantly impact quality of life (Huang et al., 2023; Lack, 2001; Tsai et al., 2022).
Previous studies indicate that climate change can exacerbate ASD and AND by influencing the potency and distribution of environmental allergens(Grant et al., 2023 ; C.-F. Huang et al., 2021 ). For instance, warmer temperatures and higher carbon dioxide levels can enhance pollen production and extend the growing seasons of allergenic plants(Cheng et al., 2023 ; Lam et al., 2024 ). Additionally, humidity levels and precipitation patterns can impact the growth of allergenic fungi and the dispersion of dust mites, which aggravates hypersensitivity reactions and chronic inflammation in individuals with these conditions(Weikl et al., 2015 ; Ziaee et al., 2018 ). As global climate change continues to accelerate, the rising prevalence of allergic diseases is inevitably having an increasing impact on both individual well-being and healthcare systems. Therefore, accurately predicting the prevalence of ASD and AND are crucial for effective public health planning and resource allocation, as they enable targeted interventions and preventive measures. Traditional methods for assessing the incidence of ASD and AND typically include epidemiological surveys, population-based studies, and analysis of healthcare utilization data(Schafer and Ring, 1997 ; Svensson et al., 2018 ; Weller et al., 2022 ). These approaches often rely on self-reported symptoms, medical records, and clinic-based assessments to estimate prevalence rates. However, these approaches present several limitations, including potential underreporting or misdiagnosis due to subjective reporting and variability in diagnostic practices. Given the complex relationship between climate variables and allergic disease prevalence, utilizing climate data to develop data-driven models offers a valuable opportunity for estimating and forecasting the prevalence of ASD and AND(Martinez et al., 2022 ; Tang et al., 2020 ). 
Currently, the data-driven methods can be categorized into empirical methods and machine learning (ML) techniques. Commonly used empirical methods for predicting allergic disease incidence, such as linear regression and time series analysis, provide clear quantification of relationships and straightforward implementation(de Marco et al., 2002 ; Hu et al., 2022 ). However, these methods may oversimplify complex interactions and fail to account for non-linear or dynamic effects of climate variables on disease prevalence. Due to their capability to identify complex patterns and extract relevant features, machine learning methods can effectively overcome the limitations of the empirical models. Prosperi et al.(Prosperi et al., 2014 ) evaluated five machine learning models using clinical, demographic, laboratory, genetic, and environmental data to predict allergic phenotypes. The results indicated that the random forest model has the best performance among all the models. Huang et al.(Y. Huang et al., 2021 ) assessed the predictive capability of various machine learning methods for childhood atopic dermatitis and allergic rhinitis using longitudinal data from 1,439 mother-infant pairs, finding that tree-based models had the highest sensitivity and specificity regarding air pollution exposure. With advancements in satellite technology, remote sensing provides a rich dataset for machine learning-based research on allergic diseases. For instance, Lin et al.(Lin et al., 2022 ) utilized satellite remote sensing data to extract urban green space area and investigated its relationship with allergic diseases in a study of 522 two-year-old children in Guangzhou, China. Tang et al.(Tang et al., 2022 ) utilized remote sensing to obtain urban nighttime light and meteorological data to investigate the effects of artificial light at night (ALAN) and air pollutants on the incidence of allergic diseases among college students. 
Previous research suggests that combining multi-source remote sensing data with comprehensive medical records, along with machine learning techniques, has the potential to clarify the complex relationships between air pollution levels, climate variables, and the incidence of allergic diseases(Shamji et al., 2023 ; Sobieraj et al., 2024 ). This advanced analytical capability offers the potential for more precise forecasting of allergic disease events, enabling the development of targeted public health interventions and preventive measures. In this study, we intended to develop and validate machine learning models for the prediction of the prevalence of the allergic disease. More specifically, we first analyzed the relationship between climate factors and the occurrence of allergic disease events to identify the optimal input features. Subsequently, five machine learning models were developed and trained using records of allergic diseases in Urumqi and pre-selected features. Finally, the optimal models for ASD and AND were employed to predict the prevalence of allergic diseases under three future climate change scenarios. By analyzing historical climate data alongside epidemiological trends, we aim to elucidate patterns that could enhance our understanding of how environmental factors influence allergic disease dynamics. Overall, this study is anticipated to significantly enhance the monitoring capabilities for allergic disease outbreaks, providing valuable technical support for the formulation of preventive measures and management strategies by government agencies. 2. Study site Urumqi is located in the Xinjiang Uygur Autonomous Region and covers 13,800 km², with a population of 4.08 million(Chen et al., 2024 ). Situated in the heart of the Eurasian continent, Urumqi is subject to a continental arid climate that is marked by pronounced seasonal extremes. 
The annual mean temperature in Urumqi is 6.7°C, with perennial average rainfall of 280 mm(Sidikjan et al., 2022 ). These climatic conditions create a dry environment with low humidity, which can affect the dispersion and concentration of airborne allergens, such as pollen and dust mites. The combination of limited precipitation and high evaporation rates contributes to a heightened presence of particulate matter in the air, which can exacerbate allergic responses(Shen et al., 2011 ). The eastern part of the Urumqi is characterized by mountainous and hilly terrain, while the western region comprises expansive plains. This topographic diversity further influences the local microclimates and allergen concentrations, providing a rich context for investigating how environmental factors interplay with allergic disease prevalence and severity(Wang et al., 2016 ). 3. Materials and methods This study developed a predictive framework for allergic diseases based on multi-source remote sensing data and various machine learning models. Initially, environmental, hydrological, and meteorological factors were considered, and twelve influencing factors were selected through information gain (IG) and correlation analysis. Subsequently, five representative machine learning models were chosen, and model parameters were tuned using k-fold cross-validation and grid search algorithms. The overall performance of these models was evaluated using Mean Absolute Error (MAE), Mean Squared Error (MSE), and correlation coefficient (r) to select the optimal predictive model for allergic diseases. Using CMIP6 data, twelve influencing factors under three SSP scenarios were extracted and input into the optimal model to forecast the prevalence trends of allergic diseases under future climate change condition. Finally, the Geodetector method was employed to elucidate the impact mechanisms of various influencing factors on allergic disease prevalence. 
The technical flow chart for this study is shown in Fig. 1.
3.1 Dataset collection
3.1.1 Allergic diseases data collection
We first collected monthly incidence data for ASD and AND in Urumqi from 2014 to 2023. Then, the annual population data of Urumqi between 2014 and 2023 were downloaded from the website of the National Bureau of Statistics (https://www.stats.gov.cn/). Finally, the monthly prevalence rates of ASD and AND were obtained by dividing the monthly incidence figures by the total population for each corresponding year.
3.1.2 Environmental monitoring data collection
The environmental factors include aerosol optical depth (AOD), ozone (O₃), and the ultraviolet index (UVI). AOD reflects the concentration of airborne particulates, which can trigger inflammatory responses in the skin and nasal passages. O₃ is a potent respiratory irritant that can worsen nasal inflammation and contribute to allergic rhinitis (Kim et al., 2011). High UVI levels can damage skin cells and alter immune responses, potentially leading to flare-ups in allergic skin conditions and affecting the severity of allergic rhinitis. Thus, we selected these three factors to develop a predictive model for the prevalence of allergic diseases. The AOD data were extracted from the MOD08_M3 (version 6.1) product in Google Earth Engine (GEE). The O₃ data were sourced from the surface O₃ dataset within the China High Air Pollutants (CHAP) datasets. The UVI data were acquired from the Clouds and the Earth's Radiant Energy System (CERES) products (https://asdc.larc.nasa.gov/project/CERES).
The hydrological and climate data, including monthly total runoff (\(Q_{sum}\)), monthly maximum runoff (\(Q_{max}\)), daily minimum temperature (\(T_{min}\)), daily maximum temperature (\(T_{max}\)), daily mean temperature (\(T_{mean}\)), daily maximum precipitation (\(P_{max}\)), daily minimum precipitation (\(P_{min}\)), eastward component of the 10 m wind (\(W_u\)), and northward component of the 10 m wind (\(W_v\)), were acquired from the land component of the fifth-generation ECMWF atmospheric reanalysis (ERA5-Land) of the global climate datasets (Hersbach et al., 2020; Munoz-Sabater et al., 2021).
3.2 Output of CMIP6 projection
To address the challenges posed by global climate change, the World Climate Research Programme (WCRP) initiated a new phase of the Coupled Model Intercomparison Project (CMIP6) (Meehl et al., 2020). This project provides data support for achieving the scientific goals established by the WCRP's Grand Challenges program (Li et al., 2023; Lovato et al., 2022). To predict future trends of the two allergic diseases in the study area, we selected three greenhouse gas emission scenarios from CMIP6: a low emission scenario (SSP126), a medium emission scenario (SSP370), and a high emission scenario (SSP585). The Shared Socioeconomic Pathways (SSPs) in CMIP6 integrate Representative Concentration Pathways (RCPs) and consider the impact of socioeconomic development, providing more reliable potential outcomes of future climate change (Gidden et al., 2019).
3.3 Piecewise regression model
The piecewise regression model (PRM) is used to capture varying relationships between predictors and responses by dividing the predictor space into segments, each with its own linear relationship (Toms and Lesperance, 2003). The process involves identifying breakpoints where the relationship changes, defining separate linear models for each segment, and estimating the model parameters.
A PRM with \(k\) segments can be specified as:
$$Y_i=\begin{cases}\beta_{0,1}+\beta_{1,1}X_i+\epsilon_i, & X_i\le c_1\\ \beta_{0,2}+\beta_{1,2}X_i+\epsilon_i, & c_1<X_i\le c_2\\ \quad\vdots & \\ \beta_{0,k}+\beta_{1,k}X_i+\epsilon_i, & X_i>c_{k-1}\end{cases}\tag{1}$$
where \(c_1,c_2,\dots,c_{k-1}\) are the breakpoints, \(\beta_{0,j}\) and \(\beta_{1,j}\) are the segment-specific coefficients, and \(\epsilon_i\) is the error term. The objective of a PRM is to minimize the sum of squared residuals across all segments, which can be formulated as follows:
$$\text{Objective}_{PRM}=\min\sum_{i=1}^{n}\left(Y_i-\widehat{Y}_i\right)^2\tag{2}$$
where \(Y_i\) is the observed value and \(\widehat{Y}_i\) is the value predicted by the PRM.
3.4 Information Gain
The Information Gain (IG) method derives from information theory, which was introduced by Claude Shannon (Shannon, 1948). By measuring the reduction in uncertainty (entropy) achieved by introducing a particular feature, IG can identify the most effective features for splitting data and thereby enhance model performance. For a machine learning model, the initial entropy, which represents the uncertainty in the target variable before any feature is considered, can be calculated as follows:
$$H(D)=-\sum_{i}p_i\log_2 p_i\tag{3}$$
where \(H(D)\) is the initial entropy and \(p_i\) is the probability of the i-th outcome in the target variable.
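As a rough illustration of Eq. (3), the following Python sketch computes the Shannon entropy of a discretized target. The source does not state how the continuous prevalence values were discretized, so the equal-width binning, the function names, and the sample values here are all assumptions made for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """H(D) = -sum_i p_i * log2(p_i), with p_i estimated from label frequencies."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def equal_width_bins(values, n_bins):
    """Map continuous values to bin indices 0..n_bins-1 (an assumed discretization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall into bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical monthly prevalence values (%), not taken from the study.
prevalence = [0.035, 0.041, 0.048, 0.052, 0.060, 0.057, 0.044, 0.038]
bins = equal_width_bins(prevalence, n_bins=4)
h = shannon_entropy(bins)
```

With four bins the entropy is bounded above by log2(4) = 2 bits; a finer discretization raises that bound, so entropy and information gain values are only comparable when the same binning is used throughout.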
The conditional entropy \(H(D|X)\), which represents the remaining uncertainty in the target variable when a feature \(X\) is known, can be calculated as follows:
$$H(D|X)=\sum_{v\in Values(X)}\frac{|D_v|}{|D|}H(D_v)\tag{4}$$
where \(D_v\) is the subset of data in which feature \(X\) takes the value \(v\), and \(|D_v|\) denotes the size of this subset. Thus, the information gain \(IG(D,X)\) can be formulated as follows:
$$IG(D,X)=H(D)-H(D|X)\tag{5}$$
A higher information gain indicates that the feature \(X\) provides a greater reduction in uncertainty, making it more valuable for improving model performance.
3.5 Machine Learning models
We compared the performance of five commonly used machine learning models, namely Decision Tree Regression (DTR), Ridge Regression (RR), Random Forest Regression (RFR), Elastic Net Regression (ENR), and Support Vector Regression (SVR), to select the optimal model for predicting the prevalence of ASD and AND. DTR is a non-linear model that uses a tree-like structure to make decisions based on feature values (Rathore and Kumar, 2016). RR is a linear regression technique that adds a penalty term to the loss function, helping to mitigate multicollinearity and reduce overfitting by shrinking coefficient estimates (Hoerl and Kennard, 1970). RFR is an ensemble learning method that builds multiple decision trees and averages their predictions, enhancing accuracy and robustness while reducing the risk of overfitting (Rodriguez-Galiano et al., 2015). ENR combines the penalties of both Ridge and Lasso regression, allowing it to handle high-dimensional datasets effectively while performing variable selection and regularization (Hans, 2011).
SVR is a regression technique that uses support vector machines to find a hyperplane that best fits the data and can capture complex relationships through kernel functions (Awad et al., 2015).
3.6 Performance evaluation metrics
MAE, MSE, and \(r\) were chosen as evaluation metrics in this study to comprehensively assess model performance on both the training and validation sets. The formulas for these three metrics are as follows:
$$MSE=\frac{1}{m}\sum_{i=1}^{m}\left(y_i-\widehat{y}_i\right)^2\tag{6}$$
$$MAE=\frac{1}{m}\sum_{i=1}^{m}\left|y_i-\widehat{y}_i\right|\tag{7}$$
$$r=\frac{\sum_{i=1}^{m}\left(y_i-\bar{y}\right)\left(\widehat{y}_i-\bar{\widehat{y}}\right)}{\sqrt{\sum_{i=1}^{m}\left(y_i-\bar{y}\right)^2\sum_{i=1}^{m}\left(\widehat{y}_i-\bar{\widehat{y}}\right)^2}}\tag{8}$$
where \(y_i\) denotes the true value of the i-th sample, \(\widehat{y}_i\) denotes the predicted value of the i-th sample, \(\bar{y}\) and \(\bar{\widehat{y}}\) are the means of the true and predicted values, and \(m\) is the total number of samples.
3.7 Generalized Linear Model
The Generalized Linear Model (GLM) is a versatile framework for regression analysis that extends the traditional linear model to accommodate a wide range of response variable types (Nelder et al., 1972). The GLM framework consists of three main components: a random component, a systematic component, and a link function. The random component specifies the probability distribution of the response variable, and the systematic component represents the linear predictor. The link function relates the mean of the response variable \(\mu\) to the linear predictor \(\eta\) and can be formulated as follows:
$$g(\mu)=\eta=X\beta\tag{9}$$
where \(X\) is the matrix of predictors and \(\beta\) is the vector of coefficients.
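For reference, the three evaluation metrics in Eqs. (6)-(8) can be sketched in plain Python as below. This is an illustrative sketch, not the code used in the study; the function names and the sample values are assumptions.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error, Eq. (6)."""
    m = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / m


def mae(y_true, y_pred):
    """Mean absolute error, Eq. (7)."""
    m = len(y_true)
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / m


def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values, Eq. (8)."""
    m = len(y_true)
    mean_y = sum(y_true) / m
    mean_p = sum(y_pred) / m
    cov = sum((y - mean_y) * (p - mean_p) for y, p in zip(y_true, y_pred))
    ss_y = sum((y - mean_y) ** 2 for y in y_true)
    ss_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(ss_y * ss_p)


# Hypothetical observed and predicted values, for illustration only.
observed = [1.0, 2.0, 3.0]
predicted = [2.0, 3.0, 4.0]
scores = (mse(observed, predicted), mae(observed, predicted), pearson_r(observed, predicted))
```

In this toy case every prediction is off by exactly one unit, so MSE and MAE are both 1 while r is 1: the correlation coefficient measures co-variation only and is blind to a constant bias, which is one reason to report it alongside the error metrics.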
For a sample of \(n\) observations, the log-likelihood can be formulated as follows:
$$\ell(\beta)=\sum_{i=1}^{n}\log f\left(y_i;\mu_i,\varphi\right)\tag{10}$$
where \(f\) is the probability function for the response variable, \(\mu_i\) is the mean for the i-th observation, and \(\varphi\) is a dispersion parameter.
3.8 Geodetector
Geodetector is a statistical technique used to analyze and quantify the spatial distribution of explanatory variables and their influence on a response variable (Zhu et al., 2020). The Geodetector method encompasses four types of detectors: Factor Detector, Interaction Detector, Risk Detector, and Ecological Detector. Among these, the Factor Detector is used to quantify the impact of different environmental indicators on the prevalence of allergic diseases, which can be formulated as follows:
$$Q=1-\frac{\sum_{h=1}^{L}N_h\sigma_h^2}{N\sigma^2}\tag{11}$$
where \(h=1,2,\dots,L\) indexes the layers (categories), \(L\) denotes the number of layers of the independent or dependent variable, \(N_h\) and \(N\) are the number of units in layer \(h\) and the total number of units across all layers, respectively, and \(\sigma_h^2\) and \(\sigma^2\) are the variance within layer \(h\) and the overall variance across all layers, respectively. The Interaction Detector evaluates how the interaction between two factors affects explanatory power; Geodetector categorizes interactions as nonlinear weakening, single-factor nonlinear weakening, double-factor enhancement, independence, and nonlinear enhancement (Song and Wu, 2021). To meet the input data requirements of Geodetector, the optimal parameter discretization method is applied first. Both Geodetector and the optimal parameter discretization method are implemented using the GD package in R.
4.
Results
4.1 Temporal variability of Allergic Diseases
The variation of ASD and AND in Urumqi from 2014 to 2023 is illustrated in Fig. 2. In general, the prevalence of both ASD and AND showed an upward trend from 2014 to 2023. For ASD, the highest prevalence was observed in January 2014, at 0.014%, while the lowest prevalence occurred in June 2017, at 0.083%. For AND, the highest prevalence was observed in November 2022, at 0.017%, while the lowest prevalence occurred in August 2023, at 0.001%. According to Fig. 2(a), the prevalence of ASD in the population exhibited a notable inflection point in March 2017. Prior to March 2017, the prevalence displayed an increasing trend, which subsequently transitioned to a decreasing trend. According to Fig. 2(b), the prevalence of AND in the population exhibited two significant inflection points, in July 2019 and September 2022. The prevalence increased up to July 2019, then decreased until September 2022, after which it began to rise again. From an annual variation perspective, both ASD and AND exhibit notable seasonal trends [Fig. 2(c)]. For ASD, the prevalence increases from January to July and decreases from July to December, with a multi-year average monthly maximum of 0.060% and minimum of 0.0350%. For AND, the prevalence rises from January to August and declines from August to December, with a multi-year average monthly maximum of 0.011% and minimum of 0.005%.
4.2 Analysis of environmental factors
According to Fig. 3(a), all the environmental factors exhibit positive correlations with ASD and AND except for \(W_v\). For both ASD and AND, the minimum temperature exhibits the most significant correlation, with correlation coefficients of 0.63 and 0.52, respectively (P < 0.001).
This may be because low temperatures can readily affect the human immune system, potentially leading to an abnormal enhancement of immune responses. Low temperatures are often accompanied by low humidity, which dries the air and thereby exacerbates the dryness and sensitivity of the skin and respiratory tract. Conversely, AOD shows the weakest correlation with these diseases, with coefficients of 0.29 and 0.12 (P < 0.05), respectively. AOD primarily reflects the particulate matter concentration in the atmosphere, whereas allergic diseases are more influenced by specific allergens (such as pollen, dust mites, and mold), and the relationship between the concentration and distribution of these allergens and aerosol concentrations may not be significant. The relative importance of the environmental factors in influencing allergic disease is shown in Fig. 3(b). Factors with higher bars contribute more to reducing uncertainty about allergic disease outcomes, suggesting that they have a stronger impact on the incidence or severity of these conditions. According to Fig. 3(b), the majority of environmental factors affecting ASD and AND in Urumqi exhibit similar information gain values, clustered within the narrow range of 6.6 to 7.0. This suggests that these factors contribute almost equally to reducing uncertainty in the model. However, \(Q_{max}\) stands out with a slightly lower information gain value of 6.6 for both ASD and AND, indicating that it has a marginally smaller impact on the prediction of allergic disease outcomes than the other factors. Overall, the close proximity of the information gain values highlights the multifaceted nature of environmental influences on allergic diseases. The input features of the final allergic disease model include a total of 12 environmental factors, such as UVI, \(P_{sum}\), and \(T_{min}\).
4.3 Model performance evaluation
According to Fig.
4, for ASD prevalence prediction, the comprehensive performance of the five machine learning models on both the training and testing datasets is ranked from high to low as: DTR > RR > RFR > ENR > SVR. The DTR model outperformed all other models, with weighted MSE, MAE, and r values of 0.53, 0.58, and 0.72, respectively. For AND prevalence prediction, the comprehensive performance of the five models on both the training and testing datasets is ranked from high to low as: RR > SVR > RFR > ENR > DTR. The RR model outperformed all other models, with weighted MSE, MAE, and r values of 0.62, 0.54, and 0.61, respectively. Therefore, we selected the DTR and RR models for subsequent predictions of ASD and AND prevalence under different SSP scenarios. It is worth noting that the performance differences among the five machine learning models used to predict the two allergic diseases from environmental factors were all within 0.2. This suggests that further performance improvements within the current framework may require the introduction of new features or the exploration of alternative models in future work. Figure S2 shows the consistency between the actual prevalence and the prevalence predicted by the optimal model for both ASD and AND. In general, the two optimal machine learning models effectively capture the peaks and troughs in the prevalence trends of ASD and AND. For ASD prediction, the DTR model tends to overestimate high values and underestimate low values in the training dataset. In the testing dataset, the underestimation by the DTR model is more pronounced. For AND prediction, the RR model mainly underestimates high values in the training dataset, while in the testing dataset it primarily overestimates low values. Overall, the results demonstrate that the DTR and RR models perform well in predicting the prevalence of ASD and AND.
5.
Discussion
5.1 Projected environmental factors under three SSPs
CMIP6 provides a unique opportunity to access data on variables such as temperature and precipitation under different future greenhouse gas emission scenarios [Fig. S3(a)-(k)]. Among all environmental factors, the intensity of UVI significantly influences skin immune responses by altering individual sensitivity to allergens, thereby directly affecting the occurrence and severity of allergic reactions. According to Fig. 3, UVI is crucial for accurately predicting the prevalence of ASD and AND. Nevertheless, CMIP6 does not provide UVI data. Thus, in this study, we used a GLM to establish the relationship between UVI and the other 11 influencing factors. The results indicate that using AOD and O₃ as predictors for estimating UVI yields high model accuracy (as shown in Table S1), with an R² of 0.77 (p < 0.001). AOD reflects the ability of aerosol particles in the atmosphere to absorb and scatter UV radiation. These particles directly influence the transmittance of UV radiation, making AOD data crucial for understanding the propagation and distribution of UV radiation in the atmosphere (Hu et al., 2010). Ozone selectively absorbs UV radiation, particularly at the shorter wavelengths of the UVB and UVC regions. Combining AOD and ozone data thus provides accurate estimates of current UVI. Finally, using AOD and O₃ data from CMIP6 along with the GLM, we derived UVI data for the period 2024–2099 under three SSP scenarios [Fig. S3(k)].
5.2 Projected ASD and AND prevalence under three SSPs
Based on the trained optimal ASD and AND prediction models and meteorological datasets from CMIP6, the monthly changes in the prevalence of ASD and AND in Urumqi from 2024 to 2099 under three scenarios (SSP126, SSP370, and SSP585) are shown in Fig. S4. Under the three SSP scenarios, the prevalence of ASD and AND exhibits cyclical variations on a monthly scale.
The prevalence rates gradually increase over certain periods, reach a peak, and then gradually decline to a low point. By December 2099, the prevalence of ASD is projected to reach 0.039%, 0.045%, and 0.045% under the three emission scenarios, respectively. For ASD, the highest prevalence under the SSP126, SSP370, and SSP585 scenarios is projected to be 0.053% in May 2029, 0.053% in May 2094, and 0.053% in May 2091, respectively. The lowest prevalence under all three emission scenarios is 0.039%, occurring in January 2025, January 2030, and December 2031, respectively. For AND, by December 2099, the prevalence is projected to reach 0.007%, 0.009%, and 0.007% under the three emission scenarios, respectively. The annual variation in the prevalence of ASD and AND in Urumqi from 2015 to 2099 is illustrated in Fig. 5. For ASD, the annual variation shows a significant increase in prevalence under the SSP126, SSP370, and SSP585 scenarios (p < 0.05). Under the SSP126 and SSP370 scenarios, the highest prevalence of ASD is observed in 2096, reaching 0.047% and 0.048%, respectively. Under the SSP585 scenario, the highest prevalence of ASD occurs in 2099, at 0.049%. Generally, before 2025, ASD prevalence is highest under the SSP126 scenario, followed by SSP370, and lowest under SSP585. As time progresses, the rate of increase in ASD prevalence under the SSP585 scenario becomes more pronounced, surpassing the other two scenarios after 2040. For AND, the prevalence under the SSP370 and SSP585 scenarios shows a significant increasing trend from 2015 to 2099 (p < 0.05). Under the SSP126 scenario, the highest prevalence of AND occurs in 2097, reaching 0.008%. For the SSP370 scenario, the peak occurs in 2088 at 0.0087%, while under the SSP585 scenario, the highest prevalence is observed in 2080 at 0.009%. Overall, the prevalence of AND before 2073 is relatively similar across the three scenarios, with SSP126 and SSP370 showing slightly higher prevalence than SSP585.
After 2073, the prevalence of AND under the SSP585 scenario increases significantly, peaking and clearly surpassing the prevalence under the other two scenarios. After 2090, the prevalence of AND in all three scenarios gradually converges.
Although the proportional increase in the prevalence of ASD and AND under future SSP scenarios appears modest, given Urumqi's population of 25.9 million in 2022, the projected peak numbers of individuals with ASD are expected to increase by 215,000, 243,200, and 275,600 under the three scenarios, respectively, compared with January 2015. Similarly, the projected peak numbers of individuals with AND are expected to increase by 38,900, 35,700, and 56,300, respectively, compared with January 2015. Government departments should therefore strengthen health education and awareness, expand medical services, and reinforce social support networks to address the potential increase in the prevalence of ASD and AND.
5.3 Analysis of the importance of model features and collaborative benefits based on the geographical detector
Investigating the etiological mechanisms underlying allergic disease occurrence is crucial for elucidating the key factors influencing its onset and spread. We therefore used the geographical detector to examine the impact of individual environmental factors and their interactions on the prevalence of ASD and AND, as illustrated in Fig. 6. As depicted in Fig. 6 (a)–(b), all environmental factors except Qsum significantly influence ASD prevalence (P < 0.01). Among these factors, Tmin exerts the greatest impact on prevalence (Q = 0.446), followed by UVI, O₃, and Tmean with Q values of 0.419, 0.403, and 0.396, respectively. Conversely, AOD exhibits the least influence on ASD prevalence among all factors (Q = 0.124). For AND, all environmental factors demonstrate significant effects on its occurrence (P < 0.01).
Tmin has the strongest influence on AND prevalence (Q = 0.345), followed by O₃, Qsum, and Tmean, with Q values of 0.325, 0.325, and 0.324, respectively. At lower environmental temperatures, increased skin moisture loss weakens the skin barrier function, rendering it more susceptible to external allergens and exacerbating symptoms of ASD. Simultaneously, dry nasal mucosa and increased nasal airway resistance under cold conditions aggravate manifestations of AND. Additionally, cold temperatures may induce nasal vasoconstriction, reducing nasal blood flow and local immune cell activity, thereby impairing the clearance of allergens and further intensifying episodes of AND.
The occurrence of allergic diseases is typically influenced by multiple factors. The interaction detector captures the interactions among environmental factors more precisely, thereby revealing their joint impact on allergic diseases. According to Fig. 6 (c)–(d), the interactions between AOD and factors such as Tmin, Qmax, Qsum, Pmax, and Psum exhibit nonlinear enhancement of ASD prevalence. This indicates that the joint explanatory power of each pair exceeds the sum of the two factors' individual explanatory powers, so their effects on ASD are not simply additive. The interactions of the other environmental factors show dual-factor enhancement, meaning that their combined influence exceeds that of either factor alone. Among all interactions, that between temperature and total runoff has the highest impact on the prevalence of ASD, with a value of 0.599. The interaction between AOD and maximum runoff has the smallest impact on ASD prevalence, with a value of 0.314.
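The factor and interaction detector results reported above rest on the q statistic of the geographical detector, q = 1 − Σ_h N_h σ_h² / (N σ²), and on comparing the joint q of two overlaid factors with their individual q values. The sketch below is a minimal illustration, not the authors' implementation: the quantile discretization into four strata, the synthetic data, and the function names are all assumptions.

```python
import numpy as np

def q_statistic(y, strata):
    """Geographical detector q: share of the variance of y explained by the strata."""
    y = np.asarray(y, dtype=float)
    strata = np.asarray(strata)
    total_ss = len(y) * np.var(y)
    within_ss = 0.0
    for s in np.unique(strata):
        group = y[strata == s]
        within_ss += len(group) * np.var(group)
    return 1.0 - within_ss / total_ss

def quantile_bins(x, k=4):
    """Discretize a continuous factor into k quantile strata (an assumed choice)."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, k + 1)[1:-1])
    return np.digitize(x, edges)

def interaction_type(q1, q2, q12, tol=1e-9):
    """Classify an interaction by comparing the joint q with the single-factor q values."""
    if q12 > q1 + q2 + tol:
        return "nonlinear enhancement"
    if abs(q12 - (q1 + q2)) <= tol:
        return "independent"
    if q12 > max(q1, q2):
        return "dual-factor enhancement"
    if q12 > min(q1, q2):
        return "single-factor nonlinear weakening"
    return "nonlinear weakening"

# Synthetic monthly records (hypothetical values, not the study data).
rng = np.random.default_rng(1)
n = 85
t_min = rng.normal(0.0, 10.0, n)
aod = rng.uniform(0.1, 0.8, n)
prevalence = 0.04 - 0.0002 * t_min * aod + rng.normal(0.0, 0.001, n)

s1 = quantile_bins(t_min)
s2 = quantile_bins(aod)
overlay = s1 * 10 + s2  # one stratum per (t_min bin, aod bin) combination

q1 = q_statistic(prevalence, s1)
q2 = q_statistic(prevalence, s2)
q12 = q_statistic(prevalence, overlay)
print(q1, q2, q12, interaction_type(q1, q2, q12))
```

Because the overlay is a refinement of each single-factor stratification, the joint q can never fall below either individual q; the informative comparison is therefore whether it also exceeds their sum.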
For AND, the interactions between AOD and the other environmental factors, as well as those between total runoff and factors including Tmax, Qsum, Pmax, and WU, show nonlinear enhancement; all other interactions involving environmental variables show dual-factor enhancement. Specifically, the interaction between O₃ and Qsum has the highest impact on AND prevalence, with a value of 0.590. Conversely, the interaction between Psum and WV has the lowest impact on AND prevalence, with a value of 0.292. Overall, according to Fig. 6 (c)–(d), the pairwise interactions of environmental factors contribute to the accurate estimation of the prevalence of ASD and AND, which indirectly supports the feasibility of using environmental factors to predict allergic diseases.
6. Conclusion
In this study, we established a machine learning-based scheme for predicting ASD and AND prevalence and forecasted the variations in ASD and AND prevalence in Urumqi from 2015 to 2099 under three different climate change scenarios. The results demonstrate that machine learning models can effectively predict the prevalence of ASD and AND using environmental and hydrological data. DTR emerged as the most effective model for predicting ASD, while RR performed best for AND. The analysis revealed that minimum temperature is the most influential factor affecting both diseases. Projections under three climate change scenarios indicate a significant increase in ASD and AND prevalence in Urumqi from 2015 to 2099. Specifically, the peak number of ASD cases under the three climate change scenarios is expected to rise by between 215,000 and 275,600, while AND cases are projected to rise by between 35,700 and 56,300. These results underscore the critical need for integrating predictive models into public health strategies to anticipate and manage future changes in allergic disease patterns, ensuring effective resource allocation and intervention planning.
Overall, the scheme proposed in this paper demonstrates the potential of machine learning models to forecast the prevalence of ASD and AND, which could help decision-making departments implement timely countermeasures and allocate resources effectively.
Declarations
Ethical considerations
This is an observational study, and it was confirmed that ethical approval is not required.
Competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Funding
This work was supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region - Distinguished Young Scholars (2022D01E81), Xinjiang Uygur Autonomous Region Tianshan Talent Training Program (2022TSYCCX0109), and Tianshan Innovation Team Fund Project of Xinjiang Uygur Autonomous Region (2023D14005).
Author contributions
Fengxia Hu: Data curation, Methodology, Writing – original draft, Conceptualization. Junqin Liang: Conceptualization, Writing – review & editing, Methodology. Jin Zhang: Conceptualization, Validation, Supervision. Yizhou Li: Software, Data curation. Xiaoyu Zhang: Visualization, Investigation. Qian Wang: Visualization, Investigation.
Acknowledgments
We thank Peng Liu from the information department of our hospital and Hongbo Ling from the Xinjiang Institute of Ecology and Geography for their assistance during the data extraction process.
Data availability
Data will be made available on request.
References
Awad M, Khanna R (2015) Efficient Learning Machines. https://doi.org/10.1007/978-1-4302-5990-9 Chen D, Sun W, Shi J, Johnson BA, Tan ML, Pan Q, Li W, Yang X, Zhang F (2024) Utilizing GaoFen-2 derived urban green space information to predict local surface temperature. Urban For Urban Green 99:128463.
https://doi.org/10.1016/j.ufug.2024.128463 Cheng X, Frank U, Zhao F, Capella JR, Winkler JB, Schnitzler J-P, Ghirardo A, Bertic M, Estrella N, Durner J, Pritsch K (2023) Plant growth traits and allergenic potential of Ambrosia artemisiifolia pollen as modified by temperature and NO2. Environ Exp Bot 206:105193. https://doi.org/10.1016/j.envexpbot.2022.105193 Conway AE, Verdi M, Kartha N, Maddukuri C, Anagnostou A, Abrams EM, Bansal P, Bukstein D, Nowak-Wegrzyn A, Oppenheimer J, Madan JC, Garnaat SL, Bernstein JA, Shaker MS (2024) Allergic Diseases and Mental Health. J Allergy Clin Immunol -Pract 12:2298–2309. https://doi.org/10.1016/j.jaip.2024.05.049 de Marco R, Poli A, Ferrari M, Accordini S, Giammanco G, Bugiani M, Villani S, Ponzio M, Bono R, Carrozzi L, Cavallini R, Cazzoletti L, Dallari R, Ginesu F, Lauriola P, Mandrioli P, Perfetti L, Pignato S, Pirina P, Struzzo P (2002) The impact of climate and traffic-related NO2 on the prevalence of asthma and allergic rhinitis in Italy. Clin Exp Allergy 32:1405–1412. https://doi.org/10.1046/j.1365-2745.2002.01466.x Gidden MJ, Riahi K, Smith SJ, Fujimori S, Luderer G, Kriegler E, van Vuuren DP, van den Berg M, Feng L, Klein D, Calvin K, Doelman JC, Frank S, Fricko O, Harmsen M, Hasegawa T, Havlik P, Hilaire J, Hoesly R, Horing J, Popp A, Stehfest E, Takahashi K (2019) Global emissions pathways under different socioeconomic scenarios for use in CMIP6: a dataset of harmonized emissions trajectories through the end of the century. Geosci Model Dev 12:1443–1475. https://doi.org/10.5194/gmd-12-1443-2019 Grant TL, Wood RA, Chapman MD (2023) Indoor Environmental Exposures and Their Relationship to Allergic Diseases. J Allergy Clin Immunol -Pract 11:2963–2970. https://doi.org/10.1016/j.jaip.2023.08.034 Hans C (2011) Elastic Net Regression Modeling With the Orthant Normal Prior. J Am Stat Assoc 106:1383–1393. 
https://doi.org/10.1198/jasa.2011.tm09241 Hersbach H, Bell B, Berrisford P, Hirahara S, Horanyi A, Munoz-Sabater J, Nicolas J, Peubey C, Radu R, Schepers D, Simmons A, Soci C, Abdalla S, Abellan X, Balsamo G, Bechtold P, Biavati G, Bidlot J, Bonavita M, De Chiara G, Dahlgren P, Dee D, Diamantakis M, Dragani R, Flemming J, Forbes R, Fuentes M, Geer A, Haimberger L, Healy S, Hogan RJ, Holm E, Janiskova M, Keeley S, Laloyaux P, Lopez P, Lupu C, Radnoti G, de Rosnay P, Rozum I, Vamborg F, Villaume S, Thepaut J-N (2020) The ERA5 global reanalysis. Q J R Meteorol Soc 146:1999–2049. https://doi.org/10.1002/qj.3803 Hoerl A, Kennard R (1970) Ridge Regression - Applications to Nonorthogonal Problems. Technometrics 12:69. https://doi.org/10.2307/1267352 Hu B, Wang Y, Liu G (2010) Properties of ultraviolet radiation and the relationship between ultraviolet radiation and aerosol optical depth in China. Atmos Res 98:297–308. https://doi.org/10.1016/j.atmosres.2010.07.009 Hu Y, Jiang F, Tan J, Liu S, Li S, Wu M, Yan C, Yu G, Yi H, Yin Y, Tong S (2022) Environmental Exposure and Childhood Atopic Dermatitis in Shanghai: A Season-Stratified Time-Series Analysis. Dermatology 238:101–108. https://doi.org/10.1159/000514685 Huang C-F, Chie W-C, Wang I-J (2021) Effect of environmental exposures on allergen sensitization and the development of childhood allergic diseases: A large-scale population-based study. World Allergy Organ J 14:100495. https://doi.org/10.1016/j.waojou.2020.100495 Huang J, Zheng W, Huang H, Ran Y, Liu Y, Huang P (2023) Particulate matter, nitrogen dioxide, and sulfur dioxide and their associations with allergic skin diseases: A systematic review and meta-analysis. Atmos Pollut Res 14:101804. 
https://doi.org/10.1016/j.apr.2023.101804 Huang Y, Wen H-J, Guo Y-LL, Wei T-Y, Wang W-C, Tsai S-F, Tseng VS, Wang S-LJ (2021) Prenatal exposure to air pollutants and childhood atopic dermatitis and allergic rhinitis adopting machine learning approaches: 14-year follow-up birth cohort study. Sci Total Environ 777:145982. https://doi.org/10.1016/j.scitotenv.2021.145982 Kim B-J, Kwon J-W, Seo J-H, Kim H-B, Lee S-Y, Park K-S, Yu J, Kim H-C, Leem J-H, Sakong J, Kim S-Y, Lee C-G, Kang D-M, Ha M, Hong Y-C, Kwon H-J, Hong S-J (2011) Association of ozone exposure with asthma, allergic rhinitis, and allergic sensitization. Ann Allergy Asthma Immunol 107:214–219. https://doi.org/10.1016/j.anai.2011.05.025 Lack G (2001) Pediatric allergic rhinitis and comorbid disorders. J Allergy Clin Immunol 108:S9–S15. https://doi.org/10.1067/mai.2001.115562 Lam HCY, Anees-Hill S, Satchwell J, Symon F, Macintyre H, Pashley CH, Marczylo EL, Douglas P, Aldridge S, Hansell A (2024) Association between ambient temperature and common allergenic pollen and fungal spores: A 52-year analysis in central England, United Kingdom. Sci Total Environ 906:167607. https://doi.org/10.1016/j.scitotenv.2023.167607 Lin L, Chen Y, Wei J, Wu Shengchi, Wu Shu, Jing J, Dong G, Cai L (2022) The associations between residential greenness and allergic diseases in Chinese toddlers: A birth cohort study. Environ Res 214:114003. https://doi.org/10.1016/j.envres.2022.114003 Lovato T, Peano D, Butenschon M, Materia S, Iovino D, Scoccimarro E, Fogli PG, Cherchi A, Bellucci A, Gualdi S, Masina S, Navarra A (2022) CMIP6 Simulations With the CMCC Earth System Model (CMCC-ESM2). J Adv Model Earth Syst 14:e2021MS002814. https://doi.org/10.1029/2021MS002814 Martinez BA, Shrotri S, Kingsmore KM, Bachali P, Grammer AC, Lipsky PE (2022) Machine learning reveals distinct gene signature profiles in lesional and nonlesional regions of inflammatory skin diseases. Sci Adv 8:eabn4776.
https://doi.org/10.1126/sciadv.abn4776 Meehl GA, Boer GJ, Covey C, Latif M, Stouffer RJ (2000) The coupled model intercomparison project (CMIP). Bull Am Meteorol Soc 81:313–318. Munoz-Sabater J, Dutra E, Agusti-Panareda A, Albergel C, Arduini G, Balsamo G, Boussetta S, Choulga M, Harrigan S, Hersbach H, Martens B, Miralles DG, Piles M, Rodriguez-Fernandez NJ, Zsoter E, Buontempo C, Thepaut J-N (2021) ERA5-Land: a state-of-the-art global reanalysis dataset for land applications. Earth Syst Sci Data 13:4349–4383. https://doi.org/10.5194/essd-13-4349-2021 Nelder JA, Wedderburn RWM (1972) Generalized Linear Models. J Royal Stat Soc Ser A 135:370–384. https://doi.org/10.2307/2344614 Prosperi MCF, Marinho S, Simpson A, Custovic A, Buchan IE (2014) Predicting phenotypes of asthma and eczema with machine learning. BMC Med Genomics 7:S7. https://doi.org/10.1186/1755-8794-7-S1-S7 Rathore SS, Kumar S (2016) A Decision Tree Regression based Approach for the Number of Software Faults Prediction. SIGSOFT Softw Eng Notes 41:1–6. https://doi.org/10.1145/2853073.2853083 Rodriguez-Galiano V, Sanchez-Castillo M, Chica-Olmo M, Chica-Rivas M (2015) Machine learning predictive models for mineral prospectivity: An evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geol Rev 71:804–818. https://doi.org/10.1016/j.oregeorev.2015.01.001 Schafer T, Ring J (1997) Epidemiology of allergic diseases. Allergy 52:14–22. https://doi.org/10.1111/j.1398-9995.1997.tb04864.x Shamji MH, Ollert M, Adcock IM, Bennett O, Favaro A, Sarama R, Riggioni C, Annesi-Maesano I, Custovic A, Fontanella S, Traidl-Hoffmann C, Nadeau K, Cecchi L, Zemelka-Wiacek M, Akdis CA, Jutel M, Agache I (2023) EAACI guidelines on environmental science in allergic diseases and asthma - Leveraging artificial intelligence and machine learning to develop a causality model in exposomics. Allergy 78:1742–1757.
https://doi.org/10.1111/all.15667 Shannon C (1948) A Mathematical Theory of Communication. Bell Syst Techn J 27:379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x Shen J, Ke X, Hong S, Zeng Q, Liang C, Li T, Tang A (2011) Epidemiological features of allergic rhinitis in four major cities in Western China. J Huazhong Univ Sci Tech -Med 31:433–440. https://doi.org/10.1007/s11596-011-0469-1 Sidikjan N, Eziz M, Wang Y (2022) Spatial Distribution, Contamination Levels, and Health Risks of Trace Elements in Topsoil along an Urbanization Gradient in the City of Urumqi, China. Sustainability 14:12646. https://doi.org/10.3390/su141912646 Sobieraj K, Grewling L, Bogawski P (2024) Assessing allergy risk from ornamental trees in a city: Integrating open access remote sensing data with pollen measurements. J Environ Manage 367:122051. https://doi.org/10.1016/j.jenvman.2024.122051 Song Y, Wu P (2021) An interactive detector for spatial associations. Int J Geogr Inf Sci 35:1676–1701. https://doi.org/10.1080/13658816.2021.1882680 Svensson A, Ofenloch RF, Bruze M, Naldi L, Cazzaniga S, Elsner P, Goncalo M, Schuttelaar M-LA, Diepgen TL (2018) Prevalence of skin disease in a population-based sample of adults from five European countries. Br J Dermatol 178:1111–1118. https://doi.org/10.1111/bjd.16248 Tang HHF, Sly PD, Holt PG, Holt KE, Inouye M (2020) Systems biology and big data in asthma and allergy: recent discoveries and emerging challenges. Eur Resp J 55:1900844. https://doi.org/10.1183/13993003.00844-2019 Tang Z, Li S, Shen M, Xiao Y, Su J, Tao J, Wang X, Shan S, Kang X, Wu B, Zou B, Chen X (2022) Association of exposure to artificial light at night with atopic diseases: A cross-sectional study in college students. Int J Hyg Environ Health 241:113932. https://doi.org/10.1016/j.ijheh.2022.113932 Toms JD, Lesperance ML (2003) Piecewise regression: A tool for identifying ecological thresholds. Ecology 84:2034–2041. 
https://doi.org/10.1890/02-0472 Tsai M-H, Shih H-J, Su K-W, Liao S-L, Hua M-C, Yao T-C, Lai S-H, Yeh K-W, Chen L-C, Huang J-L, Chiu C-Y (2022) Nasopharyngeal microbial profiles associated with the risk of airway allergies in early childhood. J Microbiol Immunol Infect 55:777–785. https://doi.org/10.1016/j.jmii.2022.01.006 Wang XD, Zheng M, Lou HF, Wang CS, Zhang Y, Bo MY, Ge SQ, Zhang N, Zhang L, Bachert C (2016) An increased prevalence of self-reported allergic rhinitis in major Chinese cities from 2005 to 2011. Allergy 71:1170–1180. https://doi.org/10.1111/all.12874 Wang Y, Liu T, Wan Z, Wang L, Hou J, Shi M, Tsui SKW (2023) Investigating causal relationships between the gut microbiota and allergic diseases: A mendelian randomization study. Front Genet 14:1153847. https://doi.org/10.3389/fgene.2023.1153847 Weikl F, Radl V, Munch JC, Pritsch K (2015) Targeting allergenic fungi in agricultural environments aids the identification of major sources and potential risks for human health. Sci Total Environ 529:223–230. https://doi.org/10.1016/j.scitotenv.2015.05.056 Weller K, Maurer M, Bauer A, Wedi B, Wagner N, Schliemann S, Kramps T, Baeumer D, Multmeier J, Hillmann E, Staubach P (2022) Epidemiology, comorbidities, and healthcare utilization of patients with chronic urticaria in Germany. J Eur Acad Dermatol Venereol 36:91–99. https://doi.org/10.1111/jdv.17724 Xu Y, Li Y (2024) Association between lipid-lowering drugs and allergic diseases: A Mendelian randomization study. World Allergy Organ J 17:100899. https://doi.org/10.1016/j.waojou.2024.100899 Zhu L, Meng J, Zhu L (2020) Applying Geodetector to disentangle the contributions of natural and anthropogenic factors to NDVI variations in the middle reaches of the Heihe River Basin. Ecol Indic 117:106545. https://doi.org/10.1016/j.ecolind.2020.106545 Ziaee A, Zia M, Goli M (2018) Identification of saprophytic and allergenic fungi in indoor and outdoor environments. Environ Monit Assess 190:574. 
https://doi.org/10.1007/s10661-018-6952-4
Supplementary Files
Supplementarymaterial.docx
Reviewers agreed at journal: 23 Jul, 2025
Reviewers invited by journal: 23 Jul, 2025
Editor assigned by journal: 20 Jun, 2025
First submitted to journal: 20 Jun, 2025
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-6938034","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":489769044,"identity":"f322a191-3afb-4e43-ac09-c9f291e8f15e","order_by":0,"name":"Fengxia Hu","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABAElEQVRIie3QwUrDMBjA8S8EustXck3Z0FeI7DAHxb1KSqAnQR8hMPAUPHcv4jmlUC9Dr8IOdhQ8bwhSdAfbefJg2uPA/C8hkB/JFwCf7wQTQKpuRRjppJJNjIzpPkLFD0Fbi61JJ1FmhxEAnrxFVVDEQks3mXFF6tvPYsK0UkLiMwqwZLe//pvMM0Wnq/sUuS3bWS43OKOaRqsHx8NebspxaGKE3HS3bHCubUBDJ1Gjr9BwPC/YB5fBEwore0lAsf1bUULaEjuArGs6DnWKF2tQIjEKoyxfumd5VOQdD8Xi7LVKtk1ztWBsme/2DnKM3P3e6p7zXYcBZ3w+n+//9g24glTvQfyg5QAAAABJRU5ErkJggg==","orcid":"https://orcid.org/0009-0007-4386-860X","institution":"People's Hospital of Xinjiang Uygur Autonomous Region","correspondingAuthor":true,"prefix":"","firstName":"Fengxia","middleName":"","lastName":"Hu","suffix":""},{"id":489769045,"identity":"061fd769-3bd3-47f5-b9ef-48574b770f66","order_by":1,"name":"Yizhou Li","email":"","orcid":"","institution":"","correspondingAuthor":false,"prefix":"","firstName":"Yizhou","middleName":"","lastName":"Li","suffix":""},{"id":489769046,"identity":"e3221c33-1dc8-4831-9e9a-94a3657ce575","order_by":2,"name":"Xiaoyu Zhang","email":"","orcid":"","institution":"","correspondingAuthor":false,"prefix":"","firstName":"Xiaoyu","middleName":"","lastName":"Zhang","suffix":""},{"id":489769047,"identity":"6c8d8531-5bbc-4330-a927-1fe45e4a650b","order_by":3,"name":"Qian 
Wang","email":"","orcid":"","institution":"","correspondingAuthor":false,"prefix":"","firstName":"Qian","middleName":"","lastName":"Wang","suffix":""},{"id":489769048,"identity":"56e41b50-a0d4-4b8a-9130-d1ddd3aa17ed","order_by":4,"name":"Jin Zhang","email":"","orcid":"","institution":"","correspondingAuthor":false,"prefix":"","firstName":"Jin","middleName":"","lastName":"Zhang","suffix":""},{"id":489769049,"identity":"17713711-51b5-4e4a-8195-7573cc5caabe","order_by":5,"name":"Junqin Liang","email":"","orcid":"","institution":"","correspondingAuthor":false,"prefix":"","firstName":"Junqin","middleName":"","lastName":"Liang","suffix":""}],"badges":[],"createdAt":"2025-06-20 10:32:55","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6938034/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6938034/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1007/s00484-026-03127-2","type":"published","date":"2026-02-11T15:57:59+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":87572625,"identity":"ff333fb6-7ef7-415f-936b-9b43a5915552","added_by":"auto","created_at":"2025-07-25 11:07:10","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":214162,"visible":true,"origin":"","legend":"\u003cp\u003eTechnical flow chart of the machine learning-based framework for forecasting ASD and AND prevalence\u003c/p\u003e","description":"","filename":"floatimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6938034/v1/0f4b22a850f9e39435c9a158.jpeg"},{"id":87572623,"identity":"25e0d7c2-540c-44d6-913a-c61c2fbd9f7a","added_by":"auto","created_at":"2025-07-25 11:07:10","extension":"jpeg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":243265,"visible":true,"origin":"","legend":"\u003cp\u003eThe temporal variability in the prevalence of ASD and AND. 
(a) and (b) are the multi-year change trends of two allergic diseases and piecewise regression fitting results. The red line represents the observed prevalence, while the green line indicates the segmented fitted curve. The inset table provides the slopes and time periods (start time, ST, and end time, ET) for each segment. (c) Multi-year average monthly change trends of the ASD and AND.\u003c/p\u003e","description":"","filename":"floatimage2.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6938034/v1/b7ce7348b403123e9b19c8c1.jpeg"},{"id":87572626,"identity":"c477f1b5-2e97-4baf-897a-7fe61fcaa0f5","added_by":"auto","created_at":"2025-07-25 11:07:10","extension":"jpeg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":161987,"visible":true,"origin":"","legend":"\u003cp\u003e(a) Heat map of the correlation between two allergic diseases and 12 influencing factors. (b) The results of information gain. Each bar on the graph represents a different environmental factor, with the height of the bar indicating the reduction in entropy, achieved by incorporating that factor into the model.\u003c/p\u003e","description":"","filename":"floatimage3.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6938034/v1/111b6370474b7019b0928d08.jpeg"},{"id":87573807,"identity":"7e0dac1c-38a0-4497-9a68-070abf11a8d1","added_by":"auto","created_at":"2025-07-25 11:23:10","extension":"jpeg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":111970,"visible":true,"origin":"","legend":"\u003cp\u003ePerformance evaluation results of five machine learning models. 
(a) and (b) are the performance of the five machine learning models on the ASD training set and test set; (c) and (d) are the performance of the five machine learning models on the AND training set and test set.\u003c/p\u003e","description":"","filename":"floatimage4.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6938034/v1/3f7eee23478c9cbc7cc1888f.jpeg"},{"id":87573393,"identity":"70d71944-240b-4ee6-870b-8b81bd0bad8d","added_by":"auto","created_at":"2025-07-25 11:15:10","extension":"jpeg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":140874,"visible":true,"origin":"","legend":"\u003cp\u003eThe annual prevalence of ASD and AND from 2024 to 2099 across various SSP scenarios in the future.\u003c/p\u003e","description":"","filename":"floatimage5.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6938034/v1/0fca328b979e18bd775d21c3.jpeg"},{"id":87573388,"identity":"67258bed-5fe7-444f-b632-da952fb53bc7","added_by":"auto","created_at":"2025-07-25 11:15:10","extension":"jpeg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":151310,"visible":true,"origin":"","legend":"\u003cp\u003e(a) The explanatory power of different influencing factors to ASD and AND. 
(b) The explanatory power of the interaction of two factors.\u003c/p\u003e","description":"","filename":"floatimage6.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6938034/v1/1a210cf3f7ffcb302661635e.jpeg"},{"id":102785441,"identity":"99dd2dc1-cb17-4512-b793-e54644df0ad6","added_by":"auto","created_at":"2026-02-16 16:06:40","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1876969,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6938034/v1/3188ee89-80ee-40b8-a70e-2b6bbfef15e6.pdf"},{"id":87572640,"identity":"5a602ba7-f0c5-4d5c-ae74-542a2633885e","added_by":"auto","created_at":"2025-07-25 11:07:11","extension":"docx","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":1566592,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarymaterial.docx","url":"https://assets-eu.researchsquare.com/files/rs-6938034/v1/b1d5f1526bdf4fcb50a8e294.docx"}],"financialInterests":"","formattedTitle":"Projected typical allergic diseases prevalence under changing environments based on multiple machine learning models","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eIn recent years, allergic diseases have emerged as a significant and escalating public health concern worldwide(Conway et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Wang et al., \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Xu and Li, \u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). 
As the two of the most commonly prevalence of allergic diseases, the allergic skin diseases (ASD) and allergic nasopharyngeal disease (AND) can significantly impact quality of life(Huang et al., \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Lack, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2001\u003c/span\u003e; Tsai et al., \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Previous studies indicate that climate change can exacerbate ASD and AND by influencing the potency and distribution of environmental allergens(Grant et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; C.-F. Huang et al., \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). For instance, warmer temperatures and higher carbon dioxide levels can enhance pollen production and extend the growing seasons of allergenic plants(Cheng et al., \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Lam et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Additionally, humidity levels and precipitation patterns can impact the growth of allergenic fungi and the dispersion of dust mites, which aggravates hypersensitivity reactions and chronic inflammation in individuals with these conditions(Weikl et al., \u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Ziaee et al., \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). As global climate change continues to accelerate, the rising prevalence of allergic diseases is inevitably having an increasing impact on both individual well-being and healthcare systems. 
Therefore, accurately predicting the prevalence of ASD and AND are crucial for effective public health planning and resource allocation, as they enable targeted interventions and preventive measures.\u003c/p\u003e\u003cp\u003eTraditional methods for assessing the incidence of ASD and AND typically include epidemiological surveys, population-based studies, and analysis of healthcare utilization data(Schafer and Ring, \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e1997\u003c/span\u003e; Svensson et al., \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Weller et al., \u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). These approaches often rely on self-reported symptoms, medical records, and clinic-based assessments to estimate prevalence rates. However, these approaches present several limitations, including potential underreporting or misdiagnosis due to subjective reporting and variability in diagnostic practices. Given the complex relationship between climate variables and allergic disease prevalence, utilizing climate data to develop data-driven models offers a valuable opportunity for estimating and forecasting the prevalence of ASD and AND(Martinez et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Tang et al., \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Currently, the data-driven methods can be categorized into empirical methods and machine learning (ML) techniques. Commonly used empirical methods for predicting allergic disease incidence, such as linear regression and time series analysis, provide clear quantification of relationships and straightforward implementation(de Marco et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2002\u003c/span\u003e; Hu et al., \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). 
However, these methods may oversimplify complex interactions and fail to account for non-linear or dynamic effects of climate variables on disease prevalence.\u003c/p\u003e\u003cp\u003eDue to their capability to identify complex patterns and extract relevant features, machine learning methods can effectively overcome the limitations of the empirical models. Prosperi et al.(Prosperi et al., \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2014\u003c/span\u003e) evaluated five machine learning models using clinical, demographic, laboratory, genetic, and environmental data to predict allergic phenotypes. The results indicated that the random forest model has the best performance among all the models. Huang et al.(Y. Huang et al., \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2021\u003c/span\u003e) assessed the predictive capability of various machine learning methods for childhood atopic dermatitis and allergic rhinitis using longitudinal data from 1,439 mother-infant pairs, finding that tree-based models had the highest sensitivity and specificity regarding air pollution exposure. With advancements in satellite technology, remote sensing provides a rich dataset for machine learning-based research on allergic diseases. For instance, Lin et al.(Lin et al., \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2022\u003c/span\u003e) utilized satellite remote sensing data to extract urban green space area and investigated its relationship with allergic diseases in a study of 522 two-year-old children in Guangzhou, China. Tang et al.(Tang et al., \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e2022\u003c/span\u003e) utilized remote sensing to obtain urban nighttime light and meteorological data to investigate the effects of artificial light at night (ALAN) and air pollutants on the incidence of allergic diseases among college students. 
Previous research suggests that combining multi-source remote sensing data with comprehensive medical records, along with machine learning techniques, has the potential to clarify the complex relationships between air pollution levels, climate variables, and the incidence of allergic diseases(Shamji et al., \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Sobieraj et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). This advanced analytical capability offers the potential for more precise forecasting of allergic disease events, enabling the development of targeted public health interventions and preventive measures.\u003c/p\u003e\u003cp\u003eIn this study, we aimed to develop and validate machine learning models for predicting the prevalence of allergic diseases. More specifically, we first analyzed the relationship between climate factors and the occurrence of allergic disease events to identify the optimal input features. Subsequently, five machine learning models were developed and trained using records of allergic diseases in Urumqi and the pre-selected features. Finally, the optimal models for ASD and AND were employed to predict the prevalence of allergic diseases under three future climate change scenarios. By analyzing historical climate data alongside epidemiological trends, we aim to elucidate patterns that could enhance our understanding of how environmental factors influence allergic disease dynamics. Overall, this study is anticipated to enhance the monitoring capabilities for allergic disease outbreaks, providing technical support for the formulation of preventive measures and management strategies by government agencies.\u003c/p\u003e"},{"header":"2. 
Study site","content":"\u003cp\u003eUrumqi is located in the Xinjiang Uygur Autonomous Region and covers 13,800 km\u0026sup2;, with a population of 4.08\u0026nbsp;million(Chen et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Situated in the heart of the Eurasian continent, Urumqi is subject to a continental arid climate that is marked by pronounced seasonal extremes. The annual mean temperature in Urumqi is 6.7\u0026deg;C, with an average annual rainfall of 280 mm(Sidikjan et al., \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). These climatic conditions create a dry environment with low humidity, which can affect the dispersion and concentration of airborne allergens, such as pollen and dust mites. The combination of limited precipitation and high evaporation rates contributes to a heightened presence of particulate matter in the air, which can exacerbate allergic responses(Shen et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2011\u003c/span\u003e). The eastern part of Urumqi is characterized by mountainous and hilly terrain, while the western region comprises expansive plains. This topographic diversity further influences the local microclimates and allergen concentrations, providing a rich context for investigating how environmental factors interplay with allergic disease prevalence and severity(Wang et al., \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e2016\u003c/span\u003e).\u003c/p\u003e"},{"header":"3. Materials and methods","content":"\u003cp\u003eThis study developed a predictive framework for allergic diseases based on multi-source remote sensing data and various machine learning models. Initially, environmental, hydrological, and meteorological factors were considered, and twelve influencing factors were selected through information gain (IG) and correlation analysis. 
Subsequently, five representative machine learning models were chosen, and model parameters were tuned using k-fold cross-validation and grid search. The overall performance of these models was evaluated using Mean Absolute Error (MAE), Mean Squared Error (MSE), and the correlation coefficient (r) to select the optimal predictive model for allergic diseases. Using CMIP6 data, the twelve influencing factors under three SSP scenarios were extracted and input into the optimal model to forecast the prevalence trends of allergic diseases under future climate change conditions. Finally, the Geodetector method was employed to elucidate the impact mechanisms of various influencing factors on allergic disease prevalence. The technical flow chart for this study is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\u003ch2\u003e3.1 Dataset collection\u003c/h2\u003e\u003cdiv id=\"Sec5\" class=\"Section3\"\u003e\u003ch2\u003e3.1.1 Allergic diseases data collection\u003c/h2\u003e\u003cp\u003eWe first collected monthly incidence data for ASD and AND in Urumqi from 2014 to 2023. Then, the annual population data of Urumqi between 2014 and 2023 were downloaded from the website of the National Bureau of Statistics (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.stats.gov.cn/\u003c/span\u003e\u003cspan address=\"https://www.stats.gov.cn/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). 
Finally, the monthly prevalence rates of ASD and AND were obtained by dividing the monthly incidence figures by the total population for each corresponding year.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec6\" class=\"Section3\"\u003e\u003ch2\u003e3.1.2 Environmental monitoring data collection\u003c/h2\u003e\u003cp\u003eThe environmental factors include aerosol optical depth (AOD), ozone (O₃), and the ultraviolet index (UVI). The AOD reflects the concentration of airborne particulates, which can trigger inflammatory responses in the skin and nasal passages. O\u003csub\u003e3\u003c/sub\u003e is a potent respiratory irritant that can worsen nasal inflammation and contribute to allergic rhinitis(Kim et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2011\u003c/span\u003e). High UVI levels can damage skin cells and alter immune responses, potentially leading to flare-ups in allergic skin conditions and affecting the severity of allergic rhinitis. Thus, we selected these three factors to develop a predictive model for the prevalence of allergic diseases. The AOD data was extracted from the MOD08_M3 (version 6.1) product in Google Earth Engine (GEE). The O₃ data was sourced from the surface O₃ dataset within the China High Air Pollutants datasets (CHAP). 
The UVI data was acquired from the Clouds and the Earth\u0026rsquo;s Radiant Energy System (CERES) products (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://asdc.larc.nasa.gov/project/CERES\u003c/span\u003e\u003cspan address=\"https://asdc.larc.nasa.gov/project/CERES\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eThe hydrological and climate data, including monthly total runoff (Q\u003csub\u003esum\u003c/sub\u003e), monthly maximum runoff (Q\u003csub\u003emax\u003c/sub\u003e), daily minimum temperature (T\u003csub\u003emin\u003c/sub\u003e), daily maximum temperature (T\u003csub\u003emax\u003c/sub\u003e), daily mean temperature (T\u003csub\u003emean\u003c/sub\u003e), daily maximum precipitation (P\u003csub\u003emax\u003c/sub\u003e), daily minimum precipitation (P\u003csub\u003emin\u003c/sub\u003e), eastward component of the 10m wind (W\u003csub\u003eu\u003c/sub\u003e), and northward component of the 10m wind (W\u003csub\u003ev\u003c/sub\u003e), were acquired from the land component of the fifth generation ECMWF atmospheric reanalysis (ERA5-Land) of the global climate datasets(Hersbach et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Munoz-Sabater et al., \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2021\u003c/span\u003e).\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\u003ch2\u003e3.2 Output of CMIP6 projection\u003c/h2\u003e\u003cp\u003eTo address the challenges posed by global climate change, the World Climate Research Programme (WCRP) initiated a new phase of the Coupled Model Intercomparison Project (CMIP6) (Meehl et al., 2020). This project provides data support for achieving the scientific goals established by the WCRP\u0026rsquo;s Grand Challenges program(Li et al., 2023; Lovato et al., \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). 
To predict future trends of two allergic diseases in the study area, we selected three greenhouse gas emission scenarios from CMIP6: low emission scenario (SSP126), medium emission scenario (SSP370), and high emission scenario (SSP585). The Shared Socioeconomic Pathways (SSPs) in CMIP6 integrate Representative Concentration Pathways (RCPs) and consider the impact of socioeconomic development, providing more reliable potential outcomes of future climate change(Gidden et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2019\u003c/span\u003e).\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003e3.3 Piecewise regression model\u003c/h2\u003e\u003cp\u003ePiecewise regression model (PRM) is used to capture varying relationships between predictors and responses by dividing the predictor space into segments, each with its own linear relationship(Toms and Lesperance, \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e2003\u003c/span\u003e). The process involves identifying breakpoints where the relationship changes, defining separate linear models for each segment, and estimating the model parameters. 
A PRM model with \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(k\\)\u003c/span\u003e\u003c/span\u003e segments can be specified as:\u003cdiv id=\"Equ1\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$${Y}_{i}=\\begin{cases}{\\beta}_{0,1}+{\\beta}_{1,1}{X}_{i}+{\\epsilon}_{i},\\quad\\text{if }{X}_{i}\\le{c}_{1}\\\\{\\beta}_{0,2}+{\\beta}_{1,2}{X}_{i}+{\\epsilon}_{i},\\quad\\text{if }{c}_{1}\\lt{X}_{i}\\le{c}_{2}\\\\\\vdots\\\\{\\beta}_{0,k}+{\\beta}_{1,k}{X}_{i}+{\\epsilon}_{i},\\quad\\text{if }{X}_{i}\\gt{c}_{k-1}\\end{cases}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eWhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({c}_{1},{c}_{2},\\dots,{c}_{k-1}\\)\u003c/span\u003e\u003c/span\u003e represent the breakpoints, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\beta}_{0,j}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\beta}_{1,j}\\)\u003c/span\u003e\u003c/span\u003e represent the segment-specific coefficients (the intercept and slope of segment j), and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\epsilon}_{i}\\)\u003c/span\u003e\u003c/span\u003e represents the error term.\u003c/p\u003e\u003cp\u003eThe objective of a PRM model is to minimize the sum of squared residuals across all segments, which can be formulated as follows:\u003cdiv id=\"Equ2\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$$\\text{Objective}_{\\text{PRM}}=\\min\\sum_{i=1}^{n}{({Y}_{i}-\\widehat{{Y}_{i}})}^{2}$$\u003c/div\u003e\u003cdiv 
class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({Y}_{i}\\)\u003c/span\u003e\u003c/span\u003e represents the observed value and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\widehat{{Y}_{i}}\\)\u003c/span\u003e\u003c/span\u003e is the value predicted by the PRM model.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\u003ch2\u003e3.4 Information Gain\u003c/h2\u003e\u003cp\u003eThe Information Gain (IG) method is derived from the information theory introduced by Claude Shannon(Shannon, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e1948\u003c/span\u003e). By measuring the reduction in uncertainty (entropy) achieved by introducing a particular feature, the IG method can identify the most effective features for splitting data and enhance model performance. 
For a machine learning model, the initial entropy, which represents the uncertainty in the target variable before considering any feature, can be calculated as follows:\u003cdiv id=\"Equ3\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ3\" name=\"EquationSource\"\u003e\n$$H\\left(D\\right)=-\\sum_{i}{p}_{i}\\log_{2}{p}_{i}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e3\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eWhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(H\\left(D\\right)\\)\u003c/span\u003e\u003c/span\u003e represents the initial entropy and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({p}_{i}\\)\u003c/span\u003e\u003c/span\u003e denotes the probability of the i-th outcome in the target variable.\u003c/p\u003e\u003cp\u003eThe conditional entropy \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(H\\left(D|X\\right)\\)\u003c/span\u003e\u003c/span\u003e, which represents the remaining uncertainty in the target variable when a feature \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(X\\)\u003c/span\u003e\u003c/span\u003e is known, can be calculated as follows:\u003cdiv id=\"Equ4\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ4\" name=\"EquationSource\"\u003e\n$$H\\left(D|X\\right)=\\sum_{v\\in Values\\left(X\\right)}\\frac{\\left|{D}_{v}\\right|}{\\left|D\\right|}H\\left({D}_{v}\\right)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e4\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eWhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({D}_{v}\\)\u003c/span\u003e\u003c/span\u003e is the subset of data where feature X has 
the value \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(v\\)\u003c/span\u003e\u003c/span\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|{D}_{v}\\right|\\)\u003c/span\u003e\u003c/span\u003e denotes the size of this subset.\u003c/p\u003e\u003cp\u003eThus, the information gain \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(IG(D,X)\\)\u003c/span\u003e\u003c/span\u003e can be formulated as follows:\u003cdiv id=\"Equ5\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ5\" name=\"EquationSource\"\u003e\n$$IG\\left(D,X\\right)=H\\left(D\\right)-H\\left(D|X\\right)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e5\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eA higher information gain indicates that the feature \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(X\\)\u003c/span\u003e\u003c/span\u003e provides a greater reduction in uncertainty, making it more valuable for improving model performance.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\u003ch2\u003e3.5 Machine Learning models\u003c/h2\u003e\u003cp\u003eWe compared the performance of five commonly used machine learning models, namely Decision Tree Regression (DTR), Ridge Regression (RR), Random Forest Regression (RFR), Elastic Net Regression (ENR), and Support Vector Regression (SVR), to select the optimal model for predicting the prevalence of ASD and AND. DTR is a non-linear model that uses a tree-like structure to make decisions based on feature values(Rathore and Kumar, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2016\u003c/span\u003e). 
RR is a linear regression technique that adds a penalty term to the loss function, helping to mitigate multicollinearity and reduce overfitting by shrinking coefficient estimates(Hoerl and Kennard, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e1970\u003c/span\u003e). RFR is an ensemble learning method that builds multiple decision trees and averages their predictions, enhancing accuracy and robustness while reducing the risk of overfitting(Rodriguez-Galiano et al., \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2015\u003c/span\u003e). ENR combines the penalties of both Ridge and Lasso regression, allowing it to effectively handle high-dimensional datasets while performing variable selection and regularization(Hans, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2011\u003c/span\u003e). SVR is a regression technique that uses support vector machines to find a hyperplane that best fits the data, and it can capture complex relationships through kernel functions (Awad et al., 2015).\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\u003ch2\u003e3.6 Performance evaluation metrics\u003c/h2\u003e\u003cp\u003eMAE, MSE and \u0026#119903; were chosen as evaluation metrics in this study to comprehensively assess the models' performance on both the training and validation sets. 
The formulas for these three metrics are as follows:\u003cdiv id=\"Equ6\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ6\" name=\"EquationSource\"\u003e\n$$MSE=\\frac{1}{m}\\sum_{i=1}^{m}{({y}_{i}-{\\widehat{y}}_{i})}^{2}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e6\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ7\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ7\" name=\"EquationSource\"\u003e\n$$MAE=\\frac{1}{m}\\sum_{i=1}^{m}\\left|{y}_{i}-{\\widehat{y}}_{i}\\right|$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e7\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ8\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ8\" name=\"EquationSource\"\u003e\n$$r=\\frac{\\sum_{i=1}^{m}\\left[({y}_{i}-\\bar{y})({\\widehat{y}}_{i}-\\bar{\\widehat{y}})\\right]}{\\sqrt{\\sum_{i=1}^{m}{({y}_{i}-\\bar{y})}^{2}\\sum_{i=1}^{m}{({\\widehat{y}}_{i}-\\bar{\\widehat{y}})}^{2}}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e8\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eWhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({y}_{i}\\)\u003c/span\u003e\u003c/span\u003e denotes the true value of the i-th sample, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\widehat{y}}_{i}\\)\u003c/span\u003e\u003c/span\u003e denotes the predicted value of the i-th sample, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\bar{y}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\bar{\\widehat{y}}\\)\u003c/span\u003e\u003c/span\u003e denote the means of the true and predicted values, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(m\\)\u003c/span\u003e\u003c/span\u003e represents the total number of samples.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\u003ch2\u003e3.7 Generalized Linear Model\u003c/h2\u003e\u003cp\u003eGeneralized Linear Model (GLM) is a versatile framework for regression analysis that extends the traditional linear model to 
accommodate a wide range of response variable types(Nelder et al., 1972). The GLM framework consists of three main components: a random component, a systematic component, and a link function. The random component specifies the probability distribution of the response variable, and the systematic component represents the linear predictor. The link function relates the mean of the response variable \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\mu\\)\u003c/span\u003e\u003c/span\u003e to the linear predictor \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\eta\\)\u003c/span\u003e\u003c/span\u003e and can be formulated as follows:\u003cdiv id=\"Equ9\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ9\" name=\"EquationSource\"\u003e\n$$g\\left(\\mu\\right)=\\eta=X\\beta$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e9\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eWhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(X\\)\u003c/span\u003e\u003c/span\u003e is the matrix of predictors and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\beta\\)\u003c/span\u003e\u003c/span\u003e represents the vector of coefficients. 
For a sample of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(n\\)\u003c/span\u003e\u003c/span\u003e observations, the log-likelihood can be formulated as follows:\u003cdiv id=\"Equ10\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ10\" name=\"EquationSource\"\u003e\n$$\\ell\\left(\\beta\\right)=\\sum_{i=1}^{n}\\log f({y}_{i};{\\mu}_{i},\\varphi)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e10\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(f\\)\u003c/span\u003e\u003c/span\u003e is the probability density (or mass) function of the response variable, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\mu}_{i}\\)\u003c/span\u003e\u003c/span\u003e is the mean for the i-th observation, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\varphi\\)\u003c/span\u003e\u003c/span\u003e is a dispersion parameter.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\u003ch2\u003e3.8 Geodetector\u003c/h2\u003e\u003cp\u003eGeodetector is a statistical technique used to analyze and quantify the spatial distribution of explanatory variables and their influence on a response variable(Zhu et al., \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). The Geodetector method encompasses four types of detectors: the Factor Detector, Interaction Detector, Risk Detector, and Ecological Detector. 
Among these, the Factor Detector is used to quantify the impact of different environmental indicators on the prevalence of allergic diseases, which can be formulated as follows:\u003cdiv id=\"Equ11\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ11\" name=\"EquationSource\"\u003e\n$$Q=1-\\frac{\\sum_{h=1}^{L}{N}_{h}{\\sigma}_{h}^{2}}{N{\\sigma}^{2}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e11\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eWhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(h=1,2,\\dots,L\\)\u003c/span\u003e\u003c/span\u003e indexes the layers (categories), \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(L\\)\u003c/span\u003e\u003c/span\u003e denotes the number of layers for the independent or dependent variables, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({N}_{h}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(N\\)\u003c/span\u003e\u003c/span\u003e are the number of units in layer \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(h\\)\u003c/span\u003e\u003c/span\u003e and the total number of units across all layers, respectively, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\sigma}_{h}^{2}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\sigma}^{2}\\)\u003c/span\u003e\u003c/span\u003e are the variance within layer \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(h\\)\u003c/span\u003e\u003c/span\u003e and the overall variance across all layers, respectively.\u003c/p\u003e\u003cp\u003eThe Interaction Detector evaluates how interactions between two factors affect a variable's explanatory 
power, classifying each interaction as nonlinear weakening, single-factor nonlinear weakening, double-factor enhancement, independence, or nonlinear enhancement(Song and Wu, \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). To meet the input data requirements for Geodetector, the optimal parameter discretization method is first applied. Both the Geodetector and the optimal parameter discretization methods are implemented using the GD package in R.\u003c/p\u003e\u003c/div\u003e"},{"header":"4. Results","content":"\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\u003ch2\u003e4.1 Temporal variability of Allergic Diseases\u003c/h2\u003e\u003cp\u003eThe variation of ASD and AND in Urumqi from 2014 to 2023 is illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. In general, the prevalence of both ASD and AND has shown an upward trend from 2014 to 2023. For ASD, the highest prevalence was observed in June 2017, at 0.083%, while the lowest prevalence occurred in January 2014, at 0.014%. For AND, the highest prevalence was observed in November 2022, at 0.017%, while the lowest prevalence occurred in August 2023, at 0.001%. According to Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e (a), the prevalence of ASD in the population exhibited a notable inflection point in March 2017. Prior to March 2017, the prevalence displayed an increasing trend, which subsequently transitioned to a decreasing trend. According to Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e (b), the prevalence of AND in the population exhibited two significant inflection points, occurring in July 2019 and September 2022. Initially, the prevalence displayed an increasing trend up to July 2019. 
Following this, a decreasing trend was observed until September 2022, after which the prevalence began to rise again.\u003c/p\u003e\u003cp\u003eFrom an annual variation perspective, both ASD and AND exhibit notable seasonal trends [Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e (c)]. For ASD, the prevalence shows an increasing trend from January to July and a decreasing trend from July to December, with multi-year average monthly maximum and minimum values of 0.060% and 0.035%, respectively. For AND, the prevalence rises from January to August and declines from August to December, with multi-year average monthly maximum and minimum values of 0.011% and 0.005%, respectively.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\u003ch2\u003e4.2 Analysis of environmental factors\u003c/h2\u003e\u003cp\u003eAccording to Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e (a), all the environmental factors exhibit positive correlations with ASD and AND except for W\u003csub\u003ev\u003c/sub\u003e. For both ASD and AND, the minimum temperature exhibits the strongest correlation, with correlation coefficients of 0.63 and 0.52, respectively (P\u0026thinsp;\u0026lt;\u0026thinsp;0.001). This may be because low temperatures can affect the human immune system, potentially leading to an abnormal enhancement of immune responses. Low temperatures are often accompanied by low humidity, which causes dry air, thereby exacerbating the dryness and sensitivity of the skin and respiratory tract. Conversely, the AOD factor shows the weakest correlation with these diseases, with coefficients of 0.29 and 0.12 (P\u0026thinsp;\u0026lt;\u0026thinsp;0.05), respectively. 
AOD primarily reflects the particulate matter concentration in the atmosphere, whereas allergic diseases are more influenced by specific allergens (such as pollen, dust mites, and mold), whose concentration and distribution may not be strongly related to aerosol concentrations.\u003c/p\u003e\u003cp\u003eThe relative importance of various environmental factors in influencing allergic disease is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e (b). Factors with higher bars contribute more to reducing uncertainty about allergic disease outcomes, suggesting they have a stronger impact on the incidence or severity of these conditions. According to Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e (b), the majority of environmental factors affecting ASD and AND in Urumqi exhibit similar information gain values, clustered within the narrow range of 6.6 to 7.0. This suggests that these factors contribute almost equally to reducing uncertainty in the model. However, Q\u003csub\u003emax\u003c/sub\u003e stands out with a slightly lower information gain value of 6.6 for both ASD and AND, indicating that it has a marginally smaller impact on the prediction of allergic disease outcomes than the other factors. Overall, the close proximity of the information gain values of the environmental factors highlights the multifaceted nature of environmental influences on allergic diseases. 
The input features of the final allergic disease model include a total of 12 environmental factors such as UVI, P\u003csub\u003esum\u003c/sub\u003e, and T\u003csub\u003emin\u003c/sub\u003e.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e\u003ch2\u003e4.3 Model performance evaluation\u003c/h2\u003e\u003cp\u003eAccording to Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e, for ASD prevalence prediction, the comprehensive performance of the five machine learning models on both the training and testing datasets is ranked from high to low as: DTR\u0026thinsp;\u0026gt;\u0026thinsp;RR\u0026thinsp;\u0026gt;\u0026thinsp;RFR\u0026thinsp;\u0026gt;\u0026thinsp;ENR\u0026thinsp;\u0026gt;\u0026thinsp;SVR. The DTR model outperformed all other models, with weighted MSE, MAE, and r values of 0.53, 0.58, and 0.72, respectively. For AND prevalence prediction, the comprehensive performance of the five machine learning models on both the training and testing datasets is ranked from high to low as: RR\u0026thinsp;\u0026gt;\u0026thinsp;SVR\u0026thinsp;\u0026gt;\u0026thinsp;RFR\u0026thinsp;\u0026gt;\u0026thinsp;ENR\u0026thinsp;\u0026gt;\u0026thinsp;DTR. The RR model outperformed all other models, with weighted MSE, MAE, and r values of 0.62, 0.54, and 0.61, respectively. Therefore, we selected the DTR and RR models for subsequent predictions of ASD and AND prevalence under different SSP scenarios. It is worth noting that the performance differences among the five machine learning models used to predict the two allergic diseases based on environmental factors were all within 0.2. This suggests that further performance improvements within the current framework may require the introduction of new features or the exploration of alternative models in future work.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eFigure S2 
shows the consistency between the actual prevalence and the prevalence predicted by the optimal model for both ASD and AND. In general, the two optimal machine learning models effectively capture the peaks and troughs in the prevalence trends of ASD and AND. For ASD prediction, the DTR model tends to overestimate high values and underestimate low values in the training dataset. In the testing dataset, the underestimation by the DTR model is more pronounced. For AND prediction, in the training dataset, the RR model mainly underestimates high values, while in the testing dataset, it primarily overestimates low values. Overall, the results demonstrate that the DTR and RR models perform well in predicting the prevalence of ASD and AND.\u003c/p\u003e\u003c/div\u003e"},{"header":"5. Discussion","content":"\u003cdiv id=\"Sec19\" class=\"Section2\"\u003e\u003ch2\u003e5.1 Projected environmental factors under three SSPs\u003c/h2\u003e\u003cp\u003eCMIP6 provides a unique opportunity to access data on variables such as temperature and precipitation under different future greenhouse gas emission scenarios [Fig. S3 (a) - (k)]. Among all environmental factors, the intensity of UVI significantly influences skin immune responses by altering individual sensitivity to allergens, thereby directly impacting the occurrence and severity of allergic reactions. According to Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, UVI is crucial for accurately predicting the prevalence of ASD and AND. Nevertheless, CMIP6 does not provide UVI data. Thus, in this study, we used a GLM to establish the relationship between UVI and the other 11 influencing factors. 
The results indicate that using AOD and O\u003csub\u003e3\u003c/sub\u003e as predictors for estimating UVI yields high model accuracy (as shown in Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e), with an R\u003csup\u003e2\u003c/sup\u003e of 0.77 (p\u0026thinsp;\u0026lt;\u0026thinsp;0.001). AOD reflects the ability of aerosol particles in the atmosphere to absorb and scatter UV radiation. These particles directly influence the transmittance of UV radiation, making AOD data crucial for understanding the propagation and distribution of UV radiation in the atmosphere (Hu et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2010\u003c/span\u003e). Ozone selectively absorbs UV radiation, particularly in the shorter-wavelength UVB and UVC regions. Combining AOD and ozone data thus provides accurate estimates of current UVI. Finally, using AOD and O\u003csub\u003e3\u003c/sub\u003e data from CMIP6 along with the GLM, we derived UVI data for the period 2024\u0026ndash;2099 under three SSP scenarios [Fig. S3 (k)].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec20\" class=\"Section2\"\u003e\u003ch2\u003e5.2 Projected ASD and AND prevalence under three SSPs\u003c/h2\u003e\u003cp\u003eBased on the trained optimal ASD and AND prediction models and meteorological datasets from CMIP6, the monthly changes in the prevalence of ASD and AND in Urumqi from 2024 to 2099 under three scenarios (SSP126, SSP370, and SSP585) are shown in Fig. S4. Under the three SSP scenarios, the prevalence of ASD and AND exhibits cyclical variations on a monthly scale. The prevalence rates gradually increase over certain periods, reach a peak, and then gradually decline to a low point. By December 2099, the prevalence of ASD is projected to reach 0.039%, 0.045%, and 0.045%, respectively. 
For ASD, the highest prevalence under the SSP126, SSP370, and SSP585 scenarios is projected to be 0.053% in May 2029, 0.053% in May 2094, and 0.053% in May 2091, respectively. The lowest prevalence under all three emission scenarios is 0.039%, occurring in January 2025, January 2030, and December 2031, respectively. For AND, by December 2099, the prevalence is projected to reach 0.007%, 0.009%, and 0.007% under the three emission scenarios, respectively.\u003c/p\u003e\u003cp\u003eThe annual variation in the prevalence of ASD and AND in Urumqi from 2015 to 2099 is illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e. For ASD, the annual variation shows a significant increase in prevalence under the SSP126, SSP370, and SSP585 scenarios (p\u0026thinsp;\u0026lt;\u0026thinsp;0.05). Under the SSP126 and SSP370 scenarios, the highest prevalence of ASD is observed in 2096, reaching 0.047% and 0.048%, respectively. Under the SSP585 scenario, the highest prevalence of ASD occurs in 2099, with a value of 0.049%. Generally, before 2025, the ASD prevalence is highest under the SSP126 scenario, followed by SSP370, and lowest under SSP585. As time progresses, the rate of increase in ASD prevalence under the SSP585 scenario becomes more pronounced, surpassing the other two scenarios after 2040.\u003c/p\u003e\u003cp\u003eFor AND, the prevalence under the SSP370 and SSP585 scenarios shows a significant increasing trend from 2015 to 2099 (p\u0026thinsp;\u0026lt;\u0026thinsp;0.05). Although the prevalence of AND under the SSP126 scenario also shows an increasing trend, it is not significant (p\u0026thinsp;\u0026gt;\u0026thinsp;0.05). Under the SSP126 scenario, the highest prevalence of AND occurs in 2097, reaching 0.008%. For the SSP370 scenario, the peak occurs in 2088 at 0.0087%, while under the SSP585 scenario, the highest prevalence is observed in 2080 at 0.009%. 
Overall, the prevalence of AND before 2073 is relatively similar across the three scenarios, with SSP126 and SSP370 showing slightly higher prevalence than SSP585. After 2073, the prevalence of AND under the SSP585 scenario increases significantly, peaking and clearly surpassing the prevalence under the other two scenarios. After 2090, the prevalence of AND in all three scenarios gradually converges. It is noteworthy that although the proportional increase in the prevalence of ASD and AND under future SSP scenarios appears modest, when considering Urumqi's population of 25.9\u0026nbsp;million in 2022, the projected peak numbers of individuals with ASD are expected to increase by 215,000, 243,200, and 275,600, respectively, compared to January 2015. Similarly, the projected peak numbers of individuals with AND are expected to increase by 38,900, 35,700, and 56,300, respectively, compared to January 2015 under the three climate scenarios. Thus, government departments should enhance health education and awareness, provide more medical services, and strengthen social support networks to effectively address the potential increase in the prevalence of ASD and AND in the future.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec21\" class=\"Section2\"\u003e\u003ch2\u003e5.3 Analysis of the importance of model features and collaborative benefits based on the geographical detector\u003c/h2\u003e\u003cp\u003eInvestigating the etiological mechanisms underlying allergic disease occurrence is crucial for elucidating key factors influencing its onset and dissemination. 
Thus, in this study, we employ the geographical detector to examine the impact of individual environmental factors and their interactions on the prevalence of ASD and AND, as illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e.\u003c/p\u003e\u003cp\u003eAs depicted in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e(a) \u0026ndash; (b), all environmental factors except Q\u003csub\u003esum\u003c/sub\u003e significantly influence ASD prevalence (P\u0026thinsp;\u0026lt;\u0026thinsp;0.01). Among these factors, T\u003csub\u003emin\u003c/sub\u003e exerts the greatest impact on prevalence (Q\u0026thinsp;=\u0026thinsp;0.446), followed by UVI, O\u003csub\u003e3\u003c/sub\u003e, and T\u003csub\u003emean\u003c/sub\u003e with Q values of 0.419, 0.403, and 0.396, respectively. Conversely, AOD exhibits the least influence on ASD prevalence among all factors (Q\u0026thinsp;=\u0026thinsp;0.124). For AND, all environmental factors demonstrate significant effects on its occurrence (P\u0026thinsp;\u0026lt;\u0026thinsp;0.01). T\u003csub\u003emin\u003c/sub\u003e has the strongest influence on AND prevalence (Q\u0026thinsp;=\u0026thinsp;0.345), followed by O\u003csub\u003e3\u003c/sub\u003e, Q\u003csub\u003esum\u003c/sub\u003e, and T\u003csub\u003emean\u003c/sub\u003e, with Q values of 0.325, 0.325, and 0.324, respectively. At lower environmental temperatures, increased skin moisture loss weakens the skin barrier function, rendering it more susceptible to external allergens and exacerbating symptoms of ASD. Simultaneously, dry nasal mucosa and increased nasal airway resistance under cold conditions aggravate manifestations of AND. 
Additionally, cold temperatures may induce nasal vasoconstriction, reducing nasal blood flow and local immune cell activity, thereby impairing their ability to clear allergens and further intensifying episodes of AND.\u003c/p\u003e\u003cp\u003eThe occurrence of allergic diseases is typically influenced by multiple factors. The interaction detector captures the interactions among environmental factors more precisely, thereby revealing their collective impact on allergic diseases. According to Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e (c) \u0026ndash; (d), the interactions between AOD and factors such as T\u003csub\u003emin\u003c/sub\u003e, Q\u003csub\u003emax\u003c/sub\u003e, Q\u003csub\u003esum\u003c/sub\u003e, P\u003csub\u003emax\u003c/sub\u003e, and P\u003csub\u003esum\u003c/sub\u003e exhibit nonlinear enhancement of ASD prevalence. This indicates that the impact of these factors on ASD is not simply additive: their joint explanatory power exceeds the sum of their individual effects. Additionally, interactions of the other environmental factors show a dual-factor enhancement effect on ASD prevalence, meaning that their combined influence exceeds that of either factor alone. Among all interactions, that between temperature and total runoff has the highest impact on the prevalence of ASD, with a value of 0.599. The interaction between AOD and maximum runoff has the smallest impact on ASD prevalence, with a value of 0.314.\u003c/p\u003e\u003cp\u003eFor AND, aside from the nonlinear enhancement observed in the interactions between AOD and other environmental factors, as well as total runoff and factors including T\u003csub\u003emax\u003c/sub\u003e, Q\u003csub\u003esum\u003c/sub\u003e, P\u003csub\u003emax\u003c/sub\u003e, and W\u003csub\u003eU\u003c/sub\u003e, all other interactions involving environmental variables show dual-factor enhancement. 
Specifically, the interaction between O\u003csub\u003e3\u003c/sub\u003e and Q\u003csub\u003esum\u003c/sub\u003e has the highest impact on AND prevalence, with a value of 0.590. Conversely, the interaction between P\u003csub\u003esum\u003c/sub\u003e and W\u003csub\u003eV\u003c/sub\u003e has the lowest impact on AND prevalence, with a value of 0.292. Overall, according to Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e(c) \u0026ndash; (d), the pairwise interaction of environmental factors contributes to the accurate estimation of the prevalence of ASD and AND, indirectly indicating the feasibility of using environmental factors to predict allergic diseases.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e"},{"header":"6. Conclusion","content":"\u003cp\u003eIn this study, we established a machine learning-based scheme for the prediction of ASD and AND prevalence and forecasted the variations in ASD and AND prevalence in Urumqi from 2015 to 2099 under three different climate change scenarios. The results demonstrate that machine learning models can effectively predict the prevalence of ASD and AND using environmental and hydrological data. DTR emerged as the most effective model for predicting ASD, while RR performed best for AND. The analysis revealed that minimum temperature is the most influential factor affecting both diseases. Projections under three climate change scenarios indicate a significant increase in ASD and AND prevalence in Urumqi from 2015 to 2099. Specifically, the peak number of ASD cases under the three climate change scenarios is expected to rise by between 215,000 and 275,600, while AND cases are projected to rise by between 35,700 and 56,300. These results underscore the critical need for integrating predictive models into public health strategies to anticipate and manage future changes in allergic disease patterns, ensuring effective resource allocation and intervention planning. 
Overall, the scheme proposed in this paper demonstrates the potential of machine learning models to forecast the prevalence of ASD and AND, which could aid decision-making departments in implementing timely countermeasures and in effective resource allocation.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cb\u003eEthical considerations\u003c/b\u003e\u003c/p\u003e\u003cp\u003eThis is an observational study that has confirmed that ethical approval is not required.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eCompeting interest\u003c/strong\u003e\u003cp\u003eThe authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.\u003c/p\u003e\u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e\u003cp\u003eThis work was supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region - Distinguished Young Scholars (2022D01E81), Xinjiang Uygur Autonomous Region Tianshan Talent Training Program (2022TSYCCX0109), and Tianshan Innovation Team Fund Project of Xinjiang Uygur Autonomous Region (2023D14005).\u003c/p\u003e\u003ch2\u003eAuthor contributions\u003c/h2\u003e\u003cp\u003eFengxia Hu: Data curation, Methodology, Writing \u0026ndash; original draft, Conceptualization. Junqin Liang: Conceptualization, Writing \u0026ndash; review \u0026amp; editing, Methodology. Jin Zhang: Conceptualization, Validation, Supervision. Yizhou Li: Software, Data curation. Xiaoyu Zhang: Visualization, Investigation. 
Qian Wang: Visualization, Investigation.\u003c/p\u003e\u003ch2\u003eAcknowledgments\u003c/h2\u003e\u003cp\u003eWe thank Peng Liu from the information department of our hospital and Hongbo Ling from the Xinjiang Institute of Ecology and Geography for their assistance during the data extraction process.\u003c/p\u003e\u003ch2\u003eData availability\u003c/h2\u003e\u003cp\u003eData will be made available on request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAwad M, Khanna R (2015) Efficient Learning Machines. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/978-1-4302-5990-9\u003c/span\u003e\u003cspan address=\"10.1007/978-1-4302-5990-9\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChen D, Sun W, Shi J, Johnson BA, Tan ML, Pan Q, Li W, Yang X, Zhang F (2024) Utilizing GaoFen-2 derived urban green space information to predict local surface temperature. Urban For Urban Green 99:128463. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ufug.2024.128463\u003c/span\u003e\u003cspan address=\"10.1016/j.ufug.2024.128463\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCheng X, Frank U, Zhao F, Capella JR, Winkler JB, Schnitzler J-P, Ghirardo A, Bertic M, Estrella N, Durner J, Pritsch K (2023) Plant growth traits and allergenic potential of Ambrosia artemisiifolia pollen as modified by temperature and NO2. Environ Exp Bot 206:105193. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.envexpbot.2022.105193\u003c/span\u003e\u003cspan address=\"10.1016/j.envexpbot.2022.105193\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eConway AE, Verdi M, Kartha N, Maddukuri C, Anagnostou A, Abrams EM, Bansal P, Bukstein D, Nowak-Wegrzyn A, Oppenheimer J, Madan JC, Garnaat SL, Bernstein JA, Shaker MS (2024) Allergic Diseases and Mental Health. J Allergy Clin Immunol -Pract 12:2298\u0026ndash;2309. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.jaip.2024.05.049\u003c/span\u003e\u003cspan address=\"10.1016/j.jaip.2024.05.049\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ede Marco R, Poli A, Ferrari M, Accordini S, Giammanco G, Bugiani M, Villani S, Ponzio M, Bono R, Carrozzi L, Cavallini R, Cazzoletti L, Dallari R, Ginesu F, Lauriola P, Mandrioli P, Perfetti L, Pignato S, Pirina P, Struzzo P (2002) The impact of climate and traffic-related NO2 on the prevalence of asthma and allergic rhinitis in Italy. Clin Exp Allergy 32:1405\u0026ndash;1412. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1046/j.1365-2745.2002.01466.x\u003c/span\u003e\u003cspan address=\"10.1046/j.1365-2745.2002.01466.x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGidden MJ, Riahi K, Smith SJ, Fujimori S, Luderer G, Kriegler E, van Vuuren DP, van den Berg M, Feng L, Klein D, Calvin K, Doelman JC, Frank S, Fricko O, Harmsen M, Hasegawa T, Havlik P, Hilaire J, Hoesly R, Horing J, Popp A, Stehfest E, Takahashi K (2019) Global emissions pathways under different socioeconomic scenarios for use in CMIP6: a dataset of harmonized emissions trajectories through the end of the century. Geosci Model Dev 12:1443\u0026ndash;1475. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.5194/gmd-12-1443-2019\u003c/span\u003e\u003cspan address=\"10.5194/gmd-12-1443-2019\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGrant TL, Wood RA, Chapman MD (2023) Indoor Environmental Exposures and Their Relationship to Allergic Diseases. J Allergy Clin Immunol -Pract 11:2963\u0026ndash;2970. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.jaip.2023.08.034\u003c/span\u003e\u003cspan address=\"10.1016/j.jaip.2023.08.034\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHans C (2011) Elastic Net Regression Modeling With the Orthant Normal Prior. J Am Stat Assoc 106:1383\u0026ndash;1393. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1198/jasa.2011.tm09241\u003c/span\u003e\u003cspan address=\"10.1198/jasa.2011.tm09241\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHersbach H, Bell B, Berrisford P, Hirahara S, Horanyi A, Munoz-Sabater J, Nicolas J, Peubey C, Radu R, Schepers D, Simmons A, Soci C, Abdalla S, Abellan X, Balsamo G, Bechtold P, Biavati G, Bidlot J, Bonavita M, De Chiara G, Dahlgren P, Dee D, Diamantakis M, Dragani R, Flemming J, Forbes R, Fuentes M, Geer A, Haimberger L, Healy S, Hogan RJ, Holm E, Janiskova M, Keeley S, Laloyaux P, Lopez P, Lupu C, Radnoti G, de Rosnay P, Rozum I, Vamborg F, Villaume S, Thepaut J-N (2020) The ERA5 global reanalysis. Q J R Meteorol Soc 146:1999\u0026ndash;2049. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/qj.3803\u003c/span\u003e\u003cspan address=\"10.1002/qj.3803\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHoerl A, Kennard R (1970) Ridge Regression - Applications to Nonorthogonal Problems. Technometrics 12:69. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2307/1267352\u003c/span\u003e\u003cspan address=\"10.2307/1267352\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHu B, Wang Y, Liu G (2010) Properties of ultraviolet radiation and the relationship between ultraviolet radiation and aerosol optical depth in China. Atmos Res 98:297\u0026ndash;308. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.atmosres.2010.07.009\u003c/span\u003e\u003cspan address=\"10.1016/j.atmosres.2010.07.009\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHu Y, Jiang F, Tan J, Liu S, Li S, Wu M, Yan C, Yu G, Yi H, Yin Y, Tong S (2022) Environmental Exposure and Childhood Atopic Dermatitis in Shanghai: A Season-Stratified Time-Series Analysis. Dermatology 238:101\u0026ndash;108. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1159/000514685\u003c/span\u003e\u003cspan address=\"10.1159/000514685\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHuang C-F, Chie W-C, Wang I-J (2021) Effect of environmental exposures on allergen sensitization and the development of childhood allergic diseases: A large-scale population-based study. World Allergy Organ J 14:100495. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.waojou.2020.100495\u003c/span\u003e\u003cspan address=\"10.1016/j.waojou.2020.100495\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHuang J, Zheng W, Huang H, Ran Y, Liu Y, Huang P (2023) Particulate matter, nitrogen dioxide, and sulfur dioxide and their associations with allergic skin diseases: A systematic review and meta-analysis. Atmos Pollut Res 14:101804. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.apr.2023.101804\u003c/span\u003e\u003cspan address=\"10.1016/j.apr.2023.101804\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHuang Y, Wen H-J, Guo Y-LL, Wei T-Y, Wang W-C, Tsai S-F, Tseng VS, Wang S-LJ (2021) Prenatal exposure to air pollutants and childhood atopic dermatitis and allergic rhinitis adopting machine learning approaches: 14-year follow-up birth cohort study. Sci Total Environ 777:145982. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.scitotenv.2021.145982\u003c/span\u003e\u003cspan address=\"10.1016/j.scitotenv.2021.145982\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKim B-J, Kwon J-W, Seo J-H, Kim H-B, Lee S-Y, Park K-S, Yu J, Kim H-C, Leem J-H, Sakong J, Kim S-Y, Lee C-G, Kang D-M, Ha M, Hong Y-C, Kwon H-J, Hong S-J (2011) Association of ozone exposure with asthma, allergic rhinitis, and allergic sensitization. Ann Allergy Asthma Immunol 107:214\u0026ndash;219. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.anai.2011.05.025\u003c/span\u003e\u003cspan address=\"10.1016/j.anai.2011.05.025\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLack G (2001) Pediatric allergic rhinitis and comorbid disorders. J Allergy Clin Immunol 108:S9\u0026ndash;S15. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1067/mai.2001.115562\u003c/span\u003e\u003cspan address=\"10.1067/mai.2001.115562\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLam HCY, Anees-Hill S, Satchwell J, Symon F, Macintyre H, Pashley CH, Marczylo EL, Douglas P, Aldridge S, Hansell A (2024) Association between ambient temperature and common allergenic pollen and fungal spores: A 52-year analysis in central England, United Kingdom. Sci Total Environ 906:167607. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.scitotenv.2023.167607\u003c/span\u003e\u003cspan address=\"10.1016/j.scitotenv.2023.167607\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLin L, Chen Y, Wei J, Wu Shengchi, Wu Shu, Jing J, Dong G, Cai L (2022) The associations between residential greenness and allergic diseases in Chinese toddlers: A birth cohort study. Environ Res 214:114003. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.envres.2022.114003\u003c/span\u003e\u003cspan address=\"10.1016/j.envres.2022.114003\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLovato T, Peano D, Butenschon M, Materia S, Iovino D, Scoccimarro E, Fogli PG, Cherchi A, Bellucci A, Gualdi S, Masina S, Navarra A (2022) CMIP6 Simulations With the CMCC Earth System Model (CMCC-ESM2). J Adv Model Earth Syst 14:e2021MS002814. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1029/2021MS002814\u003c/span\u003e\u003cspan address=\"10.1029/2021MS002814\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMartinez BA, Shrotri S, Kingsmore KM, Bachali P, Grammer AC, Lipsky PE (2022) Machine learning reveals distinct gene signature profiles in lesional and nonlesional regions of inflammatory skin diseases. Sci Adv 8:eabn4776. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1126/sciadv.abn4776\u003c/span\u003e\u003cspan address=\"10.1126/sciadv.abn4776\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMeehl GA, Boer GJ, Covey C, Latif M, Stouffer RJ (2000) The coupled model intercomparison project (CMIP). Bull Am Meteorol Soc 81:313\u0026ndash;318\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMunoz-Sabater J, Dutra E, Agusti-Panareda A, Albergel C, Arduini G, Balsamo G, Boussetta S, Choulga M, Harrigan S, Hersbach H, Martens B, Miralles DG, Piles M, Rodriguez-Fernandez NJ, Zsoter E, Buontempo C, Thepaut J-N (2021) ERA5-Land: a state-of-the-art global reanalysis dataset for land applications. Earth Syst Sci Data 13:4349\u0026ndash;4383. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.5194/essd-13-4349-2021\u003c/span\u003e\u003cspan address=\"10.5194/essd-13-4349-2021\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eNelder JA, Wedderburn RWM (1972) Generalized Linear Models. J Royal Stat Soc Ser A 135:370\u0026ndash;384. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2307/2344614\u003c/span\u003e\u003cspan address=\"10.2307/2344614\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eProsperi MCF, Marinho S, Simpson A, Custovic A, Buchan IE (2014) Predicting phenotypes of asthma and eczema with machine learning. BMC Med Genomics 7:S7. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/1755-8794-7-S1-S7\u003c/span\u003e\u003cspan address=\"10.1186/1755-8794-7-S1-S7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRathore SS, Kumar S (2016) A Decision Tree Regression based Approach for the Number of Software Faults Prediction. SIGSOFT Softw Eng Notes 41:1\u0026ndash;6. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1145/2853073.2853083\u003c/span\u003e\u003cspan address=\"10.1145/2853073.2853083\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRodriguez-Galiano V, Sanchez-Castillo M, Chica-Olmo M, Chica-Rivas M (2015) Machine learning predictive models for mineral prospectivity: An evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geol Rev 71:804\u0026ndash;818. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.oregeorev.2015.01.001\u003c/span\u003e\u003cspan address=\"10.1016/j.oregeorev.2015.01.001\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSchafer T, Ring J (1997) Epidemiology of allergic diseases. Allergy 52:14\u0026ndash;22. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/j.1398-9995.1997.tb04864.x\u003c/span\u003e\u003cspan address=\"10.1111/j.1398-9995.1997.tb04864.x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eShamji MH, Ollert M, Adcock IM, Bennett O, Favaro A, Sarama R, Riggioni C, Annesi-Maesano I, Custovic A, Fontanella S, Traidl-Hoffmann C, Nadeau K, Cecchi L, Zemelka-Wiacek M, Akdis CA, Jutel M, Agache I (2023) EAACI guidelines on environmental science in allergic diseases and asthma - Leveraging artificial intelligence and machine learning to develop a causality model in exposomics. Allergy 78:1742\u0026ndash;1757. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/all.15667\u003c/span\u003e\u003cspan address=\"10.1111/all.15667\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eShannon C (1948) A Mathematical Theory of Communication. Bell Syst Techn J 27:379\u0026ndash;423. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/j.1538-7305.1948.tb01338.x\u003c/span\u003e\u003cspan address=\"10.1002/j.1538-7305.1948.tb01338.x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eShen J, Ke X, Hong S, Zeng Q, Liang C, Li T, Tang A (2011) Epidemiological features of allergic rhinitis in four major cities in Western China. J Huazhong Univ Sci Tech -Med 31:433\u0026ndash;440. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s11596-011-0469-1\u003c/span\u003e\u003cspan address=\"10.1007/s11596-011-0469-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSidikjan N, Eziz M, Wang Y (2022) Spatial Distribution, Contamination Levels, and Health Risks of Trace Elements in Topsoil along an Urbanization Gradient in the City of Urumqi, China. Sustainability 14:12646. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/su141912646\u003c/span\u003e\u003cspan address=\"10.3390/su141912646\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSobieraj K, Grewling L, Bogawski P (2024) Assessing allergy risk from ornamental trees in a city: Integrating open access remote sensing data with pollen measurements. J Environ Manage 367:122051. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.jenvman.2024.122051\u003c/span\u003e\u003cspan address=\"10.1016/j.jenvman.2024.122051\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSong Y, Wu P (2021) An interactive detector for spatial associations. Int J Geogr Inf Sci 35:1676\u0026ndash;1701. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1080/13658816.2021.1882680\u003c/span\u003e\u003cspan address=\"10.1080/13658816.2021.1882680\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSvensson A, Ofenloch RF, Bruze M, Naldi L, Cazzaniga S, Elsner P, Goncalo M, Schuttelaar M-LA, Diepgen TL (2018) Prevalence of skin disease in a population-based sample of adults from five European countries. Br J Dermatol 178:1111\u0026ndash;1118. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/bjd.16248\u003c/span\u003e\u003cspan address=\"10.1111/bjd.16248\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTang HHF, Sly PD, Holt PG, Holt KE, Inouye M (2020) Systems biology and big data in asthma and allergy: recent discoveries and emerging challenges. Eur Resp J 55:1900844. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1183/13993003.00844-2019\u003c/span\u003e\u003cspan address=\"10.1183/13993003.00844-2019\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTang Z, Li S, Shen M, Xiao Y, Su J, Tao J, Wang X, Shan S, Kang X, Wu B, Zou B, Chen X (2022) Association of exposure to artificial light at night with atopic diseases: A cross-sectional study in college students. Int J Hyg Environ Health 241:113932. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ijheh.2022.113932\u003c/span\u003e\u003cspan address=\"10.1016/j.ijheh.2022.113932\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eToms JD, Lesperance ML (2003) Piecewise regression: A tool for identifying ecological thresholds. Ecology 84:2034\u0026ndash;2041. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1890/02-0472\u003c/span\u003e\u003cspan address=\"10.1890/02-0472\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTsai M-H, Shih H-J, Su K-W, Liao S-L, Hua M-C, Yao T-C, Lai S-H, Yeh K-W, Chen L-C, Huang J-L, Chiu C-Y (2022) Nasopharyngeal microbial profiles associated with the risk of airway allergies in early childhood. J Microbiol Immunol Infect 55:777\u0026ndash;785. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.jmii.2022.01.006\u003c/span\u003e\u003cspan address=\"10.1016/j.jmii.2022.01.006\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWang XD, Zheng M, Lou HF, Wang CS, Zhang Y, Bo MY, Ge SQ, Zhang N, Zhang L, Bachert C (2016) An increased prevalence of self-reported allergic rhinitis in major Chinese cities from 2005 to 2011. Allergy 71:1170\u0026ndash;1180. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/all.12874\u003c/span\u003e\u003cspan address=\"10.1111/all.12874\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWang Y, Liu T, Wan Z, Wang L, Hou J, Shi M, Tsui SKW (2023) Investigating causal relationships between the gut microbiota and allergic diseases: A mendelian randomization study. Front Genet 14:1153847. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fgene.2023.1153847\u003c/span\u003e\u003cspan address=\"10.3389/fgene.2023.1153847\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWeikl F, Radl V, Munch JC, Pritsch K (2015) Targeting allergenic fungi in agricultural environments aids the identification of major sources and potential risks for human health. Sci Total Environ 529:223\u0026ndash;230. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.scitotenv.2015.05.056\u003c/span\u003e\u003cspan address=\"10.1016/j.scitotenv.2015.05.056\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWeller K, Maurer M, Bauer A, Wedi B, Wagner N, Schliemann S, Kramps T, Baeumer D, Multmeier J, Hillmann E, Staubach P (2022) Epidemiology, comorbidities, and healthcare utilization of patients with chronic urticaria in Germany. J Eur Acad Dermatol Venereol 36:91\u0026ndash;99. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/jdv.17724\u003c/span\u003e\u003cspan address=\"10.1111/jdv.17724\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eXu Y, Li Y (2024) Association between lipid-lowering drugs and allergic diseases: A Mendelian randomization study. World Allergy Organ J 17:100899. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.waojou.2024.100899\u003c/span\u003e\u003cspan address=\"10.1016/j.waojou.2024.100899\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhu L, Meng J, Zhu L (2020) Applying Geodetector to disentangle the contributions of natural and anthropogenic factors to NDVI variations in the middle reaches of the Heihe River Basin. Ecol Indic 117:106545. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ecolind.2020.106545\u003c/span\u003e\u003cspan address=\"10.1016/j.ecolind.2020.106545\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZiaee A, Zia M, Goli M (2018) Identification of saprophytic and allergenic fungi in indoor and outdoor environments. Environ Monit Assess 190:574. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s10661-018-6952-4\u003c/span\u003e\u003cspan address=\"10.1007/s10661-018-6952-4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":true,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"international-journal-of-biometeorology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"ijbm","sideBox":"Learn more about [International Journal of Biometeorology](http://link.springer.com/journal/484)","snPcode":"484","submissionUrl":"https://www.editorialmanager.com/ijbm/default2.aspx","title":"International Journal of Biometeorology","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"Allergic diseases, changing environments, machine learning, Shared Socioeconomic Pathways (SSPs), geographical detector","lastPublishedDoi":"10.21203/rs.3.rs-6938034/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6938034/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eTimely understanding the prevalence of allergic skin diseases (ASD) and allergic nasopharyngeal disease (AND) is essential for effective public health planning and resource allocation. However, accurately predicting ASD and AND poses a significant challenge due to the complex interplay of environmental and individual factors. A machine learning-based scheme was proposed for predicting the prevalence of ASD and AND using environmental and hydrological data (n\u0026thinsp;=\u0026thinsp;85). Significant variations in predictive accuracy were observed across different algorithms. For ASD, the decision tree regression (DTR) demonstrated the best performance. For AND, the ridge regression (RR) model yielded the best results, respectively. Based on Urumqi's 2022 population, the projected peak number of individuals with ASD is expected to rise by 215,000, 243,200, and 275,600 compared to January 2015. For AND, the projected peak increases are expected to be 38,900, 35,700, and 56,300, respectively. 
Environmental factors exhibit significant correlations with the prevalence of ASD and AND, with minimum temperature identified as the most influential factor affecting both conditions. Machine learning models that incorporate these environmental variables were proven to effectively predict the prevalence of both conditions. Based on the model's projections under three climate change scenarios, a significant increase in the prevalence of ASD and AND in Urumqi is expected from 2015 to 2099. This trend underscores the potential impact of climate change on public health in the region, highlighting the need for proactive measures to address these emerging challenges.\u003c/p\u003e","manuscriptTitle":"Projected typical allergic diseases prevalence under changing environments based on multiple machine learning models","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-07-25 11:07:05","doi":"10.21203/rs.3.rs-6938034/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"reviewerAgreed","content":"","date":"2025-07-24T03:08:44+00:00","index":0,"fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-07-23T12:55:28+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-06-20T13:32:24+00:00","index":"","fulltext":""},{"type":"submitted","content":"International Journal of Biometeorology","date":"2025-06-20T06:32:02+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"international-journal-of-biometeorology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"ijbm","sideBox":"Learn more about [International Journal of Biometeorology](http://link.springer.com/journal/484)","snPcode":"484","submissionUrl":"https://www.editorialmanager.com/ijbm/default2.aspx","title":"International Journal of Biometeorology","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"c98d3a0c-ae8d-40ad-8998-9ba9010a7a57","owner":[],"postedDate":"July 25th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2026-02-16T16:03:18+00:00","versionOfRecord":{"articleIdentity":"rs-6938034","link":"https://doi.org/10.1007/s00484-026-03127-2","journal":{"identity":"international-journal-of-biometeorology","isVorOnly":false,"title":"International Journal of Biometeorology"},"publishedOn":"2026-02-11 15:57:59","publishedOnDateReadable":"February 11th, 2026"},"versionCreatedAt":"2025-07-25 11:07:05","video":"","vorDoi":"10.1007/s00484-026-03127-2","vorDoiUrl":"https://doi.org/10.1007/s00484-026-03127-2","workflowStages":[]},"version":"v1","identity":"rs-6938034","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6938034","identity":"rs-6938034","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}