Quantifying Uncertainty in Indoor Radon Exposure Estimates in Pennsylvania with Quantile Regression Forests
Heechan Lee, Dakotah Maguire, Jeremy Logan, Greeshma Agasthya, and 2 more
This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-6857670/v1 This work is licensed under a CC BY 4.0 License. Status: Published. Journal Publication published 05 Mar, 2026; the published version is in Scientific Reports. Version 1 posted 11
Abstract
Background: Radon is a naturally occurring radioactive gas that poses a serious health risk as the primary cause of lung cancer in non-smokers. Despite the well-known adverse association with health outcomes, current radon exposure assessments are limited to county-level or average-level estimates, which fail to capture regional variability. Objective: This study aims to capture the regional variability at ZCTA-level.
Methods: This study uses machine learning (ML) models, including random forest (RF) and quantile regression forest (QRF), to predict indoor radon concentrations at the ZIP Code Tabulation Area (ZCTA) level and to characterize uncertainties in model estimates. Incorporating geological, meteorological, and building-specific data, the models aim to improve radon risk assessment by capturing mean exposure, variability, and extreme concentration levels. Processed radon test data (n = 718,111) were analyzed using average, variability, and quantile prediction methods. Results: Models that predict the average radon exposure at the ZCTA level can yield promising model-fit results, but they do not capture the underlying variability of indoor radon exposure within a ZCTA. We use volatility analyses to identify characteristics indicative of high variability in indoor radon exposure. We also show that a QRF model can be used to predict upper quantiles of residential radon exposure, thereby uncovering localized areas of elevated exposure that were not apparent in mean estimates. The results highlight the need for a deeper characterization of exposure risk and show that regions with moderate average exposure levels can still harbor extreme outliers, with implications for evaluating health risks. Conclusion: Using multiple radon exposure models allows for a deeper characterization of radon risk within a geographic area and can better identify high-risk areas. The results from this study provide a foundation for developing mitigation strategies and examining associations between radon exposure and health outcomes at fine scales. Future research should extend the geographic scope and incorporate additional environmental risk factors to establish a comprehensive framework for risk assessment.
Earth and environmental sciences/Environmental sciences, Health sciences/Risk factors, Radon, Geology, Prediction model, Machine Learning, ZCTA-level predictions, environmental health
1. Introduction
As a naturally occurring radioactive gas, radon is a major contributor to background radiation exposure 1 and a significant health concern as the second leading cause of lung cancer overall and the leading cause of lung cancer in nonsmokers. 2 Seeping undetected into dwellings through the ground, this colorless and odorless gas poses unique risks because radon's decay products can attach to lung tissue and emit ionizing radiation that initiates carcinogenic processes. 3 Recent studies have also suggested potential links between radon exposure and cardiovascular disease 4,5 and its potential interaction with PM 2.5, 5,6 findings that could broaden the understanding of radon's health impacts. Despite the well-established links between radon exposure and adverse health outcomes, 7-10 radon measurement and exposure assessment often involve significant uncertainties. 11,12 Indoor radon concentrations can vary significantly, even between neighboring houses, owing to differences in soil composition, building materials, and ventilation systems. 13,14 Moreover, radon estimates are currently limited to county-level data, which insufficiently captures the localized variability of radon exposure. This limitation poses challenges for effective public health interventions and epidemiological research. This study applied multiple machine learning models to predict indoor radon concentrations at the zip-code scale and to examine factors contributing to uncertainty in these predictions.
This approach hypothesizes that integrating geological, meteorological, and building-specific factors improves the accuracy of radon concentration predictions at the zip-code scale while identifying areas where mean or median predictions may underestimate exposure for a large percentage of the population. By combining these models, this study offers a more refined approach for radon risk assessment and targeted mitigation strategies. Estimating indoor radon exposure at finer spatial scales and quantifying the uncertainty in these estimates can also enhance the precision of environmental exposure data for epidemiological studies investigating the links between indoor radon exposure and health outcomes. Indoor radon tests were used to develop predictive models at the zip-code tabulation area (ZCTA) scale, with the overarching goal of providing more accurate assessments of risks posed by indoor radon exposure. 2. Background 2.1 Characteristics of Radon Radon is a radioactive gas formed from the natural decay of uranium, which is commonly found in rocks, soil, and certain building materials. Inhalation of radon and its progeny exposes biological tissue to ionizing radiation, which has the potential to induce DNA damage and increase the risk of cancer, with lung cancer being the highest potential risk. 8 , 15 Radon-222, the isotope predominantly linked with radon-induced health risks, has a half-life of ~ 3.8 days and produces radioactive progeny, including polonium-218 and polonium-214. Because radon gas is colorless and odorless, specialized instruments are required to detect and measure it. Radon is primarily released from uranium-rich igneous rocks (e.g., granite) and sedimentary formations (e.g., phosphate rocks, shales) with higher uranium content. Limestone, although generally low in uranium content, can also emit radon in trace amounts. 16 , 17 Building materials, including concrete and gypsum wallboard, may also emit radon if they contain traces of uranium. 
18 , 19 Because radon gas is significantly denser than air, once emitted, it tends to accumulate in the lowest areas of buildings (e.g., basements, below-ground spaces), although elevated radon levels can also be found in poorly ventilated ground-level rooms. 20 , 21 2.2 Variables Affecting Indoor Radon Concentrations Radon concentration at a specific location is not a random occurrence but is determined by a complex interplay of geological, structural/architectural, and meteorological factors. These factors can either exacerbate or attenuate radon levels and ultimately influence the extent of indoor radon exposure. 2.2.1 Geological Factors Radon is a geochemically generated gas that can exhibit significant spatial variability caused by underlying geological conditions. Elevated radon conditions are typically seen in regions with geologic formations that have elevated uranium levels—notably in granite, phosphate rocks, and shales. 16 , 17 , 22 Additionally, the physical properties of soil, including its density, porosity, and moisture levels, are key determinants of radon migration from the subsurface to the atmosphere. 17 , 22 , 23 For example, clay-rich soils tend to inhibit radon diffusion and release, whereas sandy soils facilitate radon mobility. Moreover, geological discontinuities (e.g., faults and fractures) further enhance the transport of radon by providing pathways for upward migration of gas into the environment. 17 , 24 , 25 2.2.2 Structural and Architectural Factors The entry and accumulation of radon within buildings are strongly influenced by architectural and structural factors. The construction materials themselves may serve as sources of radon emissions. For example, stone foundations can emit higher levels of radon compared to foundations made of other materials. 21 Older buildings that suffer from structural degradation and the presence of cracks typically exhibit higher radon concentrations due to an increased number of infiltration points. 
26 Ventilation systems play a significant role in moderating indoor radon levels, and well-ventilated spaces dilute radon concentrations more effectively. 21 , 26 Furthermore, numerous other factors (as derived from census data) could contribute to indoor radon concentration, including the number of housing units, their occupancy status, structural attributes (e.g., the number of units in a structure), and type of heating fuel used. 20 , 27 2.2.3 Meteorological Factors Meteorological conditions play an important role in fluctuating levels of indoor radon concentration. Temperature and atmospheric pressure affect radon's movement from the ground into buildings. For example, low atmospheric pressure can increase radon entry, and temperature differences between indoor areas and outdoor areas can result in pressure differences that facilitate the entry of radon into a structure. 28 Seasonal changes further impact indoor radon levels, and concentrations are often higher during the winter than during the summer. 29 , 30 Furthermore, weather patterns, including precipitation, can also change radon concentrations. Heavy rainfall can increase soil moisture and potentially reduce radon emissions, whereas dry conditions can allow for greater radon emissions from the dry soil. 31 , 32 2.3 Previous Radon Level Prediction Models Previous studies have employed different modeling approaches to estimate radon concentrations across multiple geographic regions. The Environmental Protection Agency’s Radon Zone project 33 classified counties based on geological and soil characteristics, offering a broad but useful categorization. Subsequent studies have used more sophisticated statistical methods. Price et al. 34 , 35 employed Bayesian models that incorporated geological data to improve county-level predictions of radon concentrations in Minnesota and mid-Atlantic states. 
Mose and Mushrush 23 explored the correlation between soil radon levels, permeability, and indoor radon levels, emphasizing the complexity of radon entry into homes. Apte et al. 36 employed mixed-effects regression models for predicting radon levels in New Hampshire, accounting for housing types and geological features. Similarly, Smith and Field 14 developed a Bayesian hierarchical model to predict residential radon in Iowa by combining regional geological data and housing characteristics, and Casey et al. 37 analyzed how geology, well water usage, and other factors influenced radon levels in Pennsylvania. Recent studies have also applied machine learning techniques to more accurately predict radon levels. Kropat et al. 38 and Nikkilä et al. 39 used random forests (RFs) and other ensemble methods to predict radon concentrations in Switzerland and Finland, respectively, by incorporating detailed geological data. Dai et al. 40 identified geological fault zones and housing characteristics as critical factors for radon risk in Georgia. Li et al. 41 introduced an ensemble-based machine learning model that integrated multiple data types to predict monthly radon concentrations at the ZCTA level in the greater Boston area. These advanced approaches allow for the integration of environmental, architectural, and demographic factors, thereby enabling more comprehensive and nuanced radon risk assessments. 3. Methods This study employed machine learning techniques, specifically RF and quantile regression forest (QRF), to estimate the mean and quantiles of radon concentrations. We also used a volatility model framework and modeled the relative variability of radon at the ZCTA level. Residential radon values are from pre-mitigation indoor home radon tests. Predictive attributes include the physical characteristics known to affect radon exposure, and they were measured at the ZCTA level (elevation, soil, hydrologic, meteorological, and census data). 
Data were integrated to create predictive models that incorporate information from over 60 features related to residential radon exposure. These datasets were preprocessed and aggregated by using the H3 spatial indexing system to allow for a common spatial scale for all datasets using workflows developed for the Centralized Health and Environmental Repository (C-HER). 42 The application of C-HER workflows enabled the harmonization of data across multiple spatial scales into a single reference scale. Detailed descriptions of the data processing methodologies for each dataset are provided in the associated methods white paper. 43 A concise overview of these methodologies is included in this paper for reference. 3.1 Data Processing 3.1.1 Residential Radon Values Indoor radon test data collected from 2008 to 2021 were provided by the Pennsylvania Department of Environmental Protection (n = 1,622,169). 44 Pennsylvania was selected as the study area owing to the high prevalence of radon in the state. The dataset includes detailed information such as county, residential address postal code, building purpose, test floor level, and test date. The residential radon measures from 2008 to 2021 showed a 5.92 pCi/L average, a 2.60 pCi/L median, 1.40 pCi/L for the 25th percentile, and 5.40 pCi/L for the 75th percentile. In this study, several exclusion criteria were applied to refine the dataset (Fig. 1 ). Tests with inappropriate durations (fewer than 2 days or more than 15 days, n = 9,772) were removed. Non-residential indoor radon tests were excluded (n = 533,804). Tests with out-of-range values, specifically those less than 0 or greater than 9,999, as well as tests lacking information on the test floor level, were also excluded (n = 49,740). Additionally, any test with a value exceeding the 99th percentile within a single zip code was excluded (n = 8,242). 
To address differences in indoor radon concentrations across floors, only measurements taken in basements, where radon levels are typically highest, were included; measurements from other floors were excluded (n = 80,697). For houses with multiple measurements across time, only the first recorded measurement was retained because subsequent measurements were likely taken post-mitigation; the later measurements were excluded (n = 162,330). Zip codes for radon test kit addresses were linked to Census ZCTAs by using a 2020 crosswalk file from the Uniform Data System Mapper. 45 For cases when no direct match was available, USPS zip code updates were applied. To calculate ZCTA-level statistics, observations were grouped into ZCTA-month pairs, and averages and coefficients of variation (CoVs) were calculated for each pair. This also enabled us to account for the significant seasonal variability in radon levels. 29,30 Indoor radon concentrations are higher during winter months because structures typically use less ventilation, and heating a structure induces the stack effect. 30 We required at least 10 measurements per ZCTA-month pair to ensure statistically reliable averages and CoVs. ZCTA-month pairs with fewer than 10 measurements were excluded from the analysis (n = 46,473). After completing all data processing steps, the final dataset included 718,111 measurements across 1,542 ZCTAs from 2008 to 2021. This dataset was used to train and test predictive models. The processed residential radon measurements from 2008 to 2021 had an average concentration of 5.2 pCi/L, with a median of 2.6 pCi/L. The interquartile range spanned 1.4 pCi/L (25th percentile) to 5.3 pCi/L (75th percentile), highlighting significant variability in radon levels across Pennsylvania.
3.1.2 Independent Variables
The elevation, soil, geochemical, hydrologic, and meteorological measures included in our model were reported across a range of spatial scales.
Traditionally, to solve this issue, researchers aggregate measures to the spatial area of the dependent variable (i.e., ZCTA in our case). However, for measures such as soil data, which vary widely and are available at finer detail, this averaging results in less precise measures that decrease predictive accuracy. Additionally, in rural areas where housing may be sparsely spread across an area, averaging over a ZCTA incorporates information about soil characteristics that do not affect residential locations. We developed a three-step data engineering process to address these challenges. 43 First, all measures were spatially indexed using H3 hexagons. Second, we used 2018 LandScan population data 46 (aggregated day and night population) as a mask to identify all H3 hexagons with an established population (Fig. 2). If a hexagon did not include a residential location, then the information from the hexagon was excluded from the aggregate statistics at the ZCTA level. Third, all data in H3 hexagons were converted to ZCTA (weighted average, standard deviation, and dominant condition for categorical variables) by using areal interpolation via PySAL's area_interpolate() function. To effectively manage large data volumes, raster processing was conducted in overlapping tiles to ensure comprehensive coverage of all hexagons with relevant raster data. Census data were measured at the ZCTA level. This study incorporated elevation data from the USGS's Global Multi-Resolution Terrain Elevation Data 2010 (GMTED2010), 47 which offers comprehensive global elevation information that is ideal for environmental and geological analyses. The GMTED2010 data for Pennsylvania were obtained at a 30 arc-second resolution and re-gridded into level-8 H3 hexagons using the geo_to_h3_aggregate() function from the h3pandas library, 42,48 which calculated mean elevation for each hexagon based on raster pixel centroids.
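The population masking and ZCTA aggregation steps described earlier in this section can be illustrated with a minimal sketch. It is not the study's pipeline: the hexagon records, the ZCTA labels "A" and "B", the field layout, and the function name aggregate_to_zcta are all hypothetical, and the sketch uses a simple unweighted mean (assuming roughly equal-area hexagons) instead of the weighted areal interpolation the study reports.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev

# Hypothetical hexagon records: (zcta, population, soil_value, drainage_class).
hexagons = [
    ("A", 120, 1.40, "well"),
    ("A", 0, 1.90, "poor"),       # unpopulated hexagon: masked out below
    ("A", 80, 1.50, "well"),
    ("B", 40, 1.30, "moderate"),
    ("B", 10, 1.35, "poor"),
    ("B", 5, 1.25, "poor"),
]


def aggregate_to_zcta(records):
    """Drop unpopulated hexagons, then summarize the rest per ZCTA."""
    by_zcta = defaultdict(list)
    for zcta, population, value, category in records:
        if population > 0:  # population mask
            by_zcta[zcta].append((value, category))

    summary = {}
    for zcta, rows in by_zcta.items():
        values = [v for v, _ in rows]
        categories = [c for _, c in rows]
        summary[zcta] = {
            "mean": mean(values),      # unweighted; an assumption of this sketch
            "std": pstdev(values),     # population standard deviation (assumption)
            "dominant": Counter(categories).most_common(1)[0][0],
        }
    return summary


zcta_summary = aggregate_to_zcta(hexagons)
```

In this toy input the unpopulated hexagon in ZCTA "A" carries an atypical soil value, so masking it changes the ZCTA mean, which is the effect the masking step is intended to capture.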
To address missing hexagons caused by grid misalignment, a ring smoothing technique was used to average values from adjacent hexagons to fill the gaps. The soil data utilized in this study were extracted from the Gridded National Soil Survey Geographic Database (gNATSGO), which was provided by the USDA's Natural Resources Conservation Service. 49 The gNATSGO dataset included detailed soil characteristics relevant for assessing radon emanation, transport, and accumulation. The analysis utilized 10-meter resolution state-level data in an ESRI file geodatabase format, and the data were processed using the ArcGIS Toolbox Soil Data Development Toolkit from the USGS. Soil maps were generated for each soil characteristic, representing the dominant soil condition and the maximum available depth (200 cm). Individual rasters were reprojected to the EPSG:4326 coordinate reference system, and pixel values were extracted to construct a data frame that contained longitude and latitude. Zonal statistics for target hexagons were calculated using the geo_to_h3_aggregate() function from the h3pandas library. This study incorporated geochemical data from the USGS Geochemical and Mineralogical Survey, 50 and focused on uranium, potassium, and thorium concentrations across different soil depths (0–5 cm, A horizon, C horizon). Because uranium is a direct source of radon, understanding its distribution is key for radon risk assessment. Geochemical data were processed into H3 hexagons to aggregate values at the ZCTA level, allowing for detailed spatial analysis. Categorical values from the GeoTIFF files were assigned using the geo_to_h3_aggregate() function to obtain the most dominant condition within ZCTAs, thereby ensuring consistency across hexagons and enhancing the accuracy of radon risk predictions. Hydrologic data for this study were obtained from the USGS Hydrologic Landscape Regions, which provide detailed water-related landscape characteristics across the United States. 
51 This dataset provides information on hydrological factors that affect radon transport through water and soil pathways. For the radon concentration prediction model, relevant hydrological variables were selected from an initial vector dataset at a 1 km × 1 km resolution. The data were converted into H3 level-8 hexagons by using the polyfill_resample() method from the h3pandas library to ensure consistency with other datasets. The meteorological data used in the radon concentration prediction model were sourced from the Daymet dataset, 52 which provides high-resolution climate information from across North America. Precipitation, snow water equivalent, temperature, and vapor pressure variables were prioritized due to their relevance in influencing radon levels. Daily 1-km grids from Daymet were aggregated into monthly averages using numpy 53 and then re-gridded to H3 level-8 using the area_interpolate() function from the PySAL Tobler library. 54 The demographic and housing data used in this study were sourced from the American Community Survey (ACS) and Decennial Census (DEC) and accessed via the US Census Bureau's API service. 55,56 These data were utilized to capture housing characteristics, which were hypothesized to be significant predictors of radon levels. Variables such as year of construction and primary heating fuel were collected from the 2000 DEC and the 2013–2020 5-year ACS estimates at the ZCTA-level resolution. A summary of selected variables and the rationale for their inclusion in the models are provided in Table 1.
Table 1 Variables included in the study for modeling radon level estimation and the descriptions of their relevance and mechanisms by which they may influence radon levels.
Variable: Description
Bulk Density: Reflects soil's ability to permit radon gas movement; denser soils may reduce radon's upward migration.
Percent Clay, Sand, Silt: These factors affect soil permeability to radon gas, with coarser soils (higher sand content) generally allowing for easier radon passage.
Depth to Soil Restrictive Layer: Indicates potential barriers to radon movement toward the surface.
Linear Extensibility, Liquid Limit, Plasticity Index: Relate to soil's expandability and water retention, impacting radon diffusion.
Surface Texture: Influences the surface's ability to release or trap radon gas.
Soil Taxonomy Classification: Provides insights into the soil's overall characteristics that could affect radon emanation and transport.
Available Water Supply/Capacity: Water saturation levels can impact radon solubility and its movement through the soil.
Water Content: Water content affects soil's radon transmission properties as wetter soils may impede radon flow. 22,28
Hydric Rating by Map Unit: Identifies water saturation in soil. Moisture content can influence radon solubility and transport.
Hydrologic Soil Group: Classifies soils based on their drainage capacity, affecting radon's upward movement from soil to indoor environments.
Soil Moisture Class/Subclass: Indicates moisture content of soil, which is essential for understanding radon transport dynamics in different soil conditions.
Soil Temperature: Soil temperature can affect diffusion and permeability of radon.
Drainage Class: Efficient drainage can reduce radon's upward movement, making these variables critical in predicting radon levels.
Saturated Hydraulic Conductivity: Higher conductivity suggests easier movement of water and possibly of radon through the soil.
Organic Matter: Influences soil structure and hence radon migration, with higher organic content potentially impeding radon movement.
Dwellings With/Without Basements: Indicates building characteristics that are directly related to potential indoor radon levels, as homes with basements are generally at higher risk.
Uranium Content: Radon-222 is a decay product of uranium-238.
Thorium Content: Radon-220 (or thoron) is a decay product of thorium-232.
Potassium Content: Although potassium is not related to radon directly, it shows the characteristics of the soil or rock.
Clay Content (10 Å, 14 Å, and Kaolinite): Different types of clay content can affect radon's retention and movement through soil, thereby influencing its diffusion and accumulation indoors.
Depth to Bed Rock: Depth to bedrock can determine how easily radon migrates from the subsurface to the surface, thereby impacting potential radon exposure levels in buildings.
Aquifer Permeability Class: Determines groundwater flow rates, which affect radon's transport from soil to water sources and potentially into buildings.
Elevation: Influences atmospheric pressure variations, which can affect soil gas emissions, including radon.
Relief of Watershed: Indicates topographical variations that can influence radon gas accumulation and dispersion patterns.
Percent Flat Land in Watershed: Affects water drainage and soil gas movement, thereby impacting radon release.
Daily Total Precipitation: Impacts soil moisture levels, which can affect radon solubility and mobility through the soil.
Snow Water Equivalent: Reflects the amount of water contained in snowpack, which influences ground moisture and radon emission rates.
Daily Minimum/Maximum 2-meter Air Temperature: Affects the thermal gradient between the ground and atmosphere, which influences radon diffusion. Also affects the ventilation habits that can affect accumulation of radon.
Vapor Pressure: Vapor pressure can affect the moisture of soil, which can affect permeability of radon.
Occupancy Status: Unoccupied houses may have higher radon concentrations because the ventilation systems may not operate regularly.
Year Structure Built: Older buildings may have more radon entry points due to structural degradation over time and might have different types of building materials.
House Heating Fuel: Different heating systems can alter indoor air pressure and flow, thereby influencing radon entry and distribution. Also, heating fuel itself can be a potential source of radon.
3.2 Radon Level Estimation Models
Although individual-level radon test results with zip codes were available, exact location information for each house tested was not provided. To protect privacy, the data provider masked the exact address of each test and released only the zip codes. Although address masking was necessary for maintaining privacy, it reduced the accuracy of prediction algorithms, particularly in areas with high local variability. To address these limitations, a multi-model approach was employed to evaluate and compare the performance of various modeling strategies in the absence of precise location information. The RF algorithm 57 was selected as the base model for all comparisons, allowing us to account for complex interaction effects between variables in our model. Additionally, the QRF model, an extension of RF, was used for the individual-level analysis. This approach enabled a more detailed assessment by estimating the conditional distribution of radon concentrations. 58,59 The ability to estimate the conditional distribution makes QRF particularly effective for assessing exposure risks by modeling quantiles of radon exposure rather than just the mean or median concentration levels. For all models, the dependent and independent variables were measured at the ZCTA level because point-level information could not be assigned to individual radon tests. Three models were compared to provide a richer characterization of radon concentration risk within a ZCTA. The outcome for each model was as follows:
1. Average Model: The dependent variable is the mean of all individual-level radon tests within a ZCTA. The independent variables are the population-masked averages or most dominant conditions at the ZCTA level.
This is the standard approach to modeling indoor radon risk.
2. Relative Variability Model: The dependent variable is the CoV of all individual-level radon tests within a ZCTA. The independent variables are the population-masked standard deviations at the ZCTA level. This model allows us to identify ZCTAs with high levels of uncertainty when using the Average Model.
3. Individual QRF Model: The dependent variables are the median, 75th percentile, and 90th percentile of the individual-level radon test results within a ZCTA. The independent variables are the population-masked averages at the ZCTA level. This model allows us to identify ZCTAs that may have high levels of indoor radon exposure.
Using multiple models enabled the quantification of model uncertainty in predicting indoor radon at the ZCTA level. This ability, in turn, provides a framework for the interpretation of radon exposure levels in the absence of fine-scale spatial data.
3.2.1 Average Model
The Average Model employed an RF algorithm to predict the mean radon concentration at the ZCTA level. In this approach, the average characteristics of a ZCTA were used to develop a predictive model that estimates the mean indoor radon exposure within that ZCTA. Radon measures were averaged within each ZCTA-month pair. For variables other than the radon measure, numerical variables were averaged across the months, and categorical variables took the most frequent value within each ZCTA-month pair. This method preserved seasonal information by grouping the dataset by both ZCTAs and months. However, relying solely on aggregate values disregards the variability of radon concentrations within a ZCTA, potentially underestimating the risk of high indoor radon exposure in highly heterogeneous areas. The model fit for these models is generally calculated using the mean radon level for the ZCTA.
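The ZCTA-month grouping used for the model outcomes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name summarize_zcta_month, the tuple layout, and the toy values are hypothetical, and the sample standard deviation is an assumed choice because the text does not say which estimator was used for the CoV. The minimum of 10 tests per ZCTA-month pair is taken from the text.

```python
from collections import defaultdict
from statistics import mean, stdev

MIN_TESTS = 10  # minimum tests per ZCTA-month pair, as stated in the text


def summarize_zcta_month(tests, min_tests=MIN_TESTS):
    """Group (zcta, month, radon_pci_l) tuples and compute mean and CoV.

    Pairs with fewer than `min_tests` measurements are dropped.
    """
    groups = defaultdict(list)
    for zcta, month, value in tests:
        groups[(zcta, month)].append(value)

    summary = {}
    for key, values in groups.items():
        if len(values) < min_tests:
            continue  # too few tests for a reliable average or CoV
        avg = mean(values)
        # CoV = standard deviation / mean; undefined when the mean is zero.
        cov = stdev(values) / avg if avg > 0 else None
        summary[key] = {"n": len(values), "mean": avg, "cov": cov}
    return summary


# Toy data: ten January tests and three July tests for a hypothetical ZCTA "A".
tests = [("A", 1, float(v)) for v in range(1, 11)]
tests += [("A", 7, 2.0), ("A", 7, 3.0), ("A", 7, 4.0)]
summary = summarize_zcta_month(tests)
```

Here the January pair is kept and the July pair is dropped because it has only three tests. The mean column corresponds to the Average Model outcome and the CoV column to the Relative Variability Model outcome.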
However, ecological approaches that aggregate values within a geographic area ignore the underlying variability in indoor radon exposure known to exist within a ZCTA. To illustrate the limitations of an ecological approach, we recalculated the model fit using the individual radon tests as the observed values and the corresponding Average Model prediction as the predicted value.
3.2.2 Relative Variability Model
In the second model, we predicted the variability of indoor radon exposure in a ZCTA by using RF to identify characteristics of ZCTAs with a high variability of indoor radon exposure. In this model, the outcome was the CoV of the radon measurements, defined as the ratio of the standard deviation to the mean. We hypothesized that the variability of geographic, meteorological, and housing characteristics that are known to affect indoor radon exposure would be associated with the variability of radon exposure within a ZCTA. Because many of the factors could be measured at smaller spatial scales, we used the variability of these factors to predict the variability in indoor radon exposure. This model used the standard deviation (for numerical variables) and entropy (for categorical variables) of each predictor, computed across the H3 level-8 hexagons within a ZCTA, to provide a comprehensive view of radon level fluctuations. Owing to the lack of variability in the measures from the ACS and DEC, we excluded measures of variation for fuel type, age of structure, and occupancy status. This approach enabled the quantification of uncertainty in predicted averages based on the variability of the input data. The Relative Variability Model complemented the Average Model by identifying areas with highly heterogeneous indoor radon exposure, a phenomenon that aggregate means alone could not capture.
3.2.3 Individual QRF Model
The QRF algorithm extends the RF algorithm by estimating the conditional distribution of a target variable.
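The idea of estimating a conditional distribution from a forest can be sketched in a few lines, following the weighting scheme in Meinshausen's formulation: each training observation receives a weight based on how often it shares a leaf with the query point, and a quantile is read off the resulting weighted empirical distribution. The sketch below is illustrative only and does not reproduce the study's implementation; the leaf assignments, radon values, and function names are hypothetical, and the leaf memberships are hard-coded instead of being produced by fitted trees.

```python
def qrf_weights(train_leaves, query_leaves):
    """Weight of each training sample for one query point.

    train_leaves[t][i] is the leaf id of training sample i in tree t;
    query_leaves[t] is the leaf id the query point falls into in tree t.
    Assumes every query leaf contains at least one training sample.
    """
    n_trees = len(train_leaves)
    weights = [0.0] * len(train_leaves[0])
    for leaves, q_leaf in zip(train_leaves, query_leaves):
        members = [i for i, leaf in enumerate(leaves) if leaf == q_leaf]
        for i in members:
            # Each tree spreads a total weight of 1 / n_trees over the leaf.
            weights[i] += 1.0 / (len(members) * n_trees)
    return weights


def weighted_quantile(y, weights, alpha):
    """Smallest y whose weighted empirical CDF reaches alpha."""
    cumulative = 0.0
    for value, w in sorted(zip(y, weights)):
        cumulative += w
        if cumulative >= alpha - 1e-12:  # tolerance for float rounding
            return value
    return max(y)


# Hypothetical training outcomes (pCi/L) and leaf ids for two toy trees.
y_train = [1.0, 2.0, 4.0, 8.0, 20.0]
train_leaves = [[0, 0, 1, 1, 1], [0, 1, 1, 1, 2]]
query_leaves = [1, 1]

w = qrf_weights(train_leaves, query_leaves)
q50 = weighted_quantile(y_train, w, 0.50)
q75 = weighted_quantile(y_train, w, 0.75)
q90 = weighted_quantile(y_train, w, 0.90)
weighted_mean = sum(wi * yi for wi, yi in zip(w, y_train))
```

With these toy inputs the weighted mean, which is what a standard RF would report, is about 7.7, while the 90th percentile is 20.0. This is the kind of gap between mean and upper-quantile estimates that motivates the quantile-based model.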
This nonparametric machine learning method provided a means to evaluate both the range and uncertainty of predictions. Unlike traditional RF models, which focus on predicting the mean outcome, QRF enables the prediction of any quantile within the target distribution. This feature could be particularly useful for capturing the variability and range of radon concentrations within a ZCTA, thereby offering a deeper understanding of factors that contribute to both higher average exposure and extreme values. Such an approach is critical for risk assessment and public health planning because it supports a more comprehensive evaluation of exposure risks and the identification of localized areas with high concentrations of radon.

In previous studies, QRF has been applied to various environmental datasets after the introduction of the algorithm by Meinshausen 58 (2006). Work by Vaysse and Lagacherie 60 (2017) and Maxwell 61 (2021) applied QRF to geological studies. These studies highlighted the utility of QRF in capturing the distribution and variability of target variables, making it an appropriate choice for radon prediction in this study.

Individual QRF models estimated the conditional distribution of radon concentrations across different percentiles rather than focusing only on mean predictions. The quantile-forest Python package was used to implement QRF. 59 The model was trained by using the same set of predictors as the RF models, including geological, meteorological, and building-specific factors. The QRF model's ability to predict quantiles of radon concentration further enhances our analysis and enables more informed decision-making by demonstrating the range of potential exposure levels in a ZCTA. Owing to the computational cost of the QRF model, its processing time was significantly longer than that of the RF model.
To address this, advanced computational resources provided by the Compute and Data Environment for Science (CADES) at Oak Ridge National Laboratory were utilized to efficiently perform the analysis. 3.2.4 Model Evaluation Metrics The prediction performance of the RF and QRF models was evaluated by using a combination of metrics to assess both predictive accuracy and model interpretability. 3.2.4.1 Predictive Accuracy Metrics For the predictive accuracy metrics, root mean square error (RMSE), R-squared (R²), and mean absolute percentage error (MAPE) were used. RMSE quantifies the average magnitude of prediction errors, providing a measure of overall accuracy. MAPE measures the average percentage deviation between predicted and actual radon concentrations, offering an intuitive understanding of prediction error magnitude. To account for potential spatial autocorrelation within ZCTAs, grouped 5-fold cross-validation (CV) was implemented, in which folds were created based on ZCTA groupings. 3.2.4.2 Permutation Feature Importance and Partial Dependence Plots Permutation feature importance and partial dependence plots (PDPs) were analyzed to identify important features in each model. Permutation importance quantifies the reduction in model performance when the values of a feature are randomly shuffled, thereby identifying the features that are most critical for capturing the radon levels. To address the issue of highly correlated variables that dilute feature importance, only one variable from each group of highly correlated variables (absolute correlation coefficient > 0.85) was included in the model. Table 2 presents the groups of variables with absolute correlation coefficients that exceed 0.85. Additionally, variables such as hydrologic soil group and drainage class, which were already represented in aquifer permeability, were excluded to avoid redundancy. Table 2 Groups of highly correlated variables (|r| > 0.85). 
Bolded variables are used in the model to analyze the permutation importance.

| Group | Variables |
|---|---|
| 1 | ‘Minimum elevation in watershed’; ‘**Elevation**’ |
| 2 | ‘Available Water Capacity WTA, 0 to 200 cm’; ‘**Available Water Storage WTA, 0 to 200 cm**’; ‘Available Water Supply, 0 to 25 cm’; ‘Available Water Supply, 0 to 50 cm’; ‘Available Water Supply, 0 to 150 cm’; ‘Available Water Supply, 0 to 100 cm’; ‘Water Content, 15 Bar WTA, 0 to 200 cm’; ‘Water Content, One-Third Bar WTA, 0 to 200 cm’ |
| 3 | ‘Percent Clay WTA, 0 to 200 cm’; ‘Linear Extensibility WTA, 0 to 200 cm’; ‘Liquid Limit WTA, 0 to 200 cm’; ‘Linear Extensibility WTA, 0 to 200 cm’; ‘**Plasticity Index WTA, 0 to 200 cm**’; ‘Percent Silt WTA, 0 to 200 cm’ |
| 4 | ‘Saturated Hydraulic Conductivity (Ksat) WTA, 0 to 200 cm’; ‘**Saturated Hydraulic Conductivity (Ksat), Standard Classes WTA, 0 to 200 cm**’ |
| 5 | ‘**Percent flat land (slope less than 1%) in watershed lowland**’; ‘Total percent flat land (slope less than 1%) in watershed’ |
| 6 | ‘**Maximum air temperature**’; ‘Minimum air temperature’; ‘Water vapor pressure’ |
| 7 | ‘**Estimate Total Utility Gas**’; ‘Estimate total fuel oil, kerosene, etc.’ |
| 8 | ‘Potassium content in A horizon’; ‘**Potassium content in 0 to 5 cm depth**’ |
| 9 | ‘Thorium content in A horizon’; ‘**Thorium content in 0 to 5 cm depth**’ |
| 10 | ‘Uranium content in A horizon’; ‘**Uranium content in 0 to 5 cm depth**’ |

PDPs were generated to visualize the marginal effects of individual features on the predicted radon concentrations. PDPs helped illustrate the relationship between each feature and the model's predictions, revealing whether the relationship was linear, monotonic, or more complex. This visual analysis complemented the permutation feature importance metrics, offering a more intuitive interpretation of the model's behavior.

4. Results and Discussion

The Average Model and the Relative Variability Model were designed to capture different aspects of radon concentration within ZCTAs.
The Average Model focused on predicting the mean radon concentration within a ZCTA, which provided an estimate of average radon levels across different areas. In contrast, the Relative Variability Model aimed to quantify the variation in radon concentrations within ZCTAs. By analyzing the CoV of indoor radon within a ZCTA, we can identify areas where radon concentrations may vary significantly, even if the average levels appear moderate. This approach supports targeted mitigation efforts by identifying locations with a higher likelihood of extreme radon exposure.

The Average Model was designed to predict the average radon concentration within each ZCTA, using soil characteristics, climate variables, ACS and DEC data, and seasonal factors as predictors to examine how these factors influenced average radon levels within a ZCTA. The model was tested with 5 iterations of 5-fold CV, first with ZCTA as the grouping variable and then without a grouping variable. The same procedure was then repeated with the individual house tests as the observed values, to assess how well the Average Model predicts radon levels for individual houses (Table 3).

Table 3 Metrics and their standard deviation of Average Model tested with average and with individual tests.

| Metric | Tested with average: 5-fold CV | Tested with average: Group 5-fold (ZCTA) CV | Tested with individual: 5-fold CV | Tested with individual: Group 5-fold (ZCTA) CV |
|---|---|---|---|---|
| RMSE | 2.67 (0.19) | 3.17 (0.14) | 7.80 (0.15) | 7.86 (0.30) |
| R² | 0.67 (0.022) | 0.53 (0.021) | 0.12 (0.0020) | 0.10 (0.0079) |
| MAPE | 20.68 (0.42) | 27.71 (0.81) | 166 (1.59) | 167 (2.70) |

The metrics indicated fair predictive performance for the average radon level in the community but showed poor predictive performance for the radon level of individual houses. These results suggest significant variability in radon levels within each ZCTA. This limitation poses challenges from a public health perspective because some houses with high radon levels may remain untested when average exposure levels in the surrounding area are low, leading to gaps in risk identification and mitigation efforts.
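The contrast between the two halves of Table 3 can be illustrated with a minimal sketch. The ZCTA labels, test values, and predicted means below are hypothetical and are not drawn from the study data; the metric functions follow the usual textbook definitions of RMSE, R², and MAPE, which are assumed here to match the ones used in the study. Scoring one predicted mean per ZCTA against the observed ZCTA mean gives small errors, whereas scoring the same prediction against every individual test exposes the within-ZCTA spread.

```python
import math

def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / len(observed)

def score(observed, predicted):
    return {
        "rmse": rmse(observed, predicted),
        "r2": r_squared(observed, predicted),
        "mape": mape(observed, predicted),
    }

# Hypothetical individual radon tests (pCi/L) for three ZCTA-month pairs.
tests = {
    "zcta_1": [1.0, 2.0, 3.0, 10.0],
    "zcta_2": [0.5, 1.5, 4.0],
    "zcta_3": [2.0, 6.0, 16.0],
}
# Hypothetical mean predictions, standing in for Average Model output.
predicted_mean = {"zcta_1": 4.2, "zcta_2": 2.1, "zcta_3": 7.5}

# (a) "Tested with average": observed value is the mean of each ZCTA-month pair.
obs_avg = [sum(v) / len(v) for v in tests.values()]
pred_avg = [predicted_mean[k] for k in tests]
avg_scores = score(obs_avg, pred_avg)

# (b) "Tested with individual": every test is compared with its ZCTA's predicted mean.
obs_ind = [x for v in tests.values() for x in v]
pred_ind = [predicted_mean[k] for k, v in tests.items() for _ in v]
ind_scores = score(obs_ind, pred_ind)

print("tested with average:   ", avg_scores)
print("tested with individual:", ind_scores)
```

Because the predictions are identical in both cases, any difference in the scores comes only from the choice of observed value, which is the point the table makes.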
Permutation feature importance was assessed to identify the variables from the dataset that significantly affected average radon concentration predictions (Fig. 3 ). To examine the impact of specific predictors on radon levels, PDPs were generated for the most influential non-categorical variables (Fig. 4 ). These plots provided a detailed view of the relationship between individual predictors and radon levels. Permutation importance revealed that permeability, house heating fuel type, and hydraulic conductivity had the largest effects on determining the average radon level. The partial dependence of permeability suggests that increasing soil permeability or saturated hydraulic conductivity consistently results in higher average radon levels. Notable patterns are also observed in the PDPs for fuel types, where areas with higher usage of wood, coal, or coke as the primary fuel source exhibit elevated radon concentrations. In contrast, regions that use utility-provided gas heat as the main fuel type show lower radon levels. The partial dependence of relief indicates sharp increases in radon levels at lower relief values. Higher relief values have a diminishing effect beyond a certain point. Figure 5 illustrates the maps of the actual average radon levels alongside the averages of key variables by ZCTAs across Pennsylvania. The spatial distribution of average radon concentrations exhibited a notable resemblance to the permeability map. However, the map that shows utility gas as the primary fuel type indicates that urban areas generally have higher utility gas fuel usage and rural areas generally have higher wood fuel usage. In this study, we are unable to determine if fuel type directly influences radon concentrations or whether it serves as an indirect proxy for distinguishing urban from rural areas. Further analysis is warranted to discern the true relationship between fuel type and radon levels. 
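As a minimal sketch of the two interpretation tools used above, the following code computes permutation importance and a partial dependence curve for any prediction function. Everything in it is illustrative: `toy_model` is a hypothetical stand-in for a fitted forest, the feature names and synthetic data are invented, and mean squared error is assumed as the performance score (the study does not state which score was permuted against).

```python
import random

def mean_squared_error(predict, rows, targets):
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(predict, rows, targets, n_repeats=10, seed=0):
    """Average increase in MSE after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = mean_squared_error(predict, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mean_squared_error(predict, shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def partial_dependence(predict, rows, feature, grid):
    """Mean prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        modified = [r[:feature] + (value,) + r[feature + 1:] for r in rows]
        curve.append(sum(predict(r) for r in modified) / len(modified))
    return curve

# Hypothetical stand-in for a fitted model: the third feature is ignored.
def toy_model(row):
    permeability, relief, unused = row
    return 3.0 * permeability + 0.5 * relief

rng = random.Random(42)
rows = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
targets = [toy_model(r) + rng.gauss(0.0, 0.1) for r in rows]

importances = permutation_importance(toy_model, rows, targets)
pd_curve = partial_dependence(toy_model, rows, feature=0, grid=[0.0, 0.5, 1.0])

print("permutation importances:", importances)
print("partial dependence on feature 0:", pd_curve)
```

In this toy setting the feature with the largest coefficient receives the largest importance and the ignored feature receives none, while the partial dependence curve rises steadily with the first feature, mirroring the kind of monotonic pattern reported for permeability.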
The Average Model effectively predicted radon levels at the ZCTA level, providing a more refined analysis compared to county-level estimates. Offering predictions at the ZCTA level enabled a more detailed assessment of radon exposure risk at a smaller geographic scale. Given the significant variability in radon levels even within small areas, such granular predictions are crucial for accurate risk analysis and targeted public health interventions.

The Average Model had a notable limitation in that it lacked detailed information on the variability of radon exposure within an area because all data were aggregated. As a result, the Average Model has a fair prediction accuracy for the radon levels of the community, but detailed predictions of radon levels for each house are not achievable. To address this, we used a Relative Variability Model for uncertainty quantification, enabling the identification of factors that predict high variability in ZCTA-level measures. As with the Average Model, the Relative Variability Model test was conducted by applying 5 iterations of the 5-fold CV, both with ZCTA as the grouping variable and without grouping variables (Table 4).

Table 4 Metrics and their standard deviation of the Relative Variability Model.

| Metric | 5-fold CV | Group 5-fold (ZCTA) CV |
|---|---|---|
| RMSE | 0.23 (0.0055) | 0.27 (0.0059) |
| R² | 0.46 (0.014) | 0.24 (0.094) |
| MAPE | 16.68 (0.38) | 20.76 (0.77) |

Permutation feature importances and PDPs were evaluated with the Relative Variability Model to identify and illustrate the most influential factors that affect radon concentration variability, as shown in Fig. 6. Although permeability and fuel type were identified as the most important variables that influence average radon levels, the factors that predict variability in radon levels are different (Fig. 7).
We find that ZCTAs with large variability in elevation, saturated hydraulic conductivity, temperature, soil drainage, depth to restrictive soil layers, and/or soil moisture exhibit high levels of variability, or uncertainty, around average radon levels. Partial dependence analysis of the standard deviation of elevation, hydraulic conductivity, maximum temperature, and soil moisture indicated that the variability of local radon concentrations initially increased with greater variation of these variables. However, this relationship was nonlinear, with the increases plateauing beyond a certain point. In contrast, the uncertainty of radon continued to increase gradually with greater variability in soil depth to the restrictive soil layer. Conversely, as the variability in soil drainage increased, the uncertainty in the average radon level decreased. Figure 8 shows the CoV of radon levels by ZCTA and the standard deviations of key variables by ZCTAs across Pennsylvania. The correlation between the CoVs of radon measures and the standard deviations of the variables is not as high as that observed for the mapped average levels. The Relative Variability Model added value by quantifying the range and distribution of radon levels, highlighting areas with significant fluctuations and variability in radon concentrations. This model can be used to identify the characteristics of areas that have a higher chance of containing houses that exceed the action level. Figure 9 illustrates how areas with high average radon concentrations can also have high variability of radon concentrations. Overall, the darker-colored areas on this map represent regions where testing for indoor radon exposure should be prioritized because there are both high average levels and high variability. The specific colors (ranging from shades of red to shades of blue) illustrate the characteristics of radon distribution in these areas. Regions shaded in red indicate areas with high variability.
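The quantities mapped and modeled here can be written down compactly. The sketch below computes the CoV of radon tests and the two kinds of variability predictors described for the Relative Variability Model (standard deviation for numerical variables, entropy for categorical ones). All values are hypothetical; the use of the population standard deviation and of natural-log Shannon entropy are assumptions, since the text does not specify either choice.

```python
import math
from collections import Counter
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """CoV: ratio of the standard deviation to the mean."""
    return pstdev(values) / mean(values)

def shannon_entropy(labels):
    """Shannon entropy (natural log) of a categorical variable."""
    n = len(labels)
    return sum(-(c / n) * math.log(c / n) for c in Counter(labels).values())

# Hypothetical radon tests (pCi/L) in two ZCTAs that share the same mean.
radon_low_spread = [3.0, 4.0, 5.0]
radon_high_spread = [0.5, 1.5, 10.0]

cov_low = coefficient_of_variation(radon_low_spread)
cov_high = coefficient_of_variation(radon_high_spread)

# Hypothetical predictor values for the sub-ZCTA cells of each ZCTA.
elevation_flat = [310.0, 312.0, 309.0, 311.0]      # meters
elevation_rugged = [250.0, 320.0, 410.0, 290.0]    # meters
drainage_uniform = ["well", "well", "well", "well"]
drainage_mixed = ["well", "poor", "moderate", "excessive"]

sd_flat = pstdev(elevation_flat)
sd_rugged = pstdev(elevation_rugged)
entropy_uniform = shannon_entropy(drainage_uniform)
entropy_mixed = shannon_entropy(drainage_mixed)

print(f"mean radon: {mean(radon_low_spread):.2f} vs {mean(radon_high_spread):.2f}")
print(f"CoV:        {cov_low:.3f} vs {cov_high:.3f}")
print(f"SD of elevation:     {sd_flat:.2f} vs {sd_rugged:.2f}")
print(f"entropy of drainage: {entropy_uniform:.3f} vs {entropy_mixed:.3f}")
```

The two hypothetical ZCTAs have identical mean radon levels, so an average-only model would treat them alike, while their CoVs differ by roughly a factor of five.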
Many areas exhibited similar patterns between the predicted average radon levels and the CoV of radon. However, some ZCTAs displayed distinct patterns of variability at the ZCTA level, even when their average radon levels were comparable. For example, ZCTA A (average: 10.58 pCi/L, CoV: 1.39), B (average: 9.80 pCi/L, CoV: 1.43), and C (average: 8.95 pCi/L, CoV: 1.43) had high radon levels and high CoV in the month of January for the 2008–2021 data. ZCTA D (average: 10.84 pCi/L, CoV: 0.88), E (average: 10.86 pCi/L, CoV: 0.94), and F (average: 8.99 pCi/L, CoV: 0.90) showed high average radon levels but low CoVs of radon in January. Table 5 compares the standard deviation of key variables that have the highest importance in the Relative Variability Model. The standard deviations of elevation, hydraulic conductivity, and maximum temperature are much higher for the high variability group than for the low variability group. This result shows that, although the average level of radon is similar across these areas, the variability of the radon level itself can be different if the variability of the predictor variables (e.g., elevation, hydraulic conductivity, maximum temperature) differs.

Table 5 Key variability for ZCTAs with high radon. High vs. low variability groups in January.

| Group | ZCTA | Standard deviation of elevation (m) | Standard deviation of hydraulic conductivity (µm/s) | Standard deviation of maximum temperature (°C) |
|---|---|---|---|---|
| High average and high variability | A | 70.63765 | 12.27862 | 0.635485 |
| High average and high variability | B | 32.44647 | 6.901275 | 0.22935 |
| High average and high variability | C | 29.22913 | 5.436235 | 0.184695 |
| High average and low variability | D | 7.444461 | 0.842725 | 0.035577 |
| High average and low variability | E | 3.86221 | 0.415112 | 0.010676 |
| High average and low variability | F | 12.89145 | 2.60695 | 0.05758 |

Some of the ZCTAs showed high variability of radon even though they had below-average radon levels. ZCTA G (average: 3.55 pCi/L, CoV: 1.36) and H (average: 3.60 pCi/L, CoV: 1.39) showed relatively low radon levels but high radon variability.
H showed high variability in elevation (61.86 m), and G showed low variability in elevation and hydraulic conductivity. The Average Model and the Relative Variability Model do not require significant computational resources and are relatively simple models for predicting average radon levels and variability. However, they cannot capture the full spectrum of information available in the dataset. To provide a more detailed characterization of the distribution of indoor radon exposure within a ZCTA, a QRF Model was used to predict radon concentrations across different quantiles, thereby providing a comprehensive view of radon exposure risks. The QRF model implemented using the quantile-forest library extends the capabilities of traditional RF models by estimating the conditional distribution of radon levels rather than focusing solely on mean predictions. This method allows for a detailed assessment of radon risk by predicting a range of possible outcomes at various quantile levels within each ZCTA. RMSE, R 2 , and MAPE for the 50th, 75th, and 90th percentiles were used to evaluate the performance of the QRF model (Table 6 ). The evaluation was conducted with a grouped 5-fold CV by using ZCTA as the group. Based on the same reasoning as averaging ZCTA-month pairs for the Average Model, each predicted quantile value was tested with the actual quantile value of each ZCTA-month pair. To estimate and compare the estimated 90th percentile value with the measured 90th percentile value, the model used a dataset in which there were at least 10 measures for each ZCTA-month pair. Table 6 Metrics and standard deviation of individual quantile regression forest (QRF) model. 
| Metric | 50th: 5-fold CV | 50th: Group 5-fold (ZCTA) CV | 75th: 5-fold CV | 75th: Group 5-fold (ZCTA) CV | 90th: 5-fold CV | 90th: Group 5-fold (ZCTA) CV |
|---|---|---|---|---|---|---|
| RMSE | 1.71 (0.026) | 1.98 (0.32) | 3.76 (0.046) | 4.44 (0.51) | 7.14 (0.064) | 8.39 (0.58) |
| R² | 0.51 (0.013) | 0.35 (0.024) | 0.61 (0.0078) | 0.46 (0.05) | 0.66 (0.0059) | 0.52 (0.032) |
| MAPE | 19.57 (0.15) | 21.37 (1.37) | 23.21 (0.22) | 30.19 (1.48) | 31.70 (0.62) | 42.31 (1.90) |

The importance of each variable in the QRF model was analyzed with permutation importance, calculated separately for the 50th, 75th, and 90th percentiles (Figs. 10–12, respectively). The analysis revealed notable trends, including a decrease in the importance of temperature as the target percentile increased. In contrast, the importance of permeability, elevation, and relief remained relatively consistent across different percentiles. When comparing partial dependence across quantiles, the partial dependence on permeability and relief remained relatively consistent, whereas the partial dependence on temperature or depth to soil-restrictive layer exhibited notable differences. For maximum temperature, permutation importance was highest at the 50th percentile but decreased at the 75th and 90th percentiles, a pattern reflected in its PDPs. In contrast, the importance of depth to the soil-restrictive layer was greatest at higher percentiles, and its PDPs likewise showed the largest variation at these levels. These trends suggest that the variables influencing median radon levels may differ from those affecting higher concentrations.

The Individual QRF Model effectively identified communities with a high likelihood of elevated radon levels. This model provided the added ability to predict various quantiles of radon levels, which was notably useful for identifying areas at higher risk for extreme exposures.
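To make the quantile predictions more concrete, the sketch below illustrates the weighting idea behind quantile regression forests as described by Meinshausen: each training observation receives a weight based on how often it shares a leaf with the query point, and a quantile is read off the resulting weighted distribution. This is a stand-alone illustration and not the implementation in the quantile-forest package used in the study; the leaf assignments, radon values, and function names are all hypothetical.

```python
def forest_weights(train_leaves, query_leaves):
    """Weight of each training sample for one query point.

    train_leaves[t][i] is the leaf that training sample i reaches in tree t;
    query_leaves[t] is the leaf that the query point reaches in tree t.
    """
    n_trees = len(train_leaves)
    weights = [0.0] * len(train_leaves[0])
    for leaves, query_leaf in zip(train_leaves, query_leaves):
        members = [i for i, leaf in enumerate(leaves) if leaf == query_leaf]
        for i in members:
            # Each tree spreads a total weight of 1 / n_trees over its leaf members.
            weights[i] += 1.0 / (len(members) * n_trees)
    return weights

def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative normalized weight reaches q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight / total
        if cumulative >= q - 1e-12:
            return value
    return pairs[-1][0]

# Hypothetical training radon tests (pCi/L) and leaf assignments in two trees.
radon = [1.0, 2.0, 2.5, 4.0, 9.0, 20.0]
train_leaves = [
    [0, 0, 0, 1, 1, 1],  # tree 1
    [0, 0, 1, 1, 1, 1],  # tree 2
]
query_leaves = [1, 1]    # leaves reached by the ZCTA-month being predicted

weights = forest_weights(train_leaves, query_leaves)
p50 = weighted_quantile(radon, weights, 0.50)
p75 = weighted_quantile(radon, weights, 0.75)
p90 = weighted_quantile(radon, weights, 0.90)

print("weights:", [round(w, 3) for w in weights])
print("P50, P75, P90:", p50, p75, p90)
print("P90 / P50 ratio:", round(p90 / p50, 2))
```

A ratio of an upper quantile to the median, as printed in the last line, is one simple way to summarize how heavy the upper tail of the predicted distribution is for a given area.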
The Relative Variability Model was able to identify the areas that might have high variability but was unable to point out the specific pattern or distribution of the radon level in the community. For the high-average, high-variability ZCTAs identified with the Average Model and the Relative Variability Model, the ratios of the 90th percentile to the 50th percentile predicted by the QRF model for January were 5.80 (A), 5.28 (B), and 5.64 (C). The high-average, low-variability ZCTAs showed ratios of 4.37 (D), 4.16 (E), and 4.78 (F) for the same January predictions. The ratio for high-average, high-variability ZCTAs therefore tends to be higher than the ratio for high-average, low-variability ZCTAs. Furthermore, the QRF model can provide distributional detail that neither the Average Model nor the Relative Variability Model can supply.

When comparing the different models, each offered distinct strengths in assessing radon exposure. The Average Model provided a straightforward estimation of mean radon levels, which may be useful for some population-level risk analysis. 62,63 However, the Average Model was limited in its ability to account for local variability or identify areas containing homes with extreme radon levels. The Relative Variability Model addressed this limitation by quantifying the spread of radon concentrations. The Individual QRF Model further expanded on the profiling of radon exposure at the ZCTA level by predicting various quantiles of radon levels, making it possible to identify areas at higher risk for extreme exposures. By providing predictions at the ZCTA level, these models offered a more refined risk assessment tool compared to traditional county-level analyses. This granularity is necessary both for identifying high-risk areas and for characterizing the overall risk of an area.
Identifying and predicting variability or uncertainty in radon estimates within ZCTAs is crucial because radon levels can differ significantly even within the same community. These differences result in substantial uncertainties in estimated risks, highlighting the need for more tailored public health advisories and interventions that address the uncertainties in risk profiles not only across the communities but also within communities. The Individual QRF Model’s detailed predictions, which include various quantiles, are particularly useful for identifying areas with elevated risk. For instance, areas identified with high upper percentile predictions can be prioritized for more aggressive radon reduction measures. This approach ensures that resources are allocated efficiently, targeting areas with the highest potential risk. This study offers several strengths compared to previous studies. The use of multiple methods for characterizing ZCTA-level estimates of indoor radon exposure risk allows for more informed decision-making that can be directly applied to community-level interventions. By employing RF and QRF models, the analysis leveraged the capabilities of machine learning to handle complex datasets and nonlinear relationships. Multiple evaluation metrics, including RMSE, R², and MAPE, were utilized to ensure a comprehensive assessment of model performance. Notably, the Individual QRF Model offered unique capabilities that enabled a more thorough understanding of residential radon exposure and informed strategies for radiation protection and mitigation. Although this study advances radon exposure modeling and demonstrates the utility of machine learning for community-level risk assessment, the models and data presented some limitations. The reliance on ACS and DEC data assumed uniform distribution of variables, such as housing characteristics and demographics, across ZCTAs. 
This assumption may oversimplify local variability and fail to reflect the nuanced factors that influence radon exposure. Future studies that incorporate point-level data and more granular geographic information could address these limitations and enhance the models’ accuracy and applicability. Future research should aim to integrate more detailed geographic and temporal data to enhance the accuracy of radon predictions. This could include finer-scale soil and building data as well as finer-scale and long-term radon measurements to account for temporal variability. While this study focused on Pennsylvania, the methods developed can be applied to other regions with similar or different radon risk concerns. Expanding the geographic scope will validate the model’s applicability and help develop a comprehensive radon risk map for broader areas. Integrating radon predictions with other environmental hazards, such as air pollution and water quality, could provide a more detailed view of environmental health risks. This approach would help in designing multifaceted public health strategies that address multiple environmental factors simultaneously. Furthermore, with more granular radon-level predictions, follow-up studies could offer more resources for linking radon exposure to lung cancer incidence at the ZCTA-level, which was not achievable from the previous study. 5. Conclusion The study demonstrates the utility of a deep characterization of potential radon exposure with multiple machine learning models, particularly RF and QRF. Describing radon exposure risks and the potential uncertainties in estimates will facilitate a more targeted deployment of public health strategies and policies. Study findings underscore the importance of updating radon risk assessments using current data, access to individual test results, and advanced modeling techniques to better assess health risks associated with radon exposure. 
Future research should focus on further refining these models and extending their application to broader geographic regions and other environmental hazards to comprehensively enhance environmental health risk assessment. Declarations Competing interests The authors declare that they have no competing interests. Author Contribution HL conducted the study design, processed the data, performed the analyses, interpreted the results, drafted the manuscript, and contributed to its editing. DM and JL assisted with analysis design, data processing, and manuscript editing. GA and SD contributed to the study design and manuscript revisions. HH was responsible for study design, project supervision, interpretation of results, and critical revision of the manuscript. Acknowledgements This work was supported by the Office of Biological and Environmental Research’s Biological Systems Science Division. This manuscript has been authored by UT-Battelle LLC under Contract No. DE-AC05-00OR22725 with the US Department of Energy and Award AWD-002827 between UT-Battelle and the Georgia Tech Research Corporation. This research used resources of CADES at the Oak Ridge National Laboratory, which is supported by the US Department of Energy’s Office of Science under Contract No. DE-AC05-00OR22725. Data Availability All data generated or analysed during this study are included in this published article. References Wall, B. F. Ionising radiation exposure of the population of the United States: NCRP Report No. 160 (Oxford University Press, 2009). Organization, W. H. WHO handbook on indoor radon: a public health perspective (World Health Organization, 2009). Tirmarche, M. et al. ICRP Publication 115. Lung cancer risk from radon and progeny and statement on radon. Ann. ICRP . 40 , 1–64 (2010). Kim, S. H., Park, J. M. & Kim, H. The prevalence of stroke according to indoor radon concentration in South Koreans: Nationwide cross section study. Med. (Baltim). 99 , e18859. 
https://doi.org/10.1097/MD.0000000000018859 (2020). Dong, S. et al. Synergistic Effects of Particle Radioactivity (Gross beta Activity) and Particulate Matter =2.5 mum Aerodynamic Diameter on Cardiovascular Disease Mortality</at. J. Am. Heart Assoc. 11 , e025470. https://doi.org/10.1161/JAHA.121.025470 (2022). Lee, H. et al. Evaluating county-level lung cancer incidence from environmental radiation exposure, PM(2.5), and other exposures with regression and machine learning models. Environ. Geochem. Health . 46 , 82. https://doi.org/10.1007/s10653-023-01820-4 (2024). Al-Zoughool, M. & Krewski, D. Health effects of radon: a review of the literature. Int. J. Radiat. Biol. 85 , 57–69. https://doi.org/10.1080/09553000802635054 (2009). Council, N. R. Health effects of exposure to radon: BEIR VI (National Academies, 1999). Kang, J. K., Seo, S. & Jin, Y. W. Health Effects of Radon Exposure. Yonsei Med. J. 60 , 597–603. https://doi.org/10.3349/ymj.2019.60.7.597 (2019). Richardson, D. B. et al. Mortality among uranium miners in North America and Europe: the Pooled Uranium Miners Analysis (PUMA). Int. J. Epidemiol. 50 , 633–643. https://doi.org/10.1093/ije/dyaa195 (2021). Lagarde, F. et al. Glass-based radon-exposure assessment and lung cancer risk. J. Expo. Sci. Environ. Epidemiol. 12 , 344–354 (2002). Park, N. W., Kim, Y., Chang, B. U. & Kwak, G. H. County-level indoor radon concentration mapping and uncertainty assessment in South Korea using geostatistical simulation and environmental factors. J. Environ. Radioact. 208 , 106044 (2019). Fujimoto, K. & Sanada, T. Dependence of indoor radon concentration on the year of house construction. Health Phys. 77 , 410–419 (1999). Smith, B. J. & Field, R. W. Effect of housing factors and surficial uranium on the spatial prediction of residential radon in Iowa. Environmetrics 18 , 481–497. https://doi.org/10.1002/env.816 (2006). Abergel, R. et al. 
The enduring legacy of Marie Curie: impacts of radium in 21st century radiological and medical sciences. Int. J. Radiat. Biol. 98, 267–275. https://doi.org/10.1080/09553002.2022.2027542 (2022).
Gundersen, L. C. et al. Geology of radon in the United States. (1992).
Otton, J. K. The geology of radon. (1992).
Bulut, H. A. & Şahin, R. Radon, Concrete, Buildings and Human Health—A Review Study. Buildings 14, 510 (2024).
Mustonen, R. Natural radioactivity in and radon exhalation from Finnish building materials. Health Phys. 46, 1195–1203 (1984).
Marcinowski, F., Lucas, R. M. & Yeager, W. M. National and regional distributions of airborne radon concentrations in US homes. Health Phys. 66, 699–706 (1994).
Yazzie, S. A., Davis, S., Seixas, N. & Yost, M. G. Assessing the Impact of Housing Features and Environmental Factors on Home Indoor Radon Concentration Levels on the Navajo Nation. Int. J. Environ. Res. Public. Health 17. https://doi.org/10.3390/ijerph17082813 (2020).
Sun, K., Guo, Q. & Cheng, J. The Effect of Some Soil Characteristics on Soil Radon Concentration and Radon Exhalation from Soil Surface. J. Nucl. Sci. Technol. 41, 1113–1117. https://doi.org/10.1080/18811248.2004.9726337 (2004).
Mose, D. G. & Mushrush, G. W. Prediction of indoor radon based on soil radon and soil permeability. J. Environ. Sci. Health Part. A 34, 1253–1266. https://doi.org/10.1080/10934529909376894 (1999).
Hassan, N. M. et al. Radon migration process and its influence factors; review. Japanese J. Health Phys. 44, 218–231 (2009).
Khattak, N., Khan, M. A., Ali, N. & Abbas, S. M. Radon Monitoring for geological exploration: A review. J. Himal. Earth Sci. 44, 91–102 (2011).
Nunes, L. J. R., Curado, A., Graca, L., Soares, S. & Lopes, S. I. Impacts of Indoor Radon on Health: A Comprehensive Review on Causes, Assessment and Remediation Strategies. Int. J. Environ. Res. Public. Health 19. https://doi.org/10.3390/ijerph19073929 (2022).
Şen, G. Y., Içhedef, M., Saç, M. M. & Yener, G. Effect of natural gas usage on indoor radon levels. J. Radioanal. Nucl. Chem. 295, 277–282. https://doi.org/10.1007/s10967-012-1841-8 (2012).
Yang, J. et al. Modeling of radon exhalation from soil influenced by environmental parameters. Sci. Total Environ. 656, 1304–1311. https://doi.org/10.1016/j.scitotenv.2018.11.464 (2019).
Bochicchio, F. et al. Annual average and seasonal variations of residential radon concentration for all the Italian Regions. Radiat. Meas. 40, 686–694 (2005).
Miles, J. C., Howarth, C. B. & Hunter, N. Seasonal variation of radon concentrations in UK homes. J. Radiol. Prot. 32, 275–287. https://doi.org/10.1088/0952-4746/32/3/275 (2012).
Porstendorfer, J., Butterweck, G. & Reineking, A. Daily variation of the radon concentration indoors and outdoors and the influence of meteorological parameters. Health Phys. 67, 283–287 (1994).
Rey, J. F. et al. Long-term impacts of weather conditions on indoor radon concentration measurements in Switzerland. Atmosphere 13, 92 (2022).
Environmental Protection Agency. EPA Maps of Radon Zones and Supporting Documents by State. (1993).
Price, P. N., Nero, A. V. & Gelman, A. Bayesian prediction of mean indoor radon concentrations for Minnesota counties. Health Phys. 71, 922–936 (1996).
Price, P. Predictions and maps of county mean indoor radon concentrations in the mid-Atlantic states. Health Phys. 72, 893–906 (1997).
Apte, M., Price, P., Nero, A. & Revzan, K. Predicting New Hampshire indoor radon concentrations from geologic information and other covariates. Environ. Geol. 37, 181–194 (1999).
Casey, J. A. et al. Predictors of Indoor Radon Concentrations in Pennsylvania, 1989–2013. Environ. Health Perspect. 123, 1130–1137. https://doi.org/10.1289/ehp.1409014 (2015).
Kropat, G. et al. Improved predictive mapping of indoor radon concentrations using ensemble regression trees based on automatic clustering of geological units. J. Environ. Radioact. 147, 51–62. https://doi.org/10.1016/j.jenvrad.2015.05.006 (2015).
Nikkilä, A. et al. Predicting residential radon concentrations in Finland: Model development, validation, and application to childhood leukemia. Scand. J. Work Environ. Health 46, 278–292. https://doi.org/10.5271/sjweh.3867 (2020).
Dai, D. et al. Confluent impact of housing and geology on indoor radon concentrations in Atlanta, Georgia, United States. Sci. Total Environ. 668, 500–511. https://doi.org/10.1016/j.scitotenv.2019.02.257 (2019).
Li, L. et al. Predicting Monthly Community-Level Domestic Radon Concentrations in the Greater Boston Area with an Ensemble Learning Model. Environ. Sci. Technol. 55, 7157–7166. https://doi.org/10.1021/acs.est.0c08792 (2021).
Uber. H3: Uber’s Hexagonal Hierarchical Spatial Index. https://www.uber.com/blog/h3/
Maguire, D., Logan, J., Lee, H. & Hanson, H. Radon Exposure Dataset. arXiv preprint arXiv:2505.09489 (2025).
Pennsylvania Department of Environmental Protection. Radon Test Results September 1986 - Current Annual County Environmental Protection. October 13, (2023).
Health Resources and Services Administration. UDS Mapper. http://www.udsmapper.org/ (2023).
Weber, E. et al. LandScan USA (Oak Ridge National Laboratory, 2022).
Danielson, J. J. & Gesch, D. B. Global multi-resolution terrain elevation data 2010 (GMTED2010). Report No. 2331-1258 (US Geological Survey, 2011).
Dahn. H3-Pandas. https://h3-pandas.readthedocs.io/en/latest/ (2021).
Staff, S. S. (ed) (ed United States Department of Agriculture).
Smith, D. B., Solano, F., Woodruff, L. G., Cannon, W. F. & Ellefsen, K. J. Geochemical and mineralogical maps, with interpretation, for soils of the conterminous United States. Report. Reston, VA (2019).
Wieczorek, M. E. & a. L., A.EU.S. Geological Survey data release,. (2010).
Thornton, M. et al. Daymet: Daily surface weather data on a 1-km grid for North America, version 4 R1. ORNL DAAC, Oak Ridge, Tennessee, USA. Single Pixel Extraction Tool | Daymet (ornl.gov) (2022).
Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
Rey, S. J. & Anselin, L. in Handbook of applied spatial analysis: Software tools, methods and applications 175–193 (Springer, 2009).
United States Census Bureau. YEAR STRUCTURE BUILT. American Community Survey, ACS 5-Year Estimates Detailed Tables, Table B25034 (2020).
United States Census Bureau. YEAR STRUCTURE BUILT [10], Decennial Census, DEC Summary File 3, Table H034 (2000).
Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Meinshausen, N. & Ridgeway, G. Quantile regression forests. J. Mach. Learn. Res. 7 (2006).
Johnson, R. A. quantile-forest: A python package for quantile regression forests. J. Open Source Softw. 9, 5976 (2024).
Vaysse, K. & Lagacherie, P. Using quantile regression forest to estimate uncertainty of digital soil mapping products. Geoderma 291, 55–64 (2017).
Maxwell, K., Rajabi, M. & Esterle, J. Spatial interpolation of coal properties using geographic quantile regression forest. Int. J. Coal Geol. 248, 103869 (2021).
Lubin, J. H. & Boice, J. D. Jr. Lung cancer risk from residential radon: meta-analysis of eight epidemiologic studies. J. Natl Cancer Inst. 89, 49–57 (1997).
Ajrouche, R. et al. Quantitative health risk assessment of indoor radon: a systematic review. Radiat. Prot. Dosimetry 177, 69–77 (2017).

Appendix
Another random forest (RF) model that uses the individual-level data was investigated to test the feasibility of predicting the radon level of individual houses from the aggregated independent variables. For this model, RF was used in the same way as in the Average Model and the Relative Variability Model.
Performance metrics such as root mean square error (RMSE), R-squared (R²), and mean absolute percentage error (MAPE) were used to evaluate the model, and the evaluation used 5 iterations of grouped 5-fold cross-validation (CV) with zip-code tabulation area (ZCTA) as the grouping variable (Table A1). The metrics indicated poor predictive performance, which suggests substantial variability in radon levels within each ZCTA, as was also seen with the Average Model. Predicting high-variability measures from aggregated variables resulted in poor accuracy and highlighted the need for non-aggregated variables or other approaches.

Additional Declarations
No competing interests reported.

Supplementary Files
Appendix.docx

Editorial decision: Revision requested 07 Oct, 2025
Reviews received at journal 07 Oct, 2025
Reviewers agreed at journal 15 Sep, 2025
Reviews received at journal 29 Aug, 2025
Reviewers agreed at journal 24 Jul, 2025
Reviewers agreed at journal 11 Jul, 2025
Reviewers invited by journal 23 Jun, 2025
Editor assigned by journal 23 Jun, 2025
Editor invited by journal 12 Jun, 2025
Submission checks completed at journal 11 Jun, 2025
First submitted to journal 09 Jun, 2025
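The grouped cross-validation described in the Appendix (5 iterations of 5-fold CV with ZCTA as the grouping variable, scored with RMSE, R², and MAPE) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`repeated_group_kfold`, `rmse`, `r_squared`, `mape`), the toy ZCTA labels, and the radon values are all hypothetical, and a training-mean predictor stands in for the RF so that the sketch runs with the Python standard library alone.

```python
import math
import random
from statistics import mean


def repeated_group_kfold(groups, n_splits=5, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) with whole groups held out.

    `groups` holds one label per observation (here a hypothetical ZCTA code),
    so no ZCTA contributes rows to both sides of a split.
    """
    rng = random.Random(seed)
    unique = sorted(set(groups))
    for rep in range(n_repeats):
        shuffled = unique[:]
        rng.shuffle(shuffled)  # new group-to-fold assignment in every repeat
        for fold in range(n_splits):
            held_out = set(shuffled[fold::n_splits])
            test_idx = [i for i, g in enumerate(groups) if g in held_out]
            train_idx = [i for i, g in enumerate(groups) if g not in held_out]
            yield rep, fold, train_idx, test_idx


def rmse(y_true, y_pred):
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r_squared(y_true, y_pred):
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    # Undefined when a true value is zero; such rows would need special handling.
    return mean(abs((t - p) / t) for t, p in zip(y_true, y_pred))


# Toy data (made up): 10 hypothetical ZCTAs with 4 radon tests each, in pCi/L.
groups = [f"zcta_{i // 4}" for i in range(40)]
y = [1.0 + (i % 7) * 0.8 for i in range(40)]

scores = []
for rep, fold, train_idx, test_idx in repeated_group_kfold(groups):
    # Stand-in model: predict the training mean. A fitted RF would go here.
    baseline = mean(y[i] for i in train_idx)
    y_true = [y[i] for i in test_idx]
    y_pred = [baseline] * len(test_idx)
    scores.append((rmse(y_true, y_pred), r_squared(y_true, y_pred), mape(y_true, y_pred)))

print(len(scores))  # 25 splits: 5 repeats x 5 folds
```

Because whole ZCTAs are held out, every test observation comes from an area the model never saw during fitting, so the scores reflect prediction for unseen ZCTAs and not interpolation within a known one. In practice the stand-in predictor would be replaced by a fitted regressor, and the per-split scores would be averaged.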
© Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-6857670","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":476048972,"identity":"49c85d1d-e621-47ea-8f5b-325dcacd359c","order_by":0,"name":"Heechan Lee","email":"","orcid":"","institution":"Georgia Institute of Technology","correspondingAuthor":false,"prefix":"","firstName":"Heechan","middleName":"","lastName":"Lee","suffix":""},{"id":476048973,"identity":"a4cfd4c5-d6ce-4f01-b9dc-30d934bd3145","order_by":1,"name":"Dakotah Maguire","email":"","orcid":"","institution":"Oak Ridge National Laboratory","correspondingAuthor":false,"prefix":"","firstName":"Dakotah","middleName":"","lastName":"Maguire","suffix":""},{"id":476048974,"identity":"216d9c66-7586-4bc2-880d-538c110a3d05","order_by":2,"name":"Jeremy Logan","email":"","orcid":"","institution":"Oak Ridge National Laboratory","correspondingAuthor":false,"prefix":"","firstName":"Jeremy","middleName":"","lastName":"Logan","suffix":""},{"id":476048975,"identity":"9cf6822b-42b5-4935-acfb-c0f149c6f404","order_by":3,"name":"Greeshma Agasthya","email":"","orcid":"","institution":"Georgia Institute of Technology","correspondingAuthor":false,"prefix":"","firstName":"Greeshma","middleName":"","lastName":"Agasthya","suffix":""},{"id":476048976,"identity":"bfb868f4-ee9a-406f-aca7-03407aadb302","order_by":4,"name":"Shaheen Dewji","email":"","orcid":"","institution":"Georgia Institute of 
Technology","correspondingAuthor":false,"prefix":"","firstName":"Shaheen","middleName":"","lastName":"Dewji","suffix":""},{"id":476048977,"identity":"88b1f0e5-69aa-467c-9254-86c52cb2f2e9","order_by":5,"name":"Heidi A. Hanson","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAwklEQVRIiWNgGAWjYBACAwh1gIEfRD0gSYtkA5BKIEmLwQFitZiznz384eOOO/LGx3ufPUhgqJUD68UHLHvy0iRnnnlmuO3McXODBIbjxgS1GBzIMWPmbTvMuO1GGptEAsOxxJkNhLScf2P8GajFfvMMorXcyDGQBmpJ3CAB1lKT2E9AB1DLGzPJmW2Hk2ecOQbUYnDAmJ+glvM5xh8+th227W9vY5P4UFEnx0ZIC7oJh0nUAAR1pGsZBaNgFIyCYQ8AV3xDqGMNNYUAAAAASUVORK5CYII=","orcid":"","institution":"Oak Ridge National Laboratory","correspondingAuthor":true,"prefix":"","firstName":"Heidi","middleName":"A.","lastName":"Hanson","suffix":""}],"badges":[],"createdAt":"2025-06-09 23:23:12","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6857670/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6857670/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41598-026-37891-3","type":"published","date":"2026-03-05T15:59:11+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":85475829,"identity":"9b0ecdeb-6a81-45b4-a07f-d4d8dc9a177b","added_by":"auto","created_at":"2025-06-26 10:03:12","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":638147,"visible":true,"origin":"","legend":"\u003cp\u003eProcess of the data selection and the number of measures after each step.\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-6857670/v1/e25ea4ff4ef45909c0b48129.png"},{"id":85475827,"identity":"c4384fcd-d8b6-48ad-a5de-11660a77607f","added_by":"auto","created_at":"2025-06-26 10:03:12","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":2157814,"visible":true,"origin":"","legend":"\u003cp\u003ePopulated 
hexagons in Pennsylvania and the boundaries of ZCTAs.\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-6857670/v1/1dbe444fb21af24f725bbf01.png"},{"id":85475834,"identity":"ed264d54-fba9-4d4a-8dea-c7faf83020ff","added_by":"auto","created_at":"2025-06-26 10:03:12","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":743804,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePermutation feature importance of the top-20 variables for the Average Model.\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-6857670/v1/f2194116120d559a2336a9a1.png"},{"id":85475831,"identity":"e39dcece-fa64-4f68-b68a-902e76bb2041","added_by":"auto","created_at":"2025-06-26 10:03:12","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":666313,"visible":true,"origin":"","legend":"\u003cp\u003ePartial dependence plots (PDPs) of the featured variables for the Average Model (top-6 importance non-categorical features, all variables are scaled).\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-6857670/v1/c89e3aedc02f5736cd93b988.png"},{"id":85475837,"identity":"3e83050b-51d8-4160-b120-cedb0c40bfcf","added_by":"auto","created_at":"2025-06-26 10:03:12","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":2123694,"visible":true,"origin":"","legend":"\u003cp\u003eActual average and three most important variables of January by ZCTA (and ZCTAs where more than 3 measures exist). 
Striped areas indicate ZCTAs where data is not available.\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-6857670/v1/c699577ef0e4eadec2aa8f51.png"},{"id":85476876,"identity":"18485279-1e55-4456-9c41-0fbb4327119c","added_by":"auto","created_at":"2025-06-26 10:11:12","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":660763,"visible":true,"origin":"","legend":"\u003cp\u003ePermutation feature importance of top-10 variables for Relative Variability Model.\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-6857670/v1/2919aba998fca8100a683a58.png"},{"id":85476881,"identity":"994e005d-7618-4369-97e6-6ddad511a344","added_by":"auto","created_at":"2025-06-26 10:11:12","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":650971,"visible":true,"origin":"","legend":"\u003cp\u003ePDPs of the featured variables for the variability model (top-6 importance features).\u003c/p\u003e","description":"","filename":"floatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-6857670/v1/42da9eb409836489f62bae13.png"},{"id":85477185,"identity":"40252193-31ff-4ba6-b577-aeb9bb2ad9f6","added_by":"auto","created_at":"2025-06-26 10:19:12","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":2173723,"visible":true,"origin":"","legend":"\u003cp\u003eActual coefficient of variance (CoV) and the standard deviations (STDs) of the top-3 most important variables for January by ZCTA (ZCTAs where more than 3 measures exist). 
Striped areas indicate ZCTAs where data is not available.\u003c/p\u003e","description":"","filename":"floatimage8.png","url":"https://assets-eu.researchsquare.com/files/rs-6857670/v1/f51c58ba055316490fc759f1.png"},{"id":85477184,"identity":"3b16f067-15c9-41cd-b5e4-4a57eade308b","added_by":"auto","created_at":"2025-06-26 10:19:12","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":1612569,"visible":true,"origin":"","legend":"\u003cp\u003ePredicted average and CoVs of radon levels in Pennsylvania for January by ZCTAs (ZCTAs where more than 3 measures exist). Striped areas indicate ZCTAs where data is not available.\u003c/p\u003e","description":"","filename":"floatimage9.png","url":"https://assets-eu.researchsquare.com/files/rs-6857670/v1/712d81ff50af6faf3641331d.png"},{"id":85475842,"identity":"08047daa-52ce-4a20-98a9-6ebf56b92fbd","added_by":"auto","created_at":"2025-06-26 10:03:12","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":731845,"visible":true,"origin":"","legend":"\u003cp\u003ePermutation feature importance of top-10 variables for 50th percentiles of an individual quantile regression forest (QRF) model.\u003c/p\u003e","description":"","filename":"floatimage10.png","url":"https://assets-eu.researchsquare.com/files/rs-6857670/v1/46d4e390ead951fe5b07da5e.png"},{"id":85475843,"identity":"5884c203-9faf-497b-b4b5-fe5c915b2aed","added_by":"auto","created_at":"2025-06-26 10:03:12","extension":"png","order_by":11,"title":"Figure 11","display":"","copyAsset":false,"role":"figure","size":752369,"visible":true,"origin":"","legend":"\u003cp\u003ePermutation feature importance of top-10 variables for 75th percentiles of an individual QRF 
model.\u003c/p\u003e","description":"","filename":"floatimage11.png","url":"https://assets-eu.researchsquare.com/files/rs-6857670/v1/66e33c8f6e7989e559e29fec.png"},{"id":85476885,"identity":"0b100b5f-2593-4643-baa1-7f562e91331f","added_by":"auto","created_at":"2025-06-26 10:11:12","extension":"png","order_by":12,"title":"Figure 12","display":"","copyAsset":false,"role":"figure","size":721848,"visible":true,"origin":"","legend":"\u003cp\u003ePermutation feature importance of top-10 variables for 90th percentiles of Individual QRF Model.\u003c/p\u003e","description":"","filename":"floatimage12.png","url":"https://assets-eu.researchsquare.com/files/rs-6857670/v1/4f4cbf83dd6f5565e5a24f27.png"},{"id":85475874,"identity":"e3c10492-7830-429d-895c-ff07884b1995","added_by":"auto","created_at":"2025-06-26 10:03:13","extension":"png","order_by":13,"title":"Figure 13","display":"","copyAsset":false,"role":"figure","size":285849,"visible":true,"origin":"","legend":"\u003cp\u003ePDPs of six variables (permeability, elevation, fuel type wood, relief, depth to any soil restrictive layer, and the maximum temperature) at 50th percentile.\u003c/p\u003e","description":"","filename":"floatimage13.png","url":"https://assets-eu.researchsquare.com/files/rs-6857670/v1/38809c171822d2548f2d56f5.png"},{"id":85475867,"identity":"a69cbf8f-ec57-4f8b-acd0-54abb2a7ec54","added_by":"auto","created_at":"2025-06-26 10:03:13","extension":"png","order_by":14,"title":"Figure 14","display":"","copyAsset":false,"role":"figure","size":285830,"visible":true,"origin":"","legend":"\u003cp\u003ePDPs of six variables (permeability, elevation, fuel type wood, relief, depth to any soil restrictive layer, and the maximum temperature) at 75th percentile.\u003c/p\u003e","description":"","filename":"floatimage14.png","url":"https://assets-eu.researchsquare.com/files/rs-6857670/v1/930243565d9b092c62d6fc9b.png"},{"id":85476883,"identity":"a0313d89-4e01-4d9d-b5b9-cb20c8d2f809","added_by":"auto","created_at":"2025-06-26 
10:11:12","extension":"png","order_by":15,"title":"Figure 15","display":"","copyAsset":false,"role":"figure","size":292549,"visible":true,"origin":"","legend":"\u003cp\u003ePDPs of six variables (permeability, elevation, fuel type wood, relief, depth to any soil restrictive layer, and the maximum temperature) at 90th percentile.\u003c/p\u003e","description":"","filename":"floatimage15.png","url":"https://assets-eu.researchsquare.com/files/rs-6857670/v1/d630f58e8b71b78750d57aed.png"},{"id":85476888,"identity":"ab693e2f-ad5c-4cf6-b233-ad4b82cd08ac","added_by":"auto","created_at":"2025-06-26 10:11:12","extension":"png","order_by":16,"title":"Figure 16","display":"","copyAsset":false,"role":"figure","size":1365417,"visible":true,"origin":"","legend":"\u003cp\u003ePredicted 50th percentile of radon level of January by ZCTA (ZCTAs where populated). Striped areas indicate ZCTAs where data is not available.\u003c/p\u003e","description":"","filename":"floatimage16.png","url":"https://assets-eu.researchsquare.com/files/rs-6857670/v1/aae4041b580e1fe9baa75a48.png"},{"id":85475879,"identity":"8adede77-6694-4593-be9e-7c0d9ff05d69","added_by":"auto","created_at":"2025-06-26 10:03:13","extension":"png","order_by":17,"title":"Figure 17","display":"","copyAsset":false,"role":"figure","size":1328954,"visible":true,"origin":"","legend":"\u003cp\u003ePredicted 75th percentile of radon levels for January by ZCTA (ZCTAs where populated). 
Striped areas indicate ZCTAs where data is not available.\u003c/p\u003e","description":"","filename":"floatimage17.png","url":"https://assets-eu.researchsquare.com/files/rs-6857670/v1/4639958ea8db964f4b018cdb.png"},{"id":85476889,"identity":"81f2dd91-5dfc-4303-a233-db1650aa7ee6","added_by":"auto","created_at":"2025-06-26 10:11:13","extension":"png","order_by":18,"title":"Figure 18","display":"","copyAsset":false,"role":"figure","size":1093827,"visible":true,"origin":"","legend":"\u003cp\u003ePredicted 90th percentile of radon levels for January by ZCTA (ZCTAs where populated). Striped areas indicate ZCTAs where data is not available.\u003c/p\u003e","description":"","filename":"floatimage18.png","url":"https://assets-eu.researchsquare.com/files/rs-6857670/v1/f7746a52fa264cf67d33f017.png"},{"id":104250871,"identity":"8dc823c1-a285-49db-9c38-9cec32c29973","added_by":"auto","created_at":"2026-03-09 16:10:59","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":19740129,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6857670/v1/c63211d7-73aa-4701-a566-3ab0730f3c52.pdf"},{"id":85475828,"identity":"eb57a395-1a36-4394-9ef9-2653e0b22f89","added_by":"auto","created_at":"2025-06-26 10:03:12","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":18400,"visible":true,"origin":"","legend":"","description":"","filename":"Appendix.docx","url":"https://assets-eu.researchsquare.com/files/rs-6857670/v1/458d3603c3feb9eb19dae658.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Quantifying Uncertainty in Indoor Radon Exposure Estimates in Pennsylvania with Quantile Regression Forests","fulltext":[{"header":"1. 
Introduction","content":"\u003cp\u003eAs a naturally occurring radioactive gas, radon is a major contributor to background radiation exposure\u003csup\u003e1\u003c/sup\u003e and a significant health concern as the second leading cause of lung cancer overall and the leading cause of lung cancer in nonsmokers.\u003csup\u003e2\u003c/sup\u003e Seeping undetected into dwellings through the ground, this colorless and odorless gas poses unique risks because radon’s decay products can attach to lung tissue and emit ionizing radiation that initiates carcinogenic processes.\u003csup\u003e3\u003c/sup\u003e Recent studies have also suggested potential links between radon exposure and cardiovascular disease\u003csup\u003e4,5\u003c/sup\u003e and its potential interaction with PM 2.5,\u003csup\u003e5,6\u003c/sup\u003e findings that could broaden the understanding of radon’s health impacts.\u003c/p\u003e\n\u003cp\u003eDespite the well-established links between radon exposure and adverse health outcomes,\u003csup\u003e7-10\u003c/sup\u003e radon measurement and exposure assessment often involve significant uncertainties.\u003csup\u003e11,12\u003c/sup\u003e Indoor radon concentrations can vary significantly, even between neighboring houses, owing to differences in soil composition, building materials, and ventilation systems.\u003csup\u003e13,14\u003c/sup\u003e Moreover, radon estimates are currently limited to county-level data, which insufficiently captures the localized variability of radon exposure. This limitation poses challenges for effective public health interventions and epidemiological research.\u003c/p\u003e\n\u003cp\u003eThis study applied multiple machine learning models to predict indoor radon concentrations at the zip-code scale and to examine factors contributing to uncertainty in these predictions. 
This approach hypothesizes that integrating geological, meteorological, and building-specific factors improves the accuracy of radon concentration predictions at the zip-code scale while identifying areas where mean or median predictions may underestimate exposure for a large percentage of the population. By combining these models, this study offers a more refined approach for radon risk assessment and targeted mitigation strategies. Estimating indoor radon exposure at finer spatial scales and quantifying the uncertainty in these estimates can also enhance the precision of environmental exposure data for epidemiological studies investigating the links between indoor radon exposure and health outcomes. Indoor radon tests were used to develop predictive models at the zip-code tabulation area (ZCTA) scale, with the overarching goal of providing more accurate assessments of risks posed by indoor radon exposure.\u0026nbsp;\u003c/p\u003e"},{"header":"2. Background","content":"\u003cdiv id=\"Sec2\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Characteristics of Radon\u003c/h2\u003e \u003cp\u003eRadon is a radioactive gas formed from the natural decay of uranium, which is commonly found in rocks, soil, and certain building materials. Inhalation of radon and its progeny exposes biological tissue to ionizing radiation, which has the potential to induce DNA damage and increase the risk of cancer, with lung cancer being the highest potential risk.\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e,\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e Radon-222, the isotope predominantly linked with radon-induced health risks, has a half-life of ~\u0026thinsp;3.8 days and produces radioactive progeny, including polonium-218 and polonium-214. 
Because radon gas is colorless and odorless, specialized instruments are required to detect and measure it.\u003c/p\u003e \u003cp\u003eRadon is primarily released from uranium-rich igneous rocks (e.g., granite) and sedimentary formations (e.g., phosphate rocks, shales) with higher uranium content. Limestone, although generally low in uranium content, can also emit radon in trace amounts.\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e,\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u003c/sup\u003e Building materials, including concrete and gypsum wallboard, may also emit radon if they contain traces of uranium.\u003csup\u003e\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e,\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e Because radon gas is significantly denser than air, once emitted, it tends to accumulate in the lowest areas of buildings (e.g., basements, below-ground spaces), although elevated radon levels can also be found in poorly ventilated ground-level rooms.\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e,\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.2 Variables Affecting Indoor Radon Concentrations\u003c/h2\u003e \u003cp\u003eRadon concentration at a specific location is not a random occurrence but is determined by a complex interplay of geological, structural/architectural, and meteorological factors. 
These factors can either exacerbate or attenuate radon levels and ultimately influence the extent of indoor radon exposure.\u003c/p\u003e \u003cdiv id=\"Sec4\" class=\"Section3\"\u003e \u003ch2\u003e2.2.1 Geological Factors\u003c/h2\u003e \u003cp\u003eRadon is a geochemically generated gas that can exhibit significant spatial variability caused by underlying geological conditions. Elevated radon conditions are typically seen in regions with geologic formations that have elevated uranium levels\u0026mdash;notably in granite, phosphate rocks, and shales.\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e,\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e,\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e Additionally, the physical properties of soil, including its density, porosity, and moisture levels, are key determinants of radon migration from the subsurface to the atmosphere.\u003csup\u003e\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e,\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e,\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e For example, clay-rich soils tend to inhibit radon diffusion and release, whereas sandy soils facilitate radon mobility. 
Moreover, geological discontinuities (e.g., faults and fractures) further enhance the transport of radon by providing pathways for upward migration of gas into the environment.\u003csup\u003e\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e,\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e,\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section3\"\u003e \u003ch2\u003e2.2.2 Structural and Architectural Factors\u003c/h2\u003e \u003cp\u003eThe entry and accumulation of radon within buildings are strongly influenced by architectural and structural factors. The construction materials themselves may serve as sources of radon emissions. For example, stone foundations can emit higher levels of radon compared to foundations made of other materials.\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e Older buildings that suffer from structural degradation and the presence of cracks typically exhibit higher radon concentrations due to an increased number of infiltration points.\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e Ventilation systems play a significant role in moderating indoor radon levels, and well-ventilated spaces dilute radon concentrations more effectively.\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e,\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e Furthermore, numerous other factors (as derived from census data) could contribute to indoor radon concentration, including the number of housing units, their occupancy status, structural attributes (e.g., the number of units in a structure), and type of heating fuel used.\u003csup\u003e\u003cspan citationid=\"CR20\" 
class=\"CitationRef\"\u003e20\u003c/span\u003e,\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section3\"\u003e \u003ch2\u003e2.2.3 Meteorological Factors\u003c/h2\u003e \u003cp\u003eMeteorological conditions play an important role in fluctuating levels of indoor radon concentration. Temperature and atmospheric pressure affect radon's movement from the ground into buildings. For example, low atmospheric pressure can increase radon entry, and temperature differences between indoor areas and outdoor areas can result in pressure differences that facilitate the entry of radon into a structure.\u003csup\u003e\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e Seasonal changes further impact indoor radon levels, and concentrations are often higher during the winter than during the summer.\u003csup\u003e\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e,\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u003c/sup\u003e Furthermore, weather patterns, including precipitation, can also change radon concentrations. Heavy rainfall can increase soil moisture and potentially reduce radon emissions, whereas dry conditions can allow for greater radon emissions from the dry soil.\u003csup\u003e\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e,\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e\u003c/sup\u003e\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e2.3 Previous Radon Level Prediction Models\u003c/h2\u003e \u003cp\u003ePrevious studies have employed different modeling approaches to estimate radon concentrations across multiple geographic regions. 
The Environmental Protection Agency\u0026rsquo;s Radon Zone project\u003csup\u003e\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e classified counties based on geological and soil characteristics, offering a broad but useful categorization. Subsequent studies have used more sophisticated statistical methods. Price et al.\u003csup\u003e\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e,\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e\u003c/sup\u003e employed Bayesian models that incorporated geological data to improve county-level predictions of radon concentrations in Minnesota and mid-Atlantic states. Mose and Mushrush\u003csup\u003e\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e explored the correlation between soil radon levels, permeability, and indoor radon levels, emphasizing the complexity of radon entry into homes. Apte et al.\u003csup\u003e\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u003c/sup\u003e employed mixed-effects regression models for predicting radon levels in New Hampshire, accounting for housing types and geological features. Similarly, Smith and Field\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e developed a Bayesian hierarchical model to predict residential radon in Iowa by combining regional geological data and housing characteristics, and Casey et al.\u003csup\u003e\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e\u003c/sup\u003e analyzed how geology, well water usage, and other factors influenced radon levels in Pennsylvania.\u003c/p\u003e \u003cp\u003eRecent studies have also applied machine learning techniques to more accurately predict radon levels. 
Kropat et al.\u003csup\u003e\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e\u003c/sup\u003e and Nikkil\u0026auml; et al.\u003csup\u003e\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u003c/sup\u003e used random forests (RFs) and other ensemble methods to predict radon concentrations in Switzerland and Finland, respectively, by incorporating detailed geological data. Dai et al.\u003csup\u003e\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e\u003c/sup\u003e identified geological fault zones and housing characteristics as critical factors for radon risk in Georgia. Li et al.\u003csup\u003e\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e\u003c/sup\u003e introduced an ensemble-based machine learning model that integrated multiple data types to predict monthly radon concentrations at the ZCTA level in the greater Boston area. These advanced approaches allow for the integration of environmental, architectural, and demographic factors, thereby enabling more comprehensive and nuanced radon risk assessments.\u003c/p\u003e \u003c/div\u003e"},{"header":"3. Methods","content":"\u003cp\u003eThis study employed machine learning techniques, specifically RF and quantile regression forest (QRF), to estimate the mean and quantiles of radon concentrations. We also used a volatility model framework and modeled the relative variability of radon at the ZCTA level. Residential radon values are from pre-mitigation indoor home radon tests. Predictive attributes include the physical characteristics known to affect radon exposure, and they were measured at the ZCTA level (elevation, soil, hydrologic, meteorological, and census data). Data were integrated to create predictive models that incorporate information from over 60 features related to residential radon exposure. 
These datasets were preprocessed and aggregated by using the H3 spatial indexing system to allow for a common spatial scale for all datasets using workflows developed for the Centralized Health and Environmental Repository (C-HER).\u003csup\u003e\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e The application of C-HER workflows enabled the harmonization of data across multiple spatial scales into a single reference scale. Detailed descriptions of the data processing methodologies for each dataset are provided in the associated methods white paper.\u003csup\u003e\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e\u003c/sup\u003e A concise overview of these methodologies is included in this paper for reference.\u003c/p\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Data Processing\u003c/h2\u003e \u003cdiv id=\"Sec10\" class=\"Section3\"\u003e \u003ch2\u003e3.1.1 Residential Radon Values\u003c/h2\u003e \u003cp\u003eIndoor radon test data collected from 2008 to 2021 were provided by the Pennsylvania Department of Environmental Protection (n\u0026thinsp;=\u0026thinsp;1,622,169).\u003csup\u003e44\u003c/sup\u003e Pennsylvania was selected as the study area owing to the high prevalence of radon in the state. The dataset includes detailed information such as county, residential address postal code, building purpose, test floor level, and test date. The residential radon measures from 2008 to 2021 showed a 5.92 pCi/L average, a 2.60 pCi/L median, 1.40 pCi/L for the 25th percentile, and 5.40 pCi/L for the 75th percentile.\u003c/p\u003e \u003cp\u003eIn this study, several exclusion criteria were applied to refine the dataset (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Tests with inappropriate durations (fewer than 2 days or more than 15 days, n\u0026thinsp;=\u0026thinsp;9,772) were removed. 
Non-residential indoor radon tests were excluded (n\u0026thinsp;=\u0026thinsp;533,804). Tests with out-of-range values, specifically those less than 0 or greater than 9,999, as well as tests lacking information on the test floor level, were also excluded (n\u0026thinsp;=\u0026thinsp;49,740). Additionally, any test with a value exceeding the 99th percentile for its zip code was excluded (n\u0026thinsp;=\u0026thinsp;8,242). To address differences in indoor radon concentrations across floors, only measurements taken in basements, where radon levels are typically highest, were retained, and tests from other floors were excluded (n\u0026thinsp;=\u0026thinsp;80,697). For houses with multiple measurements across time, only the first recorded measurement was retained because subsequent measurements were likely taken post-mitigation, and the later measurements were excluded (n\u0026thinsp;=\u0026thinsp;162,330).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eZip codes for radon test kit addresses were linked to Census ZCTAs by using a 2020 crosswalk file from the Uniform Data System Mapper.\u003csup\u003e\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e\u003c/sup\u003e For cases when no direct match was available, USPS zip code updates were applied.\u003c/p\u003e \u003cp\u003eTo calculate ZCTA-level statistics, observations were grouped into ZCTA-month pairs, for which averages and coefficients of variation (CoVs) were calculated. 
This also enabled us to account for the significant seasonal variability in radon levels.\u003csup\u003e\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e,\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u003c/sup\u003e Indoor radon concentrations are typically higher during winter months because structures are ventilated less, and heating a structure induces the stack effect.\u003csup\u003e\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u003c/sup\u003e We required at least 10 measurements per ZCTA-month pair to ensure statistically reliable averages and CoVs. ZCTA-month pairs with fewer than 10 measurements were excluded from the analysis (n\u0026thinsp;=\u0026thinsp;46,473). After completing all data processing steps, the final dataset included 718,111 measurements across 1,542 ZCTAs from 2008 to 2021. This dataset was used to train and test predictive models.\u003c/p\u003e \u003cp\u003eThe processed residential radon measurements from 2008 to 2021 had an average concentration of 5.2 pCi/L, with a median of 2.6 pCi/L. The interquartile range spanned 1.4 pCi/L (25th percentile) to 5.3 pCi/L (75th percentile), highlighting significant variability in radon levels across Pennsylvania.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section3\"\u003e \u003ch2\u003e3.1.2 Independent Variables\u003c/h2\u003e \u003cp\u003eThe elevation, soil, geochemical, hydrologic, and meteorological measures included in our model were reported across a range of spatial scales. Traditionally, to solve this issue, researchers aggregate measures to the spatial area of the dependent variable (i.e., ZCTA in our case). However, for measures such as soil data, for which there is wide variability and more detailed information is available, this averaging results in less precise measures that decrease predictive accuracy. 
Additionally, in rural areas where housing may be sparsely spread across an area, averaging over a ZCTA incorporates information about soil characteristics that do not affect residential locations.\u003c/p\u003e \u003cp\u003eWe developed a three-step data engineering process to address these challenges.\u003csup\u003e\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e\u003c/sup\u003e First, all measures were spatially indexed using H3 hexagons. Second, we used LandScan population data\u003csup\u003e\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e\u003c/sup\u003e to mask daily population data (aggregated day and night population) from 2018 to identify all H3 hexagons with an established population (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). If a hexagon did not include a residential location, then the information from the hexagon was excluded from the aggregate statistics at the ZCTA level. Third, all data in H3 hexagons were converted to the ZCTA level (weighted average, standard deviation, and dominant condition for categorical variables) by using areal interpolation via PySAL's \u003cem\u003earea_interpolate()\u003c/em\u003e function. To effectively manage large data volumes, raster processing was conducted in overlapping tiles to ensure comprehensive coverage of all hexagons with relevant raster data. Census data were measured at the ZCTA level.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThis study incorporated elevation data from the USGS\u0026rsquo;s Global Multi-Resolution Terrain Elevation Data 2010 (GMTED2010),\u003csup\u003e\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e\u003c/sup\u003e which offers comprehensive global elevation information that is ideal for environmental and geological analyses. 
The GMTED2010 data for Pennsylvania were obtained at a 30 arc-second resolution and re-gridded into level-8 H3 hexagons using the \u003cem\u003egeo_to_h3_aggregate()\u003c/em\u003e function from the \u003cem\u003eh3pandas\u003c/em\u003e library,\u003csup\u003e\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e,\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e\u003c/sup\u003e which calculated mean elevation for each hexagon based on raster pixel centroids. To address missing hexagons caused by grid misalignment, a ring smoothing technique was used to fill the gaps with the average of values from adjacent hexagons.\u003c/p\u003e \u003cp\u003eThe soil data utilized in this study were extracted from the Gridded National Soil Survey Geographic Database (gNATSGO), which was provided by the USDA's Natural Resources Conservation Service.\u003csup\u003e\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e\u003c/sup\u003e The gNATSGO dataset included detailed soil characteristics relevant for assessing radon emanation, transport, and accumulation. The analysis utilized 10-meter resolution state-level data in an ESRI file geodatabase format, and the data were processed using the ArcGIS Toolbox Soil Data Development Toolkit from the USGS. Soil maps were generated for each soil characteristic, representing the dominant soil condition and the maximum available depth (200 cm). Individual rasters were reprojected to the EPSG:4326 coordinate reference system, and pixel values were extracted to construct a data frame that contained longitude and latitude. 
Zonal statistics for target hexagons were calculated using the \u003cem\u003egeo_to_h3_aggregate()\u003c/em\u003e function from the \u003cem\u003eh3pandas\u003c/em\u003e library.\u003c/p\u003e \u003cp\u003eThis study incorporated geochemical data from the USGS Geochemical and Mineralogical Survey,\u003csup\u003e\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e\u003c/sup\u003e and focused on uranium, potassium, and thorium concentrations across different soil depths (0\u0026ndash;5 cm, A horizon, C horizon). Because uranium is a direct source of radon, understanding its distribution is key for radon risk assessment. Geochemical data were processed into H3 hexagons to aggregate values at the ZCTA level, allowing for detailed spatial analysis. Categorical values from the GeoTIFF files were assigned using the \u003cem\u003egeo_to_h3_aggregate()\u003c/em\u003e function to obtain the most dominant condition within ZCTAs, thereby ensuring consistency across hexagons and enhancing the accuracy of radon risk predictions.\u003c/p\u003e \u003cp\u003eHydrologic data for this study were obtained from the USGS Hydrologic Landscape Regions, which provide detailed water-related landscape characteristics across the United States.\u003csup\u003e\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e\u003c/sup\u003e This dataset provides information on hydrological factors that affect radon transport through water and soil pathways. For the radon concentration prediction model, relevant hydrological variables were selected from an initial vector dataset at a 1 km \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\times\\:\\)\u003c/span\u003e\u003c/span\u003e 1 km resolution. 
The data were converted into H3 level-8 hexagons by using the \u003cem\u003epolyfill_resample()\u003c/em\u003e method from the \u003cem\u003eh3pandas\u003c/em\u003e library to ensure consistency with other datasets.\u003c/p\u003e \u003cp\u003eThe meteorological data used in the radon concentration prediction model were sourced from the Daymet dataset,\u003csup\u003e\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e\u003c/sup\u003e which provides high-resolution climate information from across North America. Precipitation, snow water equivalent, temperature, and vapor pressure variables were prioritized due to their relevance in influencing radon levels. Daily 1-km grids from Daymet were aggregated into monthly averages using \u003cem\u003enumpy\u003c/em\u003e\u003csup\u003e\u003cem\u003e\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e\u003c/em\u003e\u003c/sup\u003e and then re-gridded to H3 level-8 using the \u003cem\u003earea_interpolate()\u003c/em\u003e function from the PySAL Tobler library.\u003csup\u003e\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e\u003c/sup\u003e\u003c/p\u003e \u003cp\u003eThe demographic and housing data used in this study were sourced from the American Community Survey (ACS) and Decennial Census (DEC) and accessed via the US Census Bureau\u0026rsquo;s API service.\u003csup\u003e\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e,\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e\u003c/sup\u003e These data were utilized to capture housing characteristics, which were hypothesized to be significant predictors of radon levels. Variables such as year of construction and primary heating fuel were collected from the 2000 DEC and the 2013\u0026ndash;2020 5-year ACS estimates at the ZCTA-level resolution. 
A summary of selected variables and the rationale for their inclusion in the models are provided in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eVariables included in the study for modeling radon level estimation and the descriptions of their relevance and mechanisms by which they may influence radon levels.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVariable\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDescription\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBulk Density\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eReflects soil's ability to permit radon gas movement; denser soils may reduce radon's upward migration.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePercent Clay, Sand, Silt\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eThese factors affect soil permeability to radon gas, with coarser soils (higher sand content) generally allowing for easier radon passage.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDepth to Soil Restrictive Layer\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eIndicates potential barriers to radon movement toward the surface.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLinear Extensibility, Liquid Limit, Plasticity Index\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRelate to soil's expandability and water retention, impacting radon diffusion.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSurface Texture\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eInfluences the surface's ability to release or trap radon gas.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSoil Taxonomy Classification\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eProvides insights into the soil's overall characteristics that could affect radon emanation and transport.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAvailable Water Supply/Capacity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eWater saturation levels can impact radon solubility and its movement through the soil.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWater Content\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eWater content affects soil's radon transmission properties as wetter soils may impede radon flow.\u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e,\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e 
\u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHydric Rating by Map Unit\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eIdentifies water saturation in soil. Moisture content can influence radon solubility and transport.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHydrologic Soil Group\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eClassifies soils based on their drainage capacity, affecting radon's upward movement from soil to indoor environments.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSoil Moisture Class/Subclass\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eIndicates moisture content of soil, which is essential for understanding radon transport dynamics in different soil conditions.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSoil Temperature\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSoil temperature can affect diffusion and permeability of radon.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDrainage Class\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEfficient drainage can reduce radon's upward movement, making these variables critical in predicting radon levels.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSaturated Hydraulic Conductivity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHigher conductivity suggests easier movement of water and possibly of radon through the soil.\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOrganic Matter\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eInfluences soil structure and hence radon migration, with higher organic content potentially impeding radon movement.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDwellings With/Without Basements\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eIndicates building characteristics that are directly related to potential indoor radon levels, as homes with basements are generally at higher risk.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUranium Content\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRadon-222 is a decay product of uranium-238.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eThorium Content\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRadon-220 (or thoron) is a decay product of thorium-232.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePotassium Content\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAlthough potassium is not related to radon directly, it shows the characteristics of the soil or rock.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClay Content (10 \u0026Aring;, 14 \u0026Aring;, and Kaolinite)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDifferent types of clay content can affect radon's retention and movement through soil, thereby influencing its diffusion and accumulation 
indoors.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDepth to Bed Rock\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDepth to bedrock can determine how easily radon migrates from the subsurface to the surface, thereby impacting potential radon exposure levels in buildings.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAquifer Permeability Class\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDetermines groundwater flow rates, which affect radon's transport from soil to water sources and potentially into buildings.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eElevation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eInfluences atmospheric pressure variations, which can affect soil gas emissions, including radon.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRelief of Watershed\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eIndicates topographical variations that can influence radon gas accumulation and dispersion patterns.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePercent Flat Land in Watershed\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAffects water drainage and soil gas movement, thereby impacting radon release.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDaily Total Precipitation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eImpacts soil moisture levels, which can affect radon 
solubility and mobility through the soil.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSnow Water Equivalent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eReflects the amount of water contained in snowpack, which influences ground moisture and radon emission rates.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDaily Minimum/Maximum 2-meter Air Temperature\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAffects the thermal gradient between the ground and atmosphere, which influences radon diffusion. Also affects the ventilation habits that can affect accumulation of radon.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVapor Pressure\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eVapor pressure can affect the moisture of soil, which can affect permeability of radon.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOccupancy Status\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUnoccupied houses may have higher radon concentrations because the ventilation systems may not operate regularly.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYear Structure Built\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eOlder buildings may have more radon entry points due to structural degradation over time and might have different types of building materials.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHouse Heating Fuel\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDifferent heating systems can alter indoor air pressure and flow, thereby influencing radon entry and distribution. Also, heating fuel itself can be a potential source of radon.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e3.2 Radon Level Estimation Models\u003c/h2\u003e \u003cp\u003eAlthough individual-level radon test results with zip codes were available, exact location information for each house tested was not provided. To protect privacy, the data provider masked the exact address of the test and released only the zip codes. Although address masking was necessary for maintaining privacy, it reduced the accuracy of prediction algorithms, particularly in areas with high local variability. To address these limitations, a multi-model approach was employed to evaluate and compare the performance of various modeling strategies in the absence of precise location information. The RF algorithm\u003csup\u003e\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e\u003c/sup\u003e was selected as the base model for all comparisons, allowing us to account for complex interaction effects between variables in our model. Additionally, the QRF model, an extension of RF, was used for the individual-level analysis. 
This approach enabled a more detailed assessment by estimating the conditional distribution of radon concentrations.\u003csup\u003e\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e,\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e\u003c/sup\u003e The ability to estimate the conditional distribution makes QRF particularly effective for assessing exposure risks by modeling quantiles of radon exposure rather than just the mean or median concentration levels.\u003c/p\u003e \u003cp\u003eFor all models, the dependent and independent variables were measured at the ZCTA level because point-level information could not be assigned to individual radon tests. Three different models were compared to provide a richer characterization of radon concentration risk within a ZCTA.\u003c/p\u003e \u003cp\u003eThe outcome and predictors for each model were as follows:\u003c/p\u003e \u003cp\u003e1. Average Model: The dependent variable is the mean of all individual-level radon tests within a ZCTA. The independent variables are the population-masked averages or most dominant conditions at the ZCTA level. This is the standard approach to modeling indoor radon risk.\u003c/p\u003e \u003cp\u003e2. Relative Variability Model: The dependent variable is the CoV of all individual-level radon tests within a ZCTA. The independent variables are the population-masked standard deviations at the ZCTA level. This model will allow us to identify ZCTAs with high levels of uncertainty when using the Average Model.\u003c/p\u003e \u003cp\u003e3. Individual QRF Model: The dependent variables are the median, 75th percentile, and 90th percentile of the individual-level radon test results within a ZCTA. The independent variables are the population-masked averages at the ZCTA level. 
This model will allow us to identify ZCTAs that may have high levels of indoor radon exposure.\u003c/p\u003e \u003cp\u003eUsing multiple models enabled the quantification of model uncertainty in predicting indoor radon at the ZCTA level. This ability, in turn, provides a framework for the interpretation of radon exposure levels in the absence of fine-scale spatial data.\u003c/p\u003e \u003cdiv id=\"Sec13\" class=\"Section3\"\u003e \u003ch2\u003e3.2.1 Average Model\u003c/h2\u003e \u003cp\u003eThe Average Model employed an RF algorithm to predict the mean radon concentration at the ZCTA level. In this approach, the average characteristics of a ZCTA were used to develop a predictive model that estimates the mean indoor radon exposure within that ZCTA. Radon measures were averaged within each ZCTA-month pair. For variables other than the radon measure, numerical variables were averaged across the months, and categorical variables were assigned the most frequent value within each ZCTA-month pair. This method preserved seasonal information by grouping the dataset by both ZCTAs and months. However, relying solely on aggregate values disregards the variability of radon concentrations within a ZCTA, potentially underestimating the risk of high indoor radon exposure in highly heterogeneous areas. Model fit for such models is generally calculated using the mean radon level for the ZCTA. However, ecological approaches that aggregate values within a geographic area ignore the underlying variability in the indoor radon exposure known to exist within a ZCTA. 
To illustrate the limitations of an ecological approach, we recalculated the model fit using the individual radon tests as the observed values and the predictions from the Average Model as the predicted values.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section3\"\u003e \u003ch2\u003e3.2.2 Relative Variability Model\u003c/h2\u003e \u003cp\u003eIn the second model, we predicted the variability of indoor radon exposure in a ZCTA by using RF to identify characteristics of ZCTAs with a high variability of indoor radon exposure. In this model, the outcome was the CoV of the radon measurements, defined as the ratio of the standard deviation to the mean. We hypothesized that the variability of geographic, meteorological, and housing characteristics that are known to affect indoor radon exposure would be associated with the variability of radon exposure within a ZCTA. Because many of the factors could be measured at smaller spatial scales, we used the variability of these factors to predict the variability in indoor radon exposure. This model employed the standard deviation (for numerical variables) and entropy (for categorical variables) of each predictor, calculated from the H3 level-8 information within each ZCTA, to provide a comprehensive view of radon level fluctuations. Owing to the lack of variability in the measures from the ACS and DEC, we excluded measures of variation for fuel type, age of structure, and occupancy status.\u003c/p\u003e \u003cp\u003eThis approach enabled the quantification of uncertainty in predicted averages based on the variability of the input data. 
The Relative Variability Model complemented the Average Model by identifying areas with highly heterogeneous indoor radon exposure\u0026mdash;a phenomenon that aggregate means alone could not capture.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section3\"\u003e \u003ch2\u003e3.2.3 Individual QRF Model\u003c/h2\u003e \u003cp\u003eThe QRF algorithm extends the RF algorithm by estimating the conditional distribution of a target variable. This nonparametric machine learning method provided a means to evaluate both the range and uncertainty of predictions. Unlike traditional RF models, which focus on predicting the mean outcome, QRF enables the prediction of any quantile within the target distribution. This feature could be particularly useful for capturing the variability and range of radon concentrations within a ZCTA, thereby offering a deeper understanding of factors that contribute to both higher average exposure and extreme values. Such an approach is critical for risk assessment and public health planning because it supports a more comprehensive evaluation of exposure risks and the identification of localized areas with high concentrations of radon.\u003c/p\u003e \u003cp\u003eIn previous studies, QRF has been applied to various environmental datasets after the introduction of the algorithm by Meinshausen\u003csup\u003e\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e\u003c/sup\u003e (2006). Work by Vaysse and Lagacherie\u003csup\u003e\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e\u003c/sup\u003e (2017) and Maxwell\u003csup\u003e\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e\u003c/sup\u003e (2021) applied QRF to geological studies. These studies highlighted the utility of QRF in capturing the distribution and the variabilities of target variables, making it an appropriate choice for radon prediction in this study. 
Individual QRF models estimated the conditional distribution of radon concentrations across different percentiles rather than focusing only on mean predictions.\u003c/p\u003e \u003cp\u003eThe quantile-forest Python package was used to implement QRF.\u003csup\u003e\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e\u003c/sup\u003e The model was trained by using the same set of predictors as the RF models, including geological, meteorological, and building-specific factors. The QRF model's ability to predict quantiles of radon concentration further strengthens our analysis and supports more informed decision-making by showing the range of potential exposure levels in a ZCTA. Owing to the computational cost of the QRF model, its processing time was substantially longer than that of the RF model. To address this, advanced computational resources provided by the Compute and Data Environment for Science (CADES) at Oak Ridge National Laboratory were used to perform the analysis efficiently.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section3\"\u003e \u003ch2\u003e3.2.4 Model Evaluation Metrics\u003c/h2\u003e \u003cp\u003eThe prediction performance of the RF and QRF models was evaluated by using a combination of metrics to assess both predictive accuracy and model interpretability.\u003c/p\u003e \u003cdiv id=\"Sec17\" class=\"Section4\"\u003e \u003ch2\u003e3.2.4.1 Predictive Accuracy Metrics\u003c/h2\u003e \u003cp\u003eFor the predictive accuracy metrics, root mean square error (RMSE), R-squared (R\u0026sup2;), and mean absolute percentage error (MAPE) were used. RMSE quantifies the average magnitude of prediction errors, providing a measure of overall accuracy. R\u0026sup2; indicates the proportion of variance in the observed radon concentrations that is explained by the predictions. 
MAPE measures the average percentage deviation between predicted and actual radon concentrations, offering an intuitive understanding of prediction error magnitude.\u003c/p\u003e \u003cp\u003eTo account for potential spatial autocorrelation within ZCTAs, grouped 5-fold cross-validation (CV) was implemented, in which folds were created based on ZCTA groupings.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section4\"\u003e \u003ch2\u003e3.2.4.2 Permutation Feature Importance and Partial Dependence Plots\u003c/h2\u003e \u003cp\u003ePermutation feature importance and partial dependence plots (PDPs) were analyzed to identify important features in each model. Permutation importance quantifies the reduction in model performance when the values of a feature are randomly shuffled, thereby identifying the features that are most critical for capturing the radon levels.\u003c/p\u003e \u003cp\u003eTo address the issue of highly correlated variables that dilute feature importance, only one variable from each group of highly correlated variables (absolute correlation coefficient\u0026thinsp;\u0026gt;\u0026thinsp;0.85) was included in the model. Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e presents the groups of variables with absolute correlation coefficients that exceed 0.85. Additionally, variables such as hydrologic soil group and drainage class, which were already represented in aquifer permeability, were excluded to avoid redundancy.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eGroups of highly correlated variables (|r| \u0026gt; 0.85). 
Bolded variables are used in the model to analyze the permutation importance.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGroup\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eVariables\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026lsquo;Minimum elevation in watershed,\u0026rsquo; \u0026lsquo;\u003cb\u003eElevation\u0026rsquo;\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026lsquo;Available Water Capacity WTA, 0 to 200\u0026nbsp;cm,\u0026rsquo; \u0026lsquo;\u003cb\u003eAvailable Water Storage WTA, 0 to 200\u0026nbsp;cm\u003c/b\u003e,\u0026rsquo; \u0026lsquo;Available Water Supply, 0\u0026nbsp;to\u0026nbsp;25\u0026nbsp;cm,\u0026rsquo; \u0026lsquo;Available Water Supply, 0 to 50\u0026nbsp;cm,\u0026rsquo; \u0026lsquo;Available Water Supply, 0 to 150\u0026nbsp;cm,\u0026rsquo; \u0026lsquo;Available Water Supply, 0\u0026nbsp;to\u0026nbsp;100\u0026nbsp;cm,\u0026rsquo; \u0026lsquo;Water Content, 15 Bar WTA, 0 to 200\u0026nbsp;cm,\u0026rsquo; \u0026lsquo;Water Content, One-Third Bar WTA, 0 to 200\u0026nbsp;cm\u0026rsquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026lsquo;Percent 
Clay WTA, 0 to 200\u0026nbsp;cm,\u0026rsquo; \u0026lsquo;Linear Extensibility WTA, 0\u0026nbsp;to\u0026nbsp;200\u0026nbsp;cm,\u0026rsquo; \u0026lsquo;Liquid Limit WTA, 0 to 200\u0026nbsp;cm,\u0026rsquo; \u0026lsquo;Linear Extensibility WTA, 0 to 200\u0026nbsp;cm,\u0026rsquo; \u0026lsquo;\u003cb\u003ePlasticity Index WTA, 0\u0026nbsp;to\u0026nbsp;200\u0026nbsp;cm\u003c/b\u003e,\u0026rsquo; \u0026lsquo;Percent Silt WTA, 0 to 200\u0026nbsp;cm\u0026rsquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026lsquo;Saturated Hydraulic Conductivity (Ksat) WTA, 0 to 200\u0026nbsp;cm,\u0026rsquo; \u0026lsquo;\u003cb\u003eSaturated Hydraulic Conductivity (Ksat)\u003c/b\u003e, \u003cb\u003eStandard Classes WTA\u003c/b\u003e, \u003cb\u003e0 to 200\u0026nbsp;cm\u003c/b\u003e\u0026rsquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e\u0026lsquo;Percent flat land (slope less than 1%) in watershed lowland\u003c/b\u003e,\u0026rsquo; \u0026lsquo;Total percent flat land (slope less than 1%) in watershed\u0026rsquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026lsquo;\u003cb\u003eMaximum air temperature\u003c/b\u003e,\u0026rsquo; \u0026lsquo;Minimum air temperature,\u0026rsquo; \u0026lsquo;Water vapor pressure\u0026rsquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026lsquo;\u003cb\u003eEstimate Total Utility 
Gas\u003c/b\u003e,\u0026rsquo; \u0026lsquo;Estimate total fuel oil, kerosene, etc.\u0026rsquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026lsquo;Potassium content in A horizon,\u0026rsquo; \u0026lsquo;\u003cb\u003ePotassium content in 0\u0026nbsp;to\u0026nbsp;5\u0026nbsp;cm depth\u003c/b\u003e\u0026rsquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026lsquo;Thorium content in A horizon,\u0026rsquo; \u0026lsquo;\u003cb\u003eThorium content in 0\u0026nbsp;to\u0026nbsp;5\u0026nbsp;cm depth\u003c/b\u003e\u0026rsquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026lsquo;Uranium content in A horizon,\u0026rsquo; \u0026lsquo;\u003cb\u003eUranium content in 0\u0026nbsp;to\u0026nbsp;5\u0026nbsp;cm depth\u003c/b\u003e\u0026rsquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003ePDPs were generated to visualize the marginal effects of individual features on the predicted radon concentrations. PDPs helped illustrate the relationship between each feature and the model's predictions, revealing whether the relationship was linear, monotonic, or more complex. This visual analysis complemented the permutation feature importance metrics, offering a more intuitive interpretation of the model's behavior.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"4. 
Results and Discussion","content":"\u003cp\u003eThe Average Model and the Relative Variability Model were designed to capture different aspects of radon concentration within ZCTAs. The Average Model focused on predicting the mean radon concentration within a ZCTA, which provided an estimate of average radon levels across different areas. In contrast, the Relative Variability Model aimed to quantify the variation in radon concentrations within ZCTAs. By analyzing the CoV of indoor radon within a ZCTA, we can identify areas where radon concentrations may vary significantly, even if the average levels appear moderate. This approach supports targeted mitigation efforts by identifying locations with a higher likelihood of extreme radon exposure.\u003c/p\u003e\n\u003cp\u003eThe Average Model was designed to predict the average radon concentration within each ZCTA, using soil characteristics, climate variables, ACS and DEC data, and seasonal factors as predictors to examine how these factors influenced average radon levels within a ZCTA.\u003c/p\u003e\n\u003cp\u003eThe model test was conducted by applying 5 iterations of the 5-fold CV with ZCTA as the grouping variable and then without grouping variables. 
The evaluation was then repeated using individual radon tests as the observed values and the Average Model output as the predictions (Table\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e).\u003c/p\u003e\n\u003cdiv class=\"gridtable\"\u003e\n \u003cdiv align=\"char\" class=\"colspec\"\u003e\u003cbr\u003e\u003c/div\u003e\u0026nbsp;\u003ctable id=\"Tab3\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eMetrics (standard deviations in parentheses) for the Average Model evaluated against ZCTA averages and against individual tests.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"5\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" rowspan=\"2\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003eTested with average\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003eTested with individual\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e5-fold CV\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eGroup 5-fold (ZCTA) CV\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e5-fold CV\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eGroup 5-fold (ZCTA) CV\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eRMSE\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2.67 (0.19)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e3.17 (0.14)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e7.80 (0.15)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
align=\"char\"\u003e\n \u003cp\u003e7.86 (0.30)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eR\u003c/strong\u003e\u003csup\u003e\u003cstrong\u003e\u003cspan class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/strong\u003e\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.67 (0.022)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.53 (0.021)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.12 (0.0020)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.10 (0.0079)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eMAPE\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e20.68 (0.42)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e27.71 (0.81)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e166 (1.59)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e167 (2.70)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eThe metrics indicated fair predictive performance for the average radon level in the community but showed poor predictive performance for the radon level of individual houses. These results suggest significant variability in radon levels within each ZCTA. 
This limitation poses challenges from a public health perspective because some houses with high radon levels may remain untested due to low exposure levels in the surrounding areas, leading to gaps in risk identification and mitigation efforts.\u003c/p\u003e\n\u003cp\u003ePermutation feature importance was assessed to identify the variables from the dataset that significantly affected average radon concentration predictions (Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e). To examine the impact of specific predictors on radon levels, PDPs were generated for the most influential non-categorical variables (Fig. \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e). These plots provided a detailed view of the relationship between individual predictors and radon levels.\u003c/p\u003e\n\u003cp\u003ePermutation importance revealed that permeability, house heating fuel type, and hydraulic conductivity had the largest effects on determining the average radon level.\u003c/p\u003e\n\u003cp\u003eThe partial dependence of permeability suggests that increasing soil permeability or saturated hydraulic conductivity consistently results in higher average radon levels. Notable patterns are also observed in the PDPs for fuel types, where areas with higher usage of wood, coal, or coke as the primary fuel source exhibit elevated radon concentrations. In contrast, regions that use utility-provided gas heat as the main fuel type show lower radon levels. The partial dependence of relief indicates sharp increases in radon levels at lower relief values. Higher relief values have a diminishing effect beyond a certain point.\u003c/p\u003e\n\u003cp\u003eFigure \u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e illustrates the maps of the actual average radon levels alongside the averages of key variables by ZCTAs across Pennsylvania. The spatial distribution of average radon concentrations exhibited a notable resemblance to the permeability map. 
However, the map that shows utility gas as the primary fuel type indicates that urban areas generally have higher utility gas fuel usage and rural areas generally have higher wood fuel usage. In this study, we are unable to determine if fuel type directly influences radon concentrations or whether it serves as an indirect proxy for distinguishing urban from rural areas. Further analysis is warranted to discern the true relationship between fuel type and radon levels.\u003c/p\u003e\n\u003cp\u003eThe Average Model effectively predicted radon levels at the ZCTA level, providing a more refined analysis compared to county-level estimates. Offering predictions at the ZCTA level enabled a more detailed assessment of radon exposure risk at a smaller geographic scale. Given the significant variability in radon levels even within small areas, such granular predictions are crucial for accurate risk analysis and targeted public health interventions.\u003c/p\u003e\n\u003cp\u003eThe Average Model had a notable limitation in that it lacked detailed information on the variability of radon exposure within an area because all data were aggregated. As a result, the Average Model has a fair prediction accuracy for the radon levels of the community, but detailed predictions of radon levels for each house are not achievable. 
To address this, we used a Relative Variability Model for uncertainty quantification, enabling the identification of factors that predict high variability in ZCTA level measures.\u003c/p\u003e\n\u003cp\u003eAs with the Average Model, the Relative Variability Model test was conducted by applying 5 iterations of the 5-fold CV, both with ZCTA as the grouping variable and without grouping variables (Table \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e).\u003c/p\u003e\n\u003cdiv class=\"gridtable\"\u003e\n \u003cdiv align=\"char\" class=\"colspec\"\u003e\u003cbr\u003e\u003c/div\u003e\u0026nbsp;\u003ctable id=\"Tab4\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eMetrics and their standard deviation of the Relative Variability Model.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"3\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e5-fold CV\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eGroup 5-fold (ZCTA) CV\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eRMSE\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.23 (0.0055)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.27 (0.0059)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eR\u003c/strong\u003e\u003csup\u003e\u003cstrong\u003e\u003cspan class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/strong\u003e\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.46 
(0.014)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.24 (0.094)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eMAPE\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e16.68 (0.38)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e20.76 (0.77)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003ePermutation feature importance and PDPs were evaluated with the Relative Variability Model to identify and illustrate the most influential factors that affect radon concentration variability, as shown in Fig. \u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e.\u003c/p\u003e\n\u003cp\u003eAlthough permeability and fuel type were identified as the most important variables that influence average radon levels, the factors that predict variability in radon levels are different (Fig. \u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003e). We find that ZCTAs with large variability in elevation, saturated hydraulic conductivity, temperature, soil drainage, depth to restrictive soil layers, and/or soil moisture exhibit high levels of variability, or uncertainty, around average radon levels.\u003c/p\u003e\n\u003cp\u003ePartial dependence analysis of the standard deviation of elevation, hydraulic conductivity, maximum temperature, and soil moisture indicated that the variability of local radon concentrations initially increased with greater variation of these variables. However, this relationship was nonlinear, with the increases plateauing beyond a certain point. In contrast, the uncertainty in radon levels continued to increase gradually with greater variability in depth to the restrictive soil layer. 
Conversely, as the variability in soil drainage increased, the uncertainty in the average radon level decreased.\u003c/p\u003e\n\u003cp\u003eFigure \u003cspan class=\"InternalRef\"\u003e8\u003c/span\u003e shows the CoV of radon levels by ZCTA and the standard deviations of key variables by ZCTAs across Pennsylvania. The correlation between the CoVs of radon measures and the standard deviations of the variables is not as high as that seen in the maps of average levels.\u003c/p\u003e\n\u003cp\u003eThe Relative Variability Model added value by quantifying the range and distribution of radon levels, highlighting areas with significant fluctuations and variability in radon concentrations. This model can be used to identify the characteristics of areas that are more likely to contain houses exceeding the action level.\u003c/p\u003e\n\u003cp\u003eFigure \u003cspan class=\"InternalRef\"\u003e9\u003c/span\u003e illustrates how areas with high average radon concentrations can also have high variability of radon concentrations. Overall, the darker-colored areas on this map represent regions where testing for indoor radon exposure should be prioritized because there are both high average levels and high variability. The specific colors (ranging from shades of red to shades of blue) illustrate the characteristics of radon distribution in these areas. Regions shaded in red indicate areas with high variability.\u003c/p\u003e\n\u003cp\u003eMany areas exhibited similar patterns between the predicted average radon levels and the CoV of radon. However, some ZCTAs displayed distinct patterns of variability at the ZCTA level, even when their average radon levels were comparable. For example, ZCTA A (average: 10.58 pCi/L, CoV: 1.39), B (average: 9.80 pCi/L, CoV: 1.43), and C (average: 8.95 pCi/L, CoV: 1.43) had high average radon levels and high CoVs in January in the 2008\u0026ndash;2021 data. 
ZCTA D (average: 10.84 pCi/L, CoV: 0.88), E (average: 10.86 pCi/L, CoV: 0.94), and F (average: 8.99 pCi/L, CoV: 0.90) showed high average radon levels but low CoVs of radon in January.\u003c/p\u003e\n\u003cp\u003eTable\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e compares the standard deviations of the key variables that have the highest importance in the Relative Variability Model. The standard deviations of elevation, hydraulic conductivity, and maximum temperature are much higher for the high-variability group than for the low-variability group. This result shows that, even when average radon levels are similar across areas, the variability of radon levels can differ if the variability of the underlying variables (e.g., elevation, hydraulic conductivity, maximum temperature) differs.\u003c/p\u003e\n\u003cdiv class=\"gridtable\"\u003e\n \u003cdiv align=\"char\" class=\"colspec\"\u003e\u003cbr\u003e\u003c/div\u003e\u0026nbsp;\u003ctable id=\"Tab5\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eKey variability for ZCTAs with high radon. High vs. 
low variability groups in January.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"5\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eZCTA\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eStandard deviation of elevation (m)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eStandard deviation of hydraulic conductivity (\u0026micro;m/s)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eStandard deviation of maximum temperature (\u0026deg;C)\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"3\"\u003e\n \u003cp\u003e\u003cstrong\u003eHigh average and high variability\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e70.63765\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e12.27862\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.635485\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e32.44647\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e6.901275\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.22935\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e29.22913\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e5.436235\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
align=\"char\"\u003e\n \u003cp\u003e0.184695\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"3\"\u003e\n \u003cp\u003e\u003cstrong\u003eHigh average and low variability\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eD\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e7.444461\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.842725\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.035577\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e3.86221\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.415112\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.010676\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eF\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e12.89145\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2.60695\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.05758\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eSome of the ZCTAs showed high variability of radon even though they had below-average radon levels. ZCTA G (average: 3.55 pCi/L, CoV: 1.36) and H (average: 3.60 pCi/L, CoV: 1.39) showed relatively low radon levels but high radon variability. 
H showed high variability in elevation (61.86 m), and G showed low variability in elevation and hydraulic conductivity.\u003c/p\u003e\n\u003cp\u003eThe Average Model and the Relative Variability Model do not require significant computational resources and are relatively simple models for predicting average radon levels and variability. However, they cannot capture the full spectrum of information available in the dataset. To provide a more detailed characterization of the distribution of indoor radon exposure within a ZCTA, a QRF Model was used to predict radon concentrations across different quantiles, thereby providing a comprehensive view of radon exposure risks. The QRF model implemented using the quantile-forest library extends the capabilities of traditional RF models by estimating the conditional distribution of radon levels rather than focusing solely on mean predictions. This method allows for a detailed assessment of radon risk by predicting a range of possible outcomes at various quantile levels within each ZCTA.\u003c/p\u003e\n\u003cp\u003eRMSE, R\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e, and MAPE for the 50th, 75th, and 90th percentiles were used to evaluate the performance of the QRF model (Table \u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e). The evaluation was conducted with a grouped 5-fold CV by using ZCTA as the group. Based on the same reasoning as averaging ZCTA-month pairs for the Average Model, each predicted quantile value was tested with the actual quantile value of each ZCTA-month pair. 
To estimate and compare the estimated 90th percentile value with the measured 90th percentile value, the model used a dataset in which there were at least 10 measures for each ZCTA-month pair.\u003c/p\u003e\n\u003cdiv class=\"gridtable\"\u003e\n \u003cdiv align=\"char\" class=\"colspec\"\u003e\u003cbr\u003e\u003c/div\u003e\u0026nbsp;\u003ctable id=\"Tab7\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 6\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eMetrics and standard deviation of individual quantile regression forest (QRF) model.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"7\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" rowspan=\"2\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003e50th\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003e75th\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003e90th\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e5-fold CV\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eGroup 5-fold (ZCTA) CV\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e5-fold CV\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eGroup 5-fold (ZCTA) CV\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e5-fold CV\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eGroup 5-fold (ZCTA) CV\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eRMSE\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1.71 
(0.026)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1.98 (0.32)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e3.76 (0.046)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e4.44 (0.51)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e7.14 (0.064)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e8.39 (0.58)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eR\u003c/strong\u003e\u003csup\u003e\u003cstrong\u003e\u003cspan class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/strong\u003e\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.51 (0.013)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.35 (0.024)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.61 (0.0078)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.46 (0.05)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.66 (0.0059)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.52 (0.032)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eMAPE\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e19.57 (0.15)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e21.37 (1.37)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e23.21 (0.22)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e30.19 (1.48)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e31.70 (0.62)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e42.31 
(1.90)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eThe importance of each variable in the QRF model was assessed with permutation importance, calculated separately for the 50th, 75th, and 90th percentiles (Figs. \u003cspan class=\"InternalRef\"\u003e10\u003c/span\u003e\u0026ndash;\u003cspan class=\"InternalRef\"\u003e12\u003c/span\u003e, respectively). The analysis revealed notable trends, including a decrease in the importance of temperature as the target percentile increased. In contrast, the importance of permeability, elevation, and relief remained relatively consistent across percentiles.\u003c/p\u003e\n\u003cp\u003eWhen partial dependence was compared across quantiles, the partial dependence on permeability and relief remained relatively consistent, whereas the partial dependence on temperature and on depth to the soil-restrictive layer differed notably. For maximum temperature, permutation importance was highest at the 50th percentile and decreased at the 75th and 90th percentiles, a pattern reflected in its PDPs. Similarly, the importance of depth to the soil-restrictive layer was greatest at higher percentiles, with its PDPs showing the largest variation at these levels. These trends suggest that the variables influencing median radon levels may differ from those affecting higher concentrations.\u003c/p\u003e\n\u003cp\u003eThe Individual QRF Model effectively identified communities with a high likelihood of elevated radon levels. Its ability to predict multiple quantiles of radon levels was notably useful for identifying areas at higher risk for extreme exposures. 
The Relative Variability Model identified areas that might have high variability but could not describe the specific pattern or distribution of radon levels in the community.\u003c/p\u003e\n\u003cp\u003eFor the high-average, high-variability ZCTAs analyzed with the Average Model and the Relative Variability Model, the ratios of the 90th percentile to the 50th percentile predicted by the QRF model for January were 5.80 (A), 5.28 (B), and 5.64 (C). The high-average, low-variability ZCTAs showed ratios of 4.37 (D), 4.16 (E), and 4.78 (F) for the same January prediction. The ratio thus tended to be higher for the high-variability ZCTAs than for the low-variability ZCTAs. Furthermore, the QRF model provides distributional detail that the Average Model and the Relative Variability Model cannot.\u003c/p\u003e\n\u003cp\u003eWhen comparing the different models, each offered distinct strengths in assessing radon exposure. The Average Model provided a straightforward estimation of mean radon levels, which may be useful for some population-level risk analysis.\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e62\u003c/span\u003e,\u003cspan class=\"CitationRef\"\u003e63\u003c/span\u003e\u003c/sup\u003e However, the Average Model was limited in its ability to account for local variability or to identify areas containing homes with extreme radon levels. The Relative Variability Model addressed this limitation by quantifying the spread of radon concentrations. The Individual QRF Model further expanded the profiling of radon exposure at the ZCTA level by predicting various quantiles of radon levels, making it possible to identify areas at higher risk for extreme exposures.\u003c/p\u003e\n\u003cp\u003eBy providing predictions at the ZCTA level, these models offered a more refined risk assessment tool than traditional county-level analyses. 
This granularity is necessary both for identifying high-risk areas and for characterizing the overall risk of an area. Identifying and predicting variability or uncertainty in radon estimates within ZCTAs is crucial because radon levels can differ significantly even within the same community. These differences result in substantial uncertainties in estimated risks, highlighting the need for more tailored public health advisories and interventions that address the uncertainties in risk profiles not only across communities but also within them.\u003c/p\u003e\n\u003cp\u003eThe Individual QRF Model\u0026rsquo;s detailed predictions, which include various quantiles, are particularly useful for identifying areas with elevated risk. For instance, areas with high upper-percentile predictions can be prioritized for more aggressive radon reduction measures. This approach helps allocate resources efficiently by targeting areas with the highest potential risk.\u003c/p\u003e\n\u003cp\u003eThis study offers several strengths compared to previous studies. The use of multiple methods for characterizing ZCTA-level estimates of indoor radon exposure risk allows for more informed decision-making that can be directly applied to community-level interventions. By employing RF and QRF models, the analysis leveraged the capabilities of machine learning to handle complex datasets and nonlinear relationships. Multiple evaluation metrics, including RMSE, R\u0026sup2;, and MAPE, were used to assess model performance comprehensively. Notably, the Individual QRF Model offered unique capabilities that enabled a more thorough understanding of residential radon exposure and informed strategies for radiation protection and mitigation.\u003c/p\u003e\n\u003cp\u003eAlthough this study advances radon exposure modeling and demonstrates the utility of machine learning for community-level risk assessment, the models and data have some limitations. 
The reliance on ACS and DEC data assumed uniform distribution of variables, such as housing characteristics and demographics, across ZCTAs. This assumption may oversimplify local variability and fail to reflect the nuanced factors that influence radon exposure. Future studies that incorporate point-level data and more granular geographic information could address these limitations and enhance the models\u0026rsquo; accuracy and applicability.\u003c/p\u003e\n\u003cp\u003eFuture research should aim to integrate more detailed geographic and temporal data to enhance the accuracy of radon predictions. This could include finer-scale soil and building data as well as finer-scale and long-term radon measurements to account for temporal variability.\u003c/p\u003e\n\u003cp\u003eWhile this study focused on Pennsylvania, the methods developed can be applied to other regions with similar or different radon risk concerns. Expanding the geographic scope will validate the model\u0026rsquo;s applicability and help develop a comprehensive radon risk map for broader areas.\u003c/p\u003e\n\u003cp\u003eIntegrating radon predictions with other environmental hazards, such as air pollution and water quality, could provide a more detailed view of environmental health risks. This approach would help in designing multifaceted public health strategies that address multiple environmental factors simultaneously.\u003c/p\u003e\n\u003cp\u003eFurthermore, with more granular radon-level predictions, follow-up studies could offer more resources for linking radon exposure to lung cancer incidence at the ZCTA-level, which was not achievable from the previous study.\u003c/p\u003e"},{"header":"5. Conclusion","content":"\u003cp\u003eThe study demonstrates the utility of a deep characterization of potential radon exposure with multiple machine learning models, particularly RF and QRF. 
Describing radon exposure risks and the potential uncertainties in estimates will facilitate a more targeted deployment of public health strategies and policies. Study findings underscore the importance of updating radon risk assessments using current data, access to individual test results, and advanced modeling techniques to better assess health risks associated with radon exposure. Future research should focus on further refining these models and extending their application to broader geographic regions and other environmental hazards to comprehensively enhance environmental health risk assessment.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003ch2\u003eCompeting interests\u003c/h2\u003e \u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eHL conducted the study design, processed the data, performed the analyses, interpreted the results, drafted the manuscript, and contributed to its editing. DM and JL assisted with analysis design, data processing, and manuscript editing. GA and SD contributed to the study design and manuscript revisions. HH was responsible for study design, project supervision, interpretation of results, and critical revision of the manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgements\u003c/h2\u003e \u003cp\u003eThis work was supported by the Office of Biological and Environmental Research\u0026rsquo;s Biological Systems Science Division. This manuscript has been authored by UT-Battelle LLC under Contract No. DE-AC05-00OR22725 with the US Department of Energy and Award AWD-002827 between UT-Battelle and the Georgia Tech Research Corporation. This research used resources of CADES at the Oak Ridge National Laboratory, which is supported by the US Department of Energy\u0026rsquo;s Office of Science under Contract No. 
DE-AC05-00OR22725.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eAll data generated or analysed during this study are included in this published article.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eWall, B. F. \u003cem\u003eIonising radiation exposure of the population of the United States: NCRP Report No. 160\u003c/em\u003e (Oxford University Press, 2009).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWorld Health Organization. \u003cem\u003eWHO handbook on indoor radon: a public health perspective\u003c/em\u003e (World Health Organization, 2009).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTirmarche, M. et al. ICRP Publication 115. Lung cancer risk from radon and progeny and statement on radon. \u003cem\u003eAnn. ICRP\u003c/em\u003e. \u003cb\u003e40\u003c/b\u003e, 1\u0026ndash;64 (2010).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKim, S. H., Park, J. M. \u0026amp; Kim, H. The prevalence of stroke according to indoor radon concentration in South Koreans: Nationwide cross section study. \u003cem\u003eMed. (Baltim).\u003c/em\u003e \u003cb\u003e99\u003c/b\u003e, e18859. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1097/MD.0000000000018859\u003c/span\u003e\u003cspan address=\"10.1097/MD.0000000000018859\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDong, S. et al. Synergistic Effects of Particle Radioactivity (Gross beta Activity) and Particulate Matter \u0026le;2.5 \u0026mu;m Aerodynamic Diameter on Cardiovascular Disease Mortality. \u003cem\u003eJ. Am. Heart Assoc.\u003c/em\u003e \u003cb\u003e11\u003c/b\u003e, e025470. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1161/JAHA.121.025470\u003c/span\u003e\u003cspan address=\"10.1161/JAHA.121.025470\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLee, H. et al. Evaluating county-level lung cancer incidence from environmental radiation exposure, PM(2.5), and other exposures with regression and machine learning models. \u003cem\u003eEnviron. Geochem. Health\u003c/em\u003e. \u003cb\u003e46\u003c/b\u003e, 82. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s10653-023-01820-4\u003c/span\u003e\u003cspan address=\"10.1007/s10653-023-01820-4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAl-Zoughool, M. \u0026amp; Krewski, D. Health effects of radon: a review of the literature. \u003cem\u003eInt. J. Radiat. Biol.\u003c/em\u003e \u003cb\u003e85\u003c/b\u003e, 57\u0026ndash;69. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1080/09553000802635054\u003c/span\u003e\u003cspan address=\"10.1080/09553000802635054\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2009).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCouncil, N. R. \u003cem\u003eHealth effects of exposure to radon: BEIR VI\u003c/em\u003e (National Academies, 1999).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKang, J. K., Seo, S. \u0026amp; Jin, Y. W. Health Effects of Radon Exposure. \u003cem\u003eYonsei Med. J.\u003c/em\u003e \u003cb\u003e60\u003c/b\u003e, 597\u0026ndash;603. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3349/ymj.2019.60.7.597\u003c/span\u003e\u003cspan address=\"10.3349/ymj.2019.60.7.597\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRichardson, D. B. et al. Mortality among uranium miners in North America and Europe: the Pooled Uranium Miners Analysis (PUMA). \u003cem\u003eInt. J. Epidemiol.\u003c/em\u003e \u003cb\u003e50\u003c/b\u003e, 633\u0026ndash;643. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/ije/dyaa195\u003c/span\u003e\u003cspan address=\"10.1093/ije/dyaa195\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLagarde, F. et al. Glass-based radon-exposure assessment and lung cancer risk. \u003cem\u003eJ. Expo. Sci. Environ. Epidemiol.\u003c/em\u003e \u003cb\u003e12\u003c/b\u003e, 344\u0026ndash;354 (2002).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePark, N. W., Kim, Y., Chang, B. U. \u0026amp; Kwak, G. H. County-level indoor radon concentration mapping and uncertainty assessment in South Korea using geostatistical simulation and environmental factors. \u003cem\u003eJ. Environ. Radioact.\u003c/em\u003e \u003cb\u003e208\u003c/b\u003e, 106044 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFujimoto, K. \u0026amp; Sanada, T. Dependence of indoor radon concentration on the year of house construction. \u003cem\u003eHealth Phys.\u003c/em\u003e \u003cb\u003e77\u003c/b\u003e, 410\u0026ndash;419 (1999).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSmith, B. J. \u0026amp; Field, R. W. Effect of housing factors and surficial uranium on the spatial prediction of residential radon in Iowa. 
\u003cem\u003eEnvironmetrics\u003c/em\u003e \u003cb\u003e18\u003c/b\u003e, 481\u0026ndash;497. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/env.816\u003c/span\u003e\u003cspan address=\"10.1002/env.816\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2006).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAbergel, R. et al. The enduring legacy of Marie Curie: impacts of radium in 21st century radiological and medical sciences. \u003cem\u003eInt. J. Radiat. Biol.\u003c/em\u003e \u003cb\u003e98\u003c/b\u003e, 267\u0026ndash;275. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1080/09553002.2022.2027542\u003c/span\u003e\u003cspan address=\"10.1080/09553002.2022.2027542\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGundersen, L. C. et al. Geology of radon in the United States. (1992).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOtton, J. K. The geology of radon. (1992).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBulut, H. A. \u0026amp; Şahin, R. Radon, Concrete, Buildings and Human Health\u0026mdash;A Review Study. \u003cem\u003eBuildings\u003c/em\u003e \u003cb\u003e14\u003c/b\u003e, 510 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMustonen, R. Natural radioactivity in and radon exhalation from Finnish building materials. \u003cem\u003eHealth Phys.\u003c/em\u003e \u003cb\u003e46\u003c/b\u003e, 1195\u0026ndash;1203 (1984).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMarcinowski, F., Lucas, R. M. \u0026amp; Yeager, W. M. National and regional distributions of airborne radon concentrations in US homes. 
\u003cem\u003eHealth Phys.\u003c/em\u003e \u003cb\u003e66\u003c/b\u003e, 699\u0026ndash;706 (1994).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYazzie, S. A., Davis, S., Seixas, N. \u0026amp; Yost, M. G. Assessing the Impact of Housing Features and Environmental Factors on Home Indoor Radon Concentration Levels on the Navajo Nation. \u003cem\u003eInt. J. Environ. Res. Public. Health\u003c/em\u003e. \u003cb\u003e17\u003c/b\u003e \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/ijerph17082813\u003c/span\u003e\u003cspan address=\"10.3390/ijerph17082813\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSun, K., Guo, Q. \u0026amp; Cheng, J. The Effect of Some Soil Characteristics on Soil Radon Concentration and Radon Exhalation from Soil Surface. \u003cem\u003eJ. Nucl. Sci. Technol.\u003c/em\u003e \u003cb\u003e41\u003c/b\u003e, 1113\u0026ndash;1117. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1080/18811248.2004.9726337\u003c/span\u003e\u003cspan address=\"10.1080/18811248.2004.9726337\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2004).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMose, D. G. \u0026amp; Mushrush, G. W. Prediction of indoor radon based on soil radon and soil permeability. \u003cem\u003eJ. Environ. Sci. Health Part. A\u003c/em\u003e. \u003cb\u003e34\u003c/b\u003e, 1253\u0026ndash;1266. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1080/10934529909376894\u003c/span\u003e\u003cspan address=\"10.1080/10934529909376894\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (1999).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHassan, N. M. et al. Radon migration process and its influence factors; review. 
\u003cem\u003eJapanese J. Health Phys.\u003c/em\u003e \u003cb\u003e44\u003c/b\u003e, 218\u0026ndash;231 (2009).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKhattak, N., Khan, M. A., Ali, N. \u0026amp; Abbas, S. M. Radon Monitoring for geological exploration: A review. \u003cem\u003eJ. Himal. Earth Sci.\u003c/em\u003e \u003cb\u003e44\u003c/b\u003e, 91\u0026ndash;102 (2011).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNunes, L. J. R., Curado, A., Graca, L., Soares, S. \u0026amp; Lopes, S. I. Impacts of Indoor Radon on Health: A Comprehensive Review on Causes, Assessment and Remediation Strategies. \u003cem\u003eInt. J. Environ. Res. Public. Health\u003c/em\u003e. \u003cb\u003e19\u003c/b\u003e \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/ijerph19073929\u003c/span\u003e\u003cspan address=\"10.3390/ijerph19073929\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eŞen, G. Y., I\u0026ccedil;hedef, M., Sa\u0026ccedil;, M. M. \u0026amp; Yener, G. Effect of natural gas usage on indoor radon levels. \u003cem\u003eJ. Radioanal. Nucl. Chem.\u003c/em\u003e \u003cb\u003e295\u003c/b\u003e, 277\u0026ndash;282. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s10967-012-1841-8\u003c/span\u003e\u003cspan address=\"10.1007/s10967-012-1841-8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2012).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYang, J. et al. Modeling of radon exhalation from soil influenced by environmental parameters. \u003cem\u003eSci. Total Environ.\u003c/em\u003e \u003cb\u003e656\u003c/b\u003e, 1304\u0026ndash;1311. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.scitotenv.2018.11.464\u003c/span\u003e\u003cspan address=\"10.1016/j.scitotenv.2018.11.464\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBochicchio, F. et al. Annual average and seasonal variations of residential radon concentration for all the Italian Regions. \u003cem\u003eRadiat. Meas.\u003c/em\u003e \u003cb\u003e40\u003c/b\u003e, 686\u0026ndash;694 (2005).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMiles, J. C., Howarth, C. B. \u0026amp; Hunter, N. Seasonal variation of radon concentrations in UK homes. \u003cem\u003eJ. Radiol. Prot.\u003c/em\u003e \u003cb\u003e32\u003c/b\u003e, 275\u0026ndash;287. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1088/0952-4746/32/3/275\u003c/span\u003e\u003cspan address=\"10.1088/0952-4746/32/3/275\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2012).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePorstendorfer, J., Butterweck, G. \u0026amp; Reineking, A. Daily variation of the radon concentration indoors and outdoors and the influence of meteorological parameters. \u003cem\u003eHealth Phys.\u003c/em\u003e \u003cb\u003e67\u003c/b\u003e, 283\u0026ndash;287 (1994).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRey, J. F. et al. Long-term impacts of weather conditions on indoor radon concentration measurements in Switzerland. \u003cem\u003eAtmosphere\u003c/em\u003e \u003cb\u003e13\u003c/b\u003e, 92 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEnvironmental Protection Agency. EPA Maps of Radon Zones and Supporting Documents by State. (1993).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePrice, P. N., Nero, A. V. \u0026amp; Gelman, A. 
Bayesian prediction of mean indoor radon concentrations for Minnesota counties. \u003cem\u003eHealth Phys.\u003c/em\u003e \u003cb\u003e71\u003c/b\u003e, 922\u0026ndash;936 (1996).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePrice, P. Predictions and maps of county mean indoor radon concentrations in the mid-Atlantic states. \u003cem\u003eHealth Phys.\u003c/em\u003e \u003cb\u003e72\u003c/b\u003e, 893\u0026ndash;906 (1997).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eApte, M., Price, P., Nero, A. \u0026amp; Revzan, K. Predicting New Hampshire indoor radon concentrations from geologic information and other covariates. \u003cem\u003eEnviron. Geol.\u003c/em\u003e \u003cb\u003e37\u003c/b\u003e, 181\u0026ndash;194 (1999).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCasey, J. A. et al. Predictors of Indoor Radon Concentrations in Pennsylvania, 1989\u0026ndash;2013. \u003cem\u003eEnviron. Health Perspect.\u003c/em\u003e \u003cb\u003e123\u003c/b\u003e, 1130\u0026ndash;1137. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1289/ehp.1409014\u003c/span\u003e\u003cspan address=\"10.1289/ehp.1409014\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKropat, G. et al. Improved predictive mapping of indoor radon concentrations using ensemble regression trees based on automatic clustering of geological units. \u003cem\u003eJ. Environ. Radioact\u003c/em\u003e. \u003cb\u003e147\u003c/b\u003e, 51\u0026ndash;62. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.jenvrad.2015.05.006\u003c/span\u003e\u003cspan address=\"10.1016/j.jenvrad.2015.05.006\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNikkila, A. et al. 
Predicting residential radon concentrations in Finland: Model development, validation, and application to childhood leukemia. \u003cem\u003eScand. J. Work Environ. Health\u003c/em\u003e. \u003cb\u003e46\u003c/b\u003e, 278\u0026ndash;292. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.5271/sjweh.3867\u003c/span\u003e\u003cspan address=\"10.5271/sjweh.3867\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDai, D. et al. Confluent impact of housing and geology on indoor radon concentrations in Atlanta, Georgia, United States. \u003cem\u003eSci. Total Environ.\u003c/em\u003e \u003cb\u003e668\u003c/b\u003e, 500\u0026ndash;511. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.scitotenv.2019.02.257\u003c/span\u003e\u003cspan address=\"10.1016/j.scitotenv.2019.02.257\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, L. et al. Predicting Monthly Community-Level Domestic Radon Concentrations in the Greater Boston Area with an Ensemble Learning Model. \u003cem\u003eEnviron. Sci. Technol.\u003c/em\u003e \u003cb\u003e55\u003c/b\u003e, 7157\u0026ndash;7166. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1021/acs.est.0c08792\u003c/span\u003e\u003cspan address=\"10.1021/acs.est.0c08792\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUBER. 
\u003cem\u003eH3: Uber\u0026rsquo;s Hexagonal Hierarchical Spatial Index\u003c/em\u003e, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.uber.com/blog/h3/\u003c/span\u003e\u003cspan address=\"https://www.uber.com/blog/h3/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMaguire, D., Logan, J., Lee, H. \u0026amp; Hanson, H. Radon Exposure Dataset. \u003cem\u003earXiv preprint arXiv:2505.09489\u003c/em\u003e (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePennsylvania Department of Environmental Protection. Radon Test Results September 1986 - Current Annual County Environmental Protection. October 13, (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHealth Resources and Services Administration. \u003cem\u003eUDS Mapper\u003c/em\u003e, (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.udsmapper.org/\u003c/span\u003e\u003cspan address=\"http://www.udsmapper.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWeber, E. et al. \u003cem\u003eLandScan USA\u003c/em\u003e (Oak Ridge National Laboratory, 2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDanielson, J. J. \u0026amp; Gesch, D. B. \u003cem\u003eGlobal multi-resolution terrain elevation data 2010 (GMTED2010). Report No. 2331\u0026thinsp;\u0026ndash;\u0026thinsp;1258\u003c/em\u003e (US Geological Survey, 2011).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDahn \u003cem\u003eH3-Pandas\u003c/em\u003e, (2021). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://h3-pandas.readthedocs.io/en/latest/\u003c/span\u003e\u003cspan address=\"https://h3-pandas.readthedocs.io/en/latest/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStaff, S. S. (ed) (ed United States Department of Agriculture).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSmith, D. B., Solano, F., Woodruff, L. G., Cannon, W. F. \u0026amp; Ellefsen, K. J. Geochemical and mineralogical maps, with interpretation, for soils of the conterminous United States. \u003cem\u003eReport. Reston, VA\u003c/em\u003e (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWieczorek, M. E. \u0026amp; a. L., A.EU.S. Geological Survey data release,. (2010).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThornton, M. et al. Daymet: Daily surface weather data on a 1-km grid for North America, version 4 R1. \u003cem\u003eORNL DAAC, Oak Ridge, Tennessee, USA. Single Pixel Extraction Tool| Daymet (ornl. gov)\u003c/em\u003e (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHarris, C. R. et al. Array programming with NumPy. \u003cem\u003eNature\u003c/em\u003e \u003cb\u003e585\u003c/b\u003e, 357\u0026ndash;362 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRey, S. J. \u0026amp; Anselin, L. in \u003cem\u003eHandbook of applied spatial analysis: Software tools, methods and applications\u003c/em\u003e 175\u0026ndash;193 (Springer, 2009).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUnited States Census Bureau. YEAR STRUCTURE BUILT. American Community Survey, ACS 5-Year Estimates Detailed Tables, Table B25034, (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUnited States Census Bureau. YEAR STRUCTURE BUILT [10], Decennial Census, DEC Summary File 3, Table H034. (2000). 
(2000).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePedregosa, F. et al. Scikit-learn: Machine learning in Python. \u003cem\u003eJ. Mach. Learn. Res.\u003c/em\u003e \u003cb\u003e12\u003c/b\u003e, 2825\u0026ndash;2830 (2011).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMeinshausen, N. \u0026amp; Ridgeway, G. Quantile regression forests. \u003cem\u003eJ. Mach. Learn. Res.\u003c/em\u003e \u003cb\u003e7\u003c/b\u003e (2006).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJohnson, R. A. quantile-forest: A python package for quantile regression forests. \u003cem\u003eJ. Open. Source Softw.\u003c/em\u003e \u003cb\u003e9\u003c/b\u003e, 5976 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVaysse, K. \u0026amp; Lagacherie, P. Using quantile regression forest to estimate uncertainty of digital soil mapping products. \u003cem\u003eGeoderma\u003c/em\u003e \u003cb\u003e291\u003c/b\u003e, 55\u0026ndash;64 (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMaxwell, K., Rajabi, M. \u0026amp; Esterle, J. Spatial interpolation of coal properties using geographic quantile regression forest. \u003cem\u003eInt. J. Coal Geol.\u003c/em\u003e \u003cb\u003e248\u003c/b\u003e, 103869 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLubin, J. H. \u0026amp; Boice, J. D. Jr Lung cancer risk from residential radon: meta-analysis of eight epidemiologic studies. \u003cem\u003eJ. Natl Cancer Inst.\u003c/em\u003e \u003cb\u003e89\u003c/b\u003e, 49\u0026ndash;57 (1997).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAjrouche, R. et al. Quantitative health risk assessment of indoor radon: a systematic review. \u003cem\u003eRadiat. Prot. Dosimetry\u003c/em\u003e \u003cb\u003e177\u003c/b\u003e, 69\u0026ndash;77 (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAppendix.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAnother random
forest (RF) model that uses individual-level data was investigated to test the feasibility of predicting the radon level of individual houses from the aggregated independent variables.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFor this model, RF was used in the same way as in the Average Model and the Relative Variability Model. Root mean square error (RMSE), R-squared (R\u0026sup2;), and mean absolute percentage error (MAPE) were used to evaluate the model's performance, with the evaluation conducted over 5 iterations of grouped 5-fold cross-validation (CV) using ZIP Code Tabulation Area (ZCTA) as the grouping variable (Table A1). The metrics indicated poor predictive performance, suggesting substantial variability in radon levels within each ZCTA, similar to the Average Model. Predicting high-variability measures from aggregated variables resulted in poor accuracy and highlighted the need for non-aggregated variables or other approaches.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Radon, Geology, Prediction model, Machine Learning, ZCTA-level predictions, environmental health","lastPublishedDoi":"10.21203/rs.3.rs-6857670/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6857670/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground:\u003c/h2\u003e \u003cp\u003eRadon is a naturally occurring radioactive gas that poses a serious health risk as the primary cause of lung cancer in non-smokers. Despite the well-known adverse association with health outcomes, current radon exposure assessments are limited to county-level or average-level estimates, which fail to capture regional variability.\u003c/p\u003e\u003ch2\u003eObjective:\u003c/h2\u003e \u003cp\u003eThis study aims to capture the regional variability at ZCTA-level.\u003c/p\u003e\u003ch2\u003eMethods:\u003c/h2\u003e \u003cp\u003eThis study uses ML models, including RF and QRF, to predict the indoor radon concentrations at the ZCTA-level and characterize uncertainties in model estimates. Incorporating geological, meteorological, and building-specific data, the models aim to improve radon risk assessment by capturing mean exposure, variability, and extreme concentration levels. 
Processed radon test data (n\u0026thinsp;=\u0026thinsp;718,111) were analyzed using average, variability, and quantile prediction methods.\u003c/p\u003e\u003ch2\u003eResults:\u003c/h2\u003e \u003cp\u003eModels that predict the average radon exposure at the ZCTA-level can yield promising model-fit results, but they do not capture the underlying variability of indoor radon exposure within a ZCTA. We utilize volatility analyses to identify characteristics indicative of high variability of indoor radon exposure. We also show that a QRF model can be used to predict upper quantiles of residential radon exposure, thereby uncovering localized areas of elevated exposure that were not apparent in mean estimates. The results highlighted the need for a deep characterization of exposure risk and show that regions with moderate average exposure levels could still harbor extreme outliers with implications for evaluating health risks.\u003c/p\u003e\u003ch2\u003eConclusion:\u003c/h2\u003e \u003cp\u003eUtilizing multiple radon exposure models allows for a deeper characterization of radon risk within a geographic area and can better identify high-risk areas. The results from this study provide a foundation for developing mitigation strategies and examining associations between radon exposure and health outcomes at fine scales. 
Future research should extend the geographic scope and incorporate additional environmental risk factors to establish a comprehensive framework for risk assessment.\u003c/p\u003e","manuscriptTitle":"Quantifying Uncertainty in Indoor Radon Exposure Estimates in Pennsylvania with Quantile Regression Forests","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-06-26 10:03:07","doi":"10.21203/rs.3.rs-6857670/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2025-10-07T09:18:41+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-10-07T07:16:57+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"292692057966063921430680517116580991310","date":"2025-09-15T08:39:43+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-08-30T03:35:19+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"92609250459706864473759947195068300143","date":"2025-07-24T13:30:01+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"196031407347527848037642711710791831933","date":"2025-07-11T16:17:50+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-06-23T08:28:28+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-06-23T08:19:26+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-06-12T08:25:41+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-06-12T03:43:52+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific Reports","date":"2025-06-09T23:12:52+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"a61c246e-c8c2-48c7-bb83-630366dce1b3","owner":[],"postedDate":"June 26th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":50549962,"name":"Earth and environmental sciences/Environmental sciences"},{"id":50549963,"name":"Health sciences/Risk factors"}],"tags":[],"updatedAt":"2026-03-09T16:06:32+00:00","versionOfRecord":{"articleIdentity":"rs-6857670","link":"https://doi.org/10.1038/s41598-026-37891-3","journal":{"identity":"scientific-reports","isVorOnly":false,"title":"Scientific Reports"},"publishedOn":"2026-03-05 15:59:11","publishedOnDateReadable":"March 5th, 2026"},"versionCreatedAt":"2025-06-26 10:03:07","video":"","vorDoi":"10.1038/s41598-026-37891-3","vorDoiUrl":"https://doi.org/10.1038/s41598-026-37891-3","workflowStages":[]},"version":"v1","identity":"rs-6857670","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6857670","identity":"rs-6857670","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}