A Spatially Generalizable Neural-Stacked Ensemble Framework for Wildfire Susceptibility Prediction in California

Research Article
Mofasser Bin Hossain Maruf, Monoarul Haq Omy, Rafid Khandaker, and 1 more
Posted on Research Square. This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9586702/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1 posted.

Abstract
Interacting climatic, biological, and human variables have increased the frequency and impact of wildfires throughout California. By combining long-term fire records (1910–2022) with six remote-sensing and environmental predictors (NDVI, EVI, precipitation, elevation, land surface temperature, and surface air temperature) standardized to a 1 km grid, this study creates a spatially robust ensemble machine-learning framework to model wildfire susceptibility.
Spatial block cross-validation was used to evaluate models on a balanced presence-absence dataset (n = 1,094) in order to reduce autocorrelation bias. Mean and neural-network stacking ensembles were compared with five base classifiers: Decision Tree, Random Forest, XGBoost, LightGBM, and Support Vector Machine. According to the results, ensemble methods perform better than single models; the stacking ensemble performs best (AUC = 0.889; R² ≈ 0.51). The Random Forest and gradient-boosted models also showed strong predictive performance, whereas the Decision Tree alone generalized less well. The results support the view that combining spatial validation with ensemble learning enhances the generalizability and reliability of wildfire susceptibility mapping. This framework supports California's climate adaptation plans, mitigation planning, and wildfire risk assessment by offering a scalable and practically applicable tool.
Keywords: Artificial Intelligence and Machine Learning; Geographic Information Systems; Climate Analysis and Modeling; Wildfire susceptibility; Ensemble machine learning; Remote sensing predictors; Geospatial modeling; Environmental risk mapping; Neural network stacking

1. Introduction
Wildfires have become one of the biggest environmental and socio-economic challenges in California over the past twenty years. The state has seen a significant rise in the frequency, intensity, and area affected by large fires, especially those over 10,000 hectares. Historical analyses show that while extreme fire events are not entirely new, the clustering and increase of large wildfires in recent years, particularly in 2018 and 2020, mark a clear change in fire patterns (Keeley & Syphard, 2021). Prolonged drought conditions, measured by indices such as the Palmer Drought Severity Index (PDSI), have been closely linked to spikes in large wildfire events (Keeley & Syphard, 2021).
At the same time, climate change, fuel buildup, land management practices, and human-caused ignitions have raised fire risks across various Californian ecosystems. Human-induced climatic change may, over a relatively short time period (< 100 years), give rise to climates outside anything experienced in California since the establishment of an industrial civilization that currently sustains a state population which has increased approximately 41,000% since 1850 (Westerling et al., 2011).
Artificial intelligence has been applied in wildfire science and management since the 1990s, with early applications including neural networks and expert systems. Since then the field has progressed rapidly, in step with the wide adoption of machine learning (ML) methods in the environmental sciences (Jain et al., 2020).
The use of algorithms in scientific inquiry is not new. An algorithm is simply a series of steps or set of rules that carries out an action or solves a problem, and any model that utilizes a simulation employs an algorithm. Algorithms on their own are not explanations; it is only when algorithmic models are used to answer a question about some event or phenomenon that they explain. Examples of such questions include: How is it possible that the eye evolved in so many diverse systems? Why is segregation so prevalent? What effect does carbon dioxide have on current and future weather patterns? Other distinguishing features of explanation (causal, counterfactual, law-covering, and so on) are still widely discussed. Following Sullivan (2022), this study adopts the increasingly common view that explanation aims at understanding (De Regt, 2017): explaining why helps us to understand why.
Recent improvements in remote sensing, geospatial analytics, and ML offer new opportunities to model and forecast wildfire occurrence and spread.
Studies have shown that ML algorithms, including Random Forest, Support Vector Machines, and ensemble methods, are useful for mapping wildfire susceptibility and predicting next-day fire spread (Huot et al., 2022; Pandey et al., 2019). Additionally, comparisons of ML algorithms reveal that they perform better than traditional statistical methods when dealing with complex interactions among weather, terrain, and vegetation factors (Pandey et al., 2019). However, even with substantial progress, issues remain, such as inconsistent methods, limited applicability, and a lack of integration of historical drought-fire relationships in current wildfire prediction models. This study aims to create a robust machine learning-based wildfire prediction system for California. The research not only improves predictive power but also places wildfire modeling in a larger environmental context.
California’s wildfire crisis has grown in both size and complexity, leading to severe ecological damage, loss of infrastructure, and displacement of people. The 2020 fire season recorded an unprecedented number of large fires, highlighting the increasing wildfire risks (Keeley & Syphard, 2021). Traditional fire risk assessment models rely mainly on straightforward statistical methods, which often do not account for complex interactions among climate, physical, and human factors. While drought has long been linked to large fire events (Keeley & Syphard, 2021), current predictive models rarely combine long-term fire history with real-time remote sensing data. Additionally, machine learning models have shown promise in predicting wildfire spread (Huot et al., 2022), but their use in California often lacks thorough validation and comparison with other algorithms. Another important issue is spatial and temporal generalization: many models are designed for specific fire events or short periods, which restricts their broader application.
There is an urgent need for a predictive system that merges historical drought-fire relationships, diverse environmental factors, and ensemble machine learning to create reliable and scalable wildfire forecasts for California.
Over time, environmental disasters have taken center stage in the political arena because of their vast impact on the economy and society as a whole, fueled by accelerated urban growth and climate change. According to the United Nations Habitat III report, more than 50% of the global population is currently concentrated in urban areas, and this share is expected to rise to over 70% by 2050 (Motta et al., 2021). BN and other ML techniques have been widely applied to prediction and classification in many fields in Malaysia, including agriculture and the economy; for example, Artificial Neural Network (ANN), K-Nearest Neighbors (KNN), Decision Table (DT), and M5P Tree algorithms have been used to classify herbs for the agriculture industry (Razali et al., 2020).
This research adopts a critical realist ontological stance, viewing wildfires as real, material phenomena governed by biophysical laws while simultaneously shaped by socio-environmental processes. The occurrence of wildfires is viewed as an emergent property of interacting systems, including topography, vegetation structure, climate variability, and human activity, rather than as a completely random event. According to historical analyses, California has seen significant fires since the nineteenth century, especially during protracted dry spells (Keeley & Syphard, 2021). These results support the ontological stance that wildfire regimes are not isolated anomalies but are part of long-term ecological and climatic cycles. At the same time, the structural conditions that lead to fires are altered by anthropogenic climate change, land-use change, and fuel management techniques.
Consequently, both contingent human influences and deterministic environmental drivers must be taken into account when predicting wildfires. In this regard, machine learning models are instruments for revealing hidden patterns in these intricate ontological frameworks.
With its emphasis on empirical data, quantifiable indicators, and algorithmic pattern recognition, this study adheres to a positivist–computational paradigm in terms of epistemology. Reproducible computational workflows, statistical validation, and quantitative modelling are used to build knowledge about wildfire risk. Nonlinear relationships can be extracted from high-dimensional environmental datasets using machine learning techniques, especially ensemble methods (Pandey et al., 2019). Furthermore, wildfire spread datasets derived from remote sensing enable near real-time predictive modelling (Huot et al., 2022). These methods reflect a data-driven epistemology in which the validity of knowledge is judged by generalization ability and predictive accuracy. The study also recognizes that predictions are probabilistic.
Although historical wildfire analyses offer valuable long-term information (Keeley & Syphard, 2021), predictive machine learning pipelines rarely incorporate them. The majority of ML-based research does not incorporate past drought-fire dynamics and instead concentrates on short-term forecasting. Reviews of comparative algorithms highlight the advantages of various machine learning approaches (Pandey et al., 2019), but in California-specific contexts, thorough ensemble modelling that incorporates Decision Tree, Random Forest, and boosting algorithms is still understudied. Current research frequently focuses on accuracy metrics without adequately addressing the interpretability and scalability of models across California's various ecological zones.
This study has a wide range of practical implications.
First, precise wildfire prediction models can help emergency management authorities and state agencies such as CAL FIRE with early warning systems, evacuation planning, and resource allocation. Second, by combining historical fire patterns and drought indicators (Keeley & Syphard, 2021), the model supports long-term adaptation plans under changing climate scenarios by offering climate-informed predictive capabilities. Third, in high-risk areas, machine learning-based susceptibility mapping can guide fuel management regulations, infrastructure protection, and land-use planning. By reducing the biases of individual algorithms, ensemble modelling techniques further improve prediction reliability (Pandey et al., 2019). Lastly, by showing how ensemble machine learning and integrated datasets can handle complex socio-ecological hazards, the study makes a methodological contribution to computational environmental science.

2. Methodology

2.1. Study area and data sources
California, spanning about 423,970 km², is among the most geographically and ecologically varied regions in the United States, with elevations ranging from sea level along the Pacific coastline to more than 4,200 m in the Sierra Nevada Mountains. The state is largely characterized by a Mediterranean-type climate marked by hot, dry summers and mild, wet winters, conditions that heighten wildfire risk during dry seasons. Its climatic patterns are strongly influenced by latitude, elevation gradients, and proximity to the Pacific Ocean. For example, the Central Valley, a 725-km lowland bordered by the Coast Ranges to the west and the Sierra Nevada to the east, experiences far hotter summer temperatures than coastal zones, while mountainous areas show rapid precipitation shifts due to orographic effects and alpine climatic conditions.
Land cover varies widely across the state, including dense conifer forests, chaparral shrublands, deserts, and arid plains, with nearly half of the area forested and encompassing ecosystems such as mixed evergreen, montane conifer, and oak woodland formations. This study centers on California and uses wildfire occurrence records from 1910–2022 that were compiled and georeferenced from official perimeter and incident datasets. Environmental predictor rasters were obtained from federal and state agencies, including the U.S. Geological Survey and CAL FIRE, and standardized to a consistent spatial reference system and resolution using geospatial processing platforms.

2.2 GIS preprocessing (performed prior to Python modelling)
Spatial datasets were acquired in GeoTIFF format and resampled to a uniform 1 km × 1 km grid using bilinear resampling to preserve continuous raster values. Historical wildfire polygons (1910–2022) were converted to point locations and then sampled to create a balanced dataset of 1,094 georeferenced points (Event = 1 for fire, 0 for non-fire). Values from six 1 km raster layers were extracted to these points using the ArcGIS Extract Multi Values to Points tool: NDVI, EVI, cumulative Precipitation, Elevation (DEM), Land Surface Temperature (LST), and Surface Air Temperature (SAT). All spatial inputs were standardized in ArcGIS Pro and Google Earth Engine to ensure consistent resolution, projection, and format. The assembled point table, a reproducible multivariate input table for modeling, was exported as CSV for machine-learning analysis.

2.3 Preprocessing and spatial cross-validation
Imputation: Missing predictor values were imputed with feature-wise mean values (SimpleImputer(strategy='mean')).
Scaling: Predictors were standardized (z-score) using StandardScaler for the algorithms that are sensitive to scale (SVM and the neural-net meta-learner). Tree-based learners used the imputed raw values.
Spatial blocking: To control spatial autocorrelation and leakage, points were binned into 20 km × 20 km grid blocks (derived from POINT_X/POINT_Y), and the block identifiers were used as groups for GroupKFold(n_splits=5). This produced geographically distinct training/validation folds during out-of-fold (OOF) evaluation.

2.4 Modeling strategy
Five base classifiers were trained, and their OOF predictions were generated with spatial GroupKFold:
Decision Tree (DT): DecisionTreeClassifier(max_depth=10)
Random Forest (RF): RandomForestClassifier(n_estimators=400)
XGBoost (XGB): XGBClassifier(n_estimators=600, max_depth=6, learning_rate=0.05, subsample=0.8, colsample_bytree=0.8)
LightGBM (LGBM): LGBMClassifier(n_estimators=600, learning_rate=0.05)
Support Vector Machine (SVM): SVC(probability=True, kernel='rbf', C=10, gamma='scale')

2.5 Ensembles
Mean ensemble: arithmetic mean of the base-model OOF probabilities.
Neural-network stacking (NN ensemble): MLP meta-learner trained on the five base-model OOF probabilities (MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=800)).
After cross-validated evaluation, final models were retrained on the full training dataset and applied to the prediction dataset (≈423,892 points) to produce per-point probability layers for each model and the ensembles.

3. Results

3.1. Model performance (spatially grouped OOF)
The comparative predictive performance of the individual classifiers and ensemble models, evaluated using spatially grouped out-of-fold cross-validation, is reported in Table 1.
Table 1: Cross-validated OOF model comparison (AUC, F1, Precision, Recall).
Model          AUC    F1      Precision  Recall
NN Ensemble    0.889  0.8585  0.8074     0.9165
Mean Ensemble  0.886  0.8422  0.8134     0.8731
LGBM           0.883  0.8412  0.8173     0.8664
RF             0.881  0.8450  0.8101     0.8831
XGB            0.880  0.8384  0.8122     0.8664
SVM            0.861  0.8328  0.7960     0.8731
DT             0.785  0.8055  0.7797     0.8331
The NN stacking ensemble attained the best discrimination (AUC = 0.889) and the highest sensitivity (Recall = 0.9165), indicating that stacking the diverse base learners into an MLP meta-learner improved detection of wildfire occurrences. The mean ensemble closely follows the NN ensemble, which shows that both simple averaging and stacking yield gains over single base models. Among single models, the gradient-boosted learners (LGBM, XGB) and RF performed strongly; the single decision tree performed worst, which is consistent with the high variance and limited generalization of single trees.
The R² values of the out-of-fold probability predictions are shown in Table 2. They indicate that the NN ensemble explains about 51% of the variance in the binary labels via its predicted probabilities, consistent with its superior AUC. Note: R² here should be treated as descriptive of probability fit rather than given the usual linear-regression interpretation.
Table 2: R² of OOF probability predictions (r2_score).
Model          R²
NN Ensemble    0.5079
Mean Ensemble  0.4856
RF             0.4717
XGB            0.4417
SVM            0.4345
LGBM           0.3760
DT             0.1928
Multicollinearity among the predictor variables was evaluated using variance inflation factor (VIF) analysis, and the corresponding VIF values are summarized in Table 3.
Table 3: VIF values for predictors, showing NDVI and EVI collinearity.
Feature        VIF
NDVI           19.3995
EVI            17.5848
SAT            3.0762
Precipitation  2.1773
Elevation      1.8431
LST            1.6574
NDVI and EVI exhibit very high multicollinearity (VIF > 10), reflecting near-redundancy (Pearson r ≈ 0.97). This is common in remote-sensing studies where multiple vegetation indices co-vary.
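As an illustration of the VIF diagnostic reported in Table 3, the following is a minimal sketch rather than the authors' code, since the exact VIF routine used in the study is not stated. It computes VIF_j = 1 / (1 − R_j²) as the diagonal of the inverse correlation matrix, which corresponds to regressing each predictor on the others with an intercept. The data are synthetic stand-ins, and the helper name vif_table is hypothetical.

```python
import numpy as np

FEATURES = ["NDVI", "EVI", "SAT", "Precipitation", "Elevation", "LST"]


def vif_table(X, names):
    """Return {feature: VIF} for the columns of X (n_samples x n_features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    (with an intercept) on all other columns. This equals the j-th diagonal
    entry of the inverse of the predictors' correlation matrix.
    """
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return dict(zip(names, np.diag(np.linalg.inv(corr))))


# Synthetic stand-in data (NOT the study's values): six standard-normal
# predictors, with EVI constructed to correlate with NDVI at about r = 0.97.
rng = np.random.default_rng(0)
n = 1094
X = rng.normal(size=(n, len(FEATURES)))
r = 0.97
X[:, 1] = r * X[:, 0] + np.sqrt(1.0 - r**2) * rng.normal(size=n)

vif = vif_table(X, FEATURES)
for name, value in vif.items():
    print(f"{name:14s} VIF = {value:6.2f}")
```

As a rough consistency check, if NDVI and EVI were the only correlated pair, r ≈ 0.97 would imply VIF ≈ 1 / (1 − 0.97²) ≈ 16.9, which is of the same order as the values in Table 3.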
Tree-based learners are robust to collinearity, but for interpretability and in linear-model contexts it is advisable to combine or drop redundant indices (PCA or single-index selection).
A comprehensive diagnostic summary of model performance and predictor relationships is presented in Figure 1. This includes the ROC curves for all models, kernel-density distributions of predicted probabilities, the Pearson correlation matrix of explanatory variables, and the VIF-based multicollinearity assessment. The bivariate relationships and marginal distributions of the environmental predictor variables are presented in Figure 2, providing an initial visual assessment of predictor interactions and potential collinearity.

4. Discussion

4.1. Key Findings and Interpretation
Ensembles improve predictive performance: Both the simple mean ensemble and the NN-stacking ensemble outperform individual classifiers in discrimination (AUC), and the NN ensemble also has the highest sensitivity (Recall). The NN ensemble produced the highest OOF AUC (0.889) and the highest R² (≈0.508), indicating that it both ranks and explains wildfire occurrence better than single models. The success of ensemble methods here is consistent with ensemble theory on variance and bias reduction (e.g., Dietterich, Polikar) and with prior wildfire studies.
Gradient-boosted learners are strong single models: LightGBM and XGBoost achieve high AUC and F1; LightGBM was particularly competitive, in line with many remote-sensing applications where LightGBM balances accuracy with computational efficiency.
Trade-offs, accuracy vs. complexity: Deep-learning (DL) approaches were not part of the present study; in comparative studies (Biswas et al., 2025), DL ensembles sometimes require much greater compute and do not always yield more spatially coherent susceptibility maps for this kind of problem. Given the results, a tree ensemble (LightGBM or RF) or the stacking ensemble provides the best operational compromise.
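The spatially blocked out-of-fold workflow behind these comparisons (Sections 2.3 to 2.5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, only three scikit-learn base learners are used (XGBoost and LightGBM are left out to keep the example dependency-free), Random Forest uses fewer trees for speed, and the helper names block_ids and oof_proba are hypothetical. How the meta-learner itself was cross-validated is not stated in the text; here it is assumed to be evaluated with the same grouped folds.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold, cross_val_predict
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def block_ids(x, y, block_size=20_000.0):
    """Label each point with the square grid block that contains it.

    Assumes projected coordinates in metres (e.g. POINT_X / POINT_Y),
    so the default block is 20 km x 20 km.
    """
    col = np.floor(np.asarray(x, dtype=float) / block_size).astype(int)
    row = np.floor(np.asarray(y, dtype=float) / block_size).astype(int)
    return np.array([f"{c}_{r}" for c, r in zip(col, row)])


def oof_proba(model, X, y, groups, n_splits=5):
    """Out-of-fold P(fire); no spatial block spans both train and validation."""
    cv = GroupKFold(n_splits=n_splits)
    proba = cross_val_predict(model, X, y, groups=groups, cv=cv,
                              method="predict_proba")
    return proba[:, 1]


# Synthetic stand-in data (NOT the study's data): 600 points in a
# 200 km x 200 km square with six random predictors and a noisy label.
rng = np.random.default_rng(42)
n = 600
px, py = rng.uniform(0, 200_000, n), rng.uniform(0, 200_000, n)
X = rng.normal(size=(n, 6))
y = (2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n) > 0).astype(int)
groups = block_ids(px, py)

base_models = {
    "DT": DecisionTreeClassifier(max_depth=10, random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    # Scaling sits inside the pipeline so it is refit on each training fold.
    "SVM": make_pipeline(
        StandardScaler(),
        SVC(probability=True, kernel="rbf", C=10, gamma="scale", random_state=0),
    ),
}

# One column of OOF probabilities per base model.
oof = np.column_stack([oof_proba(m, X, y, groups) for m in base_models.values()])

mean_ens = oof.mean(axis=1)  # mean ensemble
meta = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=800, random_state=0)
nn_ens = oof_proba(meta, oof, y, groups)  # stacking on base-model OOF columns

auc = {name: roc_auc_score(y, oof[:, i]) for i, name in enumerate(base_models)}
auc["Mean Ensemble"] = roc_auc_score(y, mean_ens)
auc["NN Ensemble"] = roc_auc_score(y, nn_ens)
for name, value in auc.items():
    print(f"{name:14s} AUC = {value:.3f}")
```

Keeping the scaler inside the SVM pipeline is a deliberate choice in this sketch: it prevents statistics from validation blocks from leaking into training, which is the same concern that motivates the spatial blocking.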

4.2 On NDVI/EVI Collinearity and Model Implications
NDVI and EVI are highly collinear (r ≈ 0.97; VIF > 17). For interpretability, consider (a) keeping both for tree-based models, because they can exploit subtle differences; (b) combining them via PCA (using the first component as a “vegetation” score); or (c) selecting the one index that better captures local vegetation characteristics. For conservation and management users, a smaller predictor set is easier to interpret and may increase model transferability.

4.3 Spatial Realism and Operational Readiness
Spatial block CV lowered optimistic bias; models that perform well under GroupKFold are more likely to generalize across space. The ensemble maps (particularly the NN and Mean ensembles) are spatially coherent and align with known high-risk areas (Sierra foothills, coastal chaparral), which supports their operational usefulness for mitigation planning.

4.4 Limitations
Temporal mismatch: labels span 1910–2022 while predictors reflect 2022 conditions, so the results describe susceptibility under 2022 covariates, not year-by-year forecasts. Temporal matching is recommended for dynamic forecasting.
Omitted anthropogenic covariates: human-activity layers (distance to roads, population density) were not included but can be important drivers.
Calibration: probabilities should be assessed for calibration (Brier score, reliability diagrams) before absolute risk thresholds are used.
External validation: validating against independent held-out years or different regions is necessary to assess transferability.

4.5 Practical Recommendations
For operational mapping, choose LightGBM or RF for single-model deployments; adopt NN stacking if maximizing detection (Recall) is the priority and greater compute is acceptable. Apply calibration (Platt scaling or isotonic regression) to predicted probabilities if thresholds are used for decision-making.
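The calibration check recommended above can be sketched as follows. This is a minimal illustration on synthetic scores, not an analysis of the study's models: the score distribution, the split, and all variable names are assumptions. It fits isotonic regression on one half of the data and compares Brier scores on the other half; Platt scaling would fit a sigmoid to the scores instead.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss

# Synthetic stand-in (NOT the study's predictions): labels are drawn from
# true_p, and the "model" reports true_p squared, so its ranking is intact
# but its probabilities are miscalibrated.
rng = np.random.default_rng(7)
n = 4000
true_p = rng.uniform(0.0, 1.0, n)
y = (rng.uniform(0.0, 1.0, n) < true_p).astype(int)
raw_p = true_p**2

# Fit the calibrator on one half and evaluate on the other half.
fit_idx = np.arange(n) < n // 2
eval_idx = ~fit_idx

iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(raw_p[fit_idx], y[fit_idx])
cal_p = iso.predict(raw_p[eval_idx])

# Brier score: mean squared error of the probabilities (lower is better).
brier_raw = brier_score_loss(y[eval_idx], raw_p[eval_idx])
brier_cal = brier_score_loss(y[eval_idx], cal_p)
print(f"Brier raw = {brier_raw:.3f}, Brier calibrated = {brier_cal:.3f}")

# Points of a reliability diagram: observed frequency vs. mean prediction.
obs_freq, mean_pred = calibration_curve(y[eval_idx], cal_p, n_bins=10)
for obs, pred in zip(obs_freq, mean_pred):
    print(f"predicted {pred:.2f} -> observed {obs:.2f}")
```

With spatial data, the calibration and evaluation subsets would ideally be separated by spatial blocks as well, for the same reason that GroupKFold is used for model evaluation.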
For interpretability, add SHAP analysis to quantify predictor contributions per model and to explain spatial heterogeneity to stakeholders. Explore temporally resolved predictors (multi-year NDVI, antecedent precipitation) for forecasting burned area or yearly susceptibility.

5. Conclusion
This study shows that an integrated machine-learning framework backed by spatially robust validation can be used to estimate California's wildfire susceptibility effectively. Long-term wildfire records (1910–2022), remote-sensing predictors, and meteorological predictors were combined to address significant shortcomings in earlier research, especially the absence of ensemble integration and appropriate spatial cross-validation. According to the results, ensemble methods perform better than single classifiers. The neural-network stacking ensemble had the best predictive performance (AUC = 0.889; R² ≈ 0.51), closely followed by the mean ensemble. LightGBM, XGBoost, and Random Forest also performed well, whereas the single Decision Tree generalized worst. These results indicate that ensemble learning increases reliability when modeling intricate socio-ecological systems such as wildfire regimes. By lowering spatial bias and supporting generalizability across California's varied landscapes, spatial block cross-validation makes the results more trustworthy. The framework offers a strong methodological basis, despite remaining shortcomings such as the temporal mismatch, NDVI–EVI collinearity, and the lack of anthropogenic variables. The results have practical implications for early warning, resource allocation, and long-term climate adaptation planning by supporting operational wildfire risk mapping for organizations such as CAL FIRE. Overall, this study indicates that ensemble machine learning is a dependable and scalable method for modeling California's susceptibility to wildfires.

Declarations
Funding Declaration
The authors received no specific funding for this research from any public, commercial, or not-for-profit funding agency. This study was conducted independently using the authors’ own institutional and personal resources.
AI Use Declaration
The authors used artificial intelligence-assisted language tools solely to improve grammar, clarity, readability, and overall presentation of the manuscript, particularly to address language barriers. The AI tools were not used for data analysis, interpretation of results, generation of scientific conclusions, or authorship. All content was reviewed, verified, and approved by the authors, who take full responsibility for the final manuscript.

References
Andrianarivony, H. S., & Akhloufi, M. A. (2024). Machine Learning and Deep Learning for Wildfire Spread Prediction: A Review. Fire, 7(12), 482. https://doi.org/10.3390/fire7120482
Huot, F., Hu, R. L., Goyal, N., Sankar, T., Ihme, M., & Chen, Y.-F. (2022). Next Day Wildfire Spread: A Machine Learning Dataset to Predict Wildfire Spreading From Remote-Sensing Data. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–13. https://doi.org/10.1109/TGRS.2022.3192974
Jain, P., Coogan, S. C. P., Subramanian, S. G., Crowley, M., Taylor, S., & Flannigan, M. D. (2020). A review of machine learning applications in wildfire science and management. Environmental Reviews, 28(4), 478–505. https://doi.org/10.1139/er-2020-0019
Keeley, J. E., & Syphard, A. D. (2021). Large California wildfires: 2020 fires in historical context. Fire Ecology, 17(1), 22. https://doi.org/10.1186/s42408-021-00110-7
Moghim, S., & Mehrabi, M. (2024). Wildfire assessment using machine learning algorithms in different regions. Fire Ecology, 20(1), 104. https://doi.org/10.1186/s42408-024-00335-2
Motta, M., De Castro Neto, M., & Sarmento, P. (2021). A mixed approach for urban flood prediction using Machine Learning and GIS. International Journal of Disaster Risk Reduction, 56, 102154. https://doi.org/10.1016/j.ijdrr.2021.102154
Pandey, D., Niwaria, K., & Chourasia, B. (2019). Machine Learning Algorithms: A Review. 06(02).
Razali, N., Ismail, S., & Mustapha, A. (2020). Machine learning approach for flood risks prediction. IAES International Journal of Artificial Intelligence (IJ-AI), 9(1), 73. https://doi.org/10.11591/ijai.v9.i1.pp73-80
Sayad, Y. O., Mousannif, H., & Al Moatassime, H. (2019). Predictive modeling of wildfires: A new dataset and machine learning approach. Fire Safety Journal, 104, 130–146. https://doi.org/10.1016/j.firesaf.2019.01.006
Sullivan, E. (2022). Understanding from Machine Learning Models. The British Journal for the Philosophy of Science, 73(1), 109–133. https://doi.org/10.1093/bjps/axz035

Maps
Maps 1 and 2 are available in the Supplementary Files section.

Additional Declarations
The authors declare no competing interests.

Supplementary Files
Picture1.jpg. Map 1: Study area and training sample locations. Elevation hillshade (colormap blue→magenta) overlain by wildfire occurrence points (red, n = 604) from 1910–2022 fire records and paired with 2022 predictor rasters.
Picture3.jpg. Map 2: Panel A: Decision Tree; B: LightGBM; C: Random Forest; D: SVM; E: XGBoost; F: Mean Ensemble; G: NN Ensemble. Colors show relative risk (Very Low → Very High). Maps are predictions on the full prediction dataset (≈423,892 points) and were rendered in ArcGIS Pro for visual comparison.

Figures
Figure 1: (A) ROC curves computed from OOF predictions for each base model and ensembles; (B) kernel-density estimates of predicted OOF probabilities (note NN ensemble shows strongest bimodality); (C) Pearson correlation matrix (upper triangle masked); (D) Variance Inflation Factor (VIF) barplot.
Figure 2: Pairwise relationships among predictors (sampled pairplot). Diagonals show marginal KDEs; off diagonals show scatter between feature pairs. NDVI and EVI show near-linear correlation.

Author information
Mofasser Bin Hossain Maruf (corresponding author; ORCID https://orcid.org/0009-0007-7822-2358), Monoarul Haq Omy, Rafid Khandaker, Osman Hayat Asif. All authors are affiliated with Khulna University.
Human-induced climatic change may, over a relatively short time period (\u0026lt;\u0026thinsp;100 years), give rise to climates outside anything experienced in California since the establishment of an industrial civilization currently sustaining a state population that has increased approximately 41,000% since 1850 (Westerling et al., 2011).\u003c/p\u003e \u003cp\u003eArtificial intelligence has been applied in wildfire science and management since the 1990s, with early applications including neural networks and expert systems. Since then, the field has progressed rapidly alongside the wide adoption of machine learning (ML) methods in the environmental sciences (Jain et al., \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). The use of algorithms in scientific inquiry is not new. An algorithm is simply a series of steps or set of rules that carries out an action or solves a problem. Any model that utilizes a simulation employs an algorithm. Algorithms on their own are not explanations. It is only when algorithmic models are used to answer a question about some event or phenomenon that they explain. Some examples: How is it possible that the eye evolved in so many diverse systems? Why is segregation so prevalent? Or what effect does carbon dioxide have on current and future weather patterns? Other distinguishing features of explanation (causal, counter-factual, law covering, and so on) are still widely discussed. As a starting point, this study adopts the increasingly common view that explanation aims at understanding (De Regt [2017]); in short, explaining why helps us to understand why (Sullivan, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eRecent improvements in remote sensing, geospatial analytics, and ML offer new opportunities to model and forecast wildfire occurrence and spread.
Studies have shown that ML algorithms, including Random Forest, Support Vector Machines, and ensemble methods, are useful for mapping wildfire susceptibility and predicting next-day fire spread (Huot et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Pandey et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). Additionally, comparisons of ML algorithms reveal that they perform better than traditional statistical methods when dealing with complex interactions among weather, terrain, and vegetation factors (Pandey et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). However, even with substantial progress, issues remain, such as inconsistent methods, limited applicability, and a lack of integration of historical drought-fire relationships in current wildfire prediction models. This study aims to create a robust machine learning-based wildfire prediction system for California. The research aims not only to improve predictive power but also to place wildfire modeling in a broader environmental context.\u003c/p\u003e \u003cp\u003eCalifornia\u0026rsquo;s wildfire crisis has grown in both size and complexity, leading to severe ecological damage, loss of infrastructure, and displacement of people. The 2020 fire season recorded an unprecedented number of large fires, highlighting the increasing wildfire risks (Keeley \u0026amp; Syphard, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Traditional fire risk assessment models rely mainly on straightforward statistical methods. These often do not account for complex interactions among climate, physical, and human factors. While drought has long been linked to large fire events (Keeley \u0026amp; Syphard, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2021\u003c/span\u003e), current predictive models rarely combine long-term fire history with real-time remote sensing data.
Additionally, machine learning models have shown promise in predicting wildfire spread (Huot et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), but their use in California often lacks thorough validation and comparison with other algorithms. Another important issue is spatial and temporal generalization. Many models are designed for specific fire events or short periods, which restricts their broader application. There is an urgent need for a predictive system that merges historical drought-fire relationships, diverse environmental factors, and ensemble machine learning to create reliable and scalable wildfire forecasts for California.\u003c/p\u003e \u003cp\u003eOver time, environmental disasters have taken center stage in the political arena because of their vast impact on the economy and society as a whole, fueled by accelerated urban growth and climate change. According to the United Nations Habitat III report, more than 50% of the global population is currently concentrated in urban areas, and this number is expected to rise to over 70% by 2050 (Motta et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). BN and other ML techniques have been applied widely to prediction and classification in many fields in Malaysia, including agriculture and the economy. For example, ML techniques such as Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Decision Table (DT), and M5P Tree algorithms have been used to classify herbs for the agriculture industry in Malaysia (Razali et al., \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2020\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThis research adopts a critical realist ontological stance, viewing wildfires as real, material phenomena governed by biophysical laws while simultaneously shaped by socio-environmental processes.
The occurrence of wildfires is viewed as an emergent property of interacting systems, including topography, vegetation structure, climate variability, and human activity, rather than as a completely random event. According to historical analyses, California has seen significant fires since the nineteenth century, especially during protracted dry spells (Keeley \u0026amp; Syphard, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). These results support the ontological stance that wildfire regimes are not isolated anomalies but rather are a part of long-term ecological and climatic cycles. At the same time, the structural conditions that lead to fires are changed by anthropogenic climate change, land-use change, and fuel management techniques. Consequently, both contingent human influences and deterministic environmental drivers must be taken into account when predicting wildfires. In this regard, machine learning models are instruments for revealing hidden patterns in these intricate ontological frameworks.\u003c/p\u003e \u003cp\u003eWith its emphasis on empirical data, quantifiable indicators, and algorithmic pattern recognition, this study adheres to a positivist\u0026ndash;computational paradigm in terms of epistemology. Reproducible computational workflows, statistical validation, and quantitative modelling are used to build knowledge about wildfire risk. Nonlinear relationships can be extracted from high-dimensional environmental datasets using machine learning techniques, especially ensemble methods (Pandey et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). Furthermore, wildfire spread datasets derived from remote sensing enable near real-time predictive modelling (Huot et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). 
These methods demonstrate a data-driven epistemology in which knowledge validity is determined by generalization ability and predictive accuracy. The study also recognizes that predictions are probabilistic.\u003c/p\u003e \u003cp\u003eAlthough historical wildfire analyses offer insightful long-term information (Keeley \u0026amp; Syphard, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2021\u003c/span\u003e), predictive machine learning pipelines rarely incorporate them. The majority of ML-based research does not incorporate past drought-fire dynamics and instead concentrates on short-term forecasting. Comparative reviews of algorithms highlight the advantages of various machine learning approaches (Pandey et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2019\u003c/span\u003e), but in California-specific contexts, thorough ensemble modelling that incorporates Decision Tree, Random Forest, and boosting algorithms is still understudied. Current research frequently focuses on accuracy metrics without adequately addressing the interpretability and scalability of models across California's various ecological zones.\u003c/p\u003e \u003cp\u003eThis study has a wide range of practical implications. First, precise wildfire prediction models can help emergency management authorities and state agencies like CAL FIRE with early warning systems, evacuation planning, and resource allocation. Second, by combining historical fire patterns and drought indicators (Keeley \u0026amp; Syphard, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2021\u003c/span\u003e), the model supports long-term adaptation plans under changing climate scenarios by offering climate-informed predictive capabilities. Third, in high-risk areas, machine learning-based susceptibility mapping can direct fuel management regulations, infrastructure protection, and land-use planning.
By reducing the biases of individual algorithms, ensemble modelling techniques further improve prediction reliability (Pandey et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). Lastly, by showing how ensemble machine learning and integrated datasets can handle complex socio-ecological hazards, the study makes a methodological contribution to computational environmental science.\u003c/p\u003e"},{"header":"2. Methodology","content":"\u003cp\u003e\u003cem\u003e2.1. Study area and data sources\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eCalifornia, spanning about 423,970 km\u0026sup2;, is among the most geographically and ecologically varied regions in the United States, with elevations ranging from sea level along the Pacific coastline to more than 4,200 m in the Sierra Nevada Mountains. The state is largely characterized by a Mediterranean-type climate marked by hot, dry summers and mild, wet winters, conditions that heighten wildfire risk during dry seasons. Its climatic patterns are strongly influenced by latitude, elevation gradients, and proximity to the Pacific Ocean. For example, the Central Valley, a 725-km lowland bordered by the Coast Ranges to the west and the Sierra Nevada to the east, experiences far hotter summer temperatures than coastal zones, while mountainous areas show rapid precipitation shifts due to orographic effects and alpine climatic conditions. Land cover varies widely across the state, including dense conifer forests, chaparral shrublands, deserts, and arid plains, with nearly half of the area forested and encompassing ecosystems such as mixed evergreen, montane conifer, and oak woodland formations.\u003c/p\u003e\n\u003cp\u003eThis study centers on California and uses wildfire occurrence records from 1910\u0026ndash;2022 that were compiled and georeferenced from official perimeter and incident datasets.
Environmental predictor rasters were obtained from federal and state agencies, including the U.S. Geological Survey and CAL FIRE, and standardized to a consistent spatial reference system and resolution using geospatial processing platforms.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e2.2 GIS preprocessing (performed prior to Python modelling)\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eSpatial datasets were acquired in GeoTIFF format and resampled to a uniform 1 km \u0026times; 1 km grid using bilinear resampling to preserve continuous raster values. Historical wildfire polygons were converted to point locations and then sampled to create a balanced dataset of 1,094 georeferenced points (Event = 1 for fire, 0 for non-fire). Values from six raster layers were extracted to these points using ArcGIS Extract Multi Values to Points: NDVI, EVI, cumulative Precipitation, Elevation (DEM), Land Surface Temperature (LST), and Surface Air Temperature (SAT). The assembled point table was exported as CSV for machine-learning analysis.\u003cbr\u003e\u0026nbsp;Spatial inputs were obtained from authoritative federal and state sources and standardized in ArcGIS Pro and Google Earth Engine to ensure consistent resolution, projection, and format.
Wildfire perimeters (1910\u0026ndash;2022) were converted to point samples and paired with six 1 km predictor rasters; raster values were assigned to points using the Extract Multi Values to Points tool to produce a reproducible multivariate input table for modeling.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e2.3 Preprocessing and spatial cross-validation\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eImputation\u003c/strong\u003e: Missing predictor values were imputed with feature-wise mean values (SimpleImputer(strategy=\u0026apos;mean\u0026apos;)).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eScaling\u003c/strong\u003e: Predictors were standardized (z-score) using StandardScaler for the scale-sensitive algorithms (SVM and neural-net meta-learner). Tree-based learners used the imputed raw values.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSpatial blocking\u003c/strong\u003e: To control spatial autocorrelation and leakage, points were binned into 20 km \u0026times; 20 km grid blocks (derived from POINT_X/POINT_Y), and the block identifiers were used as groups for GroupKFold(n_splits=5).
This produced geographically distinct training/validation folds during out-of-fold (OOF) evaluation.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e2.4 Modeling strategy\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eFive base classifiers were trained, and their OOF predictions were generated under spatial GroupKFold:\u003c/p\u003e\n\u003cp\u003eDecision Tree (DT): DecisionTreeClassifier(max_depth=10)\u003c/p\u003e\n\u003cp\u003eRandom Forest (RF): RandomForestClassifier(n_estimators=400)\u003c/p\u003e\n\u003cp\u003eXGBoost (XGB): XGBClassifier(n_estimators=600, max_depth=6, learning_rate=0.05, subsample=0.8, colsample_bytree=0.8)\u003c/p\u003e\n\u003cp\u003eLightGBM (LGBM): LGBMClassifier(n_estimators=600, learning_rate=0.05)\u003c/p\u003e\n\u003cp\u003eSupport Vector Machine (SVM): SVC(probability=True, kernel=\u0026apos;rbf\u0026apos;, C=10, gamma=\u0026apos;scale\u0026apos;)\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e2.5 Ensembles\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMean ensemble\u003c/strong\u003e: arithmetic mean of base-model OOF probabilities.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eNeural-network stacking (NN ensemble)\u003c/strong\u003e: MLP meta-learner trained on the five base-model OOF probabilities (MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=800)).\u003c/p\u003e\n\u003cp\u003eAfter cross-validated evaluation, final models were retrained on the full training dataset and applied to the prediction dataset (\u0026asymp;423,892 points) to produce per-point probability layers for each model and the ensembles.\u003c/p\u003e"},{"header":"3. Results","content":"\u003cp\u003e\u003cem\u003e3.1.
Model performance (spatially grouped OOF)\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eThe comparative predictive performance of the individual classifiers and ensemble models, evaluated using spatially grouped out-of-fold cross-validation, is reported in\u0026nbsp;Table 1.\u003c/p\u003e\n\u003cp\u003eTable 1: Cross-validated OOF model comparison \u0026mdash; AUC, F1, Precision, Recall.\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"100%\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 31px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eModel\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eAUC\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eF1\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20px;\"\u003e\n \u003cp\u003e\u003cstrong\u003ePrecision\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eRecall\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 31px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eNN Ensemble\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.889\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.8585\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20px;\"\u003e\n \u003cp\u003e0.8074\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.9165\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 31px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMean 
Ensemble\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.886\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.8422\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20px;\"\u003e\n \u003cp\u003e0.8134\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.8731\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 31px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eLGBM\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.883\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.8412\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20px;\"\u003e\n \u003cp\u003e0.8173\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.8664\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 31px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eRF\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.881\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.8450\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20px;\"\u003e\n \u003cp\u003e0.8101\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.8831\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 31px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eXGB\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.880\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.8384\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20px;\"\u003e\n \u003cp\u003e0.8122\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.8664\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 31px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eSVM\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.861\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.8328\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20px;\"\u003e\n \u003cp\u003e0.7960\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.8731\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 31px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eDT\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.785\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.8055\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20px;\"\u003e\n \u003cp\u003e0.7797\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 15px;\"\u003e\n \u003cp\u003e0.8331\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe NN stacking ensemble attained the best discrimination (AUC = 0.889) and the highest sensitivity (Recall = 0.916), demonstrating that stacking the diverse base learners into an MLP meta-learner improved detection of wildfire occurrences. 
The mean ensemble closely follows the NN ensemble, which shows that both simple averaging and stacking yield robust gains over single base models. Among single models, gradient-boosted learners (LGBM, XGB) and RF performed strongly; the single decision tree performed worst due to high variance and limited generalization.\u003c/p\u003e\n\u003cp\u003eR\u0026sup2; values indicate the NN ensemble explains ~51% of variance in the binary labels via predicted probabilities, consistent with its superior AUC. Note: treat R\u0026sup2; here as descriptive of probability fit rather than as the usual linear-regression interpretation. The R\u0026sup2; values of the out-of-fold probability predictions are shown in Table 2.\u003c/p\u003e\n\u003cp\u003eTable 2: R\u0026sup2; of OOF probability predictions (r2_score).\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"100%\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eModel\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 33px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eR\u0026sup2;\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eNN Ensemble\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 33px;\"\u003e\n \u003cp\u003e0.5079\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMean Ensemble\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 33px;\"\u003e\n \u003cp\u003e0.4856\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eRF\u003c/strong\u003e\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 33px;\"\u003e\n \u003cp\u003e0.4717\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eXGB\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 33px;\"\u003e\n \u003cp\u003e0.4417\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eSVM\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 33px;\"\u003e\n \u003cp\u003e0.4345\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eLGBM\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 33px;\"\u003e\n \u003cp\u003e0.3760\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eDT\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 33px;\"\u003e\n \u003cp\u003e0.1928\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eMulticollinearity among the predictor variables was evaluated using variance inflation factor (VIF) analysis, and the corresponding VIF values are summarized in\u0026nbsp;Table 3.\u003c/p\u003e\n\u003cp\u003eTable 3: VIF values for predictors showing NDVI and EVI collinearity.\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"100%\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eFeature\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 46px;\"\u003e\n 
\u003cp\u003e\u003cstrong\u003eVIF\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eNDVI\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 46px;\"\u003e\n \u003cp\u003e19.3995\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eEVI\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 46px;\"\u003e\n \u003cp\u003e17.5848\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eSAT\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 46px;\"\u003e\n \u003cp\u003e3.0762\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e\u003cstrong\u003ePrecipitation\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 46px;\"\u003e\n \u003cp\u003e2.1773\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eElevation\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 46px;\"\u003e\n \u003cp\u003e1.8431\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eLST\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 46px;\"\u003e\n \u003cp\u003e1.6574\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eNDVI and EVI exhibit very high multicollinearity (VIF \u0026gt; 10), reflecting near-redundancy (Pearson r \u0026asymp; 0.97). This is common in remote-sensing studies where multiple vegetation indices co-vary. 
Tree-based learners are robust to collinearity, but for interpretability and linear-model contexts it is advisable to combine or drop redundant indices (PCA or single index selection).\u003c/p\u003e\n\u003cp\u003eA comprehensive diagnostic summary of model performance and predictor relationships is presented in\u0026nbsp;Figure\u0026nbsp;1. This includes the ROC curves for all models, kernel-density distributions of predicted probabilities, the Pearson correlation matrix of explanatory variables, and the VIF-based multicollinearity assessment.\u003c/p\u003e\n\u003cp\u003eThe bivariate relationships and marginal distributions of the environmental predictor variables are presented in\u0026nbsp;Figure\u0026nbsp;2, providing an initial visual assessment of predictor interactions and potential collinearity.\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;\u003c/p\u003e"},{"header":"4. Discussion","content":"\u003cp\u003e\u003cem\u003e4.1. Key Findings and Interpretation\u003c/em\u003e\u003c/p\u003e\n\u003col\u003e\n \u003cli\u003e\u003cstrong\u003eEnsembles improve predictive performance:\u003c/strong\u003e Both the simple mean-ensemble and the NN-stacking ensemble outperform individual classifiers in discrimination (AUC) and sensitivity (Recall). The NN ensemble produced the highest OOF AUC (0.889) and the highest R\u0026sup2; (\u0026asymp;0.508), indicating it both ranks and explains wildfire occurrence better than single models. The success of ensemble methods here is consistent with ensemble theory (variance/bias reduction) and numerous wildfire studies (e.g., Dietterich, Polikar).\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eGradient-boosted learners are strong single models:\u003c/strong\u003e LightGBM and XGBoost achieve high AUC and F1; LightGBM was particularly competitive, corroborating many remote-sensing applications where LightGBM balances accuracy with computational efficiency.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eTrade-offs: accuracy vs. 
complexity:\u003c/strong\u003e Deep-learning (DL) approaches were not part of the present study; in comparative studies (Biswas et al., 2025), DL ensembles sometimes require much greater compute and do not always yield more spatially coherent susceptibility maps for this kind of problem and data. Given the results, a tree ensemble (LightGBM or RF) or the stacking ensemble provides the best operational compromise.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003e\u003cem\u003e4.2 On NDVI/EVI Collinearity and Model Implications\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eNDVI and EVI are highly collinear (r \u0026asymp; 0.97; VIF \u0026gt; 17). Three options are available: (a) keep both for tree-based models, which can exploit subtle differences between them; (b) combine them via PCA, using the first component as a \u0026ldquo;vegetation\u0026rdquo; score; or (c) select the one index that better captures local vegetation characteristics. For conservation/management users, a smaller predictor set is easier to interpret and may increase model transferability.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e4.3 Spatial Realism and Operational Readiness\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eSpatial block cross-validation reduced optimistic bias; models that perform well under GroupKFold are more likely to generalize across space. The ensemble maps (particularly the NN and mean ensembles) are spatially coherent and align with known high-risk areas (Sierra foothills, coastal chaparral), which supports their operational usefulness for mitigation planning.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e4.4 Limitations\u003c/em\u003e\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003e\u003cstrong\u003eTemporal mismatch\u003c/strong\u003e: labels span 1910\u0026ndash;2022 while predictors reflect 2022 conditions, so the results represent susceptibility under 2022 covariates, not year-by-year forecasting. 
Temporal matching is recommended for dynamic forecasting.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eOmitted anthropogenic covariates\u003c/strong\u003e: human-activity layers (distance to roads, population density) were not included but can be important drivers.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eCalibration\u003c/strong\u003e: probabilities should be assessed for calibration (Brier score, reliability diagrams) before using absolute risk thresholds.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eExternal validation\u003c/strong\u003e: validating against independent held-out years or different regions is necessary to assess transferability.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e\u003cem\u003e4.5 Practical Recommendations\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eFor operational mapping, choose LightGBM or RF for single-model deployments; adopt NN stacking if maximizing detection (Recall) is the priority and greater compute is acceptable. Apply calibration (Platt scaling or isotonic regression) to predicted probabilities if thresholds are used for decision-making. For interpretability, add SHAP analysis to quantify predictor contributions per model and to explain spatial heterogeneity to stakeholders. Explore temporally resolved predictors (multi-year NDVI, antecedent precipitation) for forecasting burned area or yearly susceptibility.\u003c/p\u003e"},{"header":"5. Conclusion","content":"\u003cp\u003eThis study shows that an integrated machine-learning framework backed by geographically robust validation can be used to estimate California's wildfire susceptibility efficiently. Long-term wildfire records (1910\u0026ndash;2022), remote sensing, and meteorological predictors were combined to address significant shortcomings in earlier research, especially the absence of ensemble integration and appropriate geographic cross-validation.\u003c/p\u003e \u003cp\u003eThe results show that ensemble methods outperform single classifiers. 
The neural-network stacking ensemble had the best predictive performance (AUC\u0026thinsp;=\u0026thinsp;0.889; R\u0026sup2;\u0026thinsp;\u0026asymp;\u0026thinsp;0.51), closely followed by the mean ensemble. LightGBM, XGBoost, and Random Forest also performed well and generalized better than the single Decision Tree. These results demonstrate that ensemble learning increases reliability when modeling intricate socio-ecological systems such as wildfire regimes. By lowering geographical bias and enhancing generalizability across California's varied landscapes, spatial block cross-validation makes the results more trustworthy. The framework offers a strong methodological basis despite persistent shortcomings such as temporal mismatches, NDVI\u0026ndash;EVI collinearity, and the lack of anthropogenic variables.\u003c/p\u003e \u003cp\u003eThe results have practical implications for early warning, resource allocation, and long-term climate adaptation planning by supporting operational wildfire risk mapping for organizations like CAL FIRE. Overall, this study shows that ensemble machine learning is a dependable and scalable method for modeling California's susceptibility to wildfires.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eFunding Declaration\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors received no specific funding for this research from any public, commercial, or not-for-profit funding agency. This study was conducted independently using the authors\u0026rsquo; own institutional and personal resources.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAI Use Declaration\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors used artificial intelligence-assisted language tools solely to improve grammar, clarity, readability, and overall presentation of the manuscript, particularly to address language barriers. 
The AI tools were not used for data analysis, interpretation of results, generation of scientific conclusions, or authorship. All content was reviewed, verified, and approved by the authors, who take full responsibility for the final manuscript.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n \u003cli\u003eAndrianarivony, H. S., \u0026amp; Akhloufi, M. A. (2024). Machine Learning and Deep Learning for Wildfire Spread Prediction: A Review. \u003cem\u003eFire\u003c/em\u003e, \u003cem\u003e7\u003c/em\u003e(12), 482. https://doi.org/10.3390/fire7120482\u003c/li\u003e\n \u003cli\u003eHuot, F., Hu, R. L., Goyal, N., Sankar, T., Ihme, M., \u0026amp; Chen, Y.-F. (2022). Next Day Wildfire Spread: A Machine Learning Dataset to Predict Wildfire Spreading From Remote-Sensing Data. \u003cem\u003eIEEE Transactions on Geoscience and Remote Sensing\u003c/em\u003e, \u003cem\u003e60\u003c/em\u003e, 1\u0026ndash;13. https://doi.org/10.1109/TGRS.2022.3192974\u003c/li\u003e\n \u003cli\u003eJain, P., Coogan, S. C. P., Subramanian, S. G., Crowley, M., Taylor, S., \u0026amp; Flannigan, M. D. (2020). A review of machine learning applications in wildfire science and management. \u003cem\u003eEnvironmental Reviews\u003c/em\u003e, \u003cem\u003e28\u003c/em\u003e(4), 478\u0026ndash;505. https://doi.org/10.1139/er-2020-0019\u003c/li\u003e\n \u003cli\u003eKeeley, J. E., \u0026amp; Syphard, A. D. (2021). Large California wildfires: 2020 fires in historical context. \u003cem\u003eFire Ecology\u003c/em\u003e, \u003cem\u003e17\u003c/em\u003e(1), 22. https://doi.org/10.1186/s42408-021-00110-7\u003c/li\u003e\n \u003cli\u003eMoghim, S., \u0026amp; Mehrabi, M. (2024). Wildfire assessment using machine learning algorithms in different regions. \u003cem\u003eFire Ecology\u003c/em\u003e, \u003cem\u003e20\u003c/em\u003e(1), 104. https://doi.org/10.1186/s42408-024-00335-2\u003c/li\u003e\n \u003cli\u003eMotta, M., De Castro Neto, M., \u0026amp; Sarmento, P. (2021). 
A mixed approach for urban flood prediction using Machine Learning and GIS. \u003cem\u003eInternational Journal of Disaster Risk Reduction\u003c/em\u003e, \u003cem\u003e56\u003c/em\u003e, 102154. https://doi.org/10.1016/j.ijdrr.2021.102154\u003c/li\u003e\n \u003cli\u003ePandey, D., Niwaria, K., \u0026amp; Chourasia, B. (2019). \u003cem\u003eMachine Learning Algorithms: A Review\u003c/em\u003e. \u003cem\u003e06\u003c/em\u003e(02).\u003c/li\u003e\n \u003cli\u003eRazali, N., Ismail, S., \u0026amp; Mustapha, A. (2020). Machine learning approach for flood risks prediction. \u003cem\u003eIAES International Journal of Artificial Intelligence (IJ-AI)\u003c/em\u003e, \u003cem\u003e9\u003c/em\u003e(1), 73. https://doi.org/10.11591/ijai.v9.i1.pp73-80\u003c/li\u003e\n \u003cli\u003eSayad, Y. O., Mousannif, H., \u0026amp; Al Moatassime, H. (2019). Predictive modeling of wildfires: A new dataset and machine learning approach. \u003cem\u003eFire Safety Journal\u003c/em\u003e, \u003cem\u003e104\u003c/em\u003e, 130\u0026ndash;146. https://doi.org/10.1016/j.firesaf.2019.01.006\u003c/li\u003e\n \u003cli\u003eSullivan, E. (2022). Understanding from Machine Learning Models. \u003cem\u003eThe British Journal for the Philosophy of Science\u003c/em\u003e, \u003cem\u003e73\u003c/em\u003e(1), 109\u0026ndash;133. 
https://doi.org/10.1093/bjps/axz035\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Maps","content":"\u003cp\u003eMap 1 and 2 are available in the Supplementary Files section.\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Wildfire susceptibility, Ensemble machine learning, Remote sensing predictors, Geospatial modeling, Environmental risk mapping, Neural network stacking","lastPublishedDoi":"10.21203/rs.3.rs-9586702/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9586702/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eInteracting climatic, biological, and human variables have increased the frequency and impact of wildfires throughout California. By combining long-term fire records (1910\u0026ndash;2022) with six remote-sensing and environmental predictors (NDVI, EVI, precipitation, elevation, land surface temperature, and surface air temperature) standardized to a 1 km grid, this study creates a spatially robust ensemble machine-learning framework to model wildfire susceptibility. 
Models were evaluated on a balanced presence-absence dataset (n\u0026thinsp;=\u0026thinsp;1,094) using spatial block cross-validation in order to lessen autocorrelation bias. Mean and neural-network stacking ensembles were compared with five basic classifiers: Decision Tree, Random Forest, XGBoost, LightGBM, and Support Vector Machine. According to the results, ensemble methods perform better than single models; the stacking ensemble performs the best (AUC\u0026thinsp;=\u0026thinsp;0.889; R\u0026sup2;\u0026thinsp;\u0026asymp;\u0026thinsp;0.51). Strong predictive performance was also shown by the Random Forest and gradient-boosted models, whereas the Decision Tree model alone generalized less well. The results indicate that the combination of geographical validation and ensemble learning enhances the generalizability and reliability of mapping wildfire vulnerability. This framework supports California's climate adaptation plans, mitigation planning, and wildfire risk assessment by offering a scalable and practically applicable tool.\u003c/p\u003e","manuscriptTitle":"A Spatially Generalizable Neural-Stacked Ensemble Framework for Wildfire Susceptibility Prediction in California","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-05-04 10:08:58","doi":"10.21203/rs.3.rs-9586702/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"4469c8f9-7fdb-4b38-bcd8-21c9e2e0c09b","owner":[],"postedDate":"May 4th, 
2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":67376722,"name":"Artificial Intelligence and Machine Learning"},{"id":67376723,"name":"Geographic Information Systems"},{"id":67376724,"name":"Climate Analysis and Modeling"}],"tags":[],"updatedAt":"2026-05-04T10:08:59+00:00","versionOfRecord":[],"versionCreatedAt":"2026-05-04 10:08:58","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9586702","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9586702","identity":"rs-9586702","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
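
The reported NDVI/EVI figures (Pearson r ≈ 0.97; VIF 19.40 and 17.58) can be sanity-checked with a short calculation. When a predictor is regressed on only one other predictor, its VIF reduces to 1 / (1 − r²), which for r = 0.97 is about 16.9. A VIF computed against all remaining predictors can only be equal or larger, so the reported values are consistent with the reported correlation. The sketch below is illustrative only: the helper names and the synthetic NDVI/EVI values are assumptions and are not the study's data or code.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_predictor_vif(r):
    """VIF of a predictor regressed on exactly one other predictor: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r * r)


# Pairwise lower bound implied by the correlation reported in the paper.
vif_from_reported_r = two_predictor_vif(0.97)

# Synthetic stand-in data (hypothetical, NOT the study's rasters):
# EVI modeled as a scaled copy of NDVI plus small noise.
rng = random.Random(0)
ndvi = [rng.uniform(0.1, 0.9) for _ in range(500)]
evi = [0.6 * v + rng.gauss(0.0, 0.02) for v in ndvi]

r_synthetic = pearson_r(ndvi, evi)
vif_synthetic = two_predictor_vif(r_synthetic)

print(f"VIF implied by r = 0.97: {vif_from_reported_r:.2f}")
print(f"synthetic r = {r_synthetic:.3f}, pairwise VIF = {vif_synthetic:.1f}")
```

For the full six-predictor VIFs, each predictor would be regressed on the other five; the pairwise form shown here only gives a lower bound for the two vegetation indices.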

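The spatial block cross-validation discussed in Section 4.3 rests on one rule: all samples from the same spatial block must land in the same fold, so a model is never tested on points adjacent to its training points. The sketch below shows that grouping step using only the standard library. The 1-degree block size, the five folds, the bounding box, and the function names are all assumptions for illustration; the paper names GroupKFold but its block size and fold count are not given in this section.

```python
import math
import random
from collections import defaultdict


def block_id(lon, lat, block_deg):
    """Map a coordinate to the (column, row) index of its square block."""
    return (math.floor(lon / block_deg), math.floor(lat / block_deg))


def spatial_block_folds(coords, block_deg=1.0, n_folds=5):
    """Assign each point a fold index so that every block sits in exactly one fold.

    Blocks are handed out largest-first to the currently smallest fold,
    which keeps fold sizes roughly balanced.
    """
    members = defaultdict(list)
    for i, (lon, lat) in enumerate(coords):
        members[block_id(lon, lat, block_deg)].append(i)

    fold_sizes = [0] * n_folds
    fold_of_point = [None] * len(coords)
    for block in sorted(members, key=lambda b: (-len(members[b]), b)):
        k = fold_sizes.index(min(fold_sizes))
        for i in members[block]:
            fold_of_point[i] = k
        fold_sizes[k] += len(members[block])
    return fold_of_point


def train_test_splits(fold_of_point, n_folds):
    """Yield (train_indices, test_indices) for each held-out fold."""
    for k in range(n_folds):
        test = [i for i, f in enumerate(fold_of_point) if f == k]
        train = [i for i, f in enumerate(fold_of_point) if f != k]
        yield train, test


# Hypothetical sample locations inside a rough California bounding box.
rng = random.Random(42)
coords = [(rng.uniform(-124.0, -114.0), rng.uniform(32.0, 42.0)) for _ in range(200)]

folds = spatial_block_folds(coords, block_deg=1.0, n_folds=5)
for train_idx, test_idx in train_test_splits(folds, 5):
    print(f"train={len(train_idx):3d}  test={len(test_idx):3d}")
```

With scikit-learn available, the same effect is obtained by passing the block identifiers as the `groups` argument of `GroupKFold.split`; the out-of-fold predictions from each split are what a stacking meta-learner would then be trained on.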

