Advanced Landslide Early Warning System Based on a Semi-supervised Model in Highly Urbanized Areas across China's Greater Bay Area
Haixia Yu, Yi Jin, Kunlong He, Xuan Yu
Research Article, Research Square. This is a preprint; it has not been peer reviewed by a journal. Version 1. https://doi.org/10.21203/rs.3.rs-5684743/v1. This work is licensed under a CC BY 4.0 License.

Abstract
Landslides are a significant global geological hazard, with adverse and far-reaching consequences for human life, the economy and the natural environment every year. Accurately estimating the spatial and temporal distribution of landslide probability is crucial for reducing these losses. Nevertheless, existing landslide warning systems may fail to consider the selection of non-landslide samples and the dynamic process of landslides, potentially compromising their accuracy.
This study explores the impact of different selections of non-landslide samples and satellite rainfall datasets on the early warning model for landslides in the Guangdong-Hong Kong-Macao Greater Bay Area (GBA). Through Pearson correlation analysis, critical factors associated with landslide occurrences were identified, including elevation, slope, aspect, distance to roads and rivers, soil type, plan curvature, profile curvature, Topographic Wetness Index (TWI), and Normalized Difference Vegetation Index (NDVI). A semi-supervised random forest (SSRF) model incorporating frequency ratios (FR) was used to evaluate landslide susceptibility in the GBA. The susceptibility and rainfall threshold models were subsequently combined into a dynamic landslide hazard warning system through a matrix approach. The maximum area under the curve (AUC) value, 0.973, was obtained with a landslide to non-landslide ratio of 1:4. The very high susceptibility zone is typically located between 125 and 250 meters from roads. Moreover, the validation phase yielded successful predictions for 67 out of 96 landslide events, thereby providing effective early warning and a reference point for disaster mitigation and prevention.

Keywords: Non-landslide sample; Semi-supervised random forest; Landslide susceptibility map; Rainfall threshold; Dynamic landslide warning

1 Introduction
Landslides are a frequent and damaging natural hazard, influenced by geological, meteorological, and human factors (Casagli et al. 2023). Between 2004 and 2016, landslides caused 55,997 deaths worldwide, with Asia being the most affected region (Froude and Petley 2018). China has also suffered greatly, with landslides accounting for 14,394 deaths between 1940 and 2020. Notably, the majority of these incidents occurred during the country's rainy season, which spans from April to September.
The situation is further complicated by the fact that the unpredictability of landslides is intensified by human activities and climatic changes, which increase risk factors and the spatial-temporal variability of occurrences (Gariano and Guzzetti 2016; Johnston et al. 2021; Jones et al. 2021). Consequently, an accurate prediction of landslide susceptibility is an essential component to preemptively mitigate the potential detrimental impacts on both the constructed and natural environments due to landslides (Keefer and Larsen 2007; Lacroix et al. 2020). Recent advancements in landslide susceptibility prediction (LSP) have leveraged methodologies grounded in statistical analyses and burgeoning machine learning techniques, establishing a foundation for sophisticated hazard warning systems. In the realm of LSP, statistical models such as the information value (IV) model and frequency ratio model play a pivotal role. These models utilize probabilistic analysis, integrating environmental variables with historical landslide incidents to forecast susceptibility (Pardeshi et al. 2013; Tan et al. 2015). Parallel to this, the machine learning spectrum has expanded to incorporate models like the RF, support vector machines (SVM), artificial neural networks (ANN), and decision trees (DT), with RF particularly noted for its proficiency in managing multidimensional datasets and projecting geohazard susceptibilities at a regional level (Merghadi et al. 2020). While fine-tuning model parameters before training may lead to increased accuracy in LSP, the modeling process remains beset by uncertainties. Examples include methods for selecting non-landslide samples for input to the RF model regarding both quantity and spatial distribution (Dou et al. 2023), determining the grid location of landslide samples based on the landslide catalog (Loche et al. 2022), and the categorization of environmental factors that include the treatment of continuous variables (Huang et al. 2021). 
Furthermore, the occurrence of regional landslides is intricately linked to the hydrological elements of precipitation, particularly the amount and duration, which are critical factors for the onset and movement of slope failures. Physically-based models adeptly integrate these rainfall attributes, offering reliable predictions for the occurrence and spread of rainfall-induced landslides (Schilirò et al. 2016). However, these methods mainly show localized precision, which is specific to the environmental conditions of the study area, and often rely on detailed site-specific datasets that may not be consistently applicable to different geographic locations or larger scales (De Sy et al. 2013). This specificity presents a substantial challenge in the general applicability of the models since the diversity and uniqueness of geographical settings can substantially influence their transferability. Hence, advancing universally applicable methodologies is paramount, necessitating integrating historical rainfall datasets and exhaustive landslide inventory compilations. These methodologies promote the construction of uniform models that guarantee the scalability of landslide prediction across various terrains (Mondini et al. 2023; Segoni et al. 2018). Notably, models that delineate rainfall thresholds according to cumulative event rainfall-duration (ED) and intensity-duration (ID) criteria have proven to be effective in predicting landslide occurrences, particularly for events triggered by extensive rainfall (Kim et al. 2021). Expanding upon this base, leveraging global satellite rainfall data promises to refine the precision of rainfall threshold models, courtesy of their broad scope and regular updates (Maraun et al. 2022). Nonetheless, while rainfall thresholds assist in pinpointing when landslides might occur, they fall short in shedding light on where landslides might transpire spatially. 
To overcome this shortfall, one can use landslide susceptibility maps, which depict the probability of landslides over an extended period within a delineated area (Ahmed et al. 2023). However, the static representation in a landslide susceptibility map does not capture the dynamic influence of rainfall, which poses challenges for real-time landslide prediction. This study therefore develops a spatial-temporal model that combines static landslide susceptibility with time-varying rainfall thresholds to anticipate landslides in the Greater Bay Area. Calculating warning levels for distinct zones enables the anticipation of landslide disaster risks for specified regions within set intervals. Such an approach provides a scientific framework for informed disaster risk reduction and avoidance.

2 Study area and datasets
2.1 Study area
The GBA is a crucial urban agglomeration in coastal China, with a well-established metropolitan area, extensive port infrastructure, and a sophisticated transportation network. It covers nine cities of Guangdong Province (Guangzhou, Shenzhen, Zhuhai, Foshan, Zhongshan, Dongguan, Huizhou, Jiangmen, and Zhaoqing) as well as the special administrative regions of Hong Kong and Macau. The area is fringed by mountains on three sides and open to the sea on the fourth, presenting a rich variety of landforms (Fig. 1). Since China's reform and opening up, urbanization within the GBA has dramatically altered its spatial attributes. Examples include a six-fold increase in impervious surface area, extensive land cover changes (Ding et al. 2022), and fluctuating and often low NDVI values in the central urban regions (Geng et al. 2022). In terms of climatic conditions, the GBA is located along the coast of South China with a subtropical humid monsoon climate, where typhoons and rainstorms are frequent (Wang et al. 2022).
The average annual rainfall is between 1300 and 2500 mm, and the rainy season runs from April to September (Xin et al. 2021). Despite a decreasing trend in the overall number of rainy days, recent decades have seen a surge in both heavy precipitation events and extreme weather patterns, exacerbating landslide hazards across the GBA. Climate risks in the GBA will become more acute, so it is urgent to rationalize the use of climate resources and improve the capacity for climate disaster and risk management (Zhou et al. 2023). Geologically, the area is characterized by a stratigraphic sequence spanning from the Upper Proterozoic to the Quaternary period, with a notable presence of Upper Paleozoic carbonate formations and a veneer of Quaternary deposits. The bedrock in the region is predominantly granite, a geological framework that can pose intricate challenges to geomorphological stability because of the weathering and fracturing processes inherent in granitic terrains (Goudie 2016). In the coastal areas in particular, Upper Paleozoic carbonate rocks and a superficial layer of Quaternary deposits are widely distributed. Given these factors, the uncertainty surrounding landslides induced by meteorological disasters such as rainstorms and typhoons in the GBA is increasing (Wang et al. 2023). Therefore, accurately categorizing this susceptibility and implementing targeted disaster prevention is imperative. The study region was divided into evaluation cells measuring 12.5 m × 12.5 m, giving a total of \(8.40 \times 10^{8}\) grid cells.

2.2 Datasets
The landslide data were collated from the Guangdong Provincial Disaster Prevention and Mitigation Yearbook (GDPMY) and NASA's Global Landslide Catalog (GLC). The GDPMY contains overviews, major events, disaster descriptions, and appendices dating back to 1995.
The GLC has been developed for rainfall-induced landslides reported in the media, online databases and other sources. It covers events from 2007 to the present (Kirschbaum et al. 2010). In total, 476 landslide events with precise temporal and spatial details from 2008 to 2020 were identified as the focus of this study. Heavy rainfall was the principal trigger for those landslides, followed by anthropogenic construction activities. The landslide events dataset was partitioned into three separate subsets at a ratio of 6:2:2. Specifically, 60% (285 landslide events) of the samples were allocated to training the SSRF. A further 20% (95 landslide events) were reserved for testing the SSRF model, providing an initial performance assessment. The remaining 20% (96 landslide events) served as the validation set used to evaluate the warning model after integration with the proposed rainfall threshold mechanism. This partitioning covers distinct phases from calibration to validation and thereby supports the predictive reliability of the landslide early warning system. Precipitation data were sourced from the Multi-Source Weighted-Ensemble Precipitation (MSWEP) dataset with a pixel resolution of 0.1° (https://www.gloh2o.org/mswep/). The 12.5 m DEM of the study area was collected by the Advanced Land Observation Satellite (ALOS). From it, the elevation, slope, aspect, curvature, profile curvature, plan curvature, flow direction, relief amplitude, and TWI were extracted. Among them, curvature, profile curvature and plan curvature are calculated with the Spatial Analyst Tools - Surface - Curvature option. The D8 method is used to create a grid of flow directions from each pixel to its downhill adjacent points.
Relief amplitude is the vertical difference between the highest and lowest elevations in a specific area. It is a macroscopic index used to describe the terrain characteristics of a district. The maximum and minimum values were obtained using the Spatial Analyst Tools - Neighborhood - Focal Statistics option in ArcGIS Pro. TWI is a physical index of the influence of regional topography on runoff flow direction and accumulation, which is calculated from the slope and flow direction of the grid. OpenStreetMap (OSM) at http://www.openstreetmap.org/ provides road and water system data. The NDVI data are derived from the China regional 250 m normalized difference vegetation index dataset (2000–2022), while soil type distribution was gathered from the soil map of China at a 1:1 million scale (https://www.resdc.cn/data.aspx?DATAID=145). All of the above data were resampled to 12.5 m in ArcGIS Pro in preparation for analysis.

3 Methods
3.1 Frequency ratio model
FR is a statistical method that relates landslide occurrence to the intervals into which each factor is classified. It has long been used in geological engineering and geohazard research (Lee and Pradhan 2007; Youssef et al. 2023). The FR was calculated according to the following equation:
$$FR=\frac{N/N_{0}}{S/S_{0}} \qquad (1)$$
where \(N\) is the area of geohazards within the categorized interval of a factor, \(N_{0}\) is the total area of geohazards in the study area, \(S\) is the categorized area of the factor, and \(S_{0}\) is the total area. An FR value exceeding 1 suggests an increased likelihood of geohazard occurrences attributed to the factor, whereas an FR below 1 implies a reduced susceptibility.
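To make Eq. (1) concrete, the following is a minimal sketch in Python rather than the authors' workflow. It assumes equal-size grid cells, so that cell counts can stand in for the areas \(N\), \(N_{0}\), \(S\) and \(S_{0}\). The function name `frequency_ratio` and the toy class labels are illustrative assumptions.

```python
from collections import Counter


def frequency_ratio(cell_classes, is_landslide):
    """Return {class: FR}, where FR = (N / N0) / (S / S0).

    cell_classes : class label of each grid cell for one factor
    is_landslide : True/False per cell, in the same order
    Assumption: cells are equal in size, so counts replace areas.
    """
    if len(cell_classes) != len(is_landslide):
        raise ValueError("inputs must have the same length")
    s = Counter(cell_classes)  # S: cells in each class
    n = Counter(c for c, hit in zip(cell_classes, is_landslide) if hit)  # N
    s0 = len(cell_classes)     # S0: all cells
    n0 = sum(n.values())       # N0: all landslide cells
    if n0 == 0:
        raise ValueError("no landslide cells supplied")
    return {c: (n[c] / n0) / (s[c] / s0) for c in s}


# Toy data (hypothetical): 10 cells, 4 of which are landslide cells.
classes = ["gentle"] * 6 + ["moderate"] * 4
hits = [True] + [False] * 5 + [True] * 3 + [False]
fr = frequency_ratio(classes, hits)
```

In this toy case the "moderate" class holds 3 of 4 landslide cells but only 4 of 10 cells overall, so its FR is above 1, while the "gentle" class falls below 1.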
This study treats soil type, aspect, and flow direction as discrete factors, each with specific meanings and influences on landslides (Canavesi et al. 2020). Roads and water systems are buffered separately because their distribution and impact on landslides differ. The buffer distances were defined through iterative procedures, which set 125 meters for roads and 250 meters for rivers as the distances that best capture their influence on landslide incidence (Fang et al. 2023). The assignment of continuous factors to specific intervals is pivotal to the model's predictive capability. A small attribute interval number (AIN) divides factors too coarsely, which reduces the prediction accuracy of the model, whereas a large AIN increases the complexity and computational cost of modeling (Xing et al. 2023). When the AIN exceeded 8, the incremental gain in prediction precision plateaued or even decreased. Hence, an AIN of 8 was identified as an optimal trade-off between model simplicity and comprehensive factor representation.

3.2 Semi-supervised random forest model
Building on the random forest (RF) model, a supervised learning algorithm known for its robustness in complex classification tasks, this study integrates semi-supervised techniques to address the known inaccuracies caused by the arbitrary selection of non-landslide samples (Huang et al. 2024). SSRF combines the multivariate processing capability of RF with semi-supervised learning mechanisms, minimizing the biases introduced by arbitrary non-landslide sample selection (Jiang et al. 2024). The SSRF modeling approach is as follows. First, non-landslide samples are randomly selected from the grids in the region. Then, the RF model in the tidymodels package in R is used for LSP.
The predicted landslide susceptibility index (LSI) values, which range from 0 to 1, are then imported into ArcGIS Pro and classified into five levels (very low, low, medium, high, and very high) using the natural breaks method. Non-landslide samples with higher reliability are then randomly selected from the very low and low susceptibility zones.

3.3 Rainfall threshold warning model
Current landslide rainfall threshold research is mainly based on empirical methods, which derive thresholds from the relationship between historical landslides and precipitation without directly considering physical processes. This approach indirectly predicts landslide occurrences through rainfall as the primary inducing factor (Brunetti et al. 2010). Owing to the convenience of data acquisition, strong generalizability, and good applicability, it has been widely used worldwide (Brunetti et al. 2021; Peruccacci et al. 2017; Piciullo et al. 2017). To determine the rainfall thresholds that trigger landslides in the GBA, this study adopted a comprehensive classification standard. This criterion combines rainfall data for the GBA from 2008 to 2020, including daily rainfall, cumulative three-day rainfall, and cumulative seven-day rainfall (Fan et al. 2015). For each grid, the ratio of rainfall within these time frames to the historical maximum rainfall is calculated. Using 80% of the landslide data (380 landslide events), percentiles for these ratios were established to evaluate the precipitation situation at each grid. As illustrated in Fig. 2, the first step compares the observed daily rainfall at each grid with the historical maximum for that day. If this ratio exceeds the 95th percentile among training samples, the grid is immediately classified as a zone of very high risk (R1). Additionally, if the cumulative three-day or seven-day rainfall ratio surpasses 95% or 90% of the training samples, respectively, the grid is also classified as R1.
Grids where the cumulative three-day rainfall ratio lies between 75% and 95%, or the cumulative seven-day rainfall ratio lies between 80% and 90%, are classified as high-risk zones (R2). Risk levels for other ratio intervals are delineated likewise. A landslide warning is issued when the rainfall level reaches R3 or a more severe level (R1 to R3). Note that if the rainfall ratios do not reach these thresholds, for example when the daily rainfall ratio is below 95% or the cumulative three-day rainfall ratio is below 40%, the exact rainfall level is difficult to determine. In these cases, the grid remains in an indeterminate zone, which is depicted in grey in Fig. 2 and represents an unspecified risk area. The reason is that daily or three-day rainfall values that are not significantly high do not, on their own, comprehensively predict landslides in the GBA based on its historical patterns. Ultimately, the rainfall level for each grid is the highest level indicated among the daily, cumulative three-day, and cumulative seven-day rainfall ratios (i.e., the red dashed line in Fig. 2). For example, the rainfall level on the first day is determined by that day's rainfall ratio and is categorized as R1. Although the rainfall on the second day decreases to a normal level, that day is categorized as R2 on the basis of the cumulative three-day rainfall, because of the heavy rainfall on the first two days. This methodology combines real-time rainfall data with historical accumulations, assessing how anomalous the rainfall is relative to the historical record and how this affects the potential probability of landslide initiation.
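The classification rules stated above can be sketched as follows. This is a hedged illustration, not the authors' implementation. It encodes only the R1 and R2 bands given in the text and returns `None` for everything else, because the R3 and R4 bands are defined in Fig. 2 and are not reproduced here. The function names, the inclusive treatment of band edges, and the use of percentile ranks (0 to 100) as inputs are assumptions.

```python
from bisect import bisect_right


def percentile_rank(value, training_values):
    """Percentage of training ratios that are <= value (0 to 100)."""
    ordered = sorted(training_values)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def rainfall_level(daily_p, three_day_p, seven_day_p):
    """Return 'R1', 'R2', or None (band not covered by this sketch).

    Inputs are percentile ranks of the rainfall ratios relative to
    the training samples. Only the bands quoted in the text are coded:
      daily   > 95            -> R1
      3-day   > 95 -> R1,  75 to 95 -> R2
      7-day   > 90 -> R1,  80 to 90 -> R2
    Band edges are treated as inclusive on the R2 side (assumption).
    """
    found = []
    if daily_p > 95:
        found.append(1)
    if three_day_p > 95:
        found.append(1)
    elif three_day_p >= 75:
        found.append(2)
    if seven_day_p > 90:
        found.append(1)
    elif seven_day_p >= 80:
        found.append(2)
    # The most severe level (smallest number) among the three wins.
    return f"R{min(found)}" if found else None


# Day 1: extreme daily rainfall. Day 2: normal daily rainfall, but the
# three-day accumulation is still high (values are hypothetical).
day1 = rainfall_level(daily_p=97, three_day_p=50, seven_day_p=50)
day2 = rainfall_level(daily_p=30, three_day_p=85, seven_day_p=50)
```

The two toy days mirror the example in the text: the first is driven by the daily ratio, the second by the three-day accumulation.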
3.4 Dynamic landslide hazard warning model The warning model based on a matrix method integrates the characteristics of landslide susceptibility mapping (LSM) and rainfall thresholds to realize the spatial-temporal early warning of landslide (Huang et al. 2022; Segoni et al. 2015). LSM enables the prediction of landslides under the influence of environmental ontogenetic factors and facilitates preventive interventions in very high and high susceptibility areas. Concurrently, rainfall thresholds gauge instantaneous probabilities of cumulative precipitation inducing landslides. Figure 3 shows the specific warning levels established through this model. In monitoring areas classified within the Very High Susceptibility Zone, the issuance of the highest emergency level warning is mandated upon R1 classification, underscoring a pronounced likelihood of landslide occurrence. The immediacy of this threat necessitates prompt response protocols, including on-site verification, evacuation, and secure sheltering, to mitigate potential loss of life and property. Similarly, the rest of the landslide hazard classification criteria need to be based on the corresponding susceptibility class level areas that are discriminated by the rainfall thresholds. We consider that a landslide warning is issued when its warning level reaches above watch, and therefore serves as a criterion for whether to predict a landslide. 4 Result 4.1 Elimination of redundant impact factors The genesis of landslides is multifactorial, and inter-correlations among predictive indicators could propagate errors, compromising model precision. The study employed Pearson correlation analysis to evaluate the linear relationships among various predisposing factors (Yu et al. 2021). It is posited that Pearson coefficients with absolute values below 0.5 signify negligible inter-dependence among the evaluation metrics. Conversely, coefficients exceeding 0.5 indicate a moderate correlation between the evaluation factors. 
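The screening rule just described (flag factor pairs whose absolute Pearson coefficient exceeds 0.5) can be sketched in a few lines. This is an illustrative sketch only: the factor values below are made-up toy numbers, and the function names are assumptions rather than part of the study's toolchain.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(factors, threshold=0.5):
    """List (name_a, name_b, r) for every pair with |r| > threshold."""
    names = list(factors)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(factors[a], factors[b])
            if abs(r) > threshold:
                pairs.append((a, b, r))
    return pairs


# Hypothetical values sampled at five cells.
factors = {
    "slope": [1, 2, 3, 4, 5],
    "relief_amplitude": [2, 4, 6, 8, 10.5],
    "ndvi": [0.5, 0.1, 0.4, 0.2, 0.45],
}
flagged = correlated_pairs(factors)
```

A flagged pair is only a candidate for removal. Which member of the pair to drop remains a modeling decision, for example based on factor importance as done in this study.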
The results of the analysis are shown in Fig. 4 , which shows that there is a significant correlation between relief amplitude, slope, and the TWI, with correlation coefficients surpassing the 0.5 threshold. The SSRF model's output underscored a comparatively minor contribution of relief amplitude to landslide susceptibility, an indicator of more substantial impact from other assessed factors. Analysis revealed substantial coefficients for both curvature against profile curvature, and flow direction against slope direction, prompting their exclusion to avoid collinearity. Consequently, the study identified elevation, slope, aspect, plan curvature, profile curvature, TWI, NDVI, soil type, distance to road, distance to river as the fundamental variables influencing landslide susceptibility within the GBA. 4.2 Combination of factors indicating high probability of landslide occurrence Utilizing the attribute interval number value of 8, as delineated in Section 3.1, this study classified continuous factors impacting landslides and summarized the results in Table 1 . Elevation between 38 to 311 meters, covering 42.2% of the GBA, consistently showed FR values above 1, indicating heightened landslide susceptibility. Specifically, the elevation band between 38 and 113 meters, constituting a low-altitude hilly topography in the GBA, displays the apex of FR values. Although not reaching mountainous classifications, this segment is characterized by its moderate heights, likely presenting undulating hillocks and gentle slopes. Diverging from the prototypical steeply inclined landscapes that typify landslide-vulnerable areas within China, the GBA's inclination rarely transcends 5 degrees, offering a unique pattern of susceptibility. A significant concentration of landslides, approximately 69.6% of documented occurrences, are situated within gentle to moderate slopes ranging from 7.64 to 27.94 degrees. 
In terms of road transport, the GBA has a sophisticated road network reflective of its urbanization. The areas within 125 meters of roadways cover over 30% of the region and are associated with an FR exceeding 1.4. Remarkably, FR values peak between 125 and 250 meters from roads, hinting at a transitional corridor from dense urban to growing suburban zones. The higher FR values there potentially stem from the abundance of unmodified slopes and geological protrusions, which are prone to heightened landslide susceptibility (Kwong et al. 2004). Beyond 375 meters from roads, all FR values fall below 1, suggesting a significant suppression of landslide occurrences in this range. This is presumably aided by denser vegetation, milder topographic gradients, and differing soil compositions. Meanwhile, the FR values in relation to distance from rivers show a non-linear trend. FR rises for locations within 0 to 500 meters of a river and peaks at 1.222 within the 500 to 750 meters range. This relationship reflects an interplay of erosive force and hydrological variation, which gives the areas near water bodies greater susceptibility. Beyond this distance, river influence recedes and landslide propensity diminishes, with topographic and vegetative factors dominating susceptibility. In summary, regions within 250 to 750 meters of a river, and up to 250 meters from road infrastructure, exhibit FR values above 1.2, which marks an increased likelihood of landslide occurrence within these ranges.

Table 1 Frequency analysis of all conditioning factors.
Each entry is given as interval (FR). An ellipsis marks rows that are missing from this copy of the table.
Elevation (m): < 38 (0.622); 38–113 (2.047); 113–205 (1.374); 205–311 (1.041); 311–436 (0.408); 436–595 (0.108); 595–822 (0.112); > 822 (0)
Slope (°): 0 ~ 3.62 (0.551); 3.62 ~ 7.64 (0.762); 7.64 ~ 12.47 (1.448); 12.47 ~ 17.44 (1.441); 17.44 ~ 22.49 (1.334); 22.49 ~ 27.94 (1.283); 27.94 ~ 34.97 (0.817); 34.97 ~ 78.46 (0.588)
Profile curvature: < -1.69 (1.869); -1.69 ~ -0.92 (0.736); -0.92 ~ -0.31 (1.014); -0.31 ~ 0.30 (0.967); 0.30 ~ 0.89 (0.949); 0.89 ~ 1.54 (1.359); 1.54 ~ 2.83 (1.329); > 2.83 (0)
Aspect: Flat (0); North (0.523); Northeast (1.175); East (1.549); Southeast (1.249); South (1.091); Southwest (1.016); West (0.7); Northwest (0.787)
Plan curvature: < -1.65 (0.847); -1.65 ~ -0.89 (1.059); ...; > 2.76 (0)
Distance to road (m): ...; 750 ~ 875 (0.76); ...; > 1000 (0.39)
TWI: ...; 7.13–9.44 (0.764); ...; > 18.26 (0.384)
Distance to river (m): ...; 1250 ~ 1500 (1.032); 1500 ~ 1750 (0.709); 1750 ~ 2000 (0.56); > 2000 (0.611)
NDVI: -0.2 ~ -0.092 (0.126); -0.092 ~ 0.119 (0.202); 0.119 ~ 0.297 (0.962); 0.297 ~ 0.452 (1.133); 0.452 ~ 0.607 (1.06); 0.607 ~ 0.733 (1.06); 0.733 ~ 0.827 (1.232); 0.827 ~ 0.995 (0.797)
Soil type: Aquic Soil (0.423); Solonchak (0); Ferralsols (0.834); Anthrosols (1.138); Others (0.513)

4.3 Landslide susceptibility map generation
To investigate how the representation of non-landslide samples affects the SSRF's landslide susceptibility mapping in the GBA, different proportions of non-landslide samples were randomly selected. Landslide samples, labeled '1', were paired with non-landslide samples, labeled '0', at ratios from 1:1 up to 1:5. These combinations, together with the previously calculated FR values, served as the inputs to the SSRF. The analyses indicate that increasing the proportion of non-landslide samples significantly reduces the number of grids categorized as highly susceptible, while increasing the number classified as less susceptible.
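The re-selection step of Section 3.2, with the landslide to non-landslide ratio as a parameter, might be sketched as below. This is a simplified illustration under stated assumptions. The first-pass LSI values are taken as given (in the study they come from an RF model). A single caller-supplied cut-off `low_zone_max` stands in for the upper bound of the "low" natural-breaks class. The function name and toy values are hypothetical.

```python
import random


def select_non_landslides(lsi_by_cell, landslide_cells, ratio,
                          low_zone_max, seed=0):
    """Draw ratio * len(landslide_cells) non-landslide cells at random
    from cells whose first-pass LSI is at or below low_zone_max.

    lsi_by_cell     : {cell_id: LSI in [0, 1]} from a first-pass model
    landslide_cells : set of cell ids with recorded landslides
    low_zone_max    : assumed upper LSI bound of the very low/low zones
    """
    candidates = sorted(
        cell for cell, lsi in lsi_by_cell.items()
        if lsi <= low_zone_max and cell not in landslide_cells
    )
    wanted = ratio * len(landslide_cells)
    if wanted > len(candidates):
        raise ValueError("not enough low-susceptibility cells to sample")
    # A seeded generator keeps the draw reproducible.
    return random.Random(seed).sample(candidates, wanted)


# Toy first-pass result (hypothetical): 100 cells with LSI 0.00 .. 0.99.
lsi = {cell: cell / 100 for cell in range(100)}
landslides = {90, 95}
picks = select_non_landslides(lsi, landslides, ratio=4, low_zone_max=0.3)
```

With two landslide cells and a 1:4 ratio, eight non-landslide cells are drawn, all from the low-LSI part of the grid. These would then be paired with the landslide samples to retrain the classifier.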
With the five pairing ratios (1:1 to 1:5), the proportions of historical landslides falling in very high and high susceptibility areas were 83.37%, 94.74%, 91.79%, 91.79%, and 89.05%, respectively. At the ratios of 1:2, 1:3, and 1:4, more than 90% of historical landslides were contained in grids predicted as very high or high susceptibility, and the AUC values were 0.924, 0.950, and 0.973. The ROC curves of the five models are shown in Fig. 5 (a). For refined statistical delineation, the LSI was segmented into 64 intervals. For the SSRF configurations of 1:2, 1:3, and 1:4, the mean LSI values were 0.375, 0.319, and 0.240, with standard deviations of 0.315, 0.272, and 0.239, respectively. A lower mean LSI coupled with a higher standard deviation signifies reduced predictive uncertainty. Ultimately, the 1:4 landslide to non-landslide sample ratio was found to give the best predictive performance for the SSRF model. The performance of the SSRF and traditional RF models in producing LSM was also compared. The optimal ratio of landslide to non-landslide samples for the RF model was determined across the study area, and its LSI was likewise divided into 64 intervals. As shown in Fig. 5 (b), the mean LSI of the SSRF model was 0.241, which is higher than the RF's mean value of 0.180, and the SSRF had a markedly larger standard deviation of 0.239 compared with 0.149 for RF. The AUC value of the SSRF model under the optimal ratio is 0.973, which is 0.194 higher than the AUC value of 0.779 for RF under its optimal parameters. Therefore, we believe that SSRF, with its larger LSI standard deviation and higher AUC value, can more accurately predict landslides and generate LSM. The importance of the ten evaluated factors under the optimal SSRF ratio is shown in Fig. 5 (c).
Distance to roads and elevation emerged as the most important factors in landslide susceptibility modeling, with mean decrease Gini values of 90.8 and 74.8, respectively. In particular, road distance was identified as a salient variable in the SSRF and generalized additive models. Shorter road distances correlate with increased settlement density, which amplifies landslide risks to life and property. Consequently, preemptive measures, including the strategic design and systematic layout of roads, supported by vigilant traffic management, are needed to reduce landslide risks. Timely road maintenance practices such as routine inspections and immediate repair of fissures, depressions, and drainage obstructions are instrumental in curtailing landslide occurrence. Additionally, placing barriers at susceptible slopes and displaying warning signage along critical sections can substantially reduce landslide-induced traffic accidents, protecting both motorists and pedestrians. Subsequently, the LSM can be obtained by inputting the optimal parameters derived above and the DEM of the GBA (Fig. 6). The region was divided into five susceptibility zones. These zones range from very low, constituting 42.10% of the area, to very high susceptibility, which occupies only 8.98% of the area. Notably, the very high susceptibility zone, although a minor fraction of the total area, contains 74.94% of the landslide occurrences, which attests to the model's precision in identifying zones of critical concern. In line with the data in Table 1, the spatial distribution of these zones shows that very high susceptibility is associated with low elevation, moderate slopes, immediate vicinity to roads or rivers, eastward-facing topography, and zones of active construction.
The observed spatial patterns reflect an interplay of geomorphological and anthropogenic influences. Lower elevations combined with moderate inclines favor the formation of layered soil structures, and proximity to infrastructure heightens vulnerability to erosion, which destabilizes slopes. These human-induced modifications can also reduce soil cohesion. Conversely, higher-altitude areas in the GBA tend to exhibit lower susceptibility because they are less exposed to fluvial erosion, accumulate less water, and are relatively isolated from human-induced alterations.

4.4 Comparison of rainfall threshold warning and dynamic landslide hazard warning

The training dataset comprised 80% of the landslide events (380 events). Among these, 135 events were classified under category R3, while a significant portion exceeded the critical levels defined for categories R1 and R2. The success rate of landslide prediction under the rainfall threshold model was 67.2%. Under the dynamic landslide hazard warning, 120 events triggered emergency-level warnings and 131 triggered alert-level warnings. Compared with the rainfall threshold warning, the prediction success rate increased by 22.1 percentage points, reaching 89.2%. Detailed verification revealed that several events that were not successfully predicted were influenced by non-rainfall factors such as anthropogenic slope alterations and infrastructural deficiencies. These deficiencies include unfortified road slopes, the absence of drainage facilities in retaining structures, and a general lack of preventive maintenance protocols. Further validation using the remaining 20% of the landslide data under the rainfall threshold model resulted in the successful prediction of 63 landslides, a success rate of 65.6%.
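As a rough illustration of how a matrix approach can combine a static susceptibility class with a daily rainfall threshold level into a single warning level, consider the Python sketch below. The study's own classification of warning zones is given in Fig. 3 and is not reproduced here: the cell assignments, the "moderate" class name, the "none" level, and the function name are hypothetical placeholders chosen only to show the lookup mechanics. The only ordering taken from the text is that R1 is the most severe rainfall level.

```python
# Hypothetical warning matrix: rows are susceptibility classes, columns are
# rainfall threshold levels (R1 = most severe). The cell values are
# placeholders for illustration, NOT the matrix used in the study.
SUSCEPTIBILITY = ["very low", "low", "moderate", "high", "very high"]
RAINFALL = ["R4", "R3", "R2", "R1"]

WARNING_MATRIX = [
    # R4      R3           R2           R1
    ["none", "none",      "none",      "attention"],  # very low
    ["none", "none",      "attention", "attention"],  # low
    ["none", "attention", "attention", "alert"],      # moderate
    ["none", "attention", "alert",     "emergency"],  # high
    ["none", "alert",     "emergency", "emergency"],  # very high
]


def warning_level(susceptibility: str, rainfall: str) -> str:
    """Look up the warning level for one grid cell on one day."""
    row = SUSCEPTIBILITY.index(susceptibility)
    col = RAINFALL.index(rainfall)
    return WARNING_MATRIX[row][col]


# The same R2 rainfall yields different warnings depending on susceptibility.
for zone in ("low", "high", "very high"):
    print(zone, "+ R2 ->", warning_level(zone, "R2"))
```

Because rainfall is only one of the two inputs, a cell in a very high susceptibility zone can reach a higher warning level than a low-susceptibility cell under identical rainfall, which is the kind of distinction a rainfall threshold alone cannot make.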
Although this success rate was similar to that of the training set, the validation sample contained only 4 and 14 events at R1 and R2, respectively, which makes it difficult to distinguish landslide occurrence conditions under heavy rainfall and severe storms. In contrast, although the dynamic landslide hazard warning model identified only 67 of the 96 landslides, 14 of them were at the emergency level and 34 at the warning level, which more accurately reflects landslide occurrence under heavy rainfall.

The performance of the two warning systems was compared using a rainfall event that occurred on 31 August 2018. Although the average rainfall recorded at each landslide location did not exceed 30 mm on that day, cumulative precipitation over three and seven days was substantial, and a total of 37 landslides were triggered on that day. Figure 7 (a) shows the day's rainfall warnings, which fall into three levels: R2, R3, and R4. Of the 37 landslides, 29 fell within the R2 region, predominantly around Huizhou and Zhaoqing, and 8 within the more scattered R3 region. Figure 7 (b) shows that the landslide hazard alerts determined severity more precisely than the rainfall threshold warnings: of the 37 landslide events, the dynamic warning system placed 24 at the emergency level, 11 at the alert level, and only 2 at the attention level. Both models were nevertheless able to predict the location of the landslides. According to the rainfall threshold warning model, there was no R1 rainfall anywhere in the GBA, so the highest warning level that could be issued was R2. However, landslides occurred densely across the GBA on 31 August 2018, which is inconsistent with a warning level no higher than R2.
In contrast, the dynamic landslide warning model classified 24 of the 37 landslides at the highest warning level, i.e., the emergency level, and thus provided warnings that align with actual conditions.

4.5 Performance of dynamic landslide hazard warning system

The continuous landslide events in the GBA from August 15 to August 20, 2013 were chosen to verify the performance of the landslide hazard warning system. A total of 35 landslides were recorded within this six-day period, the majority in the municipality of Huizhou in the eastern sector of the GBA, as shown in Fig. 8. Among these events, the dynamic landslide warning model correctly predicted 28 landslide events, with an accuracy of 74.3%. Emergency warnings were issued for seven of the 28 correctly predicted events. Notably, the warning level varied smoothly in space: the closer a location is to the identified landslide points, the higher its warning level, so nearby locations receive the most acute alerts, commensurate with their increased risk of being affected. However, on August 16, 2013, three landslides occurred in the eastern part of the GBA in an area assigned the alert level, while no landslides occurred in the emergency-level area further east. Two explanations may account for this anomaly. First, the landslide catalog may lack hourly updates, so a nighttime rainstorm on August 16, 2013 may have triggered the landslides without being captured by the daily rainfall measurements. This temporal mismatch between rainfall and catalog updates could affect warning accuracy, especially for sudden events. Second, the study used 0.1° data from MSWEP, which has finer spatial resolution than conventional 0.5° satellite data.
However, in some areas MSWEP achieves its finer resolution through spatial interpolation, which can introduce unavoidable inaccuracies. These deviations may have affected the system's forecasts and could explain the difference between the actual and expected landslide occurrences.

5 Discussion

This study used a landslide catalog and precipitation datasets to build the landslide hazard warning framework for the GBA. Events in the catalog are classified by primary components, such as nominal location information, time, and trigger, and by secondary elements, including the type and relative size of the event, latitude and longitude, and impact information. However, incorporating landslides of different sizes into a single database requires a calibrated approach to accommodate the diversity of precipitation profiles, which complicates modeling. The relationship between landslide size and rainfall intensity is complex and significantly affects the accuracy of early warning systems (Kirschbaum et al. 2010). When landslides coincide with other natural disasters, such as floods and tropical storms, it becomes difficult to isolate landslide-specific impact metrics. To improve data accuracy, landslides must be clearly distinguished from other disasters, given their distinct casualties and geographical extent. In multi-hazard scenarios, landslide incidents should not be conflated with other hazards, as this can lead to underrepresentation in disaster reporting and compromise the robustness of data used for early warning model calibration (Guzzetti 2000).
Regional landslide catalogs typically contain precise spatial coordinates and timing data. Where necessary, field mapping and remote sensing interpretation can be used to corroborate and update detailed information on the shape and spatial location of landslides (Lacroix et al. 2019; Rapstine et al. 2020). Landform databases may not provide sufficient detail about individual landslides and instead rely on broader classification scales. This can compromise the accuracy of susceptibility models, because the grid dimensions commonly used in LSM may not accurately represent the affected areas. To mitigate such spatial bias, this study employs 12.5 m resolution DEM data instead of 30 m or 90 m data. This choice is suitable because our database consists mainly of minor to intermediate landslides; coarser resolutions may not capture the topographical, hydrological, and surface features that are crucial for precise susceptibility forecasting.

Random selection of non-landslide samples reduces the need for human intervention and has been widely adopted for its satisfactory overall predictive accuracy and simplicity of operation (Chen and Zhang 2021). However, randomly selected samples may include highly susceptible locations in areas without a history of landslides. In this case, SSRF may be a better approach: it draws non-landslide samples from areas of low and very low susceptibility identified by the initial susceptibility prediction, which reduces the error associated with the quality of non-landslide samples during model training and testing.
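The reselection step can be sketched in a few lines of Python. The sketch assumes that an initial susceptibility index (LSI) is already available for every grid cell, for example from an RF model trained with randomly chosen non-landslide samples. The cell identifiers, the 0.4 cut-off used to mark low and very low susceptibility, and the function name are illustrative assumptions rather than values from this study, and the RF training itself is not shown.

```python
import random


def reselect_non_landslide(lsi_by_cell, landslide_cells, ratio=4,
                           low_cutoff=0.4, seed=0):
    """Draw non-landslide samples only from low-susceptibility cells.

    lsi_by_cell     -- dict mapping cell id -> initial LSI in [0, 1]
    landslide_cells -- set of cell ids that contain a recorded landslide
    ratio           -- non-landslide samples per landslide sample (1:ratio)
    low_cutoff      -- assumed upper LSI bound of the low/very low classes
    """
    candidates = sorted(
        cell for cell, lsi in lsi_by_cell.items()
        if lsi < low_cutoff and cell not in landslide_cells
    )
    wanted = ratio * len(landslide_cells)
    if wanted > len(candidates):
        raise ValueError("not enough low-susceptibility cells to sample from")
    return random.Random(seed).sample(candidates, wanted)


# Toy grid of 100 cells with made-up LSI values (not study data).
rng = random.Random(42)
lsi = {cell: rng.random() for cell in range(100)}
landslides = {3, 17, 58}
negatives = reselect_non_landslide(lsi, landslides, ratio=4)
print(len(negatives), "non-landslide cells selected")
```

The selected cells, together with the landslide cells, would then form the training and test sets for the second-stage model, so that no non-landslide sample comes from a location the first-stage map already flags as susceptible.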
Since landslide areas are in the minority and non-landslide areas are in the majority, increasing the proportion of non-landslide samples brings the model closer to the real quantitative relationship between landslides and non-landslides in the study area (Chang et al. 2023). Ultimately, we determined that a landslide to non-landslide ratio of 1:4 was the most suitable for modeling landslide susceptibility in the GBA.

Although rain gauge observations are the most direct and precise form of precipitation measurement, numerous studies have highlighted the inadequacy of even dense rain gauge networks in capturing the intensity and spatial distribution of heavy rainfall events, so other methods of precipitation measurement should also be considered (Scofield and Kuligowski 2003). To overcome this limitation, the current study uses the MSWEP product, which is known for its high resolution, stability, and accuracy of precipitation estimation, particularly at the urban scale, to develop an early warning model (Li et al. 2022; Liu 2023; Mo et al. 2024). The threshold model described in Section 3.3 has also proved effective in providing dependable alerts. Different satellite products may, however, yield different landslide rainfall thresholds. Future research should therefore assess the suitability of other satellite-based precipitation datasets, such as CMORPH and GPM, in the GBA. Further testing will be undertaken to establish multi-source rainfall thresholds and to explore the feasibility of threshold interoperability, in order to improve the reliability and utility of landslide hazard assessments across different event scenarios (Stanley et al. 2017).

While the topography of the Bay Area may initially appear stable, its subsurface composition is highly heterogeneous, with marked spatial variation.
Urban expansion has led to a marked increase in impervious surfaces, exacerbating runoff and erosion (Mautner et al. 2020). Furthermore, urban vegetation cover changes rapidly under fast infrastructural development and requires continuous monitoring. Given the geophysical and socio-economic characteristics of the Bay Area, landslide warning methods must accommodate the frequent updates required by the region's evolving road network in order to maintain the predictive accuracy of the hazard assessment model, a key factor in effective disaster prevention (Lizheng et al. 2023). In urban areas, engineered interventions such as drainage systems and retaining structures have been shown to improve slope stability, making urban slopes more resilient to landslides than those in rural environments facing similar environmental constraints. Hazard assessments should therefore account for the considerable influence of engineering measures to ensure precise risk prediction and effective mitigation strategies.

6 Conclusions

After eliminating redundant and less significant factors, the remaining 10 factors were used to assess landslide susceptibility. Unlike the conventional approach, which randomly selects non-landslide samples throughout the entire region for the RF model, the proposed method first uses a random selection of non-landslide samples across the region to create an initial LSM. Non-landslide samples are then reselected from areas classified as low or very low in susceptibility and serve as inputs for the SSRF model. A 1:4 ratio of landslide to non-landslide samples was found to be effective for the highly urbanized GBA. At this ratio, the SSRF model achieved an AUC of 0.973, a substantial improvement over the AUC of 0.779 attained by the RF model with randomly selected non-landslide samples across the region.
The SSRF model successfully identified 74.94% of known landslides within merely 8.98% of the GBA. Spatially, the primary factors contributing to landslides in the GBA are distance to roads, elevation, and slope. Furthermore, satellite rainfall products make it possible to calculate each grid cell's historical maximum daily rainfall, as well as rainfall accumulated over three-day and seven-day periods. These measurements are crucial for establishing rainfall thresholds and for capturing the temporal dimension needed for disaster warning. The dynamic landslide warning model integrates rainfall thresholds with landslide susceptibility analysis to produce detailed warning cells for regions at extreme risk within highly urbanized areas.

Declarations

Conflict of interest The authors have no relevant financial or non-financial interests to disclose.

Acknowledgments The research is financially supported by the National Key R&D Program of China (2021YFC3001000).

References

Ahmed M, Tanyas H, Huser R et al. (2023) Dynamic rainfall-induced landslide susceptibility: a step towards a unified forecasting system. Int J Appl Earth Obs 125:103593
Brunetti MT, Melillo M, Gariano SL et al. (2021) Satellite rainfall products outperform ground observations for landslide prediction in India. Hydrol Earth Syst Sc 25:3267–3279
Brunetti MT, Peruccacci S, Rossi M et al. (2010) Rainfall thresholds for the possible occurrence of landslides in Italy. Nat Hazard Earth Sys 10:447–458
Canavesi V, Segoni S, Rosi A et al. (2020) Different approaches to use morphometric attributes in landslide susceptibility mapping based on meso-scale spatial units: a case study in Rio de Janeiro (Brazil). Remote Sens-Basel 12:1826
Casagli N, Intrieri E, Tofani V et al. (2023) Landslide detection, monitoring and prediction with remote-sensing techniques. Nat Rev Earth Env 4:51–64
Chang Z, Huang J, Huang F et al. (2023) Uncertainty analysis of non-landslide sample selection in landslide susceptibility prediction using slope unit-based machine learning models. Gondwana Res 117:307–320
Chen W, Zhang S (2021) GIS-based comparative study of Bayes network, Hoeffding tree and logistic model tree for landslide susceptibility modeling. Catena 203:105344
De Sy V, Schoorl JM, Keesstra SD et al. (2013) Landslide model performance in a high resolution small-scale landscape. Geomorphology 190:73–81
Ding Q, Shao Z, Huang X et al. (2022) Time-series land cover mapping and urban expansion analysis using OpenStreetMap data and remote sensing big data: a case study of Guangdong-Hong Kong-Macao Greater Bay Area, China. Int J Appl Earth Obs 113:103001
Dou H, He J, Huang S et al. (2023) Influences of non-landslide sample selection strategies on landslide susceptibility mapping by machine learning. Geomatics, Natural Hazards and Risk 14:2285719
Fan L, Lehmann P, Or D (2015) Effects of hydromechanical loading history and antecedent soil mechanical damage on shallow landslide triggering. Journal of Geophysical Research: Earth Surface 120:1990–2015
Fang L, Wang Q, Yue J, Xing Y (2023) Analysis of optimal buffer distance for linear hazard factors in landslide susceptibility prediction. Sustainability-Basel 15:10180
Froude MJ, Petley DN (2018) Global fatal landslide occurrence from 2004 to 2016. Nat Hazard Earth Sys 18:2161–2181
Gariano SL, Guzzetti F (2016) Landslides in a changing climate. Earth-Sci Rev 162:227–252
Geng S, Zhang H, Xie F et al. (2022) Vegetation dynamics under rapid urbanization in the Guangdong–Hong Kong–Macao Greater Bay Area urban agglomeration during the past two decades. Remote Sens-Basel 14:3993
Goudie AS (2016) Quantification of rock control in geomorphology. Earth-Sci Rev 159:374–387
Guzzetti F (2000) Landslide fatalities and the evaluation of landslide risk in Italy. Eng Geol 58:89–107
Huang F, Chen J, Liu W et al. (2022) Regional rainfall-induced landslide hazard warning based on landslide susceptibility mapping and a critical rainfall threshold. Geomorphology 408:108236
Huang F, Xiong H, Jiang S et al. (2024) Modelling landslide susceptibility prediction: a review and construction of semi-supervised imbalanced theory. Earth-Sci Rev:104700
Huang F, Ye Z, Jiang S et al. (2021) Uncertainty study of landslide susceptibility prediction considering the different attribute interval numbers of environmental factors and different data-based models. Catena 202:105250
Jiang Y, Wang W, Zou L, Cao Y (2024) Regional landslide susceptibility assessment based on improved semi-supervised clustering and deep learning. Acta Geotech 19:509–529
Johnston EC, Davenport FV, Wang L et al. (2021) Quantifying the effect of precipitation on landslide hazard in urbanized and non-urbanized areas. Geophys Res Lett 48:e2021GL094038
Jones JN, Boulton SJ, Bennett GL et al. (2021) Temporal variations in landslide distributions following extreme events: implications for landslide susceptibility modeling. Journal of Geophysical Research: Earth Surface 126:e2021JF006067
Keefer DK, Larsen MC (2007) Assessing landslide hazards. Science 316:1136–1138
Kim SW, Chun KW, Kim M et al. (2021) Effect of antecedent rainfall conditions and their variations on shallow landslide-triggering rainfall thresholds in South Korea. Landslides 18:569–582
Kirschbaum DB, Adler R, Hong Y et al. (2010) A global landslide catalog for hazard applications: method, results, and limitations. Nat Hazards 52:561–575
Kwong A, Wang M, Lee CF, Law KT (2004) A review of landslide problems and mitigation measures in Chongqing and Hong Kong: similarities and differences. Eng Geol 76:27–39
Lacroix P, Araujo G, Hollingsworth J, Taipe E (2019) Self-entrainment motion of a slow-moving landslide inferred from Landsat-8 time series. Journal of Geophysical Research: Earth Surface 124:1201–1216
Lacroix P, Handwerger AL, Bièvre G (2020) Life and death of slow-moving landslides. Nat Rev Earth Env 1:404–419
Lee S, Pradhan B (2007) Landslide hazard mapping at Selangor, Malaysia using frequency ratio and logistic regression models. Landslides 4:33–41
Li J, Liu Z, Wang R et al. (2022) Analysis of debris flow triggering conditions for different rainfall patterns based on satellite rainfall products in Hengduan mountain region, China. Remote Sens-Basel 14:2731
Liu Z (2023) Evaluation of rainfall thresholds triggering debris flows in western China with gauged- and satellite-based precipitation measurement. J Hydrol 620:129500
Lizheng D, Hongyong Y, Mingzhi Z, Jianguo C (2023) Research progress on landslide deformation monitoring and early warning technology. Journal of Tsinghua University (Science and Technology) 63:849–864
Loche M, Alvioli M, Marchesini I et al. (2022) Landslide susceptibility maps of Italy: lesson learnt from dealing with multiple landslide types and the uneven spatial distribution of the national inventory. Earth-Sci Rev:104125
Maraun D, Knevels R, Mishra AN et al. (2022) A severe landslide event in the Alpine foreland under possible future climate and land-use changes. Commun Earth Environ 3:87
Mautner MR, Foglia L, Herrera GS et al. (2020) Urban growth and groundwater sustainability: evaluating spatially distributed recharge alternatives in the Mexico City metropolitan area. J Hydrol 586:124909
Merghadi A, Yunus AP, Dou J et al. (2020) Machine learning methods for landslide susceptibility studies: a comparative overview of algorithm performance. Earth-Sci Rev 207:103225
Mo C, Lei X, Mo X et al. (2024) Comprehensive evaluation and comparison of ten precipitation products in terms of accuracy and stability over a typical mountain basin, southwest China. Atmos Res 297:107116
Mondini AC, Guzzetti F, Melillo M (2023) Deep learning forecast of rainfall-induced shallow landslides. Nat Commun 14:2466
Pardeshi SD, Autade SE, Pardeshi SS (2013) Landslide hazard assessment: recent trends and techniques. Springerplus 2:1–11
Peruccacci S, Brunetti MT, Gariano SL et al. (2017) Rainfall thresholds for possible landslide occurrence in Italy. Geomorphology 290:39–57
Piciullo L, Gariano SL, Melillo M et al. (2017) Definition and performance of a threshold-based regional early warning model for rainfall-induced landslides. Landslides 14:995–1008
Rapstine TD, Rengers FK, Allstadt KE et al. (2020) Reconstructing the velocity and deformation of a rapid landslide using multiview video. Journal of Geophysical Research: Earth Surface 125:e2019JF005348
Schilirò L, Montrasio L, Mugnozza GS (2016) Prediction of shallow landslide occurrence: validation of a physically-based approach through a real case study. Sci Total Environ 569:134–144
Scofield RA, Kuligowski RJ (2003) Status and outlook of operational satellite precipitation algorithms for extreme-precipitation events. Weather Forecast 18:1037–1051
Segoni S, Lagomarsino D, Fanti R et al. (2015) Integration of rainfall thresholds and susceptibility maps in the Emilia Romagna (Italy) regional-scale landslide warning system. Landslides 12:773–785
Segoni S, Piciullo L, Gariano SL (2018) A review of the recent literature on rainfall thresholds for landslide occurrence. Landslides 15:1483–1501
Stanley T, Kirschbaum DB, Huffman GJ, Adler RF (2017) Approximating long-term statistics early in the Global Precipitation Measurement era. Earth Interact 21:1–10
Tan Y, Guo D, Xu B (2015) A geospatial information quantity model for regional landslide risk assessment. Nat Hazards 79:1385–1398
Wang Y, Gao G, Zhai J et al. (2023) Evolution characteristics of the rainstorm disaster chains in the Guangdong–Hong Kong–Macao Greater Bay Area, China. Nat Hazards:1–22
Wang Y, Zhai J, Gao G et al. (2022) Risk assessment of rainstorm disasters in the Guangdong–Hong Kong–Macao Greater Bay Area of China during 1990–2018. Geomatics, Natural Hazards and Risk 13:267–288
Xin Y, Lu N, Jiang H et al. (2021) Performance of ERA5 reanalysis precipitation products in the Guangdong-Hong Kong-Macao Greater Bay Area, China. J Hydrol 602:126791
Xing Y, Chen Y, Huang S et al. (2023) Research on the uncertainty of landslide susceptibility prediction using various data-driven models and attribute interval division. Remote Sens-Basel 15:2149
Youssef B, Bouskri I, Brahim B et al. (2023) The contribution of the frequency ratio model and the prediction rate for the analysis of landslide risk in the Tizi n'Tichka area on the national road (RN9) linking Marrakech and Ouarzazate. Catena 232:107464
Yu X, Zhang K, Song Y et al. (2021) Study on landslide susceptibility mapping based on rock–soil characteristic factors. Sci Rep-Uk 11:15476
Zhou B, Zeng H, Zhao L, Han Z (2023) Climate change and climate risks in the Guangdong-Hong Kong-Macau Greater Bay Area. Annual Report on Actions to Address Climate Change (2019) Climate Risk Prevention. Springer, pp 173–193
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-5684743","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":405611068,"identity":"45e3c62d-094f-4eca-94cf-bb557b884b92","order_by":0,"name":"Haixia Yu","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABCklEQVRIiWNgGAWjYFACHgaGBAYJBn4g88ADJEHCWiQbgFoSiNYCAgYHGEB6idBicCP34IeHbRbyxtcOPwTaUpc4f0YC44O3bQzy5ji15CVLJJyRMNx2O80AqOVw4oYbCcyGc9sYDHc24NKSYyCRUCHBuO12AkjLgcQNEgls0rxtDAlgp2LXYvwjwUDCfvPs9A8wh7H/JqDFDGRL4gbpHJAtzIkNNxLYmPFpkTzzxswC6JfkGbdzCg4kGBw23nDmYbPknHMShhtwaOE7nmN882dbnW3/7PTNHz5U1MnOb08++OFNmY08LlsUUMUNGBwbGBgbgCwJ7OqBQL4BTcAep9JRMApGwSgYsQAAmgFiQ5p3wlQAAAAASUVORK5CYII=","orcid":"https://orcid.org/0009-0003-1114-2733","institution":"Sun Yat-Sen University","correspondingAuthor":true,"prefix":"","firstName":"Haixia","middleName":"","lastName":"Yu","suffix":""},{"id":405611069,"identity":"c7563f5e-8e2e-4b19-9d3b-f7e871599b86","order_by":1,"name":"Yi Jin","email":"","orcid":"","institution":"Sun Yat-Sen University","correspondingAuthor":false,"prefix":"","firstName":"Yi","middleName":"","lastName":"Jin","suffix":""},{"id":405611070,"identity":"fed27908-befa-479a-acc2-5ec4d3cb5acc","order_by":2,"name":"Kunlong He","email":"","orcid":"","institution":"Sun Yat-Sen University","correspondingAuthor":false,"prefix":"","firstName":"Kunlong","middleName":"","lastName":"He","suffix":""},{"id":405611071,"identity":"44fae770-e5d0-4373-a5a9-7bd366d94f44","order_by":3,"name":"Xuan Yu","email":"","orcid":"","institution":"Institute of 
Soil Science Chinese Academy of Sciences","correspondingAuthor":false,"prefix":"","firstName":"Xuan","middleName":"","lastName":"Yu","suffix":""}],"badges":[],"createdAt":"2024-12-20 14:12:10","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-5684743/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-5684743/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":74668864,"identity":"7c19150d-b5df-4956-858e-2ee4921f1893","added_by":"auto","created_at":"2025-01-24 13:45:56","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":20839564,"visible":true,"origin":"","legend":"\u003cp\u003eThe location of the study area\u003c/p\u003e","description":"","filename":"image1.png","url":"https://assets-eu.researchsquare.com/files/rs-5684743/v1/8c8c96e00ee1167b9c0d94d4.png"},{"id":74668859,"identity":"57df9380-1773-48d8-a758-41dde1730595","added_by":"auto","created_at":"2025-01-24 13:45:56","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":508074,"visible":true,"origin":"","legend":"\u003cp\u003eCriteria for classification of rainfall thresholds\u003c/p\u003e","description":"","filename":"image2.png","url":"https://assets-eu.researchsquare.com/files/rs-5684743/v1/e21c4db576daa648677738d0.png"},{"id":74668862,"identity":"25790a6a-fe5b-434f-a047-63a39565da6a","added_by":"auto","created_at":"2025-01-24 13:45:56","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":520740,"visible":true,"origin":"","legend":"\u003cp\u003eClassification of landslide hazard warning zones\u003c/p\u003e","description":"","filename":"image3.png","url":"https://assets-eu.researchsquare.com/files/rs-5684743/v1/0c3911e5fc9b70a32c6dac4c.png"},{"id":74668858,"identity":"9de01f09-8dae-4d61-b8f4-172f20aa8a8d","added_by":"auto","created_at":"2025-01-24 
13:45:56","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":135746,"visible":true,"origin":"","legend":"\u003cp\u003ePearson correlation plots for each factor\u003c/p\u003e","description":"","filename":"image4.png","url":"https://assets-eu.researchsquare.com/files/rs-5684743/v1/116673ae6866ca8c8d4a693b.png"},{"id":74669632,"identity":"927d0164-4e28-493f-abf6-6981e5b8a1ec","added_by":"auto","created_at":"2025-01-24 13:53:56","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":865626,"visible":true,"origin":"","legend":"\u003cp\u003eOptimal parameter selection process for the SSRF model: \u003cstrong\u003e(a)\u003c/strong\u003e ROC plots for five non-landslide sample ratios, \u003cstrong\u003e(b)\u003c/strong\u003edistribution features of the LSI of SSRF and RF models under an AIN of 8 \u003cstrong\u003e(c)\u003c/strong\u003eimportance of each factor.\u003c/p\u003e","description":"","filename":"image5.png","url":"https://assets-eu.researchsquare.com/files/rs-5684743/v1/3ca8465eb1c28f4574a03f46.png"},{"id":74668895,"identity":"09f3ad24-09cb-4204-b387-792586d45e92","added_by":"auto","created_at":"2025-01-24 13:45:57","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":25891686,"visible":true,"origin":"","legend":"\u003cp\u003eLandslide susceptibility of GBA:(a) distribution of 476 landslide samples in different susceptibility zones, (b) area ratio of different landslide susceptibility levels in GBA, (c) proportion of 476 landslides in different susceptibility levels.\u003c/p\u003e","description":"","filename":"image6.png","url":"https://assets-eu.researchsquare.com/files/rs-5684743/v1/9c4c6b5246ef4b39101884e2.png"},{"id":74668867,"identity":"f30bb3a6-e822-4cb2-9b5f-26af4e9eab5f","added_by":"auto","created_at":"2025-01-24 13:45:56","extension":"png","order_by":7,"title":"Figure 
7","display":"","copyAsset":false,"role":"figure","size":16177771,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of landslide prediction effects of (a) rainfall threshold warning and (b) dynamic landslide hazard warning on August 31, 2018.\u003c/p\u003e","description":"","filename":"image7.png","url":"https://assets-eu.researchsquare.com/files/rs-5684743/v1/a982f8532f376ad3cb894605.png"},{"id":74668896,"identity":"8f727b52-41e1-41d0-a7b1-ba43c317124c","added_by":"auto","created_at":"2025-01-24 13:45:57","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":30849345,"visible":true,"origin":"","legend":"\u003cp\u003eLandslide warning for August 15 to August 20, 2013.\u003c/p\u003e","description":"","filename":"image8.png","url":"https://assets-eu.researchsquare.com/files/rs-5684743/v1/f541f36ae06d8e9eda62576a.png"},{"id":80065164,"identity":"33ee4e40-855d-4f61-aa80-1ca5945297d1","added_by":"auto","created_at":"2025-04-07 13:06:33","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":92536855,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5684743/v1/11e620e2-f505-4e54-bbd6-095b375c0084.pdf"}],"financialInterests":"","formattedTitle":"Advanced Landslide Early Warning System Based on a Semi-supervised Model in Highly Urbanized Areas across China's Greater Bay Area","fulltext":[{"header":"1 Introduction","content":"\u003cp\u003eLandslides are a frequent and damaging natural occurrence, influenced by geological, meteorological, and human factors (Casagli et al. 2023). Between 2004 and 2016, landslides caused the deaths of around 55,997 people worldwide, with Asia being the most affected region (Froude and Petley 2018). China has also suffered greatly, with landslides accounting for 14,394 deaths between 1940 and 2020. 
Notably, the majority of these incidents occurred during the country's rainy season, which spans from April to September. Human activities and climate change further intensify the unpredictability of landslides by increasing risk factors and the spatial-temporal variability of occurrences (Gariano and Guzzetti 2016; Johnston et al. 2021; Jones et al. 2021). Consequently, accurate prediction of landslide susceptibility is essential for preemptively mitigating the detrimental impacts of landslides on both the built and natural environments (Keefer and Larsen 2007; Lacroix et al. 2020). Recent advancements in landslide susceptibility prediction (LSP) have leveraged methodologies grounded in statistical analyses and emerging machine learning techniques, establishing a foundation for sophisticated hazard warning systems.\u003c/p\u003e \u003cp\u003eIn the realm of LSP, statistical models such as the information value (IV) model and the frequency ratio (FR) model play a pivotal role. These models utilize probabilistic analysis, integrating environmental variables with historical landslide incidents to forecast susceptibility (Pardeshi et al. 2013; Tan et al. 2015). In parallel, the machine learning spectrum has expanded to incorporate models such as random forest (RF), support vector machines (SVM), artificial neural networks (ANN), and decision trees (DT), with RF particularly noted for its proficiency in managing multidimensional datasets and projecting geohazard susceptibilities at a regional level (Merghadi et al. 2020). While fine-tuning model parameters before training may lead to increased accuracy in LSP, the modeling process remains beset by uncertainties. Examples include methods for selecting non-landslide samples for input to the RF model regarding both quantity and spatial distribution (Dou et al. 
2023), determining the grid location of landslide samples based on the landslide catalog (Loche et al. 2022), and the categorization of environmental factors that include the treatment of continuous variables (Huang et al. 2021).\u003c/p\u003e \u003cp\u003eFurthermore, the occurrence of regional landslides is intricately linked to the hydrological elements of precipitation, particularly the amount and duration, which are critical factors for the onset and movement of slope failures. Physically-based models adeptly integrate these rainfall attributes, offering reliable predictions for the occurrence and spread of rainfall-induced landslides (Schilir\u0026ograve; et al. 2016). However, these methods mainly show localized precision, which is specific to the environmental conditions of the study area, and often rely on detailed site-specific datasets that may not be consistently applicable to different geographic locations or larger scales (De Sy et al. 2013). This specificity presents a substantial challenge in the general applicability of the models since the diversity and uniqueness of geographical settings can substantially influence their transferability. Hence, advancing universally applicable methodologies is paramount, necessitating integrating historical rainfall datasets and exhaustive landslide inventory compilations. These methodologies promote the construction of uniform models that guarantee the scalability of landslide prediction across various terrains (Mondini et al. 2023; Segoni et al. 2018). Notably, models that delineate rainfall thresholds according to cumulative event rainfall-duration (ED) and intensity-duration (ID) criteria have proven to be effective in predicting landslide occurrences, particularly for events triggered by extensive rainfall (Kim et al. 2021). 
Expanding upon this base, leveraging global satellite rainfall data promises to refine the precision of rainfall threshold models, courtesy of their broad scope and regular updates (Maraun et al. 2022). Nonetheless, while rainfall thresholds assist in pinpointing when landslides might occur, they fall short in shedding light on where landslides might transpire spatially. To surmount this shortfall, one can utilize landslide susceptibility maps, which depict the probability of landslides over an extended period within a delineated locale (Ahmed et al. 2023). Moreover, the static representation in a landslide susceptibility map does not encapsulate the dynamic influence of rainfall, posing challenges for real-time landslide prediction.\u003c/p\u003e \u003cp\u003eThis research has given rise to a spatial-temporal model that fuses the unchanging aspects of landslide susceptibility with the fluctuating traits of rainfall thresholds, aiming to anticipate landslides in the Greater Bay Area. Calculating warning levels for distinct zones enables the anticipation of landslide disaster risks for specified regions within set intervals. Such an approach lays down a scientific framework for informed disaster risk reduction and evasion.\u003c/p\u003e"},{"header":"2 Study area and datasets","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Study area\u003c/h2\u003e \u003cp\u003eThe GBA is a crucial urban agglomeration in coastal China, with a well-established metropolitan area, extensive port infrastructure, and a sophisticated network for transportation. 
Covering nine cities of Guangdong Province, namely Guangzhou, Shenzhen, Zhuhai, Foshan, Zhongshan, Dongguan, Huizhou, Jiangmen, and Zhaoqing, as well as the special administrative regions of Hong Kong and Macau, the area is fringed by mountains on three sides and is open to the sea on the fourth, presenting a rich tapestry of landforms (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eWith China\u0026rsquo;s reform and opening up, urbanization within the GBA has dramatically altered its spatial attributes. Examples include a six-fold increase in impervious surface area, extensive land cover changes (Ding et al. 2022), and fluctuating and often low NDVI values in the central urban regions (Geng et al. 2022). In terms of climatic conditions, the GBA is located along the coast of South China with a subtropical humid monsoon climate, where typhoons and rainstorms are frequent (Wang et al. 2022). Average annual rainfall is between 1300 and 2500 mm, and the rainy season runs from April to September (Xin et al. 2021). Despite a decreasing trend in the overall number of rainy days, recent decades have seen a surge in both heavy precipitation events and extreme weather patterns, exacerbating landslide hazards across the GBA. Climate risks in the GBA are expected to become more acute, making it urgent to rationalize the use of climate resources and improve the capacity for climate disaster and risk management (Zhou et al. 2023). Geologically, the area is characterized by a stratigraphic sequence spanning from the Upper Proterozoic to the Quaternary period, with a notable presence of Upper Paleozoic carbonate formations and a veneer of Quaternary deposits. 
The bedrock in the region is predominantly granite, indicating a geological framework that potentially poses intricate challenges to the geomorphological stability due to weathering and fracturing processes inherent in granitic terrains (Goudie 2016). Moreover, the coastal areas witness a widespread distribution of Upper Paleozoic carbonate rocks and a superficial layer of Quaternary deposits, contributing to the diverse geology from the Upper Proterozoic to the Quaternary period.\u003c/p\u003e \u003cp\u003eGiven the aforesaid factors, the uncertainty surrounding meteorological disaster-induced landslides, such as rainstorms and typhoons in the GBA, is increasing (Wang et al. 2023). Therefore, accurately categorizing this susceptibility and implementing targeted disaster prevention is imperative. The study region was demarcated into evaluation cells, measuring 12.5 m \u0026times; 12.5 m, to amass a total of 8.40 \u0026times; 10\u003csup\u003e8\u003c/sup\u003e grid cells.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2 Datasets\u003c/h2\u003e \u003cp\u003eThe landslide data was collated from the Guangdong Provincial Disaster Prevention and Mitigation Yearbook (GDPMY) and NASA's Global Landslide Catalog (GLC). The GDPMY contains overviews, major events, disaster descriptions, and appendices dating back to 1995. The GLC has been developed for rainfall-induced landslides reported in the media, online databases and other sources. It uses data from 2007 to the present (Kirschbaum et al. 2010). Finally, a total of 476 landslide events with precise temporal and spatial details from 2008 to 2020 were identified as the focus of this study. 
Heavy rainfall served as the principal trigger for those landslides, followed by anthropogenic construction activities.\u003c/p\u003e \u003cp\u003eThe landslide events dataset was systematically partitioned into three separate subsets, adhering to a ratio of 6:2:2. Specifically, 60% (285 landslide events) of the aggregated samples were allocated for the training phase of the SSRF, aimed at constructing a robust predictive engine. A further 20% (95 landslide events) were reserved for the SSRF model's testing sequence, facilitating an initial performance assessment. The remainder, also constituting 20% (96 landslide events), functioned as the validation set designated to evaluate the efficacy of the warning model post-integration with the proposed rainfall threshold mechanism. This three-way partition ensures a comprehensive approach to model development by encompassing distinct phases from calibration to validation, thereby fortifying the predictive reliability of the landslide early warning system.\u003c/p\u003e \u003cp\u003ePrecipitation data was sourced from the Multi-Source Weighted-Ensemble Precipitation (MSWEP) dataset with a pixel resolution of 0.1\u0026deg; (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.gloh2o.org/mswep/\u003c/span\u003e\u003cspan address=\"https://www.gloh2o.org/mswep/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). The 12.5 m DEM of the study area was collected by the Advanced Land Observation Satellite (ALOS). From this DEM, the elevation, slope, aspect, curvature, profile curvature, plan curvature, flow direction, relief amplitude, and TWI were extracted. Among them, curvature, profile curvature and plan curvature were calculated with the Spatial Analyst Tools - Surface - Curvature option. The D8 method was used to create a grid of flow directions from each pixel to its steepest downslope neighbor. 
Relief amplitude is the vertical difference between the highest and lowest elevations in a specific area. It is a macroscopic index used to describe the terrain characteristics of a district. The maximum and minimum values were obtained using the Spatial Analyst Tools - Neighborhood - Focal Statistics tool in ArcGIS Pro. TWI is a physical index of the influence of regional topography on runoff flow direction and accumulation, which is calculated from the slope and flow direction of the grid. Road and water system data were obtained from OpenStreetMap (OSM) at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.openstreetmap.org/\u003c/span\u003e\u003cspan address=\"http://www.openstreetmap.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. The NDVI data are derived from the China regional 250 m normalized difference vegetation index dataset (2000\u0026ndash;2022), while soil type distribution was gathered from the soil map of China at a 1:1\u0026nbsp;million scale (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.resdc.cn/data.aspx?DATAID=145\u003c/span\u003e\u003cspan address=\"https://www.resdc.cn/data.aspx?DATAID=145\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). All of the above data were eventually resampled to 12.5 m through ArcGIS Pro in preparation for analysis.\u003c/p\u003e \u003c/div\u003e"},{"header":"3 Methods","content":"\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Frequency ratio model\u003c/h2\u003e \u003cp\u003eThe frequency ratio (FR) is a statistical method that quantifies the relationship between landslide occurrence and each categorized interval of a conditioning factor. It has been widely applied in geological engineering and research on geological disasters (Lee and Pradhan 2007; Youssef et al. 2023). 
The FR was calculated according to the following equation:\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$\\:\\begin{array}{c}FR=\\frac{N/{N}_{0}}{S/{S}_{0}}\\ \\left(1\\right)\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:N\\)\u003c/span\u003e\u003c/span\u003e is the area of geohazards within the categorized interval of a factor, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{N}_{0}\\)\u003c/span\u003e\u003c/span\u003e is the total area of geohazards in the study area, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:S\\)\u003c/span\u003e\u003c/span\u003e is the area of that categorized interval of the factor, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{S}_{0}\\)\u003c/span\u003e\u003c/span\u003e is the total area of the study area. An FR value exceeding 1 suggests an increased likelihood of geohazard occurrences within that interval of the factor, whereas an FR below 1 implies a reduced susceptibility. This study considers soil type, aspect, and flow direction as discrete factors, each with specific meanings and influences on landslides (Canavesi et al. 2020). Roads and water systems need to be buffered separately due to their different distribution and impact on landslides. The buffer distances were determined iteratively and set to 125 meters for roads and 250 meters for rivers, the distances that best capture their influence on landslide occurrence (Fang et al. 2023). 
The assignment of continuous factors to specific intervals is pivotal in the model's predictive capability. A small attribute interval number (AIN) can result in an overly coarse division of factors, which reduces the prediction accuracy of the model. On the other hand, a large AIN can increase the complexity and computational cost of modeling (Xing et al. 2023). It was found that when the AIN exceeded 8, prediction precision plateaued or even decreased. Hence, an AIN of 8 was identified as an optimal trade-off between model simplicity and comprehensive factor representation.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e3.2 Semi-supervised random forest model\u003c/h2\u003e \u003cp\u003eBuilding on the RF model, a supervised learning algorithm renowned for its robust applicability in complex classification tasks, this study integrates semi-supervised techniques to address the inaccuracies caused by the arbitrary selection of non-landslide samples (Huang et al. 2024). SSRF combines the robust, multivariate processing capability of RF with semi-supervised learning mechanisms, reducing the biases introduced by arbitrary non-landslide sample selection (Jiang et al. 2024). The SSRF modeling approach is as follows: Firstly, non-landslide samples are randomly selected from the grids in the region. Then, the RF model in the tidymodels package in R is used for LSP. The predicted landslide susceptibility index (LSI) values, which range from 0 to 1, are then imported into ArcGIS Pro and classified into five levels (very low, low, medium, high, and very high) using the natural breaks method. 
Non-landslide samples with higher reliability were then randomly selected from the very low and low susceptibility zones of the five levels.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e3.3 Rainfall threshold warning model\u003c/h2\u003e \u003cp\u003eCurrent landslide rainfall threshold research is mainly based on empirical methods, which derive thresholds from the relationship between historical landslides and precipitation without directly considering physical processes. This approach indirectly predicts landslide occurrences through rainfall as the primary inducing factor (Brunetti et al. 2010). Due to the convenience of data acquisition, strong generalizability, and good applicability, it has been widely used worldwide (Brunetti et al. 2021; Peruccacci et al. 2017; Piciullo et al. 2017). To determine the rainfall thresholds that trigger landslides in the GBA, this study has adopted a comprehensive classification standard. This criterion combines rainfall data from the GBA from 2008 to 2020, including daily rainfall, cumulative three-day rainfall, and cumulative seven-day rainfall (Fan et al. 2015). For each grid, the ratio of rainfall within these time frames to the historical maximum rainfall is calculated. Utilizing landslide data comprising 80% (380 landslide events), percentiles for these ratios were established to evaluate the precipitation situation at each grid. As illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, the step implemented involves comparing the observed daily rainfall at each grid to the historical maximum for that day. If this comparison exceeds the 95th percentile among training samples, the grid is immediately classified as a zone of very high risk (R1). Additionally, if the cumulative three-day and seven-day rainfall ratios surpass 95% and 90% of the training samples, respectively, the grid is also classified as R1. 
Grids where the cumulative three-day rainfall ratio ranges from 75% to 95%, or the cumulative seven-day rainfall ratio is between 80% and 90%, are determined to be high-risk zones (R2). Risk levels for other ratio intervals are delineated likewise. A landslide warning is issued when the rainfall reaches R3 or more.\u003c/p\u003e \u003cp\u003eIt is important to note that if the rainfall ratios do not reach these specific thresholds, for example when the daily rainfall ratio is below 95% or the cumulative three-day rainfall ratio is less than 40%, it is difficult to determine the exact rainfall level. In these cases, the grid remains in an indeterminate zone, which is depicted in grey in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e and represents an unspecified risk area. This is because daily or three-day rainfall values that are not markedly high cannot, on their own, reliably predict landslides in the GBA given its historical patterns. Ultimately, the rainfall level for each grid is determined by the highest level indicated among the daily, cumulative three-day, and cumulative seven-day rainfall ratios (i.e., the red dashed line in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). In this illustration, the rainfall level on the first day is determined by that day's rainfall ratio and is categorized as R1. Although the rainfall on the second day decreases to a normal level, the grid is categorized as R2 according to the cumulative three-day rainfall ratio, owing to the heavy rainfall on the first two days. This methodology allows for a synthesis of real-time rainfall data with historical accumulations, assessing the extent of historical anomaly in rainfall patterns and its impact on the potential probability of landslide initiation. 
\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e3.4 Dynamic landslide hazard warning model\u003c/h2\u003e \u003cp\u003eThe warning model based on a matrix method integrates the characteristics of landslide susceptibility mapping (LSM) and rainfall thresholds to realize the spatial-temporal early warning of landslides (Huang et al. 2022; Segoni et al. 2015). LSM enables the prediction of landslides under the influence of environmental ontogenetic factors and facilitates preventive interventions in very high and high susceptibility areas. Concurrently, rainfall thresholds gauge instantaneous probabilities of cumulative precipitation inducing landslides. Figure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e shows the specific warning levels established through this model. In monitoring areas classified within the Very High Susceptibility Zone, the issuance of the highest emergency level warning is mandated upon R1 classification, underscoring a pronounced likelihood of landslide occurrence. The immediacy of this threat necessitates prompt response protocols, including on-site verification, evacuation, and secure sheltering, to mitigate potential loss of life and property. Similarly, the remaining warning levels are determined by combining each susceptibility class with the rainfall level given by the rainfall thresholds. 
A landslide warning is considered to be issued when the warning level reaches above watch; this serves as the criterion for whether a landslide is predicted.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"4 Result","content":"\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e4.1 Elimination of redundant impact factors\u003c/h2\u003e \u003cp\u003eThe genesis of landslides is multifactorial, and inter-correlations among predictive indicators could propagate errors, compromising model precision. The study employed Pearson correlation analysis to evaluate the linear relationships among various predisposing factors (Yu et al. 2021). It is posited that Pearson coefficients with absolute values below 0.5 signify negligible inter-dependence among the evaluation metrics. Conversely, coefficients exceeding 0.5 indicate a moderate correlation between the evaluation factors. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e, relief amplitude, slope, and the TWI are significantly correlated, with correlation coefficients surpassing the 0.5 threshold. The SSRF model's output indicated a comparatively minor contribution of relief amplitude to landslide susceptibility relative to the other assessed factors, so relief amplitude was removed. The analysis also revealed substantial coefficients for curvature against profile curvature and for flow direction against aspect, prompting the exclusion of curvature and flow direction to avoid collinearity. 
Consequently, the study identified elevation, slope, aspect, plan curvature, profile curvature, TWI, NDVI, soil type, distance to road, and distance to river as the fundamental variables influencing landslide susceptibility within the GBA.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e4.2 Combination of factors indicating high probability of landslide occurrence\u003c/h2\u003e \u003cp\u003eUsing an attribute interval number (AIN) of 8, as described in Section 3.1, this study classified continuous factors impacting landslides and summarized the results in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. Elevations between 38 and 311 meters, covering 42.2% of the GBA, consistently showed FR values above 1, indicating heightened landslide susceptibility. Specifically, the elevation band between 38 and 113 meters, constituting a low-altitude hilly topography in the GBA, has the highest FR value (2.047). Although not reaching mountainous classifications, this segment is characterized by its moderate heights, likely presenting undulating hillocks and gentle slopes. In contrast to the steeply inclined landscapes that typify landslide-vulnerable areas within China, slopes in the GBA rarely exceed 5 degrees, offering a unique pattern of susceptibility. A significant concentration of landslides, approximately 69.6% of documented occurrences, is situated on gentle to moderate slopes ranging from 7.64 to 27.94 degrees.\u003c/p\u003e \u003cp\u003eIn terms of road transport, the GBA boasts a sophisticated road network reflective of its urbanization. The areas within 125 meters of roadways encompass over 30% of the region and are associated with an FR exceeding 1.4. Remarkably, FR values peak at distances between 125 and 250 meters from roads, hinting at a transitional corridor from dense urban to burgeoning suburban zones. 
The elevated FR values potentially stem from the abundance of undisturbed natural slopes and geological protrusions, which are prone to heightened landslide susceptibility (Kwong et al. 2004). Beyond 375 meters from roads, all FR values fall below 1, suggesting a significant suppression of landslide occurrences in this range. This phenomenon is presumably aided by denser vegetation, milder topographic gradients, and differing soil compositions. Meanwhile, the FR values in relation to distance from rivers show a non-linear trend. FR initially rises with distance over the 0 to 500 meter range from rivers and peaks at 1.222 in the 500 to 750 meter range. This pattern suggests an interplay of erosive force and hydrological variation that makes the areas near water bodies more susceptible. Conversely, beyond this distance the influence of rivers weakens and landslide propensity declines, with topographic and vegetation factors dominating susceptibility. In summary, regions within 250 to 750 meters of rivers and up to 250 meters from roads exhibit FR values above 1.2. 
This demarks an augmented likelihood of landslide emergence within these confines.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eFrequency analysis of all conditioning factors.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFactors\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eInterval\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eFR\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFactors\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eInterval\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eFR\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"7\" rowspan=\"8\"\u003e \u003cp\u003eElevation (m)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;38\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c3\"\u003e \u003cp\u003e0.622\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\" morerows=\"7\" rowspan=\"8\"\u003e \u003cp\u003eSlope (\u0026deg;)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0\u0026thinsp;~\u0026thinsp;3.62\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.551\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e38\u0026ndash;113\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2.047\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e3.62\u0026thinsp;~\u0026thinsp;7.64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.762\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e113\u0026ndash;205\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.374\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e7.64\u0026thinsp;~\u0026thinsp;12.47\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.448\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e205\u0026ndash;311\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.041\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e12.47\u0026thinsp;~\u0026thinsp;17.44\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.441\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e311\u0026ndash;436\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.408\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e17.43\u0026thinsp;~\u0026thinsp;22.49\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.334\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e436\u0026ndash;595\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.108\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e22.49\u0026thinsp;~\u0026thinsp;27.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.283\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e595\u0026ndash;822\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.112\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e27.94\u0026thinsp;~\u0026thinsp;34.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.817\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026gt;\u0026thinsp;822\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e34.97\u0026thinsp;~\u0026thinsp;78.46\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.588\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"7\" rowspan=\"8\"\u003e \u003cp\u003eProfile curvature\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026lt;-1.69\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.869\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\" 
morerows=\"8\" rowspan=\"9\"\u003e \u003cp\u003eAspect\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eFlat\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-1.69~-0.92\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.736\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNorth\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.523\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-0.92~-0.31\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.014\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNortheast\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.175\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-0.31\u0026thinsp;~\u0026thinsp;0.30\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.967\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eEast\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.549\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.30\u0026thinsp;~\u0026thinsp;0.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.949\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSoutheast\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.249\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e 
\u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.89\u0026thinsp;~\u0026thinsp;1.54\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.359\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSouth\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.091\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1.54\u0026thinsp;~\u0026thinsp;2.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.329\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSouthwest\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.016\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026gt;\u0026thinsp;2.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eWest\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.7\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"7\" rowspan=\"8\"\u003e \u003cp\u003ePlan curvature\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026lt;-1.65\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.847\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNorthwest\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.787\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-1.65~-0.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c3\"\u003e \u003cp\u003e1.059\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\" morerows=\"8\" rowspan=\"9\"\u003e \u003cp\u003eDistance to road (m)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;125\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.454\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-0.89~-0.29\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.922\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e125\u0026thinsp;~\u0026thinsp;250\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.566\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-0.29\u0026thinsp;~\u0026thinsp;0.29\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.857\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e250\u0026thinsp;~\u0026thinsp;375\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.049\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.28\u0026thinsp;~\u0026thinsp;0.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.275\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e375\u0026thinsp;~\u0026thinsp;500\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.868\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.87\u0026thinsp;~\u0026thinsp;1.54\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e 
\u003cp\u003e1.154\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e500\u0026thinsp;~\u0026thinsp;625\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.892\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1.54\u0026thinsp;~\u0026thinsp;2.76\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.47\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e625\u0026thinsp;~\u0026thinsp;750\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.296\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026gt;\u0026thinsp;2.76\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e750\u0026thinsp;~\u0026thinsp;875\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.76\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"7\" rowspan=\"8\"\u003e \u003cp\u003eTWI\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;5.20\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.359\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e875\u0026thinsp;~\u0026thinsp;1000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.546\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e5.20\u0026ndash;7.13\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.093\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u0026gt;\u0026thinsp;1000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.39\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e7.13\u0026ndash;9.44\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.764\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\" morerows=\"8\" rowspan=\"9\"\u003e \u003cp\u003eDistance to river (m)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;250\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.747\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e9.44\u0026ndash;11.42\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.718\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e250\u0026thinsp;~\u0026thinsp;500\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.21\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e11.42\u0026ndash;13.02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.416\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e500\u0026thinsp;~\u0026thinsp;750\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.222\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e13.0-14.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.359\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e 
\u003cp\u003e750\u0026thinsp;~\u0026thinsp;1000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.049\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e14.95\u0026ndash;18.26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.395\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1000\u0026thinsp;~\u0026thinsp;1250\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.036\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026gt;\u0026thinsp;18.26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.384\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1250\u0026thinsp;~\u0026thinsp;1500\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.032\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"7\" rowspan=\"8\"\u003e \u003cp\u003eNDVI\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-0.2~-0.092\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.126\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1500\u0026thinsp;~\u0026thinsp;1750\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.709\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-0.092\u0026thinsp;~\u0026thinsp;0.119\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.202\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e 
\u003cp\u003e1750\u0026thinsp;~\u0026thinsp;2000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.56\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.119\u0026thinsp;~\u0026thinsp;0.297\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.962\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u0026gt;\u0026thinsp;2000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.611\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.297\u0026thinsp;~\u0026thinsp;0.452\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.133\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\" morerows=\"4\" rowspan=\"5\"\u003e \u003cp\u003eSoil type\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eAquic Soil\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.423\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.452\u0026thinsp;~\u0026thinsp;0.607\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.06\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSolonchak\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.607\u0026thinsp;~\u0026thinsp;0.733\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.06\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eFerralsols\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.834\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.733\u0026thinsp;~\u0026thinsp;0.827\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.232\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eAnthrosols\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.138\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.827\u0026thinsp;~\u0026thinsp;0.995\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.797\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eOthers\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.513\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e4.3 Landslide susceptibility map generation\u003c/h2\u003e \u003cp\u003eTo investigate how the representation of non-landslide samples affects the SSRF\u0026rsquo;s landslide susceptibility mapping in the GBA, different proportions of non-landslide samples were randomly selected. Landslide samples, labeled \u0026lsquo;1\u0026rsquo;, were paired with non-landslide samples, labeled \u0026lsquo;0\u0026rsquo;, at ratios ranging from 1:1 to 1:5. These combinations, together with the previously calculated FR values, served as the input for refining the SSRF's predictions. 
Analyses indicate that increasing the proportion of non-landslide samples markedly reduces the number of grids categorized as highly susceptible, while increasing those classified as less susceptible.\u003c/p\u003e \u003cp\u003eWith the different pairing ratios mentioned above, the proportions of historical landslides in very high and high susceptibility areas were 83.37%, 94.74%, 91.79%, 91.79%, and 89.05%, respectively. Among these, the share of historical landslides falling in grids predicted as very high or high susceptibility exceeded 90% at the 1:2, 1:3, and 1:4 ratios, and the corresponding AUC values were 0.924, 0.950, and 0.973. The ROC curves of the five models are shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e(a). For finer statistical comparison, the LSI was divided into 64 intervals. For the SSRF model configurations of 1:2, 1:3, and 1:4, the mean LSI values were 0.375, 0.319, and 0.240, with standard deviations of 0.315, 0.272, and 0.239, respectively. A lower mean LSI coupled with a higher standard deviation signifies reduced predictive uncertainty. Ultimately, a 1:4 landslide to non-landslide sample ratio was found to give the SSRF model its best predictive performance. The SSRF and traditional RF models were also compared in producing the LSM; the optimal ratio of landslide to non-landslide samples for the RF model was determined across the study area, and the LSI was likewise divided into 64 intervals. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e(b), the mean LSI of the SSRF model was 0.241, which is higher than the RF's mean value of 0.180. However, the SSRF had a considerably larger standard deviation of 0.239, compared with 0.149 for the RF. 
The AUC value of the SSRF model at the optimal ratio of landslide to non-landslide samples is 0.973, which is 0.194 higher than the 0.779 obtained by the RF model with its optimal parameters. Therefore, we believe that SSRF, with its higher AUC value, can more accurately predict landslides and generate the LSM. The importance of the ten evaluated factors, computed with the optimal SSRF ratio, is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e(c). Distance to roads and elevation emerged as the most important factors in landslide susceptibility modeling, with mean decrease Gini values of 90.8 and 74.8, respectively. In particular, road distance was identified as a salient variable in the SSRF and generalized additive models. Shorter distances to roads correlated with higher settlement density, which amplifies landslide risks to life and property. Consequently, preventive measures, including the strategic design and systematic layout of roads, supported by vigilant traffic management, are needed to reduce landslide risks. Timely road maintenance, such as routine inspections and prompt repair of fissures, depressions, and drainage obstructions, helps to curtail landslide occurrence. Additionally, placing barriers at susceptible slopes and displaying warning signs prominently along critical sections can substantially reduce landslide-induced traffic accidents, protecting both motorists and pedestrians.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eSubsequently, the LSM can be obtained by inputting the optimal parameters derived above and the DEM of the GBA (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e). The region was divided into five distinct susceptibility zones. 
These zones range from very low susceptibility, constituting 42.10% of the area, to very high susceptibility, which occupies only 8.98% of the area. Notably, the very high susceptibility zone, although representing a minor fraction of the total area, encompasses 74.94% of the landslide occurrences, attesting to the model\u0026rsquo;s precision in identifying zones of critical concern. Consistent with the data in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, the spatial distribution of these zones shows that very high susceptibility is associated with low elevation, moderate slopes, close proximity to roads or rivers, east-facing slopes, and zones of active construction. The observed spatial patterns reflect a complex interplay of geomorphological and anthropogenic influences: lower elevations coupled with moderate inclines favor the formation of layered soil structures, and proximity to infrastructural developments heightens vulnerability to erosional forces, leading to destabilization of the slopes. These anthropogenic modifications can adversely affect soil cohesion. Conversely, areas at higher altitudes in the GBA tend to exhibit lower susceptibility because of their reduced exposure to fluvial erosion, inherently lower hydrological accumulation capacity, and relative seclusion from human-induced alterations.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e4.4 Comparison of rainfall threshold warning and dynamic landslide hazard warning\u003c/h2\u003e \u003cp\u003eThe training dataset consisted of 80% of the landslide events (380 events). Among these, 135 events were classified under category R3, while a significant portion exceeded the defined critical levels in categories R1 and R2. 
The probability of successful landslide prediction under the rainfall threshold model is 67.2%. With respect to dynamic landslide hazard warning, 120 events triggered emergency-level warnings and 131 activated alert-level warnings. Compared with the rainfall threshold warning, the prediction success rate increased by 22.1 percentage points, reaching 89.2%. Detailed verification revealed that several events that were not successfully predicted were influenced by factors other than rainfall, such as anthropogenic slope alterations and infrastructural deficiencies. These deficiencies include unfortified road slopes, absence of drainage facilities in retaining structures, and a general lack of preventive maintenance protocols. Further validation was conducted using the remaining 20% of landslide data, resulting in the successful prediction of 63 landslides, a success rate of 65.6%. Although this success rate was similar to that of the training set, the rainfall threshold model assigned only 4 events to R1 and 14 events to R2. This made it difficult to distinguish landslide occurrence conditions under heavy rainfall and severe storms. In contrast, while the dynamic landslide hazard warning model identified only 67 of the 96 landslides, 14 of them were at the emergency level and 34 were at the warning level. This more accurately reflects the occurrence of landslides under heavy rainfall.\u003c/p\u003e \u003cp\u003eThe performance of the two warning systems was compared using data from a rainfall event that occurred on 31 August 2018. Although the average rainfall recorded at each landslide location did not exceed 30 mm for that day, cumulative precipitation over the preceding three and seven days was substantial. Consequently, a total of 37 landslides were triggered on that day. 
Figure\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e(a) illustrates the day's rainfall warnings, which are categorized into three distinct levels: R2, R3, and R4. Of the landslides, 29 fell within the R2 region, predominantly around Huizhou and Zhaoqing, and 8 within the scattered R3 region. Figure\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e(b) analyses the incident data, showing that landslide hazard alerts were more precise in determining severity than rainfall threshold warnings. Of the 37 landslide events, the warning system placed 24 at the emergency level, 11 at the alert level, and only 2 at the attention level.\u003c/p\u003e \u003cp\u003eNevertheless, both models were able to predict the location of the landslides. According to the rainfall threshold warning model, there was no R1 rainfall anywhere in the GBA, and the highest warning level that could be issued was R2. However, on 31 August 2018, a dense cluster of landslides occurred across the GBA, which was not in line with what the rainfall warning led us to expect. In contrast, the dynamic landslide warning model classified 24 of the 37 landslides at the highest warning level, i.e., the emergency level, providing warnings that align with actual conditions.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e4.5 Performance of dynamic landslide hazard warning system\u003c/h2\u003e \u003cp\u003eThe successive landslide events in the GBA from August 15 to August 20, 2013 were chosen to verify the performance of the landslide hazard warning system. A total of 35 landslides were recorded within this six-day period, with the majority occurring in the municipality of Huizhou, located in the eastern sector of the GBA, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e. 
Among these events, the dynamic landslide warning model correctly predicted 28 landslide events, with an accuracy of 74.3%. Emergency warnings were issued for seven of the 28 events that were correctly predicted. Notably, a smooth spatial gradient was observed: the warning level rises as the distance to the identified landslide points decreases, so the locations nearest to the landslides receive the highest level of alert, commensurate with their increased risk of being affected.\u003c/p\u003e \u003cp\u003eHowever, on August 16, 2013, three landslides occurred in a part of the eastern GBA that was assigned the alert level, while no landslides occurred in the emergency-level area farther east. Two explanations may account for this anomaly. First, the landslide catalog may lack hourly updates, which raises the possibility that a nighttime rainstorm on August 16, 2013 triggered the landslides but was not captured by the daily rainfall measurements. This temporal mismatch between rainfall and catalog updates could affect warning accuracy, especially for sudden events. Second, the study used 0.1\u0026deg; data from MSWEP, which has a finer spatial resolution than conventional 0.5\u0026deg; satellite data. However, in some areas MSWEP achieves its finer resolution through spatial interpolation, which can introduce unavoidable inaccuracies. These deviations may have affected the system's forecasts, which could explain the difference between the actual and expected landslide occurrences.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"5 Discussion","content":"\u003cp\u003eThis study has utilized a landslide catalog and precipitation datasets to shape the landslide hazard warning framework for the GBA. 
The catalog classifies events by primary components, such as nominal location information, time, and trigger, and by secondary elements that encompass the type and relative size of the event, latitude and longitude, and impact information. However, incorporating landslides of different sizes into a single database requires a calibrated approach to accommodate the diversity in precipitation profiles, which poses challenges in modeling efforts. The relationship between landslide size and rainfall intensity is a complex issue that significantly impacts the accuracy of early warning systems (Kirschbaum et al. 2010). When landslides coincide with other natural disasters, such as floods and tropical storms, it becomes challenging to isolate landslide-specific impact metrics. To improve data accuracy, it is necessary to clearly distinguish landslides from other disasters, considering their distinctiveness in terms of casualties and geographical extent. In multi-hazard scenarios, it is important to ensure that landslide incidents are not conflated with other hazards, as this can lead to underrepresentation in disaster reporting and compromise data robustness for early warning model calibration (Guzzetti 2000).\u003c/p\u003e \u003cp\u003eRegional landslide catalogs typically contain precise spatial coordinates and timing data. Where necessary, field mapping and remote sensing interpretation can be used to corroborate and update detailed information on the shape characteristics and spatial location of landslides (Lacroix et al. 2019; Rapstine et al. 
2020).\u003c/p\u003e \u003cp\u003eLandform databases may not provide sufficient detail about individual landslides, and instead rely on broader classification scales. This can compromise the accuracy of susceptibility models, as the grid dimensions commonly used in LSM may not accurately represent the affected areas. To mitigate such spatial bias, this study employs 12.5 m resolution DEM data instead of 30 m or 90 m resolutions. This choice is suitable because our database contains mainly minor to intermediate landslides. Coarser resolutions may not accurately capture the topographical, hydrological, and surface features that are crucial for precise susceptibility forecasting.\u003c/p\u003e \u003cp\u003eRandom selection of non-landslide samples reduces the need for human intervention. It has been widely adopted in current research for its satisfactory overall predictive accuracy and simplicity of operation (Chen and Zhang 2021). However, areas without a history of landslides may still contain highly susceptible locations. In this case, SSRF may be a better approach because it draws non-landslide samples from areas of low and very low susceptibility, based on the initial susceptibility prediction, which reduces the error associated with the quality of the non-landslide samples during the model training and testing phases. Since landslide areas are in the minority and non-landslide areas are in the majority, increasing the proportion of non-landslide samples brings the model closer to the real quantitative relationship between landslides and non-landslides in the study area (Chang et al. 2023). 
Ultimately, we determined that a landslide to non-landslide ratio of 1:4 was the most consistent ratio for modeling landslide susceptibility in the GBA.\u003c/p\u003e \u003cp\u003eAlthough rain gauge observations are the most direct and precise form of precipitation measurements, numerous studies have highlighted the inadequacies of even dense rain gauge networks in accurately capturing the intensity and spatial distribution of heavy rainfall events. Therefore, it is important to consider other methods of precipitation measurement in addition to rain gauges (Scofield and Kuligowski 2003). To overcome this limitation, the current study integrates the MSWEP product, which is known for its high resolution, high stability and high accuracy of precipitation estimation, particularly at the urban scale, to develop an early warning model (Li et al. 2022; Liu 2023; Mo et al. 2024). Additionally, the threshold model described in Section 3.3 has demonstrated its effectiveness in providing dependable alerts. It is also acknowledged that different satellite products may provide distinct assessments of landslide rainfall thresholds. Therefore, future research should concentrate on assessing the appropriateness of various satellite-based precipitation datasets, such as CMORPH and GPM, in the GBA. Further testing will be undertaken to establish multi-source rainfall thresholds and to explore the feasibility of threshold interoperability to improve the reliability and utility of landslide hazard assessments across different event scenarios (Stanley et al. 2017).\u003c/p\u003e \u003cp\u003eWhile the topography of the Bay Area may initially appear stable, its subsurface composition reveals a high degree of heterogeneity, with marked spatial variation. Urban expansion has led to a marked increase in non-porous surfaces, exacerbating runoff and erosion (Mautner et al. 2020). 
Furthermore, the dynamic nature of urban vegetation cover, influenced by rapid infrastructural development, necessitates constant vigilance. Given the unique geophysical and socio-economic characteristics of the Bay Area, landslide warning methods must accommodate the frequent updates required by the region's evolving road network in order to maintain the predictive accuracy of the hazard assessment model, a key factor in effective disaster prevention (Lizheng et al. 2023). In urban areas, engineered interventions such as drainage systems and retaining structures have been shown to improve slope stability. This makes urban slopes more resilient to landslides than rural slopes facing similar environmental constraints. When assessing hazards, it is important to account for the considerable influence of engineering measures to ensure precise risk prediction and the formulation of effective mitigation strategies.\u003c/p\u003e"},{"header":"6 Conclusions","content":"\u003cp\u003eAfter eliminating redundant and less significant factors, the remaining 10 factors were used to assess landslide susceptibility. Unlike the conventional RF approach, which relies solely on non-landslide samples randomly selected across the entire region, the SSRF method uses such a random selection only to create an initial LSM. Non-landslide samples are then reselected from areas classified as low or very low in susceptibility and serve as inputs for the SSRF model. A 1:4 ratio of landslide to non-landslide samples was found to be effective for the highly urbanized GBA. At this ratio, the SSRF model achieved an AUC value of 0.973, a substantial improvement over the 0.779 AUC value attained by the RF model with randomly selected non-landslide samples. The SSRF model successfully identified 74.94% of known landslides within merely 8.98% of the GBA. 
Spatially, the primary factors contributing to landslides in the GBA are distance to roads, elevation, and slope. Furthermore, satellite rainfall product data enable the calculation of each grid cell's historical maximum daily rainfall, as well as rainfall over three-day and seven-day periods. These measurements are crucial for establishing rainfall thresholds and for assessing the temporal dimension necessary for disaster warning development. The dynamic landslide warning model integrates rainfall thresholds with landslide susceptibility analyses to produce detailed warning cells for regions at extreme risk within highly urbanized areas.\u003c/p\u003e "},{"header":"Declarations","content":"\u003cp\u003e \u003cstrong\u003eConflict of interest\u003c/strong\u003e \u003cp\u003e \u003cb\u003eThe authors have no relevant financial or non-financial interests to disclose.\u003c/b\u003e \u003c/p\u003e\u003cp\u003e\u003cstrong\u003eAcknowledgments\u003c/strong\u003e\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eThe research is financially supported by the National Key R\u0026amp;D Program of China (2021YFC3001000).\u003c/strong\u003e\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAhmed M, Tanyas H, Huser R et al. (2023) Dynamic rainfall-induced landslide susceptibility: a step towards a unified forecasting system. Int J Appl Earth Obs 125:103593\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBrunetti MT, Melillo M, Gariano SL et al. (2021) Satellite rainfall products outperform ground observations for landslide prediction in india. Hydrol Earth Syst Sc 25:3267\u0026ndash;3279\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBrunetti MT, Peruccacci S, Rossi M et al. (2010) Rainfall thresholds for the possible occurrence of landslides in italy. 
Nat Hazard Earth Sys 10:447\u0026ndash;458\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCanavesi V, Segoni S, Rosi A et al. (2020) Different approaches to use morphometric attributes in landslide susceptibility mapping based on meso-scale spatial units: a case study in rio de janeiro (brazil). Remote Sens-Basel 12:1826\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCasagli N, Intrieri E, Tofani V et al. (2023) Landslide detection, monitoring and prediction with remote-sensing techniques. Nat Rev Earth Env 4:51\u0026ndash;64\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChang Z, Huang J, Huang F et al. (2023) Uncertainty analysis of non-landslide sample selection in landslide susceptibility prediction using slope unit-based machine learning models. Gondwana Res 117:307\u0026ndash;320\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen W, Zhang S (2021) Gis-based comparative study of bayes network, hoeffding tree and logistic model tree for landslide susceptibility modeling. Catena 203:105344\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDe Sy V, Schoorl JM, Keesstra SD et al. (2013) Landslide model performance in a high resolution small-scale landscape. Geomorphology 190:73\u0026ndash;81\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDing Q, Shao Z, Huang X et al. (2022) Time-series land cover mapping and urban expansion analysis using openstreetmap data and remote sensing big data: a case study of guangdong-hong kong-macao greater bay area, china. Int J Appl Earth Obs 113:103001\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDou H, He J, Huang S et al. (2023) Influences of non-landslide sample selection strategies on landslide susceptibility mapping by machine learning. 
Geomatics, Natural Hazards and Risk 14:2285719\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFan L, Lehmann P, Or D (2015) Effects of hydromechanical loading history and antecedent soil mechanical damage on shallow landslide triggering. Journal of Geophysical Research: Earth Surface 120:1990\u0026ndash;2015\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFang L, Wang Q, Yue J, Xing Y (2023) Analysis of optimal buffer distance for linear hazard factors in landslide susceptibility prediction. Sustainability-Basel 15:10180\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFroude MJ, Petley DN (2018) Global fatal landslide occurrence from 2004 to 2016. Nat Hazard Earth Sys 18:2161\u0026ndash;2181\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGariano SL, Guzzetti F (2016) Landslides in a changing climate. Earth-Sci Rev 162:227\u0026ndash;252\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGeng S, Zhang H, Xie F et al. (2022) Vegetation dynamics under rapid urbanization in the guangdong\u0026ndash;hong kong\u0026ndash;macao greater bay area urban agglomeration during the past two decades. Remote Sens-Basel 14:3993\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGoudie AS (2016) Quantification of rock control in geomorphology. Earth-Sci Rev 159:374\u0026ndash;387\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGuzzetti F (2000) Landslide fatalities and the evaluation of landslide risk in italy. Eng Geol 58:89\u0026ndash;107\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuang F, Chen J, Liu W et al. (2022) Regional rainfall-induced landslide hazard warning based on landslide susceptibility mapping and a critical rainfall threshold. Geomorphology 408:108236\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuang F, Xiong H, Jiang S et al. 
(2024) Modelling landslide susceptibility prediction: a review and construction of semi-supervised imbalanced theory. Earth-Sci Rev:104700\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuang F, Ye Z, Jiang S et al. (2021) Uncertainty study of landslide susceptibility prediction considering the different attribute interval numbers of environmental factors and different data-based models. Catena 202:105250\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJiang Y, Wang W, Zou L, Cao Y (2024) Regional landslide susceptibility assessment based on improved semi-supervised clustering and deep learning. Acta Geotech 19:509\u0026ndash;529\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJohnston EC, Davenport FV, Wang L et al. (2021) Quantifying the effect of precipitation on landslide hazard in urbanized and non-urbanized areas. Geophys Res Lett 48:e2021GL094038\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJones JN, Boulton SJ, Bennett GL et al. (2021) Temporal variations in landslide distributions following extreme events: implications for landslide susceptibility modeling. Journal of Geophysical Research: Earth Surface 126:e2021JF006067\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKeefer DK, Larsen MC (2007) Assessing landslide hazards. Science 316:1136\u0026ndash;1138\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKim SW, Chun KW, Kim M et al. (2021) Effect of antecedent rainfall conditions and their variations on shallow landslide-triggering rainfall thresholds in south korea. Landslides 18:569\u0026ndash;582\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKirschbaum DB, Adler R, Hong Y et al. (2010) A global landslide catalog for hazard applications: method, results, and limitations. 
Nat Hazards 52:561\u0026ndash;575\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKwong A, Wang M, Lee CF, Law KT (2004) A review of landslide problems and mitigation measures in chongqing and hong kong:: similarities and differences. Eng Geol 76:27\u0026ndash;39\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLacroix P, Araujo G, Hollingsworth J, Taipe E (2019) Self-entrainment motion of a slow‐moving landslide inferred from landsat‐8 time series. Journal of Geophysical Research: Earth Surface 124:1201\u0026ndash;1216\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLacroix P, Handwerger AL, Bi\u0026egrave;vre G (2020) Life and death of slow-moving landslides. Nat Rev Earth Env 1:404\u0026ndash;419\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLee S, Pradhan B (2007) Landslide hazard mapping at selangor, malaysia using frequency ratio and logistic regression models. Landslides 4:33\u0026ndash;41\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi J, Liu Z, Wang R et al. (2022) Analysis of debris flow triggering conditions for different rainfall patterns based on satellite rainfall products in hengduan mountain region, china. Remote Sens-Basel 14:2731\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu Z (2023) Evaluation of rainfall thresholds triggering debris flows in western china with gauged-and satellite-based precipitation measurement. J Hydrol 620:129500\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLizheng D, Hongyong Y, Mingzhi Z, Jianguo C (2023) Research progress on landslide deformation monitoring and early warning technology. Journal of Tsinghua University (Science and Technology) 63:849\u0026ndash;864\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLoche M, Alvioli M, Marchesini I et al. 
(2022) Landslide susceptibility maps of italy: lesson learnt from dealing with multiple landslide types and the uneven spatial distribution of the national inventory. Earth-Sci Rev:104125\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMaraun D, Knevels R, Mishra AN et al. (2022) A severe landslide event in the alpine foreland under possible future climate and land-use changes. Commun Earth Environ 3:87\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMautner MR, Foglia L, Herrera GS et al. (2020) Urban growth and groundwater sustainability: evaluating spatially distributed recharge alternatives in the mexico city metropolitan area. J Hydrol 586:124909\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMerghadi A, Yunus AP, Dou J et al. (2020) Machine learning methods for landslide susceptibility studies: a comparative overview of algorithm performance. Earth-Sci Rev 207:103225\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMo C, Lei X, Mo X et al. (2024) Comprehensive evaluation and comparison of ten precipitation products in terms of accuracy and stability over a typical mountain basin, southwest china. Atmos Res 297:107116\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMondini AC, Guzzetti F, Melillo M (2023) Deep learning forecast of rainfall-induced shallow landslides. Nat Commun 14:2466\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePardeshi SD, Autade SE, Pardeshi SS (2013) Landslide hazard assessment: recent trends and techniques. Springerplus 2:1\u0026ndash;11\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePeruccacci S, Brunetti MT, Gariano SL et al. (2017) Rainfall thresholds for possible landslide occurrence in italy. Geomorphology 290:39\u0026ndash;57\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePiciullo L, Gariano SL, Melillo M et al. 
(2017) Definition and performance of a threshold-based regional early warning model for rainfall-induced landslides. Landslides 14:995\u0026ndash;1008\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRapstine TD, Rengers FK, Allstadt KE et al. (2020) Reconstructing the velocity and deformation of a rapid landslide using multiview video. Journal of Geophysical Research: Earth Surface 125:e2019JF005348\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSchilir\u0026ograve; L, Montrasio L, Mugnozza GS (2016) Prediction of shallow landslide occurrence: validation of a physically-based approach through a real case study. Sci Total Environ 569:134\u0026ndash;144\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eScofield RA, Kuligowski RJ (2003) Status and outlook of operational satellite precipitation algorithms for extreme-precipitation events. Weather Forecast 18:1037\u0026ndash;1051\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSegoni S, Lagomarsino D, Fanti R et al. (2015) Integration of rainfall thresholds and susceptibility maps in the emilia romagna (italy) regional-scale landslide warning system. Landslides 12:773\u0026ndash;785\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSegoni S, Piciullo L, Gariano SL (2018) A review of the recent literature on rainfall thresholds for landslide occurrence. Landslides 15:1483\u0026ndash;1501\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStanley T, Kirschbaum DB, Huffman GJ, Adler RF (2017) Approximating long-term statistics early in the global precipitation measurement era. Earth Interact 21:1\u0026ndash;10\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTan Y, Guo D, Xu B (2015) A geospatial information quantity model for regional landslide risk assessment. Nat Hazards 79:1385\u0026ndash;1398\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang Y, Gao G, Zhai J et al. 
(2023) Evolution characteristics of the rainstorm disaster chains in the guangdong\u0026ndash;hong kong\u0026ndash;macao greater bay area, china. Nat Hazards:1\u0026ndash;22\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang Y, Zhai J, Gao G et al. (2022) Risk assessment of rainstorm disasters in the guangdong\u0026ndash;hong kong\u0026ndash;macao greater bay area of china during 1990\u0026ndash;2018. Geomatics, Natural Hazards and Risk 13:267\u0026ndash;288\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXin Y, Lu N, Jiang H et al. (2021) Performance of era5 reanalysis precipitation products in the guangdong-hong kong-macao greater bay area, china. J Hydrol 602:126791\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXing Y, Chen Y, Huang S et al. (2023) Research on the uncertainty of landslide susceptibility prediction using various data-driven models and attribute interval division. Remote Sens-Basel 15:2149\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYoussef B, Bouskri I, Brahim B et al. (2023) The contribution of the frequency ratio model and the prediction rate for the analysis of landslide risk in the tizi n'tichka area on the national road (rn9) linking marrakech and ouarzazate. Catena 232:107464\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYu X, Zhang K, Song Y et al. (2021) Study on landslide susceptibility mapping based on rock\u0026ndash;soil characteristic factors. 
Sci Rep-Uk 11:15476\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhou B, Zeng H, Zhao L, Han Z (2023) Climate change and climate risks in the guangdong-hong kong-macau greater bay areaAnnual Report on Actions to Address Climate Change (2019) Climate Risk Prevention.Springer,pp 173\u0026ndash;193\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Non-Landslide sample, Semi-supervised random forest, Landslide susceptibility map, Rainfall threshold, Dynamic landslide warning","lastPublishedDoi":"10.21203/rs.3.rs-5684743/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5684743/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eLandslides are a significant global geological hazard, with adverse and far for human life, the economy and the natural environment on an annual basis worldwide. Accurately estimating the spatial and temporal distribution of landslide probability is crucial for reducing these losses. Nevertheless, existing landslide warning systems may fail to consider the selection of non-landslide samples and the dynamic process of landslides, potentially compromising the accuracy of landslide warning systems. This study explores the impact of different selections of non-landslide samples and satellite rainfall datasets on the early warning model for landslides in the Guangdong-Hong Kong-Macao Greater Bay Area (GBA). Through Pearson correlation analysis, critical factors associated with landslide occurrences were identified, including elevation, slope, aspect, distance to roads and rivers, soil type, plan curvature, profile curvature, Topographic Wetness Index (TWI), and Normalized Difference Vegetation Index (NDVI). In this study, a semi-supervised random forest (SSRF) model incorporating frequency ratios (FR) to evaluate landslide susceptibility in the GBA. 
The susceptibility and rainfall threshold model were subsequently combined into a dynamic landslide hazard warning system through a matrix approach. The findings revealed that the maximum area under the curve (AUC) value for a landslide to non-landslide ratio of 1:4 is 0.973. The very high susceptibility zone is typically located between 125 and 250 meters away from roads. Moreover, the validation phase yielded successful predictions for 67 out of 96 landslide events, thereby providing effective early warning and a reference point for disaster mitigation and prevention.\u003c/p\u003e","manuscriptTitle":"Advanced Landslide Early Warning System Based on a Semi-supervised Model in Highly Urbanized Areas across China's Greater Bay Area","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-01-24 13:45:51","doi":"10.21203/rs.3.rs-5684743/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"2b2e1d99-c565-4e4b-85bb-5052f92198a8","owner":[],"postedDate":"January 24th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2025-04-07T12:57:44+00:00","versionOfRecord":[],"versionCreatedAt":"2025-01-24 13:45:51","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-5684743","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5684743","identity":"rs-5684743","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}