AI-Driven Hotspot Detection and Program Performance Analysis of Schistosomiasis in Africa
Research Square, Research Article (preprint)
Akinjide S. Anifowose, Tomilayo O. Fadairo, Temitope E. Olajide, and 4 more
This is a preprint; it has not been peer reviewed by a journal. Version 1 posted.
https://doi.org/10.21203/rs.3.rs-8905314/v1
This work is licensed under a CC BY 4.0 License.

Abstract
Schistosomiasis is one of the leading neglected tropical diseases in sub-Saharan Africa, with endemic transmission foci and marked subnational heterogeneity despite repeated mass drug administration (MDA). This study developed an AI-driven framework to identify transmission hotspots, characterize endemicity risk, and evaluate programme performance using 70,372 Admin2-year observations from endemic African countries.
The methodology combined machine learning clustering, supervised hotspot prediction, geographic stratification, and spatiotemporal trend modelling. K-means clustering (k = 4) revealed structurally imbalanced epidemiological patterns: one dominant cluster contained 65.9 percent of districts, indicating systemic similarity in coverage and endemicity, while small atypical clusters suggested extreme-risk or high-performance settings. A random forest predicted hotspots with 88 percent accuracy and an AUC of 0.80, indicating good discriminatory ability. However, extreme class imbalance produced high recall for hotspots (0.99) but near-zero recall for non-hotspots, exposing methodological challenges in elimination-phase modelling. Combined risk stratification showed that most districts fall into a moderate-priority category rather than an extreme high-risk one. Spatiotemporal analysis yielded an overall negative mean endemicity trend (mean slope = -7.59), but a median slope of zero indicated widespread stability with substantial inter-district variability. These results suggest that progress toward elimination is discontinuous and geographically embedded. The study offers a hybrid AI stratification architecture that integrates clustering and predictive modelling to support targeted intervention planning and optimal resource allocation. The findings highlight the importance of modular, evidence-based, and geographically tailored elimination measures across Africa.
Keywords: Computational Biology; Infectious Diseases; Schistosomiasis; Machine Learning; Hotspot Detection; Spatial Epidemiology; Spatiotemporal Modelling; Risk Stratification; Mass Drug Administration; Africa

INTRODUCTION
Schistosomiasis is among the most important parasitic neglected tropical diseases (NTDs) affecting populations in sub-Saharan Africa, where ecological suitability, poverty, inadequate sanitation, and poor access to safe water sustain transmission. Despite years of control efforts based mainly on preventive chemotherapy through mass drug administration (MDA), the disease still shows marked spatial heterogeneity and active transmission foci in several endemic countries. Recent modelling evaluations indicate that progress toward the World Health Organization (WHO) 2030 morbidity elimination targets remains uneven, and that most high-burden areas are unlikely to reach the morbidity-control threshold without supplementary, better-targeted interventions [1,2]. Mass treatment has achieved substantial reductions in the prevalence and intensity of infection in most endemic locations, but these gains have not been distributed equally [3]. In addition to machine learning, spatiotemporal modelling has provided vital information on how schistosomiasis risk evolves across time and space. Model-based geostatistical analyses in Kenya showed that spatial autocorrelation and survey design strongly affect the precision of risk estimates and, ultimately, MDA decisions [4]. Reinfection after MDA, environmental suitability for intermediate snail hosts, human mobility, and irregular treatment uptake have all contributed to the persistence of hotspots despite repeated intervention cycles [5].
Multi-country studies have shown that national-level improvements often conceal substantial subnational disparities, and that new analytical tools are needed to reveal residual pockets of transmission [6,7]. Prevalence-based stratification approaches, which usually rely on threshold classifications derived from cross-sectional surveys, may not adequately capture the nonlinear and dynamic drivers of persistent transmission. A growing number of researchers recommend incorporating machine learning (ML), geospatial modelling, and longitudinal performance indicators into schistosomiasis surveillance and control systems [1,8]. ML techniques can represent complex interactions among environmental, epidemiological, and programmatic variables that traditional regression models may not fully capture. Recent studies demonstrate the usefulness of ML-based predictive modelling for schistosomiasis. Chen et al. used several machine learning models to predict infection intensity across African countries and assess progress toward the WHO Roadmap 2030 goals, finding that prediction accuracy improved substantially when multidimensional indicators were included [1]. Similarly, Li et al. applied machine learning models to simulate schistosomiasis transmission dynamics in Zimbabwe, showing how environmental factors and programmatic coverage interact to shape risk-prone patterns [3]. These results reflect a broader shift from descriptive mapping toward predictive analytics that can support the proactive deployment of interventions.
Recent studies have begun exploring advanced predictive and analytical frameworks to better understand schistosomiasis transmission dynamics and control efficacy, highlighting the value of integrating real-world programmatic data with machine learning methods to improve hotspot detection and intervention planning [13]. Comparisons based on Global Burden of Disease statistics have also shown heterogeneous regional reductions in disability-adjusted life years (DALYs) across African countries, underscoring the importance of subnational prognostic models [6]. Collectively, these studies suggest that AI-based risk profiling, hotspot forecasting, geographic segmentation, and temporal trend modelling can be integrated to improve the performance of schistosomiasis programmes in Africa.

Aim of the study
The aim of the study was to design an AI-based system that detects transmission hotspots, profiles endemicity risk, and assesses the effectiveness of schistosomiasis programmes in Africa using multi-level administrative and time-indexed data, in order to support intervention planning and resource optimization.

Objectives
• To apply machine learning methods to identify and stratify spatial transmission risk patterns of schistosomiasis using administrative-level indicators such as endemicity class, treatment coverage, population at risk, and MDA history.
• To develop and validate machine learning models that predict areas likely to remain hotspots or to experience persistently low coverage, based on historical endemicity, treatment performance, and population-level indicators.
• To integrate ML-derived risk profiles, hotspot predictions, and cluster outputs into a geographic stratification framework that supports targeted MDA planning and resource allocation across Africa.
• To analyze and forecast temporal changes in endemicity and treatment coverage using time-aware machine learning models, with the year variable providing the temporal dimension, in order to support proactive decision-making.

RELATED STUDIES
The increasing use of machine learning tools in schistosomiasis research reflects a growing appreciation of the complexity of transmission systems. Chen et al. showed that ensemble machine learning models, such as random forests and gradient boosting, outperform traditional statistical methods in predicting infection-intensity categories across endemic African countries [1]. Their analysis also found that, without specific efforts to improve programme performance, a substantial number of moderate- to high-risk districts could remain above elimination thresholds. Complementing this work, Singer et al. developed predictive models to identify baseline features associated with communities likely to sustain high prevalence after multiple rounds of MDA [2]. Despite moderate predictive performance, this study demonstrated the feasibility of predicting hotspots by combining epidemiological and environmental data. Machine learning has also been applied to the environmental and ecological determinants of schistosomiasis transmission. Tabo et al. used machine learning to model the distribution of intermediate snail hosts in East Africa, demonstrating the predictive value of climatic and hydrological variables for transmission suitability [9]. Kagabo et al. likewise used ML-based ecological modelling in Rwanda to identify agrochemical exposure and ecological disruption as important predictors of snail abundance [10]. These results support incorporating environmental covariates into hotspot prediction systems.
Spatiotemporal modelling has also advanced understanding of persistent transmission patterns. Okoyo et al. used model-based geostatistical techniques to show that incorporating spatially structured random effects yields much better risk estimates in Kenya [4]. Mbugi et al. analysed national survey data from Tanzania and found substantial subdistrict variation in schistosomiasis prevalence, which they related to coarse administrative stratification [11]. Similarly, Sokolow et al. presented evidence that ecological interventions targeting snail habitats can modify transmission patterns, confirming the interdependence of environmental and programmatic determinants [12]. In the broader body of schistosomiasis research, studies have examined both biological mechanisms and transmission dynamics using traditional epidemiological methods and mathematical modelling [2–4]. However, emerging analyses that use high-dimensional datasets and computational approaches suggest potential for deeper insight into transmission persistence and programme performance, as exemplified by recent preprint work exploring data-driven infectious-risk frameworks on real-world endemic datasets [13]. Temporal trend analyses have also clarified how endemicity patterns change over time. Using Global Burden of Disease data, Peng et al. showed that the burden of schistosomiasis in Africa declined heterogeneously between 1990 and 2021 [6]. Longitudinal studies in Ghana and Senegal found that, even after multiple MDA rounds, residual foci of endemicity may persist where coverage is sporadic or where the risk of reinfection is high [7,14].
These results underscore the need to integrate programmatic performance measures, such as treatment coverage, effective treatment coverage, and population at risk, into predictive modelling frameworks. Alongside advances in predictive mapping, innovations in diagnostics and surveillance have improved modelling inputs. Environmental DNA (eDNA) surveillance has increased the sensitivity of detecting schistosomes in aquatic environments and provides complementary data for predictive risk modelling [15]. Machine learning image-classification tools have also improved the detection of parasites in microscopy images, increasing diagnostic accuracy and potentially the quality of data used in modelling [16,17]. Broader environmental and demographic changes, including climate change and demographic transitions, can also alter schistosomiasis transmission dynamics. According to Tabo et al., future climate and population changes may substantially expand transmission suitability in parts of the Lake Victoria Basin, underscoring the need for adaptive forecasting frameworks [18]. Similar modelling in Angola and Rwanda showed that incorporating demographic growth and environmental variability into spatiotemporal models improves prediction [19,20]. Modelling syntheses and systematic reviews highlight the same research gaps: Malizia et al. described the difficulty of forecasting long-term elimination trajectories because of reinfection processes and heterogeneous treatment adherence [21]. Altogether, the literature shows substantial advances in machine learning, geostatistical, and spatiotemporal modelling of schistosomiasis risk.
Nevertheless, few studies have combined endemicity classification, treatment coverage indicators, multi-level administrative identifiers, and longitudinal hotspot-persistence modelling in a single AI-driven framework. Closing this gap is necessary to strengthen data-driven geographic stratification and targeted intervention planning across Africa.

MATERIALS AND METHODS
Study Design
This research used a retrospective, multi-country, spatiotemporal analytic design that combined administrative-level epidemiological and programmatic indicators of schistosomiasis in endemic African countries. The analytical model integrated machine learning, predictive modelling, spatial stratification, and temporal forecasting methods to assess transmission risk patterns, identify chronic hotspots, and support intervention planning. In line with the study aim, the analysis comprised four main components: (1) machine learning-based endemicity and transmission risk profiling, (2) predictive modelling of persistent hotspots and programme gaps, (3) AI-based geographic stratification, and (4) spatiotemporal trend modelling.

Data Sources and Study Variables
Administrative-level schistosomiasis programme data were obtained from the data portal of the Expanded Special Project for the Elimination of Neglected Tropical Diseases (ESPEN) [1]. Data for 2014 to 2024 were downloaded at the implementation unit (IU) level, which corresponds to second-level administrative divisions (Admin2) in endemic African countries. The 2014–2024 window was chosen to capture a decade of programme implementation under intensified mass drug administration (MDA), in alignment with the WHO NTD Roadmap 2021–2030 [2].
This window allows meaningful spatiotemporal modelling, assessment of longitudinal hotspot persistence, and measurement of treatment coverage trends over time. Data were downloaded for five African regions, selected to provide continental representation and to capture geographic heterogeneity in transmission dynamics:
• West Africa: Nigeria, Benin, Ghana, and Senegal.
• Central Africa: Cameroon, Democratic Republic of the Congo, Chad, and Central African Republic.
• Eastern Africa: Ethiopia, Kenya, Tanzania (Mainland), and Uganda.
• Southern Africa: Mozambique, South Africa, Zambia, and Zimbabwe.
• North Africa: Mauritania, Algeria, and South Sudan.
Four countries were selected per region, except North Africa, where only three were included because endemic reporting in the ESPEN portal was scarce. Countries were selected on the basis of:
• past schistosomiasis endemicity;
• access to consistent longitudinal IU-level programmatic data;
• reported implementation of MDA programmes;
• representation of the diverse transmission ecologies in Africa.
Together, the selected countries represent high-, moderate-, and residual-endemic settings, allowing comparative modelling of transmission persistence and programme performance. The key variables in the dataset were:
• endemicity (categorical endemicity classification)
• cov (treatment coverage)
• epiCov (epidemiological coverage)
• popReq (population requiring treatment)
• popTreat (population treated)
• mdaScher (mass drug administration schedule)
• effPc (effective preventive chemotherapy coverage)
• epiPc (epidemiological preventive chemotherapy indicator)
• epiEffPc (epidemiological effective preventive chemotherapy coverage)
• year (temporal variable)
To increase modelling capacity, additional derived variables were created, including:
• coverage_ratio (popTreat / popReq)
• a coverage gap term (popReq - popTreat)
• endemicity_lag1, cov_lag1, and effPc_lag1 (one-year lagged values)
• rolling coverage indicators.
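The derived variables listed above can be computed per implementation unit with pandas. The sketch below is illustrative only: it assumes a long-format table with one row per IU and year, a hypothetical identifier column named iu_id, and the variable names used in this section. The three-year rolling window is also an assumption, since the window length is not stated. Note that shift(1) takes the previous available row, so gaps in reporting years would need handling before lags are computed.

```python
import pandas as pd


def add_derived_features(df: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """Add coverage ratio, coverage gap, one-year lags and a rolling coverage mean.

    Assumes one row per implementation unit (hypothetical column 'iu_id') per year.
    """
    # Sort so lags and rolling windows follow calendar order within each IU.
    out = df.sort_values(["iu_id", "year"]).copy()

    # Share of the population requiring treatment that was treated;
    # units with popReq == 0 get NaN instead of a division error or infinity.
    out["coverage_ratio"] = out["popTreat"] / out["popReq"].where(out["popReq"] > 0)

    # Absolute number of people requiring treatment who were not treated.
    out["coverage_gap"] = out["popReq"] - out["popTreat"]

    grouped = out.groupby("iu_id")
    for col in ["endemicity", "cov", "effPc"]:
        # Previous reporting year's value within the same IU (NaN for the first year).
        out[f"{col}_lag1"] = grouped[col].shift(1)

    # Rolling mean of coverage over the last `window` reporting years of each IU.
    out["cov_roll_mean"] = grouped["cov"].transform(
        lambda s: s.rolling(window, min_periods=1).mean()
    )
    return out
```

Because every lag and rolling value is computed inside a group, values from one IU never leak into the first year of the next IU in the sorted table.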
The final dataset comprised 70,372 Admin2-year observations from 2014 to 2024.

Data Availability Statement
The administrative-level schistosomiasis dataset used in this study was obtained from the Schistosomiasis Control (SCH) programme database. The data were downloaded from the official database portal and are publicly accessible subject to the database's terms of use. Processed datasets generated during the analysis are available from the corresponding author upon reasonable request.

Data Preprocessing
The data were cleaned and preprocessed before modelling. Missing values were assessed and imputed using statistical procedures appropriate to their missingness patterns. Continuous variables were standardized to improve model stability, and categorical endemicity classes were numerically encoded for modelling. Temporal lag variables were created to capture persistence dynamics and improve predictive modelling of hotspot continuity. Multicollinearity among predictors was tested with variance inflation factors (VIF), and strongly correlated variables were handled to minimize redundancy.

Objective 1: ML-Based Endemicity and Transmission Risk Profiling
Supervised and unsupervised machine learning methods were used to identify and partition spatial patterns of transmission risk.

Supervised Risk Classification
Endemicity classification was used as the primary outcome. Random Forest [22] and Gradient Boosting [23] models were trained to predict endemicity class from the predictor variables (cov, epiCov, popReq, popTreat, mdaScher), and feature importance measures were extracted to identify the main drivers of transmission risk.
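The VIF check mentioned above follows directly from its definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination from regressing predictor j on all other predictors. The NumPy sketch below is illustrative; the VIF cut-off used in the study is not reported, so any threshold (values above 5 or 10 are common rules of thumb) is an assumption. Because VIF is unchanged by centring and scaling when the regression includes an intercept, it can be computed before or after standardization.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """Return VIF_j = 1 / (1 - R_j^2) for every column j of the predictor matrix X.

    R_j^2 comes from an ordinary least-squares regression of column j on all
    other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Design matrix: intercept + every predictor except column j.
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        centred = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centred @ centred)
        # Perfect collinearity (R^2 == 1) gives an infinite VIF.
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs
```

Predictors whose VIF exceeds the chosen threshold would then be candidates for removal or combination.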
Model performance was estimated with cross-validation using accuracy, precision, recall, and F1-score.

Unsupervised Risk Grouping
To derive latent transmission patterns that do not depend on a priori endemicity categories, clustering algorithms (K-means and hierarchical clustering) were applied [24,25]. The optimal number of clusters was chosen using silhouette scores and the elbow method. Clusters were interpreted as transmission typologies and mapped to administrative units to visualize spatial heterogeneity.

Predictive Modelling of Hotspots and Program Gaps
Persistent hotspots were operationally defined as units with consistently high endemicity or consistently low treatment coverage over successive years, and binary outcome variables were created to represent hotspot persistence. Time-aware supervised machine learning models were built to predict hotspot persistence from historical endemicity, coverage indicators (cov, effPc, epiPc, epiEffPc), population-level variables (popReq, popTreat), and temporal lag features. Random Forest, Gradient Boosting, and logistic regression (baseline model) were trained and validated with k-fold cross-validation. Model performance was measured using:
• Receiver Operating Characteristic (ROC) curves
• Area Under the Curve (AUC)
• Precision-recall metrics
• Calibration plots
Feature importance analysis identified the programmatic indicators most strongly associated with persistent hotspots.

AI-Based Geographic Stratification for Targeted Intervention Planning
The outputs of Objectives 1 and 2 were used to design a geographic stratification framework across multi-level administrative units (Admin0 to Admin3).
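The persistent-hotspot outcome for Objective 2 is described only qualitatively (consistently high endemicity or consistently low coverage over successive years). One way to turn that description into a binary label is sketched below with pandas. Every threshold is a hypothetical placeholder rather than the authors' definition: the numeric endemicity code treated as "high", the coverage cut-off, and the number of consecutive years. The iu_id column name is likewise assumed, and consecutive rows within an IU are treated as consecutive years.

```python
import pandas as pd


def label_persistent_hotspots(
    df: pd.DataFrame,
    high_endemicity: int = 3,   # hypothetical numeric code for "high" endemicity
    low_cov: float = 75.0,      # hypothetical coverage cut-off (percent)
    min_years: int = 2,         # hypothetical number of consecutive years
) -> pd.DataFrame:
    """Add a binary 'hotspot_persistent' column (1 = persistent hotspot).

    An IU-year is flagged when endemicity is high OR coverage is low; it is a
    persistent hotspot when the flag has held for at least `min_years`
    consecutive reporting years ending in that year.
    """
    out = df.sort_values(["iu_id", "year"]).copy()
    flag = ((out["endemicity"] >= high_endemicity) | (out["cov"] < low_cov)).astype(int)

    # Each unflagged row starts a new run; run_id is constant within a streak.
    run_id = (1 - flag).groupby(out["iu_id"]).cumsum()
    # Length of the current streak of flagged years within each IU.
    run_len = flag.groupby([out["iu_id"], run_id]).cumsum()

    out["hotspot_persistent"] = (run_len >= min_years).astype(int)
    return out
```

Because a label of this kind for year t depends on year t-1, a predictive model would typically be restricted to features available before year t (for example, the lag variables) to avoid leaking the outcome into the inputs.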
Administrative units were assigned to intervention priority levels according to:
• transmission risk classification
• predicted probability of hotspot persistence
• treatment coverage gaps
Stratification patterns were visualized using spatial clustering techniques and geographic information system (GIS) mapping, and spatial autocorrelation was assessed with Moran's I to detect clustering of high-risk areas [26]. The resulting hierarchy defined decision-support categories to inform resource allocation and strategic MDA planning.

Spatiotemporal Modeling of Schistosomiasis Trends
Time-aware modelling methods were applied to the temporal dynamics of endemicity and treatment coverage. Mixed-effects regression models accounted for the hierarchical administrative structure, with administrative units treated as random effects. Machine learning models that included temporal features were also used to predict short-term changes in endemicity and coverage. ANOVA was used to assess trends and estimate annual rates of change, and rolling averages were calculated to smooth year-to-year variation. Where data were available, projection models generated endemicity curves over horizons of 3 to 5 years. Spatiotemporal interaction terms were tested to determine whether temporal trends differed significantly among geographic units.

Model Validation and Sensitivity Analysis
Model robustness was assessed with cross-validation and, where possible, region-level hold-out validation. Sensitivity analysis was performed by varying the hotspot definition and evaluating the resulting changes in predictive performance.
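The global Moran's I statistic used for the spatial autocorrelation check is I = (n / S0) · (z'Wz) / (z'z), where z holds the deviations of the unit values from their mean, W is an n × n spatial weights matrix with a zero diagonal, and S0 is the sum of all weights. The weights matrix used in the study is not specified, so the binary neighbour matrix below is an assumption, and the toy chain of six units is purely illustrative. Significance is assessed here with a simple permutation test, which is one common choice rather than necessarily the one used by the authors.

```python
import numpy as np


def morans_i(values, weights) -> float:
    """Global Moran's I for n areal units and an n x n spatial weights matrix."""
    x = np.asarray(values, dtype=float)
    W = np.asarray(weights, dtype=float)
    z = x - x.mean()              # deviations from the mean
    s0 = W.sum()                  # sum of all spatial weights
    return float((x.size / s0) * (z @ W @ z) / (z @ z))


def morans_i_permutation_test(values, weights, n_perm=999, seed=0):
    """Observed I and a one-sided pseudo p-value for positive autocorrelation,
    obtained by randomly reassigning the values to units."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)
    observed = morans_i(x, weights)
    permuted = np.array([morans_i(rng.permutation(x), weights) for _ in range(n_perm)])
    p_value = (np.sum(permuted >= observed) + 1) / (n_perm + 1)
    return observed, float(p_value)


def chain_weights(n: int) -> np.ndarray:
    """Binary contiguity weights for n units in a row (toy example only)."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    return W
```

On the six-unit chain, a block pattern such as [1, 1, 1, 0, 0, 0] gives I = 0.6 and an alternating pattern gives I = -1.0; under no spatial autocorrelation the expected value is -1/(n - 1).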
Machine learning models were also compared with traditional regression methods to test whether performance differences could be attributed to AI-based modelling.

Ethical Considerations
The research used secondary, aggregate administrative-level data containing no identifiable information. There was no direct human participation, and under institutional guidelines the analysis of secondary programmatic data did not require ethical approval.

RESULTS AND DISCUSSION
This study analyzed 70,372 Admin2-year observations to address four core objectives: spatial risk clustering, persistent hotspot prediction, intervention prioritization, and spatiotemporal endemicity trends.

Table 1. Classification performance metrics for the persistent hotspot prediction model
Class               Precision   Recall   F1-score   Support
0 (non-hotspot)     0.01        0.00     0.00        1,445
1 (hotspot)         0.89        0.99     0.94       11,843
Accuracy                                 0.88       13,288
Macro average       0.45        0.49     0.47       13,288
Weighted average    0.79        0.88     0.84       13,288

Table 2. Distribution of administrative units across risk clusters
Risk cluster   Number of administrative units
0              12,163
1               9,948
2              46,397
3               1,864
Total          70,372

Table 3. Descriptive statistics of spatiotemporal trend slopes
Statistic                  Value
Count                      70,372
Mean                       -7.586508
Standard deviation         28.378124
Minimum                    -450.000000
25th percentile            -0.109091
Median (50th percentile)   0.000000
75th percentile            0.036364
Maximum                    48.880952

For Objective 1, K-means clustering (k = 4) was applied after standardizing the epidemiological and coverage indicators. The resulting cluster distribution was highly skewed: Cluster 2 contained 46,397 Admin2 units (65.9 percent), Cluster 0 12,163 units (17.3 percent), Cluster 1 9,948 units (14.1 percent), and Cluster 3 only 1,864 units (2.6 percent). This distribution indicates one dominant epidemiological risk profile and a very small high-risk or extreme-profile group.
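The choice of k for the K-means step can be illustrated with a self-contained NumPy sketch: Lloyd's algorithm with k-means++ seeding and several restarts, plus the mean silhouette coefficient s_i = (b_i - a_i) / max(a_i, b_i). This is a didactic stand-in, not the authors' implementation (which is not named), and the blobs in the test are synthetic. The silhouette function builds a full pairwise distance matrix, so for a dataset of this size (70,372 rows, roughly 40 GB as float64) it would have to be run on a subsample or replaced by a library routine.

```python
import numpy as np


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns (labels, inertia) of the best restart."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        # k-means++: each new centre is drawn with probability proportional to the
        # squared distance from the nearest centre already chosen.
        centres = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = ((X[:, None, :] - np.array(centres)[None]) ** 2).sum(-1).min(axis=1)
            probs = d2 / d2.sum() if d2.sum() > 0 else None
            centres.append(X[rng.choice(len(X), p=probs)])
        C = np.array(centres)
        for _ in range(n_iter):
            labels = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(axis=1)
            # Move each centre to the mean of its points (keep it if the cluster is empty).
            new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                              for j in range(k)])
            if np.allclose(new_C, C):
                break
            C = new_C
        # Final assignment consistent with the final centres.
        labels = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(axis=1)
        inertia = ((X - C[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels, best_inertia


def mean_silhouette(X, labels):
    """Mean silhouette coefficient; requires at least two non-empty clusters."""
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None]) ** 2).sum(-1))  # pairwise distances
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton clusters get s_i = 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster, excluding i
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()
```

Scanning k and keeping the value with the highest mean silhouette (for three well-separated synthetic blobs this should be k = 3) complements the elbow plot of inertia against k.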
For Objective 2, a Random Forest classifier was trained to identify persistent hotspots. On 13,288 test observations, the model achieved an overall accuracy of 0.88 (88 percent) and an area under the ROC curve (AUC) of 0.80, indicating good discriminative performance. Performance was, however, uneven across classes. For the non-hotspot class (0; n = 1,445), precision was 0.01 and recall 0.00. For the hotspot class (1; n = 11,843), precision was 0.89 and recall 0.99. The confusion matrix showed 11,706 true positives, 137 false negatives, 1,444 false positives, and only 1 true negative.

For Objective 3, prioritization based on hotspot probability and cluster membership showed that Priority 2 accounted for the largest share of the data (more than 55,000 Admin2 units). Priority 3 comprised about 12,000 units, while Priority 1 and Priority 4 were small (around 2,000 and 1,000 units, respectively). Thus, most areas fall within moderate-risk operational levels rather than extreme high- or low-priority levels.

For Objective 4, the mean slope of the spatiotemporal endemicity trend was -7.59, with a standard deviation of 28.38. The median slope was 0.00, indicating that most Admin2 units remained at a similar endemicity level over the study period. However, extreme negative (minimum -450) and positive (maximum 48.88) slopes show that a few districts experienced large changes in endemicity in either direction.

DISCUSSION
The clustering results reveal a structurally disproportionate epidemiological landscape (Fig. 1 and Table 2). The dominance of Cluster 2 implies that most Admin2 units share similar epidemiological and intervention-coverage features, which may reflect homogenization of programme implementation across districts.
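The per-class figures in Table 1 follow directly from the four confusion-matrix counts reported for Objective 2. The short, dependency-free sketch below recomputes them. It also computes the accuracy of a trivial rule that labels every observation a hotspot, which is simply the hotspot share of the test set (11,843 / 13,288 ≈ 0.891). That baseline is slightly higher than the model's 0.88 accuracy, which is another way of seeing why accuracy is uninformative here; the macro-averaged recall (0.49, equivalent to balanced accuracy) sits close to chance level for a binary task.

```python
def binary_report(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Per-class precision/recall/F1 plus accuracy, macro and weighted averages.

    Class 1 (hotspot) is the positive class; class 0 is the non-hotspot class.
    """
    def prf(correct: int, predicted: int, actual: int):
        precision = correct / predicted if predicted else 0.0
        recall = correct / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    n = tp + fn + fp + tn
    support0, support1 = tn + fp, tp + fn          # actual class sizes
    class1 = prf(tp, tp + fp, support1)            # predicted 1 = TP + FP
    class0 = prf(tn, tn + fn, support0)            # predicted 0 = TN + FN
    return {
        "class0": class0,
        "class1": class1,
        "accuracy": (tp + tn) / n,
        "macro": [(a + b) / 2 for a, b in zip(class0, class1)],
        "weighted": [(a * support0 + b * support1) / n for a, b in zip(class0, class1)],
        "all_hotspot_baseline": support1 / n,      # accuracy of always predicting 1
    }


report = binary_report(tp=11706, fn=137, fp=1444, tn=1)
for key, value in report.items():
    print(key, value)
```

Rounded to two decimals, the output reproduces every row of Table 1.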
Nevertheless, the very small Cluster 3 (2.6 percent) points to pockets of atypical epidemiological patterns, which may reflect extreme vulnerability or unusually high-performing settings. Such small clusters are especially important in the elimination setting because they may represent either chronic transmission sites or areas of exceptional programme performance. The random forest model showed good overall discrimination (AUC = 0.80), meaning that the selected predictors (coverage indicators, lag variables, and effective coverage measures) carry substantial information about hotspot status (Fig. 4 and Table 1). The evaluation metrics, however, reveal strong effects of class imbalance. The model correctly identifies 99 percent of hotspots (class 1) but almost entirely fails to identify non-hotspots (class 0). The confusion matrix in Fig. 6 confirms this bias: only 1 non-hotspot was correctly predicted. Because the dataset is dominated by persistent hotspots, the model has learned to classify almost all observations as hotspots, which maximizes accuracy. Accuracy is therefore high (88 percent) but inflated by the class distribution, and the macro-average F1 of 0.47 describes performance more faithfully. Because AUC is threshold-independent, the model can rank observations reasonably well while still misclassifying nearly all non-hotspots at its default decision threshold. In Fig. 2, the prioritization structure shows that most Admin2 units belong to Priority 2, indicating widespread moderate risk rather than sharply delineated risk categories. Very few units were classified as Priority 1, implying that extreme high-risk convergence (high hotspot probability combined with a high-risk cluster) is rare. This suggests that persistent hotspots do not necessarily coincide with the extreme epidemiological clusters.
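Priority 1 is described as the convergence of a high hotspot probability with membership of a high-risk cluster, but the full rule set is not published. The NumPy sketch below shows one hypothetical way to combine the two inputs into four levels. The probability cut-offs, the set of high-risk clusters, and the definitions of Priorities 2 to 4 are all illustrative assumptions, not the authors' scheme.

```python
import numpy as np


def assign_priority(hotspot_prob, cluster, high_risk_clusters, p_high=0.8, p_low=0.2):
    """Hypothetical four-level priority rule (1 = highest priority).

    Priority 1: high hotspot probability AND a high-risk cluster.
    Priority 2: exactly one of the two conditions holds.
    Priority 3: neither holds, but the hotspot probability exceeds p_low.
    Priority 4: everything else.
    """
    prob = np.asarray(hotspot_prob, dtype=float)
    high_p = prob >= p_high
    high_c = np.isin(np.asarray(cluster), list(high_risk_clusters))
    # np.select returns the choice for the first condition that is True.
    conditions = [high_p & high_c, high_p | high_c, prob > p_low]
    return np.select(conditions, [1, 2, 3], default=4)


# Five example units; cluster 3 is treated as high-risk purely for illustration.
print(assign_priority([0.9, 0.9, 0.5, 0.5, 0.1], [3, 0, 3, 0, 0], {3}))  # [1 2 2 3 4]
```

Ordering the conditions from most to least severe lets np.select resolve overlaps without nested if statements, and the cut-offs can be varied in a sensitivity analysis.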
The priority distribution also suggests that resources should not be concentrated on a small group of districts but allocated through a broader, stratified operational model. The spatiotemporal trend analysis provides finer insight (Fig. 3). The negative mean slope (-7.59; Table 3) indicates an overall reduction in endemicity during the study period. The median of zero, however, shows that this decline was not uniform: most districts had stable endemicity, and the negative mean is driven by a minority of districts with large decreases. The large standard deviation and extreme values indicate heterogeneous programme effects. Some districts changed dramatically, perhaps reflecting effective intervention campaigns, whereas others increased or stayed the same, indicating lingering transmission foci or programme gaps. SHAP interaction plots in Fig. 6 showed that interaction values for cov and effPc were tightly clustered around zero, with no substantial dispersion toward positive or negative extremes. This pattern indicates weak second-order interaction effects between treatment coverage and effective preventive chemotherapy coverage: the model's predictions are driven primarily by the independent (main) effects of these variables rather than by their interaction. This finding supports the robustness of coverage indicators as standalone predictors in hotspot detection and suggests limited nonlinear dependency between programmatic coverage metrics in the SCH database. Taken together, the findings suggest that although endemicity may be declining at the aggregate level, it remains geographically and structurally concentrated. The clustering, prediction, and trend analyses all point to non-linear, heterogeneous elimination dynamics.
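The gap between the mean slope (-7.59) and the median slope (0) in Table 3 is what a heavy left tail produces: most units are flat, and a few large declines pull the mean down. The sketch below assumes that each unit's trend is an ordinary least-squares slope of endemicity on year (the exact slope estimator is not stated) and summarizes the slopes with the same statistics as Table 3. The toy series are synthetic and chosen only to show the effect.

```python
import numpy as np


def ols_slope(years, values) -> float:
    """Least-squares slope of values on years (change per year); NaN if < 2 distinct years."""
    t = np.asarray(years, dtype=float)
    y = np.asarray(values, dtype=float)
    if np.unique(t).size < 2:
        return float("nan")
    t_c = t - t.mean()
    return float((t_c * (y - y.mean())).sum() / (t_c ** 2).sum())


def summarize_slopes(slopes) -> dict:
    """Count, mean, SD, min, quartiles and max of the non-missing slopes (cf. Table 3)."""
    s = np.asarray(slopes, dtype=float)
    s = s[~np.isnan(s)]
    q25, q50, q75 = np.percentile(s, [25, 50, 75])
    return {"count": int(s.size), "mean": float(s.mean()), "std": float(s.std(ddof=1)),
            "min": float(s.min()), "25%": float(q25), "50%": float(q50),
            "75%": float(q75), "max": float(s.max())}


# Eight flat units, one steep decline (-50/year) and one modest rise (+5/year).
years = [2014, 2015, 2016, 2017, 2018]
series = [[2, 2, 2, 2, 2]] * 8 + [[250, 200, 150, 100, 50], [0, 5, 10, 15, 20]]
summary = summarize_slopes([ols_slope(years, y) for y in series])
print(summary["mean"], summary["50%"])  # -4.5 0.0
```

A single steep decline is enough to make the mean clearly negative while the median, and most of the interquartile range, stays at zero.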
FINDINGS
The study found that the epidemiological landscape is dominated by one large cluster, implying that most Admin2 units share similar profiles. Persistent hotspots account for a large share of observations (about 89 percent of the test set) and strongly shape the behaviour of the predictive models. The predictive model discriminates well, but severe class imbalance limits its reliability in recognizing true non-hotspots. Moderate-risk strata dominate intervention prioritization, and only a few districts fall into the extreme categories. Temporal trend analysis indicates a general decline in endemicity, with high inter-district variation.

COMPARISON WITH EXISTING STUDIES
The performance of the Random Forest model (AUC = 0.80) is consistent with recent machine learning applications for schistosomiasis risk prediction at continental and regional scales [1,2]. As in these studies, coverage indicators and time variables were good predictors of hotspot persistence. Compared with previous literature, however, this paper explicitly shows the effect of class imbalance in elimination-phase data, where high sensitivity to hotspots can mask poor detection of true non-hotspots. The clustering result, one dominant structural group alongside a small atypical cluster, is consistent with documented spatial heterogeneity and focal transmission patterns [3,7]. Whereas earlier analyses mainly mapped prevalence surfaces, this analysis combines unsupervised structural clustering with supervised hotspot prediction, providing a more operational district-level stratification framework. The observed general decrease in endemicity alongside large inter-district variation mirrors continental burden studies reporting aggregate declines with persistent local hotspots [4].
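Class weighting, one of the remedies this paper recommends for the imbalance, is easy to illustrate. Under the widely used "balanced" heuristic (the formula documented for scikit-learn's class_weight="balanced" option), each class c receives weight n_samples / (n_classes × n_c), so both classes contribute equal total weight during training. The NumPy sketch below uses the test-split class sizes from Table 1 purely as an example; in practice the weights would be derived from the training labels, and whether weighting alone restores non-hotspot recall in this dataset is untested.

```python
import numpy as np


def balanced_class_weights(y) -> dict:
    """w_c = n_samples / (n_classes * n_c): rarer classes receive larger weights."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


def to_sample_weights(y, class_weights: dict) -> np.ndarray:
    """Per-observation weights, e.g. for a learner's fit(X, y, sample_weight=...)."""
    return np.array([class_weights[c] for c in np.asarray(y).tolist()])


# Class sizes of the test split in Table 1 (0 = non-hotspot, 1 = hotspot), as an example.
y = np.array([0] * 1445 + [1] * 11843)
w = balanced_class_weights(y)
print(w)  # roughly {0: 4.598, 1: 0.561}
```

With these weights each non-hotspot observation counts about eight times as much as a hotspot observation, which pushes the learner away from the "predict everything as a hotspot" solution seen in Table 1.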
In contrast to ecological modelling studies that have emphasized environmental drivers of transmission [5,6], the present findings highlight the dominant independent role of programmatic coverage indicators in predicting hotspot persistence. Overall, this study complements the existing modelling literature by integrating structural clustering, supervised hotspot prediction, and temporal trend analysis into a single prioritization system for sub-national elimination planning.

CONTRIBUTION TO KNOWLEDGE
The paper contributes to the growing literature on spatial epidemiology and elimination strategy in three main ways. First, it combines unsupervised clustering with supervised hotspot detection to offer a hybrid risk-stratification framework that captures both structural similarity and dynamic persistence. Second, it demonstrates the methodological consequences of class imbalance in elimination settings where persistent transmission dominates the epidemiological landscape. Third, it presents a data-driven prioritization schema that integrates hotspot probability with structural cluster membership, providing an actionable framework for targeting interventions at sub-national scales.

CONCLUSION
The results show that schistosomiasis endemicity remains spatially entrenched, with hotspot dynamics concentrated in particular areas. Aggregate endemicity may appear to be declining; however, stability in most districts and extreme variability in others indicate that elimination efforts need to be spatially differentiated. Predictive modelling is promising for identifying high-risk locales but must address class imbalance carefully. The risk clustering shows that epidemiological heterogeneity remains a defining characteristic of sub-national disease landscapes.
All in all, elimination strategies should move beyond uniform mass treatment toward data-driven, adaptive, and geographically sensitive intervention frameworks.

RECOMMENDATIONS
To improve non-hotspot detection, future work should incorporate methods that correct class imbalance, such as SMOTE or class-weighted modelling. Trend interpretation could be further improved with longitudinal mixed-effects modelling. Operational programmes should give due attention to medium-risk districts rather than concentrating only on extreme hotspots. Finally, including environmental, climatic, and socio-economic covariates may strengthen predictive performance and support early-warning capability.

References
1. Chen X, Le J, Hu Y, et al. Predicting schistosomiasis intensity in Africa: a machine learning approach to evaluate progress toward WHO Roadmap 2030. Am J Trop Med Hyg. 2024;111(1):73–79. doi: 10.4269/ajtmh.23-0751.
2. Singer BJ, Coulibaly JT, Park HJ, et al. Development of prediction models to identify hotspots of schistosomiasis in endemic regions to guide mass drug administration. Proc Natl Acad Sci U S A. 2024;121(2):e2315463120. doi: 10.1073/pnas.2315463120.
3. Li H, Zheng J, Midzi N, et al. Schistosomiasis transmission in Zimbabwe: modelling based on machine learning. Infect Dis Model. 2024;9:100–112. doi: 10.1016/j.idm.2024.06.001.
4. Okoyo C, Minnery M, Orowe I, et al. Using model-based geostatistical design for schistosomiasis prevalence surveys in Kenya. Front Trop Dis. 2023;4:1240617. doi: 10.3389/fitd.2023.1240617.
5. Colley DG, Bustinduy AL, Secor WE, King CH. Human schistosomiasis. Lancet. 2014;383(9936):2253–64. doi: 10.1016/S0140-6736(13)61949-2. PMID: 24698483; PMCID: PMC4672382.
6. Peng D, Zhu Y, Liu L, et al. Schistosomiasis burden and trend analysis in Africa: insights from the Global Burden of Disease Study 2021. Trop Med Infect Dis. 2025;10(2):42. doi: 10.3390/tropicalmed10020042.
7. Opare J, Hervie T, Mensah E, et al. Schistosomiasis in Ghana from baseline to now: the impact of fifteen years of interventions. Front Public Health. 2025;13:1554069. doi: 10.3389/fpubh.2025.1554069. PMID: 40547465; PMCID: PMC12179181.
8. Kepha S, Ochol D, Wakesho F, et al. Precision mapping of schistosomiasis and soil-transmitted helminthiasis among school age children at the coastal region, Kenya. PLoS Negl Trop Dis. 2023;17(1):e0011043. doi: 10.1371/journal.pntd.0011043. PMID: 36602986; PMCID: PMC9847902.
9. Tabo Z, Breuer L, Fabia C, et al. Machine learning approach for modeling the occurrence of the major intermediate hosts for schistosomiasis in East Africa. Sci Rep. 2024;14(1):4274. doi: 10.1038/s41598-024-54699-1. PMID: 38383705; PMCID: PMC10881506.
10. Kagabo J, Tabo Z, Kalinda C, et al. Schistosomiasis transmission: a machine learning analysis reveals the importance of agrochemicals on snail abundance in Rwanda. PLoS Negl Trop Dis. 2024;18:e0012345. doi: 10.1371/journal.pntd.0012730.
11. Mbugi NO, Laizer H, Chacha M, Mbega E. Prevalence of human schistosomiasis in various regions of Tanzania Mainland and Zanzibar: a systematic review and meta-analysis of studies conducted for the past ten years (2013–2023). PLoS Negl Trop Dis. 2024;18(9):e0012462. doi: 10.1371/journal.pntd.0012462. PMID: 39250468; PMCID: PMC11412511.
12. Sokolow SH, Wood CL, Jones IJ, et al. To reduce the global burden of human schistosomiasis, use 'old fashioned' snail control. Trends Parasitol. 2018;34(1):23–40. doi: 10.1016/j.pt.2017.10.002. PMID: 29126819; PMCID: PMC5819334.
13. Anifowose A, Oluwaseun TF, Akintola MM. Machine learning–enabled genomic meta-analysis for schistosomiasis surveillance across Nigeria. Research Square [preprint]. 2026. Available from: https://www.researchsquare.com/article/rs-8910278/latest
14. Diop B, Sylla K, Kane NM, et al. Correction: Schistosomiasis control in Senegal: results from community data analysis for optimizing preventive chemotherapy intervention with praziquantel. Infect Dis Poverty. 2024;13(1):49. doi: 10.1186/s40249-024-01217-0. Erratum for: Infect Dis Poverty. 2023;12(1):106. doi: 10.1186/s40249-023-01155-3. PMID: 38918879; PMCID: PMC11197347.
15. Sengupta ME, Hellström M, Kariuki HC, et al. Environmental DNA for improved detection and environmental surveillance of schistosomiasis. Proc Natl Acad Sci U S A. 2019;116(18):8931–8940. doi: 10.1073/pnas.1815046116. PMID: 30975758; PMCID: PMC6500138.
16. Belachew E, Calpotura K, Adamu A, et al. Constructing a predictive model for STH and schistosomiasis classification from microscopic images. Biomed Res Int. 2025;2025:8074581. doi: 10.1155/bmri/8074581. PMID: 41321694; PMCID: PMC12663861.
17. Cure-Bolt N, Perez F, Broadfield LA, et al. Artificial intelligence-based digital pathology for the detection and quantification of soil-transmitted helminths eggs. PLoS Negl Trop Dis. 2024;18(9):e0012492. doi: 10.1371/journal.pntd.0012492. PMID: 39348405; PMCID: PMC11488745.
18. Tabo Z, Wangalwa R, Rwibutso M, et al. Future climate and demographic changes will almost double the risk of schistosomiasis transmission in the Lake Victoria Basin. One Health. 2025;21:101148. doi: 10.1016/j.onehlt.2025.101148. PMID: 40735740; PMCID: PMC12305727.
19. Bartlett AW, Proboste T, Mendes EP, et al. Spatiotemporal analysis of schistosomiasis and soil-transmitted helminth distribution in three highly endemic provinces in Angola. PLoS Negl Trop Dis. 2025;19(4):e0012974. doi: 10.1371/journal.pntd.0012974. PMID: 40198696; PMCID: PMC12013881.
20. Nyandwi E, Osei FB, Veldkamp T, Amer S. Modeling schistosomiasis spatial risk dynamics over time in Rwanda using zero-inflated Poisson regression. Sci Rep. 2020;10:19276. doi: 10.1038/s41598-020-76288-8.
21. Malizia V, de Vlas SJ, Roes KCB, Giardina F. Revisiting the impact of Schistosoma mansoni regulating mechanisms on transmission dynamics using SchiSTOP, a novel modelling framework. PLoS Negl Trop Dis. 2024;18(9):e0012464. doi: 10.1371/journal.pntd.0012464.
22. Salman HA, Kalakech A, Steiti A. Random forest algorithm overview. Babylonian Journal of Machine Learning. 2024;2024:69–79. doi: 10.58496/BJML/2024/007.
23. Delgado-Panadero Á, Benítez-Andrades JA, García-Ordás MT. A generalized decision tree ensemble based on the NeuralNetworks architecture: Distributed Gradient Boosting Forest (DGBF). arXiv [preprint]. 2024. Available from: https://arxiv.org/abs/2402.03386
24. Ahmed M, Seraj R, Islam SMS. The k-means algorithm: a comprehensive survey and performance evaluation. Electronics (Switzerland). 2020;9(8):1295. doi: 10.3390/electronics9081295.
25. Wahyudin W, Riza LS, Erlangga E, Al Husaeni DN. Machine learning-based clustering for program learning outcomes in higher education: a systematic review. Brilliance: Research of Artificial Intelligence. 2025;5(1):182–189. doi: 10.47709/brilliance.v5i1.5953.
26. Isnan S, Bin Abdullah AF, Shariff AR, Ishak I, Syed Ismail SN, Appanan MR. Moran's I and Geary's C: investigation of the effects of spatial weight matrices for assessing the distribution of infectious diseases. Geospat Health. 2025;20(1). doi: 10.4081/gh.2025.1277. PMID: 40197607.

Additional Declarations
The authors declare no competing interests.

Supplementary Files
Supplementarylist.docx (Source code)
As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-8905314","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":593357476,"identity":"0e46d75f-e1ac-45a2-b874-26b0cee05896","order_by":0,"name":"Akinjide S. Anifowose","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABIElEQVRIie2QsWrDMBBATwQ8CbQ6ZOgv2AhSCk7zKycM7mJPXToUIggoS4tX9S8MXToKDM1iyJrRnbx08FTcpa1rk01uO3bQW3Tc3eN0B+Bw/EM8IBIwGmJSt6d0PT4zm8JgZgCTsSHUY5JI/EGZS68vj4q3oH9RAkOXdY2RyFlpFqunPtjtmxpv4IxJygO7ch4gJuJBJ8izKhG6SkOJFYTaUI4TU3zRlVlxpEGcqT6AlEihgBRAubEoa8PefMS+83Boywv1mRWseZHiA9ZTyjBlUExKtkSZrPAxlEKC+FasHysHJdn0u3Byr+KNPr6GGp/9WJfetXX9/d1y3mHEGSub7l1dcpZf1W17G63y3fbRt13Zevoef7rkcDgcjl/5Ap+zaWsvVvikAAAAAElFTkSuQmCC","orcid":"https://orcid.org/0000-0003-2105-0325","institution":"Newcastle University","correspondingAuthor":true,"prefix":"","firstName":"Akinjide","middleName":"S.","lastName":"Anifowose","suffix":""},{"id":593357477,"identity":"43e8aba9-5c4a-470b-b368-7a98512cc685","order_by":1,"name":"Tomilayo O. 
Fadairo","email":"","orcid":"https://orcid.org/0009-0006-5028-3792","institution":"Lead City University, Nigeria","correspondingAuthor":false,"prefix":"","firstName":"Tomilayo","middleName":"O.","lastName":"Fadairo","suffix":""},{"id":601999824,"identity":"e6f29984-d65c-4765-b825-65da030751dc","order_by":2,"name":"Temitope E. Olajide","email":"","orcid":"","institution":"Mathematics and Natural Sciences, William V.S. Tubman University, Harper, Liberia, Harper, NGA","correspondingAuthor":false,"prefix":"","firstName":"Temitope","middleName":"E.","lastName":"Olajide","suffix":""},{"id":593357478,"identity":"c5e303a8-3c19-4dce-afad-f6222936c6eb","order_by":3,"name":"Mayode M. Akintola","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABEElEQVRIiWNgGAWjYDACHsYGIGkB5TBIyIHoAw8IaTnAIAHTYmEM1pKAVwtIBUJLRSLIUgZ8Wvh7Drc9/lAhIa/bfsbwwZsKifT5YYcfAm2xk9NtwK5F4mxju8GBMxKG287kGBvOOSORu/F2mgFQS7Kx2QEc1pxnbJM42CbBuO1Ajpk0bxtQy+wEkJYDidtwaJGHarHfdv4NUMs/iXTD2ekf8GoxONsI1pK47QbIlgaJBHnpHPy2GJ4Bqj9zRiJ5241nxYZzjkkYbpDOKTiQYIDbL3Jn0p9JVFTY2G47n7zxwZuaOnn52embP3yosJPD6X0E4DCAOBWs0oCgchBgfwCm5BuIUj0KRsEoGAUjCAAA39tm5afjqVwAAAAASUVORK5CYII=","orcid":"https://orcid.org/0009-0009-3436-4850","institution":"University of Ibadan","correspondingAuthor":true,"prefix":"","firstName":"Mayode","middleName":"M.","lastName":"Akintola","suffix":""},{"id":601999825,"identity":"f07bc165-f6c1-4f00-be0b-02573df6628b","order_by":4,"name":"Olusegun E. Thomas","email":"","orcid":"","institution":"Biology, Tai Solarin University, Nigeria, Ijagun, NGA","correspondingAuthor":false,"prefix":"","firstName":"Olusegun","middleName":"E.","lastName":"Thomas","suffix":""},{"id":601999826,"identity":"c3cb38fe-089c-4e7d-aef7-669d3dc8e60f","order_by":5,"name":"Praise T. 
Oloruntola","email":"","orcid":"","institution":"College of Veterinary Medicine, Federal University of Agriculture, Abeokuta, NGA","correspondingAuthor":false,"prefix":"","firstName":"Praise","middleName":"T.","lastName":"Oloruntola","suffix":""},{"id":601999827,"identity":"662fa404-982f-4418-98f1-9acdebf82c5a","order_by":6,"name":"Oluwatobiloba S. Ogunde","email":"","orcid":"","institution":"College of Animal Science and Livestock Production, Federal University of Agriculture, Abeokuta, NGA","correspondingAuthor":false,"prefix":"","firstName":"Oluwatobiloba","middleName":"S.","lastName":"Ogunde","suffix":""}],"badges":[],"createdAt":"2026-02-18 03:12:35","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-8905314/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8905314/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":104476140,"identity":"c276e4c3-1c7b-4466-bdd9-b6f6ca89b770","added_by":"auto","created_at":"2026-03-12 08:20:42","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":26901,"visible":true,"origin":"","legend":"\u003cp\u003eRisk Cluster Distribution\u003c/p\u003e","description":"","filename":"Figure1.png","url":"https://assets-eu.researchsquare.com/files/rs-8905314/v1/84a24c75cba118b5fdc999d4.png"},{"id":104476139,"identity":"6f81ced7-5369-4eef-abd8-1b1cdb8b9bf2","added_by":"auto","created_at":"2026-03-12 08:20:42","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":36508,"visible":true,"origin":"","legend":"\u003cp\u003eIntervention priority 
Levels\u003c/p\u003e","description":"","filename":"Figure3.png","url":"https://assets-eu.researchsquare.com/files/rs-8905314/v1/82a8eb8aae2d93d820b16dd9.png"},{"id":104780616,"identity":"c1456a43-af32-46f9-a7e9-d3251cc1c74f","added_by":"auto","created_at":"2026-03-17 07:53:24","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":80514,"visible":true,"origin":"","legend":"\u003cp\u003eSpatiotemporal Endemicity Trends per Admin2\u003c/p\u003e","description":"","filename":"Figure4.png","url":"https://assets-eu.researchsquare.com/files/rs-8905314/v1/2455dd9b77af8204bb30f8fa.png"},{"id":104476143,"identity":"0da5eb51-2c2d-4276-9164-37e5039a8fab","added_by":"auto","created_at":"2026-03-12 08:20:42","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":53714,"visible":true,"origin":"","legend":"\u003cp\u003eR0C Curve for persistent hotspot prediction\u003c/p\u003e","description":"","filename":"Figure5.png","url":"https://assets-eu.researchsquare.com/files/rs-8905314/v1/241e56e308e20dae5af81d9d.png"},{"id":104476144,"identity":"eca2c82c-1e96-4a8c-ad42-63bf3e691650","added_by":"auto","created_at":"2026-03-12 08:20:42","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":31629,"visible":true,"origin":"","legend":"\u003cp\u003eConfusion Matrix For persistent hotspot\u003c/p\u003e","description":"","filename":"Figure6.png","url":"https://assets-eu.researchsquare.com/files/rs-8905314/v1/8f89198f314554286d32a743.png"},{"id":104476142,"identity":"79e86e46-5e69-4f23-bb3a-06f98748fed4","added_by":"auto","created_at":"2026-03-12 08:20:42","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":58579,"visible":true,"origin":"","legend":"\u003cp\u003eSHAP interaction 
result\u003c/p\u003e","description":"","filename":"Figure7.png","url":"https://assets-eu.researchsquare.com/files/rs-8905314/v1/6f755e8503b8ba48969fa79b.png"},{"id":104784427,"identity":"6f670778-6c6d-46d3-9ed7-cab713522bd6","added_by":"auto","created_at":"2026-03-17 08:07:29","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1059538,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8905314/v1/40a38e04-aaa7-46eb-9bbf-750f94d29118.pdf"},{"id":104476138,"identity":"026307ba-83e2-486f-b84f-547aa3c20041","added_by":"auto","created_at":"2026-03-12 08:20:42","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":22371,"visible":true,"origin":"","legend":"\u003cp\u003eSource code\u003c/p\u003e","description":"","filename":"Supplementarylist.docx","url":"https://assets-eu.researchsquare.com/files/rs-8905314/v1/22ad4eecdc0cbbdd6f098644.docx"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003e\u003cstrong\u003eAI-Driven Hotspot Detection and Program Performance Analysis of Schistosomiasis in Africa\u003c/strong\u003e\u003c/p\u003e","fulltext":[{"header":"INTRODUCTION","content":"\u003cp\u003eSchistosomiasis is among the most critical parasitic neglected tropical diseases (NTDs) impacting the populations in sub-Saharan Africa with the ecological suitability, poverty, lack of sufficient sanitation, and poor access to safe water supporting the continuation of the disease. The disease still shows a significant spatial heterogeneity and active transmission foci in several countries endemic despite years of control efforts that are majorly based on preventive chemotherapy when mass drug administration (MDA). 
As per recent modelling evaluations, the current improvement to the World Health Organization (WHO) 2030 morbidity elimination objectives still shows uneven developments where most high-burden areas are not likely to reach the desired threshold of morbidity control without supplementary and better-focused interventions [1,2].\u003c/p\u003e \u003cp\u003eMass-treatment efforts have achieved impressive gains and declines in prevalence and intensity of infections in most endemic locations but the effect has not been spread equally. [3]. Besides machine learning, spatiotemporal modelling has also been used to give vital information regarding the development of the risk of schistosomiasis across time and space. The model-based geostatistical analyses of Kenya showed that spatial autocorrelation and survey design have a huge impact on the precision of risk estimates and the ultimate MDA decisions [4]. Reinfection after MDA, environmental fitness of ecological parameters of intermediate hosts, locational mobility, and intermittent treatment use have all played a role in the existence of hotspots despite repeated intervention cycles [5]. Multi-country studies have proven the point that improvements at national levels usually conceal significant disparities at the subnational level, and newer analysis instruments are required to reveal the existence of residual pockets of transmission [6,7].\u003c/p\u003e \u003cp\u003ePrevalence-based stratification approaches, which are usually based on threshold classifications based on cross-sectional surveys, might not be sufficiently able to include nonlinear and dynamic contributors of persistence to transmission. More and more scientists are recommending that machine learning (ML), geospatial modelling and longitudinal performance indicators be incorporated into schistosomiasis surveillance and control systems [1,8]. 
The machine learning techniques present their unique benefits in the ability to fit in complex interactions among environmental, epidemiological and programmatic variables that traditional regression models might not capture fully.\u003c/p\u003e \u003cp\u003eRecent discoveries currently reveal the usefulness of ML-based predictive modelling in the environment of the presence of schistosomiasis. Chen et al. used several machine learning models to predict the level of infection by country and assess the achievement of WHO Roadmap 2030 goals, and it was found that the accuracy of predictions increased significantly with the implementation of the multidimensional indicator [1]. On the same note, Li et al. applied machine learning models to simulate the dynamics of schistosomiasis transmission in Zimbabwe, observing how the environment and programmatic coverage play out to form risk-prone patterns [3]. These results indicate a wider shift between the descriptive mapping to predictive analytics that are able to help in the proactive use of interventions. Recent studies have begun exploring advanced predictive and analytical frameworks to better understand transmission dynamics and control efficacy for schistosomiasis, highlighting the importance of integrating real-world programmatic data with machine learning methods to enhance hotspot detection and intervention planning [13].\u003c/p\u003e \u003cp\u003eThe comparisons based on the Global Burden of Disease statistics also illustrated the findings of heterogeneous reductions in disability-adjusted life years (DALYs) regionally in the African countries, which proves the significance of subnational prognostic models [6]. 
Collectively, these papers indicate that risk profiling through artificial intelligence, hotspots forecasting, geographical segmentation, and temporal trend modeling can integrate and enhance the performance of schistosomiasis programs in Africa.\u003c/p\u003e\n\u003ch3\u003eAim of the study\u003c/h3\u003e\n\u003cp\u003eThe aim of the study is to design an AI-based system to monitor hotspots of transmission, profile the risk of endemicity, and assess the effectiveness of the programs on schistosomiasis in Africa by means of multi-level administrating information and time level-based data to assist in the planning of interventions and optimizing the resources.\u003c/p\u003e \u003cp\u003e \u003cb\u003eObjectives\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eTo apply machine learning methods to identify and stratify spatial transmission risk patterns of schistosomiasis using administrative-level indicators such as endemicity class, treatment coverage, population at risk, and MDA history.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eTo develop and validate machine-learning models that predict areas likely to remain hotspots or experience persistent low coverage based on historical endemicity, treatment performance, and population-level indicators.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eTo integrate ML-derived risk profiles, hotspot predictions, and cluster outputs to generate a geographic stratification framework that supports targeted MDA planning and resource allocation across Africa.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eTo analyze and forecast temporal changes in endemicity and treatment coverage using time-aware machine learning models, supporting proactive decision-making. 
The year variable helps the temporal dimension.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eRELATED STUDIES\u003c/h2\u003e \u003cp\u003eIncreasing use of machine learning tools in the study of schistosomiasis is an indicator of an increased understanding of how transmission systems are complex. The research by Chen et al. showed that the ensemble machine learning models, such as random forests and gradient boosting, are more effective than the traditional statistical methods to predict the categories of infection intensity among endemic countries in Africa [1]. Their analysis also found that even with no specific efforts to improve the performance of programmes, a significant number of moderate- to high-risk districts can stay above the elimination extremes. To supplement this, Singer et al. created predictive models to establish baseline features that are connected to communities with a high likelihood of continuing high prevalence after multiple rounds of MDA [2]. Despite the average predictive performance, this study highlighted the plausibility of predicting the occurrence of hotspots through the combination of epidemiological and environmental data.\u003c/p\u003e \u003cp\u003eThere has also been the application of machine learning to environmental and ecological determinants of schistosomiasis transmission. Tabo et al. trained machine learning-based to model the distribution of intermediate snail hosts in East Africa, which shows that predictive value of climatic and hydrological variables to the transmission suitability is provided [9]. Kagabo et al. also determined agrochemical exposure and disruption of ecological patterns as important predictors of snail abundance with the use of ML-based ecological modelling in Rwanda [10]. 
These results solidify the notion of using the environmental covariates into the hotspots prediction systems.\u003c/p\u003e \u003cp\u003eSpatiotemporal modelling has also contributed in the knowledge of perennial patterns of transmission. The article by Okoyo et al. showed through the use of model-based geostatistical techniques that the type of spatially structured random effects yields much better estimates of risk in Kenya [4]. Mbugi et al. examined data on national surveys in Tanzania and showed significant subdistrict variation in schistosomiasis prevalence that was due to coarse administrative stratification [11]. Similarly, Sokolow et al. presented evidence that ecological interventions that affect the habitat of snails have the capability of modifying the patterns of transmission thereby confirming the interdependence between environmental and programmatic determinants [12]. In the broader body of schistosomiasis research, studies have examined both biological mechanisms and transmission dynamics using traditional epidemiological methods and mathematical modelling [2\u0026ndash;4]. However, emerging analyses utilizing high-dimensional datasets and computational approaches suggest potential for deeper insights into transmission persistence and program performance, exemplified by recent preprint work exploring data-driven infectious risk frameworks using real-world endemic datasets. [13].\u003c/p\u003e \u003cp\u003eEndemicity patterns over time have also been clearly explained using the temporal trend analyses. Based on the Global Burden of Disease, Peng et al. showed the heterogeneous trends of the decline in the burden of schistosomiasis in Africa, but these trends varied between the year 1990 and 2021 [6]. 
Subsequent longitudinal studies in Ghana and Senegal found that among multiple repeated MDA rounds, an often incompletely filled focus of endemicity may be left behind in places with sporadic coverage or where newly infected individuals are at high risk of being reinfected [7,14]. The results underscore the need to integrate programmatic measures of performance like coverage of treatment, coverage of effective treatment and population at risk with the predictive modelling framework.\u003c/p\u003e \u003cp\u003eIn addition to predictive mapping innovations, there are diagnostic and surveillance innovative works, which have been used in improving on modelling inputs. The development of the environmental DNA (eDNA) surveillance has enhanced sensitivity of the detection of the presence of the schistosome in aquatic environments and provided a source of complementary data to predictive risk modelling [15]. Image classification tools by machine learning have also enhanced the outbreak of parasites in microscopy images and increased the accuracy of the diabetes and possibly the quality of data used in modeling processes [16,17].\u003c/p\u003e \u003cp\u003eOther economic conditions that can be used to alter the dynamics of the transmission of schistosomiasis include climate change and demographic transitions. According to Tabo et al. the climate and population changes in future may contribute significantly to widening the transmission suitability in certain regions of the Lake Victoria Basin, which explains the necessity of adaptive forecasting frameworks [18]. The same modelling applied to Angola and Rwanda showed that a fortification of demographic growth and environmental variability into spatiotemporal models improves prediction of other factors [19,20].\u003c/p\u003e \u003cp\u003eModelling syntheses and systematic reviews also highlight the same research gaps. Malizia et al. 
have expressed the difficulty of forecasting the long-term removal curves because of the reinfection processes and dissimilar treatment adherence [21]. Altogether, the literature indicates a significant advancement in machine learning, geostatistics, and spatiotemporal modelling of the risk of schistosomiasis. Nevertheless, use of endemicity classification, treatment coverage indicator, multi-level administrative identifier and longitudinal hot spot persistence modelling on a single AI driven framework is restricted. Close this divide is necessary to enhance the data-based geographic stratification and useful highlighted interventions planning throughout Africa.\u003c/p\u003e \u003c/div\u003e"},{"header":"MATERIALS AND METHODS","content":"\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003eStudy Design\u003c/h2\u003e \u003cp\u003eThis research utilized a retrospective, multi-country, and spatiotemporal analytic design that combined both administrative level epidemiological and programmatic indicators of schistosomiasis in endemic countries of Africa. The analysis model integrated machine learning, forecasting modelling, spatial stratification, and time-related forecasting methods in assessing the transmission risk pattern and identifying chronic hotspots and aiding intervention planning. This paper was designed with four main analytical elements consisting of the mentioned study aim: (1) machine learning onto endemicity and transmission risk profiling, (2) predictive modelling of persistent hotspots and program gaps, (3) AI based geographic stratification and (4) spatiotemporal modelling of temporal trend.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eData Sources and Study Variables\u003c/h3\u003e\n\u003cp\u003eThe data on administrative level schistosomiasis programs were accessed through the data portal of the Expanded Special Project for the Elimination of Neglected Tropical Diseases (ESPEN database) [1]. 
Data covering 2014 to 2024 were downloaded at the implementation unit (IU) level, which corresponds to second-level administrative divisions (Admin2) in endemic African countries.\u003c/p\u003e \u003cp\u003eThe 2014\u0026ndash;2024 window was chosen to capture a decade of programmatic implementation under intensified mass drug administration (MDA), in alignment with the WHO NTD Roadmap 2021\u0026ndash;2030 [2]. This window supports meaningful spatiotemporal modelling, assessment of longitudinal hotspot persistence, and measurement of treatment coverage trends over time.\u003c/p\u003e \u003cp\u003eTo obtain continental representation and capture geographic heterogeneity in transmission dynamics, data were downloaded for five African regions:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eWest Africa: Nigeria, Benin, Ghana, Senegal.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eCentral Africa: Cameroon, Democratic Republic of the Congo, Chad, Central African Republic.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eEastern Africa: Ethiopia, Kenya, Tanzania (Mainland), Uganda.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eSouthern Africa: Mozambique, South Africa, Zambia, Zimbabwe.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eNorth Africa: Mauritania, Algeria, South Sudan.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eFour countries were selected per region, except North Africa, where only three were included because few endemicity reports were available in the ESPEN portal.\u003c/p\u003e \u003cp\u003eCountries were selected on the basis of:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003ePast schistosomiasis endemicity,\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eAvailability of consistent
longitudinal IU-level programmatic data,\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eReported implementation of MDA programs,\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eRepresentation of diverse transmission ecologies across Africa.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eTogether, the selected countries span high-, moderate- and residual-endemicity settings, allowing comparative modelling of transmission persistence and program performance.\u003c/p\u003e \u003cp\u003eThe key variables in the dataset were:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eendemicity (categorical endemicity classification)\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003ecov (treatment coverage)\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eepiCov (epidemiological coverage)\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003epopReq (population requiring treatment)\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003epopTreat (population treated)\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003emdaScher (mass drug administration schedule)\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eeffPc (effective preventive chemotherapy coverage)\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eepiPc (epidemiological preventive chemotherapy indicator)\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eepiEffPc (epidemiological effective preventive chemotherapy)\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eyear (temporal variable)\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eTo strengthen the models, additional derived variables were created, including:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003ecoverage_ratio (popTreat/popReq)\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003ecoverage gap (popReq \u0026minus; popTreat)\u003c/p\u003e \u003c/li\u003e 
\u003cli\u003e \u003cp\u003eendemicity_lag1\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003ecov_lag1\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eeffPc_lag1\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003erolling coverage indicators.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThe final dataset comprised 70,372 Admin2-year observations from 2014 to 2024.\u003c/p\u003e\n\u003ch3\u003eData Availability Statement\u003c/h3\u003e\n\u003cp\u003eThe schistosomiasis (SCH) administrative-level dataset used in this study was obtained from the SCH database of the ESPEN data portal. The data were downloaded from the official portal and are publicly accessible subject to its terms of use. Processed datasets generated during the analysis are available from the corresponding author upon reasonable request.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eData Preprocessing\u003c/h2\u003e \u003cp\u003eData were cleaned and preprocessed before modelling. Missing values were assessed and imputed using statistical procedures appropriate to the observed missingness patterns. Continuous variables were standardized to improve model stability. Categorical endemicity classifications were numerically encoded for modelling. Temporal lag variables were constructed to capture persistence dynamics and improve predictive modelling of hotspot continuity. Variance inflation factors (VIF) were used to assess multicollinearity among predictors, and strongly correlated variables were handled to minimize redundancy.\u003c/p\u003e\n\u003ch3\u003eObjective 1: ML-Based Endemicity and Transmission Risk Profiling\u003c/h3\u003e\n\u003cp\u003e
Supervised and unsupervised machine learning methods were used to identify and partition spatial patterns of transmission risk.\u003c/p\u003e \u003cp\u003e \u003cb\u003eSupervised Risk Classification.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eEndemicity classification was the primary outcome. Random Forest [22] and Gradient Boosting [23] models were trained to predict endemicity from the predictor variables (cov, epiCov, popReq, popTreat, mdaScher). Feature importance measures were extracted to identify the main drivers of transmission risk. Model performance was estimated by cross-validation using accuracy, precision, recall and F1-score.\u003c/p\u003e \u003cp\u003e \u003cb\u003eUnsupervised Risk Grouping.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTo identify latent transmission patterns independent of a priori endemicity categories, K-means and hierarchical clustering [24,25] were applied to the standardized epidemiological and coverage indicators. The optimal number of clusters was determined with silhouette scores and the elbow method. Clusters were interpreted as transmission typologies and mapped at the administrative-unit level to visualize spatial heterogeneity.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003ePredictive Modeling of Persistent Hotspots and Program Gaps\u003c/h3\u003e\n\u003cp\u003ePersistent hotspots were operationally defined as units with consistently high endemicity or persistently low treatment coverage across successive years, and hotspot persistence was encoded as a binary outcome variable. Time-aware supervised machine learning models were constructed to forecast hotspot persistence from historical endemicity, coverage indicators (cov, effPc, epiPc, epiEffPc) and population-level variables (popReq, popTreat), together with temporal lag features.
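\u003c/p\u003e \u003cp\u003eThe persistence definition above does not state its exact thresholds. The Python example below is only a minimal sketch. It labels a unit-year as a persistent hotspot when the unit had the highest endemicity code, or coverage below a target, in two consecutive years. The thresholds HIGH_ENDEMICITY and COVERAGE_TARGET, the function persistence_labels and the toy records are hypothetical assumptions for illustration. They are not the study\u0026rsquo;s actual rule or data.\u003c/p\u003e
```python
# Hypothetical persistence rule: the thresholds are illustrative assumptions,
# not values taken from the study.
HIGH_ENDEMICITY = 3      # assumed numeric code of the highest endemicity class
COVERAGE_TARGET = 75.0   # assumed treatment-coverage target, in percent


def persistence_labels(records):
    # records: iterable of (unit_id, year, endemicity_code, coverage_percent)
    by_unit = {}
    for unit, year, endem, cov in records:
        by_unit.setdefault(unit, []).append((year, endem, cov))

    labels = {}
    for unit, rows in by_unit.items():
        rows.sort()  # chronological order within each unit
        for prev, cur in zip(rows, rows[1:]):
            if cur[0] - prev[0] != 1:
                continue  # only label years whose previous year was reported
            high_both = prev[1] >= HIGH_ENDEMICITY and cur[1] >= HIGH_ENDEMICITY
            low_cov_both = COVERAGE_TARGET > prev[2] and COVERAGE_TARGET > cur[2]
            labels[(unit, cur[0])] = int(high_both or low_cov_both)
    return labels


# Invented toy records: (unit, year, endemicity code, coverage %)
toy = [
    ('U1', 2014, 3, 80.0), ('U1', 2015, 3, 82.0), ('U1', 2016, 2, 90.0),
    ('U2', 2014, 1, 60.0), ('U2', 2015, 1, 55.0),
    ('U3', 2014, 2, 80.0), ('U3', 2016, 3, 40.0),  # 2015 missing, so no label
]
labels = persistence_labels(toy)
share = sum(labels.values()) / len(labels)
print(labels)
print('hotspot share:', round(share, 2))
```
\u003cp\u003eBecause each label depends on the previous year, it is closely tied to the lag features (endemicity_lag1, cov_lag1). Varying either threshold is one way to carry out the hotspot-definition sensitivity analysis described under model validation.\u003c/p\u003e \u003cp\u003e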
Random Forest, Gradient Boosting and logistic regression (baseline) models were trained and validated using k-fold cross-validation.\u003c/p\u003e\n\u003cp\u003eModel performance was measured using:\u003c/p\u003e\n\u003cp\u003e\u0026bull; Receiver operating characteristic (ROC) curves\u003c/p\u003e\n\u003cp\u003e\u0026bull; Area under the ROC curve (AUC)\u003c/p\u003e\n\u003cp\u003e\u0026bull; Precision-recall metrics\u003c/p\u003e\n\u003cp\u003e\u0026bull; Calibration plots\u003c/p\u003e\n\u003cp\u003eFeature-importance analysis was used to identify the programmatic indicators most strongly associated with persistent hotspots.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAI-Based Geographic Stratification for Targeted Intervention Planning\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eOutputs of Objectives 1 and 2 were used to build a geographic stratification framework across multi-level administrative units (Admin0\u0026ndash;Admin3). Administrative units were assigned to intervention priority levels according to:\u003c/p\u003e\n\u003cp\u003e\u0026bull; Transmission risk classification\u003c/p\u003e\n\u003cp\u003e\u0026bull; Predicted hotspot persistence probability\u003c/p\u003e\n\u003cp\u003e\u0026bull; Treatment coverage gaps\u003c/p\u003e\n\u003cp\u003eStratification patterns were visualized using spatial clustering techniques and geographic information system (GIS) mapping. Spatial autocorrelation was assessed with Moran\u0026rsquo;s I to detect clustering of high-risk areas [26]. The resulting hierarchical model produced decision-support categories to guide resource allocation and strategic MDA planning.\u003c/p\u003e\n\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\n \u003ch2\u003eSpatiotemporal Modeling of Schistosomiasis Trends\u003c/h2\u003e\n \u003cp\u003eTemporal modelling methods were applied to characterize the dynamics of endemicity and treatment coverage over time.
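\u003c/p\u003e \u003cp\u003eThis modelling relies on the temporal features listed among the derived variables, such as cov_lag1 and the rolling coverage indicators. The Python example below sketches how such features could be built for each administrative unit. The window length, the feature name cov_roll, the function add_temporal_features and the toy data are all assumptions for illustration. In this sketch, a lag is left empty when the preceding calendar year is missing.\u003c/p\u003e
```python
from statistics import mean


def add_temporal_features(records, window=3):
    # records: iterable of (unit_id, year, cov); window is an assumed length
    by_unit = {}
    for unit, year, cov in records:
        by_unit.setdefault(unit, []).append((year, cov))

    features = {}
    for unit, rows in by_unit.items():
        rows.sort()  # chronological order within each unit
        history = []
        prev_year, prev_cov = None, None
        for year, cov in rows:
            history.append(cov)
            # One-year lag only if the previous calendar year was reported
            cov_lag1 = prev_cov if prev_year == year - 1 else None
            # Trailing mean over the last 'window' reported values (not calendar years)
            cov_roll = mean(history[-window:])
            features[(unit, year)] = {'cov_lag1': cov_lag1, 'cov_roll': cov_roll}
            prev_year, prev_cov = year, cov
    return features


toy = [('U1', 2014, 60), ('U1', 2015, 70), ('U1', 2016, 80), ('U1', 2017, 90),
       ('U2', 2014, 50), ('U2', 2016, 40)]
feats = add_temporal_features(toy)
for key in sorted(feats):
    print(key, feats[key])
```
\u003cp\u003eRolling over reported rows rather than calendar years is a simplification. Units with gaps in reporting, such as U2 here, would need an explicit rule.\u003c/p\u003e \u003cp\u003e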
Mixed-effects regression models were used to account for the hierarchical administrative structure, with administrative units treated as random effects. Machine learning models incorporating temporal features were also applied to predict short-term changes in endemicity and coverage. Analysis of variance (ANOVA) was used to assess trends and estimate annual rates of change. Rolling averages were calculated to smooth year-to-year variation. Where data were available, endemicity curves were projected over horizons of 3 to 5 years. Spatiotemporal interaction terms were tested to determine whether temporal trends differed significantly among geographic units.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\n \u003ch2\u003eModel Validation and Sensitivity Analysis\u003c/h2\u003e\n \u003cp\u003eModel robustness was assessed with multiple cross-validation schemes and, where feasible, regional hold-out validation. Sensitivity analyses varied the hotspot definition and evaluated the resulting changes in predictive performance. Machine learning models were compared with traditional regression methods to quantify performance differences attributable to AI-based modelling.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\n \u003ch2\u003eEthical Considerations\u003c/h2\u003e\n \u003cp\u003eThe study used secondary, aggregated administrative-level data containing no identifiable information.
The study involved no direct human participation, and under institutional guidelines, secondary analysis of programmatic data did not require ethical approval.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"RESULTS AND DISCUSSION","content":"\u003cp\u003eThis study analyzed 70,372 Admin2-year observations to address four core objectives relating to spatial risk clustering, persistent hotspot prediction, intervention prioritization, and spatiotemporal endemicity trends.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eClassification Performance Metrics for Persistent Hotspot Prediction Model\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClass\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eF1-Score\u003c/p\u003e 
\u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSupport\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e0 (non-hotspot)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1,445\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1 (hotspot)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e11,843\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eAccuracy\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.88\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e13,288\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eMacro Average\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e 
\u003cp\u003e0.45\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.49\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.47\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e13,288\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eWeighted Average\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.79\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e13,288\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDistribution of Administrative Units Across Risk Clusters\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRisk Cluster\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNumber of Administrative Units\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e12,163\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e9,948\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e46,397\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1,864\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eTotal\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e70,372\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDescriptive Statistics of Spatiotemporal Trend Slopes\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eStatistic\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003eValue\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCount\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e70,372\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMean\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-7.586508\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eStandard Deviation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e28.378124\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMinimum\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-450.000000\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e25th Percentile\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-0.109091\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMedian (50th Percentile)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.000000\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e75th Percentile\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.036364\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMaximum\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e48.880952\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e 
\u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eFor Objective 1, K-means clustering (k\u0026thinsp;=\u0026thinsp;4) was applied after standardization of epidemiological and coverage indicators. The resulting cluster distribution was highly skewed: Cluster 2 contained 46,397 Admin2 units (65.9 percent), Cluster 0 12,163 units (17.3 percent), Cluster 1 9,948 units (14.1 percent) and Cluster 3 only 1,864 units (2.6 percent). This distribution indicates one dominant epidemiological risk profile and a very small high-risk or extreme-profile group.\u003c/p\u003e \u003cp\u003eFor Objective 2, a Random Forest classifier was trained to identify persistent hotspots. On 13,288 test observations, the model achieved an overall accuracy of 0.88 and an area under the ROC curve (AUC) of 0.80, indicating good discriminative performance. Per-class performance, however, was uneven. For the non-hotspot class (0; 1,445 observations), precision was 0.01 and recall 0.00. For the hotspot class (1; 11,843 observations), precision was 0.89 and recall 0.99. The confusion matrix showed 11,706 true positives, 137 false negatives, 1,444 false positives and only 1 true negative.\u003c/p\u003e \u003cp\u003eFor Objective 3, prioritization based on hotspot probability and cluster membership placed the largest share of the data in Priority 2 (more than 55,000 Admin2 units). Priority 3 accounted for about 12,000 units, while Priority 1 and Priority 4 were small (around 2,000 and 1,000 units, respectively). Thus, most areas fall in moderate-risk operational levels rather than at the extremes of high or low priority.\u003c/p\u003e \u003cp\u003eFor Objective 4, the mean slope of the spatiotemporal endemicity trend was \u0026minus;7.59, with a standard deviation of 28.38.
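\u003c/p\u003e \u003cp\u003eThe methods mention ANOVA-based trend assessment but do not spell out how the per-unit slopes summarized in Table 3 were computed. The Python sketch below assumes an ordinary least-squares slope of a numeric endemicity score on year within each unit. It shows how a few steeply declining units can pull the mean slope below zero while the median stays at zero. The function names and the toy records are invented for illustration.\u003c/p\u003e
```python
from statistics import mean, median


def ols_slope(years, values):
    # Least-squares slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
    xbar, ybar = mean(years), mean(values)
    sxx = sum((x - xbar) ** 2 for x in years)
    if sxx == 0:
        return 0.0  # a single reported year carries no trend information
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(years, values))
    return sxy / sxx


def unit_slopes(records):
    # records: iterable of (unit_id, year, endemicity_score)
    series = {}
    for unit, year, value in records:
        series.setdefault(unit, []).append((year, value))
    return {
        unit: ols_slope([p[0] for p in pts], [p[1] for p in pts])
        for unit, pts in series.items()
    }


# Invented toy data: three stable units and one unit with a steep decline
toy = []
for unit, scores in {'A': [3, 3, 3], 'B': [2, 2, 2], 'C': [1, 1, 1], 'D': [3, 2, 1]}.items():
    toy += [(unit, 2014 + i, s) for i, s in enumerate(scores)]

slopes = unit_slopes(toy)
values = list(slopes.values())
print(slopes)
print('mean', mean(values), 'median', median(values))
```
\u003cp\u003eWith three flat units and one declining unit, the mean slope is \u0026minus;0.25 while the median is 0. This is the same qualitative pattern as the reported mean of \u0026minus;7.59 against a median of 0.00.\u003c/p\u003e \u003cp\u003e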
The median slope was 0.00, indicating that most Admin2 units remained at a similar endemicity level over the study period. However, extreme negative (minimum\u0026thinsp;\u0026minus;\u0026thinsp;450) and positive (maximum 48.88) slopes show that a few districts experienced large changes in endemicity in either direction.\u003c/p\u003e"},{"header":"DISCUSSION","content":"\u003cp\u003eThe clustering results reveal a structurally imbalanced epidemiological landscape, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e and Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. The dominance of Cluster 2 implies that most Admin2 units share similar epidemiological and intervention-coverage characteristics, which may reflect homogenization of programmatic implementation across districts. Nevertheless, the very small Cluster 3 (2.6 percent) points to pockets of atypical epidemiological patterns, which may represent either extreme vulnerability or unusually well-controlled settings. Such small clusters are especially important in the elimination setting because they tend to be either chronic transmission sites or areas of unusually high program performance.\u003c/p\u003e \u003cp\u003eThe Random Forest model also showed good overall discrimination (AUC\u0026thinsp;=\u0026thinsp;0.80), indicating that the selected predictors (coverage indicators, lag variables and efficacy measures) carry substantial information about hotspot status, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e and Table \u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. The evaluation metrics, however, reveal strong effects of class imbalance.
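\u003c/p\u003e \u003cp\u003eThe confusion-matrix counts are reported in full: 11,706 true positives, 137 false negatives, 1,444 false positives and 1 true negative. The per-class, macro-averaged and support-weighted metrics in Table 1 can therefore be recomputed directly from them. The Python sketch below uses the standard definitions to do this. As a reference point, it also computes the accuracy of always predicting the majority class. The function names are illustrative only.\u003c/p\u003e
```python
# Recompute Table 1 from the reported confusion-matrix counts.
# Class 1 = hotspot (treated as the positive class), class 0 = non-hotspot.


def prf(tp, fp, fn):
    # Precision, recall and F1 for one class, treating it as positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return [precision, recall, f1]


def classification_summary(tp, fn, fp, tn):
    support1 = tp + fn  # observed hotspots
    support0 = tn + fp  # observed non-hotspots
    total = support0 + support1
    m1 = prf(tp, fp, fn)
    # For class 0 the roles swap: its hits are the TNs, its false alarms
    # are the FNs, and its misses are the FPs.
    m0 = prf(tn, fn, fp)
    return {
        'class0': m0,
        'class1': m1,
        'accuracy': (tp + tn) / total,
        'macro': [(a + b) / 2 for a, b in zip(m0, m1)],
        'weighted': [(a * support0 + b * support1) / total for a, b in zip(m0, m1)],
        # Accuracy of a classifier that always predicts the larger class
        'majority_baseline': max(support0, support1) / total,
    }


summary = classification_summary(tp=11706, fn=137, fp=1444, tn=1)
for name in ('class0', 'class1', 'macro', 'weighted'):
    print(name, [round(v, 2) for v in summary[name]])
print('accuracy', round(summary['accuracy'], 3))
print('majority baseline', round(summary['majority_baseline'], 3))
```
\u003cp\u003eRounded to two decimals, the printed precision, recall and F1 values reproduce Table 1. The majority-class baseline (about 0.891) slightly exceeds the model\u0026rsquo;s accuracy (about 0.881).\u003c/p\u003e \u003cp\u003e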
The model correctly identifies 99 percent of hotspots (class 1) but almost entirely fails to identify non-hotspots (class 0). The confusion matrix in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e confirms this bias, with only 1 correctly predicted non-hotspot. The dataset is dominated by persistent hotspots, and the model has learnt to classify almost every observation as a hotspot, which maximizes accuracy. Accuracy is therefore high (88 percent) but inflated by the class distribution. Hotspots make up 11,843 of the 13,288 test observations, so a classifier that labelled every observation a hotspot would reach about 89 percent accuracy. The macro-averaged F1-score of 0.47 reflects this imbalance more faithfully than overall accuracy.\u003c/p\u003e \u003cp\u003eAs shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, the prioritization structure places most Admin2 units in Priority 2, indicating a broad band of moderate risk rather than sharply delineated risk categories. Very few units fall into Priority 1, implying that extreme high-risk convergence (high hotspot probability combined with membership of a high-risk cluster) is rare. This suggests that persistent hotspots do not necessarily coincide with the extreme epidemiological clusters. The distribution also suggests that resources should not be concentrated on a small group of districts but allocated through a broader, stratified operational model.\u003c/p\u003e \u003cp\u003eThe spatiotemporal trend analysis provides finer insight, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. The negative mean slope (\u0026minus;\u0026thinsp;7.59) in Table \u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e indicates an overall reduction in endemicity during the study period.
The median of zero, however, shows that this decline was not uniform: most districts had stable endemicity, and the negative mean is driven by a minority of districts with large decreases. The large standard deviation and extreme values indicate heterogeneous program effects. Some districts changed dramatically, possibly reflecting effective intervention campaigns, whereas others remained unchanged or increased, indicating lingering transmission foci or program gaps.\u003c/p\u003e \u003cp\u003eSHAP interaction plots in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e showed that the interaction values for cov and effPc were tightly clustered around zero, with no substantial dispersion toward positive or negative extremes. This pattern indicates weak second-order interaction effects between treatment coverage and effective preventive chemotherapy coverage. In practical terms, the model\u0026rsquo;s predictions are driven primarily by the independent (main) effects of these variables rather than by their interactions. This finding supports the robustness of coverage indicators as standalone predictors in hotspot detection and suggests limited nonlinear dependency between programmatic coverage metrics within the SCH database.\u003c/p\u003e \u003cp\u003eTaken together, the findings suggest that although endemicity may be declining in aggregate, it remains geographically and structurally concentrated. The clustering, prediction and trend analyses all point to non-linear and heterogeneous elimination dynamics.\u003c/p\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eFINDINGS\u003c/h2\u003e \u003cp\u003eThe study finds that the epidemiological landscape is dominated by one large cluster, implying that most Admin2 units share similar profiles.
Persistent hotspots make up a large share of observations and strongly shape the behaviour of the predictive models. The predictive model discriminates well, but severe class imbalance limits its reliability in recognizing true non-hotspots. Intervention prioritization is dominated by moderate-risk strata, with only a few districts classified at the extremes. Temporal trend analysis indicates a general decline in endemicity, although with high inter-district variability.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eCOMPARISON WITH EXISTING STUDIES\u003c/h2\u003e \u003cp\u003eThe performance of the Random Forest model (AUC\u0026thinsp;=\u0026thinsp;0.80) aligns well with recent machine learning applications for schistosomiasis risk prediction at continental and regional scales [1,2]. As in these studies, coverage indicators and time variables were good predictors of hotspot persistence. Compared with the previous literature, however, this paper clearly illustrates the effect of class imbalance in elimination-phase data, where high sensitivity to hotspots can mask poor detection of true non-hotspots.\u003c/p\u003e \u003cp\u003eThe clustering results, which show one dominant structural group and a small atypical cluster, are consistent with documented spatial heterogeneity and focal transmission patterns [3,7]. Whereas earlier analyses mainly mapped prevalence surfaces, this analysis combines unsupervised structural clustering with supervised hotspot prediction, providing a more operational district-level stratification framework.\u003c/p\u003e \u003cp\u003eThe observed general decrease in endemicity, together with large inter-district variation, mirrors continental burden studies reporting aggregate declines alongside persistent local hotspots [4].
Ecological modelling studies have emphasized environmental drivers of transmission [5,6], whereas the present findings highlight the dominant independent role of programmatic coverage indicators in predicting hotspot persistence.\u003c/p\u003e \u003cp\u003eOverall, this study complements the existing modelling literature by integrating structural clustering, supervised hotspot prediction and temporal trend analysis into a single prioritization system for sub-national elimination planning.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003eCONTRIBUTION TO KNOWLEDGE\u003c/h2\u003e \u003cp\u003eThe paper contributes to the expanding literature on spatial epidemiology and elimination strategy in three main ways.\u003c/p\u003e \u003cp\u003eFirst, it combines unsupervised clustering and supervised hotspot prediction into a hybrid risk-stratification framework that captures both structural similarity and dynamic persistence.\u003c/p\u003e \u003cp\u003eSecond, it demonstrates the methodological consequences of class imbalance in elimination-phase settings where persistent transmission dominates the epidemiological landscape.\u003c/p\u003e \u003cp\u003eThird, it presents a data-driven prioritization schema that integrates hotspot probability with structural cluster membership, providing an actionable framework for targeting interventions at sub-national scales.\u003c/p\u003e \u003c/div\u003e"},{"header":"CONCLUSION","content":"\u003cp\u003eThe results show that schistosomiasis endemicity remains spatially entrenched, sustained by persistent hotspot dynamics. Although aggregate endemicity appears to be declining, stability in most districts and extreme variability in others indicate that elimination strategies need to be spatially differentiated.
Predictive modelling shows promise for identifying high-risk locales, but class imbalance must be handled with care. Risk clustering confirms that epidemiological heterogeneity remains a defining feature of sub-national disease landscapes.\u003c/p\u003e \u003cp\u003eOverall, elimination strategies should move beyond uniform mass approaches towards data-driven, adaptive, and geographically sensitive intervention frameworks.\u003c/p\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003eRECOMMENDATIONS\u003c/h2\u003e \u003cp\u003eTo improve non-hotspot detection, future work should incorporate class-imbalance corrections such as SMOTE or class-weighted modelling. Longitudinal mixed-effects models could further improve trend interpretation. Operational programmes should prioritize medium-risk districts rather than focusing on extreme hotspots. Finally, including environmental, climatic, and socio-economic covariates may strengthen predictive performance and support early-warning capability.\u003c/p\u003e \u003c/div\u003e"},{"header":"References","content":"\u003col start=\"1\" type=\"1\"\u003e\n\u003cli\u003eChen X, Le J, Hu Y, et al. Predicting schistosomiasis intensity in Africa: a machine learning approach to evaluate progress toward WHO Roadmap 2030. Am J Trop Med Hyg. 2024;111(1):73\u0026ndash;79. \u003cstrong\u003edoi:\u003c/strong\u003e 10.4269/ajtmh.23-0751\u003c/li\u003e\n\u003cli\u003eSinger BJ, Coulibaly JT, Park HJ, et al. Development of prediction models to identify hotspots of schistosomiasis in endemic regions to guide mass drug administration. Proc Natl Acad Sci U S A. 2024;121(2):e2315463120. \u003cstrong\u003edoi:\u003c/strong\u003e 10.1073/pnas.2315463120\u003c/li\u003e\n\u003cli\u003eLi H, Zheng J, Midzi N, et al. Schistosomiasis transmission in Zimbabwe: modelling based on machine learning. 
Infect Dis Model. 2024;9:100\u0026ndash;112. \u003cstrong\u003edoi:\u003c/strong\u003e 10.1016/j.idm.2024.06.001\u003c/li\u003e\n\u003cli\u003eOkoyo C, Minnery M, Orowe I, et al. Using model-based geostatistical design for schistosomiasis prevalence surveys in Kenya. Front Trop Dis. 2023;4:1240617. \u003cstrong\u003edoi:\u003c/strong\u003e 10.3389/fitd.2023.1240617\u003c/li\u003e\n\u003cli\u003eColley DG, Bustinduy AL, Secor WE, King CH. Human schistosomiasis. Lancet. 2014 Jun 28;383(9936):2253\u0026ndash;64. \u003cstrong\u003edoi:\u003c/strong\u003e 10.1016/S0140-6736(13)61949-2. Epub 2014 Apr 1. PMID: 24698483; PMCID: PMC4672382.\u003c/li\u003e\n\u003cli\u003ePeng D, Zhu Y, Liu L, et al. Schistosomiasis burden and trend analysis in Africa: insights from the Global Burden of Disease Study 2021. Trop Med Infect Dis. 2025;10(2):42. \u003cstrong\u003edoi:\u003c/strong\u003e 10.3390/tropicalmed10020042\u003c/li\u003e\n\u003cli\u003eOpare J, Hervie T, Mensah E, et al. Schistosomiasis in Ghana from baseline to now: the impact of fifteen years of interventions. Front Public Health. 2025 Jun 6;13:1554069. \u003cstrong\u003edoi:\u003c/strong\u003e 10.3389/fpubh.2025.1554069. PMID: 40547465; PMCID: PMC12179181.\u003c/li\u003e\n\u003cli\u003eKepha S, Ochol D, Wakesho F, et al. Precision mapping of schistosomiasis and soil-transmitted helminthiasis among school age children at the coastal region, Kenya. PLoS Negl Trop Dis. 2023 Jan 5;17(1):e0011043. \u003cstrong\u003edoi:\u003c/strong\u003e 10.1371/journal.pntd.0011043. PMID: 36602986; PMCID: PMC9847902.\u003c/li\u003e\n\u003cli\u003eTabo Z, Breuer L, Fabia C, et al. Machine learning approach for modeling the occurrence of the major intermediate hosts for schistosomiasis in East Africa. Sci Rep. 2024 Feb 21;14(1):4274. \u003cstrong\u003edoi:\u003c/strong\u003e 10.1038/s41598-024-54699-1. PMID: 38383705; PMCID: PMC10881506.\u003c/li\u003e\n\u003cli\u003eKagabo J, Tabo Z, Kalinda C, et al. 
Schistosomiasis transmission: A machine learning analysis reveals the importance of agrochemicals on snail abundance in Rwanda. PLoS Negl Trop Dis. 2024;18:e0012345. \u003cstrong\u003edoi:\u003c/strong\u003e 10.1371/journal.pntd.0012730\u003c/li\u003e\n\u003cli\u003eMbugi NO, Laizer H, Chacha M, Mbega E. Prevalence of human schistosomiasis in various regions of Tanzania Mainland and Zanzibar: A systematic review and meta-analysis of studies conducted for the past ten years (2013-2023). PLoS Negl Trop Dis. 2024 Sep 9;18(9):e0012462. \u003cstrong\u003edoi:\u003c/strong\u003e 10.1371/journal.pntd.0012462. PMID: 39250468; PMCID: PMC11412511.\u003c/li\u003e\n\u003cli\u003eSokolow SH, Wood CL, Jones IJ, et al. To Reduce the Global Burden of Human Schistosomiasis, Use \u0026apos;Old Fashioned\u0026apos; Snail Control. Trends Parasitol. 2018 Jan;34(1):23\u0026ndash;40. \u003cstrong\u003edoi:\u003c/strong\u003e 10.1016/j.pt.2017.10.002. Epub 2017 Nov 7. PMID: 29126819; PMCID: PMC5819334.\u003c/li\u003e\n\u003cli\u003eAnifowose A, Oluwaseun TF, Akintola MM. Machine learning\u0026ndash;enabled genomic meta-analysis for schistosomiasis surveillance across Nigeria. Research Square [preprint]. 2026. Available from: https://www.researchsquare.com/article/rs-8910278/latest\u003c/li\u003e\n\u003cli\u003eDiop B, Sylla K, Kane NM, et al. Correction: Schistosomiasis control in Senegal: results from community data analysis for optimizing preventive chemotherapy intervention with praziquantel. Infect Dis Poverty. 2024 Jun 25;13(1):49. \u003cstrong\u003edoi:\u003c/strong\u003e 10.1186/s40249-024-01217-0. Erratum for: Infect Dis Poverty. 2023 Nov 27;12(1):106. doi: 10.1186/s40249-023-01155-3. PMID: 38918879; PMCID: PMC11197347.\u003c/li\u003e\n\u003cli\u003eSengupta ME, Hellstr\u0026ouml;m M, Kariuki HC, et al. Environmental DNA for improved detection and environmental surveillance of schistosomiasis. Proc Natl Acad Sci U S A. 
2019 Apr 30;116(18):8931\u0026ndash;8940. \u003cstrong\u003edoi:\u003c/strong\u003e 10.1073/pnas.1815046116. Epub 2019 Apr 11. PMID: 30975758; PMCID: PMC6500138.\u003c/li\u003e\n\u003cli\u003eBelachew E, Calpotura K, Adamu A, et al. Constructing a Predictive Model for STH and Schistosomiasis Classification From Microscopic Images. Biomed Res Int. 2025 Nov 29;2025:8074581. \u003cstrong\u003edoi:\u003c/strong\u003e 10.1155/bmri/8074581. PMID: 41321694; PMCID: PMC12663861.\u003c/li\u003e\n\u003cli\u003eCure-Bolt N, Perez F, Broadfield LA, et al. Artificial intelligence-based digital pathology for the detection and quantification of soil-transmitted helminths eggs. PLoS Negl Trop Dis. 2024 Sep 30;18(9):e0012492. \u003cstrong\u003edoi:\u003c/strong\u003e 10.1371/journal.pntd.0012492. PMID: 39348405; PMCID: PMC11488745.\u003c/li\u003e\n\u003cli\u003eTabo Z, Wangalwa R, Rwibutso M, et al. Future climate and demographic changes will almost double the risk of schistosomiasis transmission in the Lake Victoria Basin. One Health. 2025 Jul 18;21:101148. \u003cstrong\u003edoi:\u003c/strong\u003e 10.1016/j.onehlt.2025.101148. PMID: 40735740; PMCID: PMC12305727.\u003c/li\u003e\n\u003cli\u003eBartlett AW, Proboste T, Mendes EP, et al. Spatiotemporal analysis of schistosomiasis and soil-transmitted helminth distribution in three highly endemic provinces in Angola. PLoS Negl Trop Dis. 2025 Apr 8;19(4):e0012974. \u003cstrong\u003edoi:\u003c/strong\u003e 10.1371/journal.pntd.0012974. PMID: 40198696; PMCID: PMC12013881.\u003c/li\u003e\n\u003cli\u003eNyandwi E, Osei FB, Veldkamp T, Amer S. Modeling schistosomiasis spatial risk dynamics over time in Rwanda using zero-inflated Poisson regression. Sci Rep. 2020;10:19276. 
\u003cstrong\u003edoi:\u003c/strong\u003e 10.1038/s41598-020-76288-8\u003c/li\u003e\n\u003cli\u003eMalizia V, de Vlas SJ, Roes KCB, Giardina F. Revisiting the impact of \u003cem\u003eSchistosoma mansoni\u003c/em\u003e regulating mechanisms on transmission dynamics using SchiSTOP, a novel modelling framework. PLoS Negl Trop Dis. 2024;18(9):e0012464. \u003cstrong\u003edoi:\u003c/strong\u003e 10.1371/journal.pntd.0012464\u003c/li\u003e\n\u003cli\u003eSalman HA, Kalakech A, Steiti A. Random Forest Algorithm Overview. Babylonian Journal of Machine Learning. 2024;2024:69\u0026ndash;79. \u003cstrong\u003edoi:\u003c/strong\u003e 10.58496/BJML/2024/007\u003c/li\u003e\n\u003cli\u003eDelgado-Panadero \u0026Aacute;, Ben\u0026iacute;tez-Andrades JA, Garc\u0026iacute;a-Ord\u0026aacute;s MT. A generalized decision tree ensemble based on the NeuralNetworks architecture: Distributed Gradient Boosting Forest (DGBF). arXiv [preprint]. 2024. Available from: https://arxiv.org/abs/2402.03386\u003c/li\u003e\n\u003cli\u003eAhmed M, Seraj R, Islam SMS. The k-means algorithm: A comprehensive survey and performance evaluation. Electronics (Switzerland). 2020;9(8):1295. \u003cstrong\u003edoi:\u003c/strong\u003e 10.3390/electronics9081295\u003c/li\u003e\n\u003cli\u003eWahyudin W, Riza LS, Erlangga E, Al Husaeni DN. Machine Learning-Based Clustering for Program Learning Outcomes in Higher Education: A Systematic Review. Brilliance: Research of Artificial Intelligence. 2025;5(1):182\u0026ndash;189. \u003cstrong\u003edoi:\u003c/strong\u003e 10.47709/brilliance.v5i1.5953\u003c/li\u003e\n\u003cli\u003eIsnan S, Bin Abdullah AF, Shariff AR, Ishak I, Syed Ismail SN, Appanan MR. 
Moran\u0026apos;s \u003cem\u003eI\u003c/em\u003e and Geary\u0026apos;s \u003cem\u003eC\u003c/em\u003e: investigation of the effects of spatial weight matrices for assessing the distribution of infectious diseases. Geospat Health. 2025 Jan 23;20(1). \u003cstrong\u003edoi:\u003c/strong\u003e 10.4081/gh.2025.1277. Epub 2025 Apr 7. PMID: 40197607.\u003c/li\u003e\n\u003c/ol\u003e\n"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"Newcastle University","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Schistosomiasis, Machine Learning, Hotspot Detection, Spatial Epidemiology, Spatiotemporal Modelling, Risk Stratification, Mass Drug Administration, Africa","lastPublishedDoi":"10.21203/rs.3.rs-8905314/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8905314/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptTitle":"AI-Driven Hotspot Detection and Program Performance Analysis of Schistosomiasis in Africa","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-03-12 08:20:37","doi":"10.21203/rs.3.rs-8905314/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"9e08e75b-cce5-44fc-80b8-8e32ca20ec3d","owner":[],"postedDate":"March 12th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":63139438,"name":"Computational Biology"},{"id":63139439,"name":"Infectious Diseases"}],"tags":[],"updatedAt":"2026-03-12T08:20:37+00:00","versionOfRecord":[],"versionCreatedAt":"2026-03-12 08:20:37","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8905314","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8905314","identity":"rs-8905314","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
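The class-imbalance problem raised in the Comparison with Existing Studies and Recommendations sections (88 per cent accuracy and an AUC of 0.80, yet recall of 0.99 for hotspots and close to zero for non-hotspots) can be made concrete with a small sketch. The counts below are hypothetical and chosen only to mimic that pattern; they are not the study's data, and the helper names (`per_class_recall`, `balanced_accuracy`, `balanced_class_weights`) are illustrative. The weighting rule n_samples / (n_classes × class count) is the heuristic scikit-learn applies for `class_weight="balanced"`, one form of the class-weighted modelling the Recommendations mention.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Recall for each class: correct predictions / true members of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in totals.items()}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recalls, so each class counts equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def balanced_class_weights(y):
    """Weight per class = n_samples / (n_classes * n_samples_in_class)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


# Hypothetical test split (NOT the study's data): 890 hotspot and
# 110 non-hotspot district-years, scored by a model that almost
# always predicts "hotspot".
y_true = ["hotspot"] * 890 + ["non-hotspot"] * 110
y_pred = (["hotspot"] * 881 + ["non-hotspot"] * 9      # true hotspots: 881 hit, 9 missed
          + ["hotspot"] * 109 + ["non-hotspot"] * 1)   # true non-hotspots: 109 missed, 1 hit

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recalls = per_class_recall(y_true, y_pred)

print(f"accuracy           = {accuracy:.3f}")
print(f"recall hotspot     = {recalls['hotspot']:.3f}")
print(f"recall non-hotspot = {recalls['non-hotspot']:.3f}")
print(f"balanced accuracy  = {balanced_accuracy(y_true, y_pred):.3f}")
weights = balanced_class_weights(y_true)
print("balanced weights   =", {label: round(w, 3) for label, w in weights.items()})
```

On this hypothetical split, accuracy is 0.882 and hotspot recall 0.990, while non-hotspot recall is 0.009 and balanced accuracy 0.499, roughly what a classifier that always answers "hotspot" would score. A ranking metric such as AUC can still be high in this situation, so per-class recall or balanced accuracy at the chosen decision threshold is the more informative check. The balanced weights (about 0.562 for hotspots and 4.545 for non-hotspots) upweight the minority class during training; SMOTE instead rebalances by synthesizing minority-class samples and is provided by the separate imbalanced-learn package.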
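The Conclusion's point that aggregate endemicity appears to decline while most districts remain stable can be illustrated by fitting a separate least-squares trend to each district and comparing the mean and median of the resulting slopes. This is a sketch under stated assumptions: the five district series are hypothetical, the helper `ols_slope` is illustrative, and per-district ordinary least squares is used here only as a simple stand-in, since this section does not specify the authors' trend method. The longitudinal mixed-effects models suggested in the Recommendations would instead pool information across districts.

```python
from statistics import mean, median


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical endemicity scores per district (NOT the study's data).
years = [2015, 2016, 2017, 2018, 2019]
districts = {
    "district_A": [12, 12, 12, 12, 12],  # stable
    "district_B": [30, 30, 30, 30, 30],  # stable
    "district_C": [5, 5, 5, 5, 5],       # stable
    "district_D": [40, 30, 20, 10, 0],   # steep decline
    "district_E": [0, 2, 4, 6, 8],       # rising
}

# One trend slope per district, then summarize across districts.
slopes = {name: ols_slope(years, series) for name, series in districts.items()}
slope_values = list(slopes.values())

print("per-district slopes:", slopes)
print("mean slope  :", mean(slope_values))
print("median slope:", median(slope_values))
```

With three stable districts, one rising and one declining steeply, the mean slope is -1.6 units per year while the median is 0.0: a single strongly declining district pulls the average down even though the typical district shows no change, which is why reporting both summaries (and the spread of slopes) gives a more faithful picture of inter-district variability.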