Hydroclimatic Trend Analysis and Machine Learning-Based Water Level Prediction for The Ganges (Padma) River Basin

Authorea preprint. This is a preprint and has not been peer reviewed. Data may be preliminary.
9 September 2025, V1 (latest version)
https://doi.org/10.22541/au.175741407.71397410/v1

Mahfujur Rahman Joy 1* (ORCID 0009-0003-4202-9776), Raied Ahmed Nishat 2
1 Department of Civil and Environmental Engineering, Shahjalal University of Science and Technology, Sylhet-3114, Bangladesh
2 Department of Civil and Environmental Engineering, Shahjalal University of Science and Technology, Sylhet-3114, Bangladesh
Correspondence to: Mahfujur Rahman Joy ([email protected])

Abstract
This study integrates hydroclimatic trend analysis with machine learning to predict future water levels in the Ganges (Padma) River Basin, focusing on Goalondo Station. We applied statistical methods, including the Mann-Kendall test and Sen’s Slope estimator, to identify trends in rainfall, discharge, and temperature, revealing a weak declining trend in rainfall, moderate fluctuations in discharge, and a significant downward trend in temperature. The machine learning models employed for water level prediction included Linear Regression, Support Vector Regression, and Random Forest. Among these, Random Forest demonstrated the best performance, achieving the highest R² values of 0.967 for training and 0.961 for testing. Additionally, it recorded the lowest RMSE values of 0.458 for training and 0.4336 for testing, as well as the lowest MAE values of 0.320 for training and 0.311 for testing. These findings were further supported by cross-validation, which consistently validated Random Forest’s superior performance. This research bridges the gap between trend analysis and predictive modeling, offering new insights into how long-term hydroclimatic shifts influence predictive accuracy. Our findings demonstrate the potential of ML models to enhance water resource management, flood risk assessment, and climate change adaptation strategies in the Ganges-Padma River Basin and similar regions globally.
Keywords: Goalondo Point; EM Algorithm; Water Level; Predictive Modeling; Padma Basin; Baruria Transit; Trend Analysis

1 Introduction
The Ganges (Padma) River Basin, one of the largest and most important river systems in South Asia, is a lifeline for millions of people living in its catchment area. The river supports a diverse range of ecosystems, provides water for agricultural activities, and plays a crucial role in flood mitigation and water management for the surrounding population. However, with the accelerating impacts of climate change, there has been growing concern over shifts in hydroclimatic patterns, which directly affect the river’s discharge, rainfall, and temperature (Eshita et al., 2023). These changes pose critical challenges for water resource management, particularly in regions where seasonal rainfall dictates river flow dynamics and sustains millions of livelihoods. Despite a growing body of research on historical hydroclimatic trends, there remains a crucial gap in translating these trends into predictive models that can inform sustainable decision-making and climate adaptation strategies.

Traditional hydrological studies have extensively analyzed past trends in rainfall, temperature, and discharge to assess climate change impacts on river basins (Lundqvist & Falkenmark, 2010; Whateley et al., 2015). While these studies provide valuable insights, they often lack the predictive capability that future water management strategies require. Forecasting water levels from climate variables remains a major challenge because of the nonlinear interactions between atmospheric temperature, precipitation, and river discharge. Conventional hydrological models struggle to capture these complexities and often fail under non-stationary climate conditions (Belyakova et al., 2022; Bricheno et al., 2021).
Recent advances in machine learning (ML) provide a transformative approach by enabling data-driven modeling of intricate hydroclimatic relationships (Mihel et al., 2024; Shahgedanova, 2021). By incorporating multi-source climate data such as rainfall, atmospheric temperature, and river discharge, ML-based models can dynamically adapt to changing patterns and provide more reliable water level forecasts (Bolan et al., 2024; Hossain et al., 2020). Several studies have demonstrated the effectiveness of ML in predicting water level across different hydrological systems. Khan & Coulibaly (2006) explored the potential of support vector machines (SVM) for long-term water level prediction in Lake Erie. Similarly, Altunkaynak (2007) applied artificial neural networks (ANNs) to predict Lake Van’s temporal water level variations, outperforming traditional time-series models. A study comparing three different watersheds (Mediterranean, Oceanic, and Hemiboreal) tested ANNs, support vector regression (SVR), wavelet-ANN, and wavelet-SVR, and found that SVR-based models gave the best results overall (Karran et al., 2013). Additionally, Ahmed et al. (2022) evaluated six ML algorithms for predicting daily water levels in the Durian Tunggal River, Malaysia, and found Gaussian Process Regression (GPR) to be the most effective, particularly in capturing extreme fluctuations with high precision. Ensemble learning models such as Random Forest have also shown success in forecasting water levels and river discharge, as they account for both linear and non-linear relationships in the data (Belyakova et al., 2022; Mihel et al., 2024). While these studies highlight the success of machine learning in water level prediction, a significant gap remains in integrating hydroclimatic trend diagnostics with predictive analytics, particularly in the Ganges-Padma River Basin. 
Most research treats trend analysis and predictive modeling as separate domains, missing the opportunity to explore how long-term hydroclimatic shifts can influence predictive accuracy. By applying statistical techniques such as the Mann-Kendall test and Sen’s Slope estimator to assess hydroclimatic trends and using machine learning algorithms for water level prediction, this study aims to bridge that gap. This approach will offer valuable insights into how climate variability is affecting the Ganges (Padma) River Basin and will highlight the potential of predictive models to assist in future water resource management and flood risk reduction. This research provides a framework for adapting to changing hydroclimatic conditions and developing effective water level prediction models. As climate patterns become less predictable, this study offers crucial insights for sustainable water management in the Ganges (Padma) Basin and similar river systems globally.

2 Methodology
2.1 Study area
The Ganges (Padma) River Basin, one of the world’s most extensive fluvial systems, spans multiple geopolitical regions, including India, Nepal, China (Tibet), and Bangladesh (Fig. 1). Upon entering Bangladesh at Chapai Nawabganj, the river is locally referred to as the Padma, continuing its course to its confluences with the Brahmaputra and Meghna rivers, where the combined system forms a vast delta. The study area is delineated between 23.5° - 24°50´N latitude and 89.5° - 90.25°E longitude, covering the confluence region that receives upstream discharge from both the Brahmaputra and Meghna rivers. The region is characterized by a heterogeneous landscape, consisting of active floodplains and river tributaries. Hydrologically, the Padma River exhibits significant seasonal variability, with peak discharges during the monsoon (June–September) and reduced flows in the dry season (March–May) (Mirza, 2002).
The discharge data were collected from the Baruria Transit (Station SW91.9L) in Harirampur, Manikganj, a key monitoring station in the region. Rainfall patterns in the region, primarily influenced by the South Asian monsoon, show significant seasonal variation, with occasional extreme precipitation events leading to major flood occurrences. This region has a generally warm climate with notable year-to-year fluctuations, experiencing both cooler and hotter extremes over the years. The climate-induced hydrological shifts in the Ganges (Padma) Basin underscore the region’s vulnerability to changing precipitation regimes, intensified flood-drought cycles, and rising temperatures (Abrar et al., 2024).

Fig. 1. Study area map with station information

2.2 Data
For this study, several hydroclimatic variables were collected to develop a machine learning-based model for predicting water levels in the Padma River. The data span 1983 to 2020 and include rainfall, discharge, and temperature, all of which are key factors influencing water levels and hydrological dynamics in the region. Monthly precipitation data were obtained from the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks - Climate Data Record (PERSIANN-CDR) dataset. This dataset is developed by the Center for Hydrometeorology and Remote Sensing (CHRS) at UC Irvine and provides high-resolution (0.25° × 0.25°) satellite-based rainfall estimates. The data were adjusted using the Global Precipitation Climatology Project (GPCP) monthly products to ensure long-term consistency and accuracy. Discharge and water level data for the study area were collected from the Bangladesh Water Development Board (BWDB) at the Baruria Transit (SW91.9L) station located in Manikganj. This station provides essential data on river flow and is a key monitoring station in the Ganges-Padma River system.
The data collected include monthly mean discharge values that are critical for understanding the hydrological patterns in the region. Temperature data for the study area were derived from NASA’s Prediction of Worldwide Energy Resource (POWER) project. The POWER dataset provides long-term climate records at a 0.5° × 0.625° spatial resolution. The dataset includes global temperature measurements, which are essential for analyzing temperature variations and their effects on river discharge and water level dynamics.

2.3 Handling Missing Data with EM Algorithm
The dataset contains missing values, particularly in the rainfall and discharge variables. To handle them, we used the Expectation-Maximization (EM) algorithm. This method estimates missing data by iteratively updating the values based on the existing data, so incomplete records do not have to be discarded. The EM algorithm works in two steps: the Expectation (E) step estimates the missing values from the current model parameters, and the Maximization (M) step updates the model parameters based on these estimates. The two steps alternate until the algorithm converges, resulting in a complete dataset. Fig. 2 illustrates the kernel density estimation (KDE) plots of rainfall (a) and discharge (b) data before and after EM imputation. The distributions remain nearly identical, indicating that the EM algorithm preserved the shape of the original distributions while filling in missing values (Chen et al., 2019; Gao, 2017).

Fig. 2. Kernel density estimation plots of (a) rainfall and (b) discharge data before and after EM imputation

2.4 Statistical Analysis
For the statistical analysis of the hydroclimatic variables, key descriptive statistics, including the mean, median, standard deviation, and range, were calculated to summarize the central tendencies and variability of the data.
These statistics provided an essential understanding of the dataset’s general characteristics, helping to contextualize the trends observed in the subsequent analyses.

To investigate trends in the data, the Mann-Kendall test and Sen’s Slope estimator were applied. The Mann-Kendall test is a non-parametric rank-based method used to detect monotonic trends in time series data. It tests the null hypothesis that no trend exists against the alternative hypothesis that a trend (either upward or downward) is present. A positive Z-value indicates an increasing trend, while a negative Z-value indicates a decreasing trend. A trend is considered statistically significant if the p-value is below 0.05. This test is particularly useful for hydroclimatic data, as it does not require the assumption of normality (Kendall, 1948; Mann, 1945).

Sen’s Slope estimator is a non-parametric method for quantifying the magnitude of a trend. It calculates the median of the slopes between all pairs of data points, providing a robust estimate of the trend’s magnitude and direction. This method is advantageous as it is resistant to outliers and does not rely on distributional assumptions, making it suitable for the often irregular nature of hydroclimatic data (Othman. Ali & Rashid Abubaker, 2019; Sen, 1968).

2.5 Machine Learning Modeling
Machine learning techniques were employed in this study to predict water levels based on key hydroclimatic variables: rainfall, temperature, and discharge. The predictive models selected for this analysis are Linear Regression, Support Vector Regression (SVR), and Random Forest Regressor. Together, these models cover both linear and non-linear relationships within the data.

2.5.1 Linear Regression
Linear Regression (LR) is a foundational statistical method that assumes a linear relationship between the predictor variables and the target variable.
The model can be expressed by the equation:
\(y=\beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}+\ldots+\beta_{p}x_{p}+\epsilon\) Eq. 1
Where, \(y\) is the predicted water level; \(\beta_{0}\) is the intercept; \(\beta_{1},\beta_{2},\ldots,\beta_{p}\) are the coefficients of the \(p\) predictor variables (\(x_{1},x_{2},\ldots,x_{p}\)) and \(\epsilon\) is the error term. The objective of Linear Regression is to minimize the sum of squared errors, typically done by solving:
\(\widehat{\beta}=\arg\min_{\beta}\sum_{i=1}^{n}\left(y_{i}-\widehat{y_{i}}\right)^{2}\) Eq. 2
Where \(y_{i}\) is the observed value, \(\widehat{y_{i}}\) is the predicted value and \(n\) is the number of observations. While Linear Regression is computationally efficient and interpretable, its primary limitation is the assumption that the relationship between the predictors and the target is strictly linear (Chieu et al., 2024; Mihel et al., 2024).

2.5.2 Support Vector Regression
Support Vector Regression (SVR) is a more flexible model that can capture non-linear relationships through the kernel trick. SVR projects the data into a higher-dimensional space using a kernel function and attempts to fit the data within a margin of tolerance, \(\epsilon\). The objective of SVR is to minimize the following cost function,
\(\frac{1}{2}\parallel w\parallel^{2}+C\sum_{i=1}^{n}\xi_{i}\) Eq. 3
Where, \(\parallel w\parallel^{2}\) ensures the function is as flat as possible, \(C\) is the penalty parameter that controls the trade-off between error and complexity and \(\xi_{i}\) are slack variables that allow the data points to lie outside the \(\epsilon\)-margin.
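To make the SVR cost function above concrete, the following minimal sketch evaluates it for a linear predictor of the form w·x + b. It assumes that the slack of each point is the amount by which its absolute residual exceeds the tolerance (the standard epsilon-insensitive definition, which the text does not spell out). The function name, the toy data, and the values chosen for C and the tolerance are illustrative assumptions, not settings from this study.

```python
import numpy as np


def svr_primal_objective(w, b, X, y, C=1.0, eps=0.1):
    """Evaluate 0.5 * ||w||^2 + C * sum(slack) for a linear predictor w.x + b.

    Assumption: slack_i = max(0, |y_i - (w.x_i + b)| - eps), i.e. points inside
    the eps-tube cost nothing and points outside pay linearly.
    """
    residual = np.abs(y - (X @ w + b))
    slack = np.maximum(0.0, residual - eps)
    objective = 0.5 * float(w @ w) + C * float(slack.sum())
    return objective, slack


# Hypothetical toy data: one predictor, four observations.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.05, 1.0, 2.5, 2.95])
w = np.array([1.0])
b = 0.0

objective, slack = svr_primal_objective(w, b, X, y, C=10.0, eps=0.1)
print("slack per point:", slack)  # only the third point lies outside the tube
print("objective:", objective)
```

Only the third observation falls outside the tolerance band, so it alone contributes to the penalty term; increasing C makes such violations more expensive relative to the flatness term.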
The function predicted by SVR is given by,
\(f\left(x\right)=\langle w,\phi\left(x\right)\rangle+b\) Eq. 4
Where, \(f\left(x\right)\) is the predicted water level, \(\langle w,\phi\left(x\right)\rangle\) is the inner product of the weight vector \(w\) and the transformed input feature \(\phi\left(x\right)\) and \(b\) is the bias term. SVR is particularly useful for datasets with complex, non-linear relationships. It performs well in capturing intricate patterns between the predictors and the target variable, especially where a linear function describes the data poorly. For this reason, it is particularly effective for hydrological prediction tasks (Mihel et al., 2024; Wu et al., 2008).

2.5.3 Random Forest
Random Forest is an ensemble learning method that constructs multiple decision trees during training and combines their individual predictions to improve performance and reduce overfitting. Each decision tree is built using a random subset of the data, and the final prediction is the average of the predictions from all trees. This ensemble approach enhances model robustness, particularly for high-dimensional datasets with many features. The prediction from Random Forest is given by:
\(\widehat{y}=\frac{1}{T}\sum_{t=1}^{T}f_{t}\left(x\right)\) Eq. 5
Where, \(\widehat{y}\) is the predicted water level, \(T\) is the number of trees and \(f_{t}\left(x\right)\) is the prediction of the \(t\)-th tree. Random Forest has several advantages, including its ability to handle both numerical and categorical data, its resistance to overfitting, and its effectiveness in capturing both linear and non-linear relationships between predictors and the target variable. This model is particularly well-suited for complex, dynamic systems like river flow prediction, where multiple variables interact in intricate ways (Belyakova et al., 2022; Mihel et al., 2024).
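As an illustration of how the three models described in this section could be fitted side by side, the sketch below uses scikit-learn on synthetic data. It is not the authors' code: the synthetic rainfall, temperature, and discharge series, the assumed water level relation, the 80/20 split, the feature scaling before SVR, and all hyperparameter values are assumptions made only so that the example runs.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n_months = 456  # hypothetical: 38 years of monthly records

# Synthetic stand-ins for the three predictors (not the study's data).
rainfall = rng.gamma(shape=1.0, scale=180.0, size=n_months)    # mm
temperature = rng.normal(26.0, 4.0, size=n_months)             # deg C
discharge = rng.uniform(1_000.0, 100_000.0, size=n_months)     # m^3/s
X = np.column_stack([rainfall, temperature, discharge])

# Hypothetical non-linear relation between predictors and water level, plus noise.
water_level = (2.0 + 0.6 * np.log(discharge) + 0.001 * rainfall
               + rng.normal(0.0, 0.2, size=n_months))

X_train, X_test, y_train, y_test = train_test_split(
    X, water_level, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    # SVR is sensitive to feature scale, so the inputs are standardized first.
    "Support Vector Regression": make_pipeline(
        StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1)
    ),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: test R^2 = {scores[name]:.3f}")
```

Because the data are synthetic, the printed scores say nothing about the Padma River; the sketch only shows the shared fit and predict workflow that lets the three models be compared on equal terms.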
2.6 K-fold Cross Validation
In machine learning, k-fold cross-validation (CV) is a widely adopted technique to enhance model evaluation. For this study, a 5-fold CV approach was utilized to ensure a reliable assessment of model performance. The dataset was divided into five equal-sized, non-overlapping subsets using stratified sampling. During each iteration, the model was trained on four subsets while the remaining one was used for validation. This process was repeated five times, so that every data point was used for validation exactly once and for training four times. This method balances computational cost against evaluation reliability while managing the bias-variance trade-off (Kohavi, 1995; Wong, 2015).

2.7 Model Evaluation Metrics
The models were trained using 80% of the data and tested on the remaining 20%. This approach ensures that each model is evaluated on its ability to generalize to unseen data. To assess the predictive performance of the developed machine learning models, three standard evaluation metrics were employed: Coefficient of Determination (R²), Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). Additionally, cross-validation was used to further assess model robustness and to detect overfitting, providing a more reliable estimate of the model’s predictive power (Naidu et al., 2023).
\(R^{2}=1-\frac{\sum_{i=1}^{n}\left(y_{i}-\widehat{y_{i}}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2}}\) Eq. 6
\(RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\widehat{y_{i}}\right)^{2}}\) Eq. 7
\(MAE=\frac{1}{n}\sum_{i=1}^{n}\left|y_{i}-\widehat{y_{i}}\right|\) Eq. 8
Where \(y_{i}\) is the observed value, \(\widehat{y_{i}}\) is the predicted value, \(\bar{y}\) is the mean of the observed values and \(n\) is the number of data points (Naidu et al., 2023).

3 Results and Discussion
3.1 Hydroclimatic Statistics
The analysis of rainfall, discharge, and temperature in the Ganges (Padma) River Basin provides insights into the variability and distribution of these critical hydroclimatic factors. Table 1 summarizes the key statistics for each of these variables, including the mean, median, standard deviation, and range, over the study period (1983–2020).

Table 1. Summary statistics of the variables
Statistic             Rainfall (mm)       Discharge (m³/s)      Temperature (°C)
Mean                  181.80              34815.98              26.10
Median                124.09              20473.43              27.79
Standard Deviation    175.30              29650.07              4.32
Skewness              0.74                0.73                  -0.67
Min                   0.00 (1985)         1345.01 (1986)        15.34 (2018)
Max                   704.78 (1998)       99800.00 (1995)       33.42 (1999)
Range                 704.78              98454.99              18.08
Lowest Annual         1446                27134.53 (mean)       24.98 (mean)
Highest Annual        2759                45615.34 (mean)       27.38 (mean)
Sen’s Slope           -0.3945 mm/year     -40.9256 m³/s/year    -0.0426 °C/year
Mann-Kendall (z, p)   (-1.4779, 0.1394)   (-0.9024, 0.3668)     (-3.2043, 0.0014)

The rainfall data exhibits significant variability with a mean of 181.80 mm and a high standard deviation of 175.30 mm, indicating large fluctuations in monthly rainfall. The skewness of 0.74 suggests that while low rainfall values are more common, extreme rainfall events do occur. The maximum monthly rainfall recorded was 704.78 mm in 1998, while the minimum was 0.00 mm in 1985, reflecting dry conditions in some years.

The discharge data shows a mean of 34815.98 m³/s and a standard deviation of 29650.07 m³/s, indicating substantial variability in river flow, particularly influenced by the monsoon season. The range of 98454.99 m³/s (from 1345.01 m³/s to 99800.00 m³/s) reflects the large differences between high-flow and low-flow conditions. The skewness of 0.73 suggests that higher discharge values, although significant, are less frequent.
The maximum discharge of 99800.00 m³/s occurred in 1995; the maximum monthly rainfall, by contrast, was recorded in 1998 (Table 1), so the two peaks do not fall in the same year.

The temperature data shows a mean of 26.10°C and a standard deviation of 4.32°C, reflecting moderate fluctuations over time. The skewness of -0.67 indicates a longer tail toward cooler values, with most monthly temperatures lying above the mean (the median is 27.79°C). The range of 18.08°C (from 15.34°C to 33.42°C) shows the seasonal variations in temperature, with extremes occurring in 1999 for the highest temperature and 2018 for the lowest.

The Mann-Kendall test and Sen’s Slope estimator were applied to assess the long-term trends in rainfall, discharge, and temperature over the study period (1983–2020). For rainfall, the Mann-Kendall test produced a Z value of -1.4779 with a p-value of 0.1394, indicating a slight negative trend. However, since the p-value is greater than 0.05, this trend is not statistically significant. The Sen’s Slope estimate of -0.3945 mm/year suggests a minor decrease in rainfall over the study period that cannot be distinguished from no trend. The discharge data yielded a Z value of -0.9024 and a p-value of 0.3668, suggesting no significant trend in discharge over the study period. The Sen’s Slope of -40.9256 m³/s/year indicates a slight decline in discharge, but again, this trend is not statistically significant. For temperature, the Mann-Kendall test produced a Z value of -3.2043 with a p-value of 0.0014, indicating a statistically significant downward trend. The Sen’s Slope of -0.0426°C/year quantifies this cooling trend, suggesting a consistent decrease in temperature over the study period.

3.2 Model Evaluation
The performance of the three predictive models (Linear Regression, Support Vector Regression, and Random Forest) for water level prediction in the Ganges (Padma) River Basin is presented in Table 2 across several metrics.

Table 2.
Model Performance
Model                       R² (train)  R² (test)  RMSE (train)  RMSE (test)  MAE (train)  MAE (test)  CV*
Linear Regression           0.912       0.892      0.754         0.727        0.580        0.569       0.914
Support Vector Regression   0.961       0.958      0.476         0.4488       0.322        0.304       0.967
Random Forest               0.967       0.961      0.458         0.4336       0.320        0.311       0.968
* The header of the last column did not survive extraction; it is presumably the cross-validation R².

The Linear Regression model demonstrated a solid fit to the data, with an R² value of 0.912 for training and 0.892 for testing. However, the predicted vs actual plot (Fig. 3a) reveals that the model follows the general trend but exhibits a higher spread, especially in extreme values. This spread indicates some overestimation and underestimation, suggesting that Linear Regression may not fully capture non-linear relationships in the data. The RMSE values (0.754 for training and 0.727 for testing) and MAE values (0.580 and 0.569) show that the model’s errors are moderate, but higher than those of the other models.

The SVR model performed well, with R² values of 0.961 (training) and 0.958 (testing), indicating a good fit to the data. However, the predicted vs actual plot (Fig. 3b) suggests that the predictions flatten at higher values, which indicates that the model may generalize poorly to extreme conditions. This could be due to the need for better hyperparameters or a different kernel. The RMSE values (0.476 for training and 0.4488 for testing) and MAE values (0.322 and 0.304) are lower than those of Linear Regression, but the model still struggles with extreme values.

Fig. 3. Predicted vs Actual Plot; (a) Linear Regression, (b) SVR, (c) Random Forest

Random Forest exhibited the best performance overall, with R² values of 0.967 (training) and 0.961 (testing). The predicted vs actual plot (Fig. 3c) shows that Random Forest provides more accurate predictions, particularly for non-linear relationships. The model fits the data better than Linear Regression, and its RMSE values (0.458 for training and 0.4336 for testing) are the lowest of the three models. Its MAE values (0.320 for training and 0.311 for testing) are close to those of SVR: lower on the training set, but marginally higher on the testing set (0.311 versus 0.304). Some spread is still visible for extreme values, suggesting that the model struggles with rare events.

Fig. 4.
Residual Plot; (a) Linear Regression, (b) SVR, (c) Random Forest

The residual plots in Fig. 4 reveal additional insights into the models’ performances. Linear Regression residuals show a clear pattern, indicating that the model is not capturing the non-linearity of the data. Larger residuals are present for higher water levels, suggesting that Linear Regression underperforms when predicting extreme values. The residual plot for SVR shows an increase in residuals as the water level rises, indicating that the model underperforms at higher water levels (Fig. 4b). This suggests that SVR is not generalizing well for extreme values, which may be due to the chosen kernel or the need for parameter optimization. Random Forest (Fig. 4c) shows fewer structured patterns, with residuals more evenly distributed. While the model performs better than Linear Regression, there is still some variance, especially for extreme values. This suggests that Random Forest is better at capturing non-linear relationships but still faces challenges with outliers or rare extreme events.

4 Conclusion
This study has explored the hydroclimatic trends of rainfall, discharge, and temperature in the Ganges (Padma) River Basin, with a particular focus on the Goalondo Station. By analyzing long-term data and applying machine learning techniques, the study aimed to uncover patterns in these vital variables and predict future water levels. The findings highlight the complexity of hydroclimatic changes in the basin, with clear evidence of rainfall variability, fluctuating discharge, and temperature changes over time. The analysis of rainfall showed some signs of a weak declining trend, though the results were not statistically significant.
This suggests that, while rainfall is a crucial driver of river discharge, its variability does not yet present a clear pattern of long-term change. The discharge data revealed moderate fluctuations, with a slight decrease in river flow, particularly during the dry season, which has implications for water availability and flood management. The temperature data, on the other hand, indicated a significant downward trend, with evidence of cooling over the study period. This trend, although gradual, could potentially have long-term implications for the region’s hydrological cycle. Additionally, the machine learning models, Linear Regression, Support Vector Regression (SVR), and Random Forest, proved to be effective in predicting water levels in the Ganges (Padma) River Basin. Among these, SVR and Random Forest showed the most promising results, with lower errors and higher R² values compared to Linear Regression. These findings demonstrate the potential of using advanced machine learning techniques to predict water levels and manage water resources in the face of climate change. Overall, the study underscores the importance of continuous monitoring and analysis of hydroclimatic variables in the Ganges (Padma) River Basin. The results provide valuable insights for water resource management, flood risk prediction, and climate change adaptation strategies. As climate patterns continue to evolve, the ability to predict river discharge and water levels using advanced models will be crucial for sustaining the region’s water supply, ensuring the resilience of ecosystems, and supporting the livelihoods of millions of people who depend on the river. In conclusion, this research contributes to the growing body of knowledge on the impacts of climate change on river systems, specifically in the Ganges (Padma) Basin. 
By leveraging both statistical analyses and machine learning models, it provides a comprehensive approach to understanding hydroclimatic trends and predicting water levels. Future research should focus on refining these models, incorporating more diverse datasets, and considering the influence of additional climate and environmental factors. Such efforts will be essential for designing effective policies and strategies to address the challenges posed by climate change in river basins worldwide.

Code availability
The code used for data analysis and visualization in this study is available from the corresponding author upon reasonable request.

Data availability
All raw data used in this study are available from the corresponding authors upon reasonable request.

Interactive computing environment
Not applicable.

Sample availability
Not applicable.

Video supplement
Not applicable.

Team list
Not applicable.

Author contribution
MRJ: Conceptualization, Formal analysis, Writing – Original Draft (Methodology & Results). RAN: Data curation, Writing – Original Draft (Literature review and Introduction). Both authors contributed to the interpretation of results and the final revision of the manuscript. All authors have read and approved the final version.

Competing interests
The authors declare that they have no conflict of interest.

Disclaimer
None.

Acknowledgements
The authors express their sincere gratitude to the organizations and individuals who make high-quality environmental and hydrological data freely available for scientific research. In particular, we thank the teams behind the PERSIANN-CDR dataset, NASA's POWER project, and the Bangladesh Water Development Board (BWDB) for maintaining and sharing valuable datasets. Their efforts play a crucial role in enabling research such as this.

Financial support
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

References
Abrar, M. F., Iman, Y.
E., Mustak, M. B., & Pal, S. K. (2024). Assessment of vulnerability to flood risk in the Padma River Basin using hydro-morphometric modeling and flood susceptibility mapping. Environmental Monitoring and Assessment, 196(7), 661. https://doi.org/10.1007/s10661-024-12780-2
Ahmed, A. N., Yafouz, A., Birima, A. H., Kisi, O., Huang, Y. F., Sherif, M., Sefelnasr, A., & El-Shafie, A. (2022). Water level prediction using various machine learning algorithms: A case study of Durian Tunggal river, Malaysia. Engineering Applications of Computational Fluid Mechanics, 16(1), 422–440. https://doi.org/10.1080/19942060.2021.2019128
Altunkaynak, A. (2007). Forecasting Surface Water Level Fluctuations of Lake Van by Artificial Neural Networks. Water Resources Management, 21(2), 399–408. https://doi.org/10.1007/s11269-006-9022-6
Belyakova, P. A., Moreido, V. M., Tsyplenkov, A. S., Amerbaev, A. N., Grechishnikova, D. A., Kurochkina, L. S., Filippov, V. A., & Makeev, M. S. (2022). Forecasting Water Levels in Krasnodar Krai Rivers with the Use of Machine Learning. Water Resources, 49(1), 10–22. https://doi.org/10.1134/S0097807822010043
Bolan, S., Padhye, L. P., Jasemizad, T., Govarthanan, M., Karmegam, N., Wijesekara, H., Amarasiri, D., Hou, D., Zhou, P., Biswal, B. K., Balasubramanian, R., Wang, H., Siddique, K. H. M., Rinklebe, J., Kirkham, M. B., & Bolan, N. (2024). Impacts of climate change on the fate of contaminants through extreme weather events. Science of The Total Environment, 909, 168388. https://doi.org/10.1016/j.scitotenv.2023.168388
Bricheno, L. M., Wolf, J., & Sun, Y. (2021). Saline intrusion in the Ganges-Brahmaputra-Meghna megadelta. Estuarine, Coastal and Shelf Science, 252, 107246. https://doi.org/10.1016/j.ecss.2021.107246
Chen, L., Xu, J., Wang, G., & Shen, Z. (2019). Comparison of the multiple imputation approaches for imputing rainfall data series and their applications to watershed models. Journal of Hydrology, 572, 449–460. https://doi.org/10.1016/j.jhydrol.2019.03.025
Chieu, T. Q., Thao, N. T. P., Thi Hue, D., & Huong, N. T. T. (2024). Prediction of the water level at the Kien Giang River based on regression techniques. River, 3(1), 59–68. https://doi.org/10.1002/rvr2.71
Eshita, N. R., Bhuiyan, M. A. H., & Saadat, A. H. M. (2023). Recent morphological shifting of Padma River: Geoenvironmental and socioeconomic implications. Natural Hazards, 117(1), 447–472. https://doi.org/10.1007/s11069-023-05867-5
Gao, Y. (2017). Dealing with missing data in hydrology: Data analysis of discharge and groundwater time-series in Northeast Germany. https://doi.org/10.17169/refubium-16900
Hossain, B., Sohel, Md. S., & Ryakitimbo, C. M. (2020). Climate change induced extreme flood disaster in Bangladesh: Implications on people's livelihoods in the Char Village and their coping mechanisms. Progress in Disaster Science, 6, 100079. https://doi.org/10.1016/j.pdisas.2020.100079
Karran, D. J., Morin, E., & Adamowski, J. (2013). Multi-step streamflow forecasting using data-driven non-linear methods in contrasting climate regimes. Journal of Hydroinformatics, 16(3), 671–689. https://doi.org/10.2166/hydro.2013.042
Kendall, M. G. (1948). Rank correlation methods. Griffin.
Khan, M. S., & Coulibaly, P. (2006). Application of Support Vector Machine in Lake Water Level Prediction. Journal of Hydrologic Engineering, 11(3), 199–205. https://doi.org/10.1061/(ASCE)1084-0699(2006)11:3(199)
Lundqvist, J., & Falkenmark, M. (2010). Adaptation to Rainfall Variability and Unpredictability: New Dimensions of Old Challenges and Opportunities. International Journal of Water Resources Development, 26(4), 595–612. https://doi.org/10.1080/07900627.2010.519488
Mann, H. B. (1945). Nonparametric Tests Against Trend. Econometrica, 13(3), 245–259. https://doi.org/10.2307/1907187
Mihel, A. M., Lerga, J., & Krvavica, N. (2024). Estimating water levels and discharges in tidal rivers and estuaries: Review of machine learning approaches. Environmental Modelling & Software, 176, 106033. https://doi.org/10.1016/j.envsoft.2024.106033
Naidu, G., Zuva, T., & Sibanda, E. M. (2023). A Review of Evaluation Metrics in Machine Learning Algorithms. In R. Silhavy & P. Silhavy (Eds.), Artificial Intelligence Application in Networks and Systems (pp. 15–25). Springer International Publishing. https://doi.org/10.1007/978-3-031-35314-7_2
Othman Ali, R., & Rashid Abubaker, S. (2019). Trend analysis using Mann-Kendall, Sen's slope estimator test and innovative trend analysis method in Yangtze River basin, China: Review. International Journal of Engineering & Technology, 8(2), 110–119. https://doi.org/10.14419/ijet.v7i4.29591
Sen, P. K. (1968). Estimates of the Regression Coefficient Based on Kendall's Tau. Journal of the American Statistical Association, 63(324), 1379–1389. https://doi.org/10.1080/01621459.1968.10480934
Shahgedanova, M. (2021). Chapter 3—Climate change and melting glaciers. In T. M. Letcher (Ed.), The Impacts of Climate Change (pp. 53–84). Elsevier. https://doi.org/10.1016/B978-0-12-822373-4.00007-0
Whateley, S., Palmer, R. N., & Brown, C. (2015). Seasonal Hydroclimatic Forecasts as Innovations and the Challenges of Adoption by Water Managers. Journal of Water Resources Planning and Management, 141(5), 04014071. https://doi.org/10.1061/(ASCE)WR.1943-5452.0000466
Wu, C. L., Chau, K. W., & Li, Y. S. (2008). River stage prediction based on a distributed support vector regression. Journal of Hydrology, 358(1), 96–111. https://doi.org/10.1016/j.jhydrol.2008.05.028

Version history
Version 1: 09 September 2025

Copyright
This work is licensed under a Non Exclusive No Reuse License.
Keywords
Baruria Transit, EM algorithm, Padma basin, predictive modeling, trend analysis, water level

Authors and affiliations
Mahfujur Rahman Joy (ORCID 0009-0003-4202-9776), Shahjalal University of Science and Technology
Raied Ahmed Nishat, Shahjalal University of Science and Technology

Citation
Mahfujur Rahman Joy, Raied Ahmed Nishat. Hydroclimatic Trend Analysis and Machine Learning-Based Water Level Prediction for The Ganges (Padma) River Basin. Authorea. 09 September 2025. DOI: https://doi.org/10.22541/au.175741407.71397410/v1