Research on The Inversion Model of Water Environment Parameters of Coal Mining Subsidence Waters Based on Machine Learning

Haitao Wu, Ying Liu, Yuzhi Zhou, Yanxue Gao, Yong Li, Xiaoyang Chen

Research Article (Research Square). This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6914046/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1 posted.

Abstract

Long-term, large-scale underground coal mining has destroyed the original surface water system structure, and water pollution in subsidence waters has become increasingly serious. Traditional inversion models for water quality parameter concentrations have low accuracy, so improving water quality monitoring technology and the inversion accuracy of water quality parameters is vital for protecting water resources in mining areas. This study focuses on coal mining subsidence water areas in Huainan City, combining measured water quality data from spring, summer, autumn, and winter of 2024 with concurrent Sentinel-2 satellite imagery. Statistical regression algorithms and three machine learning algorithms, Random Forest (RF), Decision Tree (DT), and Support Vector Machine (SVM), were used to fit and model the concentrations of total nitrogen (TN), total phosphorus (TP), ammonium nitrogen (NH₄⁺-N) and chlorophyll-a (Chl-a) in subsidence waters, and the accuracy of the models was verified. A comprehensive comparison of model performance showed that the machine learning models significantly outperformed the traditional statistical regression models in inversion accuracy. RF, DT, and SVM exhibited varying strengths across seasons and water quality parameters, with the best-performing models achieving coefficient of determination (R²) values generally exceeding 0.8 and stable validation accuracy. These findings highlight the advantages of machine learning algorithms in water quality remote sensing inversion and further confirm the technical feasibility of this approach for monitoring complex aquatic environments. By integrating scientific data analysis with machine learning techniques, this study provides more accurate data support for monitoring and managing water quality in coal mining subsidence waters, as well as a scientific decision-making basis for water ecological protection.
Keywords: Subsidence Waterlogged Areas; Machine Learning; Remote Sensing Inversion; Water Quality Parameters; Sentinel-2 Imagery

Introduction

As a typical coal-grain composite area with a high groundwater table in the eastern plains of China, the study region has shallow-buried groundwater and abundant groundwater resources. Long-term, large-scale underground coal mining has caused ground subsidence [He et al., 2020], which has destroyed the original surface water system structure, altered the landscape, and disturbed surface run-off, forming permanent or seasonal stagnant water in different landscapes such as reservoirs, lakes, wetlands, or plain reservoirs [Chen et al., 2016; Sanmiquel et al., 2018]. However, the water environment system in this region is extremely vulnerable to pollution and destruction from agricultural irrigation, industrial production, and fishery activities. For example, the continuous input of exogenous nitrogen and phosphorus nutrients into these waters has led to increasingly serious pollution and frequent cyanobacterial blooms driven by eutrophication, causing structural imbalance in aquatic communities, functional degradation of the ecosystem, and a series of related environmental problems. The eventual result is an algae-dominated ecosystem, which seriously affects the health and stability of the water ecosystem [Chen et al., 2018]. Improving water quality monitoring technology and the accuracy of water quality parameter inversion will therefore play a vital role in protecting water resources, preventing and controlling water pollution, and strengthening the effective management of water resources.

Existing research mostly relies on regular on-site collection of water samples followed by chemical testing and analysis in the laboratory. This method can detect many parameters with high accuracy, but sampling is difficult, the economic cost is high, timeliness is low, and the work is susceptible to factors such as the geographical location of the mining area and bad weather [Vasissha et al., 2020]. In particular, water quality data from a specific sampling site represent only that site and cannot objectively reflect the water quality parameters of the entire water area, so it is difficult to obtain the water quality status of the whole water body in a timely, fast and accurate manner [He et al., 2021].

In recent years, a large number of advanced satellites and sensors at home and abroad (such as Landsat, MODIS, GF-1 and HJ-1) have been launched one after another, providing abundant remote sensing data sources for monitoring the evolution and trend of water quality in coal mining subsidence waters [Mccullough et al., 2013; Shi et al., 2018; Alikas et al., 2017]. Satellite monitoring is efficient and covers a wide area [Fan et al., 2024]. It can monitor water pollution in coal mining subsidence waters in a timely manner and enables rapid identification and diagnosis of water environment ecology, which has greatly enriched the results of remote sensing inversion in related fields [Zhang et al., 2021]. It can also track the dynamic changes and distribution of water quality in time and space over long periods, plays an increasingly important role in water quality monitoring and early warning, and provides a scientific basis and technical support for the ecological management and protection of the water environment in coal mining subsidence waters [Qin et al., 2014]. In recent years, most research on remote sensing inversion at home and abroad has focused on heavily polluted lakes and rivers [Wang et al., 2023].
At present, remote sensing has rarely been applied to water quality inversion in coal mining subsidence waters, and practical application and large-scale deployment experience are lacking. There is an urgent need for a rapid, efficient, green and accurate inversion method for monitoring the water quality of this type of water body.

With the vigorous development of remote sensing technology, many scholars at home and abroad have studied water quality remote sensing inversion algorithms in increasing depth. Traditional mathematical statistical regression algorithms are simple to operate and easy to interpret, but their fitting forms are limited and their fitting accuracy is usually low, which makes them unsuitable for complex nonlinear problems. Machine learning algorithms are widely used in the field of remote sensing inversion and provide new opportunities for the further development of water quality parameter monitoring and prediction [Yuan et al., 2020]. Machine learning algorithms can use sample data to optimize and iteratively refine empirical models and can better mine latent relationships in the data, thereby improving accuracy and generalization performance for weakly correlated relationships; they therefore show significant advantages in regression tasks where correlations are low. Commonly used machine learning regression algorithms include decision trees, random forests, support vector machines (SVM), and BP neural networks. In recent years, the application of machine learning algorithms has greatly improved the inversion accuracy of non-optically active water quality parameters such as TN, TP, NH₄⁺-N and DO [Yuan et al., 2019]. Jamal et al. used SVM to simulate the water quality of the Ajchai River in Iran, showing that SVM achieved higher R² and lower RMSE and MAE values at all simulation sites [Sarafaraz et al., 2024]. Xiong et al. used traditional mathematical statistical regression models and machine learning models to study TP inversion in Taihu Lake and found that the machine learning model had higher accuracy [Xiong et al., 2022]. Wang Chunling and others constructed a high-precision COD inversion model based on four machine learning algorithms (linear regression, Random Forest, AdaBoost, and XGBoost), which provides new methods and ideas for building machine learning inversion models in hyperspectral water quality monitoring [Wang et al., 2022]. Although the inversion results for these water quality parameters each have their own characteristics, most of the research objects are large water bodies such as lakes and rivers, and the accuracy of the inversion models still needs to be improved. Inversion models therefore need to be constructed for the different water quality parameters of specific water areas.

Therefore, this study focuses on coal-mining subsidence waters. Based on the spectral characteristics of Sentinel-2 remote sensing images combined with simultaneously measured water quality concentration data, mathematical statistical regression methods and machine learning-based methods are used to model the water quality of coal-mining subsidence waters by remote sensing inversion.
By combining remote sensing with machine learning algorithms, the study aims to establish a more accurate and efficient water quality parameter inversion model that overcomes the temporal and spatial limitations of traditional water quality monitoring and provides new opportunities for the further development of water quality parameter concentration monitoring and prediction.

Overview of the research area

The research area is located in Panji District, Huainan City, East China, at the junction of the Jianghuai hills and the Huanghuaihai plain (116°21′05″E-117°12′30″E, 31°54′08″N-33°00′26″N), with the Panji mining area situated in the middle reaches of the Huaihe River basin. The city was built and thrived because of coal, and it is one of the 14 billion-ton-class coal bases and 6 major coal-electricity integration bases designated by China [He et al., 2020]. The terrain is flat and there are many river networks. It is a resource-based city dominated by coal, electric power and the chemical industry. The study area is shown in Fig. 1. The area has a warm temperate semi-humid monsoon climate with four distinct seasons, mild temperatures and moderate precipitation. Precipitation is concentrated in summer (June-August), which accounts for more than half of the annual total, and the area is prone to short-term heavy precipitation or rainstorms. Winter has less precipitation and a relatively dry climate. In this study, three typical small and micro-scale coal mining subsidence water bodies in this area were selected as the research objects. These subsidence waters are mainly used for aquaculture and photovoltaic power generation.
At the same time, the water environment is affected by various factors such as industrial waste discharge, direct urban and rural sewage discharge, and surface run-off pollution, and the water quality is in poor condition.

Research methods and data processing

Collection of water quality parameters

Taking into account the spatial and seasonal differences in surface water quality, a grid layout was used to set up 68 effective sampling points in the research area; the overall spatial layout is relatively balanced and reasonably covers the entire water area. Four field sampling campaigns were carried out, on March 11, 2024, June 12, 2024, September 22, 2024, and December 24, 2024, representing spring, summer, autumn and winter. A total of 272 water samples were collected. During sampling the weather was clear, the wind speed was low and the water surface was calm. GPS was used to record the geographic location of each sampling point. Water samples were collected 0.5 m below the surface of the subsidence waters and stored in polyethylene bottles. According to the testing requirements of the different water quality indicators, the samples were pretreated on site and taken back to the laboratory, where the concentrations of four water quality parameters (TN, TP, NH₄⁺-N and Chl-a) were determined by the methods specified in the national standards. The measured data were divided into training samples and test samples at a 4:1 ratio, which were used for model construction and for evaluating the inversion accuracy of the models, respectively. The detection methods follow the standards HJ 636-2012, GB 11893-89, HJ 357-2009 and HJ 897-2017 in the "National Environmental Protection Standards of the People's Republic of China". The total nitrogen concentration was measured by alkaline potassium persulfate digestion ultraviolet spectrophotometry; the total phosphorus concentration by ammonium molybdate spectrophotometry; the ammonia nitrogen concentration by Nessler's reagent spectrophotometry; and the chlorophyll-a concentration by spectrophotometry.

Satellite data acquisition and preprocessing

Sentinel-2 images with less than 10% cloud cover, acquired in the same periods as the field water sampling, were downloaded from the European Space Agency (ESA, https://www.esa.int). The Sentinel-2 mission consists of two satellites, Sentinel-2A and Sentinel-2B, which operate together to shorten the revisit time and thus provide denser continuous observations. The onboard MultiSpectral Instrument (MSI) provides data in 13 spectral bands covering the visible, near-infrared (NIR) and short-wave infrared (SWIR) ranges (Table 1). The main data products of Sentinel-2 are Level-1B, Level-1C, and Level-2A. The Level-2A data selected in this study have already been radiometrically and atmospherically corrected and show higher brightness and contrast. Preprocessing therefore only required resampling the data to 10 m in the SNAP software downloaded from ESA's official website, stacking the bands with the Layer Stacking tool in ENVI 5.3, and roughly cropping the waters of the research area with an ROI; the water body reflectance at each sampling point was then extracted.
Table 1 Sentinel-2 multispectral information for each band

| Band | Center wavelength (nm) | Bandwidth (nm) | Spatial resolution (m) |
|---|---|---|---|
| B1-Coastal aerosol | 443 | 20 | 60 |
| B2-Blue | 490 | 65 | 10 |
| B3-Green | 560 | 35 | 10 |
| B4-Red | 665 | 30 | 10 |
| B5-Vegetation Red Edge | 705 | 15 | 20 |
| B6-Vegetation Red Edge | 740 | 15 | 20 |
| B7-Vegetation Red Edge | 783 | 20 | 20 |
| B8-NIR | 842 | 115 | 10 |
| B8A-Narrow NIR | 865 | 20 | 20 |
| B9-Water vapour | 945 | 20 | 60 |
| B10-SWIR-Cirrus | 1375 | 30 | 60 |
| B11-SWIR | 1610 | 90 | 20 |
| B12-SWIR | 2190 | 180 | 20 |

To identify the bands most sensitive to each water quality parameter in the studied waters, the Pearson correlation coefficient was used to select the characteristic bands at the sampling points. The reflectance of each spectral band and the concentration of each water quality parameter are treated as the two variables. The correlation coefficient ranges from -1 to 1; the closer its value is to 1 or -1, the stronger the correlation. It is calculated as follows:

$$r=\frac{\sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)\left(Y_{i}-\overline{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)^{2}}\sqrt{\sum_{i=1}^{n}\left(Y_{i}-\overline{Y}\right)^{2}}}\tag{1}$$

where r is the correlation coefficient and n is the total number of samples; \(X_{i}\) is the spectral value of the band (or band combination) and \(\overline{X}\) is its mean; \(Y_{i}\) is the measured value of the water quality parameter and \(\overline{Y}\) is its mean.

Extraction of water bodies

In remote sensing inversion of water quality, the target area is usually the water body itself. To make the inversion more efficient and to reduce the interference of non-water elements in the research area, such as photovoltaic panels and land, the boundaries of the coal mining subsidence waters must be identified accurately. Common water body extraction techniques include the single-band method, the multi-band combination method, the vegetation index method and the water body index method. In this study, the normalized difference water index (NDWI) [Mcfeeters et al., 1996] was used to enhance the water body signal while suppressing other information such as vegetation and soil. The principle rests on the distinctive spectral response of water in the visible and near-infrared range: the normalized difference between the green band (where water reflectance is relatively high) and the near-infrared band (where water absorbs strongly and reflectance is very low) markedly increases the spectral separability between water and non-water surfaces, thereby suppressing background noise, strengthening the water signal, and enabling high-precision extraction of the water extent.

$$NDWI=\frac{Green-NIR}{Green+NIR}\tag{2}$$

where Green corresponds to the B3 band of the Sentinel-2 image and NIR corresponds to the B8 band of the Sentinel-2 image.

There are many methods for extracting water bodies from Sentinel-2 satellite imagery, including manual visual interpretation [Dai et al., 2019], supervised classification, unsupervised classification, the single-band threshold method and the multi-band threshold method [Xu et al., 2021]. There are many rivers and lakes in the research area, and the subsidence waters are relatively scattered, so manual interpretation alone performs poorly.
Therefore, in combination with manual visual interpretation, appropriate bands were selected, the water body threshold was determined, and the single-band threshold method and the inter-spectral relationship analysis method were used to extract the water bodies and establish a model of their spectral characteristics. Through multi-band synthesis, the single-band NDWI images of spring, summer, autumn and winter were combined into one multi-band image. Using decision tree classification with spring as the reference season, pixels were divided into two categories, water (NDWI > 0) and non-water (NDWI ≤ 0), and the water body information of each season was extracted in turn.

Principles of inversion model algorithm

Mathematical statistical regression model

Mathematical statistical regression is a traditional statistical method commonly used in water quality inversion, and it is especially suitable for the quantitative inversion of optically active substances in water. The method not only reveals the correlation between variables through mathematical expressions (empirical formulas), but can also be used to predict or regulate trends in the target variables [Cui et al., 2021]. Prediction is made by establishing an explicit mathematical relationship between independent variables (such as spectral bands and derived indices) and dependent variables (water quality parameters). Such models are clearly interpretable and computationally efficient, and are suitable for scenarios where the linear relationship is significant and multicollinearity between variables is low.
However, they adapt poorly to nonlinear relationships, high-order interaction effects, and complex noise, which tends to limit their extrapolation ability. In modern remote sensing inversion of water quality driven by multi-source heterogeneous data in particular, they struggle to capture the dynamic coupling between spectral characteristics and pollutant concentration, and this has become their main bottleneck in high-precision modeling. In this study, the data set is divided into a training set and a validation set at a 4:1 ratio. Based on the experimental data, single bands or band combinations with high correlation are selected, and several mathematical statistical regression models are constructed, including simple linear, exponential, logarithmic, power function, and quadratic polynomial models.

Machine learning algorithms

Machine learning regression algorithms are a class of supervised learning methods that make numerical predictions by learning the mapping between input variables and a continuous target variable. Their core is to minimize the deviation between predicted and true values by optimizing a loss function (such as mean squared error or absolute error).
Common algorithms include linear regression (modeling a linear combination of features, suitable for scenarios that require high interpretability), decision tree regression (partitioning the feature space with a tree structure to handle nonlinear relationships), support vector regression (robust prediction based on kernel mapping into a high-dimensional space), ensemble regression methods (such as random forests and gradient boosting trees, which reduce variance and bias through multi-model fusion), and neural network regression (using deep nonlinear structures to fit complex data patterns). In addition, regularization techniques (such as ridge regression and Lasso) alleviate overfitting by constraining model complexity. The machine learning models used in this study for the inversion of water quality parameter concentrations in coal mining subsidence waters are DT, RF, and SVM. These three algorithms are used to construct regression models of the water quality parameter concentrations in the different seasons, as shown in Fig. 2. The models were built in MATLAB R2022a, and the hyperparameters of the inversion models were determined after repeated training and screening on the water quality parameter concentration and spectral reflectance data sets.

(1) Decision tree

Decision tree regression is a non-parametric supervised learning algorithm that builds a tree-structured model by recursive binary partitioning of the feature space in order to predict a continuous target variable. At each node it selects the optimal feature and split point according to an error-minimization criterion (such as mean squared error or mean absolute error), divides the data into homogeneous subsets, and finally uses the mean of the samples falling in each leaf node as the predicted value. The algorithm is intuitively interpretable (the tree structure can be visualized), needs no assumptions about the data distribution, can handle nonlinear relationships and missing values, and is not sensitive to feature scales [Zhang et al., 2016].

(2) Random Forest

Random forest regression is a nonlinear regression algorithm based on ensemble learning that improves the generalization ability and robustness of the model by constructing multiple decision trees and aggregating their predictions (usually by averaging). Its core mechanism combines Bootstrap sampling (sampling with replacement to generate diverse training subsets) with random feature selection (only a subset of features is considered at each split), and this dual randomness reduces model variance and suppresses overfitting. Compared with a single decision tree, random forest effectively balances bias and variance through "collective wisdom", can process high-dimensional, nonlinear and noisy data, is insensitive to missing values and feature scales, and supports parallel training [Domitr et al., 2023; Alnahit et al., 2022]. The principle can be expressed as:

$$\widehat{y}\left(x\right)=\frac{1}{T}\sum_{t=1}^{T}f\left(x;\theta_{t}\right)\tag{3}$$

where \(\widehat{y}(x)\) is the random forest prediction for sample x, T is the number of decision trees, \(f\left(x;\theta_{t}\right)\) is the prediction of the t-th decision tree for sample x, and \(\theta_{t}\) denotes the parameters of the t-th decision tree.
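As a rough illustration of the averaging in Eq. (3), the following pure-Python sketch trains a small forest of depth-1 regression trees (stumps) on bootstrap samples, each restricted to a random feature subset, and averages their predictions. It is not the authors' MATLAB implementation: the function names (`fit_stump`, `fit_forest`, `predict_forest`), the use of stumps instead of fully grown trees, and the default settings are assumptions made to keep the example short.

```python
import random
from statistics import mean

# Illustrative sketch only (hypothetical helper names, not the study's code).

def fit_stump(X, y, feature_ids):
    """Fit a depth-1 regression tree: choose the (feature, threshold) split
    that minimizes the summed squared error around the two leaf means."""
    best = None
    for j in feature_ids:
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[j] <= thr]
            right = [t for row, t in zip(X, y) if row[j] > thr]
            ml, mr = mean(left), mean(right)
            sse = (sum((t - ml) ** 2 for t in left)
                   + sum((t - mr) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, ml, mr)
    if best is None:
        # No split is possible (all candidate features constant): predict the mean.
        m = mean(y)
        return lambda row: m
    _, j, thr, ml, mr = best
    return lambda row: ml if row[j] <= thr else mr

def fit_forest(X, y, n_trees=50, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sampling with replacement
        feats = rng.sample(range(p), n_features)     # random feature subset
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees

def predict_forest(trees, row):
    """Eq. (3): the forest prediction is the mean of the T tree predictions."""
    return sum(tree(row) for tree in trees) / len(trees)
```

Here each row of `X` would hold the band (or band-combination) reflectances of one sampling point and `y` the measured concentration. A production model would normally use fully grown trees and tuned hyperparameters rather than stumps.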
(3) Support Vector Machine

Support vector machine regression is a supervised regression algorithm based on statistical learning theory. Its core idea is to construct a tolerance band (ε-tube) defined by the ε-insensitive loss function: while tolerating a certain prediction error (controlled by ε), it finds the optimal hyperplane that fits the data and maximizes the margin of the band. The optimization objective follows the principle of structural risk minimization and balances model complexity against training error through the regularization parameter C. The method is robust to outliers and adapts well to high-dimensional data [Mohammadib et al., 2020]. The corresponding mathematical expression is as follows.

$$f\left(x\right)=\sum_{i=1}^{n}\alpha_{i}K\left(x_{i},x\right)+b\tag{4}$$

where \(\alpha_{i}\) is the Lagrange multiplier; \(K\left(x_{i},x\right)=\exp\left(-\gamma\lVert x_{i}-x\rVert^{2}\right)\) is the radial basis kernel function; γ is the kernel parameter (the higher its value, the more local the influence of each support vector); \(x_{i}\) is a support vector; and b is the bias term.

In this study, three machine learning methods (support vector machine regression, random forest and decision tree) are used to construct water quality inversion models for coal mining subsidence waters. The model data set is divided into a training set and a test set at a 4:1 ratio. The prediction accuracy of each model on the training set and the test set is evaluated with three indicators: the coefficient of determination (R²), the root mean square error (RMSE), and the mean absolute error (MAE). According to the evaluation results, the model parameters are adjusted to improve the accuracy and generalization ability of the model.
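The 4:1 split and the three accuracy indicators mentioned above can be sketched as follows. This is a minimal pure-Python illustration rather than the authors' MATLAB workflow; the helper names, the random shuffle with a fixed seed, and the use of the conventional form R² = 1 − SS_res/SS_tot are assumptions.

```python
import math
import random

# Illustrative sketch only (hypothetical helper names, not the study's code).

def split_4_to_1(samples, seed=0):
    """Shuffle the samples and hold out one fifth as the test set (about 4:1)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = len(shuffled) // 5
    return shuffled[n_test:], shuffled[:n_test]   # (training set, test set)

def r2(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1 - ss_res / ss_tot

def mae(measured, predicted):
    """Mean absolute error between measured and predicted concentrations."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)

def rmse(measured, predicted):
    """Root mean square error between measured and predicted concentrations."""
    return math.sqrt(
        sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)
    )
```

With the 68 sampling points of one season, `split_4_to_1(list(range(68)))` yields 55 training and 13 test indices. Whether the split in the study was made per season or over all 272 samples is not stated, so that choice is left to the caller.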
Accuracy evaluation method for the inversion models

To evaluate the predictive performance of the models and choose the best one, three indicators are used in this study to assess the accuracy of the water quality parameter inversion models: the coefficient of determination (R²), the mean absolute error (MAE), and the root mean square error (RMSE).

$$R^{2}=1-\frac{\sum_{i=1}^{n}\left(\widehat{P}_{i}-P_{i}\right)^{2}}{\sum_{i=1}^{n}\left(\overline{P}-P_{i}\right)^{2}}\tag{5}$$

R² measures how well the model predictions agree with the measured concentrations. The closer the value is to 1, the better the accuracy of the model.

$$MAE=\frac{1}{n}\sum_{i=1}^{n}\left|P_{i}-\widehat{P}_{i}\right|\tag{6}$$

MAE measures the average absolute error between the measured concentrations and the predicted values.

$$RMSE=\sqrt{\frac{\sum_{i=1}^{n}\left(P_{i}-\widehat{P}_{i}\right)^{2}}{n}}\tag{7}$$

RMSE indicates the deviation between the measured water quality concentrations and the predicted values. The lower the value, the better the prediction and the higher the model accuracy. Here \(\widehat{P}_{i}\) and \(P_{i}\) are the predicted and measured values of the water quality parameter, respectively; \(\overline{P}\) is the average of the measured values; and \(n\) is the total number of water sample points.

Results

Original spectrum and correlation analysis

The original remote sensing spectral data of the 68 sampling points were processed in ENVI, and the remote sensing reflectance at each sampling point was extracted (Fig. 3).
As the figure shows, the spectral curves at different sampling points follow a similar trend, with no significant difference in overall shape. However, because water quality concentrations differ between sampling points, the heights of the peaks and valleys and the rate of change of the reflectance curves differ; the reflectance values at sampling points No. 0–20 are relatively high, which may be caused by different water quality conditions in that area. It is therefore difficult to predict the concentration of water quality indicators directly from the spectral curves. The measured water quality parameter concentrations and the Sentinel-2 single-band reflectance data were analyzed by Pearson correlation, and the results are shown in Fig. 4. Except for the low correlation of individual bands in summer, the correlations of the remaining bands are significant, exceeding 0.5. According to the correlation analysis, the water quality parameters in each season are highly correlated with bands B3, B4, and B5, and TP and Chl-a are also significantly correlated with B2 and B8. Because of the information redundancy between the bands of the remote sensing image, single bands cannot fully reflect the relationship between the water quality parameters and the reflectance of each band. Therefore, the spectral reflectance at the sampling points is further processed by band combination, which eliminates interference between bands to a certain extent, reduces the influence of other impurities in the water body, and highlights the correlation between water quality parameters and band reflectance. Single bands that are highly correlated with each water quality parameter are selected for band combination. The band combination methods include band difference, band ratio and other calculations.
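The screening step described above can be sketched as follows: build band differences and ratios from the single-band reflectances, compute the Pearson coefficient of Eq. (1) between each candidate feature and the measured concentration, and rank the candidates by absolute correlation. This is a hedged pure-Python illustration; the function names, the restriction to pairwise difference and ratio, and the ranking by |r| are assumptions, and the significance test (P < 0.05) used in the study is not reproduced here.

```python
import math

# Illustrative sketch only (hypothetical helper names, not the study's code).

def pearson_r(x, y):
    """Pearson correlation coefficient, Eq. (1). Inputs must not be constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)

def band_combinations(bands):
    """Return single bands plus pairwise differences and ratios.
    `bands` maps a band name to its reflectance at each sampling point;
    reflectances are assumed to be non-zero so that ratios are defined."""
    features = dict(bands)
    names = list(bands)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            features[f"{a}-{b}"] = [p - q for p, q in zip(bands[a], bands[b])]
            features[f"{a}/{b}"] = [p / q for p, q in zip(bands[a], bands[b])]
    return features

def rank_by_correlation(bands, concentration):
    """Sort candidate features by |r| with the measured concentration, strongest first."""
    scored = [(name, pearson_r(values, concentration))
              for name, values in band_combinations(bands).items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)
```

For example, `rank_by_correlation({"B3": [...], "B4": [...], "B5": [...]}, tn_values)` would return entries such as `("B3/B4", r)`, from which the most sensitive band or band combination can be taken as the regression input; `tn_values` is a placeholder for the measured concentrations.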
Therefore, the sensitive bands or band combinations that are highly correlated with each water quality parameter (significance level P < 0.05) are selected to build the inversion models.

Analysis of statistical regression model

For each water quality parameter, the band or band combination value with the highest correlation is used as the independent variable, and the concentration of the water quality parameter is used as the dependent variable. Several statistical regression inversion models are established, including quadratic polynomial, logarithmic, logarithmic square, and power function models, and the results are evaluated by the accuracy on the verification set. The model with the best fit is selected as the optimal statistical regression model for each water quality index. The optimal models and their performance for each water quality indicator in each season are listed in Table 2.

In the inversion models based on statistical regression, the concentrations of the water quality parameters differ between seasons, which in turn affects the spectral reflectance in the remote sensing images and leads to different accuracies for the inversion models of the various water quality indicators [Chen et al., 2025]. TN and NH₄⁺-N have their highest modeling set accuracy in the spring models, with R² of 0.4701 and 0.6042, respectively, and verification set R² of 0.4023 and 0.5041. TP inversion accuracy is highest in summer, with a modeling set R² of 0.6223 and a verification set R² of 0.6021. The highest inversion accuracy for Chl-a also occurs in summer, with a modeling set R² of 0.6322 and a verification set R² of 0.6102. Different water quality parameters are affected by seasonal climate and show different inversion accuracy.
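The model selection procedure described above, fitting several regression forms to one sensitive band and keeping the form with the best verification accuracy, might look like the following NumPy sketch. The data are synthetic, the split by index is an assumption, and the power function is fitted here in the simplified form Y = aX^b through a log-log regression.

```python
import numpy as np

def r2(measured, predicted):
    """Coefficient of determination, 1 - SSE / SST."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sse = np.sum((measured - predicted) ** 2)
    sst = np.sum((measured - measured.mean()) ** 2)
    return 1.0 - sse / sst

def fit_candidates(x, y):
    """Fit four regression forms by least squares and return predictors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ln_x = np.log(x)
    quad = np.polyfit(x, y, 2)                    # Y = a + bX + cX^2
    log1 = np.polyfit(ln_x, y, 1)                 # Y = a + b ln(X)
    log2 = np.polyfit(ln_x, y, 2)                 # Y = a + b ln(X) + c ln(X)^2
    slope, ln_a = np.polyfit(ln_x, np.log(y), 1)  # ln(Y) = ln(a) + b ln(X)
    return {
        "quadratic polynomial": lambda v: np.polyval(quad, np.asarray(v, dtype=float)),
        "logarithmic": lambda v: np.polyval(log1, np.log(v)),
        "logarithmic square": lambda v: np.polyval(log2, np.log(v)),
        "power function": lambda v: np.exp(ln_a) * np.asarray(v, dtype=float) ** slope,
    }

# Synthetic reflectance/concentration pairs (hypothetical, not the study data).
rng = np.random.default_rng(0)
x = rng.uniform(0.02, 0.15, size=68)
y = 3.0 + 0.5 * np.log(x) + rng.normal(0.0, 0.05, size=68)
x_fit, y_fit, x_val, y_val = x[:54], y[:54], x[54:], y[54:]

models = fit_candidates(x_fit, y_fit)
val_r2 = {name: r2(y_val, predict(x_val)) for name, predict in models.items()}
best = max(val_r2, key=val_r2.get)
print(best, round(val_r2[best], 3))
```

Selecting on the verification set rather than the modeling set guards against picking a form that merely fits the training points more flexibly.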
The verification set R² is relatively stable, which indicates that the models generalize well. In practical applications, however, the overall goodness of fit does not reach the expected level. This may be because the statistical regression models use only a small amount of band information and do not fully consider the combined information of the many related bands, so they show low accuracy in the inversion of water quality concentrations in every season.

Table 2 Optimal model of remote sensing inversion of water quality parameters based on mathematical statistical regression

| Water quality parameter | Season | Model type | Model expression | Modeling set R² | Verification set R² |
|---|---|---|---|---|---|
| TN | spring | Logarithmic square | Y = -30.5841 - 39.7286*ln(X) - 11.696*ln(X)^2 | 0.4701 | 0.4023 |
| TN | summer | Quadratic polynomial | Y = -6.3470 + 60.5119*X - 110.1181*X^2 | 0.2513 | 0.2437 |
| TN | autumn | Linear | Y = 6.6032 - 17.7869*X | 0.4272 | 0.3986 |
| TN | winter | Quadratic polynomial | Y = -3.8503 + 76.5355*X - 252.6308*X^2 | 0.2480 | 0.2298 |
| TP | spring | Quadratic polynomial | Y = 0.1337 - 0.3026*X + 252.6864*X^2 | 0.5936 | 0.5031 |
| TP | summer | Logarithmic | Y = 0.23 + 0.09*ln(X) | 0.6223 | 0.6021 |
| TP | autumn | Power function | Y = 0.1205*3120.7271^X - 0.4882 | 0.3568 | 0.3142 |
| TP | winter | Logarithmic square | Y = 16.9536 + 46.8780*ln(X) + 33.0362*ln(X)^2 | 0.2730 | 0.2676 |
| NH₄⁺-N | spring | Logarithmic square | Y = -4.016 - 6.5232*ln(X) - 2.2718*ln(X)^2 | 0.6042 | 0.5041 |
| NH₄⁺-N | summer | Power function | Y = 1.3342*X^2.0183 + 0.088 | 0.4366 | 0.4415 |
| NH₄⁺-N | autumn | Hyperbolic | Y = 1/(-6.0075 + 36.7093*X) | 0.4816 | 0.4376 |
| NH₄⁺-N | winter | Quadratic polynomial | Y = -0.3713 + 23.1665*X - 99.8451*X^2 | 0.3565 | 0.3351 |
| Chl-a | spring | Quadratic polynomial | Y = -1.4158 + 183.5549*X - 2635.1177*X^2 | 0.5241 | 0.4923 |
| Chl-a | summer | Logarithmic square | Y = 9.72 - 6.29*ln(X) - 19.2849*ln(X)^2 | 0.6322 | 0.6102 |
| Chl-a | autumn | Logarithmic | Y = 34.9376 + 22.1397*ln(X) | 0.5046 | 0.4795 |
| Chl-a | winter | Power function | Y = 0.1117*X^(-1.843) - 2.1875 | 0.4275 | 0.4068 |

Note: In the table above, X is the reflectance of the sensitive band with
a high correlation with the water quality index.

Machine learning model analysis

A machine learning feature data set is constructed from the reflectance data and the highly correlated band combination values. The model inputs are the feature subset filtered by Pearson correlation analysis and the concentrations of the water quality parameters, and the samples are randomly divided into a training set and a verification set at a ratio of 4:1. Three machine learning models, decision tree (DT), random forest (RF), and support vector machine (SVM), are established for the four water quality indicators, and the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE) are used as evaluation criteria to assess model accuracy. Based on the evaluation results, the model parameters are adjusted to improve accuracy and generalization ability. The training and verification results of each model are shown in Table 3.

Table 3 Accuracy evaluation of results of different machine learning models

| Water quality parameter | Season | Model | Modeling set R² | Modeling set MAE | Modeling set RMSE | Verification set R² | Verification set MAE | Verification set RMSE |
|---|---|---|---|---|---|---|---|---|
| TN | spring | DT | 0.7908 | 0.325 | 0.4422 | 0.6067 | 0.365 | 0.499 |
| TN | spring | RF | 0.524 | 0.312 | 0.433 | 0.245 | 0.354 | 0.476 |
| TN | spring | SVM | 0.484 | 0.289 | 0.453 | 0.312 | 0.353 | 0.478 |
| TN | summer | DT | 0.695 | 0.059 | 0.126 | 0.506 | 0.077 | 0.091 |
| TN | summer | RF | 0.695 | 0.059 | 0.126 | 0.506 | 0.077 | 0.094 |
| TN | summer | SVM | 0.854 | 0.058 | 0.091 | 0.753 | 0.055 | 0.072 |
| TN | autumn | DT | 0.107 | 0.421 | 0.522 | 0.217 | 0.375 | 0.487 |
| TN | autumn | RF | 0.863 | 0.164 | 0.216 | 0.592 | 0.321 | 0.405 |
| TN | autumn | SVM | 0.459 | 0.298 | 0.407 | 0.272 | 0.351 | 0.469 |
| TN | winter | DT | 0.858 | 0.084 | 0.139 | 0.654 | 0.129 | 0.176 |
| TN | winter | RF | 0.807 | 0.074 | 0.118 | 0.599 | 0.087 | 0.131 |
| TN | winter | SVM | 0.777 | 0.154 | 0.182 | 0.585 | 0.096 | 0.139 |
| TP | spring | DT | 0.803 | 0.019 | 0.031 | 0.315 | 0.043 | 0.045 |
| TP | spring | RF | 0.813 | 0.016 | 0.023 | 0.776 | 0.021 | 0.026 |
| TP | spring | SVM | 0.476 | 0.038 | 0.052 | 0.249 | 0.039 | 0.047 |
| TP | summer | DT | 0.877 | 0.002 | 0.001 | 0.885 | 0.002 | 0.003 |
| TP | summer | RF | 0.819 | 0.001 | 0.002 | 0.918 | 0.001 | 0.002 |
| TP | summer | SVM | 0.816 | 0.001 | 0.002 | 0.984 | 0.001 | 0.001 |
| TP | autumn | DT | 0.823 | 0.073 | 0.099 | 0.327 | 0.170 | 0.241 |
| TP | autumn | RF | 0.657 | 0.110 | 0.149 | 0.502 | 0.121 | 0.160 |
| TP | autumn | SVM | 0.838 | 0.074 | 0.097 | 0.689 | 0.091 | 0.124 |
| TP | winter | DT | 0.699 | 0.092 | 0.128 | 0.212 | 0.085 | 0.108 |
| TP | winter | RF | 0.658 | 0.093 | 0.1281 | 0.338 | 0.076 | 0.099 |
| TP | winter | SVM | 0.742 | 0.057 | 0.072 | 0.636 | 0.109 | 0.134 |
| NH₄⁺-N | spring | DT | 0.8408 | 0.272 | 0.484 | 0.5097 | 0.45489 | 0.725 |
| NH₄⁺-N | spring | RF | 0.865 | 0.305 | 0.446 | 0.402 | 0.619 | 0.851 |
| NH₄⁺-N | spring | SVM | 0.8187 | 0.237 | 0.359 | 0.658 | 0.486 | 0.597 |
| NH₄⁺-N | summer | DT | 0.850 | 0.004 | 0.006 | 0.676 | 0.015 | 0.016 |
| NH₄⁺-N | summer | RF | 0.899 | 0.005 | 0.006 | 0.833 | 0.014 | 0.015 |
| NH₄⁺-N | summer | SVM | 0.674 | 0.006 | 0.008 | 0.982 | 0.019 | 0.019 |
| NH₄⁺-N | autumn | DT | 0.710 | 0.093 | 0.123 | 0.625 | 0.141 | 0.185 |
| NH₄⁺-N | autumn | RF | 0.891 | 0.077 | 0.099 | 0.005 | 0.212 | 0.241 |
| NH₄⁺-N | autumn | SVM | 0.419 | 0.177 | 0.212 | 0.121 | 0.286 | 0.343 |
| NH₄⁺-N | winter | DT | 0.836 | 0.057 | 0.129 | 0.733 | 0.014 | 0.018 |
| NH₄⁺-N | winter | RF | 0.797 | 0.141 | 0.186 | 0.362 | 0.104 | 0.123 |
| NH₄⁺-N | winter | SVM | 0.696 | 0.138 | 0.193 | 0.464 | 0.074 | 0.089 |
| Chl-a | spring | DT | 0.864 | 0.623 | 1.185 | 0.709 | 1.323 | 1.813 |
| Chl-a | spring | RF | 0.713 | 1.152 | 1.72 | 0.706 | 1.611 | 2.018 |
| Chl-a | spring | SVM | 0.439 | 1.456 | 2.404 | 0.481 | 1.516 | 2.327 |
| Chl-a | summer | DT | 0.877 | 0.116 | 0.240 | 0.354 | 0.447 | 0.627 |
| Chl-a | summer | RF | 0.688 | 0.181 | 0.397 | 0.683 | 0.263 | 0.335 |
| Chl-a | summer | SVM | 0.515 | 0.157 | 0.477 | 0.947 | 0.078 | 0.096 |
| Chl-a | autumn | DT | 0.783 | 0.896 | 1.228 | 0.178 | 3.789 | 4.127 |
| Chl-a | autumn | RF | 0.733 | 1.427 | 1.659 | 0.568 | 2.191 | 2.580 |
| Chl-a | autumn | SVM | 0.159 | 2.052 | 2.426 | 0.161 | 2.441 | 2.952 |
| Chl-a | winter | DT | 0.891 | 0.329 | 0.436 | 0.662 | 0.728 | 0.907 |
| Chl-a | winter | RF | 0.816 | 0.449 | 0.566 | 0.874 | 0.653 | 0.732 |
| Chl-a | winter | SVM | 0.906 | 0.371 | 0.479 | 0.823 | 0.514 | 0.592 |

Note: Bold font in the original table marks the optimal machine learning algorithm.

The inversion models for the water quality parameters of subsidence waters differ according to the machine learning algorithm used. As Table 3 shows, the machine learning inversion models reach a high level of accuracy for the different water quality indicators in the different seasons. For the optimal TN models, the lowest training set coefficient of determination is 0.7908, and the verification set coefficients of determination are all above 0.6 in the three seasons other than autumn. The DT regression algorithm performs well in spring and winter, while SVR (the regression form of SVM) and RF perform well in summer and autumn, respectively. Compared with the traditional statistical regression models, the machine learning models fit better. In the TP inversion model, DT has the best fitting effect, with high accuracy on the test set: its coefficient of determination R² reaches 0.885, the root mean square error RMSE is 0.026, and the mean absolute error MAE is 0.021, whereas R² is 0.6223 for the statistical regression model. In autumn and winter, the SVR algorithm greatly improves the accuracy of TP inversion compared with the statistical regression algorithm. The NH₄⁺-N and Chl-a inversions also achieve high accuracy. For NH₄⁺-N, the coefficient of determination exceeds 0.71 in all four seasons and 0.81 in three of them. For Chl-a, the training set coefficient of determination reaches up to 0.906 in winter, a significant improvement over the traditional statistical model. The three machine learning models show different advantages for water quality inversion in different seasons.

To show the prediction performance more intuitively, the concentrations of TN, TP, NH₄⁺-N, and Chl-a are predicted with the three algorithms SVR, DT, and RF. Scatter plots comparing the measured values of the water quality parameters in each season with the values predicted by the optimal machine learning model are shown in Fig. 5, from which the dispersion between predicted and measured values can be seen directly.
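The machine learning workflow described above (a random 4:1 split, three regressors, and R², MAE, and RMSE on both sets) can be sketched with scikit-learn. This is a hedged illustration, not the authors' code: the data are synthetic, the hyperparameters are placeholder assumptions, and the feature scaling step before SVR is added here because SVR is sensitive to feature scale.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for 68 samples with 5 spectral features (hypothetical data).
rng = np.random.default_rng(42)
X = rng.uniform(0.02, 0.15, size=(68, 5))
y = 2.0 + 8.0 * X[:, 0] - 5.0 * X[:, 1] + rng.normal(0.0, 0.05, size=68)

# Random 4:1 split into training and verification sets.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Hyperparameters below are illustrative assumptions, not tuned values.
models = {
    "DT": DecisionTreeRegressor(max_depth=4, random_state=0),
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01)),
}

def evaluate(model, features, target):
    """Return R2, MAE, and RMSE of a fitted model on one data set."""
    pred = model.predict(features)
    return {
        "R2": r2_score(target, pred),
        "MAE": mean_absolute_error(target, pred),
        "RMSE": float(np.sqrt(mean_squared_error(target, pred))),
    }

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "train": evaluate(model, X_train, y_train),
        "val": evaluate(model, X_val, y_val),
    }

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores["val"].items()})
```

Comparing the training and verification rows for each model is a quick check for overfitting: a tree model with a high training R² and a much lower verification R² has memorized the training samples rather than learned a transferable relationship.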
The color of the scatter points indicates the density of samples across the concentration range of each water quality parameter, with green indicating that more points fall in that concentration range. The closer the slope of the regression line is to 1 and the smaller the intercept, the closer the predicted values are to the measured values and the better the prediction.

Discussion

Inversion model effects of different water quality parameters

The scatter plots of fitted water quality concentrations show directly the dispersion between the true and predicted values of the samples. The training and verification results for each water quality index are good, and the fitting accuracy of the models is high. The machine learning models (SVM, RF, DT) perform significantly better than the traditional statistical regression models in the concentration inversion of non-photosensitive parameters such as TN, TP, and NH₄⁺-N (R² increased by 20%–40%). This is consistent with the conclusions of Xiong et al. for Taihu Lake and shows that machine learning algorithms can better capture the complex nonlinear relationship between spectral characteristics and water quality parameters [Xiong et al., 2022; Jiang et al., 2021]. However, the inversion accuracy for Chl-a, a photosensitive parameter, is limited. The difference in accuracy between the inversion models may therefore be related to the optical characteristics of the water quality parameters. Non-photosensitive parameters such as TN and TP depend on indirect spectral responses (such as the synergistic effects of suspended solids and dissolved organic matter) [Shi et al., 2015; Abayazid et al., 2019].
The dynamics and seasonal fluctuations of phytoplankton communities change the content and distribution of chlorophyll-a [Rinaldi et al., 2014], so the optical response of Chl-a is strongly affected by them, which weakens the correlation between spectral signals and concentration. Machine learning models therefore have a clear advantage in the concentration inversion of non-photosensitive parameters [Zhou et al., 2015]. Different water quality indicators have their own patterns of change and influencing factors, so each model performs differently when inverting the concentrations of different indicators [Zeng et al., 2023]. The decision tree model performs well for TN and NH₄⁺-N, the support vector machine model performs well for TP, and the random forest model has an advantage for Chl-a.

In summary, compared with traditional statistical regression models, the DT, RF, and SVM machine learning models significantly improve the inversion accuracy of water quality parameter concentrations in coal mining subsidence waters, which demonstrates the applicability of machine learning models to remote sensing inversion of water quality parameter concentrations [Gu et al., 2020]. However, no single model in this study fits well for all water quality parameters. In practical applications, a suitable model must be chosen according to the season and the specific water quality indicator.

Differences in inversion models for different seasons

The influence of seasonal changes on model performance cannot be ignored.
The optical characteristics of the same water body differ between seasons [Wang et al., 2023]. This study found that prediction accuracy generally improves in summer, when temperatures are higher, which may be related to the stability of the optical characteristics of the water bodies [Song et al., 2024]. In winter, however, the performance of the statistical regression models and the machine learning models diverges significantly. Compared with summer, the R² of the statistical regression models decreases by about 30% on average, while that of the machine learning models decreases by only 10–15%, indicating that the machine learning algorithms adapt better to seasonal changes. For certain water quality parameters in some seasons, a few sampling points have concentrations spread over a wide range, so a better fit cannot be achieved. This reduces the fitting accuracy of the inversion to some extent, but it also avoids overfitting, so the model predicts better and in fact generalizes more strongly [Md et al., 2025]. A certain difference remains between the concentrations inverted by the models and the measured values (Fig. 5). This may be due to the uncertainty of field sampling and to the many natural environmental factors that affect the optical characteristics of water bodies; in addition, the limited number of sampling sites can hardly cover the spectral characteristics of the entire study water area [Pan et al., 2025].

In summary, the models are difficult to adjust flexibly to the changes of specific water quality parameters across seasons, and in practical applications they lack universality across different study areas and seasonal conditions.
Future research will also require extensive parameter tuning and verification of the models under different seasonal conditions.

Research significance

TN, TP, NH₄⁺-N, and Chl-a are key parameters for characterizing the state of the aquatic ecological environment, and it is important to identify non-point source pollution in coal mining subsidence waters accurately through appropriate monitoring methods [Chen et al., 2025]. This study focuses on the waterlogged coal mining subsidence area of Huainan City and combines statistical regression with machine learning algorithms to construct inversion models of water quality parameters, which is significant in several respects.

(1) Traditional water quality monitoring methods have many limitations, such as difficult sampling, high cost, and low timeliness, and data from sampling points cannot represent the water quality of an entire water area [Muhoyi et al., 2025]. This study is based on remote sensing inversion and uses satellite remote sensing images to model and predict the concentrations of water quality parameters in coal mining subsidence waters. The approach shows good applicability for these waters, provides new ideas and methods for model construction, and helps to further explore the synergistic effects of different models in complex environmental monitoring.

(2) This study examines in depth the relationship between water quality parameters and remote sensing image features in coal mining subsidence areas and explores the spectral response characteristics of different water quality parameters in different wavelength bands.
This not only helps to improve the accuracy of water quality inversion models in coal mining subsidence areas, but also promotes the application of remote sensing monitoring to other water bodies. It makes it possible to detect changes in the water quality of subsidence waters in time, to understand the overall water quality status of coal mining subsidence areas, and to provide effective data support for water resources protection and pollution prevention and control.

(3) Statistical regression algorithms have a single linear fitting model and usually have low fitting accuracy, which makes them unsuitable for complex nonlinear problems [Salem et al., 2017]. Machine learning algorithms can use sample data to optimize and iteratively refine empirical models autonomously, and can better explore potential relationships in the data. Through the deep integration of machine learning and remote sensing, this study overcomes the temporal and spatial limitations of traditional water quality monitoring and provides new opportunities for the further development of water quality parameter monitoring and prediction.

Research limitations and prospects

When the performance of the inversion models in different seasons is analyzed, factors such as insufficient temporal and spatial coverage of the sample data, data quality, model error, and parameter uncertainty may affect the accurate evaluation of the models' prediction performance [Caballero et al., 2025]. In particular, the collection of water quality data is limited by existing conditions and cannot evenly cover the entire study area, which leads to large errors in the inverted concentrations of water quality parameters in some areas [Luo et al., 2022].
In addition, the limited spatial resolution of Sentinel-2 imagery, together with weather and cloud cover on the acquisition dates, makes it difficult for the satellite to capture the details of small subsidence ponds, especially in areas obscured by photovoltaic panels, which further affects the accuracy of the inversion models. Future work should address these problems by using high-resolution satellite imagery combined with unmanned aerial vehicle hyperspectral data, improving the stability of the atmospheric correction algorithm [Liu et al., 2022], and optimizing sample collection to improve the representativeness of the sample data, so as to improve the adaptability and inversion accuracy of the models in complex water environments.

Statistical models and machine learning models have different advantages and disadvantages. Statistical models are usually easier to interpret, while machine learning models focus on predictive results. Statistical models avoid overfitting by adjusting variables and applying statistical tests, while machine learning commonly uses methods such as cross-validation and regularization. In terms of data volume, statistical models may perform better when data are scarce, while machine learning needs a large amount of training data to realize its advantages. Future work can continue to optimize the algorithms to reduce the influence of outliers on the models [Yu et al., 2025]. Combining statistical models with machine learning inversion models has shown good applicability in inverting the concentrations of water quality indicators in coal mining subsidence waters [Sarigai et al., 2021].
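Cross-validation and regularization, mentioned above as the usual safeguards against overfitting in machine learning, can be illustrated with a short scikit-learn sketch. It is not part of the study: the data are synthetic, ridge regression is chosen here only as a simple regularized model, and the `alpha` values are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for 68 samples with 5 spectral features (hypothetical data).
rng = np.random.default_rng(1)
X = rng.uniform(0.02, 0.15, size=(68, 5))
y = 2.0 + 8.0 * X[:, 0] - 5.0 * X[:, 1] + rng.normal(0.0, 0.05, size=68)

# 5-fold cross-validation: each sample is used for verification exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Compare several regularization strengths by their mean cross-validated R2.
cv_r2 = {}
for alpha in (0.001, 0.01, 0.1):
    cv_r2[alpha] = cross_val_score(Ridge(alpha=alpha), X, y, cv=cv, scoring="r2")
    print(f"alpha={alpha}: mean R2 = {cv_r2[alpha].mean():.3f}")
```

With only 68 samples, averaging over folds gives a steadier accuracy estimate than a single 4:1 split, because the result depends less on which points happen to fall in the verification set.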
The inversion models of water quality indices constructed here apply only to the specific waters studied. Each inversion algorithm is still limited by season, geographical location, and type of water body, and its ability to extrapolate is also limited. Future work should overcome these limitations and build inversion models with strong universality [Liang et al., 2025], further integrate multi-source remote sensing data, combine artificial intelligence, deep learning, and other methods to build high-precision inversion models of water quality parameters in subsidence waters, improve the accuracy of remote sensing monitoring, and strengthen the analysis of the internal mechanisms of machine learning algorithms.

Conclusion

This paper takes the waterlogged coal mining subsidence area of Huainan City as the research object. Based on ground sampling data and Sentinel-2 satellite imagery, statistical regression models and three machine learning algorithms (RF, DT, and SVM) are used to construct inversion models of the water quality parameters of coal mining subsidence waters for the four seasons of 2024, and their accuracy is systematically evaluated. The accuracy of the statistical regression models and the machine learning models is compared for four water quality parameters: TN, TP, NH₄⁺-N, and Chl-a. Analysis of the correlation between the water quality parameters and the reflectance of the Sentinel-2 bands shows that the correlation is not high, with correlation coefficients within 0.5.
The statistical regression models use only a small amount of band information and do not fully consider the combined information of the many related bands, so their accuracy is low, with R² mostly below 0.5. Training set fitting and test set verification are then adopted, with band reflectance as the input feature variable, to construct three machine learning regression models: RF, DT, and SVM. The results show that the inversion accuracy of the machine learning models is higher than that of the statistical regression models. The coefficient of determination R² of the inversion models of water quality parameters in the four seasons is mostly above 0.8, and the performance of the machine learning regression models has been verified on samples of different water quality parameters in the four seasons. These results improve the inversion accuracy of water quality parameters in coal mining subsidence waters and provide more accurate data support for water quality monitoring and management in these waters.

Declarations

Authors and Affiliations

State Key Laboratory for Safe Mining of Deep Coal Resources and Environment Protection, Anhui University of Science and Technology, Huainan, China, 232001
Haitao Wu, Ying Liu (corresponding author), Yuzhi Zhou, Yanxue Gao & Xiaoyang Chen

School of Earth and Environment, Anhui University of Science and Technology, Huainan, China, 232001
Haitao Wu, Ying Liu, Yuzhi Zhou, Yanxue Gao & Xiaoyang Chen

Corresponding author
Correspondence to Ying Liu.

Ethics declarations

Competing interests
The authors declare no competing interests.
Funding
This work was supported by the Opening Foundation of Anhui Province Engineering Research Center of Water and Soil Resources Comprehensive Utilization and Ecological Protection in High Groundwater Mining Area; a project entrusted by Ping'an Coal Mining Engineering Technology Research Institute Co., Ltd. (HNKY-PGJS-2023-228); and the National Natural Science Foundation of China (52204181).

Author Contribution
HW: Writing – original draft, Visualization, Software, Investigation, Data curation, Methodology, Formal analysis. YL: Writing – review & editing, Writing – original draft, Funding acquisition, Supervision, Data curation, Formal analysis, Conceptualization. YZ: Data curation, Investigation, Writing – original draft. YG: Data curation, Writing – original draft. YL: Validation, Writing – original draft. XC: Conceptualization, Funding acquisition, Writing – review & editing.

Data availability
No datasets were generated or analysed during the current study.

References
Alikas, K., & Kratzer, S. (2017). Improved retrieval of Secchi depth for optically-complex waters using remote sensing data. Ecological Indicators, 77: 218-227. https://doi.org/10.1016/j.ecolind.2017.02.007
Alnahit, A.O., Mishra, A.K., & Khan, A.A. (2022). Stream water quality prediction using boosted regression tree and random forest models. Stochastic Environmental Research and Risk Assessment, 36(9): 2661-2680. https://doi.org/10.1007/s00477-021-02152-4
Abayazid, H.O., & El-Adawy, A. (2019). Assessment of a non-optical water quality property using space-based imagery in Egyptian coastal lake. Journal of Water Resource and Protection, 11(6): 713-727. https://doi.org/10.4236/jwarp.2019.116042
Chen, Y.C., Yuan, L., & Xu, Z. (2016). Investigation on using mining subsidence area to build a reservoir in Huainan Coal Mining Area. Journal of China Coal Society, 41(11): 2830-2835.
https://doi.org/10.13225/j.cnki.jccs.2016.0135
Chen, G.Z., Wang, X.M., Wang, R.W., & Liu, G.J. (2019). Health risk assessment of potentially harmful elements in subsidence water bodies using a Monte Carlo approach: An example from the Huainan coal mining area, China. Ecotoxicology and Environmental Safety, 171: 737-745. https://doi.org/10.1016/j.ecoenv.2018.12.101
Cui, L.K., Song, X.Q., Yang, Y.W., Liu, J.X., Li, Z.N., Yun, L., & Zhang, M.L. (2021). Doppler Lidar Retrieval of Particulate Matter Concentration Based on Statistical Regression Method. Acta Photonica Sinica, 50(12): 1201005. https://doi.org/10.3788/gzxb20215012.1201005
Chen, R.K., Kang, J., Zhao, Y.X., Guo, Y.Q., Xu, X.L., & Wang, Y.G. (2025). Spatial and temporal evolution of water quality in the Yangtze River Basin from 2003 to 2024. China Environmental Science, 45(4): 2171-2182. https://doi.org/10.19674/j.cnki.issn1000-6923.20250217.003
Caballero, C.B., Martins, V.S., Paulino, R.S., Butler, E., Sparks, E., Lima, T.M., & Novo, E.M.L.M. (2025). The need for advancing algal bloom forecasting using remote sensing and modeling: Progress and future directions. Ecological Indicators, 172: 113244. https://doi.org/10.1016/j.ecolind.2025.113244
Chen, Y.C., Wu, H.T., Shen, L.P., Liu, Y., Xu, Y.F., Chen, X.Y., & Zhou, Y.Z. (2025). Research progress on remote sensing monitoring of water eutrophication indicators in coal mining subsidence water area. Mining Safety & Environmental Protection, 52(02): 153-163. https://doi.org/10.19835/j.issn.1008-4495.20240059
Dai, X., Yang, X., Wang, M., Gao, Y., Liu, S.H., & Zhang, J.M. (2019). The Dynamic Change of Bosten Lake Area in Response to Climate in the Past 30 Years. Water, 12(1): 4. https://doi.org/10.3390/w12010004
Domitr, P., Wlostowski, M., Laskowski, R., & Jurkowski, R. (2023). Comparison of inverse uncertainty quantification methods for critical flow test. Energy, 263: 125640.
https://doi.org/10.1016/j.energy.2022.125640
Fan, K.X., Liu, Y., Zhang, X.Y., Chen, X.Y., Li, Y., Zhou, Y.Z., Shen, W.J., Tao, H.L., Gong, C.G., & Lei, S.G. (2024). Assessing the relative contribution of climate change and human activity factors to spatiotemporal distributions of sand fixation service in the Loess Plateau. GIScience & Remote Sensing, 62(1): 2444630. https://doi.org/10.1080/15481603.2024.2444630
Gu, K., Zhang, Y.H., & Qiao, J.F. (2020). Random forest ensemble for river turbidity measurement from space remote sensing data. IEEE Transactions on Instrumentation and Measurement, 69(11): 9028-9036. http://dx.doi.org/10.1109/tim.2020.2998615
He, T.T., Wu, X., Zhao, Y.L., Deng, X.Y., & Hu, Z.Q. (2020). Identification of waterlogging in Eastern China induced by mining subsidence: A case study of Google Earth Engine time-series analysis applied to the Huainan coal field. Remote Sensing of Environment, 242: 111742. https://doi.org/10.1016/j.rse.2020.111742
He, Y.H., Gong, Z.J., Zheng, Y.H., & Zhang, Y.B. (2021). Inland Reservoir Water Quality Inversion and Eutrophication Evaluation Using BP Neural Network and Remote Sensing Imagery: A Case Study of Dashahe Reservoir. Water, 13(20): 2844. https://doi.org/10.3390/w13202844
Jiang, Q.O., Xu, L.D., Sun, S.Y., Wang, M.L., & Xiao, H.J. (2021). Retrieval model for total nitrogen concentration based on UAV hyper spectral remote sensing data and machine learning algorithms-A case study in the Miyun Reservoir, China. Ecological Indicators, 124: 107356. http://dx.doi.org/10.1016/J.ECOLIND.2021.107356
Liu, Y., Li, J.S., Xiao, C.C., Zhang, F.F., & Wang, S.L. (2022). Inland water chlorophyll-a retrieval based on ZY-102D satellite hyperspectral observations. Journal of Remote Sensing, 26(1): 168-178. https://dx.doi.org/10.11834/jrs.20221244
Luo, J.H., Yang, J.Z.C., Duan, H.T., Lu, L.R., Sun, J., & Xin, Y.Y. (2022). Research progress of aquatic vegetation remote sensing in shallow lake. Journal of Remote Sensing, 26(1): 68-76.
https://dx.doi.org/10.11834/jrs.20221208
Liang, Y.H., Ding, F.Y., Liu, L., Yin, F., Hao, M.M., Kang, T.T., Zhao, C.P., Wang, Z.T., & Jiang, D. (2025). Monitoring water quality parameters in urban rivers using multi-source data and machine learning approach. Journal of Hydrology, 648: 132394. https://doi.org/10.1016/j.jhydrol.2024.132394
Mccullough, M.I., Loftin, S.C., & Sader, A.S. (2013). Lakes without Landsat? An alternative approach to remote lake monitoring with MODIS 250m imagery. Lake and Reservoir Management, 29(2): 89-98. https://doi.org/10.1080/10402381.2013.778926
Mcfeeters, S.K. (1996). The use of the Normalized Difference Water Index (NDWI) in the delineation of open water features. International Journal of Remote Sensing, 17(7): 1425-1432. https://doi.org/10.1080/01431169608948714
Mohammadi, B., & Mehdizadeh, S. (2020). Modeling daily reference evapotranspiration via a novel approach based on support vector regression coupled with whale optimization algorithm. Agricultural Water Management, 237: 106145. https://doi.org/10.1016/j.agwat.2020.106145
Rahman, M.M., Shults, R., Surya, P.T., Tiwari, S.P., Arshad, A., Usman, M., Raihan, A., & Ishraque, M.F. (2025). Review on sea water quality (SWQ) monitoring using satellite remote sensing techniques (SRST). Marine Pollution Bulletin, 217: 118108. https://doi.org/10.1016/j.marpolbul.2025.118108
Muhoyi, H., Gumindoga, W., Mhizha, A., Misi, S.N., & Nondo, N. (2025). Remote sensing application in compliment to in-situ monitoring of water quality: Lower Manyame Sub-catchment, Zimbabwe. Scientific African, 27: e02551. https://doi.org/10.1016/j.sciaf.2025.e02551
Pan, W.B., Yu, F., Li, J.L., Li, C.Q., & Ye, M. (2025). Quantification of chlorophyll-a in inland waters by remote sensing algorithm based on modified equivalent spectra of Sentinel-2. Ecological Informatics, 87: 103061. https://doi.org/10.1016/j.ecoinf.2025.103061
Qin, H., Su, Q., Khu, S.T., & Tang, N. (2014).
Water quality changes during rapid urbanization in the Shenzhen River Catchment: An integrated view of socio-economic and infrastructure development. Sustainability, 6(10): 7433-7451. https://doi.org/10.3390/su6107433
Rinaldi, E., Nardelli, B.B., Volpe, G., & Santoleri, R. (2014). Chlorophyll distribution and variability in the Sicily Channel (Mediterranean Sea) as seen by remote sensing data. Continental Shelf Research, 77: 61-68. https://doi.org/10.1016/j.csr.2014.01.010
Sanmiquel, L., Bascompta, M., Vintro, C., & Yubero, T. (2018). Subsidence management system for underground mining. Minerals, 8(6): 243. https://doi.org/10.3390/min8060243
Shi, K., Zhang, Y., Zhu, G., Qin, D.Q., & Pan, D.L. (2018). Deteriorating water clarity in shallow waters: Evidence from long term MODIS and in-situ observations. International Journal of Applied Earth Observation and Geoinformation, 68: 287-297. https://doi.org/10.1016/j.jag.2017.12.015
Sarafaraz, J., Ahmadzadeh, K.F., Mahmoudi, K.J., & Habibzadeh, N. (2024). Predicting river water quality: An imposing engagement between machine learning and the QUAL2Kw models (case study: Aji-Chai, river, Iran). Results in Engineering, 21: 101921. https://doi.org/10.1016/j.rineng.2024.101921
Shi, K., Zhang, Y.L., Zhu, G.W., Liu, X.H., Zhou, Y.Q., Xu, H., Qin, B.Q., Liu, G., & Li, Y.M. (2015). Long-term remote monitoring of total suspended matter concentration in Lake Taihu using 250 m MODIS-Aqua data. Remote Sensing of Environment, 164: 43-56. https://doi.org/10.1016/j.rse.2015.02.029
Song, W., A, Y.L., Wang, Y.T., Fang, Q.Q., & Tang, R. (2024). Study on remote sensing inversion and temporal-spatial variation of Hulun lake water quality based on machine learning. Journal of Contaminant Hydrology, 260: 104282. https://doi.org/10.1016/j.jconhyd.2023.104282
Sarigai, Yang, J., Zhou, A., Han, L.S., Li, Y., & Xie, Y.C. (2021). Monitoring urban black-odorous water by using hyperspectral data and machine learning. Environmental Pollution, 269: 116166.
https://doi.org/10.1016/j.envpol.2020.116166 Salem, S.I., Higa, H., Kim, H., Kobayashi, H., Oki, K., &Oki, T. (2017). Assessment of chlorophyll-a algorithms considering different trophic statuses and optimal bands. SENSORS, 17(8):1746. https://doi.org/10.3390/s17081746. Vasistha, P., Ganguly, R,. 2020. Water quality assessment of natural lakes and its importance: An overview. Materials Today. Proceedings, 32: 544-552. https://doi.org/10.1016/j.matpr.2020.02.092 Wang, C.L., Shi, K.Y., Ming, X., Cong, M.Q., Liu, X.Y., &Guo, W.J. (2022). A Comparative Study of the COD Hyperspectral Inversion Models in Water Based on the Maching Learning. Spectroscopy and Spectral Analysis, 42(08):2353-2358. https://doi.org/10.3964/j.issn.1000-0593(2022)08-2353-06 Wang, S.M., &Qin, B.Q. (2023). Research Progress on Remote Sensing Monitoring of Lake Water Quality Parameters. Environmental Science, 44(03):1228-1243. https://doi.org/10.13227/j.hjkx.202203285 Xiong, J.F., Lin, C., Cao, Z.G., Hu, M.Q., Xue, K., Chen, X., &Ma, R.H. (2022). Development of remote sensing algorithm for total phosphorus concentration in eutrophic lakes: Conventional or machine learning. Water Research, 215:118213. https://doi.org/10.1016/j.watres.2022.118213 Xu, Q., Guo, P., Jin, M., &Qi, J.F. (2021). Multi-scenario landscape ecological risk assessment based on Markov–FLUS composite model. Geomatics, Natural Hazards and Risk, 12(1): 1449-1466. https://doi.org/10.1080/19475705.2021.1931478 Yuan, Q.Q., Shen H.F., Li T.W., Li, Z.W., Li, S.W., Jiang Y., Xu, H.Z., Tan, W.W., Yang, Q.Q., Wang, J.W., Gao J.H., &Zhang, L.P. (2020). Deep learning in environmental remote sensing: Achievements and challenges. Remote Sensing of Environment, 241: 111716. https://doi.org/10.1016/j.rse.2020.111716 Yuan Z.H., LI Q.H., He Y., Ma X.Y., Han M.S., Sun R.G., Zhang H.J. (2019). Variation and evaluation of nutrients in Baihua Reservoir in Guizhou Plateau based on Bayesian method,2014-2018.Journal of Lake Sciences, 31(06):1623-1636. 
https://doi.org/10.18307/2019.0602 Yu, Y.S., Ding, P., Bian, H.Y., Wei, J.S., &Zhang, H. (2025). Water quality parameters inversion based on multispectral remote sensing. Journal of Water Process Engineering, 73:107707. https://doi.org/10.1016/j.jwpe.2025.107707 Zhang, B., Li, J.S., Shen, Q., Wu, H.Y., Zhang, F.F., Wang, S.L., Yao, Y., Guo, L.N., &Yin, Z.Y. (2021). Recent research progress on long time series and large scale optical remote sensing of inland water. National Remote Sensing Bulletin, 25(1): 37-52. https://doi.org/10.11834/jrs.20210570 Zhang, Y., &Cao, J. (2016). Decision Tree Algorithms for Big Data Analysis. Computer Science,43(S1):374-379+383. https://doi.org/10.11896/j.issn.1002-137X.2016.6A.089 Zhou, D.M., &Wang, D.Y. (2015). Quantitative Estimation of Chlorophyll-a and Suspended Solids in Taihu Based on Landsat TM. Environmental Science & Technology, 38(S1):362-367. http://dx.doi.org/10.3969/j.issn.1003-6504.2015.6P.075 Zeng, F.X., Song, C.Q., Cao, Z.G., Xue K., Lu, S.L., Chen, Tan., &Liu, Kai. (2023). Monitoring inland water via Sentinel satellite constellation: A review and perspective. ISPRS Journal of Photogrammetry and Remote Sensing, 204:40-361. https://doi.org/10.1016/j.isprsjprs.2023.09.011 Additional Declarations No competing interests reported.
{"props":{"pageProps":{"initialData":{"identity":"rs-6914046","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":479787375,"identity":"3fe67d8a-2d40-4a6e-8c08-75651e5d8b99","order_by":0,"name":"Haitao Wu","email":"","orcid":"","institution":"Anhui University of Science and Technology","correspondingAuthor":false,"prefix":"","firstName":"Haitao","middleName":"","lastName":"Wu","suffix":""},{"id":479787376,"identity":"83a77e0d-e904-47b0-adfb-4b5a96fa670b","order_by":1,"name":"Ying Liu","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA7klEQVRIiWNgGAWjYBACPmYgkcDAYMfYzHzw8Y8KCTl5QlrYoFqSmdvbko0ZzlgYGzYQ0gKlGdt7zphJM7ZVJDIcIKSFncdM4uGOWmbeGWkJ0oXzJBIYG5gfPrqB12FALYlnjvNJzkg+YDxzm0QeOwObsXEOQS1tx5gNgbYk8G6TKGZs4GGTJkYL4/4bOQYHeOdIJDYcIE5LDWNjzxnDZt4GorSwFVskth1IZgQGMuOMYxLGhs0E/MLPf3jjzZ9tdaCoPP7jQ02dnDx788PH+LQAAYsEA8NhJD4zfuVgJR8YGOoIKxsFo2AUjIKRCwB0BkhEqeRwBQAAAABJRU5ErkJggg==","orcid":"","institution":"Anhui University of Science and Technology","correspondingAuthor":true,"prefix":"","firstName":"Ying","middleName":"","lastName":"Liu","suffix":""},{"id":479787377,"identity":"152b479e-2a54-45e3-81cc-19f24ec9d030","order_by":2,"name":"Yuzhi Zhou","email":"","orcid":"","institution":"Anhui University of Science and Technology","correspondingAuthor":false,"prefix":"","firstName":"Yuzhi","middleName":"","lastName":"Zhou","suffix":""},{"id":479787378,"identity":"c2be083f-cb15-4422-8a83-67aa46bdb43b","order_by":3,"name":"Yanxue Gao","email":"","orcid":"","institution":"Anhui University of 
Science and Technology","correspondingAuthor":false,"prefix":"","firstName":"Yanxue","middleName":"","lastName":"Gao","suffix":""},{"id":479787379,"identity":"3be526d6-6bff-4bc4-8dbe-402f460d7261","order_by":4,"name":"Yong Li","email":"","orcid":"","institution":"Anhui Huayin electromechanical Co. ltd","correspondingAuthor":false,"prefix":"","firstName":"Yong","middleName":"","lastName":"Li","suffix":""},{"id":479787380,"identity":"446c74b9-7f05-403d-90a2-c80c37de1d8f","order_by":5,"name":"Xiaoyang Chen","email":"","orcid":"","institution":"Anhui University of Science and Technology","correspondingAuthor":false,"prefix":"","firstName":"Xiaoyang","middleName":"","lastName":"Chen","suffix":""}],"badges":[],"createdAt":"2025-06-17 11:53:20","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6914046/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6914046/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":86008220,"identity":"21ac2d4d-a96f-4b68-845a-5e7adc6dfa85","added_by":"auto","created_at":"2025-07-04 09:17:06","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":2897569,"visible":true,"origin":"","legend":"\u003cp\u003eOverview of the research area and distribution of sampling points\u003c/p\u003e","description":"","filename":"image1.png","url":"https://assets-eu.researchsquare.com/files/rs-6914046/v1/ed258d9a8ae9ef47f943a4cc.png"},{"id":86008218,"identity":"e24ad527-7d45-4fad-a854-c1e607f446e5","added_by":"auto","created_at":"2025-07-04 09:17:06","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":247504,"visible":true,"origin":"","legend":"\u003cp\u003eMachine learning principle (a) Decision tree, each path from the root node to the leaf node represents a decision rule; (b) Support vector machine, the decision boundary is the maximum margin hyperplane for solving the learning sample; (c) Random 
forest, Multiple regression results are obtained the average value is used as the final result.\u003c/p\u003e","description":"","filename":"image2.png","url":"https://assets-eu.researchsquare.com/files/rs-6914046/v1/32a4ff49758b2d2abf6a5e13.png"},{"id":86009198,"identity":"b1c3fb97-3779-4e01-8d93-4b3fd721e887","added_by":"auto","created_at":"2025-07-04 09:25:06","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":521894,"visible":true,"origin":"","legend":"\u003cp\u003eGraph of remote sensing reflectance of sampling points in different quarters\u003c/p\u003e","description":"","filename":"image3.png","url":"https://assets-eu.researchsquare.com/files/rs-6914046/v1/c10d27af076c9d93e7a37e5a.png"},{"id":86008235,"identity":"1787a891-8575-4032-93a9-69e9900cb27e","added_by":"auto","created_at":"2025-07-04 09:17:06","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":3895553,"visible":true,"origin":"","legend":"\u003cp\u003eAnalysis of the correlation between water quality parameters and reflectance of single-band images in different quarters\u003c/p\u003e","description":"","filename":"image4.png","url":"https://assets-eu.researchsquare.com/files/rs-6914046/v1/a96af3a94f95bad4763bd5db.png"},{"id":86008228,"identity":"f4751326-ca06-4466-95da-b2d0ba6e2fdf","added_by":"auto","created_at":"2025-07-04 09:17:06","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":1524053,"visible":true,"origin":"","legend":"\u003cp\u003eScatter plot of comparison of measured and predicted values of water quality parameters in different seasons; (1) Accuracy of TN concentration inversion prediction set and verification set (2) Accuracy of TP concentration inversion prediction set and verification set (3) Accuracy of NH4+-N concentration inversion prediction set and verification set (4) Chl-a Concentration inversion prediction set and 
verification set accuracy; (a) spring water quality parameter inversion optimal model (b) summer water quality parameter inversion optimal model (c) autumn water quality parameter inversion optimal model (d) winter water quality parameter inversion optimal model\u003c/p\u003e","description":"","filename":"image5.png","url":"https://assets-eu.researchsquare.com/files/rs-6914046/v1/1d36adf9df312cc54a09e902.png"},{"id":87111380,"identity":"83de872a-0f84-4bec-91c4-74bdfa940530","added_by":"auto","created_at":"2025-07-19 18:31:40","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":9186444,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6914046/v1/47830d93-61a3-4a4b-bcc7-ebc46d453244.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Research on The Inversion Model of Water Environment Parameters of Coal Mining Subsidence Waters Based on Machine Learning","fulltext":[{"header":"Introduction","content":"\u003cp\u003eThe study region is a typical high-groundwater-table coal-grain composite area in the eastern plains of China, where groundwater is shallow and abundant. Long-term, large-scale underground coal mining has caused land subsidence [He et al., 2020], which has destroyed the original surface water system, altered the landscape, and disturbed surface run-off, forming permanent or seasonal standing water in landscapes such as reservoirs, lakes, wetlands, and plain reservoirs [Chen et al., 2016; Sanmiquel et al., 2018]. However, the water environment in this region is extremely vulnerable to pollution and degradation from agricultural irrigation, industrial production, and fishery activities. 
For example, the continuous input of exogenous nitrogen and phosphorus into these waters has led to increasingly serious pollution and frequent eutrophic cyanobacterial blooms, causing imbalances in the structure of aquatic communities and ecosystems. A series of environmental problems such as functional degradation has eventually formed a heterotrophic ecosystem dominated by algae, which seriously affects the health and stability of the water ecosystem [Chen et al., 2018]. Improving water quality monitoring technology and the accuracy of water quality parameter inversion will therefore play a vital role in protecting water resources, preventing and controlling water pollution, and strengthening water resource management.\u003c/p\u003e \u003cp\u003eExisting research mostly relies on regular on-site collection of water samples followed by chemical analysis in the laboratory. This method can detect many parameters with high accuracy, but sampling is difficult, costly, and slow, and it is susceptible to factors such as the geographical location of the mining area and bad weather [Vasistha et al., 2020]. In particular, data from a given sampling site represent only that site and cannot objectively reflect the water quality of the entire water body, so it is difficult to obtain the status of the whole water area in a timely, fast, and accurate manner [He et al., 2021]. In recent years, science and technology have advanced rapidly, and a large number of advanced satellites at home and abroad (such as Landsat, MODIS, GF-1 and HJ-1) 
have been launched one after another, providing abundant remote sensing data sources for monitoring the evolution and trends of water quality in coal mining subsidence waters [Mccullough et al., 2013; Shi et al., 2018; Alikas et al., 2017]. Satellite monitoring is efficient and covers a wide area [Fan et al., 2024]. It can detect water pollution in coal mining subsidence waters in a timely manner and enables rapid identification and diagnosis of the water environment, which has greatly enriched remote sensing inversion results in related fields [Zhang et al., 2021]. It can also track the dynamic changes and distribution of water quality in time and space over long periods, plays an increasingly important role in water quality monitoring and early warning, and provides a scientific basis and technical support for the ecological management and protection of the water environment in coal mining subsidence waters [Qin et al., 2014]. In recent years, most research on remote sensing inversion at home and abroad has focused on heavily polluted lakes and rivers [Wang et al., 2023]. Remote sensing has rarely been applied to water quality inversion in coal mining subsidence waters, and there is a lack of practical application and large-scale deployment experience. There is an urgent need for a rapid, efficient, green, and accurate inversion method for monitoring the water quality of this type of water body.\u003c/p\u003e \u003cp\u003eWith the vigorous development of remote sensing technology, many scholars at home and abroad have studied water quality remote sensing inversion algorithms in increasing depth. 
Traditional statistical regression algorithms are simple to operate and easy to interpret, but linear fitting models are limited in form and their fitting accuracy is usually low, which makes them unsuitable for complex nonlinear problems. Machine learning algorithms are widely used in remote sensing inversion and provide new opportunities for the further development of water quality parameter monitoring and prediction [Yuan et al., 2020]. They can use sample data to optimize and iteratively refine empirical models and better mine latent relationships in the data, thereby improving accuracy and generalization for weakly correlated relationships; this gives them a significant advantage in regression tasks with low correlation. Common machine learning regression algorithms include decision trees, random forests, support vector machines (SVM), and BP neural networks. In recent years, the application of machine learning algorithms has greatly improved the inversion accuracy of non-optically active water quality parameters such as TN, TP, NH\u003csub\u003e4\u003c/sub\u003e\u003csup\u003e+\u003c/sup\u003e-N and DO [Yuan et al., 2019]. Sarafaraz et al. used SVM to simulate the water quality of the Aji-Chai River in Iran and showed that SVM achieved higher R\u003csup\u003e2\u003c/sup\u003e and lower RMSE and MAE values at all simulation sites [Sarafaraz et al., 2024]. Xiong et al. used traditional statistical regression models and machine learning models, respectively, to study TP inversion in Taihu Lake. 
It was found that the machine learning model has higher accuracy [Xiong et al., 2022]. Wang et al. constructed a high-precision COD inversion model based on four machine learning algorithms (linear regression, Random Forest, AdaBoost, and XGBoost), which provides new methods and ideas for building machine learning inversion models in hyperspectral water quality monitoring [Wang et al., 2022]. Although these inversion results each have their own characteristics, most of the research objects are large water bodies such as lakes and rivers, and the accuracy of the inversion models still needs to be improved. It is therefore necessary to construct inversion models for the different water quality parameters of specific water bodies.\u003c/p\u003e \u003cp\u003eTherefore, this study focuses on coal-mining subsidence waters. Based on the spectral characteristics of Sentinel-2 remote sensing images combined with synchronously measured water quality concentration data, statistical regression methods and machine learning methods are used to model the water quality of coal-mining subsidence waters by remote sensing inversion. 
It aims to break through the spatiotemporal limitations of traditional water quality monitoring by combining machine learning algorithms to establish a more accurate and efficient water quality parameter inversion model, providing new opportunities for the further development of water quality parameter concentration monitoring and prediction.\u003c/p\u003e\n\u003ch3\u003eOverview of the research area\u003c/h3\u003e\n\u003cp\u003eThe research area is located in Panji District, Huainan City, East China, at the junction of the Jianghuai hills and the Huang-Huai-Hai plain (116°21′05″E-117°12′30″E, 31°54′08″N-33°00′26″N), with the Panji mining area situated in the middle reaches of the Huaihe River basin. The city was built and thrived because of coal, and it is one of the 14\u0026nbsp;billion-ton coal bases and 6 major coal-electricity integration bases confirmed by China [He et al., 2020]. The terrain is flat with dense river networks, and it is a resource-based city dominated by coal, electric power, and chemical industry (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). The area has a warm temperate semi-humid monsoon climate with four distinct seasons, mild temperatures, and moderate precipitation. Precipitation is concentrated in summer (June-August), which accounts for more than half of the annual total, and short-term heavy rainfall is common. Winter has less precipitation and a relatively dry climate. In this study, three typical small and micro coal mining subsidence waters in this area were selected as the research objects. These waters are mainly used for aquaculture and photovoltaic power generation. 
At the same time, the water environment is affected by industrial waste discharge, direct urban and rural sewage discharge, and surface run-off pollution, and the water quality is poor.\u003c/p\u003e"},{"header":"Research methods and data processing","content":"\u003ch2\u003eCollection of water quality parameters\u003c/h2\u003e\u003cp\u003eTaking into account the spatial and seasonal differences in surface water quality, a grid layout was used to set up 68 effective sampling points in the research area, giving a relatively balanced spatial layout that reasonably covers the entire water area. Four field sampling campaigns were carried out, on March 11, June 12, September 22, and December 24, 2024, representing spring, summer, autumn, and winter, respectively. A total of 272 water samples were collected. During sampling the weather was clear, the wind speed was low, and the water surface was calm. GPS was used to record the geographic location of each sampling point. Water samples were collected 0.5 m below the water surface of the subsidence waters and stored in polyethylene bottles. After on-site treatment according to the testing requirements of each water quality indicator, the samples were taken back to the laboratory, where the concentrations of four water quality parameters (TN, TP, NH\u003csub\u003e4\u003c/sub\u003e\u003csup\u003e+\u003c/sup\u003e-N and Chl-a) were determined in accordance with the methods specified in the national standards. The measured data were divided into training samples and test samples in a 4:1 ratio, which were used for model construction and for evaluating the accuracy of the model inversion, respectively. 
The detection methods follow the standards HJ 636-2012, GB 11893-89, HJ 357–2009 and HJ 897–2017 in the \"National Environmental Protection Standards of the People's Republic of China\". The total nitrogen concentration was measured by alkaline potassium persulfate ultraviolet spectrophotometry; the total phosphorus concentration by ammonium molybdate spectrophotometry; the ammonia nitrogen concentration by Nessler's reagent spectrophotometry; and the chlorophyll-a concentration by spectrophotometry.\u003c/p\u003e\u003ch3\u003eSatellite data acquisition and preprocessing\u003c/h3\u003e\u003cp\u003eIn this study, we selected Sentinel-2 images from the European Space Agency (ESA, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.esa.int\u003c/span\u003e\u003cspan address=\"https://www.esa.int\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) acquired in the same period as the field water sample collection and with less than 10% cloud cover. The Sentinel-2 mission consists of two satellites, Sentinel-2A and Sentinel-2B, which operate together to shorten the revisit time and thus provide denser continuous observation data. The onboard multispectral instrument (MSI) provides data in 13 spectral bands covering the visible, near-infrared (NIR), and short-wave infrared (SWIR) ranges (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). The main data products of Sentinel-2 are Level-1B, Level-1C, and Level-2A. The Level-2A data selected in this study have already been radiometrically and atmospherically corrected, and have higher brightness and contrast. 
Preprocessing therefore only required resampling the data to 10 m with the SNAP software from the ESA official website, stacking the bands with the Layer Stacking tool in ENVI 5.3, and roughly cropping the waters of the research area with an ROI, after which the water body reflectance at each sampling point was extracted.\u003c/p\u003e\u003cdiv class=\"gridtable\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSentinel-2 multispectral information for each band\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e\u003ccolgroup cols=\"4\"\u003e\u003c/colgroup\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBand\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCenter wavelength (λ/nm)\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eBandwidth (\u003cem\u003eλ\u003c/em\u003e/nm)\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eSpatial resolution (m)\u003c/p\u003e \u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eB1-Coastal aerosol\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e443\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e 
\u003cp\u003e20\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e60\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eB2-Blue\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e490\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e65\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eB3-Green\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e560\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e35\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eB4-Red\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e665\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e30\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eB5-Vegetation Red Edge\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e705\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e15\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e20\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eB6-Vegetation Red Edge\u003c/p\u003e 
\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e740\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e15\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e20\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eB7-Vegetation Red Edge\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e783\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e20\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e20\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eB8-NIR\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e842\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e115\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eB8A-Narrow NIR\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e865\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e20\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e20\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eB9-Narrow NIR\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e945\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e20\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c4\"\u003e \u003cp\u003e60\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eB10-Narrow NIR\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1375\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e30\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e60\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eB11-SWIR\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1610\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e90\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e20\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eB12-SWIR\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2190\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e180\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e20\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/table\u003e\u003c/div\u003e\u003cp\u003eIn order to accurately obtain the sensitive bands with high correlation between water bodies in the studied waters, the Pearson correlation coefficient is used in this study to extract the corresponding characteristic bands of the sampling points in the studied waters. The reflectance information of the spectral band and the concentration of each water quality parameter are used as different variables. The correlation value ranges from − 1 to 1. The closer the value is to 1 or -1, the stronger the correlation. 
The correlation coefficient is calculated with the following formula.\u003c/p\u003e\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$\\begin{array}{c}r=\\frac{\\sum_{i=1}^{n}\\left({X}_{i}-\\overline{X}\\right)\\left({Y}_{i}-\\overline{Y}\\right)}{\\sqrt{\\sum_{i=1}^{n}{\\left({X}_{i}-\\overline{X}\\right)}^{2}}\\sqrt{\\sum_{i=1}^{n}{\\left({Y}_{i}-\\overline{Y}\\right)}^{2}}}\\#\\text{(}\\text{1}\\text{)}\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003eWhere \u003cem\u003er\u003c/em\u003e is the correlation coefficient and \u003cem\u003en\u003c/em\u003e is the total number of samples; \u003cem\u003eX\u003csub\u003ei\u003c/sub\u003e\u003c/em\u003e is the spectral value of the band or band combination for sample \u003cem\u003ei\u003c/em\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\overline{X}\\)\u003c/span\u003e\u003c/span\u003e is the mean of those spectral values; \u003cem\u003eY\u003csub\u003ei\u003c/sub\u003e\u003c/em\u003e is the measured value of the water quality parameter for sample \u003cem\u003ei\u003c/em\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\overline{Y}\\)\u003c/span\u003e\u003c/span\u003e is the mean of the measured values.\u003c/p\u003e\u003ch3\u003eExtraction of water bodies\u003c/h3\u003e\u003cp\u003eIn remote sensing inversion of water quality, the target area is the water body itself. To make the inversion more efficient and to reduce interference from non-water elements in the research area, such as photovoltaic panels and land, the boundaries of the coal mining subsidence waters must be identified accurately. Common water body extraction techniques include the single-band method, the multi-band combination method, the vegetation index method, and the water body index method. 
In this study, the normalized difference water index (NDWI) [Mcfeeters et al., 1996] was used to enhance water body features while suppressing other land-cover information such as vegetation and soil. The principle rests on the distinctive spectral response of water in the visible and near-infrared range: by computing the normalized difference between the green band (where water reflectance is relatively high) and the near-infrared band (where water absorbs strongly and reflectance is low), the spectral separability between water and non-water surfaces is greatly increased, which suppresses background noise, strengthens the water signal, and enables high-precision extraction of the water extent.\u003c/p\u003e\u003cdiv id=\"Equb\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equb\" name=\"EquationSource\"\u003e\n$$\\:\\begin{array}{c}NDWI=\\frac{Green-NIR}{Green+NIR}\\#\\text{(}\\text{2}\\text{)}\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003eWhere: \u003cem\u003eGreen\u003c/em\u003e corresponds to the B3 band of the Sentinel-2 image; \u003cem\u003eNIR\u003c/em\u003e corresponds to the B8 band of the Sentinel-2 image.\u003c/p\u003e\u003cp\u003eThere are many methods for extracting water bodies from Sentinel-2 satellite imagery, including manual visual interpretation [Dai et al., 2019], supervised classification, unsupervised classification, the single-band threshold method and the multi-band threshold method [Xu et al., 2021]. The study area contains many rivers and lakes, and the subsidence waters are relatively scattered, so manual interpretation alone performs poorly. 
Therefore, in combination with manual visual interpretation, appropriate bands were selected, the water body threshold was determined, and the single-band threshold method and the inter-spectral relationship analysis method were used to extract the water body and establish a model of its spectral characteristics. Through multi-band synthesis, the single-band NDWI images of the spring, summer, autumn and winter quarters were stacked into one multi-band image. Using decision tree classification with spring as the reference quarter, pixels were divided into two classes, water (NDWI \u0026gt; 0) and non-water (NDWI ≤ 0), and the water body information of each quarter was extracted in turn.\u003c/p\u003e\u003ch3\u003ePrinciples of inversion model algorithm\u003c/h3\u003e\u003ch2\u003eMathematical statistical regression model\u003c/h2\u003e\u003cp\u003eStatistical regression is a traditional method commonly used in water quality inversion, and it is especially suitable for the quantitative inversion of optically active substances in water bodies. It can not only express the correlation between variables through mathematical expressions (empirical formulas), but can also be used to predict or regulate trends in the target variables [Cui et al., 2021]. Predictions are made by establishing an explicit mathematical relationship between independent variables (such as spectral bands and derived indices) and dependent variables (water quality parameters). Such models offer clear interpretability and high computational efficiency, and they suit scenarios where the linear relationship is significant and multicollinearity among variables is low. 
However, such models adapt poorly to nonlinear relationships, high-order interaction effects, and complex noise, which tends to limit their extrapolation ability. In modern remote sensing inversion of water quality driven by multi-source heterogeneous data in particular, they struggle to capture the dynamic coupling between spectral characteristics and pollutant concentration, and this has become their main bottleneck in high-precision modeling. In this study, the data set is divided into a training set and a verification set at a ratio of 4:1. Based on the experimental data, single bands or band combinations with high correlation are selected, and several statistical regression models are constructed, including univariate linear, exponential, logarithmic, power function, and quadratic polynomial models.\u003c/p\u003e\u003ch3\u003eMachine learning algorithms\u003c/h3\u003e\u003cp\u003eMachine learning regression algorithms are a class of supervised learning methods that make numerical predictions by learning the mapping between input variables and a continuous target variable. Their core is to minimize the deviation between predicted and true values by optimizing a loss function (such as mean squared error or absolute error). 
Common algorithms include linear regression (modeling a linear combination of features, suitable for scenarios that require high interpretability), decision tree regression (partitioning the feature space with a tree structure to handle nonlinear relationships), support vector regression (robust prediction based on kernel mapping into a high-dimensional space), ensemble regression methods (such as random forests and gradient boosting trees, which reduce variance and bias through multi-model fusion), and neural network regression (using deep nonlinear structures to fit complex data patterns). In addition, regularization techniques (such as ridge regression and Lasso) alleviate overfitting by constraining model complexity.\u003c/p\u003e\u003cp\u003eThe machine learning models used in this study for the inversion of water quality parameter concentrations in coal mining subsidence waters are DT, RF, and SVM. These three algorithms are used to construct regression models of water quality parameter concentrations for the different quarters, as presented in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. 
The models are built in MATLAB R2022a, and the hyperparameters of each inversion model are fixed after repeated training and screening on the water quality parameter concentration and spectral reflectance data sets.\u003c/p\u003e\u003cp\u003e(1) Decision tree\u003c/p\u003e\u003cp\u003eDecision tree regression is a non-parametric supervised learning algorithm that builds a tree-structured model by recursive binary partitioning of the feature space to predict a continuous target variable. At each node it selects the feature and split point that minimize an error criterion (such as mean squared error or mean absolute error), divides the data into more homogeneous subsets, and finally outputs the mean of the samples in each leaf node as the predicted value. The algorithm is intuitive to interpret (the tree can be visualized), requires no assumptions about the data distribution, can handle nonlinear relationships and missing values, and is insensitive to feature scale [Zhang et al., 2016].\u003c/p\u003e\u003cp\u003e(2) Random Forest\u003c/p\u003e\u003cp\u003eRandom forest regression is a nonlinear regression algorithm based on ensemble learning that improves the generalization ability and robustness of the model by building multiple decision trees and aggregating their predictions (usually by averaging). Its core mechanism combines bootstrap sampling (sampling with replacement to generate diverse training subsets) with random feature selection (only a subset of features is considered at each split), so that this dual randomness reduces model variance and suppresses overfitting. Compared with a single decision tree, random forest balances bias and variance through “collective wisdom”, can process high-dimensional, nonlinear and noisy data, is insensitive to missing values and feature scales, and supports parallel training [Domitr et al., 2023; Alnahit et al., 
2022]. The principle can be expressed as:\u003c/p\u003e\u003cdiv id=\"Equc\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equc\" name=\"EquationSource\"\u003e\n$$\\:\\begin{array}{c}\\widehat{y}\\left(x\\right)=\\frac{1}{T}{\\sum\\:}_{t=1}^{T}f\\left(x;{\\theta}_{t}\\right)\\#\\text{(}\\text{3}\\text{)}\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003eWhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\widehat{y}\\)\u003c/span\u003e\u003c/span\u003e(\u003cem\u003ex\u003c/em\u003e) is the predicted value of the random forest for sample \u003cem\u003ex\u003c/em\u003e, \u003cem\u003eT\u003c/em\u003e is the number of decision trees, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:f\\left(x;{\\theta}_{t}\\right)\\)\u003c/span\u003e\u003c/span\u003e is the prediction of the \u003cem\u003et\u003c/em\u003e-th decision tree for sample \u003cem\u003ex\u003c/em\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\theta}_{t}\\)\u003c/span\u003e\u003c/span\u003e is the parameter set of the \u003cem\u003et\u003c/em\u003e-th decision tree.\u003c/p\u003e\u003cp\u003e(3) Support Vector Machine\u003c/p\u003e\u003cp\u003eSupport vector machine regression is a supervised regression algorithm based on statistical learning theory. Its core idea is to use an ε-insensitive loss function to construct a tube of half-width ε (the ε-tube) around the regression function: while tolerating prediction errors up to ε, it finds the optimal hyperplane that fits the data and maximizes the margin. The optimization follows the principle of structural risk minimization, balancing model complexity against training error through the regularization parameter C. The method is robust to outliers and adapts well to high-dimensional data [Mohammadib et al., 2020]. 
The corresponding mathematical expression is as follows.\u003c/p\u003e\u003cdiv id=\"Equd\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equd\" name=\"EquationSource\"\u003e\n$$\\:\\begin{array}{c}f\\left(x\\right)={\\sum\\:}_{i=1}^{n}{\\alpha}_{i}K\\left({x}_{i},x\\right)+b\\#\\text{(}\\text{4}\\text{)}\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003eWhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\alpha}_{i}\\)\u003c/span\u003e\u003c/span\u003e is the Lagrangian multiplier, \u003cem\u003eK\u003c/em\u003e(\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{x}_{i},x\\)\u003c/span\u003e\u003c/span\u003e ) = exp ( -\u003cem\u003eγ\u003c/em\u003e ‖ \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{x}_{i}-x\\)\u003c/span\u003e\u003c/span\u003e ‖\u003csup\u003e2\u003c/sup\u003e ) is the radial basis kernel function, \u003cem\u003eγ\u003c/em\u003e is the kernel parameter (the larger its value, the more local the influence of each sample), \u003cem\u003ex\u003csub\u003ei\u003c/sub\u003e\u003c/em\u003e is the support vector, and \u003cem\u003eb\u003c/em\u003e is the bias term.\u003c/p\u003e\u003cp\u003eIn this study, three machine learning methods, support vector machine regression, random forest and decision tree, are used to construct water quality inversion models for coal mining subsidence waters. The data set is divided into a training set and a test set at a ratio of 4:1. The prediction accuracy of each model on the training set and the test set is evaluated with three indicators: the coefficient of determination (R\u003csup\u003e2\u003c/sup\u003e), root mean square error (RMSE), and mean absolute error (MAE). 
According to the evaluation results, the model parameters are adjusted to improve the accuracy and generalization ability of the model.\u003c/p\u003e\u003ch3\u003eCorrelation inversion model accuracy evaluation method\u003c/h3\u003e\u003cp\u003eTo evaluate the predictive performance of the models and select the best one, three indicators are used to assess the accuracy of the water quality parameter inversion models: the coefficient of determination (R\u003csup\u003e2\u003c/sup\u003e), the mean absolute error (MAE), and the root mean square error (RMSE).\u003c/p\u003e\u003cdiv id=\"Eque\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Eque\" name=\"EquationSource\"\u003e\n$$\\:\\begin{array}{c}{R}^{2}=1-\\frac{{\\sum\\:}_{i=1}^{n}({\\widehat{P}}_{i}-{P}_{i}{)}^{2}}{{\\sum\\:}_{i=1}^{n}(\\overline{P}-{P}_{i}{)}^{2}}\\#\\text{(}\\text{5}\\text{)}\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003e \u003cem\u003eR\u003c/em\u003e \u003csup\u003e \u003cem\u003e2\u003c/em\u003e \u003c/sup\u003e measures the agreement between the measured concentration values and the values predicted by the model. 
The closer the value is to 1, the better the accuracy of the model.\u003c/p\u003e\u003cdiv id=\"Equf\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equf\" name=\"EquationSource\"\u003e\n$$\\:\\begin{array}{c}MAE=\\frac{1}{n}{\\sum\\:}_{i=1}^{n}\\left|\\left({P}_{i}-{\\widehat{P}}_{i}\\right)\\right|\\#\\text{(}\\text{6}\\text{)}\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003e \u003cem\u003eMAE\u003c/em\u003e is used to measure the average level of absolute error between the measured water concentration value and the predicted value.\u003c/p\u003e\u003cdiv id=\"Equg\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equg\" name=\"EquationSource\"\u003e\n$$\\:\\begin{array}{c}RMSE=\\sqrt{\\frac{{\\sum\\:}_{i=1}^{n}({P}_{i}-{\\widehat{P}}_{i}{)}^{2}}{n}}\\#\\text{(}\\text{7}\\text{)}\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003e \u003cem\u003eRMSE\u003c/em\u003e indicates the deviation between the actual water quality concentration values and the predicted values. 
The lower the deviation value, the better the prediction effect, and the higher the model accuracy.\u003c/p\u003e\u003cp\u003eWhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\widehat{{P}_{i}}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{P}_{i}\\)\u003c/span\u003e\u003c/span\u003e represent the predicted and measured values of the water quality parameter, respectively; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\overline{P}\\)\u003c/span\u003e\u003c/span\u003e is the average of the measured values; and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:n\\)\u003c/span\u003e\u003c/span\u003e represents the total number of water sample points.\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eOriginal spectrum and correlation analysis\u003c/h2\u003e \u003cp\u003eThe original remote sensing spectral data of the 68 sampling points were processed in ENVI to extract the remote sensing reflectance at each sampling point (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). The spectral curves at the different sampling points follow similar trends, with no significant difference in overall shape. However, because water quality concentrations differ between sampling points, the heights of the peaks and valleys and the rate of change of the reflectance curves differ, and the reflectance at sampling points No. 0\u0026ndash;20 is relatively high. 
This may be caused by different water quality conditions in that area, and it is difficult to predict the concentration of water quality indicators directly from the spectral curves.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003ePearson correlation analysis was applied to the measured water quality parameter concentrations and the Sentinel-2 single-band reflectance data, and the results are shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e. Except for the low correlation of individual bands in summer, the correlations for the remaining bands are significant, exceeding 0.5. The correlation analysis shows that the water quality parameters in each quarter correlate strongly with bands B3, B4, and B5. In addition, TP and Chl-a correlate significantly with B2 and B8. Because of information redundancy among the bands of the remote sensing image, single bands cannot fully reflect the relationship between the water quality parameters and band reflectance. Therefore, the spectral reflectance at the sampling points is further processed by band combination, which reduces inter-band interference to some extent, lessens the influence of other constituents in the water body, and highlights the correlation between water quality parameters and band reflectance. Single bands that correlate strongly with each water quality parameter are selected for the band combination operations, which include band difference, band ratio and other calculations. 
Finally, the sensitive bands or band combinations that correlate strongly with each water quality parameter (significance level P\u0026thinsp;\u0026lt;\u0026thinsp;0.05) are selected to build the inversion models.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eAnalysis of statistical regression model\u003c/h2\u003e \u003cp\u003eFor each water quality parameter, the band or band combination with the highest correlation is used as the independent variable, and the concentration of the parameter is used as the dependent variable. Several statistical regression inversion models, such as quadratic polynomial, logarithmic, logarithmic square and power function, are established and evaluated on the accuracy of the verification set, and the best-fitting one is selected as the statistical regression model for each water quality indicator. The optimal models and their performance in the different quarters are shown in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. In the statistical regression models, the concentrations of the water quality parameters vary with the season, which in turn affects the spectral reflectance in the remote sensing images and leads to differing accuracy among the inversion models [Chen et al., 2025]. TN and NH\u003csub\u003e4\u003c/sub\u003e\u003csup\u003e+\u003c/sup\u003e-N have their highest modeling set accuracy in the spring models, with R\u003csup\u003e2\u003c/sup\u003e of 0.4701 and 0.6042, respectively, and verification set R\u003csup\u003e2\u003c/sup\u003e of 0.4023 and 0.5041. 
The TP inversion accuracy is highest in summer, with a modeling set R\u003csup\u003e2\u003c/sup\u003e of 0.6223 and a verification set R\u003csup\u003e2\u003c/sup\u003e of 0.6021. The Chl-a inversion accuracy is also highest in summer, with a modeling set R\u003csup\u003e2\u003c/sup\u003e of 0.6322 and a verification set R\u003csup\u003e2\u003c/sup\u003e of 0.6102. Different water quality parameters are affected by seasonal climate and show different inversion accuracy. The verification set R\u003csup\u003e2\u003c/sup\u003e stays close to the modeling set R\u003csup\u003e2\u003c/sup\u003e, which suggests stable generalization, but the overall goodness of fit falls short of what practical applications require. This may be because the statistical regression models use only a small amount of band information and do not fully exploit the combined information of the many related bands, so they show low inversion accuracy for water quality concentrations in every quarter.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eOptimal model of remote sensing inversion of water quality parameters based on mathematical statistical regression\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" 
colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWater quality parameters\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003equarter\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eModel type\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eModel expression\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eModeling set\u003c/p\u003e \u003cp\u003eR\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eVerification set\u003c/p\u003e \u003cp\u003eR\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003eTN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003espring\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003elogarithmic square\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eY=-30.5841-39.7286*ln(X)-11.696*ln(X)\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.4701\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.4023\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003esummer\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eQuadratic polynomial\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eY=-6.3470\u0026thinsp;+\u0026thinsp;60.5119X-110.1181X\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.2513\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.2437\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eautumn\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eLinear\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eY\u0026thinsp;=\u0026thinsp;6.6032\u0026ndash;17.7869*X\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.4272\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.3986\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ewinter\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eQuadratic polynomial\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eY=-3.8503\u0026thinsp;+\u0026thinsp;76.5355*X-252.6308*X\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.2480\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.2298\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003eTP\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003espring\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eQuadratic polynomial\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eY\u0026thinsp;=\u0026thinsp;0.1337-0.3026X\u0026thinsp;+\u0026thinsp;252.6864X\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c5\"\u003e \u003cp\u003e0.5936\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.5031\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003esummer\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003elogarithm\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eY\u0026thinsp;=\u0026thinsp;0.23\u0026thinsp;+\u0026thinsp;0.09*ln(X)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.6223\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.6021\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eautumn\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePower function\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eY\u0026thinsp;=\u0026thinsp;0.1205*3120.7271\u003csup\u003ex\u003c/sup\u003e-0.4882\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.3568\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.3142\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ewinter\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003elogarithmic square\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eY\u0026thinsp;=\u0026thinsp;16.9536\u0026thinsp;+\u0026thinsp;46.8780*ln(X)\u0026thinsp;+\u0026thinsp;33.0362*ln(X)\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.2730\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c6\"\u003e \u003cp\u003e0.2676\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003eNH\u003csub\u003e4\u003c/sub\u003e\u003csup\u003e+\u003c/sup\u003e-N\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003espring\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003elogarithmic square\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eY=-4.016-6.5232*ln(X)-2.2718*ln(X)\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.6042\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.5041\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003esummer\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePower function\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eY\u0026thinsp;=\u0026thinsp;1.3342X\u003csup\u003e2.0183\u003c/sup\u003e+0.088\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.4366\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.4415\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eautumn\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHyperbolic\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eY\u0026thinsp;=\u0026thinsp;1/(-6.0075\u0026thinsp;+\u0026thinsp;36.7093*X)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.4816\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c6\"\u003e \u003cp\u003e0.4376\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ewinter\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eQuadratic polynomial\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eY=-0.3713\u0026thinsp;+\u0026thinsp;23.1665*X-99.8451*X\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.3565\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.3351\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003eChl-a\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003espring\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eQuadratic polynomial\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eY=-1.4158\u0026thinsp;+\u0026thinsp;183.5549*X-2635.1177*X\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.5241\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.4923\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003esummer\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003elogarithmic square\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eY\u0026thinsp;=\u0026thinsp;9.72-6.29ln(X)-19.2849ln(X)\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.6322\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c6\"\u003e \u003cp\u003e0.6102\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eautumn\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003elogarithm\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eY\u0026thinsp;=\u0026thinsp;34.9376\u0026thinsp;+\u0026thinsp;22.1397*ln(X)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.5046\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.4795\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ewinter\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePower function\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eY\u0026thinsp;=\u0026thinsp;0.1117*X\u003csup\u003e\u0026minus;1.843\u003c/sup\u003e-2.1875\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.4275\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.4068\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"6\"\u003e\u003cb\u003eNote\u003c/b\u003e: In the table above, X is the reflectance of the sensitive band with a high correlation with the water quality index.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eMachine learning model analysis\u003c/h2\u003e \u003cp\u003eThe machine learning feature data sets are constructed from the reflectance data and the highly correlated band combination values. The input of the model is the subset of features and the 
concentration values of water quality parameters filtered by Pearson correlation analysis, and the samples are randomly divided into a training set and a verification set at a ratio of 4:1. Three machine learning models, decision tree (DT), random forest (RF) and support vector machine (SVM), were established for each of the four water quality indicators, and the coefficient of determination R\u003csup\u003e2\u003c/sup\u003e, root mean square error (RMSE) and mean absolute error (MAE) were used as evaluation criteria to assess model accuracy. According to the evaluation results, the model parameters were adjusted to improve the accuracy and generalization ability of the models. The training and verification results of each model are shown in Table 3.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAccuracy evaluation of the different machine learning models\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"9\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" 
colnum=\"9\"\u003e\u003c/div\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eWater quality parameter\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eSeason\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"3\" nameend=\"c6\" namest=\"c4\"\u003e \u003cp\u003eModeling set\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"3\" nameend=\"c9\" namest=\"c7\"\u003e \u003cp\u003eVerification set\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eR\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMAE\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eRMSE\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eR\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eMAE\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eRMSE\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"11\" rowspan=\"12\"\u003e \u003cp\u003eTN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003espring\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eDT\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.7908\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.325\u003c/b\u003e\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e0.4422\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.6067\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e0.365\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.499\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.524\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.312\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.433\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.245\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.354\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.476\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.484\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.289\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.453\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.312\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.353\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.478\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e 
\u003cp\u003esummer\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.695\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.059\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.126\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.506\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.077\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.091\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.695\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.059\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.126\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.506\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.077\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.094\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eSVM\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.854\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.058\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e0.091\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e 
\u003cp\u003e\u003cb\u003e0.753\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e0.055\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.072\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eautumn\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.107\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.421\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.522\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.217\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.375\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.487\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eRF\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.863\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.164\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e0.216\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.592\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e0.321\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.405\u003c/b\u003e\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.459\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.298\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.407\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.272\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.351\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.469\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003ewinter\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eDT\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.858\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.084\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e0.139\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.654\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e0.129\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.176\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.807\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.074\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.118\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.599\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.087\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.131\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.777\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.154\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.182\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.585\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.096\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.139\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"11\" rowspan=\"12\"\u003e \u003cp\u003eTP\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003espring\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.803\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.019\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.031\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.315\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.043\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.045\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eRF\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.813\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.016\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e0.023\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.776\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e0.021\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.026\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.476\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.038\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.052\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.249\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.039\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.047\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003esummer\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eDT\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.877\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.002\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.885\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.002\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.003\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.819\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.002\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.918\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.002\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.816\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.002\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.984\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e 
\u003cp\u003eautumn\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.823\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.073\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.099\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.327\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.170\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.241\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.657\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.110\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.149\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.502\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.121\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.160\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eSVM\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.838\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.074\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e0.097\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e 
\u003cp\u003e\u003cb\u003e0.689\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e0.091\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.124\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003ewinter\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.699\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.092\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.128\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.212\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.085\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.108\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.658\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.093\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.1281\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.338\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.076\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.099\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eSVM\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.742\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.057\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e0.072\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.636\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e0.109\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.134\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"11\" rowspan=\"12\"\u003e \u003cp\u003eNH\u003csub\u003e3\u003c/sub\u003e-N\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003espring\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.8408\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.272\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.484\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.5097\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.45489\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.725\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.865\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.305\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e 
\u003cp\u003e0.446\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.402\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.619\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.851\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eSVM\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.8187\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.237\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e0.359\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.658\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e0.486\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.597\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003esummer\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.850\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.004\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.006\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.676\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.015\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e 
\u003cp\u003e0.016\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eRF\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.899\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.005\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e0.006\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.833\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e0.014\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.015\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.674\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.006\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.008\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.982\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.019\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.019\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eautumn\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eDT\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.710\u003c/b\u003e\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.093\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e0.123\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.625\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e0.141\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.185\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.891\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.077\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.099\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.005\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.212\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.241\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.419\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.177\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.212\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.121\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.286\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.343\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003ewinter\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eDT\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.836\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.057\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e0.129\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.733\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e0.014\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.018\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.797\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.141\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.186\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.362\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.104\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.123\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.696\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.138\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.193\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.464\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.074\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.089\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"11\" rowspan=\"12\"\u003e \u003cp\u003eChl-a\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003espring\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eDT\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.864\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.623\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e1.185\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.709\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e1.323\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e1.813\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.713\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1.152\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.72\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.706\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e1.611\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e2.018\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.439\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1.456\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e2.404\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.481\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e1.516\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e2.327\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003esummer\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.877\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.116\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.240\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.354\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.447\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.627\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eRF\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.688\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e 
\u003cp\u003e\u003cb\u003e0.181\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e0.397\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.683\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e0.263\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.335\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.515\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.157\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.477\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.947\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.078\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.096\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eautumn\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.783\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.896\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.228\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.178\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e3.789\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c9\"\u003e \u003cp\u003e4.127\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eRF\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.733\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e1.427\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e1.659\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.568\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e2.191\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e2.580\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.159\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e2.052\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e2.426\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.161\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e2.441\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e2.952\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003ewinter\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.891\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c5\"\u003e \u003cp\u003e0.329\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.436\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.662\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.728\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.907\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.816\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.449\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.566\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.874\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.653\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.732\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eSVM\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.906\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.371\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e0.479\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.823\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e0.514\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.592\u003c/b\u003e\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"9\"\u003e\u003cb\u003eNote\u003c/b\u003e: Bold font marks the optimal machine learning algorithm\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe concentration inversion models for the water quality parameters of subsidence waters differ between machine learning algorithms. As can be seen from the table above (Table \u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e), the machine learning inversion models reach high accuracy for the concentrations of the different water quality indicators in different quarters. For TN, the minimum coefficient of determination on the training set is 0.7908, and the coefficients of determination on the validation set are all above 0.6 in the three quarters other than autumn. The DT regression algorithm showed good results in spring and winter, and SVR and RF performed well in summer and autumn, respectively. Compared with the traditional mathematical statistical regression model, the machine learning models fit better. In the TP inversion model, DT has the best fit, and the test set has high inversion accuracy: its coefficient of determination R\u003csup\u003e2\u003c/sup\u003e reaches 0.885, the root mean square error (RMSE) is 0.026, and the mean absolute error (MAE) is 0.021, whereas R\u003csup\u003e2\u003c/sup\u003e for the mathematical statistical regression model is 0.6223. In the inversion of TP concentration in autumn and winter, the SVR algorithm is considerably more accurate than the mathematical statistical regression algorithm. The inversion accuracy for NH\u003csub\u003e4\u003c/sub\u003e\u003csup\u003e+\u003c/sup\u003e-N and Chl-a concentrations is also high. 
In all four quarters of the ammonia nitrogen concentration inversion, the coefficient of determination exceeded 0.71, and in three of them it exceeded 0.81. The coefficient of determination of the Chl-a training set reaches 0.906 in the winter concentration inversion, a significant improvement in inversion accuracy over the traditional mathematical model.\u003c/p\u003e \u003cp\u003eThe three machine learning models show different advantages in the inversion of water quality in different quarters. To reflect the prediction performance of the machine learning algorithms more intuitively, the concentrations of TN, TP, NH\u003csub\u003e4\u003c/sub\u003e\u003csup\u003e+\u003c/sup\u003e-N and Chl-a were predicted with the three algorithms SVR, DT and RF. Scatter plots comparing the measured values of the water quality parameters in each quarter with the values predicted by the optimal machine learning model are shown below (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e), from which the degree of dispersion between the predicted and measured values can be seen directly. The color of the scatter points indicates the concentration density of the water quality parameters, and green indicates that more points fall within that concentration range. The closer the slope of the regression equation is to 1 and the smaller the intercept, the closer the predicted values are to the measured values and the better the prediction.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eInversion model effects of different water quality parameters\u003c/h2\u003e \u003cp\u003eFrom the water quality concentration fitting scatter plots, the degree of dispersion between the measured and predicted sample values can be seen directly. 
The training and validation results for each water quality index are good, and the model fitting accuracy is high. The performance of the machine learning models (SVM, RF, DT) in the concentration inversion of non-photosensitive parameters such as TN, TP and NH\u003csub\u003e4\u003c/sub\u003e\u003csup\u003e+\u003c/sup\u003e-N is significantly better than that of traditional statistical regression models (R\u003csup\u003e2\u003c/sup\u003e increased by 20% to 40%), which is consistent with the conclusions of Xiong et al. for Taihu Lake. This shows that machine learning algorithms can better capture the complex nonlinear relationship between spectral characteristics and water quality parameters [Xiong et al., 2022; Jiang et al., 2021]. However, the inversion accuracy for the concentration of Chl-a, a photosensitive parameter, is limited. The difference in the accuracy of the inversion models may therefore be related to the optical characteristics of the water quality parameters. Non-photosensitive parameters such as TN and TP depend on indirect spectral responses (such as the synergistic effects of suspended solids and soluble organic matter) [Shi et al., 2015; Abayazid et al., 2019]. The content and distribution of chlorophyll-a, by contrast, change with the dynamics and seasonal fluctuations of phytoplankton communities [Rinaldi et al., 2014], which results in a weak correlation between spectral signals and concentration. Machine learning models therefore have a clear advantage in the inversion of the concentration of non-photosensitive parameters [Zhou et al., 2015].\u003c/p\u003e \u003cp\u003eDifferent water quality indicators have their own patterns of change and influencing factors, so each model behaves differently when inverting the concentration of different water quality indicators [Zeng et al., 2023]. 
The decision tree model performs well in the inversion of the concentrations of two indicators, TN and NH\u003csub\u003e4\u003c/sub\u003e\u003csup\u003e+\u003c/sup\u003e-N. The support vector machine model performs well in the inversion of TP concentration, while the random forest has an advantage in the inversion of Chl-a concentration. In summary, compared with traditional mathematical statistical regression models, the DT, RF and SVM machine learning models significantly improve the inversion accuracy of water quality parameter concentrations in coal mining subsidence waters, which shows the applicability of machine learning models in remote sensing inversion of water quality parameter concentration [Gu et al., 2020]. However, no single model in this study fitted well in the inversion of the concentrations of all water quality parameters. In practical applications, a suitable model must be chosen according to the season and the specific water quality indicator.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eDifferences in inversion models for different seasons\u003c/h2\u003e \u003cp\u003eThe influence of seasonal changes on model performance cannot be ignored. The optical characteristics of water bodies in the same water area differ between seasons [Wang et al., 2023]. The study found that the prediction accuracy of the models generally improves in summer, when temperatures are higher, which may be related to the stability of the optical characteristics of water bodies [Song et al., 2024]. However, in winter, the performance of the statistical regression model and the machine learning model diverged significantly. 
Compared with the summer, the R\u003csup\u003e2\u003c/sup\u003e value of the statistical regression model decreased by an average of about 30%, while that of the machine learning model decreased by only 10\u0026ndash;15%, indicating that the machine learning algorithm is more adaptable to seasonal changes. For certain water quality parameters in some quarters, the concentrations at a few points span a wide range, so a better fit cannot be achieved. This reduces the fitting accuracy of the inversion to a certain extent, but it also avoids overfitting, so the model predicts better and in effect generalizes more strongly [Md et al., 2025]. There is still a certain difference between the concentrations inverted by the model and the measured values (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e). This may be because field sampling is uncertain and the optical characteristics of water bodies are affected by a variety of natural environmental factors, and because the limited sampling sites can hardly cover the spectral characteristics of the entire water area under study [Pan et al., 2025]. In summary, it is difficult for the model to adjust flexibly to the changes of specific water quality parameters across seasons. In practical applications, it lacks universality across different study areas and seasonal characteristics. 
Later research will also require extensive parameter tuning and validation of the model for different seasonal conditions.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003eResearch significance\u003c/h2\u003e \u003cp\u003eParameters such as TN, TP, NH\u003csub\u003e4\u003c/sub\u003e\u003csup\u003e+\u003c/sup\u003e-N and Chl-a are key indicators of the status of the water ecological environment, and it is important to accurately identify non-point source pollution in coal mining subsidence waters through suitable monitoring methods [Chen et al., 2025]. This study focuses on the water accumulation area of coal mining subsidence in Huainan City and uses both mathematical statistical regression and machine learning algorithms to construct inversion models of water quality parameters, which is significant in several respects. (1) Traditional water quality monitoring methods have many limitations, such as a difficult sampling process, high cost, and low timeliness, and data from sampling points cannot represent the water quality status of the entire water area [Muhoyi et al., 2025]. This research is based on remote sensing inversion technology and uses satellite remote sensing images for inversion modeling to predict the concentrations of water quality parameters in coal mining subsidence waters. The approach shows good applicability in the inversion of the concentrations of water quality indicators in coal mining subsidence waters, provides new ideas and methods for model construction, and helps to further explore the synergistic effects of different models in complex environmental monitoring. (2) This study examined in depth the relationship between water quality parameters and remote sensing image characteristics in the coal mining subsidence area, and explored the spectral response characteristics of different water quality parameters in different wavelength bands. 
This not only helps to improve the accuracy of the water quality inversion model in coal mining subsidence areas, but also promotes the application of remote sensing monitoring technology in environmental monitoring of other water bodies. It allows changes in the water quality of subsidence waters to be detected in time and the water quality status of coal mining subsidence areas to be grasped comprehensively, and it provides effective data support for water resources protection and pollution prevention and control. (3) Mathematical statistical regression algorithms rely on a single linear fitting model and usually have low fitting accuracy, which makes them unsuitable for complex nonlinear problems [Salem et al., 2017]. Machine learning algorithms can use sample data to optimize and iteratively refine empirical models autonomously, and can better explore potential connections in the data. Through the deep integration of machine learning and remote sensing technology, this study breaks through the temporal and spatial limitations of traditional water quality monitoring, and provides new opportunities for the further development of water quality parameter monitoring and prediction.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003eResearch limitations and prospects\u003c/h2\u003e \u003cp\u003eWhen the effect of the inversion model in different seasons is analyzed, factors such as insufficient coverage of sample data in time and space, data quality, model error and parameter uncertainty may affect the accurate evaluation of the model's prediction performance [Caballero et al., 2025]. In particular, the collection of water quality data is limited by existing conditions and cannot evenly cover the entire research area, resulting in large errors in the concentration inversion results of water quality parameters in some areas [Luo et al., 2022]. 
At the same time, the limited resolution of the Sentinel-2 satellite imagery, the weather and cloud cover on the day of image acquisition, and other factors made it difficult for the satellite to capture the details of small subsidence ponds, especially in areas obscured by photovoltaic panels, which further affected the accuracy of the inversion model. Future work needs to address these problems by using high-resolution satellite image data combined with unmanned aerial vehicle hyperspectral data to increase the stability of the atmospheric correction algorithm [Liu et al., 2022], and by optimizing sample collection so that the sample data are more representative, in order to improve the adaptability and inversion accuracy of the model in complex water environments.\u003c/p\u003e \u003cp\u003eMathematical statistical models and machine learning models have different advantages and disadvantages. Mathematical statistical models are usually easier to interpret, while machine learning models focus on predictive results. Statistical models avoid overfitting by adjusting variables and applying statistical tests, while machine learning commonly uses methods such as cross-validation and regularization. In terms of data volume, statistical models may perform better when data are scarce, while machine learning requires a large amount of training data to realize its advantages. Future work can continue to optimize the algorithm model to reduce the interference of outliers [Yu et al., 2025]. The combination of mathematical statistical models with machine learning inversion models has shown good applicability in inverting the concentrations of water quality indicators in coal mining subsidence waters [Sarigai et al., 2021]. 
The currently constructed inversion models of water quality indices of subsidence waters are applicable only to the specific waters studied; each inversion algorithm is still limited by season, geographical location, and type of water body, and its extrapolation ability is also limited. Future work needs to overcome these limitations and build an inversion model with strong universality [Liang et al., 2025], further integrate multi-source remote sensing data, combine artificial intelligence, deep learning and other methods to build a high-precision inversion model of water quality parameters in subsidence waters, improve the accuracy of remote sensing monitoring, and at the same time strengthen the analysis of the inherent mechanisms of machine learning algorithms.\u003c/p\u003e \u003c/div\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThis paper takes the coal mining subsidence water accumulation area of Huainan City as the research object. Based on ground sampling data and Sentinel-2 satellite image data, a mathematical statistical regression model and three machine learning algorithms (RF, DT and SVM) are used to construct inversion models of the water quality parameters of coal mining subsidence waters in the four quarters of 2024, and their accuracy is systematically evaluated. The accuracy of the mathematical statistical regression model and the machine learning models is compared for four water quality parameters: TN, TP, NH\u003csub\u003e4\u003c/sub\u003e\u003csup\u003e+\u003c/sup\u003e-N and Chl-a. Analysis of the correlation between the water quality parameters and the reflectance of the Sentinel-2 image bands shows that the correlation is not high, with correlation coefficients within 0.5. 
The mathematical statistical regression model uses only a small amount of band information for regression modeling and does not fully consider the combined information of a large number of related bands, so the accuracy of the statistical regression algorithm is low, with R\u003csup\u003e2\u003c/sup\u003e mostly below 0.5. Training set fitting and test set validation are then adopted, and band reflectance is used as the input feature variable to construct three machine learning regression models: RF, DT and SVM. The results show that the inversion accuracy of the machine learning models is higher than that of the mathematical statistical regression model. The coefficient of determination R\u003csup\u003e2\u003c/sup\u003e of the inversion models of water quality parameters in the four quarters is mostly above 0.8, and the performance of the machine learning regression models has been verified on samples of different water quality parameters in the four quarters. The machine learning models improve the inversion accuracy of water quality parameters in coal mining subsidence waters and provide more accurate data support for water quality monitoring and management in these waters.\u003c/p\u003e "},{"header":"Declarations","content":"\u003ch2\u003e \u003cb\u003eAuthors and Affiliations\u003c/b\u003e \u003c/h2\u003e \u003cp\u003e \u003cstrong\u003eState Key Laboratory for Safe Mining of Deep Coal Resources and Environment Protection, Anhui University of Science and Technology, Huainan, China, 232001\u003c/strong\u003e \u003cp\u003eHaitao Wu, Email: [email protected]. Corresponding author: Ying Liu, Email: [email protected]. 
Yuzhi Zhou, Email: [email protected]. Yanxue Gao, Email: [email protected] \u0026amp; Xiaoyang Chen, Email: [email protected].\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eSchool of Earth and Environment, Anhui University of Science and Technology, Huainan, China, 232001\u003c/strong\u003e \u003cp\u003eHaitao Wu, Ying Liu, Yuzhi Zhou, Yanxue Gao \u0026amp; Xiaoyang Chen\u003c/p\u003e \u003cp\u003e \u003cstrong\u003eCorresponding author\u003c/strong\u003e \u003cp\u003eCorrespondence to Ying Liu. Email: [email protected]\u003c/p\u003e \u003c/p\u003e\u003ch2\u003e \u003cb\u003eEthics declarations\u003c/b\u003e \u003c/h2\u003e \u003cp\u003e \u003cstrong\u003eCompeting interests\u003c/strong\u003e \u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e \u003cp\u003eThis work was supported by the Opening Foundation of Anhui Province Engineering Research Center of Water and Soil Resources Comprehensive Utilization and Ecological Protection in High Groundwater Mining Area; a project entrusted by Ping\u0026rsquo;an Coal Mining Engineering Technology Research Institute Co., Ltd. (HNKY-PGJS-2023-228); and the National Natural Science Foundation of China (52204181).\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eHW: Writing \u0026ndash; original draft, Visualization, Software, Investigation, Data curation, Methodology, Formal analysis. YL: Writing \u0026ndash; review \u0026amp; editing, Writing \u0026ndash; original draft, Funding acquisition, Supervision, Data curation, Formal analysis, Conceptualization. YZ: Data curation, Investigation, Writing \u0026ndash; original draft. YG: Data curation, Writing \u0026ndash; original draft. YL: Validation, Writing \u0026ndash; original draft. 
XC: Conceptualization, Funding acquisition, Writing \u0026ndash; review \u0026amp; editing.\u003c/p\u003e\u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003eData availability\u003c/h2\u003e \u003cp\u003eNo datasets were generated or analysed during the current study.\u003c/p\u003e \u003c/div\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAlikas, K., \u0026amp; Kratzer, S. (2017). Improved retrieval of Secchi depth for optically-complex waters using remote sensing data. Ecological Indicators, 77: 218-227. https://doi.org/10.1016/j.ecolind.2017.02.007\u003c/li\u003e\n\u003cli\u003eAlnahit, A.O., Mishra, A.K., \u0026amp;Khan A.A. (2022). Stream water quality prediction using boosted regression tree and random forest models. Stochastic Environmental Research and Risk Assessment 36(9): 2661-2680. https://doi.org/10.1007/s00477-021-02152-4.\u003c/li\u003e\n\u003cli\u003eAbayazid, H.O., \u0026amp;El-Adawy, A. (2019). Assessment of a non-optical water quality property using space-based imagery in Egyptian coastal lake. Journal of Water Resource and Protection, 11(6):713-727. https://doi.org/10.4236/jwarp.2019.116042\u003c/li\u003e\n\u003cli\u003eChen, Y.C., Yuan, L., \u0026amp;Xu, Z. (2016). Investigation on using mining subsidence area to build a reservoir in Huainan Coal Mining Area. Journal of China Coal Society, 41(11): 2830-2835. https://doi.org/10.13225/j.cnki.jccs.2016.0135\u003c/li\u003e\n\u003cli\u003eChen, G.Z., Wang X.M., Wang R.W., \u0026amp;Liu G.J. (2019). Health risk assessment of potentially harmful elements in subsidence water bodies using a Monte Carlo approach: An example from the Huainan coal mining area, China. Ecotoxicology and Environmental Safety, 171, 737-745, 0147-6513, https://doi.org/10.1016/j.ecoenv.2018.12.101.\u003c/li\u003e\n\u003cli\u003eCui, L.K., Song, X.Q., Yang, Y.W., Liu, J.X., LI, Z.N., Yun, L., \u0026amp;Zhang M.L. (2021). 
Doppler Lidar Retrieval of Particulate Matter Concentration Based on Statistical Regression Method. Acta Photonica Sinica, 50(12):1201005. https://doi.org/10.3788/gzxb20215012.1201005\u003c/li\u003e\n\u003cli\u003eChen, R.K., Kang, J., Zhao, Y.X., Guo, Y.Q., Xu, X.L., \u0026amp;Wang, Y.G. (2025). Spatial and temporal evolution of water quality in the Yangtze River Basin from 2003 to 2024. China Environmental Science 2025,45(4):2171~2182. https://doi.org/10.19674/j.cnki.issn1000-6923.20250217.003\u003c/li\u003e\n\u003cli\u003eCaballero, C.B., Martins, V.S., Paulino, R.S., Butler, E., Sparks, E., Lima, T.M., \u0026amp;Novo, E.M.L.M. (2025). The need for advancing algal bloom forecasting using remote sensing and modeling: Progress and future directions. Ecological Indicators, 172:113244. https://doi.org/10.1016/j.ecolind.2025.113244\u003c/li\u003e\n\u003cli\u003eChen, Y.C., Wu, H.T., Shen, L.P., Liu, Y., Xu, Y.F., Chen, X.Y., \u0026amp;Zhou, Y.Z. (2025). Research progress on remote sensing monitoring of water eutrophication indicators in coal mining subsidence water area. Mining Safety \u0026amp; Environmental Protection., 52(02):153-163. https://doi.org/10.19835/j.issn.1008-4495.20240059\u003c/li\u003e\n\u003cli\u003eDai, X, Yang, X., Wang, M., Gao, Y., Liu, S.H., \u0026amp;Zhang, J.M. ( 2019). The Dynamic Change of Bosten Lake Area in Response to Climate in the Past 30 Years. Water, 12(1): 4. https://doi.org/10.3390/w12010004\u003c/li\u003e\n\u003cli\u003eDomitr, P., Wlostowski, M., Laskowski, R., \u0026amp;Jurkowski, R. (2023). Comparison of inverse uncertainty quantification methods for critical flow test. Energy, 2023, 263:125640. https://doi.org/10.1016/j.energy.2022.125640\u003c/li\u003e\n\u003cli\u003eFan, K.X., Liu, Y., Zhang, X.Y., Chen, X.Y., Li, Y., Zhou, Y.Z., Shen, W.J., Tao, H.L., Gong, C.G., \u0026amp;Lei, S.G. (2024). 
Assessing the relative contribution of climate change and human activity factors to spatiotemporal distributions of sand fixation service in the Loess Plateau. GIScience \u0026amp; Remote Sensing, 62(1):2444630. https://doi.org/10.1080/15481603.2024.2444630\u003c/li\u003e\n\u003cli\u003eGu, K., Zhang, Y.H., \u0026amp;Qiao, J.F. (2020). Random forest ensemble for river turbidity measurement from space remote sensing data. IEEE Transactions on Instrumentation and Measurement,69(11):9028-9036http://dx.doi.org/10.1109/tim.2020.2998615\u003c/li\u003e\n\u003cli\u003eHe, T.T., Wu, X., Zhao, Y.L., Deng, X.Y., \u0026amp;Hu, Z.Q. (2020). Identification of waterlogging in Eastern China induced by mining subsidence: A case study of Google Earth Engine time-series analysis applied to the Huainan coal field. Remote Sensing of Environment, 242: 111742. https://doi.org/10.1016/j.rse.2020.111742\u003c/li\u003e\n\u003cli\u003eHe, Y.H., Gong Z.J., Zheng Y.H., \u0026amp;Zhang Y.B. (2021). Inland Reservoir Water Quality Inversion and Eutrophication Evaluation Using BP Neural Network and Remote Sensing Imagery: A Case Study of Dashahe Reservoir. Water, 13(20): 2844-2844. https://doi.org/10.3390/w13202844\u003c/li\u003e\n\u003cli\u003eJiang, Q.O., Xu, L.D., Sun, S.Y., Wang, M.L., \u0026amp;Xiao, H.J. (2021). Retrieval model for total nitrogen concentration based on UAV hyper spectral remote sensing data and machine learning algorithms-A case study in the Miyun Reservoir, China. Ecological Indicators, 124: 107356. http://dx.doi.org/10.1016/J.ECOLIND.2021.107356\u003c/li\u003e\n\u003cli\u003eLiu, Y., Li, J.S., Xiao C.C., Zhang F.F., \u0026amp;Wang, S.L. (2022). Inland water chlorophyll-a retrieval based on ZY- 102D satellite hyperspectral observations. Journal of Remote Sensing, 26(1):168- 178https://dx.doi.org/10.11834/jrs.20221244\u003c/li\u003e\n\u003cli\u003eLuo, J.H., Yang, J.Z.C., Duan, H.T., Lu L.R., Sun, J., \u0026amp;Xin, Y.Y. (2022). 
Research progress of aquatic vegetation remote sensing in shallow lake. Journal of Remote Sensing, 26(1):68- 76. https://dx.doi.org/10.11834/jrs.20221208\u003c/li\u003e\n\u003cli\u003eLiang, Y.H., Ding, F.Y., Liu, L., Yin, F., Hao, M.M., Kang, T.T., Zhao, C.P., Wang, Z.T., \u0026amp;Jiang, Dong. (2025). Monitoring water quality parameters in urban rivers using multi-source data and machine learning approach. Journal of Hydrology, 648:132394. https://doi.org/10.1016/j.jhydrol.2024.132394\u003c/li\u003e\n\u003cli\u003eMccullough, M.I., Loftin, S.C., \u0026amp;Sader A.S. (2013). Lakes without Landsat? An alternative approach to remote lake monitoring with MODIS 250m imagery. Lake and Reservoir Management, 29(2):89-98. https://doi.org/10.1080/10402381.2013.778926\u003c/li\u003e\n\u003cli\u003eMcfeeters, S.K. (1996). The use of the Normalized Difference Water Index (NDWI) in the delineation of open water features. International journal of remote sensing, 17(7): 1425-1432. https://doi.org/10.1080/01431169608948714\u003c/li\u003e\n\u003cli\u003eMohammadi, B., \u0026amp;Mehdizadeh, S. (2020). Modeling daily reference evapotranspiration via a novel approach based on support vector regression coupled with whale optimization algorithm. Agricultural Water Management,2020,237:106145. https://doi.org/10.1016/j.agwat.2020.106145\u003c/li\u003e\n\u003cli\u003eRahman, M.M., Shults, R., Surya, P.T., Tiwari, S.P., Arshad, A., Usman, M., Raihan, A., \u0026amp;Ishraque, M.F. (2025). Review on sea water quality (SWQ) monitoring using satellite remote sensing techniques (SRST). Marine Pollution Bulletin, 217:118108. https://doi.org/10.1016/j.marpolbul.2025.118108.\u003c/li\u003e\n\u003cli\u003eMuhoyi, H., Gumindoga, W., Mhizha, A., N. Misi, S., \u0026amp;Nondo, N. (2025). Remote sensing application in compliment to in-situ monitoring of water quality: Lower Manyame Sub-catchment. Zimbabwe. Scientific African, 27: e02551. 
https://doi.org/10.1016/j.sciaf.2025.e02551\u003c/li\u003e\n\u003cli\u003ePan, W.B., Yu, F., Li, J.L., Li C.Q., \u0026amp;Ye, M. (2025). Quantification of chlorophyll-a in inland waters by remote sensing algorithm based on modified equivalent spectra of Sentinel-2. Ecological Informatics, 87:103061. https://doi.org/10.1016/j.ecoinf.2025.103061\u003c/li\u003e\n\u003cli\u003eQin, H., Su, Q., Khu, S.T., \u0026amp;Tang N. (2014). Water quality changes during rapid urbanization in the Shenzhen River Catchment: An integrated view of socio-economic and infrastructure development. Sustainability, 6(10):7433-7451. https://doi.org/10.3390/su6107433\u003c/li\u003e\n\u003cli\u003eRinaldi, E., Nardelli, B.B., Volpe, G., \u0026amp;Santoleri, R. (2014). Chlorophyll distribution and variability in the Sicily Channel (Mediterranean Sea) as seen by remote sensing data. Continental Shelf Research, 77:61-68. https://doi.org/10.1016/j.csr.2014.01.010\u003c/li\u003e\n\u003cli\u003eSanmiquel, L., Bascompta, M., Vintro, C., \u0026amp;Yubero T. (2018). Subsidence management system for underground mining. Minerals, 8(6): 243. https://doi.org/10.3390/min8060243\u003c/li\u003e\n\u003cli\u003eShi, K., Zhang, Y., Zhu, G., Qin, D.Q., \u0026amp;Pan, D.L. (2018). Deteriorating water clarity in shallow waters: Evidence from long term MODIS and in-situ observations. International Journal of Applied Earth Observation and Geoinformation, 68: 287-297. https://doi.org/10.1016/j.jag.2017.12.015\u003c/li\u003e\n\u003cli\u003eSarafaraz, J., Ahmadzadeh, K.F., Mahmoudi, K.J., Habibzadeh N. (2024). Predicting river water quality: An imposing engagement between machine learning and the QUAL2Kw models (case study: Aji-Chai, river, Iran). Results in Engineering, 21: 101921. https://doi.org/10.1016/j.rineng.2024.101921\u003c/li\u003e\n\u003cli\u003eShi, K., Zhang, Y.L., Zhu, G.W., Liu, X.H., Zhou, Y.Q., Xu, H., Qin, B.Q., Liu, G., \u0026amp;Li, Y.M. (2015). 
Long-term remote monitoring of total suspended matter concentration in Lake Taihu using 250 m MODIS-Aqua data. Remote Sensing of Environment, 164:43- 56. https://doi.org/10.1016/j.rse.2015.02.029\u003c/li\u003e\n\u003cli\u003eSong, W., A, Y.L., Wang, Y.T., Fang, Q.Q., \u0026amp;Tang, R. (2024). Study on remote sensing inversion and temporal-spatial variation of Hulun lake water quality based on machine learning. Journal of Contaminant Hydrology, 260: 104282, https://doi.org/10.1016/j.jconhyd.2023.104282\u003c/li\u003e\n\u003cli\u003eSarigai., Yang, J., Zhou, A., Han, L.S., Li, Y., \u0026amp;Xie, Y.C. (2021). Monitoring urban black-odorous water by using hyperspectral data and machine learning. Environmental Pollution, 269:116166. https://doi.org/10.1016/j.envpol.2020.116166\u003c/li\u003e\n\u003cli\u003eSalem, S.I., Higa, H., Kim, H., Kobayashi, H., Oki, K., \u0026amp;Oki, T. (2017). Assessment of chlorophyll-a algorithms considering different trophic statuses and optimal bands. SENSORS, 17(8):1746. https://doi.org/10.3390/s17081746.\u003c/li\u003e\n\u003cli\u003eVasistha, P., Ganguly, R,. 2020. Water quality assessment of natural lakes and its importance: An overview. Materials Today. Proceedings, 32: 544-552. https://doi.org/10.1016/j.matpr.2020.02.092\u003c/li\u003e\n\u003cli\u003eWang, C.L., Shi, K.Y., Ming, X., Cong, M.Q., Liu, X.Y., \u0026amp;Guo, W.J. (2022). A Comparative Study of the COD Hyperspectral Inversion Models in Water Based on the Maching Learning. Spectroscopy and Spectral Analysis, 42(08):2353-2358. https://doi.org/10.3964/j.issn.1000-0593(2022)08-2353-06\u003c/li\u003e\n\u003cli\u003eWang, S.M., \u0026amp;Qin, B.Q. (2023). Research Progress on Remote Sensing Monitoring of Lake Water Quality Parameters. Environmental Science, 44(03):1228-1243. https://doi.org/10.13227/j.hjkx.202203285\u003c/li\u003e\n\u003cli\u003eXiong, J.F., Lin, C., Cao, Z.G., Hu, M.Q., Xue, K., Chen, X., \u0026amp;Ma, R.H. (2022). 
Development of remote sensing algorithm for total phosphorus concentration in eutrophic lakes: Conventional or machine learning. Water Research, 215:118213. https://doi.org/10.1016/j.watres.2022.118213\u003c/li\u003e\n\u003cli\u003eXu, Q., Guo, P., Jin, M., \u0026amp;Qi, J.F. (2021). Multi-scenario landscape ecological risk assessment based on Markov\u0026ndash;FLUS composite model. Geomatics, Natural Hazards and Risk, 12(1): 1449-1466. https://doi.org/10.1080/19475705.2021.1931478\u003c/li\u003e\n\u003cli\u003eYuan, Q.Q., Shen H.F., Li T.W., Li, Z.W., Li, S.W., Jiang Y., Xu, H.Z., Tan, W.W., Yang, Q.Q., Wang, J.W., Gao J.H., \u0026amp;Zhang, L.P. (2020). Deep learning in environmental remote sensing: Achievements and challenges. Remote Sensing of Environment, 241: 111716. https://doi.org/10.1016/j.rse.2020.111716\u003c/li\u003e\n\u003cli\u003eYuan Z.H., LI Q.H., He Y., Ma X.Y., Han M.S., Sun R.G., Zhang H.J. (2019). Variation and evaluation of nutrients in Baihua Reservoir in Guizhou Plateau based on Bayesian method, 2014-2018. Journal of Lake Sciences, 31(06):1623-1636. https://doi.org/10.18307/2019.0602\u003c/li\u003e\n\u003cli\u003eYu, Y.S., Ding, P., Bian, H.Y., Wei, J.S., \u0026amp;Zhang, H. (2025). Water quality parameters inversion based on multispectral remote sensing. Journal of Water Process Engineering, 73:107707. https://doi.org/10.1016/j.jwpe.2025.107707\u003c/li\u003e\n\u003cli\u003eZhang, B., Li, J.S., Shen, Q., Wu, H.Y., Zhang, F.F., Wang, S.L., Yao, Y., Guo, L.N., \u0026amp;Yin, Z.Y. (2021). Recent research progress on long time series and large scale optical remote sensing of inland water. National Remote Sensing Bulletin, 25(1): 37-52. https://doi.org/10.11834/jrs.20210570\u003c/li\u003e\n\u003cli\u003eZhang, Y., \u0026amp;Cao, J. (2016). Decision Tree Algorithms for Big Data Analysis. Computer Science,43(S1):374-379+383. https://doi.org/10.11896/j.issn.1002-137X.2016.6A.089\u003c/li\u003e\n\u003cli\u003eZhou, D.M., \u0026amp;Wang, D.Y. (2015). 
Quantitative Estimation of Chlorophyll-a and Suspended Solids in Taihu Based on Landsat TM. Environmental Science \u0026amp; Technology, 38(S1):362-367. http://dx.doi.org/10.3969/j.issn.1003-6504.2015.6P.075\u003c/li\u003e\n\u003cli\u003eZeng, F.X., Song, C.Q., Cao, Z.G., Xue K., Lu, S.L., Chen, Tan., \u0026amp;Liu, Kai. (2023). Monitoring inland water via Sentinel satellite constellation: A review and perspective. ISPRS Journal of Photogrammetry and Remote Sensing, 204:40-361. https://doi.org/10.1016/j.isprsjprs.2023.09.011\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Subsidence Waterlogged Areas, Machine Learning, Remote Sensing Inversion, Water Quality Parameters, Sentinel-2 Imagery","lastPublishedDoi":"10.21203/rs.3.rs-6914046/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6914046/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThe Long-term large-scale mining of coal underground has led to the destruction of the initial water system structure on the surface, and the water pollution of subsidence waters has become increasingly 
serious. Traditional inversion models for water quality parameter concentrations have low accuracy, so improving water quality monitoring technology and the inversion accuracy of water quality parameters will play a vital role in protecting the water resources of the mining area. This study focuses on coal mining subsidence water areas in Huainan City, combining measured water quality data from spring, summer, autumn, and winter of 2024 with concurrent Sentinel-2 satellite imagery. Using statistical regression and three machine learning algorithms, Random Forest (RF), Decision Tree (DT), and Support Vector Machine (SVM), the concentrations of total nitrogen (TN), total phosphorus (TP), ammonium nitrogen (NH₄⁺-N) and chlorophyll-a (Chl-a) in subsidence waters were fitted and modeled, and the accuracy of the models was verified. A comprehensive comparison of model performance in water quality inversion revealed that machine learning models significantly outperformed traditional statistical regression models in terms of inversion accuracy. Among them, RF, DT, and SVM exhibited varying strengths across different seasons and water quality parameters, with the best-performing models achieving coefficient of determination (R\u0026sup2;) values generally exceeding 0.8 and stable validation accuracy. These findings highlight the advantages of machine learning algorithms in water quality remote sensing inversion and further confirm the technical feasibility of this approach for monitoring complex aquatic environments. 
By integrating scientific data analysis with machine learning techniques, It not only provides more accurate data support for the monitoring and management of water quality in coal mining subsidence waters, but also provides a scientific decision-making basis for water ecological protection.\u003c/p\u003e","manuscriptTitle":"Research on The Inversion Model of Water Environment Parameters of Coal Mining Subsidence Waters Based on Machine Learning","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-07-04 09:17:01","doi":"10.21203/rs.3.rs-6914046/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"f0ad5171-45bc-40b8-8cb2-fcbe0a794acb","owner":[],"postedDate":"July 4th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2025-07-19T18:23:24+00:00","versionOfRecord":[],"versionCreatedAt":"2025-07-04 09:17:01","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6914046","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6914046","identity":"rs-6914046","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

