Prediction of water temperature based on machine learning algorithm integrating endogenous thermal dynamics for pond aquaculture

doi:10.22541/au.176122116.62333498/v1


2025 · doi:10.22541/au.176122116.62333498/v1


This is a preprint and has not been peer reviewed. Data may be preliminary. 23 October 2025, V1.

Authors: Jiayi Qiu, Ang Zhang, Yanhao Lin (ORCID 0009-0006-5701-2303), Zhilun Lin, Tengfei Liu, Shuchang Zhang, and Yuqi Zeng

https://doi.org/10.22541/au.176122116.62333498/v1

Abstract

Traditional water temperature prediction relies on discrete point measurements, neglecting endogenous thermal
noise induced by biological activity and thermal inertia. To address the challenge of predicting vertical thermal stratification in pond aquaculture, a novel prediction model that integrates thermodynamics with a Multi-Attention eXtended Long Short-Term Memory Network (MAN-xLSTM) is proposed. The pond system is modelled as a "low-pass filter" with high-frequency disturbances, and a proportional-integral-derivative (PID) controller is incorporated to regulate the filter's dynamic response: external physical changes are treated as an integrator, while endogenous noise acts as a differentiator. The models are evaluated in a fishpond in Minhou County, Fujian Province, China, where a self-developed IoT-based monitoring system provides high-frequency measurements. Furthermore, dynamic spatiotemporal channel fusion is applied to construct a three-dimensional thermal stratification model, enabling the quantification of spatiotemporal thermal lag. The root mean square error (RMSE) remained below 0.1 ℃ at all 20 nodes, with a minimum of 0.0219 ℃, significantly outperforming the benchmark models. This study provides a theoretical breakthrough for thermal regulation in pond aquaculture and represents the first successful coupling of endogenous thermal disturbances and external environmental drivers in aquaculture pond modeling.

Introduction

Pond aquaculture, as a primary form of aquatic farming (Zhou et al., 2025), requires accurate thermal monitoring and forecasting because water temperature critically influences the physiological processes of aquatic organisms. Previous studies have demonstrated that temperature significantly affects fish swimming performance (Stitt et al., 2014), digestive metabolism (Bowyer et al., 2014), and reproductive success (Donelson et al., 2010).
However, existing monitoring systems face considerable challenges in predicting temperature variations along vertical water profiles. These difficulties arise from complex thermal dynamics governed by the interplay of multiple environmental and biological factors. The thermal regime exhibits spatiotemporal variability due to both exogenous environmental drivers (Woolway & Merchant, 2018; Woolway et al., 2020) and endogenous system characteristics. These multidimensional influences produce high-frequency thermal oscillations with short periods, which are further compounded by the system's inherent thermal inertia and temporal hysteresis. Moreover, the vertical heterogeneity of the water's heat capacity presents an additional challenge for predictive modeling. Conventional monitoring approaches, typically based on discrete-time and spatially localized measurements, are inadequate for capturing the full complexity of three-dimensional thermal dynamics. Traditional water temperature prediction models, driven by exogenous factors such as Lake Surface Water Temperature (LSWT), typically estimate water temperature from measurements taken 1 meter below the surface to represent the thermal dynamics of the water body. Azadeh Yousef et al. proposed that a lake can be considered a 'filter', in which external signals act as excitations that ultimately produce a low-frequency response. This approach inherently assumes thermal homogeneity, treating the water body as a uniform system and neglecting endogenous fluctuations. Aquaculture environments, however, differ from natural water bodies: their temperature dynamics are influenced not only by external environmental factors but also substantially by the biological processes associated with high stocking densities. These internal processes include the metabolic activity of fish larvae, excretion products, microbial communities, and variations in plankton density.
These endogenous factors contribute to thermal heterogeneity, leading to distinct stratification and interactions among depth layers. Consequently, the water body cannot be idealized as a pure low-pass filter, because internal noise significantly affects its thermal response. Our model conceptualizes the water body as a "low-pass filter" with high-frequency disturbances and therefore applies a proportional-integral-derivative (PID) controller to regulate the filter's dynamic response. External physical changes can be regarded as an integrator: external excitation, typically in the form of low-frequency fluctuations, accelerates the signal's approach to the steady-state response and directly causes the surface water temperature to lead. In contrast, changes induced by internal biological activity are treated as a differentiator, because endogenous noise in the water disturbs the steady-state response, usually at high frequency. This disturbance always responds with a delay, which appears as the lag in the deep-water temperature response. Introducing the integrator allows the system output to approach the steady-state response more quickly, better reflecting the influence of external physical changes, while the differentiator helps limit overshoot, enhancing the model's ability to account for the endogenous noise caused by biological activity in real aquaculture environments. If the model is viewed solely as an ideal low-pass filter, it inevitably overlooks the impact of high-frequency noise within the water, leading to a control-system mismatch. Therefore, for a non-ideal filter with high-frequency disturbances, incorporating a PID controller can effectively suppress high-frequency noise and eliminate the steady-state error. The traditional prediction model can thus be further optimized by integrating the thermodynamic processes of water at different depths.
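The integrator/differentiator analogy can be illustrated with a textbook discrete PID loop closed around a first-order low-pass plant. This is a minimal sketch, not the authors' model: the plant y[k+1] = y[k] + (dt/tau)(u[k] - y[k]) + d[k], the time constant `tau`, the gains `kp`, `ki`, `kd`, and the sinusoidal disturbance d[k] are all illustrative assumptions. It only demonstrates the property invoked above, namely that the integral term removes the steady-state error that proportional action alone leaves behind.

```python
import numpy as np


def simulate_pid(kp, ki, kd, setpoint=1.0, tau=20.0, dt=1.0, steps=400,
                 disturbance_amp=0.0, disturbance_period=4.0):
    """Discrete PID loop around an assumed first-order low-pass plant.

    Plant (illustrative assumption, not the paper's model):
        y[k+1] = y[k] + dt / tau * (u[k] - y[k]) + d[k]
    where d[k] is an optional zero-mean, high-frequency sinusoidal disturbance.
    Returns the output trajectory y[1], ..., y[steps].
    """
    y = 0.0
    integral = 0.0
    prev_err = setpoint - y  # start with zero derivative (no "kick" at k = 0)
    trajectory = np.empty(steps)
    for k in range(steps):
        err = setpoint - y
        integral += err * dt                  # I term: accumulated past error
        derivative = (err - prev_err) / dt    # D term: rate of change of error
        prev_err = err
        u = kp * err + ki * integral + kd * derivative
        d = disturbance_amp * np.sin(2.0 * np.pi * k * dt / disturbance_period)
        y = y + dt / tau * (u - y) + d        # first-order lag (low-pass) response
        trajectory[k] = y
    return trajectory


if __name__ == "__main__":
    p_only = simulate_pid(kp=2.0, ki=0.0, kd=0.0)
    pi_ctrl = simulate_pid(kp=2.0, ki=0.1, kd=0.0)
    pid_dist = simulate_pid(kp=2.0, ki=0.1, kd=1.0, disturbance_amp=0.02)
    print(f"P only: final output {p_only[-1]:.3f}")   # ~0.667, error remains
    print(f"PI:     final output {pi_ctrl[-1]:.3f}")  # ~1.000
    print(f"PID + disturbance: mean of last 100 steps {pid_dist[-100:].mean():.3f}")
```

With these assumed gains, proportional control alone settles at kp/(1 + kp) = 2/3 of the setpoint, whereas the PI and PID loops settle at the setpoint; the zero-mean disturbance leaves the average output unchanged.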
The limitations of conventional approaches pose a dual challenge for aquaculture water temperature control: management interventions based on instantaneous measurements risk temporal mismatches with the actual thermal state, while extrapolating from localized data induces spatial discrepancies. To address these limitations, the model requires continuous water temperature measurements at successive depths rather than discrete-time, localized measurements. This enables better regulation of the system's steady-state characteristics, thereby mitigating errors arising from endogenous thermal fluctuations. We therefore propose a novel predictive model for aquatic temperature regulation that integrates hydrothermal dynamic principles with machine learning prediction algorithms. Using a self-developed IoT-based monitoring system for field data collection, the model dynamically combines external heat transfer processes with endogenous bio-thermal regulation mechanisms. The proposed cyber-physical modeling paradigm constitutes a methodological breakthrough, offering transformative potential for precision thermal regulation, data-driven aquaculture optimization, and ecological resilience enhancement in pond ecosystems.

Literature review

Water temperature prediction models can be divided into physically based models and data-driven models. Physically based models generally require extensive site-specific external inputs (e.g., lake bathymetry and meteorological variables) rather than relying solely on water temperature data (Livingstone, 2003; Piotrowski & Napiorkowski, 2018). For vertically stratified reservoirs, Chen et al. (1998) developed a one-dimensional vertical model integrating surface heat flux, inflow, and outflow. Ji et al.
(2007) proposed a dimension-reduction method using Fourier expansion to simplify the three-dimensional thermal diffusion equations into a two-dimensional system, highlighting its advantages in resolving thermodynamic mechanisms. Most recently, Wade et al. (2024) coupled physics-driven models with the National Water Model, achieving notable performance in time-series forecasting. Although physically based models can provide predictions, they still depend on substantial water temperature data for calibration and validation. Data-driven models, in contrast, rely on water temperature data and other influential factors, and their methodologies are highly diverse. Traditional statistical models such as the Autoregressive Integrated Moving Average (ARIMA) model can be applied to time-series forecasting (Jia et al., 2010). However, this linear model has been shown to perform less effectively on large datasets than machine learning models (Huo et al., 2013). Data-driven models also include machine learning (ML) algorithms, which are increasingly popular as general-purpose computational models (Mohri et al., 2018; Hong et al., 2023; Du et al., 2023) because they can handle large-scale datasets and capture complex nonlinear relationships (Sharma et al., 2008; Liu and Chen, 2012; Read et al., 2019). Recurrent Neural Networks (RNNs) have shown strong performance and adaptability in time-series analysis. RNNs can effectively capture temporal dependencies and dynamic changes, enabling accurate modeling and forecasting of both short-term fluctuations and long-term trends (Zaremba et al., 2015). However, traditional RNN architectures suffer from the vanishing gradient problem.
To address this limitation, variants such as Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) introduce gating mechanisms that enhance model stability and predictive accuracy (Wang et al., 2024). The LSTM model, a specialized RNN architecture, consists of three gates (input, forget, and output) and two internal states: the cell state and the hidden state. Unlike traditional statistical approaches such as ARIMA, LSTM is well suited to capturing nonlinear relationships and complex temporal dynamics in time-series data (Jia et al., 2010; Qin et al., 2023). Its ability to model long-term dependencies makes it particularly effective for volatile and non-seasonal water quality datasets (Wang et al., 2013; Zhi et al., 2021). Building on this strength, Roushangar et al. (2024) developed a hybrid LSTM-based model to predict dissolved oxygen (DO) levels in natural river systems. Similarly, Huan et al. (2020) proposed an ensemble model combining gradient-boosting decision trees with LSTM to improve DO prediction accuracy. Despite these advantages, LSTM has limitations on large-scale datasets, including low computational efficiency and susceptibility to vanishing gradients (Beck et al., 2024). To address these challenges, the extended Long Short-Term Memory (xLSTM) architecture introduces several structural enhancements. Specifically, xLSTM modifies the traditional memory structure by introducing two new memory cells: the matrix LSTM (mLSTM) and the scalar LSTM (sLSTM). These enhancements improve the efficiency of memory updates and facilitate parallel computation, significantly accelerating processing. Furthermore, residual connections mitigate both vanishing and exploding gradients, improving model stability and training performance (Beck et al., 2024).
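To make the difference between the classic LSTM memory and the xLSTM matrix memory concrete, the NumPy sketch below implements one time step of a standard LSTM cell and one step of an mLSTM cell in the form described by Beck et al. (2024). It is an illustration under simplifying assumptions, not the MAN-xLSTM used in this study: weights are passed in explicitly, the mLSTM omits biases and the log-space stabilizer of its exponential input gate (so pre-activations must stay small), and the sLSTM cell, multi-head structure, and residual blocks are not shown. The gate ordering and all dimensions are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One time step of a standard LSTM cell.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias.
    Pre-activations are stacked in the (assumed) order [input, forget, candidate, output].
    """
    z = W @ x + U @ h_prev + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c_prev + i * np.tanh(g)   # cell state: forget old memory, write new
    h = o * np.tanh(c)                # hidden state: gated read-out of the cell
    return h, c


def mlstm_step(x, C_prev, n_prev, Wq, Wk, Wv, w_i, w_f, Wo):
    """One time step of an mLSTM cell with a (d x d) matrix memory.

    Simplified: no biases and no stabiliser for the exponential input gate.
    Gates depend on x only (no hidden-to-hidden recurrence), which is what
    allows mLSTM to be computed in parallel over time.
    """
    q = Wq @ x                                   # query
    k = (Wk @ x) / np.sqrt(Wk.shape[0])          # key, scaled by 1/sqrt(d)
    v = Wv @ x                                   # value
    i_gate = np.exp(w_i @ x)                     # scalar exponential input gate
    f_gate = sigmoid(w_f @ x)                    # scalar forget gate
    o_gate = sigmoid(Wo @ x)                     # vector output gate
    C = f_gate * C_prev + i_gate * np.outer(v, k)    # store value-key outer product
    n = f_gate * n_prev + i_gate * k                 # normaliser state
    h_tilde = C @ q / max(abs(n @ q), 1.0)           # retrieve with the query
    return o_gate * h_tilde, C, n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, d = 3, 4                                  # input size, memory size (hypothetical)
    Wq, Wk, Wv, Wo = (rng.normal(scale=0.3, size=(d, D)) for _ in range(4))
    w_i, w_f = rng.normal(scale=0.3, size=D), rng.normal(scale=0.3, size=D)
    C, n = np.zeros((d, d)), np.zeros(d)
    for x in rng.normal(size=(5, D)):            # a short random input sequence
        h, C, n = mlstm_step(x, C, n, Wq, Wk, Wv, w_i, w_f, Wo)
    print(h.shape, C.shape)
```

The LSTM carries a vector memory `c` updated through a recurrent weight `U`, whereas the mLSTM stores a matrix of value-key associations that is written and read without recurrent weights; this is the structural change the text refers to.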
xLSTM significantly enhances computational efficiency and model scalability while maintaining prediction accuracy, making it a highly promising prediction model. However, current deep learning models still have three significant drawbacks. First, these models often operate as "black boxes," relying solely on data-driven learning while neglecting the thermodynamic principles that govern water temperature dynamics, an omission that can compromise predictive accuracy. Most existing research on thermal stratification has concentrated on deep lakes and other open-water systems (Read et al., 2019). In contrast, studies on pond environments remain limited, and prevailing stratification theories are frequently based on expert suggestions or the empirical knowledge of fishermen (Chen et al., 2023). Notably, there are still no standardized calibration methods tailored to pond thermal stratification. Second, the vertical heterogeneity of water's heat capacity further complicates predictive modeling, as current methods relying on discrete point measurements fail to capture three-dimensional thermal dynamics. Pond water temperature regulation is an inertial process with significant time-lag effects, influenced by both external heat transfer and internal biological processes. Therefore, in real pond environments, treating the water as an ideal "filter" is inadequate, and the model must be optimized to account for the internal heterogeneity of the pond. Third, accurately predicting water temperature at varying depths from high-frequency, short-term datasets remains a considerable challenge. This difficulty arises from the highly dynamic interactions among physical, chemical, and biological processes. Internal factors such as fish activity, fry development, and organic matter respiration continuously alter the thermal profile of the water column.
In addition, numerous external environmental variables, including air temperature, wind speed, and humidity, further complicate the thermal dynamics. The interplay of these internal and external influences introduces substantial variability that conventional models often struggle to account for. This paper therefore addresses a central question: how can machine learning prediction models be optimized by integrating the physical processes of thermal diffusion? An innovative approach to pond water temperature prediction is proposed, analyzing short-term, high-frequency water temperature data in terms of noise reduction, water temperature stratification, and prediction layers. The main contributions of this study are as follows. First, we introduce an approach that enhances the controllability of the pond system by treating it as a 'low-pass filter' with high-frequency disturbances rather than an ideal one. The system is optimized by a PID controller, in which external physical changes are treated as an 'integrator' that rapidly drives the system toward the steady-state response, and internal biological activity as a 'differentiator' that controls overshoot and oscillations. Second, to select the indicators for the 'PID controller' and the prediction models, dynamic time-channel fusion is used to quantify water temperature stratification in both the spatial and temporal dimensions. The effect of different sliding window sizes is tested by comparing the resulting RMSE, which determines the optimal sliding window parameters. Third, the xLSTM algorithm is employed as the baseline for small-sample, high-frequency prediction of water temperature at each node across the water layers. Building on it, optimizations are proposed from both the algorithmic and the thermal-dynamics perspectives, and multi-level attention is added to form the final network, the multi-attention xLSTM network (MAN-xLSTM).
Several other machine learning models are also tested and compared with MAN-xLSTM to demonstrate the predictive performance of the proposed approach. Finally, a custom-developed IoT-based monitoring system is deployed to enable continuous water temperature sampling, replacing traditional discrete measurement methods. The system captures high-frequency temperature data at one-minute intervals. In parallel, external environmental indicators are collected from the official website of the China Meteorological Administration (CMA). Four noise-reduction algorithms are compared to improve the overall accuracy of the dataset. Figure 1 illustrates the workflow, and Figure 2 summarizes the main innovations of our work.

Figure 1: Flowchart of our work.

Figure 2: Comparison of traditional prediction methods and ours.

Experiment setup

The data were collected from a fish farming pond in Shangjie Town, Minhou County, Fuzhou City, Fujian Province, China (26°05′N, 119°19′E). Data collection ran from 00:01:26 on September 1st to 02:17:24 on September 20th, yielding a total of 11,659 data points. The experimental site is a typical aquatic ecosystem with a total area of 13,330.67 m². The pond mainly farms catfish and provides suitable natural conditions for the experiment. The geographic location and equipment are shown in Figure 3(a)-(d).

Figure 3: (a) Map of China. (b) Satellite view illustrating the location and surrounding geographical environment; the pentagram symbol marks the measurement points. (c) The tube sensor used in the experiment. (d) Photograph of the measurement.

To achieve high-frequency continuous measurement, this study uses a custom-developed Internet of Things (IoT) real-time monitoring system to monitor the pond environment continuously. The system architecture is shown in Figure 4. Water quality data are collected using self-designed hardware.
The waterproof enclosure and the sensors' external interfaces are customized. The data collector uses the Holtek AIR480E as the MCU for data collection and communication. Monitoring results are displayed on both mobile and web platforms.

Figure 4: IoT system architecture diagram.

Water temperature is measured using a tube sensor connected to the data collector at one end and to a weight at the other. The tube has a total length of 2.2 meters, of which 1.8 meters is submerged. A total of 20 sensing nodes are distributed along the tube at 10 cm intervals. Nodes 1 and 2 are positioned above the water surface to measure near-surface air temperature, while the remaining 18 sensors are located underwater to capture temperature variations at different depths. The sampling interval is set to 2 minutes.

External data are collected from the official website of the China Meteorological Administration (CMA) [https://www.weather.com.cn/], including seven meteorological indicators: air temperature (AT), wind level (WL), wind speed (WS), air pressure (AP), humidity, air quality (AQ), and visibility (V), at a 5-minute interval. The collected data structure is shown in Table 1.

Experiment

Noise reduction

Because water temperature signals are easily affected by high-frequency noise from external sources, the raw signals must be filtered before deep learning algorithms are used for water temperature prediction. This paper compares four common signal denoising methods: wavelet thresholding (WT), moving average (MA), Loess smoothing (LS), and Savitzky-Golay filtering (SGF). xLSTM is used as the prediction algorithm to calculate the RMSE on the training and validation sets of different nodes (the algorithm construction details are presented in Section 3.4). The results are shown in Figure 5(a)-(b). Among the four noise reduction methods evaluated, the discrete wavelet thresholding (WT) algorithm shows the highest robustness.
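The wavelet family, decomposition depth, and threshold rule used for WT are not specified here, so the sketch below fills them with common defaults as assumptions: a Haar wavelet, four decomposition levels, soft thresholding of the detail coefficients, and the universal threshold sigma * sqrt(2 ln n) with sigma estimated from the finest detail level (median absolute value / 0.6745). A library such as PyWavelets offers other wavelet families; plain NumPy keeps the example self-contained.

```python
import numpy as np


def haar_dwt(x):
    """One level of the orthonormal Haar DWT (len(x) must be even)."""
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail


def haar_idwt(approx, detail):
    """Exact inverse of haar_dwt."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def wavelet_threshold_denoise(signal, levels=4, threshold=None):
    """Multilevel Haar DWT, soft-threshold the detail coefficients, reconstruct.

    If threshold is None, the universal threshold sigma * sqrt(2 ln n) is used,
    with sigma = median(|finest details|) / 0.6745 (MAD noise estimate).
    """
    x = np.asarray(signal, dtype=float)
    n = len(x)
    pad = (-n) % (2 ** levels)
    x = np.pad(x, (0, pad), mode="edge")      # length must be divisible by 2**levels
    details = []
    approx = x
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)                     # details[0] is the finest scale
    if threshold is None:
        sigma = np.median(np.abs(details[0])) / 0.6745
        threshold = sigma * np.sqrt(2.0 * np.log(len(x)))
    details = [np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0) for d in details]
    for d in reversed(details):               # rebuild from the coarsest scale up
        approx = haar_idwt(approx, d)
    return approx[:n]                         # drop the padding


if __name__ == "__main__":
    def rmse(a, b):
        return np.sqrt(np.mean((a - b) ** 2))

    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 4.0 * np.pi, 1000)
    clean = 29.5 + 0.5 * np.sin(t)                 # synthetic "water temperature" (°C)
    noisy = clean + rng.normal(0.0, 0.05, t.size)  # additive high-frequency noise
    denoised = wavelet_threshold_denoise(noisy)
    print(f"RMSE noisy: {rmse(noisy, clean):.4f}, denoised: {rmse(denoised, clean):.4f}")
```

Setting `threshold=0.0` reconstructs the input exactly, which is a quick sanity check that the forward and inverse transforms match.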
WT achieves the lowest RMSE on the validation set while keeping a trend consistent with the training RMSE. Although it ranks second in training-set accuracy, its overall performance reflects high stability. Consequently, WT is selected as the final filtering algorithm for processing the raw temperature data; the corresponding visualization is provided in Figure 6.

Table 1: Example format of the collected data.

Time            nWT1  nWT2  ...  nWT20  AT    AP     WL    WS      Humidity  V     AQ
                (°C)  (°C)       (°C)   (°C)  (hPa)        (km/h)  (%RH)     (km)
24/09/01 00:01  29.6  29.5  ...  32.6   27.4  1002   1.98  8.85    90.0      7.07  46.0
24/09/01 00:04  29.6  29.5  ...  32.6   27.4  1002   1.90  8.40    89.8      7.26  46.0
24/09/01 00:07  29.6  29.5  ...  32.6   27.4  1002   1.85  8.10    89.7      7.38  46.0
24/09/01 00:10  29.7  29.5  ...  32.6   27.4  1002   1.80  7.81    89.7      7.50  46.0
24/09/01 00:13  29.7  29.5  ...  32.6   27.4  1002   1.75  7.51    89.6      7.61  46.0
24/09/01 00:16  29.7  29.6  ...  32.6   27.4  1002   1.68  7.08    89.5      7.77  46.0
24/09/01 00:19  29.8  29.6  ...  32.6   27.4  1002   1.63  6.80    89.4      7.88  46.0
24/09/01 00:22  29.8  29.6  ...  32.6   27.4  1002   1.59  6.52    89.4      7.97  46.0
...
24/09/01 23:41  29.9  29.7  ...  32.9   27.9  1001   1.00  2.63    87.2      9.31  48.9
24/09/01 23:44  29.8  29.7  ...  32.9   27.9  1001   1.00  2.68    87.1      9.24  48.9
24/09/01 23:47  29.8  29.6  ...  32.9   27.9  1001   1.00  2.75    87.1      9.15  49.0
24/09/01 23:50  29.8  29.6  ...  32.9   27.9  1001   1.00  2.80    87.0      9.10  49.0
24/09/01 23:53  29.8  29.6  ...  32.9   27.9  1001   1.00  2.85    87.0      9.06  49.0
24/09/01 23:56  29.7  29.6  ...  32.9   27.9  1001   1.00  2.90    87.0      9.03  49.0
24/09/01 23:59  29.6  29.5  ...  32.9   27.9  1001   1.00  3.00    87.0      9.00  49.0

Figure 5: RMSE for the training and validation sets of different nodes. All RMSE values are averaged over 50 independent runs.

Figure 6: Time-series visualization results of the WT method.

Time-channel fusion

Due to significant vertical thermal stratification and stochastic fluctuations induced by environmental and biological factors (Chen et al., 2023), pond water cannot be treated as an ideal filter.
Instead, it functions as a filter with high-frequency disturbances. To capture these effects in the predictive model, spatial and temporal stratification features derived from the continuous data are used as key inputs to the PID controller. In this study, we propose a novel time-channel fusion approach to dynamically model water temperature stratification across both spatial and temporal dimensions.

Spatial Dimension Fusion

A time-channel fusion module is introduced to fuse sensor nodes with similar characteristics, thereby determining the locations of the water temperature partitions. First, each time series is reduced to a low-dimensional temporal embedding vector. For each sensor node i, its temperature time series over a window of length L is denoted $TS_{i}=\{T_{i}(t)\}_{t=1}^{L}$. This series is projected into a low-dimensional temporal embedding space using a set of learnable basis functions $\varphi_{m}(\cdot)$, $m=1,\ldots,M$:
$$e_{i}=\left[\varphi_{1}(TS_{i}),\varphi_{2}(TS_{i}),\ldots,\varphi_{M}(TS_{i})\right]^{\top}\in\mathbb{R}^{M}$$
where $\varphi_{m}(\cdot)$ is the m-th wavelet basis function of the discrete wavelet transform (DWT) used for feature extraction, which captures both seasonal trends and transient high-frequency perturbations; $e_{i}$ is the embedding vector representing node i; and M is the embedding dimension, i.e., the number of extracted features.

This paper compares the following methods for implementing time-channel fusion: Spectral Clustering, Fuzzy C-means Clustering, Self-Organizing Map (SOM), K-means Clustering, and Balanced Iterative Reducing and Clustering Using Hierarchies (BIRCH). The detailed procedures for Spectral Clustering, Fuzzy C-means Clustering, SOM, and BIRCH are given below.
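One way to realize the DWT embedding $e_i$ is sketched below. The choice of features is an assumption for illustration, not the authors' feature set: each node's window is decomposed with a Haar DWT, and the embedding collects the root-mean-square of the detail coefficients at every level (high-frequency perturbation energy, finest level first) plus the window mean recovered from the final approximation (the slow trend), so that M = levels + 1. The function names are hypothetical.

```python
import numpy as np


def haar_level(x):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    return (x[0::2] + x[1::2]) / np.sqrt(2.0), (x[0::2] - x[1::2]) / np.sqrt(2.0)


def wavelet_embedding(series, levels=4):
    """Map one node's window TS_i = {T_i(t)} to e_i in R^(levels + 1).

    Features (an illustrative assumption): RMS of the Haar detail coefficients
    at each level, finest first, plus the window mean recovered from the final
    approximation coefficients. Requires len(series) >= 2**levels.
    """
    x = np.asarray(series, dtype=float)
    block = 2 ** levels
    x = x[: len(x) - len(x) % block]          # trim so every level splits evenly
    features = []
    approx = x
    for _ in range(levels):
        approx, detail = haar_level(approx)
        features.append(np.sqrt(np.mean(detail ** 2)))
    # each Haar level multiplies the mean by sqrt(2); undo the total gain sqrt(block)
    features.append(approx.mean() / np.sqrt(block))
    return np.array(features)


def embed_nodes(T, levels=4):
    """T: array of shape (N_nodes, L). Returns the (N_nodes, levels + 1) matrix of e_i."""
    return np.vstack([wavelet_embedding(row, levels) for row in T])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L = 512
    trend = 29.0 + 0.3 * np.sin(np.linspace(0.0, 2.0 * np.pi, L))
    # hypothetical nodes: a noisier near-surface node and a calmer deep node
    T = np.vstack([trend + rng.normal(0.0, 0.05, L),
                   trend - 1.0 + rng.normal(0.0, 0.01, L)])
    print(embed_nodes(T).round(3))
```

The rows of the resulting matrix are the vectors $e_i$ that the clustering methods below operate on; noisier nodes show larger fine-scale features, which is what lets the fusion separate perturbation-dominated layers.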
Spectral Clustering

Step 1) Perturbation-Aware Similarity Construction. For any two sensor nodes i and j, a composite distance $\delta_{ij}$ is defined from their temporal embedding vectors $e_{i}, e_{j}$ and water depths $z_{i}, z_{j}$, where $\|\cdot\|_{2}$ denotes the Euclidean norm of a vector, $\delta_{ij}$ is the composite perturbation-aware distance between the two nodes, and $\alpha, \beta \in [0, 1]$ are fusion weights derived from the normalized high-frequency noise power and the vertical temperature variance, satisfying $\alpha + \beta = 1$.

Step 2) Normalized Graph Laplacian. The similarity matrix $\mathbf{W}=[w_{ij}]_{N\times N}$ is defined from the distances $\delta_{ij}$, where $\sigma$ is a scale parameter controlling the decay rate of similarity, $\mathbf{W}\in\mathbb{R}^{N\times N}$ is the weight matrix of the graph, and $w_{ij}$ is the edge weight between nodes i and j. With the diagonal degree matrix $\mathbf{D}\in\mathbb{R}^{N\times N}$, $D_{ii}=\sum_{j}w_{ij}$, and the identity matrix $\mathbf{I}$, the symmetric normalized Laplacian used for spectral decomposition is $\mathbf{L}_{\text{sym}}=\mathbf{I}-\mathbf{D}^{-1/2}\mathbf{W}\mathbf{D}^{-1/2}\in\mathbb{R}^{N\times N}$.

Step 3) Spectral Embedding. The eigenvectors corresponding to the K smallest eigenvalues of $\mathbf{L}_{\text{sym}}$ are computed and stacked as the columns of $\mathbf{V}\in\mathbb{R}^{N\times K}$. The i-th row $\mathbf{v}_{i}\in\mathbb{R}^{K}$ of $\mathbf{V}$ gives the spectral coordinates of node i, and $u_{i}=\mathbf{v}_{i}/\|\mathbf{v}_{i}\|_{2}\in\mathbb{R}^{K}$ is its normalized spectral embedding.

Step 4) Thermal Layer Aggregation. K-means clustering is applied to the set of normalized embeddings $\{u_{i}\}$ to assign cluster labels. Each cluster $C_{k}=\{s_{i}\mid y_{i}=k\}$ is the k-th cluster, interpreted as a distinct thermal layer in the water body, where $y_{i}\in\{1,2,\ldots,K\}$ is the label assigned to sensor node $s_{i}$ by the K-means algorithm.

Fuzzy C-means Clustering

Step 1) Noise-Aware Objective Function. The fuzzy objective is minimized, where $u_{ik}\in[0,1]$ denotes the membership degree of node i in thermal layer k.
$m$ is the fuzzifier parameter, set to 2, and $\sigma_{i,\text{high}}$ is the high-frequency noise standard deviation of node i.

Step 2) Iterative Update Rules. The update consists of three parts: the membership update, the centroid update, and the termination condition, where $\varepsilon$ is the convergence threshold.

Self-Organizing Layering Map (SOM)

Step 1) Input Representation. The vertical coordinate is appended to each embedding, where $\lambda$ is a normalization factor that scales depth values into a range comparable with the embedding features.

Step 2) Projection to Grid. A hexagonal grid of size P×Q is constructed to represent the spatial projection space. Each neuron r in the grid is associated with a weight vector $m_{r}^{(t)}\in\mathbb{R}^{M+1}$, the full weight vector of neuron r at iteration t, including the temporal and depth dimensions, where $w_{r}^{(t)}\in\mathbb{R}^{M}$ is the temporal part and $h_{r}^{(t)}\in\mathbb{R}$ is the depth component.

Step 3) Matching and Updating. The update follows the steps below.

Determine the Best Matching Unit (BMU): $r^{*}$ is the index of the BMU for node i, i.e., the neuron whose weight vector $m_{r}^{(t)}$ is closest to the input under $D(\cdot,\cdot)$, the squared Euclidean distance.

Determine the Neighborhood Function: $H_{r,r^{*}}^{(t)}$ is the influence strength between neuron r and the BMU $r^{*}$, $\text{dist}_{\text{map}}(r,r^{*})$ is the topological distance between neurons r and $r^{*}$ on the grid, $\sigma(t)$ is the decaying neighborhood radius at time t, $\sigma_{0}$ is the initial neighborhood radius, and $\tau_{\sigma}$ is the decay time constant.

Adaptive Learning Rate: $\eta_{i}(t)$ is the learning rate for node i at iteration t, $\eta_{0}$ is the initial learning rate, and $\tau_{\eta}$ is the learning rate decay constant.
$\sigma_{i,\text{high}}$ is the high-frequency noise standard deviation of node i, and $\kappa$ is the penalty coefficient for noise suppression.

Weight Vector Update: $m_{r}^{(t)}$ is the current weight of neuron r.

Step 4) Thermal Layer Extraction. After training, neurons with similar water depths are grouped into connected components to form thermal layers, where $\mathcal{L}_{k}$ is the k-th thermal layer, represented by a set of neurons r; $h_{r}$ is the water depth component of neuron r's weight; $\overline{h}_{k}$ is the average water depth of the group; and $\delta_{z}\approx 0.2\,\Delta z$ is the threshold controlling the acceptable vertical variation within a thermal layer.

K-means Clustering and Balanced Iterative Reducing and Clustering Using Hierarchies (BIRCH)

Step 1) Input Representation. The input representation is constructed in the same way as in the SOM method.

Step 2) Extended CF-quadruple. The clustering feature (CF) quadruple is determined first, where $N$ is the number of samples contained in the CF (sub-cluster), $c$ is the centroid of the temperature embeddings in the CF, $\overline{z}$ is the mean depth in the CF, $\beta$ is the weight that tunes the contribution of the depth term in the distance, and $d^{2}$ is the squared distance between a sample and the CF centroid, which accounts for both time-series similarity and depth offset.

Step 3) Noise-adaptive Threshold. $T_{i}$ denotes the leaf-node radius, $\sigma_{i,\text{high}}$ is the standard deviation of the high-frequency noise at node i, and $k\in[0,1]$.

Step 4) Insertion and Split Strategy.
• Path selection: from the root downward, choose the child with the smallest $d^{2}$.
• Attempt absorption: if the leaf radius after insertion is $\leq T_{i}$, update its CF.
• Leaf split: otherwise, create a new leaf node.
• Non-leaf split: if a parent overflows, split it and rebalance the hierarchy.

Step 5) Layer Generation. After the tree is built, run one pass of the K-means algorithm on the CF vectors.
Each cluster becomes a thermal layer $\mathcal{L}_{k}$.

Evaluation indexes

The Silhouette coefficient, Calinski–Harabasz index, Dunn index, and Davies–Bouldin index are widely used metrics for assessing the quality of clustering results (Rousseeuw, 1987; Caliński & Harabasz, 1974; Dunn, 1974; Davies & Bouldin, 1979). In this study, these four indices are used to evaluate the clustering performance of dynamic thermal stratification in pond water.

Silhouette coefficient (Silh)
$$a_{i}=\frac{1}{n_{y_{i}}-1}\sum_{x_{j}\in C_{y_{i}},\,j\neq i}d(x_{i},x_{j}),\qquad b_{i}=\min_{k\neq y_{i}}\frac{1}{n_{k}}\sum_{x_{j}\in C_{k}}d(x_{i},x_{j}),\qquad \text{Silh}=\frac{1}{n}\sum_{i=1}^{n}\frac{b_{i}-a_{i}}{\max(a_{i},b_{i})}$$
where $n$ is the total number of samples, $x_{i}$ is the i-th sample, $C_{y_{i}}$ is the cluster that contains $x_{i}$, $n_{y_{i}}$ is the size of $C_{y_{i}}$, $a_{i}$ is the average distance from $x_{i}$ to all other samples in its own cluster, $b_{i}$ is the smallest average distance from $x_{i}$ to the samples of any other cluster, and $d(\cdot,\cdot)$ is the sample-to-sample distance.

Calinski–Harabasz index (CH)
$$\text{CH}=\frac{\sum_{k=1}^{K}n_{k}\|\mu_{k}-\mu\|^{2}/(K-1)}{\sum_{k=1}^{K}\sum_{x\in C_{k}}\|x-\mu_{k}\|^{2}/(n-K)}$$
where $K$ is the number of clusters, $C_{k}$ is the k-th cluster, $n_{k}$ is its size, $\mu_{k}$ is its centroid, and $\mu$ is the overall mean of the whole data set.

Dunn index (Dunn)
$$\text{Dunn}=\frac{\min_{p\neq q}\delta(C_{p},C_{q})}{\max_{k}\Delta(C_{k})}$$
where $\delta(C_{p},C_{q})$ is the smallest distance between any two points belonging to distinct clusters $C_{p}$ and $C_{q}$, and $\Delta(C_{k})$ is the largest distance within cluster $C_{k}$.

Davies–Bouldin index (DB)
$$\text{DB}=\frac{1}{K}\sum_{i=1}^{K}\max_{j\neq i}\frac{s_{i}+s_{j}}{\|\mu_{i}-\mu_{j}\|}$$
where $s_{i}$ is the average distance of the points in cluster i to their centroid.

The Silhouette coefficient, Calinski–Harabasz index, and Dunn index are positive indicators (higher is better), while the Davies–Bouldin index is a negative indicator (lower is better). Based on the evaluation results in Table 2, the BIRCH clustering method exhibits the best performance in spatial fusion. To further enhance fusion quality, Principal Component Analysis (PCA) is employed for dimensionality reduction prior to clustering (Migenda et al., 2021).
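The evaluation pipeline (PCA reduction, clustering, then index-based scoring) can be sketched in NumPy. The example follows the spectral clustering Steps 1 to 4 rather than BIRCH because their operations are fully stated above, and it is not the authors' implementation. Since the exact forms of $\delta_{ij}$ and $w_{ij}$ are not reproduced in this text, the sketch assumes $\delta_{ij}=\alpha\|e_i-e_j\|_2+\beta|z_i-z_j|$ and a Gaussian kernel $w_{ij}=\exp(-\delta_{ij}^2/2\sigma^2)$; the K-means step uses a deterministic farthest-point initialization, also an assumption. Only the Silhouette and Davies–Bouldin indices are implemented, and the synthetic layers in the demo are hypothetical.

```python
import numpy as np


def pca_reduce(E, n_components):
    """Project the rows of E onto their first principal components (via SVD)."""
    centered = E - E.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_layers(E, z, k, alpha=0.5, sigma=0.2):
    """Steps 1-4: composite distance, affinity, L_sym, row-normalised k-means."""
    beta = 1.0 - alpha
    delta = (alpha * np.linalg.norm(E[:, None, :] - E[None, :, :], axis=-1)
             + beta * np.abs(z[:, None] - z[None, :]))     # assumed form of delta_ij
    W = np.exp(-delta ** 2 / (2.0 * sigma ** 2))           # assumed Gaussian kernel
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)                     # ascending eigenvalues
    V = eigvecs[:, :k]                                     # K smallest -> rows v_i
    U = V / np.linalg.norm(V, axis=1, keepdims=True)       # u_i = v_i / ||v_i||
    return kmeans(U, k)


def silhouette(X, labels):
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() < 2:
            continue                                       # singleton: s_i = 0
        a = D[i, own].sum() / (own.sum() - 1)              # D[i, i] = 0 contributes nothing
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


def davies_bouldin(X, labels):
    ks = np.unique(labels)
    mu = np.array([X[labels == c].mean(axis=0) for c in ks])
    s = np.array([np.linalg.norm(X[labels == c] - mu[j], axis=1).mean() for j, c in enumerate(ks)])
    return np.mean([max((s[i] + s[j]) / np.linalg.norm(mu[i] - mu[j])
                        for j in range(len(ks)) if j != i) for i in range(len(ks))])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    z = np.arange(1, 19) * 0.1                        # 18 underwater nodes, 10 cm apart
    true_layer = np.repeat([0, 1, 2], 6)              # hypothetical ground truth
    centers = np.array([[0, 0, 0, 0, 0], [2, 0, 0, 0, 0], [0, 2, 0, 0, 0]], dtype=float)
    E = centers[true_layer] + rng.normal(0.0, 0.01, (18, 5))
    X = pca_reduce(E, 2)
    labels = spectral_layers(X, z, k=3)
    print(labels, round(silhouette(X, labels), 3), round(davies_bouldin(X, labels), 3))
```

Swapping `spectral_layers` for another clustering routine (for example a BIRCH implementation) while keeping `pca_reduce` and the two index functions reproduces the comparison structure of Table 2.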
Among all PCA-enhanced methods, the PCA–BIRCH combination achieves the highest overall accuracy and consistency. Taking both fusion performance and computational efficiency into account, this study adopts the PCA–BIRCH strategy as the final approach for spatial dimension fusion. The evaluation results for all methods are listed in Table 2.

Table 2: Evaluation results of time-channel fusion methods.
Spectral | 0.5222 | 48.45 | 0.5489 | 0.1390
PCA-Spectral | 0.6241 | 98.25 | 0.4106 | 0.1506
Fuzzy C-means | 0.5222 | 48.45 | 0.5489 | 0.1390
PCA-Fuzzy C-means | 0.6241 | 98.25 | 0.4106 | 0.1506
SOM | 0.4690 | 35.96 | 0.5925 | 0.1636
PCA-SOM | 0.6139 | 81.76 | 0.4322 | 0.1560
K-means | 0.5222 | 48.46 | 0.5789 | 0.1390
PCA-K-means | 0.6241 | 98.25 | 0.4106 | 0.1506
BIRCH | 0.5119 | 46.99 | 0.5553 | 0.1900
PCA-BIRCH | 0.6087 | 93.67 | 0.4192 | 0.2145

Temporal Dimension Fusion

External environmental factors that alter water temperature, such as solar radiation and air temperature fluctuations, typically vary at low frequency, whereas internal factors such as biological activity are characterized by high-frequency dynamics. These multi-scale temporal characteristics induce endogenous thermal stratification dynamics at different time scales. Temporal quantification is therefore essential for a more comprehensive characterization of water temperature stratification. Here we introduce a sliding-window method to further optimize the fusion. To determine the optimal sliding-window parameters, three combinations of [window width, window step] are tested: [1800, 300], [1800, 600], and [2500, 600]. The RMSE for each combination is shown in Figure 7.
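The window segmentation and parameter search can be sketched as follows. This is a minimal illustration under stated assumptions: window width and step are treated as sample counts (the text does not give their units), and `evaluate` is a hypothetical callback standing in for "run the time-channel fusion and the downstream predictor, then return the averaged RMSE". Only the three candidate combinations come from the text.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Params = Tuple[int, int]  # (window width, window step)


def sliding_windows(n_samples: int, width: int, step: int) -> List[Tuple[int, int]]:
    """Return [start, end) index pairs of all full windows over n_samples points."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    return [(s, s + width) for s in range(0, n_samples - width + 1, step)]


def select_window_params(candidates: Iterable[Params],
                         evaluate: Callable[[int, int], float]
                         ) -> Tuple[Params, Dict[Params, float]]:
    """Score each (width, step) pair with evaluate() and return the lowest-scoring one.

    evaluate is a hypothetical callback; in this setting it would return the RMSE
    of the downstream prediction, averaged over repeated runs.
    """
    scores = {params: evaluate(*params) for params in candidates}
    best = min(scores, key=scores.get)
    return best, scores


CANDIDATES = [(1800, 300), (1800, 600), (2500, 600)]  # combinations tested in the text

if __name__ == "__main__":
    print(sliding_windows(3700, 2500, 600))  # [(0, 2500), (600, 3100), (1200, 3700)]
```

With [2500, 600], consecutive windows overlap by 1900 samples (76 % of the width), so each point falls into four or five windows; the step controls how finely block boundaries can be located, and the width controls how much context each fusion decision sees.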
When [window width, window step] = [2500, 600], the RMSE is lowest, indicating that the dynamic time-channel fusion under this combination gives the greatest improvement for subsequent model predictions. The time-channel visualization for [2500, 600] is shown in Figure 8. The fusion boundaries at the front and rear nodes closely match the division points of the sliding window, showing high consistency with the dynamic fusion results and further supporting this parameter choice.

Figure 7: RMSE values for each sliding-window parameter combination. All RMSE values are averaged over 50 independent runs.

Figure 8: Visualization of time-channel fusion. Different colors represent different temporal blocks.

Identification of predictors

1. Kendall Correlation Analysis

After dynamic time-channel fusion, the next step is to select appropriate predictors as inputs for the DL model to predict the target node. This study selects air temperature (AT), air pressure (AP), wind level (WL), wind speed (WS), humidity (%RH), visibility (V), air quality (AQ), and node water temperature (nWT) as the input variables. Kendall's rank correlation measures the strength and direction of the association between two variables based on their ordinal ranks (Kendall, 1938). As shown in Figure 9, the correlation between the external indicators and the water temperature nodes decreases as node depth increases. Specifically, air temperature (AT) shows a clear positive correlation with water temperature, while humidity (%RH) shows a significant negative correlation. The correlation between sensor nodes also decreases with depth: for example, the correlation between nodes 1 and 2 is the strongest, while the correlation with node 20 is the weakest.
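Kendall's rank correlation depends only on how often pairs of observations move in the same or opposite directions. The sketch below computes the tau-b variant, which corrects for ties; choosing tau-b is an assumption, since the text does not say which variant was used (`scipy.stats.kendalltau` also defaults to tau-b). The quadratic pairwise loop is for clarity; library implementations use an O(n log n) algorithm.

```python
import math
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b rank correlation, O(n^2) pairwise version.

    tau_b = (P - Q) / sqrt((P + Q + T_x) * (P + Q + T_y)), where P and Q count
    concordant and discordant pairs and T_x (T_y) count pairs tied only in x (y).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue              # tied in both: ignored by tau-b
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1       # both variables move in the same direction
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")
```

Because only ranks matter, the statistic is insensitive to the differing units of the predictors (°C, hPa, %RH), which is one reason a rank correlation suits a heterogeneous input set like this one.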
The weakening of inter-node correlation with depth reflects the lag in temperature changes caused by the spacing between sensors, further confirming that temperature changes within the pond are not synchronous.

Predictors of Temperature Stratification

As detailed in Section 3.2, the influences of the temporal and spatial dimensions on water temperature stratification have been discussed. This section compares the effect of different predictors of water temperature stratification on the prediction results, in order to validate the necessity and importance of dynamic water temperature stratification.

Figure 9: Kendall's rank correlation between variables.

Considering the spatial stratification of water temperature, this study defines four categories of inputs.
Case O: the original data excluding the target node are used as inputs.
Case A: spatial stratification only. The inputs are the water quality indicators, namely the temperatures of the other nodes in the same zone (excluding the target node) and of the two adjacent nodes in the upper and lower zones, together with the external indicators.
Case B: temporal stratification only. Subintervals with consistent fusion results are merged, and a subinterval whose internal fusion results are inconsistent is treated as a distinct time block. Within each time block the water temperature varies consistently, and the original data excluding the target node are used as inputs.
Case A-B: both spatial and temporal stratification are considered.
These four types of inputs are fed separately into the DL model. The results are shown in Table 3.

Table 3: Evaluation results of time-channel fusion methods. All RMSE values are averaged over 50 independent runs.
O | 3 | 0.0376 | 0.0344
O | 6 | 0.0192 | 0.0339
O | 9 | 0.0232 | 0.0243
O | 12 | 0.0268 | 0.0277
O | 15 | 0.0211 | 0.0305
O | 18 | 0.0307 | 0.0333
A | 3 | 0.0191 | 0.0133
A | 6 | 0.0184 | 0.0206
A | 9 | 0.0158 | 0.0262
A | 12 | 0.0167 | 0.0231
A | 15 | 0.0125 | 0.0268
A | 18 | 0.0170 | 0.0215
B | 3 | 0.0121 | 0.0262
B | 6 | 0.0166 | 0.0210
B | 9 | 0.0164 | 0.0237
B | 12 | 0.0252 | 0.0256
B | 15 | 0.0163 | 0.0226
B | 18 | 0.0179 | 0.0201
AB | 3 | 0.0110 | 0.0120
AB | 6 | 0.0110 | 0.0190
AB | 9 | 0.0160 | 0.0170
AB | 12 | 0.0150 | 0.0180
AB | 15 | 0.0150 | 0.0220
AB | 18 | 0.0140 | 0.0200

Case A-B gives the lowest RMSE overall. With one exception (node 9, where the second value is 0.0262 in Case A versus 0.0243 in Case O), the RMSE values in Case A are lower than those in Case O, because Case A accounts for the leading and lagging effects of water temperature changes caused by thermal stratification. Similarly, every node performs better in Case B than in Case O, as Case B incorporates the periodic patterns of water temperature variation. These results indicate that the spatial and temporal stratification discussed in Section 3.2 has a significant influence on water temperature prediction. It is therefore important to consider both spatial and temporal stratification characteristics to quantify the lagging and leading effects of water temperature changes.

The choice of the ML algorithm

1. LSTM: Long-short term memory

Hochreiter (1991) proposed the LSTM, introducing a scalar memory cell as the central processing and storage unit, which effectively mitigates the vanishing gradient problem. The memory cell is regulated by three gates, the input gate, the output gate, and the forget gate, as follows:

$$\begin{aligned}
c_{t}&=f_{t}\,c_{t-1}+i_{t}\,z_{t}, & h_{t}&=o_{t}\,\widetilde{h}_{t},\quad \widetilde{h}_{t}=\psi(c_{t}),\\
z_{t}&=\varphi(\widetilde{z}_{t}), & \widetilde{z}_{t}&=w_{z}^{\top}x_{t}+r_{z}h_{t-1}+b_{z},\\
i_{t}&=\sigma(\widetilde{i}_{t}), & \widetilde{i}_{t}&=w_{i}^{\top}x_{t}+r_{i}h_{t-1}+b_{i},\\
f_{t}&=\sigma(\widetilde{f}_{t}), & \widetilde{f}_{t}&=w_{f}^{\top}x_{t}+r_{f}h_{t-1}+b_{f},\\
o_{t}&=\sigma(\widetilde{o}_{t}), & \widetilde{o}_{t}&=w_{o}^{\top}x_{t}+r_{o}h_{t-1}+b_{o},
\end{aligned}$$

where $x_{t}$ is the input at time step $t$; $c_{t}$ denotes the cell state at the current time step, composed of the previous state $c_{t-1}$ modulated by the forget gate and the new input $z_{t}$ controlled by the input gate; and $h_{t}$ is the current hidden state, obtained by element-wise multiplication of the output gate $o_{t}$ and the activated cell state ${\widetilde{h}}_{t}$.
The activation value of the input information, $z_{t}$, is obtained by applying an activation function $\varphi$ to a linear combination ${\widetilde{z}}_{t}$. The input gate $i_{t}$, which regulates the weight assigned to the new input $z_{t}$, is computed by applying the sigmoid activation $\sigma$ to a linear combination ${\widetilde{i}}_{t}$. Similarly, the forget gate $f_{t}$, which determines the retention ratio of the previous cell state $c_{t-1}$, is computed from an activated linear combination ${\widetilde{f}}_{t}$. The output gate $o_{t}$, which controls the output of the hidden state, is computed by applying $\sigma$ to a linear combination ${\widetilde{o}}_{t}$. The activated cell state ${\widetilde{h}}_{t}$ is typically obtained with an activation function $\psi(c_{t})$ (e.g., $\tanh$). $w_{z}$, $w_{i}$, $w_{f}$, and $w_{o}$ are the input weights; $r_{z}$, $r_{i}$, $r_{f}$, and $r_{o}$ are the recurrent weights; and $b_{z}$, $b_{i}$, $b_{f}$, and $b_{o}$ are the bias terms.

xLSTM: Extended Long Short-Term Memory

However, traditional LSTM models still have limitations in storage capacity, computational stability, and parallel processing. To address these issues, xLSTM revises the gating mechanism, memory storage, and computational scheme of the traditional LSTM by combining two extensions, the scalar LSTM (sLSTM) and the matrix LSTM (mLSTM) (Beck et al., 2024). Specifically, sLSTM introduces exponential gating and a normalizer state to increase gating flexibility and stabilize numerical computation, whereas mLSTM extends the memory cell to a matrix form, enabling xLSTM to store high-dimensional information, and incorporates a query-key memory mechanism to optimize information storage and retrieval. Moreover, xLSTM refines the computational pathway to improve its parallelization.
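To make exponential gating and the normalizer state concrete, here is a minimal scalar sketch of one sLSTM step in the stabilized form described by Beck et al. (2024). Assumptions: the pre-activations are passed in already computed (no weight matrices), $\varphi=\tanh$, and the exponential variant of the forget gate is used (the xLSTM paper also allows a sigmoid forget gate). The names `slstm_step` and `INITIAL_STATE` are hypothetical; this is not the implementation used in the paper.

```python
import math


def slstm_step(z_pre, i_pre, f_pre, o_pre, state):
    """One scalar sLSTM step with exponential gating and a normalizer state.

    z_pre, i_pre, f_pre, o_pre are pre-activations (linear combinations of the
    input, the previous hidden state and a bias). state = (c, n, m): cell state,
    normalizer state and log-domain stabilizer from the previous step.
    """
    c_prev, n_prev, m_prev = state
    z = math.tanh(z_pre)                  # cell input
    o = 1.0 / (1.0 + math.exp(-o_pre))    # sigmoid output gate
    log_i, log_f = i_pre, f_pre           # i_t = exp(i_pre), f_t = exp(f_pre)
    m = max(log_f + m_prev, log_i)        # stabilizer: both exponents below are <= 0
    i_s = math.exp(log_i - m)
    f_s = math.exp(log_f + m_prev - m)
    c = f_s * c_prev + i_s * z            # cell state
    n = f_s * n_prev + i_s                # normalizer accumulates the gate weights
    h = o * (c / n)                       # c / n is a weighted average of past z values
    return h, (c, n, m)


INITIAL_STATE = (0.0, 0.0, 0.0)

if __name__ == "__main__":
    state = INITIAL_STATE
    for pre in [(0.3, 0.5, 0.2, 0.1), (0.7, 1000.0, 500.0, 0.0)]:
        h, state = slstm_step(*pre, state)
        print(round(h, 4))
```

The second step feeds an input-gate pre-activation of 1000, which would overflow `math.exp` without the stabilizer; because $c$ and $n$ are rescaled by the same factor, their ratio, and hence $h$, is unchanged by the stabilization.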
The architecture of the xLSTM model is illustrated in Figure 10. The hyperparameters were set as follows: 20 epochs, a step size of 20, a learning rate of 0.001, a time step of 10, and a batch size of 16.

Figure 10: Architecture of xLSTM model.

Model optimization

Multi-scale Attention Based on Algorithm Mechanisms

To further improve the prediction accuracy and stability for complex time series, this study proposes a deep learning model that integrates multi-scale attention mechanisms with the xLSTM model (MAN-xLSTM), as illustrated in Figure 11(a)-(b).

Figure 11: (a) The architecture of the proposed multi-scale attention xLSTM network (MAN-xLSTM) model. (b) The architecture details of the proposed multi-scale attention module.

To achieve a balanced and comprehensive feature representation, MAN-xLSTM combines four attention-based modules with a Swin Transformer encoder. The Dilated Convolution (DC) module extracts multi-scale context by enlarging the receptive field without downsampling, while the Scale Fusion (SF) module dynamically integrates these multi-scale features along the channel dimension to prevent information fragmentation. The Path Attention (PA) module then weights different temporal trajectories, enabling the network to distinguish and emphasize critical sequence paths that DC and SF cannot capture alone. To complement these local and temporal refinements, the Large Kernel Attention (LKA) module aggregates long-range spatial dependencies and global context. Finally, the Swin Transformer encoder sits atop this hierarchy to unify local details and global interactions via self-attention.

Dilated Convolution Module (DC)

For the single-channel case, the dilated convolution can be expressed as

$$y(p)=\sum_{r\in R}\omega(r)\,x(p+\alpha r),$$

where $x(\cdot)$ and $y(\cdot)$ are the input and output signals, $p$ denotes the output position (e.g., the $p$-th position in a one-dimensional sequence or feature map), and $\alpha$ is the expansion (dilation) coefficient.
$R$ represents the sampling region of the convolution kernel, and $\omega$ denotes the convolution kernel weight. Here we adopt a multi-scale parallel structure with two paths whose dilation rates are $\alpha_{1}=1$ and $\alpha_{2}=2$. Because the xLSTM already provides stable feature representations along the temporal dimension, the dilated convolution additionally captures long-term dependencies and local details.

Scale Fusion Module (SF)

With dilated convolutions providing richer multi-scale representations, it becomes crucial to fuse these features effectively along the channel and spatial dimensions. To this end, we introduce the Scale Fusion (SF) mechanism, which draws on group convolution and cross-scale dynamic fusion strategies. In the fusion formula, $\mathrm{Attn}_{g}(\cdot)$ denotes the intra-group attention operation, and $\beta_{g}$ is the trainable cross-scale fusion weight that evaluates the importance of the $g$-th group in the final feature representation. Through grouping and dynamic fusion, SF improves the efficiency of feature representation along both the channel and scale dimensions.

Path Attention Mechanism Module (PA)

To improve the model's ability to capture latent temporal paths, the model introduces a Path Attention (PA) module, where $\mathbf{p}\in\mathbb{R}^{T\times d}$ denotes the trainable path parameter matrix, and $Q$, $K$, and $V$ are obtained from linear mappings of the features. While PA focuses on capturing temporal trajectory variations, SF emphasizes cross-scale feature representations; their combination enhances the ability to characterize complex sequential patterns.
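The dilated convolution formula and the two-path design with $\alpha_{1}=1$ and $\alpha_{2}=2$ can be illustrated with a short single-channel sketch. Assumptions: the kernel weights are fixed numbers here (in the model they would be learned), only "valid" output positions are computed, and the two paths are aligned by cropping to a common length, which is one possible choice (padding is another). The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Single-channel dilated convolution, valid positions only:
    y(p) = sum_r kernel[r] * x[p + dilation * r]."""
    span = dilation * (len(kernel) - 1)   # extra samples covered beyond position p
    return [sum(w * x[p + dilation * r] for r, w in enumerate(kernel))
            for p in range(len(x) - span)]


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of input samples seen by one output of a dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def two_path_dc(x: Sequence[float], kernel: Sequence[float],
                rates: Tuple[int, ...] = (1, 2)) -> List[List[float]]:
    """Run parallel paths with different dilation rates and crop to a common length."""
    paths = [dilated_conv1d(x, kernel, a) for a in rates]
    n = min(len(p) for p in paths)
    return [p[:n] for p in paths]


if __name__ == "__main__":
    x = [0, 1, 2, 3, 4, 5]
    print(two_path_dc(x, [1, 1]))  # [[1, 3, 5, 7], [2, 4, 6, 8]]
```

With a 3-tap kernel, rate 2 covers 5 input samples instead of 3 without adding parameters, which is the sense in which dilation enlarges the receptive field without downsampling.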
Large Kernel Attention Module (LKA)

To further expand the receptive field and capture global context, the model employs the Large Kernel Attention (LKA) module at this stage, where $\mathrm{DWConv}$ denotes a depthwise convolution, $\mathrm{DWConv}_{\text{dil}}$ denotes a depthwise convolution with a dilation coefficient, and $f_{\text{pwise}}$ denotes the pointwise $(1\times 1)$ convolution used to integrate channel information. LKA aggregates global context and local details and complements the SF and PA modules: LKA focuses on feature correlations over a larger spatial range, while SF and PA provide fine-grained characterization along the channel and path dimensions.

Swin Transformer Encoder Module

Finally, to fully exploit the ability of self-attention to model global dependencies, we replace conventional sequence or convolutional encoders with the Swin Transformer, whose attention is computed as

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d}}\right)V,$$

where the query $Q$, key $K$, and value $V$ matrices are linear projections of the input. The similarity between queries and keys is scaled by the square root of the feature dimension $d$ and normalized with the softmax function to yield attention weights, which are then applied to the value matrix to obtain the output. The Swin Transformer and LKA complement each other in handling local and global features: LKA emphasizes global context and long-range dependencies, while SF and PA focus on precise characterization along the channel dimension and on path selection. This design balances local and global representations and fuses long-scale and short-scale features. The results of the MAN-xLSTM model are shown in Table 4.

PID Optimization Based on Physical Mechanisms

In traditional modeling approaches, treating the pond water body as an 'ideal low-pass filter' inevitably overlooks the impact of high-frequency noise within the water, resulting in significant deviations in the system response.
This study adopts a control-theoretic perspective, modelling the pond system as a non-ideal low-pass filter subject to both low-frequency external disturbances and high-frequency internal interference. To characterize the thermal response to these distinct disturbance sources more accurately, we draw on the core principles of PID control to refine the input variables of the MAN-xLSTM architecture. Specifically, external physical variations (e.g., solar irradiance and ambient air temperature) predominantly affect the surface layer, leading to an anticipatory rise in surface-water temperature. To quantify these low-frequency disturbances, we introduce third-order backward differences of the two sensors immediately above and below the target node, as indicated by the blue dashed box in Figure 12. Because such external excitations drive the system rapidly toward its steady state, we analogize them to the integral component (I) of a PID controller, which corrects accumulated error and promotes more stable convergence, as highlighted by the yellow and red dashed boxes in Figure 12.

Figure 12: The optimized input variables of the MAN-xLSTM architecture refined by the PID controller.

The backward difference is given by

$$V(k)=T(k)-T(k-n),$$

where $T(k)$ is the water temperature at time step $k$, and $V(k)$ represents the leading variable of water temperature change at time $k$, defined as the $n$-th difference between the water temperature at time step $k$ and that at time step $k-n$.

Likewise, endogenous biological activity induces high-frequency disturbances that manifest as delayed fluctuations in deep-water temperature. We capture these effects using third-order forward differences of the two sensors adjacent to the target node, also marked by the blue dashed box in Figure 12. Given that high-frequency noise can provoke transient overshoot, we analogize these internal perturbations to the derivative component (D), as shown by the yellow and red dashed boxes in Figure 12, thereby damping rapid deviations and suppressing noise.
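The leading (backward) and lagging (forward) variables can be sketched as follows, using the definition $V(k)=T(k)-T(k-n)$ with $n=3$ and its forward counterpart $T(k+n)-T(k)$. The feature layout in `neighbour_features` (one backward and one forward series for the node directly above, `target - 1`, and directly below, `target + 1`) is a hypothetical arrangement for illustration; the paper's exact input assembly is shown only in its Figure 12.

```python
from typing import Dict, List, Optional, Sequence


def backward_diff(series: Sequence[float], n: int = 3) -> List[Optional[float]]:
    """Leading variable V(k) = T(k) - T(k - n); None where k - n < 0."""
    return [series[k] - series[k - n] if k >= n else None for k in range(len(series))]


def forward_diff(series: Sequence[float], n: int = 3) -> List[Optional[float]]:
    """Lagging variable T(k + n) - T(k); None where k + n is past the end."""
    m = len(series)
    return [series[k + n] - series[k] if k + n < m else None for k in range(m)]


def neighbour_features(temps: Dict[int, Sequence[float]], target: int,
                       n: int = 3) -> Dict[str, List[Optional[float]]]:
    """Hypothetical layout: I-like (backward) and D-like (forward) differences
    for the nodes directly above and below the target node."""
    feats = {}
    for node in (target - 1, target + 1):
        feats[f"node{node}_bwd"] = backward_diff(temps[node], n)
        feats[f"node{node}_fwd"] = forward_diff(temps[node], n)
    return feats


if __name__ == "__main__":
    print(backward_diff([0, 1, 3, 6, 10]))  # [None, None, None, 6, 9]
    print(forward_diff([0, 1, 3, 6, 10]))   # [6, 9, None, None, None]
```

The forward difference at step $k$ uses $T(k+n)$, so it is defined only where those later samples exist; the leading and lagging series therefore lose $n$ samples at opposite ends of each record.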
Through this PID-inspired augmentation, the model responds to both low-frequency, cumulative external influences and high-frequency, endogenous fluctuations, providing a concise yet effective approach to multi-layer aquatic thermal dynamics prediction. The prediction results obtained by introducing the leading and lagging variables are shown in Table 4. On the validation set, the $R^{2}$ reached 0.9974, the MAE was 0.0273, the RMSE was 0.0354, and the MAPE was 0.08%, outperforming the other benchmark models, as discussed in Chapter 6.

Table 4: Ablation experiment results of modules of MAN-xLSTM model. All RMSE values are tested on Node Water Temperature 7 and averaged over 50 independent runs.

Discussions

To comprehensively validate the advantages of the Multi-attention Network–xLSTM (MAN-xLSTM) model presented in Chapter 4, we conducted comparative experiments against several mainstream regression models: Random Forest Regression, AdaBoost Regression, Gradient Boosting Decision Tree (GBDT), BP Neural Network, XGBoost, and LightGBM. The results are shown in Table 5. The predictive accuracy of the benchmark models is far lower than that of the MAN-xLSTM model.

Table 5: Results of comparison experiment. All RMSE values are averaged over 50 independent runs.
Node | Training | Testing | Training | Testing | Training | Testing | Training | Testing | Training | Testing | Training | Testing | Training | Testing
Node 3 | 0.079 | 1.534 | 0.002 | 1.391 | 0.014 | 1.389 | 0.017 | 1.374 | 0.024 | 1.586 | 0.029 | 3.058 | 0.094 | 0.114
Node 4 | 0.229 | 1.759 | 0.004 | 1.816 | 0.040 | 1.897 | 0.025 | 1.801 | 0.063 | 1.76 | 0.051 | 2.853 | 0.078 | 0.081
Node 5 | 0.207 | 1.648 | 0.002 | 1.884 | 0.034 | 1.823 | 0.021 | 1.498 | 0.054 | 1.632 | 0.029 | 1.675 | 0.059 | 0.072
Node 6 | 0.077 | 1.088 | 0.001 | 1.096 | 0.010 | 1.017 | 0.010 | 1.008 | 0.016 | 1.067 | 0.017 | 2.254 | 0.087 | 0.073
Node 7 | 0.050 | 1.140 | 0.001 | 1.076 | 0.008 | 1.117 | 0.009 | 1.118 | 0.013 | 1.115 | 0.018 | 2.328 | 0.086 | 0.078
Node 8 | 0.042 | 1.134 | 0.000 | 1.067 | 0.008 | 1.077 | 0.010 | 1.081 | 0.012 | 1.091 | 0.016 | 2.292 | 0.062 | 0.065
Node 9 | 0.035 | 1.131 | 0.000 | 1.101 | 0.006 | 1.104 | 0.009 | 1.117 | 0.010 | 1.125 | 0.014 | 2.285 | 0.067 | 0.077
Node 10 | 0.031 | 1.111 | 0.000 | 1.072 | 0.006 | 1.074 | 0.008 | 1.085 | 0.009 | 1.084 | 0.014 | 2.284 | 0.066 | 0.072
Node 11 | 0.032 | 1.153 | 0.000 | 1.118 | 0.005 | 1.124 | 0.008 | 1.131 | 0.008 | 1.139 | 0.014 | 2.291 | 0.087 | 0.068
Node 12 | 0.028 | 1.145 | 0.000 | 1.119 | 0.005 | 1.123 | 0.008 | 1.135 | 0.007 | 1.131 | 0.013 | 2.276 | 0.052 | 0.061
Node 13 | 0.027 | 1.155 | 0.000 | 1.112 | 0.004 | 1.116 | 0.007 | 1.111 | 0.007 | 1.133 | 0.021 | 2.259 | 0.097 | 0.081
Node 14 | 0.024 | 1.147 | 0.000 | 1.113 | 0.004 | 1.115 | 0.007 | 1.119 | 0.007 | 1.127 | 0.021 | 2.248 | 0.053 | 0.041
Node 15 | 0.025 | 1.155 | 0.000 | 1.143 | 0.004 | 1.145 | 0.008 | 1.146 | 0.007 | 1.148 | 0.021 | 2.256 | 0.062 | 0.083
Node 16 | 0.023 | 1.126 | 0.000 | 1.099 | 0.004 | 1.102 | 0.008 | 1.108 | 0.007 | 1.109 | 0.022 | 2.246 | 0.058 | 0.065
Node 17 | 0.022 | 1.142 | 0.000 | 1.119 | 0.003 | 1.121 | 0.007 | 1.121 | 0.006 | 1.133 | 0.022 | 2.239 | 0.051 | 0.059
Node 18 | 0.021 | 1.152 | 0.000 | 1.129 | 0.003 | 1.120 | 0.007 | 1.122 | 0.006 | 1.138 | 0.022 | 2.252 | 0.050 | 0.055
Node 19 | 0.020 | 1.153 | 0.000 | 1.162 | 0.004 | 1.144 | 0.007 | 1.145 | 0.006 | 1.157 | 0.022 | 2.233 | 0.065 | 0.057
Node 20 | 0.022 | 1.139 | 0.000 | 1.125 | 0.003 | 1.123 | 0.006 | 1.134 | 0.007 | 1.146 | 0.022 | 2.242 | 0.030 | 0.039

To further validate the generalization ability of the model, we predicted future water temperature changes at six nodes (nodes 3, 6, 9, 12, 15, and 18) using a time step of 2 minutes.
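The accuracy metrics reported for these forecasts ($R^{2}$, MAE, RMSE, and MAPE) can be computed as in the sketch below. It assumes the standard definitions, since the formulas are not printed in the text, and it returns MAPE as a fraction rather than a percentage; that choice is an assumption that appears consistent with the magnitudes in Table 6 (e.g., 0.0026). The function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """R^2, MAE, RMSE and MAPE (as a fraction) for paired observations and predictions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)     # total sum of squares (must be > 0)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "MAPE": sum(abs(e / t) for e, t in zip(errors, y_true)) / n,  # multiply by 100 for %
    }


if __name__ == "__main__":
    print(regression_metrics([20.0, 21.0, 22.0, 23.0], [20.1, 20.9, 22.1, 22.9]))
```

Because water temperatures in °C are far from zero, MAPE is small even when absolute errors are a few tenths of a degree, so RMSE and MAE are the more informative figures for judging the sub-0.1 °C accuracy claim.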
In total, we forecast 1295 periods (about 43 hours at the 2-minute step) and compared the predictions with the actual values. The RMSE for each node was recorded, and the results are shown in Table 6.

Table 6: RMSE values of predictions for each node. All RMSE values are averaged over 50 independent runs.
Metric | Node 3 | Node 6 | Node 9 | Node 12 | Node 15 | Node 18
R² | 0.9734 | 0.9941 | 0.9788 | 0.9940 | 0.9839 | 0.9898
MAE | 0.0839 | 0.0594 | 0.0716 | 0.0572 | 0.0604 | 0.0802
RMSE | 0.0984 | 0.0753 | 0.0879 | 0.0763 | 0.0766 | 0.0996
MAPE | 0.0026 | 0.0018 | 0.0022 | 0.0018 | 0.0019 | 0.0025

The RMSE for all six nodes is below 0.1 °C (maximum 0.0996 °C), confirming the reliability of the predictions and indicating good generalization ability.

Conclusion

In summary, this paper presents a novel water temperature prediction method that integrates deep learning with physical processes. First, the pond system is modelled as a 'non-ideal filter with high-frequency disturbance', and a PID controller is incorporated to regulate the dynamic response of the system. External physical changes are treated as an integrator, since excitations accelerate the steady-state response, while endogenous noise acts as a differentiator, introducing delayed disturbances to the steady-state response. In addition, dynamic time-channel fusion and the novel MAN-xLSTM model are applied to accurately predict water temperature at different depths. The RMSE for all nodes was below 0.1 °C, confirming the reliability of the predictions and the generalization of the model. This research has academic value and engineering application potential in fields such as aquaculture, stratified environmental monitoring, and ecological surveillance.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant No. 62471143.

Affiliation

College of Physics and Information Engineering, Fuzhou University, No. 2 Wulongjiang North Avenue, Fuzhou, Fujian Province, 350108, China.

References

[1] X. Zhou, Y.
Hao, Y. Liu et al., “Short-term prediction of dissolved oxygen and water temperature using deep learning with dual proportional–integral–derivative error corrector in pond culture,” Eng. Appl. Artif. Intell., vol. 142, 2025. doi: 10.1016/j.engappai.2024.109964. [2] B. C. Stitt, G. Burness, K. A. Burgomaster et al., “Intraspecific variation in thermal tolerance and acclimation capacity in brook trout (Salvelinus fontinalis): physiological implications for climate change,” Physiol. Biochem. Zool., vol. 87, 2014. doi: 10.1086/675259. [3] J. N. Bowyer, M. A. Booth, J. G. Qin et al., “Temperature and dissolved oxygen influence growth and digestive enzyme activities of yellowtail kingfish Seriola lalandi,” Aquac. Res., vol. 45, 2014. doi: 10.1111/are.12146. [4] J. Donelson, P. Munday, M. McCormick et al., “Effects of elevated water temperature and food availability on the reproductive performance of a coral-reef fish,” Mar. Ecol. Prog. Ser., vol. 401, pp. 233–243, 2010. doi: 10.3354/meps08366. [5] R. I. Woolway and C. J. Merchant, “Intralake heterogeneity of thermal responses to climate change: a study of large Northern Hemisphere lakes,” J. Geophys. Res.: Atmos., vol. 123, pp. 3087–3098, 2018. doi: 10.1002/2017JD027661. [6] R. I. Woolway, B. M. Kraemer, J. D. Lenters et al., “Global lake responses to climate change,” Nat. Rev. Earth Environ., vol. 1, no. 8, pp. 388–403, 2020. doi: 10.1038/s43017-020-0067-5. [7] D. M. Livingstone, “Impact of secular climate change on the thermal structure of a large temperate central European lake,” Clim. Change, vol. 57, no. 1–2, pp. 205–225, 2003. doi: 10.1023/a:1022119503144. [8] A. P. Piotrowski and J. J. Napiorkowski, “Performance of the air2stream model that relates air and stream water temperatures depends on the calibration method,” J. Hydrol., vol. 561, pp. 842–855, 2018. doi: 10.1016/j.jhydrol.2018.04.016. [9] Y. C. Chen, B. X. Zhang, and Y. L. 
Li, “Study on model for vertical distribution of water temperature in Miyun Reservoir,” J. Hydraul. Eng., vol. 29, no. 5, pp. 60–65, 1998. [10] J. Shun-wen, Z. Yue-ming, S. Qiang, and D. Zeng, “Forecast of water temperature in reservoir based on analytical solution,” J. Hohai Univ. (Nat. Sci.), vol. 20, no. 2, pp. 123–128, 2008. doi: 10.1016/S1001-6058(08)60087-6. [11] J. Wade, C. Kelleher, and B. L. Kurylyk, “Incorporating physically-based water-temperature predictions into the National Water Model framework,” Environ. Model. Softw., vol. 171, 2023. doi: 10.1016/j.envsoft.2023.105866. [12] J. S. Jia, J. Z. Zhao, H. B. Deng, and J. Duan, “Ecological footprint simulation and prediction by ARIMA model—a case study in Henan Province of China,” Ecol. Indicat., vol. 10, no. 2, pp. 538–544, 2010. doi: 10.1016/j.ecolind.2009.06.007. [13] S. Hua, Z. Huo, J. Shen, B. Xu, and C. Zhang, “Using artificial neural network models for eutrophication prediction,” Procedia Environ. Sci., vol. 18, pp. 758–765, 2013. doi: 10.1016/j.proenv.2013.04.102. [14] M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning , 2nd ed. Cambridge, MA: MIT Press, 2018. [15] T. Hong, X. Huang, G. Chen, Y. Yang, and L. Chen, “Exploring the spatiotemporal relationship between green infrastructure and urban heat island under multi-source remote sensing imagery: A case study of Fuzhou City,” CAAI Trans. Intell. Technol., vol. 8, no. 4, pp. 1337–1349, 2023. doi: 10.1049/CIT2.12272. [16] H. Du, S. Du, and W. Li, “Probabilistic time series forecasting with deep non-linear state space models,” CAAI Trans. Intell. Technol., vol. 8, no. 1, pp. 3–13, 2023. doi: 10.1049/CIT2.12192. [17] S. Sharma, S. C. Walker, and D. A. Jackson, “Empirical modelling of lake-water-temperature relationships: a comparison of approaches,” Freshw. Biol., vol. 53, no. 5, pp. 897–911, 2008. doi: 10.1111/j.1365-2427.2008.01943.x. [18] W. C. Liu and W. B. 
Chen, “Prediction of water temperature in a subtropical subalpine lake using an artificial neural network and three-dimensional circulation models,” Comput. Geosci., vol. 45, pp. 208–216, 2012. doi: 10.1016/j.cageo.2012.03.010. [19] J. S. Read, X. Jia, J. Willard et al., “Process-guided deep-learning predictions of lake water temperature,” Water Resour. Res., vol. 55, no. 11, pp. 9173–9190, 2019. doi: 10.1029/2019WR024922. [20] W. Zaremba, I. Sutskever, and O. Vinyals, “Recurrent neural network regularization,” arXiv preprint, arXiv:1409.2329, 2014. [Online]. Available: http://arxiv.org/abs/1409.2329v5 [21] M. Wang, F. X. Ying, and Q. R. Nan, “Refined offshore wind-speed prediction: leveraging a two-layer decomposition technique, gated recurrent unit, and kernel density estimation,” Eng. Appl. Artif. Intell., vol. 133, 2024. doi: 10.1016/j.engappai.2024.108435. [22] C. Qin et al., “Anti-noise diesel engine misfire diagnosis using a multi-scale CNN-LSTM neural network with denoising module,” CAAI Trans. Intell. Technol., vol. 8, no. 3, pp. 963–986, 2023. doi: 10.1049/CIT2.12170. [23] Y. Wang, T. Zheng, Y. Zhao et al., “Monthly water-quality forecasting and uncertainty assessment via bootstrapped wavelet neural networks under missing data for Harbin, China,” Environ. Sci. Pollut. Res., vol. 20, no. 7, pp. 4627–4639, 2013. doi: 10.1007/s11356-013-1874-8. [24] W. Zhi, D. Feng, W. P. Tsai et al., “From hydrometeorology to river water quality: can a deep-learning model predict dissolved oxygen at the continental scale?” Environ. Sci. Technol., vol. 55, no. 5, pp. 3086–3096, 2021. doi: 10.1021/acs.est.0c06783. [25] K. Roushangar, S. Davoudi, and S. Shahnazi, “Temporal prediction of dissolved oxygen based on CEEMDAN and multi-strategy LSTM hybrid model,” Environ. Earth Sci., vol. 83, no. 8, pp. 1–17, 2024. doi: 10.1007/s12665-024-11476-9. [26] J. Huan, H. Li, M. Li, and B. 
Chen, “Prediction of dissolved oxygen in aquaculture based on gradient-boosting decision tree and long short-term memory network,” Comput. Electron. Agric., vol. 175, 2020. doi: 10.1016/j.compag.2020.105530. [27] M. Beck, K. Pöppel, M. Spanring et al., “xLSTM: extended long short-term memory,” arXiv preprint, arXiv:2405.04517, 2024. [Online]. Available: https://github.com/NX-AI/xlstm [28] H. Chen, X. Nan, and S. Xia, “Data fusion based on temperature monitoring of aquaculture ponds with wireless sensor networks,” IEEE Sens. J., vol. 23, no. 4, pp. 4046–4055, 2023. doi: 10.1109/JSEN.2022.3222510. [29] P. J. Rousseeuw, “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,” J. Comput. Appl. Math., vol. 20, pp. 53–65, 1987. doi: 10.1016/0377-0427(87)90125-7. [30] T. Caliński and J. Harabasz, “A dendrite method for cluster analysis,” Commun. Stat., vol. 3, no. 1, pp. 1–27, 1974. doi: 10.1080/03610927408827101. [31] J. C. Dunn, “Well-separated clusters and optimal fuzzy partitions,” J. Cybern., vol. 4, no. 1, pp. 95–104, 1974. doi: 10.1080/01969727408546059. [32] D. L. Davies and D. W. Bouldin, “A cluster separation measure,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-1, no. 2, pp. 224–227, 1979. doi: 10.1109/TPAMI.1979.4766909. [33] N. Migenda, R. Möller, and W. Schenck, “Adaptive dimensionality reduction for neural network-based online principal component analysis,” PLoS ONE, vol. 16, no. 4, pp. 1–18, 2021. doi: 10.1371/journal.pone.0248896.

Version history: V1, 23 October 2025.
Copyright: This work is licensed under a Non Exclusive No Reuse License.

Keywords: endogenous thermal dynamics; MAN-xLSTM; PID controller regulation; pond aquaculture modelling; water temperature prediction.

Authors and affiliations:
Jiayi Qiu, Fuzhou University, School of Physics and Information Engineering
Ang Zhang, Fuzhou University, School of Physics and Information Engineering
Yanhao Lin (ORCID 0009-0006-5701-2303), Fuzhou University, School of Physics and Information Engineering
Zhilun Lin, Maynooth University, Faculty of Science and Engineering
Tengfei Liu, Fuzhou University, School of Physics and Information Engineering
Shuchang Zhang, Maynooth University, Faculty of Science and Engineering
Yuqi Zeng, Maynooth University, Faculty of Science and Engineering

Citation: Jiayi Qiu, Ang Zhang, Yanhao Lin, et al. Prediction of water temperature based on machine learning algorithm integrating endogenous thermal dynamics for pond aquaculture. Authorea. 23 October 2025. DOI: https://doi.org/10.22541/au.176122116.62333498/v1