A robust federated biased learning algorithm for time series forecasting

Research Square preprint, Research Article

Mingli Song, Xinyu Zhao, Witold Pedrycz

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4658479/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review. Version 1 posted 10

Abstract

The federated averaging algorithm (FedAvg) is extensively used for multi-sensor data modeling, but in settings where privacy and data security are not considered it often overlooks the unique characteristics of local models. This study introduces a novel federated learning algorithm built upon the FedAvg framework that emphasizes the specificity of each local model to optimize global knowledge aggregation. The algorithm's effectiveness is demonstrated on an air quality index prediction problem, where it shows superior prediction performance and robustness on noisy data.
Additionally, the study examines the reliability and robustness of the proposed approach, addressing the prevalent notion that centralized learning often surpasses federated learning when data security is not a concern. Our experiments confirm the necessity and superiority of federated learning methods even without privacy considerations, owing to their effective handling of real-world noisy data.

Keywords: Federated learning; air quality index; time series; neural networks; federated averaging algorithm

1. Introduction

Federated learning techniques are widely applied in many fields owing to the rapid growth of the IoT and intelligent sensors. In the literature, federated learning studies fall into two categories: those addressing privacy-related tasks and those addressing tasks in which privacy is not an issue. This study concentrates on the second category. In either setting, one of the most important issues in federated learning is how to use the information provided by local models to form a global view and to guide further updates of the local models. The federated averaging algorithm (FedAvg) [1-4], a popular tool, averages the model parameters shared by the local models and has proved very effective in many situations. However, the information from local models is sometimes not equally valuable but biased. Therefore, in this study, we conduct a thorough investigation of how to realize federated biased learning on a real case. Because the environment is heavily affected by human activity, and obtaining air quality information in advance may help people make proper decisions, air quality prediction is both complex and important. To provide more useful information and further improve prediction accuracy, this study develops an idea of mutual help among neighbors (neighboring monitoring stations or groups), which is in fact a type of federated learning.
The FedAvg algorithm simply averages all local model parameters and sends the result back to the local sides to start a new training round. If the local sides are not selected properly, this introduces new disturbances that reduce prediction accuracy. Therefore, we introduce the idea of federated biased learning, which weighs the contribution of each local model instead of averaging them, thereby avoiding the disturbance problem brought by spatial information. In the literature, many researchers treat spatial information as part of the data; however, whether this practice is appropriate remains an open question, especially from a mathematical perspective. Therefore, we adopt an incremental approach on the local sides and observe the performance of different groups. The results lead to an interesting conclusion for air quality prediction tasks.

Another crucial issue is that federated learning strategies do not always outperform centralized learning. In that case, one has to rely on the centralized learning result when prediction accuracy is the main requirement. If centralized learning wins, does it mean that federated learning methods are useless for this particular problem? To answer this question, we design a federated learning verification framework from the robustness perspective, which is becoming an increasingly important evaluation criterion when developing models.

In summary, the main contributions of this study are:
(1) We propose a new federated learning algorithm, federated biased learning (FedBiased), which is shown to be more effective than FedAvg on multi-station air quality prediction in Beijing. The novelty is that the specificity of each local model is exploited, and the objective is for each local model to achieve its highest accuracy.
(2) We verify the strong robustness of federated learning strategies through a comprehensive framework comprising three disturbance scenarios. Both FedAvg and FedBiased, as representative algorithms, are applied to the air quality prediction problem with raw and noisy data.
(3) Thorough experimental studies and comparisons are conducted to reveal where federated learning performs better and where centralized learning performs better on the complex multi-station air quality prediction problem.

The remainder of this study is organized as follows. Section 2 introduces the background of federated learning and related techniques. Section 3 elaborates on the proposed FedBiased method. Experiments are reported and discussed in Section 4. Finally, conclusions are drawn in Section 5.

2. Background

Collecting abundant data from different sensors is beneficial for increasing prediction accuracy. Therefore, federated techniques in distributed learning have become popular, especially when all local data sets are collected for the same target. The client-server framework is currently the most commonly used mode of federated learning. In 2016, Google proposed the classical Federated Averaging algorithm (FedAvg) [5], which has long served as the benchmark in this field. Many improved methods have been proposed based on FedAvg, such as FedProx [6], SCAFFOLD [7], and FedNova [8]. FedProx addresses non-IID data and device heterogeneity through a proximal term. SCAFFOLD uses control variates to correct client drift, which reduces the number of communication rounds needed. FedNova normalizes each client's local update by its number of local steps before aggregation, which removes the objective inconsistency caused by heterogeneous local training. FedAvg-style aggregation works best when all local data sets are independent and identically distributed (IID), and traditional federated learning approaches can be ineffective when the local data distributions differ significantly.
Non-independent and identically distributed (non-IID) client data cause local model updates to deviate from the global optimum, which seriously degrades the performance of the trained model [9]. Therefore, the choice of a proper learning algorithm depends on the distribution of the local data sets. In this study, we focus only on temporal data (time series) recorded by different sensors (stations). Such data are sequential and, in general, involve only one critical variable directly connected with the task (univariate series). In recent years, more and more spatial-temporal data have become available owing to the rapid development of the computer and networking industries. Spatial-temporal data prediction is a central research issue in this field [10], with significant applications in transportation [11], meteorology [12], and healthcare [13]. Its primary goal is to leverage historical spatial-temporal data to make informed judgments about how such data will evolve, aiding decision-making [14, 15]. Deep networks are very effective tools for sequential and even spatial-temporal data prediction. The authors of [16] proposed a network traffic prediction model based on dynamic graph attention spatial-temporal networks that effectively captures the complex features of network traffic, and [17] introduced a novel spatial–temporal gated attention transformer that achieved state-of-the-art performance in traffic flow forecasting. Zhao et al. [18] proposed predictive models combining graph convolutional networks (GCNs) with Gated Recurrent Unit (GRU) recurrent neural networks; the model feeds city adjacency matrices and feature matrices into a GCN for comprehensive spatial modeling. Wang et al. [19] encoded the spatial adjacency relationships, temporal pattern similarities, and functional similarities between air quality stations in Beijing and Tianjin into a graph and proposed an attentive temporal graph convolutional network (ATGCN). Their results indicate that graph convolutional networks can aggregate features across stations, leading to superior prediction performance. Furthermore, Padhi et al. [20] leveraged Transformers to introduce a hierarchical BERT model for learning temporal sequences. Although these deep models complete prediction tasks from specific perspectives, deep networks are not always necessary, for example when the amount of data is limited. One has to examine the data set or run experiments to decide whether to adopt deep networks. In addition, proper use of spatial information may help increase accuracy, whereas improper use may decrease it. Thus, how to exploit spatial information deserves careful attention and further exploration.

3. A Federated Biased Learning method

As Peng et al. [21] pointed out, averaging may not be the optimal way of aggregating trained parameters. In this section, we introduce a new federated learning method named federated biased learning (FedBiased), which emphasizes the knowledge of individual models by taking the performance of each local model into account. Most federated learning methods are designed for privacy-related tasks, in which the terms "client" and "server" are commonly used. In this study, we focus on communicating parameters among different models without privacy concerns. Therefore, the term "local" denotes the client side and the term "global" denotes the server side.

3.1 The Basic framework of federated learning methods

Figure 2 shows a conventional federated learning process.
Each local model has the same architecture, and the parameters of the i-th local model are stored in \(\boldsymbol{W}_i\), i = 1, 2, …, p. All local models are initialized with the same parameters, that is, \(\boldsymbol{W}_i(0)=\boldsymbol{W}_0\). Each local model is first trained independently for a number of epochs. At time t, each local model provides its parameters to the global model, which collects them and computes a "representative" matrix \(\tilde{\boldsymbol{W}}\). This matrix is sent back to each local model as its initial parameters at time t + 1, after which each local model is again trained independently. This interaction repeats until a stopping condition is satisfied.

The federated averaging algorithm is a popular method in this field and has been shown to be more effective than many other methods [22–24]. It follows the basic learning process shown in Fig. 2, and the idea of "averaging" is embodied in the way \(\tilde{\boldsymbol{W}}\) is calculated, as given in formula (1):

$$\tilde{\boldsymbol{W}}(t+1)=\frac{1}{p}\sum_{i=1}^{p}\boldsymbol{W}_{i}(t) \tag{1}$$

where \(\boldsymbol{W}_{i}(t)\) represents the parameters of the i-th local model at time t, and \(\tilde{\boldsymbol{W}}(t+1)\) is the matrix aggregated from all local models at time t. In other words, the federated averaging algorithm aggregates the local parameters by computing their element-wise mean. Taking neural networks as an example, during the training of each local model, \(\boldsymbol{W}_{i}(t)\) is updated by backpropagation to obtain \(\boldsymbol{W}_{i}(t+1)\), which is then sent to the global model side. We assume that no information is lost during transmission.
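Formula (1) can be illustrated with a minimal Python sketch. For illustration only, it assumes that each local model's parameters have been flattened into a plain list of floats; a PyTorch implementation would typically average each tensor of the models' state dicts instead. The function name `fedavg` is hypothetical.

```python
from typing import List

Params = List[float]  # one local model's parameters, flattened (illustrative assumption)


def fedavg(local_params: List[Params]) -> Params:
    """Formula (1): element-wise mean of the p local parameter vectors."""
    p = len(local_params)
    if p == 0:
        raise ValueError("at least one local model is required")
    dim = len(local_params[0])
    if any(len(w) != dim for w in local_params):
        raise ValueError("all local models must share the same architecture")
    return [sum(w[k] for w in local_params) / p for k in range(dim)]


# Three hypothetical local models with two parameters each.
W_t = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
W_next = fedavg(W_t)
print(W_next)  # [3.0, 5.0]
```

Every local model contributes exactly 1/p of the aggregated parameters, regardless of how well it performs locally.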
3.2 A biased way of aggregation

Consider a realistic situation in which a local model or its local data source shows distinctive characteristics. In this case, the local parameters have a large standard deviation around their mean, and a biased way of aggregating them is expected to perform better than averaging. Therefore, we define a new federated learning algorithm that exploits the dominant knowledge of individual local models. The performance of each local model is evaluated by its accuracy (or error). We assume that a local model with higher accuracy provides more useful information for aggregation; compared with federated averaging, it should therefore contribute more than an equal share. Accordingly, we assign each local model a weight derived from its normalized error (that is, a normalized accuracy), so that the aggregated parameter matrix is not an average but a biased combination.

In FedBiased, all local models share the same architecture and the same initial parameters. Each local model trains independently on its own data for some time and then sends its parameters to the global model. The global model collects all local parameters and calculates a single biased parameter matrix according to formula (2). This process iterates for multiple rounds until convergence.

$$\tilde{\boldsymbol{W}}(t+1)=\sum_{i=1}^{p}\boldsymbol{W}_{i}(t)\times r_{i}(t) \tag{2}$$

$$r_{i}(t)=\frac{1-\dfrac{error_{i}(t)}{\sum_{j=1}^{p}error_{j}(t)}}{\sum_{k=1}^{p}\left(1-\dfrac{error_{k}(t)}{\sum_{j=1}^{p}error_{j}(t)}\right)} \tag{3}$$

where \(\boldsymbol{W}_{i}(t)\) represents the parameters of the i-th local model at time t, and \(\tilde{\boldsymbol{W}}(t+1)\) is the matrix aggregated from all local models at time t.
\(error_{i}(t)\) in formula (3) denotes the error of the i-th local model at time t, for instance its MSE. Updating the global weight matrix with formula (2) thus takes the contribution of each individual model into account. Note that after receiving new parameters, each local model is trained independently for "some time", that is, for several epochs, and all local models use the same number of epochs. This requirement ensures that all local models always remain at the same level of training. An open issue is how to choose this number; in this study, we use no more than 10 epochs, and its experimental effect is discussed in Section 4.

The biased federated learning method has two main advantages: (1) it fully considers the individual contribution of each local model during aggregation; (2) it dynamically adjusts the weight of each local model by computing the model's accuracy in each round. The weight of each local model can be regarded as its learning rate or ratio. The learning process of FedBiased is the same as that of FedAvg (Fig. 2); the only difference lies in how the global parameter matrix is updated.
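The weighting of formulas (2) and (3), together with the round structure described above, can be sketched in Python. Because the denominator of formula (3) always equals p − 1, the weight simplifies to \(r_i(t)=\left(1-error_i(t)/\sum_{j}error_j(t)\right)/(p-1)\), so at least two local models and a positive total error are needed. This is a minimal sketch under stated assumptions rather than the authors' code: parameters are flattened lists of floats, and the per-model `local_train` and `local_error` callables are hypothetical stand-ins for PyTorch training and evaluation. The helper `to_sum_one` mirrors ToSumOne in Algorithm 1.

```python
from typing import Callable, List

Params = List[float]  # flattened parameters of one local model (illustrative assumption)


def to_sum_one(values: List[float]) -> List[float]:
    """Rescale non-negative values so that they sum to 1 (ToSumOne)."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [v / total for v in values]


def fedbiased_weights(errors: List[float]) -> List[float]:
    """Formula (3): r_i = ToSumOne(1 - ToSumOne(errors)); needs p >= 2."""
    if len(errors) < 2:
        raise ValueError("FedBiased needs at least two local models")
    shares = to_sum_one(errors)  # error_i / sum_j error_j
    return to_sum_one([1.0 - s for s in shares])


def fedbiased_aggregate(local_params: List[Params], errors: List[float]) -> Params:
    """Formula (2): sum_i r_i * W_i, computed element-wise."""
    r = fedbiased_weights(errors)
    dim = len(local_params[0])
    return [sum(r_i * w[k] for r_i, w in zip(r, local_params)) for k in range(dim)]


def run_fedbiased(
    init: Params,
    local_train: List[Callable[[Params], Params]],
    local_error: List[Callable[[Params], float]],
    rounds: int,
) -> Params:
    """Each round: broadcast, train every local model, measure errors, aggregate."""
    global_params = list(init)
    for _ in range(rounds):
        trained = [train(list(global_params)) for train in local_train]
        errors = [err(w) for err, w in zip(local_error, trained)]
        global_params = fedbiased_aggregate(trained, errors)
    return global_params


# Hypothetical toy usage: each local "model" is one number fitted to a local
# target by a few gradient-descent steps on the squared error (w - target)^2.
def make_train(target: float, lr: float = 0.1, epochs: int = 3):
    def train(w: Params) -> Params:
        for _ in range(epochs):
            w[0] -= lr * 2.0 * (w[0] - target)  # gradient of (w - target)^2
        return w
    return train


def make_error(target: float):
    # The tiny offset keeps the total error positive if a model fits exactly.
    return lambda w: (w[0] - target) ** 2 + 1e-12


targets = [1.0, 2.0, 4.0]
final = run_fedbiased([0.0], [make_train(t) for t in targets],
                      [make_error(t) for t in targets], rounds=10)
print(fedbiased_weights([1.0, 2.0, 3.0]))  # approximately [5/12, 1/3, 1/4]
print(final)
```

When all local errors are equal, every weight is 1/p and FedBiased reduces to FedAvg; with errors 1, 2 and 3, the most accurate model receives 5/12 of the aggregate instead of 1/3.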
To clarify the new method, we describe it in pseudocode:

Algorithm 1 Federated Biased Learning Method
Require: \(\boldsymbol{W}\) // common initial parameters; all local models share the same architecture
E // number of epochs for local training
train_loader[i], test_loader[i] // training and testing data loaders of the i-th local model
N // number of local models
R // number of rounds
Ratio(r) // aggregation weights ("learning rates") of the local models in round r
\(\tilde{W}'(r)\) // matrix aggregated from all local models in round r - 1
Ensure: the updated global model weights \(W'(R)\)
1: r ← 0
2: \(W'(r) \leftarrow \boldsymbol{W}\)
3: while r < R do
4:   load \(W'(r)\) into all local models
5:   for i = 1 to N do
6:     model_i ← train(model, train_loader[i], E)
7:     \(W'_i(r)\) ← get_weights(model_i)
8:     error_i ← test(model_i, train_loader[i])
9:   end for
10:   error(r) ← [error_1, error_2, …, error_N]
11:   Result(r) ← ToSumOne(error(r))
12:   Ratio(r) ← ToSumOne(1 - Result(r))
13:   r ← r + 1
14:   \(\tilde{W}'(r) \leftarrow \sum_{i=1}^{N} W'_i(r-1)\times Ratio_i(r-1)\)
15:   \(W'(r) \leftarrow \tilde{W}'(r)\)
16: end while
17: return \(W'(R)\)

3.3 Robustness analysis of the federated biased learning

When using federated learning strategies, one should note that they do not always outperform centralized learning. There are two reasons for this phenomenon. First, more information may bring more disturbance or uncertainty, in which case the federated learning result is worse than the centralized one. Second, some models, such as today's deep networks, already achieve remarkably high accuracy, so federated learning brings no obvious improvement. Although this concern persists, we observe an interesting fact: federated learning methods give better results than centralized learning when the time series contain noise. In other words, federated learning reduces the impact of uncertainties and shows strong robustness in the learning process. To verify this claim, we define three scenarios of noisy data in this section and present the comparison results in Section 4. Unless otherwise specified, the public time series data sets are assumed to be clean. Gaussian noise is the most commonly used noise model [25]. Gaussian white noise follows a Gaussian distribution and is uncorrelated over time; because its amplitude is normally distributed, it can simulate the random fluctuations and uncertainties that may exist in real data [26].
To add Gaussian white noise to a time series, one can use the following formulas:

$$X_{i}^{noise}=X_{i}+N(0,1)\times\sqrt{P_{n}} \tag{4}$$

$$P_{n}=\frac{P_{s}}{10^{SNR/10}} \tag{5}$$

$$P_{s}=\frac{1}{L}\sum_{i=1}^{L}\left|X_{i}\right|^{2} \tag{6}$$

where \(X_i\) is the raw data, \(L\) is the length of the sequence, SNR is the signal-to-noise ratio in decibels, and \(N(0,1)\) is a Gaussian random variable with mean 0 and standard deviation 1. \(P_s\) is the signal power, computed as the sum of squares of the sequence divided by its length, and \(P_n\) is the noise power, jointly determined by \(P_s\) and the SNR. The noise intensity is controlled by adjusting the SNR.

To account for possible uncertainties arising when time series are sampled from sensors, we set up three cases:
Case 1: noise is added to the entire time series;
Case 2: noise is added to the first half of the time series;
Case 3: noise is added to the latter half of the time series.
Case 1 simulates a sensor that does not function normally because of aging. Case 2 represents a sensor whose abnormal behavior is noticed and corrected only after some time, so that the first half of its series carries Gaussian noise. Case 3 mimics a sensor that stops functioning normally after some time without the fault being detected and corrected, so that the noisy latter part, together with the correct first part, is used to construct the prediction model. A further case, in which a time series contains only a few noisy data points, gives similar results when federated learning is used to build prediction models; therefore, it is not reported in this study. In federated learning, since the data are stored separately at each location, the uncertainty and noise of real data can be simulated by adding noise to each local station individually.
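Formulas (4) to (6) and the three noise cases can be sketched in Python as follows. Two details are assumptions not fixed by the text: the signal power \(P_s\) is computed over the whole series even when only half of it is corrupted, and Cases 2 and 3 split the series at its integer midpoint. The function names and the synthetic series are hypothetical.

```python
import math
import random
from typing import List, Optional


def noise_power(signal: List[float], snr_db: float) -> float:
    """Formulas (5)-(6): P_n = P_s / 10^(SNR/10), with P_s the mean of |X_i|^2."""
    p_s = sum(x * x for x in signal) / len(signal)
    return p_s / (10 ** (snr_db / 10))


def add_gaussian_noise(signal: List[float], snr_db: float, case: int = 1,
                       rng: Optional[random.Random] = None) -> List[float]:
    """Formula (4) applied to Case 1 (whole series), 2 (first half) or 3 (latter half)."""
    if rng is None:
        rng = random.Random()
    n = len(signal)
    half = n // 2
    spans = {1: (0, n), 2: (0, half), 3: (half, n)}
    if case not in spans:
        raise ValueError("case must be 1, 2 or 3")
    start, stop = spans[case]
    sigma = math.sqrt(noise_power(signal, snr_db))  # P_n from the whole series (assumption)
    noisy = list(signal)
    for i in range(start, stop):
        noisy[i] += rng.gauss(0.0, 1.0) * sigma  # X_i + N(0,1) * sqrt(P_n)
    return noisy


# Hypothetical usage: a synthetic hourly series for one station, SNR 40 dB, Case 3.
series = [50.0 + 10.0 * math.sin(2 * math.pi * h / 24) for h in range(48)]
noisy = add_gaussian_noise(series, snr_db=40, case=3, rng=random.Random(0))
print(noisy[:24] == series[:24])  # True: Case 3 leaves the first half untouched
```

Because each local station calls the function on its own series, stations can be given different SNR values or random seeds, in line with simulating noise at each station individually.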
The parameters of the Gaussian white noise may differ from one location to another, although the same values can be used for all locations to simplify programming.

3.4 Analysis of spatial information

Recent studies that use both spatial and temporal information to construct time series prediction models in federated learning fall into two categories: those that use spatial and temporal information separately, and those that combine both in one model. In the first category, stations are grouped by clustering or other strategies, and the models are then trained on temporal data only. The second category uses spatial and temporal information in one model. An issue here is that spatial and temporal information are of two different types, with a sequential relationship between them. Therefore, until effective mechanisms for connecting them are proposed, using the two kinds of information separately is preferable. However, no comprehensive study has yet revealed how to use spatial information effectively and independently when building time series prediction models. In this paper, we aim to increase the prediction accuracy of each local model. With this in mind, each station in turn is regarded as the leading station and is grouped with its k best neighbors for federated learning. Geographical location and elevation are the main characteristics used when analyzing climate data, so the k neighbors of a station are selected by the Euclidean distance computed on converted longitude, latitude, and elevation values.

4. Experiments

In this section, we conduct simulations to evaluate the performance of the proposed federated biased learning algorithm and compare it with other methods. First, we introduce a real spatial-temporal time series case: air quality data from 35 monitoring stations in Beijing (China).
Then, we present the model selection and parameter settings. Finally, we analyze the performance of the proposed algorithm from several aspects.

4.1. Dataset

Air quality index (AQI) forecasting attracts growing attention because air quality strongly affects people's daily life and health. In this study, we collect air quality data from 35 monitoring stations in Beijing (China) and comprehensively study the effect of federated learning strategies from both temporal and spatial perspectives. Each station provides a univariate, hourly recorded time series. We use the time series of 2022, downloaded from the Beijing Environmental Protection Testing Center and other related departments (http://www.bjmemc.com.cn/). The factors that may affect the prediction results include the number of local stations, the federated learning method, the location (elevation) of the monitoring stations, and so on. We explore the effect of each factor by setting different environment parameters and analyzing the corresponding results. The 35 monitoring stations are displayed in Fig. 1, where different colors represent different elevations. Each station's air quality series is used to train a prediction model both individually and collaboratively under a federated learning strategy. In the federated scenario, all stations are first organized into several groups, and within each group every station acts as a local client. As a starting point, we choose the two nearest stations to carry out federated learning and compare the performance with individual modeling. The distance is computed from latitude and longitude information. To observe how FedAvg performs under different noise levels, four noisy versions of the time series were used in the experiments.
Meanwhile, the true time series (raw data) went through the same development process for comparison. Four noise levels (SNR) are used to create the noisy time series: 10, 20, 30, and 40 dB. Figure 4 illustrates the four noisy time series and the raw time series, with a 10-hour period enlarged in an inset to show the trend clearly. The noisy series with SNR 10 fluctuates dramatically, whereas the series with SNR 40 deviates least from the raw data, so the latter can be expected to be the easiest to predict. However, whether federated learning outperforms individual modeling on noisy time series, and how it performs at different noise levels, requires further study.

4.2. Evaluation metrics

All experiments are run on a computer with an Intel Core i7 processor and 16 GB of RAM under Windows 10. The implementation was developed in Visual Studio Code and coded in Python 3.9, and the PyTorch framework was used for deep learning tasks. To evaluate the experimental results, we use two common indices, the mean absolute error (MAE) and the mean squared error (MSE). MAE is the average absolute difference between the predicted and true values and reflects the accuracy of the predictions: the smaller the MAE, the more accurate the predictions. MSE is the average of the squared errors between the predicted and true values and reflects the overall error: the smaller the MSE, the smaller the overall prediction error. In addition, the index of agreement (IA) is used to indicate the distribution similarity between the observed and forecast values. The IA ranges over [0, 1].
The closer the IA is to 1, the higher the distribution consistency. Because time series data are temporally correlated, a single evaluation index may not assess model performance comprehensively; therefore, the three indices MAE, MSE, and IA are all used:

$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2} \tag{7}$$

$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_{i}-\hat{y}_{i}\right| \tag{8}$$

$$\mathrm{IA}=1-\frac{\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}}{\sum_{i=1}^{n}\left(\left|y_{i}-\bar{y}\right|+\left|\hat{y}_{i}-\bar{y}\right|\right)^{2}} \tag{9}$$

where \(y_i\) is the i-th real output, \(\hat{y}_i\) is the i-th predicted result, \(\bar{y}\) is the mean of the real outputs, and \(n\) is the number of samples.

4.3. Local Model selection and parameter settings

The sliding window method is adopted to reorganize the time series data, and the number of input neurons of each network is determined by the window length. In the following experiments, the window length (input dimension) is set to 24, which experts may adjust as needed, and the output dimension is set to 1. For instance, the AQI values of the first 24 hours serve as inputs, and the output is the AQI of the 25th data point. The learning rate of the network is set to 0.005. The training set comprises 2172 hourly AQI values from the first quarter of 2022, while the test set covers 40 hours of AQI. The batch size is set to 1, and the total number of training epochs is 100. After analyzing the AQI data, we chose a shallow neural network with one input layer, one hidden layer of 10 neurons, and one output layer. Shallow networks are favorable because they train quickly and are comparable in effectiveness to deep networks such as the Long Short-Term Memory (LSTM) neural network. To verify this, we ran experiments and report the results of shallow and deep neural networks on the same time series.
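The sliding-window preparation of Section 4.3 and the metrics in formulas (7) to (9) can be sketched in plain Python. This is an illustrative sketch, not the authors' PyTorch code; the function names are hypothetical, and returning IA = 1 when its denominator is zero (a constant series predicted exactly) is a convention assumed here.

```python
from typing import List, Tuple


def sliding_window(series: List[float], window: int = 24) -> Tuple[List[List[float]], List[float]]:
    """Turn a series into (inputs, targets): `window` past values -> the next value."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets


def mse(y: List[float], y_hat: List[float]) -> float:
    """Formula (7)."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def mae(y: List[float], y_hat: List[float]) -> float:
    """Formula (8)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def index_of_agreement(y: List[float], y_hat: List[float]) -> float:
    """Formula (9); y_bar is the mean of the observed values."""
    y_bar = sum(y) / len(y)
    num = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    den = sum((abs(a - y_bar) + abs(b - y_bar)) ** 2 for a, b in zip(y, y_hat))
    return 1.0 if den == 0 else 1.0 - num / den  # den == 0 only if all values equal y_bar


# With 24 inputs, the first sample uses hours 1-24 and predicts hour 25.
hours = [float(h) for h in range(1, 31)]
X, t = sliding_window(hours, window=24)
print(len(X), X[0][-1], t[0])  # 6 24.0 25.0
print(index_of_agreement([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # 0.8
```

Under this construction, a training series of 2172 values with a window of 24 would yield 2148 input/target pairs.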
Table 1 presents the performance of a back-propagation neural network (BPNN) and an LSTM in terms of error and running time. Both networks were tested under the same conditions, and each experiment was repeated five times. The results show that BPNN achieves a lower MSE and a faster running time, roughly half that of LSTM. Since the deep network brings no dramatic improvement, we use shallow networks as the fundamental modeling units.

In the federated learning process, local models are first trained independently until they reach a certain criterion and then send their parameters to a central server. The central server computes the global parameters, which are then sent back to each local side. This iterative process continues until a predetermined convergence condition is reached. Here, the number of outer-loop rounds r is set to 10 and each client is trained locally for 10 epochs per round, giving 100 epochs over the whole process. To examine how well shallow neural networks predict the raw and the noisy time series, we show the test results in Fig. 5 together with the raw time series. The predictions for SNR 10 differ most from the raw values, while the differences between predicted and true values are smaller for the other groups; in all cases, the predicted values lag slightly behind the true values. In short, the prediction models work well only when the noise stays within a reasonable range, which agrees with intuition. Therefore, we use the noisy time series with SNR 40 in the disturbance verification experiments.

4.4. The performances of FedBiased vs FedAvg

To test the effectiveness of the proposed FedBiased method, we compare it with the popular FedAvg from three perspectives: (1) the trend of the entire convergence process; (2) the performance of the federated learning methods on different groups; (3) statistical testing.

4.4.1. The trend of the entire convergence process

Figure 6 shows how the MSE changes with the epoch for centralized modeling, FedAvg modeling, and FedBiased modeling at monitoring station 1; the federated group consists of stations 1, 6, and 2. The three curves are very close, and an enlarged inset shows that FedBiased performs best.

4.4.2. The performances of federated learning methods under different groups

As shown in Fig. 3, stations 16 and 17 are located at higher elevations than the other stations. To make sure the federated learning methods work under different environments and settings, we use the time series of these two stations in the experiments, together with the time series of station 15 (the nearest station to station 17). Tables 2, 3, 4, and 5 show the MSE and MAE values of the three methods (FedBiased, FedAvg, and centralized modeling) for two groups: (1) stations 15, 16, 17; (2) stations 1, 6, 2.

Table 2 MSE and standard deviation for stations 15, 16, 17 using FedAvg, FedBiased and Centralized algorithms

| Station | FedAvg | FedBiased | Centralized |
|---|---|---|---|
| 15 | 0.000253 ± 9.875E-08 | 0.000245 ± 2.515E-08 | 0.000243 ± 7.664E-07 |
| 16 | 0.000517 ± 1.29E-06 | 0.000508 ± 1.265E-07 | 0.000530 ± 4.586E-07 |
| 17 | 0.000358 ± 1.769E-06 | 0.000348 ± 7.124E-08 | 0.000350 ± 1.577E-07 |

Table 2 reveals that federated learning algorithms sometimes outperform centralized learning, and sometimes the reverse holds.
For the group of stations 15, 16, 17, a possible reason is that stations 16 and 17 are close to each other and both lie at a higher elevation, whereas station 15, although the nearest station to station 17, is comparatively far from both. In any event, FedBiased performs better than FedAvg on all three stations.

Table 3 MSE and standard deviation for stations 1, 6, 2 using FedAvg, FedBiased and Centralized algorithms
Station | FedAvg | FedBiased | Centralized
1 | 0.000146 ± 4.41E-06 | 0.000137 ± 4.20E-08 | 0.000135 ± 9.76E-07
6 | 0.000844 ± 3.02E-06 | 0.000835 ± 3.18E-07 | 0.000826 ± 2.13E-06
2 | 0.000235 ± 3.09E-06 | 0.000221 ± 8.05E-07 | 0.000256 ± 3.04E-06

Table 3 shows the results for stations 1, 6, 2. Here centralized learning gives the best results on stations 1 and 6, and FedBiased performs best on station 2. Although no method wins consistently, FedBiased outperforms FedAvg on all three stations.

Table 4 MAE and standard deviation for stations 15, 16, 17 using FedAvg, FedBiased and Centralized algorithms
Station | FedAvg | FedBiased | Centralized
15 | 0.010 ± 1.03E-05 | 0.0101 ± 4.14E-05 | 0.01 ± 1.69E-05
16 | 0.0154 ± 4.45E-05 | 0.0151 ± 4.52E-05 | 0.0158 ± 3.45E-05
17 | 0.0143 ± 8.61E-06 | 0.0142 ± 1.50E-05 | 0.0144 ± 1.11E-05

Table 4 lists the MAE values, which lead to the same conclusion as Table 2: FedBiased performs best on stations 16 and 17, and centralized learning performs best on station 15 (where FedAvg attains the same MAE of about 0.010).

Table 5 MAE and standard deviation for stations 1, 6, 2 using FedAvg, FedBiased and Centralized algorithms
Station | FedAvg | FedBiased | Centralized
1 | 0.00606 ± 3.805E-05 | 0.0060 ± 6.090E-05 | 0.005 ± 2.102E-05
6 | 0.0163 ± 3.461E-05 | 0.0160 ± 5.990E-05 | 0.0160 ± 2.766E-05
2 | 0.0116 ± 0.000170 | 0.0112 ± 0.000178 | 0.0119 ± 0.00012

Table 5 lists the MAE values, which agree with Table 3: centralized learning performs best on station 1 (and ties with FedBiased on station 6), and FedBiased performs best on station 2. Whichever group is used, FedBiased outperforms FedAvg, with the single exception of the MAE at station 15 in Table 4.
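FedBiased is built on the FedAvg framework and differs from it in the aggregation step, which is defined in Section 3 and not reproduced here. For reference, the following is a minimal sketch of the FedAvg baseline with the round structure used in these experiments (r = 10 outer rounds, 10 local updates per round). It rests on illustrative assumptions: each hypothetical station fits a one-parameter linear model y ≈ w·x by gradient descent, and the server weights local parameters by local sample counts as in [5] (with equal sample counts this reduces to the plain average). The function names `local_train`, `fedavg_aggregate` and `run_fedavg` are hypothetical.

```python
from typing import List, Sequence, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held by one hypothetical station


def local_train(w: float, data: Dataset, steps: int = 10, lr: float = 0.1) -> float:
    """Run `steps` full-batch gradient-descent updates of the local MSE for y ~ w * x."""
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fedavg_aggregate(params: Sequence[float], sizes: Sequence[int]) -> float:
    """FedAvg server step: average local parameters weighted by local sample counts."""
    total = sum(sizes)
    return sum(p * n for p, n in zip(params, sizes)) / total


def run_fedavg(clients: List[Dataset], rounds: int = 10, local_steps: int = 10) -> float:
    """Outer loop: broadcast, train locally, aggregate (rounds * local_steps updates in total)."""
    w_global = 0.0
    for _ in range(rounds):
        # Every client restarts from the current global parameter.
        local_params = [local_train(w_global, data, local_steps) for data in clients]
        w_global = fedavg_aggregate(local_params, [len(data) for data in clients])
    return w_global


if __name__ == "__main__":
    xs = [i / 10 for i in range(1, 11)]
    # Three hypothetical stations whose data follow slightly different slopes.
    clients = [[(x, slope * x) for x in xs] for slope in (1.8, 2.0, 2.2)]
    print(run_fedavg(clients))
```

In the usage example the three stations are pulled toward a shared slope of about 2.0. A performance-aware rule such as FedBiased would change only the aggregation function, leaving the broadcast-train-aggregate loop intact.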
Centralized learning is nevertheless preferable in some cases, since it can give better results with higher efficiency.

4.4.3. Statistical testing

We apply a two-sample t-test to examine whether the results of FedBiased differ significantly from those of FedAvg. The null hypothesis (H0) is "the MSE of FedBiased does not differ from that of FedAvg", and the alternative hypothesis (H1) is "the MSE of FedBiased differs from that of FedAvg". For each station, the MSE values of the repeated runs serve as the samples for computing the t-statistic, with a significance level α = 0.05. The t value is defined in formula (7):

$$t=\frac{\widehat{X}_{1}-\widehat{X}_{2}}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}} \tag{7}$$

where \(\widehat{X}_{i}\) is the sample mean, \(s_{i}^{2}\) the sample variance and \(n_{i}\) the sample size of group \(i\).

We compare the MSEs of FedBiased and FedAvg for the group of stations 1, 6 and 2. The t-statistic is 6.453 for station 1 (p = 0.000112), 9.372 for station 6 (p = 6.123E-06) and 13.865 for station 2 (p = 2.231E-07). All three p-values are below the 0.05 significance level, so the null hypothesis is rejected: the MSE of FedBiased differs significantly from that of FedAvg, and since FedBiased has the lower mean MSE on all three stations (Table 3), its predictions are better.

4.5. The robustness analysis of federated learning techniques

In the literature, when privacy and data security are not considered, centralized learning methods often perform better than federated learning algorithms [27]. Does this mean that federated learning algorithms are useless or unnecessary? Or in which environments do federated learning algorithms show clear advantages? In the following studies, we simulate three realistic noisy scenarios of time series data, which were introduced in Section 3.
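The noisy series in this section are produced by adding Gaussian noise at a prescribed SNR, either to the whole training series or to one half of it. Because the exact definitions are given in Section 3 and not repeated here, the sketch below only illustrates the idea under stated assumptions: the SNR is expressed in decibels, the signal power is the mean square of the whole clean series, the noise is zero-mean white Gaussian noise, and the helper `add_gaussian_noise` is hypothetical.

```python
import math
import random
from typing import List, Optional


def add_gaussian_noise(series: List[float], snr_db: float, segment: str = "all",
                       rng: Optional[random.Random] = None) -> List[float]:
    """Return a noisy copy of `series` with white Gaussian noise at `snr_db` dB.

    segment selects where the noise is added: "all" (whole series),
    "first" (first half only) or "second" (second half only).
    The noise level is derived from the power of the whole clean series.
    """
    if rng is None:
        rng = random.Random()
    n = len(series)
    bounds = {"all": (0, n), "first": (0, n // 2), "second": (n // 2, n)}
    if segment not in bounds:
        raise ValueError(f"unknown segment: {segment!r}")
    lo, hi = bounds[segment]
    signal_power = sum(v * v for v in series) / n      # mean square of the clean series
    noise_power = signal_power / 10 ** (snr_db / 10)   # from SNR_dB = 10 * log10(P_signal / P_noise)
    noise_std = math.sqrt(noise_power)
    noisy = list(series)                                # leave the input untouched
    for i in range(lo, hi):
        noisy[i] += rng.gauss(0.0, noise_std)
    return noisy


if __name__ == "__main__":
    clean = [2.0 + math.sin(i / 5) for i in range(1000)]
    for case in ("all", "first", "second"):
        noisy = add_gaussian_noise(clean, 40.0, case, random.Random(0))
        changed = sum(a != b for a, b in zip(noisy, clean))
        print(case, "samples changed:", changed)
```

With `snr_db=40` the noise standard deviation is 1% of the root-mean-square signal level, since 10^(40/10) = 10^4 and the square root of 10^-4 is 0.01.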
The results are organized around three aspects: (1) the performance of federated learning algorithms on noisy time series (only one time series is noisy); (2) the performance of federated learning algorithms under three noise cases; (3) the performance of FedBiased with an increasing number of stations.

4.5.1. The performances of federated learning algorithms on noisy time series

Robustness has become an increasingly important characteristic in recent years. In this study, we therefore verify the effectiveness of federated learning algorithms on noisy time series (air quality data). The raw time series are assumed to be clean, and noise is added to them using the three strategies defined in Section 3. Time series from three stations, stations 1, 6 and 2, are used to run the federated learning algorithms. Assume that only station 1 has sensor problems, so that its time series is noisy (SNR = 40). We evaluate the federated learning algorithms with two metrics, MSE and IA, and, for comparison, also apply centralized learning to station 1.

Table 6 Performances of FedAvg, FedBiased and Centralized with MSE and IA
Metrics | FedAvg | FedBiased | Centralized
MSE | 0.000139 ± 1.498E-07 | 0.000136 ± 1.583E-07 | 0.000137 ± 1.706E-06
IA | 0.7455 ± 6.031E-06 | 0.7457 ± 6.488E-06 | 0.7451 ± 0.000461

Table 6 displays the results of the three methods (FedAvg, FedBiased and centralized learning) on the group of stations 1, 6, 2. Both metrics favor FedBiased: it gives the smallest test MSE and the highest IA, and thus the best generalization. This experiment confirms the effectiveness of the FedAvg-based strategies when noise is added to the entire training series: even in the presence of data uncertainty and noise, they can still learn useful patterns from the data and appear to capture its trends, which leads to more accurate predictions.

4.5.2. The performances of federated learning algorithms under three noise cases

Since noise may appear at random in real life, we added two sets of contrast experiments in which noise was added only to the first half or only to the second half of the training data. This gives three cases: Case 1, noise over the whole training series (the setting of Table 6); Case 2, noise in the first half only; and Case 3, noise in the second half only. To approximate the noise commonly found in real life, we set the SNR to 40 for station 1 and trained FedAvg, FedBiased and centralized learning. The average results of 10 runs are presented in Table 7, where the first three result columns show the MSE and its variance for the three noise cases, and the last three show the IA and its variance.

Table 7 MSE and IA values for three noise adding cases using FedAvg, FedBiased and Centralized methods
Method | MSE Case 1 | MSE Case 2 | MSE Case 3 | IA Case 1 | IA Case 2 | IA Case 3
FedAvg | 0.000139 ± 1.498E-07 | 0.000142 ± 1.37E-07 | 0.000139 ± 1.526E-07 | 0.7455 ± 6.031E-06 | 0.7494 ± 4.928E-06 | 0.7455 ± 9.863E-06
FedBiased | 0.000136 ± 1.583E-07 | 0.000140 ± 1.103E-07 | 0.000136 ± 1.224E-07 | 0.7457 ± 6.488E-06 | 0.7497 ± 6.238E-06 | 0.7457 ± 6.467E-06
Centralized | 0.000137 ± 1.706E-06 | 0.000139 ± 1.63E-06 | 0.000136 ± 1.678E-06 | 0.7451 ± 0.000461 | 0.7495 ± 0.000449 | 0.7455 ± 0.000447

Table 7 shows that FedBiased attains the highest IA in all three noise cases and the smallest MSE in Cases 1 and 3 (tied with centralized learning in Case 3), while centralized learning has a slightly smaller MSE in Case 2. FedBiased outperforms FedAvg on both metrics in every case. This suggests that federated learning algorithms can still learn useful patterns from the data under noise interference; in particular, FedBiased improves the prediction accuracy and stability of models trained on the noisy time series (its spreads are about an order of magnitude smaller than those of centralized learning). Table 7 also shows that Cases 1 and 3 have similar effects, whereas Case 2 differs: it yields both a larger MSE and a larger IA. This indicates that noise in the first half of the training data has a greater impact on the training results.

4.5.3. The performances of FedBiased with incremental stations

Tables 8 and 9 report the MSE and standard deviation of station 1 and station 6 for different groups of stations, built incrementally from two up to six stations. We stop at six stations for two reasons: computational complexity, and the error, which tends to increase as more stations are added. For reference, individual modeling on station 1 yields an MSE of 0.000135, and individual modeling on station 6 yields 0.000827.

To test how noisy data at station 1 influence the stations under FedBiased, Tables 8 and 9 show the effect on station 1 itself and on its peer station 6, respectively. The SNR is set to 30 to make the effect of FedBiased more visible; at this noise level, individual modeling on station 1 yields an MSE of 0.000140. In Table 8, the best results come from the group of stations 1, 6 and 2. Under this group, station 1 reaches an MSE of 0.000138, which is lower than its individual (centralized) value of 0.000140. This reflects the fact that FedBiased has a more pronounced effect in the presence of noise, resulting in higher accuracy.

Table 8 MSE and standard deviation values of Station 1 under different groups of monitoring stations (SNR = 0 denotes the raw series without added noise)
Indexes of grouped stations | SNR = 0 | SNR = 30
1, 6 | 0.000139 ± 2.322E-07 | 0.000138 ± 6.415E-07
1, 6, 2 | 0.000137 ± 4.196E-08 | 0.000138 ± 1.453E-07
1, 6, 2, 5 | 0.000141 ± 2.554E-08 | 0.000140 ± 9.318E-08
1, 6, 2, 5, 3 | 0.000142 ± 2.952E-08 | 0.000139 ± 5.950E-08
1, 6, 2, 5, 3, 4 | 0.000143 ± 1.157E-07 | 0.000140 ± 1.311E-08

Some experiments also show that, with SNR = 30, the training loss is larger than with SNR = 0, while the test loss is very close or even smaller. This may be explained by the fact that an appropriate amount of Gaussian noise can alleviate overfitting of the model.
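The group comparison above can be checked directly against Table 8. The short script below is only a worked illustration: it copies the station 1 MSE values from Table 8 and the individual-modeling baselines quoted in the text (0.000135 for the raw series, 0.000140 with SNR = 30), then lists, for each setting, the best group and the groups that beat individual modeling. The names `best_groups` and `groups_beating` are hypothetical helpers.

```python
import math
from typing import Dict, List, Tuple

Group = Tuple[int, ...]
Table = Dict[Group, Tuple[float, float]]

# Station 1 mean MSE from Table 8: (SNR = 0, SNR = 30) for each group of stations.
TABLE8_STATION1: Table = {
    (1, 6): (0.000139, 0.000138),
    (1, 6, 2): (0.000137, 0.000138),
    (1, 6, 2, 5): (0.000141, 0.000140),
    (1, 6, 2, 5, 3): (0.000142, 0.000139),
    (1, 6, 2, 5, 3, 4): (0.000143, 0.000140),
}
# Individual-modeling MSE of station 1 quoted in the text, per column.
INDIVIDUAL_STATION1 = (0.000135, 0.000140)


def best_groups(table: Table, col: int) -> List[Group]:
    """Groups with the minimum MSE in column `col`; ties are all kept."""
    best = min(row[col] for row in table.values())
    return [g for g, row in table.items() if math.isclose(row[col], best)]


def groups_beating(table: Table, col: int, baseline: float) -> List[Group]:
    """Groups whose MSE in column `col` is strictly below `baseline`."""
    return [g for g, row in table.items() if row[col] < baseline]


if __name__ == "__main__":
    for col, label in enumerate(("SNR = 0", "SNR = 30")):
        print(label,
              "best:", best_groups(TABLE8_STATION1, col),
              "beat individual:", groups_beating(TABLE8_STATION1, col, INDIVIDUAL_STATION1[col]))
```

On these values, no group beats individual modeling of station 1 for the raw series, whereas three groups do with SNR = 30, which is consistent with the observation that FedBiased helps most when a station's data are noisy.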
Table 9 MSE and standard deviation values of Station 6 under different groups of monitoring stations
Indexes of grouped stations | SNR = 0 | SNR = 30
6, 1 | 0.000826 ± 5.770E-07 | 0.000826 ± 1.20E-06
6, 1, 2 | 0.000835 ± 3.182E-07 | 0.000835 ± 4.754E-07
6, 1, 2, 5 | 0.000835 ± 4.293E-08 | 0.000835 ± 3.72E-07
6, 1, 2, 5, 3 | 0.000838 ± 4.690E-07 | 0.000835 ± 3.432E-07
6, 1, 2, 5, 3, 4 | 0.000838 ± 4.515E-07 | 0.000838 ± 3.047E-07

For station 6, the best results come from the group of stations 6 and 1, whether station 1 provides the raw (SNR = 0) or the noisy (SNR = 30) time series. Comparing the individual (centralized) result of 0.000827 for station 6 with its FedBiased result of 0.000826 in this group, under both SNR = 0 and SNR = 30, shows that station 1 has little influence on station 6 in the federated algorithm, even when its data contain a certain amount of noise. Taken together, Tables 8 and 9 show that FedBiased improves the performance of a station whose data contain a certain level of noise without degrading the performance of the other stations in its best combination. This demonstrates the stability and robustness of the federated biased learning model. However, this does not mean that more stations bring more robustness, as each station has its own most appropriate group.

5. Conclusion

This study proposed a new federated learning algorithm, federated biased learning, and verified that it outperforms the popular federated averaging algorithm when predicting time series in an air quality index case. We established a multi-station time series prediction framework that achieves the best prediction for each station by exploiting the personalized effect of federated biased learning within the optimal combination of stations. In addition, a comprehensive framework comprising three scenarios of disturbances in the time series was designed to test the robustness and reliability of federated learning algorithms.
This study carefully compares the performance of federated learning algorithms and centralized methods and concludes that federated learning algorithms work better when the raw time series contain disturbances or uncertainties. The experimental studies lead to the following conclusions:
(1) Federated learning methods are more robust than single-station modeling because they gather as much information and variety as possible from different data sources. This makes them preferable when reliability is the most important requirement in real engineering tasks.
(2) Both FedAvg and FedBiased work well in different cases. However, in the Beijing air quality index prediction case, FedBiased performs better than FedAvg in several respects.
(3) Shallow neural networks are more suitable than deep networks (LSTM in this paper) for this air quality index series in terms of accuracy and running time. It is therefore worth considering model architectures from simple to complex before developing new models.
(4) A larger number of combined local models does not necessarily lead to better federated learning; each local model has its own optimal combination.
Based on these observations, our future work will focus on the following topics:
(1) Spatial information should be explored further to capture more useful information and improve accuracy.
(2) More federated learning methods will be applied to the air quality index prediction problem and compared in further experiments.
(3) The effectiveness and generalizability of the framework will be validated on various datasets.

Declarations
Author Contribution: Mingli Song: methodology and writing. Xinyu Zhao: experiments. Witold Pedrycz: reviewing and editing.
Acknowledgement: This work was supported by the National Natural Science Foundation of China (NSFC) 61773352 and the Fundamental Research Funds for the Central Universities.
http://www.bjmemc.com.cn/

References
[1] Yu, P., Liu, Y.: Federated object detection: Optimizing object detection model with federated learning. In: Proc. 3rd Int. Conf. Vision, Image Signal Process., pp. 1-6 (2019)
[2] Latif, S., Khalifa, S., Rana, R., Jurdak, R.: Federated learning for speech emotion recognition applications. In: 2020 19th ACM/IEEE Int. Conf. Inf. Process. Sensor Netw. (IPSN), pp. 341-342. IEEE (2020)
[3] Ek, S., Portet, F., Lalanda, P., Vega, G.: Evaluation of federated learning aggregation algorithms: application to human activity recognition. In: Adjunct Proc. 2020 ACM Int. Joint Conf. Pervasive Ubiquitous Comput. Proc. 2020 ACM Int. Symp. Wearable Comput., pp. 638-643 (2020)
[4] Chhikara, P., Tekchandani, R., Kumar, N., Guizani, M., Hassan, M.M.: Federated learning and autonomous UAVs for hazardous zone detection and AQI prediction in IoT environment. IEEE Internet Things J., 8(20), 15456-15467 (2021)
[5] McMahan, B., Moore, E., Ramage, D., Hampson, S., y Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Artif. Intell. Stat., pp. 1273-1282 (2017)
[6] Li, T., Sahu, A.K., Zaheer, M., Sanjabi, M., Talwalkar, A., Smith, V.: Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst., 2, 429-450 (2020)
[7] Karimireddy, S.P., Kale, S., Mohri, M., Reddi, S., Stich, S., Suresh, A.T.: Scaffold: Stochastic controlled averaging for federated learning. In: Int. Conf. Mach. Learn., pp. 5132-5143 (2020)
[8] Wang, J., Liu, Q., Liang, H., Joshi, G., Poor, H.V.: Tackling the objective inconsistency problem in heterogeneous federated optimization. Adv. Neural Inf. Process. Syst., 33, 7611-7623 (2020)
[9] Zhao, Z., Feng, C., Hong, W., Jiang, J., Jia, C., Quek, T.Q., Peng, M.: Federated learning with non-iid data in wireless networks. IEEE Trans. Wireless Commun., 21(3), 1927-1942 (2021)
[10] Wang, S., Cao, J., Yu, P.S.: Deep learning for spatio-temporal data mining: A survey. IEEE Trans. Knowl. Data Eng., 34(8), 3681-3700 (2020)
[11] Zhang, X., Wen, S., Yan, L., Feng, J., Xia, Y.: A hybrid-convolution spatial–temporal recurrent network for traffic flow prediction. Comput. J., 67(1), 236-252 (2024)
[12] Huo, P., Li, Z., Bai, M., Li, Z., Huang, J., Han, L.: Spatial-temporal evolutions of historical and future meteorological drought center in Beijing area, China. Urban Clim., 53, 101786 (2024)
[13] Johnson, D.P., Owusu, C.: Examining associations between social vulnerability indices and COVID-19 incidence and mortality with spatial-temporal Bayesian modeling. Spatial Spatio-temporal Epidemiol., 48, 100623 (2024)
[14] Wu, Y., Huang, Z., Zheng, Y., Liu, Y., Li, H., Che, Y., et al.: Spatial–temporal data-driven full driving cycle prediction for optimal energy management of battery/supercapacitor electric vehicles. Energy Convers. Manag., 277, 116619 (2023)
[15] Musa, A.A., Hussaini, A., Liao, W., Liang, F., Yu, W.: Deep neural networks for spatial-temporal cyber-physical systems: A survey. Future Internet, 15(6), 199 (2023)
[16] Jin, Z., Qian, J., Kong, Z., Pan, C.: A mobility aware network traffic prediction model based on dynamic graph attention spatio-temporal network. Comput. Netw., 235, 109981 (2023)
[17] Geng, Z., Xu, J., Wu, R., Zhao, C., Wang, J., Li, Y., Zhang, C.: STGAFormer: Spatial–temporal gated attention transformer based graph neural network for traffic flow forecasting. Inf. Fusion, 102228 (2024)
[18] Zhao, L., Song, Y., Zhang, C., Liu, Y., Wang, P., Lin, T., Li, H.: T-GCN: A temporal graph convolutional network for traffic prediction. IEEE Trans. Intell. Transp. Syst., 21(9), 3848-3858 (2019)
[19] Wang, C., Zhu, Y., Zang, T., Liu, H., Yu, J.: Modeling inter-station relationships with attentive temporal graph convolutional network for air quality prediction. In: Proc. 14th ACM Int. Conf. Web Search Data Min., pp. 616-634 (2021)
[20] Padhi, I., Schiff, Y., Melnyk, I., Rigotti, M., Mroueh, Y., Dognin, P., Altman, E.: Tabular transformers for modeling multivariate time series. In: ICASSP 2021-2021 IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), pp. 3565-3569. IEEE (2021)
[21] Xiao, P., Cheng, S., Stankovic, V., Vukobratovic, D.: Averaging is probably not the optimum way of aggregating parameters in federated learning. Entropy, 22(3), 314 (2020)
[22] Qu, Z., Lin, K., Li, Z., Zhou, J.: Federated learning’s blessing: FedAvg has linear speedup. In: ICLR 2021-Workshop Distrib. Private Mach. Learn. (DPML) (2021)
[23] Wang, J., Das, R., Joshi, G., Kale, S., Xu, Z., Zhang, T.: On the unreasonable effectiveness of federated averaging with heterogeneous data. arXiv preprint arXiv:2206.04723 (2022)
[24] Wang, J., Charles, Z., Xu, Z., Joshi, G., McMahan, H.B., Al-Shedivat, M., et al.: A field guide to federated optimization. arXiv preprint arXiv:2107.06917 (2021)
[25] Wei, K., Li, J., Ding, M., Ma, C., Yang, H.H., Farokhi, F., et al.: Federated learning with differential privacy: Algorithms and performance analysis. IEEE Trans. Inf. Forensics Secur., 15, 3454-3469 (2020)
[26] Pastor, D.: A theoretical result for processing signals that have unknown distributions and priors in white Gaussian noise. Comput. Stat. Data Anal., 52(6), 3167-3186 (2008)
[27] Wang, Y., et al.: Federated learning for automatic modulation classification under class imbalance and varying noise condition. IEEE Trans. Cogn. Commun. Netw., 8(1), 86-96 (2021)

Additional Declarations
No competing interests reported.
Editorial timeline: Editorial decision: Revision requested 29 Sep, 2024; Reviews received at journal 18 Jul, 2024; Reviews received at journal 17 Jul, 2024; Reviewers agreed at journal 13 Jul, 2024; Reviewers agreed at journal 10 Jul, 2024; Reviewers agreed at journal 08 Jul, 2024; Reviewers invited by journal 08 Jul, 2024; Editor assigned by journal 30 Jun, 2024; Submission checks completed at journal 29 Jun, 2024; First submitted to journal 29 Jun, 2024.
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-4658479","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":328949687,"identity":"e1a0c647-b487-4ae8-b793-6f4752936a29","order_by":0,"name":"Mingli Song","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA6ElEQVRIiWNgGAWjYNACAwYGPmbmAxDOAWK1sDGzJZCiBQjYGHgMiNMi3957+NWNgjt2bew8Hz/z1DDI8d1IYPxcgM9JZ86lWecYPEtuY+bdLM1zjMFY8kYCs/QMfFokcsyMcwwOJ7Mx825j5m1gSNxwI4GNmQefw2bAtfA8A2mpJ6iF4UaO8WOgFjugMjaQlgQDQloMzpwxYwZqASpjM5acc0zCcOaZh83SeB3W3mP8OefPYXt+/sMPP7ypsZHnO5588DNehwFjRAJIJDZAOCA2YwN+DQwMzB+AhD0hVaNgFIyCUTCCAQAs7UQRwfGc/gAAAABJRU5ErkJggg==","orcid":"","institution":"Communication University of China","correspondingAuthor":true,"prefix":"","firstName":"Mingli","middleName":"","lastName":"Song","suffix":""},{"id":328949688,"identity":"635eeda0-8efe-48bb-a0a1-a73c511bbbaf","order_by":1,"name":"Xinyu Zhao","email":"","orcid":"","institution":"Communication University of China","correspondingAuthor":false,"prefix":"","firstName":"Xinyu","middleName":"","lastName":"Zhao","suffix":""},{"id":328949689,"identity":"ec23555a-4563-4a7b-860c-656319d0835f","order_by":2,"name":"Witold Pedrycz","email":"","orcid":"","institution":"University of Alberta","correspondingAuthor":false,"prefix":"","firstName":"Witold","middleName":"","lastName":"Pedrycz","suffix":""}],"badges":[],"createdAt":"2024-06-29 
09:08:21","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4658479/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4658479/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":60816182,"identity":"94d96442-3cf9-476e-baea-7c969f488de3","added_by":"auto","created_at":"2024-07-22 11:57:40","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":33881,"visible":true,"origin":"","legend":"\u003cp\u003eThe general learning process of federated learning methods.\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-4658479/v1/b6beb7470bd1d5c94df0d954.png"},{"id":60816186,"identity":"0f086e57-bd02-42e0-9540-d6a9af0531f8","added_by":"auto","created_at":"2024-07-22 11:57:40","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":24481,"visible":true,"origin":"","legend":"\u003cp\u003ek nearest neighbors for a certain station on the earth\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-4658479/v1/1764abf9dcd7cdaf1c10de5f.png"},{"id":60816184,"identity":"0d083919-95c8-4338-86e9-94a7d8df9c48","added_by":"auto","created_at":"2024-07-22 11:57:40","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":228385,"visible":true,"origin":"","legend":"\u003cp\u003e35 AQI monitoring stations in Beijing (China)\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-4658479/v1/2b4fe3c122c11aa9c99cc1b6.png"},{"id":60816183,"identity":"9b49c5fb-d8f2-4be9-98ed-f6f8c43b715b","added_by":"auto","created_at":"2024-07-22 11:57:40","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":79350,"visible":true,"origin":"","legend":"\u003cp\u003eTraining data (raw time 
series) are added with different SNRs from station 1.\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-4658479/v1/8c6eb60a99cc060746246c38.png"},{"id":60816187,"identity":"32feced0-58d6-4a5a-b0b6-0e9f2b7a7038","added_by":"auto","created_at":"2024-07-22 11:57:41","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":48763,"visible":true,"origin":"","legend":"\u003cp\u003eAQI prediction results of Station 1 with different SNRs time series and raw time series\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-4658479/v1/704c2160e557fcd2c1b3ed3b.png"},{"id":60816739,"identity":"82195c90-377a-41ed-9866-4d0eba691ed9","added_by":"auto","created_at":"2024-07-22 12:05:40","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":27839,"visible":true,"origin":"","legend":"\u003cp\u003eThe training MSE change with epoch on centralized modeling, FedAvg modeling and FedBiased modeling for monitoring station 1.\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-4658479/v1/f41ff1cba6a9d3a26c0a4e7c.png"},{"id":60817214,"identity":"b7be488b-b916-4ac9-8783-0e158cfa96b3","added_by":"auto","created_at":"2024-07-22 12:13:42","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1382795,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4658479/v1/fe1d3d43-f0ce-49b9-a9b6-665f369d43cc.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"A robust federated biased learning algorithm for time series forecasting ","fulltext":[{"header":"1. 
Introduction","content":"\u003cp\u003eFederated learning techniques are widely applied to many fields due to the blossom of IoT and intelligent sensors. In literature, the federated learning studies can be classified into two categories: one is applied for privacy related tasks and the other one is for tasks without considering privacy issues. This study will concentrate on the second one. No matter in which environment, one of the most important issues in federated learning is how to utilize information provided by local models to form global viewpoints and guide further update of local models. Federated averaging algorithm (FedAvg) [1-4], as a popular tool, introduces an average idea when sharing model parameters and has been verified very effective in many situations. However, sometimes the information from local models is not equal but with biases. Therefore, in this study, we take a thorough study on how to realize federated biased learning using a real case.\u003c/p\u003e\n\u003cp\u003eSince our environment is heavily affected by people activation and getting air quality information in advance may help people make proper decisions, air quality prediction issues seem more complex but important. To help provide more useful information and further improve prediction accuracy, an idea of mutual help among neighbors (neighbor observing stations or groups) is developed in this study which is in fact a type of federated learning.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eFedAvg algorithm simply averages all local model parameters and sends them to local sides to start a new training round. If the selection of local sides is not proper, there will be new disturbance decreasing the accuracy prediction. Therefore, we introduce the idea of federated biased learning which comprehensively considers the contributions of each local model instead of averaging them. It avoids the disturbance problem brought by spatial information. 
In literature, many researchers adopt spatial information as part of the data; however, whether this action is proper is pending especially from the mathematical perspective. Therefore, we try an incremental way on local sides and observe the performances of different groups. The results imply an interesting conclusion on the air quality prediction tasks.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAnother crucial issue in federated learning is that federated learning strategies don\u0026rsquo;t always show better results comparing with centralized learning. In this case, one has to look at the result of centralized learning when the prediction accuracy is the main requirement. If centralized learning methods win, does it mean that federated learning methods are useless for this special problem? To answer this question, we design a federated learning verification framework from the robustness perspective which becomes a more important evaluation criterion when developing models.\u003c/p\u003e\n\u003cp\u003eAbove all, the main contributions of this study are:\u003c/p\u003e\n\u003cp\u003e(1) We propose a new federated learning algorithm named federated biased learning (FedBiased) which is testified more effective than FedAvg when being applied to solve multi-station air quality prediction problems in Beijing. Here the novelty is that each local model\u0026rsquo;s specificity is utilized and each local model pursuing the highest accuracy is the objective.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e(2) We verify the strong robustness of federated learning strategies through proposing a comprehensive framework comprising three scenarios of disturbances. 
Both FedAvg and FedBiased algorithms as representatives are applied to the air quality prediction problem with raw data and noisy data.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e(3) A thorough experimental studies and comparisons are executed to reveal where federated learning performs better and where centralized learning performs better under the complex multi-station air quality prediction issue.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe main contents of this study are organized as follows. Section 2 introduces the background of federated learning and related techniques. Section 3 elaborates on the details of the proposed FedBiased method. Some experiments are done and discussed in Section 4. Finally, the conclusion is stated in Section 5.\u003c/p\u003e"},{"header":"2. Background","content":"\u003cp\u003eCollecting a plethora of data from different sensors is benefit for increasing prediction accuracy. Therefore, federated techniques in distributed learning become popular especially when all local data sets are collected for the same target. The client-server federated learning framework is currently the most commonly used mode in federated learning. In 2016, google proposed the classical algorithm Federated Averaging (FedAvg) [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e] which becomes the benchmark of this field for a long time. There are also many improved methods proposed based on FedAvg, such as FedProx [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e], Scaffold [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e], and FedNova [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. FedProx addresses non-IID data and device heterogeneity issues through a proximal term. Scaffold achieves higher communication efficiency by maintaining a control variate. 
FedNova employs second-order optimization methods and adaptive regularization techniques to improve the convergence speed and the performance. These methods require that all local data sets are Independent and Identically Distributed.\u003c/p\u003e \u003cp\u003eSometimes, traditional federated learning approaches are not effective when there are significant differences in the local data distribution. The distribution of Non-Independent and Identically Distributed (Non-IID) data in the client data causes the local model update to deviate from the global optimal, which seriously affects the performance of the training model [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. Therefore, the selection of proper learning algorithm depends on the distribution of local data sets.\u003c/p\u003e \u003cp\u003eIn this study, we only pay attention to temporal data (time series) that are recorded from different sensors (stations). It is a kind of sequential data and in general there is only one critical variable (univariable) directly connected with the task. Recent years witnesses that more and more spatial-temporal data come into existence due to the rapid development of computer and networking industry. Spatial-temporal data prediction is a central research issue in this field [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e], with significant applications in domains such as transportation [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e], meteorology [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e], and healthcare [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. 
The primary goal of spatial-temporal data prediction is to leverage historical spatial-temporal data to make informed judgments about the future development of such data, thereby aiding decision-making [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eDeep networks are very effective tools for sequential data prediction and even for spatial-temporal data prediction. The study in [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] proposed a network traffic prediction model based on dynamic graph attention spatial-temporal networks that effectively captures the complex features of network traffic, and [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e] introduced a novel spatial\u0026ndash;temporal gated attention transformer that achieved state-of-the-art performance in traffic flow forecasting. Zhao et al. [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] proposed predictive models that combine graph convolutional networks (GCNs) with Gated Recurrent Unit (GRU) recurrent neural networks; their model feeds city adjacency matrices and feature matrices into a GCN for comprehensive spatial modeling. Wang et al. [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e] encoded spatial adjacency relationships, temporal pattern similarities, and functional similarities between air quality stations in Beijing and Tianjin into a graph and proposed an attentive temporal graph convolutional network (ATGCN). Their results indicate that graph convolutional networks can aggregate features across stations, leading to superior prediction performance. Furthermore, Padhi et al. 
[\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e] leveraged Transformers to introduce a hierarchical BERT model for learning temporal sequences.\u003c/p\u003e \u003cp\u003eAlthough these deep network models accomplish prediction tasks from specific perspectives, deep networks are not always necessary, for example, when the amount of data is limited. One has to inspect the data set or run experiments to decide whether to adopt deep networks. In addition, proper use of spatial information may increase accuracy, whereas improper use may decrease it. Thus, how to utilize spatial information deserves careful attention and further exploration.\u003c/p\u003e"},{"header":"3. A Federated Biased Learning method","content":"\u003cp\u003eAs Peng et al. [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e] pointed out, averaging parameters may not be the optimal way to aggregate trained parameters. In this section, we introduce a new federated learning method named federated biased learning (FedBiased), which emphasizes the knowledge of individual models by taking the performance of each local model into account.\u003c/p\u003e \u003cp\u003eMost federated learning methods are designed for privacy-related tasks, in which the terms \u0026ldquo;clients\u0026rdquo; and \u0026ldquo;server\u0026rdquo; are commonly used. In this study, we focus on the communication of parameters among different models without considering privacy. 
Therefore, the term \u0026ldquo;local\u0026rdquo; is used to represent the client side and the term \u0026ldquo;global\u0026rdquo; to represent the server side.\u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e3.1 The basic framework of federated learning methods\u003c/h2\u003e \u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e shows a conventional federated learning process. All local models share the same architecture, and their parameters are stored in \u003cem\u003eW\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e, \u003cem\u003ei\u003c/em\u003e\u0026thinsp;=\u0026thinsp;1, 2, \u0026hellip;, \u003cem\u003ep\u003c/em\u003e. All local models are initialized with the same parameters, that is, \u003cem\u003eW\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e(0)\u0026thinsp;=\u0026thinsp;\u003cem\u003eW\u003c/em\u003e\u003csub\u003e0\u003c/sub\u003e. Each local model is first trained independently for a few epochs. At time t, each local model sends its parameters to the global model, which collects all of them and computes a \u0026ldquo;representative\u0026rdquo; matrix \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\stackrel{\\sim}{W}\\)\u003c/span\u003e\u003c/span\u003e. This matrix is sent back to each local model as its initial parameters at time t\u0026thinsp;+\u0026thinsp;1, and each local model is then trained independently again. This interaction repeats until a stopping condition is satisfied.\u003c/p\u003e \u003cp\u003eThe federated averaging algorithm is a popular method in this field and has been shown to be more effective than many other methods [\u003cspan additionalcitationids=\"CR23\" citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. It follows the basic learning process shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, and the idea of \u0026ldquo;averaging\u0026rdquo; is embodied in the way \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\stackrel{\\sim}{W}\\)\u003c/span\u003e\u003c/span\u003e is calculated, as given in formula (1).\u003cdiv id=\"Equ1\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$\\stackrel{\\sim}{\\varvec{W}}(t+1)=\\frac{\\sum _{i=1}^{p}{\\varvec{W}}_{i}\\left(t\\right)}{p}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\varvec{W}}_{i}\\left(t\\right)\\)\u003c/span\u003e\u003c/span\u003e represents the parameters of the \u003cem\u003ei\u003c/em\u003e-th local model at time t, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\stackrel{\\sim}{\\varvec{W}}\\)\u003c/span\u003e\u003c/span\u003e(t\u0026thinsp;+\u0026thinsp;1) denotes the matrix aggregated from all local models at time t. The federated averaging algorithm thus aggregates all local parameters by computing their element-wise mean. Take neural networks as an example. 
In the training process of each local model, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\varvec{W}}_{i}\\left(t\\right)\\)\u003c/span\u003e\u003c/span\u003e is updated by backpropagation to obtain \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\varvec{W}}_{i}(t+1)\\)\u003c/span\u003e\u003c/span\u003e, which is then sent to the global model. We assume that no information is lost during transmission.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e3.2 A biased way of aggregation\u003c/h2\u003e \u003cp\u003eConsider a realistic situation in which a local model or its local data source shows distinctive characteristics. In this case, the local parameters differ considerably, so their standard deviation around the mean is large, and a biased way of aggregating local parameters is expected to perform better than averaging. Therefore, we define a new federated learning algorithm that emphasizes the dominant knowledge of individual local models.\u003c/p\u003e \u003cp\u003eThe performance of each local model is evaluated by its accuracy (or error). We assume that a local model with higher accuracy provides more useful information for aggregation; compared with the federated averaging algorithm, such a model should therefore contribute more than an equal share. Accordingly, we assign each local model a weight based on its normalized accuracy. In this way, the aggregated weight matrix is not an average but a biased one.\u003c/p\u003e \u003cp\u003eIn FedBiased, all local models start with the same architecture and initial parameters. Each local model is trained independently on its own data for some time and then sends its parameters to the global model. The global model collects all local parameters and calculates a single biased parameter matrix according to formula (2). 
This process iterates over multiple rounds until convergence.\u003cdiv id=\"Equ2\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$$\\stackrel{\\sim}{\\varvec{W}}(t+1)=\\sum _{i=1}^{p}{\\varvec{W}}_{i}\\left(t\\right)\\times {r}_{i}\\left(t\\right)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ3\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ3\" name=\"EquationSource\"\u003e\n$${r}_{i}\\left(t\\right)=\\frac{1-\\frac{{error}_{i}\\left(t\\right)}{\\sum _{k=1}^{p}{error}_{k}\\left(t\\right)}}{\\sum _{j=1}^{p}\\left(1-\\frac{{error}_{j}\\left(t\\right)}{\\sum _{k=1}^{p}{error}_{k}\\left(t\\right)}\\right)}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e3\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\varvec{W}}_{i}\\left(t\\right)\\)\u003c/span\u003e\u003c/span\u003e represents the parameters of the \u003cem\u003ei\u003c/em\u003e-th local model at time t, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\stackrel{\\sim}{\\varvec{W}}\\)\u003c/span\u003e\u003c/span\u003e(t\u0026thinsp;+\u0026thinsp;1) denotes the matrix aggregated from all local models at time t. \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({error}_{i}\\left(t\\right)\\)\u003c/span\u003e\u003c/span\u003e in formula (3) is the error (for instance, the MSE) of the \u003cem\u003ei\u003c/em\u003e-th local model at time t. The weights \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({r}_{i}\\left(t\\right)\\)\u003c/span\u003e\u003c/span\u003e sum to 1, and a smaller error yields a larger weight. Updating the global weight matrix with formula (2) therefore explicitly accounts for the contribution of each individual model. 
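As a minimal sketch of formulas (1) to (3), the Python functions below contrast the FedAvg and FedBiased aggregation rules. They assume that each local model's parameters have been flattened into a plain list of floats, and the names `fedavg`, `fedbiased_ratios` and `fedbiased` are hypothetical, not taken from the authors' code. Because the denominator of formula (3) always equals p - 1, the weights sum to 1 and at least two local models are needed.

```python
from typing import List, Sequence

# Hypothetical representation: the parameters W_i(t) of one local model, flattened into a list.
Params = List[float]


def fedavg(local_params: Sequence[Params]) -> Params:
    """Formula (1): element-wise mean of the p local parameter vectors."""
    p = len(local_params)
    return [sum(values) / p for values in zip(*local_params)]


def fedbiased_ratios(errors: Sequence[float]) -> List[float]:
    """Formula (3): r_i = (1 - e_i / sum_k e_k) / sum_j (1 - e_j / sum_k e_k).

    Errors are assumed to be non-negative (e.g. MSE values).
    """
    if len(errors) < 2:
        raise ValueError("formula (3) needs at least two local models")
    total = sum(errors)
    if total <= 0:
        raise ValueError("the sum of the local errors must be positive")
    scores = [1.0 - e / total for e in errors]  # smaller error -> larger score
    norm = sum(scores)  # always equals p - 1
    return [s / norm for s in scores]


def fedbiased(local_params: Sequence[Params], errors: Sequence[float]) -> Params:
    """Formula (2): sum of the local parameters weighted by r_i(t)."""
    ratios = fedbiased_ratios(errors)
    return [sum(r * w for r, w in zip(ratios, values)) for values in zip(*local_params)]


if __name__ == "__main__":
    params = [[1.0, 2.0], [3.0, 4.0]]  # two toy local models
    errors = [1.0, 3.0]                # e.g. their MSEs
    print(fedavg(params))              # [2.0, 3.0]
    print(fedbiased_ratios(errors))    # [0.75, 0.25]
    print(fedbiased(params, errors))   # [1.5, 2.5]
```

For example, with two local models whose errors are 1.0 and 3.0, formula (3) gives weights 0.75 and 0.25, so the aggregated parameters lean toward the more accurate model, whereas FedAvg would weight both models by 0.5.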
Note that after receiving new parameters, each local model has to be trained independently for \u0026ldquo;some time\u0026rdquo;, that is, for several epochs. All local models use the same number of epochs, which ensures that they always remain at the same training stage. An open issue is how to choose this number; in this study, we use a number smaller than 10, and its effect is discussed experimentally in Section 4.\u003c/p\u003e \u003cp\u003eThe biased federated learning method has two main advantages:\u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eIt fully accounts for the individual contribution of each local model during aggregation.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eIt dynamically adjusts the weight of each local model by computing that model\u0026rsquo;s accuracy in every round. The weight of each local model can be regarded as a learning rate, or ratio, of that model.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003eThe learning process of FedBiased is the same as that of FedAvg (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e); the only difference is how the global parameter matrix is updated. 
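The following self-contained Python sketch illustrates the round structure that Algorithm 1 formalizes: every station starts from the shared global weights, trains locally for E epochs, reports an error, and the server aggregates. It is not the authors' implementation. Assumptions: the local "model" is a hypothetical one-parameter regressor y ≈ w·x trained by per-sample gradient descent (standing in for the shallow network), the error is the MSE on the station's own training data (as in line 8 of Algorithm 1), and the names `local_train`, `local_mse`, `biased_ratios` and `federated_train` are illustrative only. Setting `biased=False` reduces the aggregation to FedAvg, mirroring the statement that only the update of the global parameters differs.

```python
from typing import List, Sequence, Tuple

Data = List[Tuple[float, float]]  # (x, y) samples held by one local station


def local_train(w: float, data: Data, epochs: int, lr: float = 0.01) -> float:
    """Hypothetical local model y ~ w * x, trained by per-sample gradient descent on squared error."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2.0 * (w * x - y) * x
    return w


def local_mse(w: float, data: Data) -> float:
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def biased_ratios(errors: Sequence[float]) -> List[float]:
    """Algorithm 1, lines 11-12: Ratio = ToSumOne(1 - ToSumOne(error))."""
    total = sum(errors)
    if len(errors) < 2 or total == 0:
        # assumption (not in the paper): fall back to uniform weights when formula (3) is undefined
        return [1.0 / len(errors)] * len(errors)
    scores = [1.0 - e / total for e in errors]
    norm = sum(scores)
    return [s / norm for s in scores]


def federated_train(w0: float, loaders: Sequence[Data], rounds: int, epochs: int,
                    biased: bool = True) -> float:
    """Outer loop of Algorithm 1; biased=False gives FedAvg (formula (1))."""
    w_global = w0
    for _ in range(rounds):
        local_weights, errors = [], []
        for data in loaders:
            w_i = local_train(w_global, data, epochs)  # every station starts from the global weights
            local_weights.append(w_i)
            errors.append(local_mse(w_i, data))        # error measured on the local training data
        if biased:
            ratios = biased_ratios(errors)
        else:
            ratios = [1.0 / len(loaders)] * len(loaders)
        w_global = sum(r * w for r, w in zip(ratios, local_weights))
    return w_global


if __name__ == "__main__":
    xs = [k / 10 for k in range(1, 11)]
    clean = [(x, 2.0 * x) for x in xs]  # station whose data follow y = 2x exactly
    noisy = [(x, 2.0 * x + (0.5 if k % 2 == 0 else -0.5)) for k, x in enumerate(xs)]  # perturbed station
    for flag in (False, True):
        w = federated_train(0.0, [clean, noisy], rounds=10, epochs=10, biased=flag)
        print("FedBiased" if flag else "FedAvg", round(w, 3))
```

In this toy run, one station follows y = 2x exactly and the other carries alternating ±0.5 perturbations; FedBiased ends closer to the clean slope of 2 than FedAvg because the noisy station's larger error shrinks its weight. This only illustrates the weighting mechanism and is not evidence for the experimental claims of this study.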
To clarify the new method, we summarize it in pseudocode:\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Taba\" border=\"1\"\u003e \u003ccolgroup cols=\"1\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAlgorithm 1 Federated Biased Learning Method\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eRequire\u003c/b\u003e:\u003c/p\u003e \u003cp\u003e\u003cb\u003eW\u003c/b\u003e // Initial parameters, shared by all local models (which have the same architecture)\u003c/p\u003e \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(E\\)\u003c/span\u003e\u003c/span\u003e // Number of local training epochs per round\u003c/p\u003e \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\mathrm{train\\_loader}\\left[i\\right], \\mathrm{test\\_loader}\\left[i\\right]\\)\u003c/span\u003e\u003c/span\u003e // Training and testing data loaders of each local model\u003c/p\u003e \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(N\\)\u003c/span\u003e\u003c/span\u003e // Number of local models\u003c/p\u003e \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(R\\)\u003c/span\u003e\u003c/span\u003e // Number of communication rounds\u003c/p\u003e \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(Ratio\\left(r\\right)\\)\u003c/span\u003e\u003c/span\u003e // Aggregation ratios (regarded as learning rates) of the local models in round 
\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\stackrel{\\sim}{W}}^{\\prime}\\left(r\\right)\\)\u003c/span\u003e\u003c/span\u003e // The matrix aggregated from all local models in round \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r-1\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eEnsure\u003c/b\u003e:\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eThe updated global model weights: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({W}^{\\prime}\\left(R\\right)\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e1: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r\\leftarrow 0;\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e2: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({W}^{\\prime}\\left(r\\right)\\leftarrow \\varvec{W};\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e3: \u003cb\u003ewhile\u003c/b\u003e \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r\u0026lt;R\\)\u003c/span\u003e\u003c/span\u003e \u003cb\u003edo\u003c/b\u003e\u003c/p\u003e \u003cp\u003e4: Load \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({W}^{\\prime}\\left(r\\right)\\)\u003c/span\u003e\u003c/span\u003e into all local models;\u003c/p\u003e \u003cp\u003e5: \u003cb\u003efor\u003c/b\u003e \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(i=0\\)\u003c/span\u003e\u003c/span\u003e to \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(N-1\\)\u003c/span\u003e\u003c/span\u003e \u003cb\u003edo\u003c/b\u003e\u003c/p\u003e 
\u003cp\u003e6: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({model}_{i}\\leftarrow \\mathrm{train}\\left({model}_{i},\\mathrm{train\\_loader}\\left[i\\right],E\\right);\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e7: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({W}_{i}^{\\prime}\\left(r\\right)\\leftarrow \\mathrm{get\\_weights}\\left({model}_{i}\\right);\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e8: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({error}_{i}\\leftarrow \\mathrm{test}\\left({model}_{i},\\mathrm{train\\_loader}\\left[i\\right]\\right);\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e9: \u003cb\u003eend for\u003c/b\u003e\u003c/p\u003e \u003cp\u003e10: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(error\\left(r\\right)\\leftarrow \\left[{error}_{0},{error}_{1},\\dots ,{error}_{N-1}\\right];\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e11: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(Result\\left(r\\right)\\leftarrow \\mathrm{ToSumOne}\\left(error\\left(r\\right)\\right);\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e12: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(Ratio\\left(r\\right)\\leftarrow \\mathrm{ToSumOne}\\left(1-Result\\left(r\\right)\\right);\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e 
\u003cp\u003e13: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r\\leftarrow r+1;\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e14: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\stackrel{\\sim}{W}}^{\\prime}\\left(r\\right)\\leftarrow \\sum _{i=0}^{N-1}{W}_{i}^{\\prime}\\left(r-1\\right)\\times {Ratio}_{i}\\left(r-1\\right);\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e15: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({W}^{\\prime}\\left(r\\right)\\leftarrow {\\stackrel{\\sim}{W}}^{\\prime}\\left(r\\right);\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e16: \u003cb\u003eend while\u003c/b\u003e\u003c/p\u003e \u003cp\u003e17: \u003cb\u003ereturn\u003c/b\u003e \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({W}^{\\prime}\\left(R\\right)\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e3.3 Robustness analysis of federated biased learning\u003c/h2\u003e \u003cp\u003eWhen using federated learning strategies, one has to note that federated learning methods do not always perform better than centralized learning methods. There are two reasons for this. One is that more information may bring more disturbance or uncertainty, in which case the federated learning result may be worse than the centralized learning result. The other is that some models, such as modern deep networks, already achieve remarkably high accuracy, so federated learning brings no obvious improvement. Although this concern always exists, we find an interesting fact: federated learning methods show better results than centralized learning when the time series contain some noise. 
In other words, federated learning methods appear to reduce the impact of uncertainties and show strong robustness during learning. To verify this claim, we define three noisy-data scenarios in this section and report the comparison results in Section 4.\u003c/p\u003e \u003cp\u003eUnless otherwise specified, the public time series data sets are assumed to be clean. To simulate noise, a noise model is required, and Gaussian noise is the most commonly used one [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]. Gaussian white noise follows a Gaussian (normal) distribution and is uncorrelated over time, so it can simulate the random fluctuations and uncertainties that may exist in actual data [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eTo add Gaussian white noise to time series data, one can use the following formulas:\u003cdiv id=\"Equ4\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ4\" name=\"EquationSource\"\u003e\n$${X}_{i,\\text{noise}}={X}_{i}+N\\left(0,1\\right)\\times \\sqrt{{P}_{n}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e4\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ5\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ5\" name=\"EquationSource\"\u003e\n$${P}_{n}=\\frac{{P}_{s}}{{10}^{(SNR/10)}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e5\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ6\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ6\" name=\"EquationSource\"\u003e\n$${P}_{s}=\\frac{\\sum _{i}{\\left|{X}_{i}\\right|}^{2}}{\\text{len}\\left(\\text{sequence}\\right)}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e6\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere 
\u003cem\u003eX\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e is the raw data, SNR is the signal-to-noise ratio in decibels, and \u003cem\u003eN\u003c/em\u003e(0,1) is a Gaussian random variable with mean 0 and standard deviation 1. \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({P}_{s}\\)\u003c/span\u003e\u003c/span\u003e is the signal power, calculated by dividing the sum of squares of the sequence by the sequence length, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({P}_{n}\\)\u003c/span\u003e\u003c/span\u003e is the noise power, which is jointly determined by the signal power \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({P}_{s}\\)\u003c/span\u003e\u003c/span\u003e and the SNR. A smaller SNR yields a larger noise power, so the intensity of the noise can be controlled by adjusting the SNR.\u003c/p\u003e \u003cp\u003eTo account for possible uncertainties that occur when sampling time series from sensors, we set up three cases:\u003c/p\u003e \u003cp\u003e \u003cstrong\u003eCase 1\u003c/strong\u003e \u003cp\u003enoise is added to the entire time series;\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eCase 2\u003c/strong\u003e \u003cp\u003enoise is added to the first half of the time series;\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eCase 3\u003c/strong\u003e \u003cp\u003enoise is added to the latter half of the time series.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eCase 1\u003c/strong\u003e \u003cp\u003eis used to simulate a sensor that does not function normally because of aging. Case \u003cspan refid=\"FPar2\" class=\"InternalRef\"\u003e2\u003c/span\u003e represents the scenario in which abnormal sensor behavior is detected after some time, so that only the first half of the time series contains Gaussian noise. 
Case \u003cspan refid=\"FPar3\" class=\"InternalRef\"\u003e3\u003c/span\u003e mimics the scenario in which the sensor stops functioning normally after some time and this is neither detected nor corrected. Therefore, the noisy latter half together with the clean first half of the time series is used to construct a prediction model.\u003c/p\u003e \u003c/p\u003e \u003cp\u003eA further case, in which only a few data points in a time series are noisy, shows similar results when federated learning is used to build prediction models; therefore, we do not report it in this study.\u003c/p\u003e \u003cp\u003eIn federated learning, since the data are stored separately at each location, the uncertainty and noise of actual data can be simulated by adding noise to each local station individually. In other words, the parameters of the Gaussian white noise may differ across locations, although the same parameter values can also be used to simplify the implementation.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e3.4 Analysis of spatial information\u003c/h2\u003e \u003cp\u003eRecent studies on using both spatial and temporal information to construct time series prediction models in federated learning fall into two categories: one uses spatial and temporal information separately, and the other uses both in a single model. In the first category, stations are grouped using clustering or other strategies, and the models are then trained on temporal data only. The second category uses spatial and temporal information in one model. An issue is that spatial and temporal information are of two different types with a sequential relationship between them. Therefore, until effective mechanisms for connecting them are proposed, using the two kinds of information separately is preferable. 
However, there are no comprehensive studies revealing how to use spatial information effectively and independently when building time series prediction models.\u003c/p\u003e \u003cp\u003eIn this paper, we aim to increase the prediction accuracy of each local model. With this in mind, each local station in turn takes the leading role and is grouped with its k nearest neighbors for federated learning. Geographical location and elevation are the main characteristics used when analyzing climate data, so the k neighbors of a given station are selected by the Euclidean distance computed on converted longitude, latitude and elevation values.\u003c/p\u003e \u003c/div\u003e"},{"header":"4. Experiments","content":"\u003cp\u003eIn this section, we conduct simulations to evaluate the performance of the proposed federated biased learning algorithm and compare it with other methods. First, we introduce a real spatial-temporal time series case: air quality data from 35 monitoring stations in Beijing (China). Then, we present model selection and parameter settings. Finally, we analyze the performance of the proposed algorithm from several aspects.\u003c/p\u003e\n\u003cdiv id=\"Sec8\"\u003e\n \u003ch2\u003e4.1. Dataset\u003c/h2\u003e\n \u003cp\u003eAir quality index (AQI) forecasting has attracted increasing attention since air quality strongly affects people\u0026rsquo;s daily life and health. In this study, we collect air quality data from 35 monitoring stations in Beijing (China) and comprehensively study the effect of federated learning strategies from both temporal and spatial perspectives. 
Each station provides a univariate time series recorded hourly.\u003c/p\u003e\n \u003cp\u003eWe use the time series of the year 2022, downloaded from the Beijing Environmental Protection Testing Center and other related departments (\u003cspan\u003e\u003cspan\u003ehttp://www.bjmemc.com.cn/\u003c/span\u003e\u003c/span\u003e). The factors that may affect the prediction results include the number of local stations, the federated learning method, the locations (elevations) of the monitoring stations, and so on. We explore the effect of each possible factor by setting different experimental parameters and analyzing the corresponding results. The 35 monitoring stations are displayed in Fig.\u0026nbsp;\u003cspan\u003e1\u003c/span\u003e, where different colors represent different elevations.\u003c/p\u003e\n \u003cp\u003eEach station provides one air quality series, which is used to train a prediction model both individually and collaboratively with a federated learning strategy. In the federated scenario, all stations are first organized into several groups, and within each group every station acts as a local client. As a starting point, we select the two nearest stations to perform federated learning and compare the performance with that of individual modeling. Distances are calculated from latitude and longitude.\u003c/p\u003e\n \u003cp\u003eTo observe the prediction results of FedAvg under different noise levels, four groups of noisy time series were used in the experiments, and the raw time series were processed in the same way for comparison. The noisy time series are created with four noise levels, namely SNR values of 10, 20, 30 and 40. Figure\u0026nbsp;\u003cspan\u003e4\u003c/span\u003e illustrates the four noisy time series and the raw time series. 
To show the trend clearly, a 10-hour period is enlarged in the inset of the figure.\u003c/p\u003e\n \u003cp\u003eThe noisy time series with SNR 10 fluctuates dramatically, whereas the one with SNR 40 deviates least from the raw series. It can therefore be expected that the noisy time series with SNR 40 will be the easiest one to predict. However, whether federated learning outperforms individual modeling on noisy time series, and how it performs at different noise levels, requires further study.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec9\"\u003e\n \u003ch2\u003e4.2. Evaluation metrics\u003c/h2\u003e\n \u003cp\u003eAll experiments were run on a computer with an Intel Core i7 processor and 16 GB of RAM running the Windows 10 operating system. The programming environment was Visual Studio Code, and the experiments were coded in Python 3.9. The PyTorch framework was employed for deep learning tasks.\u003c/p\u003e\n \u003cp\u003eTo evaluate the experimental results, we use two common indexes: mean absolute error (MAE) and mean squared error (MSE). MAE is the average absolute difference between the predicted and true values and reflects the accuracy of the prediction results: the smaller the MAE, the more accurate the predictions. MSE is the average of the squared errors between the predicted and true values and reflects the overall error of the prediction results: the smaller the MSE, the smaller the overall error. In addition, the index of agreement (IA) is used to indicate the similarity between the distributions of the observed and forecast values. The IA ranges over [0, 1]. 
The closer the IA is to 1, the higher the consistency between the two distributions.\u003c/p\u003e\n \u003cp\u003eBecause time series data are temporally correlated, a single evaluation index may not assess model performance comprehensively; therefore, all three indices, MAE, MSE and IA, are used here.\u003c/p\u003e\n \u003cdiv id=\"Equ7\"\u003e\n \u003cdiv id=\"FileID_Equ7\" name=\"EquationSource\"\u003e$$\\text{MSE}=\\frac{1}{n}\\sum _{i=1}^{n} {\\left({y}_{i}-{\\widehat{y}}_{i}\\right)}^{2}$$\u003c/div\u003e\n \u003cdiv\u003e7\u003c/div\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Equ8\"\u003e\n \u003cdiv id=\"FileID_Equ8\" name=\"EquationSource\"\u003e$$\\text{MAE}=\\frac{1}{n}\\sum _{i=1}^{n} \\left|{y}_{i}-{\\widehat{y}}_{i}\\right|$$\u003c/div\u003e\n \u003cdiv\u003e8\u003c/div\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Equ9\"\u003e\n \u003cdiv id=\"FileID_Equ9\" name=\"EquationSource\"\u003e$$\\text{IA}=1-\\frac{\\sum _{i=1}^{n} {\\left({y}_{i}-{\\widehat{y}}_{i}\\right)}^{2}}{\\sum _{i=1}^{n} {\\left(\\left|{y}_{i}-\\overline{y}\\right|+\\left|{\\widehat{y}}_{i}-\\overline{y}\\right|\\right)}^{2}}$$\u003c/div\u003e\n \u003cdiv\u003e9\u003c/div\u003e\n \u003c/div\u003e\n \u003cp\u003ewhere \u003cem\u003ey\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e represents the \u003cem\u003ei\u003c/em\u003e-th real output, \u003cspan\u003e\u003cspan\u003e\\({\\widehat{y}}_{i}\\)\u003c/span\u003e\u003c/span\u003e represents the \u003cem\u003ei\u003c/em\u003e-th predicted result, \u003cem\u003en\u003c/em\u003e is the number of samples, and \u003cspan\u003e\u003cspan\u003e\\(\\overline{y}\\)\u003c/span\u003e\u003c/span\u003e is the mean of the real outputs.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec10\"\u003e\n \u003ch2\u003e4.3. Local model selection and parameter settings\u003c/h2\u003e\n \u003cp\u003eThe sliding window method is adopted here to reorganize the time series data. The number of input neurons of each network equals the length of the sliding window. 
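The data preparation and evaluation steps can be sketched in plain Python as follows, assuming the hourly AQI series is available as a list of floats. `sliding_windows` builds the (24 inputs, 1 output) pairs used in the following experiments, and `mse`, `mae` and `index_of_agreement` implement formulas (7), (8) and (9). All function names are hypothetical and not taken from the authors' code.

```python
from typing import List, Sequence, Tuple


def sliding_windows(series: Sequence[float], window: int = 24,
                    horizon: int = 1) -> List[Tuple[List[float], List[float]]]:
    """Split a univariate series into (input, target) pairs: `window` past values -> next `horizon` values."""
    pairs = []
    for start in range(len(series) - window - horizon + 1):
        inputs = list(series[start:start + window])
        target = list(series[start + window:start + window + horizon])
        pairs.append((inputs, target))
    return pairs


def mse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Formula (7): mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Formula (8): mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def index_of_agreement(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Formula (9); y_bar is the mean of the observed values y."""
    y_bar = sum(y) / len(y)
    num = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    den = sum((abs(a - y_bar) + abs(b - y_bar)) ** 2 for a, b in zip(y, y_hat))
    # den == 0 only when every observed and predicted value equals y_bar, i.e. a perfect fit
    return 1.0 if den == 0 else 1.0 - num / den


if __name__ == "__main__":
    hourly = [float(h) for h in range(30)]  # toy stand-in for an hourly AQI series
    pairs = sliding_windows(hourly)         # 30 - 24 - 1 + 1 = 6 pairs
    print(len(pairs), pairs[0][1])          # 6 [24.0]
    y, y_hat = [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]
    print(mse(y, y_hat), mae(y, y_hat), index_of_agreement(y, y_hat))
```

With a toy series of 30 hourly values, a 24-value window and a horizon of 1 yield 30 - 24 - 1 + 1 = 6 training pairs. The guard in `index_of_agreement` only matters when every observed and predicted value equals the observed mean, in which case the prediction is perfect and IA is set to 1.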
In the following experiments, the sliding window length (input dimension) is set to 24, which experts can adjust as needed, and the output dimension is set to 1. For instance, the AQI values of the first 24 hours serve as inputs, and the AQI value of the 25th hour is the output. The learning rate of the network is set to 0.005. The training set comprises 2172 hourly AQI values from the first quarter of 2022, and the test set covers 40 hours of AQI. The batch size is 1 and the number of training epochs is 100.\u003c/p\u003e\n \u003cp\u003eAfter analyzing the AQI data, we chose a shallow neural network with one input layer, one hidden layer of 10 neurons and one output layer. Shallow networks train faster than deep networks such as the long short-term memory network (LSTM) while remaining effective. To verify this, we compared a shallow network and a deep network on the same time series.\u003c/p\u003e\n \u003cp\u003eTable\u0026nbsp;\u003cspan\u003e1\u003c/span\u003e compares the shallow back-propagation neural network (BPNN) with LSTM in terms of error and running time. Both networks were tested under the same conditions, and each experiment was repeated five times. BPNN achieves a lower MSE, and its running time is roughly half that of LSTM. Since the deep network brings no noticeable improvement, we use shallow networks as the basic modelling units.\u003c/p\u003e\n \u003cp\u003e\u003cimg src=\"https://myfiles.space/user_files/122228_c8a1650c59388082/122228_custom_files/img1721648974.png\"\u003e\u003cbr\u003e\u003c/p\u003e\n \u003cdiv\u003e\n \u003c/div\u003e\n \u003cp\u003eIn the federated learning process, each local model is first trained independently until a given criterion is met and then sends its parameters to a central server, which aggregates them into global parameters. 
These global parameters are then sent back to each local client, and the process is repeated until a predetermined convergence condition is reached. Here the number of communication rounds r (the outer loop) is set to 10 and each client is trained locally for 10 epochs per round, giving 100 training iterations in total.\u003c/p\u003e\n \u003cp\u003eTo examine how well shallow neural networks predict the raw and the noisy time series, Fig.\u0026nbsp;\u003cspan\u003e5\u003c/span\u003e shows the test results together with the raw time series. The predictions for SNR 10 deviate most from the raw values, whereas the predictions for the other noise levels stay closer to the true values. In short, the prediction models work well only when the noise remains within a reasonable range, which agrees with intuition. The predicted values also lag slightly behind the true values. Based on these results, noisy time series with SNR 40 are used in the disturbance experiments.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec11\"\u003e\n \u003ch2\u003e4.4. The performances of FedBiased vs FedAvg\u003c/h2\u003e\n \u003cp\u003eTo verify the effectiveness of the proposed FedBiased method, we compare it with the popular FedAvg from three perspectives: (1) the trend of the entire convergence process; (2) the performances of the federated learning methods on different station groups; (3) statistical testing.\u003c/p\u003e\n \u003cdiv id=\"Sec12\"\u003e\n \u003ch2\u003e4.4.1. The trend of the entire convergence process\u003c/h2\u003e\n \u003cp\u003eFigure\u0026nbsp;\u003cspan\u003e6\u003c/span\u003e shows how the MSE of centralized modeling, FedAvg and FedBiased evolves with the training epoch for monitoring station 1. 
The three curves are very close; the magnified inset shows that FedBiased achieves the lowest MSE. The federated group consists of stations 1, 6 and 2.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec13\"\u003e\n \u003ch2\u003e4.4.2. The performances of federated learning methods under different groups\u003c/h2\u003e\n \u003cp\u003eAs shown in Fig.\u0026nbsp;\u003cspan\u003e3\u003c/span\u003e, stations 16 and 17 are located at a higher elevation than the other stations. To make sure that the federated learning methods work under different environments and settings, we also run the experiments on the time series from these two stations, together with the time series from station 15, the station nearest to station 17. Tables\u0026nbsp;\u003cspan\u003e2\u003c/span\u003e, \u003cspan\u003e3\u003c/span\u003e, \u003cspan\u003e4\u003c/span\u003e and \u003cspan\u003e5\u003c/span\u003e report the MSE and MAE values of the three methods (FedBiased, FedAvg and centralized modeling) for two groups: (1) stations 15, 16 and 17; (2) stations 1, 6 and 2.\u003c/p\u003e\n \u003cdiv\u003e\n \u003ctable id=\"Tab2\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 2\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eMSE and standard deviation for stations 15, 16 and 17 using FedAvg, FedBiased and Centralized algorithms\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"4\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eStation\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eFedAvg\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eFedBiased\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eCentralized\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd 
align=\"left\"\u003e\n \u003cp\u003e15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000253\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;9.875E-08\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000245\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;2.515E-08\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.000243\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;7.664E-07\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e16\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000517\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;1.29E-06\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.000508\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;1.265E-07\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000530\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;4.586E-07\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e17\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000358\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;1.769E-06\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.000348\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;7.124E-08\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000350\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;1.577E-07\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cp\u003eTable\u0026nbsp;\u003cspan\u003e2\u003c/span\u003e reveals that sometimes federated learning algorithms show 
better results than centralized learning and sometimes worse. In this group, FedBiased gives the lowest MSE on stations 16 and 17, whereas centralized learning is best on station 15. A possible reason is that stations 16 and 17 are close to each other and share a higher elevation, whereas station 15 lies farther away from both. In every case, FedBiased achieves a lower MSE than FedAvg.\u003c/p\u003e\n \u003cdiv\u003e\n \u003ctable id=\"Tab3\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 3\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eMSE and standard deviation for stations 1, 6 and 2 using FedAvg, FedBiased and Centralized algorithms\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"4\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eStation\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eFedAvg\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eFedBiased\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eCentralized\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000146\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;4.41E-06\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000137\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;4.20E-08\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.000135\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026plusmn;\u0026thinsp;9.76E-07\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e6\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000844\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;3.02E-06\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000835\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;3.18E-07\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.000826\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026plusmn;\u0026thinsp;2.13E-06\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000235\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;3.09E-06\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.000221\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026plusmn;\u0026thinsp;8.05E-07\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000256\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;3.04E-06\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cp\u003eTable\u0026nbsp;\u003cspan\u003e3\u003c/span\u003e displays the results of stations 1,6,2. In this case, centralized learning shows better results on two stations and FedBiased performs best on the last station. 
No consistent pattern emerges between federated and centralized learning; however, FedBiased achieves a lower MSE than FedAvg on all three stations.\u003c/p\u003e\n \u003cdiv\u003e\n \u003ctable id=\"Tab4\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 4\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eMAE and standard deviation for stations 15, 16 and 17 using FedAvg, FedBiased and Centralized algorithms\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"4\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eStation\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eFedAvg\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eFedBiased\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eCentralized\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.010\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;1.03E-05\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.0101\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;4.14E-05\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.01\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026plusmn;\u0026thinsp;1.69E-05\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e16\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.0154\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;4.45E-05\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.0151\u003c/strong\u003e\u003c/p\u003e\n 
\u003cp\u003e\u003cstrong\u003e\u0026plusmn;\u0026thinsp;4.52E-05\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.0158\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;3.45E-05\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e17\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.0143\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;8.61E-06\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.0142\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026plusmn;\u0026thinsp;1.50E-05\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.0144\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;1.11E-05\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cp\u003eTable\u0026nbsp;\u003cspan\u003e4\u003c/span\u003e lists the MAE values, which lead to the same conclusion as Table\u0026nbsp;\u003cspan\u003e2\u003c/span\u003e: FedBiased performs best on stations 16 and 17, and centralized learning performs best on station 15.\u003c/p\u003e\n \u003cdiv\u003e\n \u003ctable id=\"Tab5\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 5\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eMAE and standard deviation for stations 1, 6 and 2 using FedAvg, FedBiased and Centralized algorithms\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"4\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eStation\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eFedAvg\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eFedBiased\u003c/p\u003e\n \u003c/th\u003e\n \u003cth 
align=\"left\"\u003e\n \u003cp\u003eCentralized\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.00606\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;3.805E-05\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.0060\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;6.090E-05\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.005\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026plusmn;\u0026thinsp;2.102E-05\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.0163\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;3.461E-05\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.0160\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;5.990E-05\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.0160\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026plusmn;\u0026thinsp;2.766E-05\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.0116\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;0.000170\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.0112\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026plusmn;\u0026thinsp;0.000178\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.0119\u003c/p\u003e\n 
\u003cp\u003e\u0026plusmn;\u0026thinsp;0.00012\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cp\u003eTable\u0026nbsp;\u003cspan\u003e5\u003c/span\u003e lists the MAE values, which lead to the same conclusion as Table\u0026nbsp;\u003cspan\u003e3\u003c/span\u003e: centralized learning performs best on stations 1 and 6 (on station 6 it ties with FedBiased at the reported precision), and FedBiased performs best on station 2.\u003c/p\u003e\n \u003cp\u003eRegardless of which group is used, FedBiased achieves a lower MSE than FedAvg, and its MAE is lower than or comparable to that of FedAvg. However, centralized learning is sometimes preferable, since it can give better results with higher efficiency.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec14\"\u003e\n \u003ch2\u003e4.4.3. Statistical testing\u003c/h2\u003e\n \u003cp\u003eWe apply a two-sample t-test to determine whether the difference between the results of FedBiased and FedAvg is statistically significant. The null hypothesis (H0) is \u0026ldquo;The MSE of FedBiased is not different from that of FedAvg\u0026rdquo; and the alternative hypothesis (H1) is \u0026ldquo;The MSE of FedBiased is different from that of FedAvg\u0026rdquo;. The MSE values obtained by each method serve as the samples from which the t-statistic of each station is computed, at a significance level of \u0026alpha;\u0026thinsp;=\u0026thinsp;0.05. 
The t-statistic is defined in formula (10).\u003c/p\u003e\n \u003cdiv id=\"Equ10\"\u003e\n \u003cdiv id=\"FileID_Equ10\" name=\"EquationSource\"\u003e$$t=\\frac{{\\overline{X}}_{1}-{\\overline{X}}_{2}}{\\sqrt{\\frac{{s}_{1}^{2}}{{n}_{1}}+\\frac{{s}_{2}^{2}}{{n}_{2}}}}$$\u003c/div\u003e\n \u003cdiv\u003e10\u003c/div\u003e\n \u003c/div\u003e\n \u003cp\u003ewhere \u003cspan\u003e\u003cspan\u003e\\({\\overline{X}}_{k}\\)\u003c/span\u003e\u003c/span\u003e, \u003cspan\u003e\u003cspan\u003e\\({s}_{k}^{2}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan\u003e\u003cspan\u003e\\({n}_{k}\\)\u003c/span\u003e\u003c/span\u003e are the sample mean, sample variance and sample size of the \u003cem\u003ek\u003c/em\u003e-th sample (\u003cem\u003ek\u003c/em\u003e\u0026thinsp;=\u0026thinsp;1, 2).\u003c/p\u003e\n \u003cp\u003eWe compare the MSEs of FedBiased and FedAvg for the group of stations 1, 6 and 2. The t-statistic is 6.453 for station 1 (p-value 0.000112), 9.372 for station 6 (p-value 6.123E-06) and 13.865 for station 2 (p-value 2.231E-07). The p-values for all three stations are below the significance level of 0.05, so the null hypothesis is rejected: the MSE of FedBiased differs significantly from that of FedAvg. Since FedBiased also has the lower mean MSE on all three stations (Table\u0026nbsp;\u003cspan\u003e3\u003c/span\u003e), its predictions are significantly better than those of FedAvg.\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec15\"\u003e\n \u003ch2\u003e4.5. The robustness analysis of federated learning techniques\u003c/h2\u003e\n \u003cp\u003eIn the literature, when privacy and data security are not a concern, centralized learning methods often perform better than federated learning algorithms [\u003cspan\u003e27\u003c/span\u003e]. Does this mean that federated learning algorithms are unnecessary? Or, in which environments do federated learning algorithms show clear advantages? To answer these questions, we simulate the three realistic noise scenarios for time series data introduced in Section 3. 
The results are organized in three parts: (1) the performances of federated learning algorithms on noisy time series (only one time series is noisy); (2) the performances of federated learning algorithms under three noise cases; (3) the performances of federated learning algorithms with an increasing number of stations.\u003c/p\u003e\n \u003cdiv id=\"Sec16\"\u003e\n \u003ch2\u003e4.5.1. The performances of federated learning algorithms on noisy time series\u003c/h2\u003e\n \u003cp\u003eRobustness has become an increasingly important property in recent years. In this study, we therefore verify the effectiveness of federated learning algorithms on noisy time series (air quality data). The raw time series are assumed to be clean, and noise is added to them using the three strategies defined in Section 3. The federated learning algorithms are run on the time series from three stations: stations 1, 6 and 2. We assume that only one station, station 1, has faulty sensors, so that its time series is noisy (SNR\u0026thinsp;=\u0026thinsp;40). The performances of the federated learning algorithms are evaluated with two metrics: MSE and IA. 
For comparison, centralized learning is also applied to station 1.\u003c/p\u003e\n \u003cdiv\u003e\n \u003ctable id=\"Tab6\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 6\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003ePerformances of FedAvg, FedBiased and Centralized with MSE and IA\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"4\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eMetrics\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eFedAvg\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eFedBiased\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eCentralized\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMSE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000139\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;1.498E-07\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000136\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;1.583E-07\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000137\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;1.706E-06\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eIA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.7455\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;6.031E-06\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.7457\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;6.488E-06\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.7451\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;0.000461\u003c/p\u003e\n 
\u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cp\u003eTable\u0026nbsp;\u003cspan\u003e6\u003c/span\u003e displays the results of the three strategies (FedAvg, FedBiased and centralized learning) on the group of stations 1, 6 and 2. FedBiased yields the lowest MSE and the highest IA, which indicates the best generalization.\u003c/p\u003e\n \u003cp\u003eThis experiment verifies the effectiveness of federated learning, and of FedBiased in particular, when the entire time series of one station is noisy. Even in the presence of data uncertainty and noise, the federated models are still able to learn useful patterns from the data, and FedBiased captures the trends in the noisy data well enough to produce the most accurate predictions.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec17\"\u003e\n \u003ch2\u003e4.5.2. The performances of federated learning algorithms under three noise cases\u003c/h2\u003e\n \u003cp\u003eSince noise may appear at random times in practice, we added two sets of comparative experiments in which noise was added only to the first half or only to the second half of the training data. 
To approximate the noise level common in practice, we set the SNR of station 1 to 40 and trained FedAvg, FedBiased and centralized models.\u003c/p\u003e\n \u003cp\u003eThe results averaged over 10 runs are presented in Table\u0026nbsp;\u003cspan\u003e7\u003c/span\u003e: the three MSE columns give the mean MSE and its standard deviation for the three noise cases, and the three IA columns give the mean IA and its standard deviation.\u003c/p\u003e\n \u003cdiv\u003e\n \u003ctable id=\"Tab7\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 7\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eMSE and IA values for three noise cases using FedAvg, FedBiased and Centralized methods\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"7\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" rowspan=\"2\"\u003e\n \u003cp\u003eMethod\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colspan=\"3\"\u003e\n \u003cp\u003eMSE\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colspan=\"3\"\u003e\n \u003cp\u003eIA\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eCase \u003cspan\u003e1\u003c/span\u003e\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eCase \u003cspan\u003e2\u003c/span\u003e\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eCase \u003cspan\u003e3\u003c/span\u003e\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eCase \u003cspan\u003e1\u003c/span\u003e\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eCase \u003cspan\u003e2\u003c/span\u003e\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eCase \u003cspan\u003e3\u003c/span\u003e\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd 
align=\"left\"\u003e\n \u003cp\u003eFedAvg\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000139\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;1.498E-07\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000142\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;1.37E-07\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000139\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;1.526E-07\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.7455\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;6.031E-06\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.7494\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;4.928E-06\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.7455\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;9.863E-06\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFedBiased\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.000136\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026plusmn;\u0026thinsp;1.583E-07\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.000140\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026plusmn;\u0026thinsp;1.103E-07\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.000136\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026plusmn;\u0026thinsp;1.224E-07\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.7457\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026plusmn;\u0026thinsp;6.488E-06\u003c/strong\u003e\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.7497\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026plusmn;\u0026thinsp;6.238E-06\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.7457\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026plusmn;\u0026thinsp;6.467E-06\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCentralized\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000137\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;1.706E-06\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000139\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;1.63E-06\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000136\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;1.678E-06\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.7451\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;0.000461\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.7495\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;0.000449\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.7455\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;0.000447\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cp\u003eTable\u0026nbsp;\u003cspan\u003e7\u003c/span\u003e shows that the MSE values of FedBiased for the three noise cases are smallest, and the IA values of FedBiased are highest. It can be inferred that federated learning algorithms are still able to effectively learn useful patterns from the data even in the face of noise interference. 
In particular, FedBiased improves both the accuracy and the stability of the predictions on the noisy time series: its standard deviations are at least ten times smaller than those of centralized learning. Table\u0026nbsp;\u003cspan\u003e7\u003c/span\u003e also shows that Case \u003cspan\u003e1\u003c/span\u003e and Case \u003cspan\u003e3\u003c/span\u003e give similar results, while Case \u003cspan\u003e2\u003c/span\u003e differs somewhat from both: its MSE and IA values are both larger. This suggests that noise in the first half of the training data has a greater impact on the training results.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec18\"\u003e\n \u003ch2\u003e4.5.3. The performances of FedBiased with incremental stations\u003c/h2\u003e\n \u003cp\u003eTables\u0026nbsp;\u003cspan\u003e8\u003c/span\u003e and \u003cspan\u003e9\u003c/span\u003e display the MSE and standard deviation values of stations 1 and 6 under different groups of stations, which grow incrementally from two to six stations. We stop at six stations for two reasons: computational complexity and the error, which tends to increase as more stations are added (Table\u0026nbsp;\u003cspan\u003e8\u003c/span\u003e). Note that individual modeling returns an MSE of 0.000135 for station 1 and 0.000827 for station 6.\u003c/p\u003e\n \u003cp\u003eTo test how the noisy data of station 1 affect different stations under FedBiased, Tables\u0026nbsp;\u003cspan\u003e8\u003c/span\u003e and \u003cspan\u003e9\u003c/span\u003e show the effect on station 1 itself and on its peer station 6, respectively. The SNR was set to 30 to make the effect of FedBiased more prominent. At this noise level, individual modeling using station 1 returns an MSE of 0.000140.\u003c/p\u003e\n \u003cp\u003eIn Table\u0026nbsp;\u003cspan\u003e8\u003c/span\u003e, the best results come from the group of stations 1, 6 and 2. Under this group, station 1 reaches an MSE of 0.000138 at SNR\u0026thinsp;=\u0026thinsp;30, which is lower than its individual (centralized) value of 0.000140. 
This reflects the fact that FedBiased has a more pronounced effect in the presence of noise, resulting in higher accuracy.\u003c/p\u003e\n \u003cdiv\u003e\n \u003ctable id=\"Tab8\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 8\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eMSE and standard deviation values of Station 1 under different groups of monitoring stations\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"3\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eIndexes of grouped stations\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSNR\u0026thinsp;=\u0026thinsp;0\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSNR\u0026thinsp;=\u0026thinsp;30\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1, 6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000139\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;2.322E-07\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000138\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;6.415E-07\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1, 6, 2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.000137\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026plusmn;\u0026thinsp;4.196E-08\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.000138\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026plusmn;\u0026thinsp;1.453E-07\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003e1, 6, 2, 5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000141\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;2.554E-08\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000140\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;9.318E-08\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1, 6, 2, 5, 3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000142\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;2.952E-08\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000139\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;5.950E-08\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1, 6, 2, 5, 3, 4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000143\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;1.157E-07\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000140\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;1.311E-08\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cp\u003eMoreover, some experimental results show that the training loss at SNR\u0026thinsp;=\u0026thinsp;30 is larger than at SNR\u0026thinsp;=\u0026thinsp;0, whereas the test loss is very close or even smaller. 
A plausible explanation is that a moderate amount of Gaussian noise alleviates overfitting of the model.\u003c/p\u003e\n \u003cdiv\u003e\n \u003ctable id=\"Tab9\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 9\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eMSE and standard deviation values of Station 6 under different groups of monitoring stations\u003c/p\u003e\n 
\u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"3\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eIndexes of grouped stations\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSNR\u0026thinsp;=\u0026thinsp;0\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSNR\u0026thinsp;=\u0026thinsp;30\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e6, 1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.000826\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026plusmn;\u0026thinsp;5.770E-07\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.000826\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026plusmn;\u0026thinsp;1.20E-06\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e6, 1, 2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000835\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;3.182E-07\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000835\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;4.754E-07\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e6, 1, 2, 5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000835\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;4.293E-08\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000835\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;3.72E-07\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd 
align=\"left\"\u003e\n \u003cp\u003e6, 1, 2, 5, 3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000838\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;4.690E-07\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000835\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;3.432E-07\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e6, 1, 2, 5, 3, 4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000838\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;4.515E-07\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.000838\u003c/p\u003e\n \u003cp\u003e\u0026plusmn;\u0026thinsp;3.047E-07\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cp\u003eFor Station 6, to which no noise is added, the best results with both raw and noisy training series come from the group of Stations 6 and 1. Comparing the centralized result of 0.000827 for Station 6 with its FedBiased result of 0.000826 in this group, under both SNR\u0026thinsp;=\u0026thinsp;0 and SNR\u0026thinsp;=\u0026thinsp;30, shows that Station 1 has little influence on the other stations in the federated algorithm, even when its data contain a certain amount of noise.\u003c/p\u003e\n \u003cp\u003eA comprehensive comparison of Tables\u0026nbsp;\u003cspan\u003e8\u003c/span\u003e and \u003cspan\u003e9\u003c/span\u003e indicates that, in the best combination, FedBiased improves the performance of a station with a certain level of noise without adversely affecting the performance of the other stations. This demonstrates the stability and robustness of the federated biased learning model. 
However, this does not mean that a larger number of stations yields greater robustness, as each station has its own most appropriate group.\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e"},{"header":"5. Conclusion","content":"\u003cp\u003eThis study proposed a new federated learning algorithm, federated biased learning (FedBiased), and verified that it outperforms the popular federated averaging algorithm (FedAvg) in time series prediction, using an air quality index case. We established a multi-station time series prediction framework that achieves the best prediction for each station by exploiting the personalized effect of federated biased learning within the optimal combination of stations. In addition, a comprehensive framework comprising three scenarios of disturbances in the time series was designed to test the robustness and reliability of federated learning algorithms.\u003c/p\u003e \u003cp\u003eThis study carefully compares the performance of federated learning algorithms and centralized methods and concludes that federated learning algorithms perform better when the raw time series contain disturbances or uncertainties. From the experimental studies, we draw the following conclusions:\u003c/p\u003e \u003cp\u003e(1) Federated learning methods are more robust than individual modeling because they collect as much information and diversity as possible from different data sources. This makes them preferable when reliability is the most important requirement in real engineering tasks.\u003c/p\u003e \u003cp\u003e(2) Both FedAvg and FedBiased function well in different cases. However, in the Beijing air quality index prediction case, FedBiased performs better than FedAvg in several respects.\u003c/p\u003e \u003cp\u003e(3) Shallow neural networks are more suitable for this air quality index series than deep networks (LSTM in this paper) in terms of both accuracy and running time. 
Therefore, it is advisable to consider model architectures in order from simple to complex before developing new models.\u003c/p\u003e \u003cp\u003e(4) A larger number of combined local models does not necessarily lead to better federated learning. Each local model has its own optimal combination.\u003c/p\u003e \u003cp\u003eBased on these observations, our future work will focus on the following topics:\u003c/p\u003e \u003cp\u003e(1) Spatial information should be further explored to capture more useful patterns and improve accuracy.\u003c/p\u003e \u003cp\u003e(2) More federated learning methods will be applied to the air quality index prediction problem and compared through further experiments.\u003c/p\u003e \u003cp\u003e(3) The effectiveness and generalizability of the framework will be validated on various datasets.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eMingli Song: methodology and writing. Xinyu Zhao: experiments. Witold Pedrycz: reviewing and editing.\u003c/p\u003e\u003cp\u003eAcknowledgement\u003c/p\u003e\n\u003cp\u003eThis work was supported by the National Natural Science Foundation of China (NSFC) 61773352 and the Fundamental Research Funds for the Central Universities.\u003c/p\u003e\u003cp\u003ehttp://www.bjmemc.com.cn/\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eYu, P., Liu, Y.: Federated object detection: Optimizing object detection model with federated learning. In: Proc. 3rd Int. Conf. Vision, Image Signal Process., pp. 1-6 (2019)\u003c/li\u003e\n\u003cli\u003eLatif, S., Khalifa, S., Rana, R., Jurdak, R.: Federated learning for speech emotion recognition applications. In: 2020 19th ACM/IEEE Int. Conf. Inf. Process. Sensor Netw. (IPSN), pp. 341-342. 
IEEE (2020)\u003c/li\u003e\n\u003cli\u003eEk, S., Portet, F., Lalanda, P., Vega, G.: Evaluation of federated learning aggregation algorithms: application to human activity recognition. In: Adjunct Proc. 2020 ACM Int. Joint Conf. Pervasive Ubiquitous Comput. Proc. 2020 ACM Int. Symp. Wearable Comput., pp. 638-643 (2020)\u003c/li\u003e\n\u003cli\u003eChhikara, P., Tekchandani, R., Kumar, N., Guizani, M., Hassan, M.M.: Federated learning and autonomous UAVs for hazardous zone detection and AQI prediction in IoT environment. IEEE Internet Things J., 8(20), 15456-15467 (2021)\u003c/li\u003e\n\u003cli\u003eMcMahan, B., Moore, E., Ramage, D., Hampson, S., y Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Artif. Intell. Stat., pp. 1273-1282 (2017)\u003c/li\u003e\n\u003cli\u003eLi, T., Sahu, A.K., Zaheer, M., Sanjabi, M., Talwalkar, A., Smith, V.: Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst., 2, 429-450 (2020)\u003c/li\u003e\n\u003cli\u003eKarimireddy, S.P., Kale, S., Mohri, M., Reddi, S., Stich, S., Suresh, A.T.: SCAFFOLD: Stochastic controlled averaging for federated learning. In: Int. Conf. Mach. Learn., pp. 5132-5143 (2020)\u003c/li\u003e\n\u003cli\u003eWang, J., Liu, Q., Liang, H., Joshi, G., Poor, H.V.: Tackling the objective inconsistency problem in heterogeneous federated optimization. Adv. Neural Inf. Process. Syst., 33, 7611-7623 (2020)\u003c/li\u003e\n\u003cli\u003eZhao, Z., Feng, C., Hong, W., Jiang, J., Jia, C., Quek, T.Q., Peng, M.: Federated learning with non-iid data in wireless networks. IEEE Trans. Wireless Commun., 21(3), 1927-1942 (2021)\u003c/li\u003e\n\u003cli\u003eWang, S., Cao, J., Yu, P.S.: Deep learning for spatio-temporal data mining: A survey. IEEE Trans. Knowl. Data Eng., 34(8), 3681-3700 (2020)\u003c/li\u003e\n\u003cli\u003eZhang, X., Wen, S., Yan, L., Feng, J., Xia, Y.: A hybrid-convolution spatial\u0026ndash;temporal recurrent network for traffic flow prediction. 
Comput. J., 67(1), 236-252 (2024)\u003c/li\u003e\n\u003cli\u003eHuo, P., Li, Z., Bai, M., Li, Z., Huang, J., Han, L.: Spatial-temporal evolutions of historical and future meteorological drought center in Beijing area, China. Urban Clim., 53, 101786 (2024)\u003c/li\u003e\n\u003cli\u003eJohnson, D.P., Owusu, C.: Examining associations between social vulnerability indices and COVID-19 incidence and mortality with spatial-temporal Bayesian modeling. Spatial Spatio-temporal Epidemiol., 48, 100623 (2024)\u003c/li\u003e\n\u003cli\u003eWu, Y., Huang, Z., Zheng, Y., Liu, Y., Li, H., Che, Y., et al.: Spatial\u0026ndash;temporal data-driven full driving cycle prediction for optimal energy management of battery/supercapacitor electric vehicles. Energy Convers. Manag., 277, 116619 (2023)\u003c/li\u003e\n\u003cli\u003eMusa, A.A., Hussaini, A., Liao, W., Liang, F., Yu, W.: Deep neural networks for spatial-temporal cyber-physical systems: A survey. Future Internet, 15(6), 199 (2023)\u003c/li\u003e\n\u003cli\u003eJin, Z., Qian, J., Kong, Z., Pan, C.: A mobility aware network traffic prediction model based on dynamic graph attention spatio-temporal network. Comput. Netw., 235, 109981 (2023)\u003c/li\u003e\n\u003cli\u003eGeng, Z., Xu, J., Wu, R., Zhao, C., Wang, J., Li, Y., Zhang, C.: STGAFormer: Spatial\u0026ndash;temporal gated attention transformer based graph neural network for traffic flow forecasting. Inf. Fusion, 102228 (2024)\u003c/li\u003e\n\u003cli\u003eZhao, L., Song, Y., Zhang, C., Liu, Y., Wang, P., Lin, T., Li, H.: T-GCN: A temporal graph convolutional network for traffic prediction. IEEE Trans. Intell. Transp. Syst., 21(9), 3848-3858 (2019)\u003c/li\u003e\n\u003cli\u003eWang, C., Zhu, Y., Zang, T., Liu, H., Yu, J.: Modeling inter-station relationships with attentive temporal graph convolutional network for air quality prediction. In: Proc. 14th ACM Int. Conf. Web Search Data Min., pp. 
616-634 (2021)\u003c/li\u003e\n\u003cli\u003ePadhi, I., Schiff, Y., Melnyk, I., Rigotti, M., Mroueh, Y., Dognin, P., Altman, E.: Tabular transformers for modeling multivariate time series. In: ICASSP 2021-2021 IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), pp. 3565-3569. IEEE (2021)\u003c/li\u003e\n\u003cli\u003eXiao, P., Cheng, S., Stankovic, V., Vukobratovic, D.: Averaging is probably not the optimum way of aggregating parameters in federated learning. Entropy, 22(3), 314 (2020)\u003c/li\u003e\n\u003cli\u003eQu, Z., Lin, K., Li, Z., Zhou, J.: Federated learning\u0026rsquo;s blessing: FedAvg has linear speedup. In: ICLR 2021-Workshop Distrib. Private Mach. Learn. (DPML) (2021)\u003c/li\u003e\n\u003cli\u003eWang, J., Das, R., Joshi, G., Kale, S., Xu, Z., Zhang, T.: On the unreasonable effectiveness of federated averaging with heterogeneous data. arXiv preprint arXiv:2206.04723 (2022)\u003c/li\u003e\n\u003cli\u003eWang, J., Charles, Z., Xu, Z., Joshi, G., McMahan, H.B., Al-Shedivat, M., et al.: A field guide to federated optimization. arXiv preprint arXiv:2107.06917 (2021)\u003c/li\u003e\n\u003cli\u003eWei, K., Li, J., Ding, M., Ma, C., Yang, H.H., Farokhi, F., et al.: Federated learning with differential privacy: Algorithms and performance analysis. IEEE Trans. Inf. Forensics Secur., 15, 3454-3469 (2020)\u003c/li\u003e\n\u003cli\u003ePastor, D.: A theoretical result for processing signals that have unknown distributions and priors in white Gaussian noise. Comput. Stat. Data Anal., 52(6), 3167-3186 (2008)\u003c/li\u003e\n\u003cli\u003eWang, Y., et al.: Federated learning for automatic modulation classification under class imbalance and varying noise condition. IEEE Trans. Cogn. Commun. 
Netw., 8(1), 86-96 (2021)\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"cluster-computing","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"Learn more about [Cluster Computing](https://www.springer.com/journal/10586)","snPcode":"10586","submissionUrl":"https://submission.nature.com/new-submission/10586/3","title":"Cluster Computing","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"Federated learning, air quality index, time series, neural networks, federated averaging algorithm","lastPublishedDoi":"10.21203/rs.3.rs-4658479/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4658479/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThe federated averaging algorithm (FedAvg) is extensively used for multi-sensor data modeling but often overlooks the unique characteristics of local models when privacy and data security are not considered. This study introduces a novel federated learning algorithm built upon the FedAvg framework, which emphasizes the specificity of each local model to optimize global knowledge aggregation. The algorithm's effectiveness is demonstrated through an air quality index prediction problem, showcasing superior prediction performance and robustness in noisy data scenarios. Additionally, the study delves into the reliability and robustness of the proposed approach, addressing the prevalent notion that centralized learning methods often surpass federated learning when data security is not a concern. 
Our experiments affirm the necessity and superiority of federated learning methods, even in the absence of privacy considerations, by effectively managing real-world noisy data.\u003c/p\u003e","manuscriptTitle":"A robust federated biased learning algorithm for time series forecasting ","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-07-22 11:57:36","doi":"10.21203/rs.3.rs-4658479/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2024-09-29T23:13:35+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2024-07-18T08:57:42+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2024-07-17T06:31:51+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"98295610795288554297961226998358509271","date":"2024-07-14T02:20:52+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"200278723455760991701659929771758702190","date":"2024-07-10T10:03:20+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"134545931064251321826853668245891894009","date":"2024-07-09T03:09:24+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2024-07-08T07:00:28+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2024-06-30T06:48:49+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2024-06-29T09:25:11+00:00","index":"","fulltext":""},{"type":"submitted","content":"Cluster Computing","date":"2024-06-29T08:55:03+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"cluster-computing","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"Learn more about [Cluster Computing](https://www.springer.com/journal/10586)","snPcode":"10586","submissionUrl":"https://submission.nature.com/new-submission/10586/3","title":"Cluster Computing","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"7c521b40-2f7b-4e8f-95f7-5aba4e3a1cdf","owner":[],"postedDate":"July 22nd, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2024-12-24T00:38:12+00:00","versionOfRecord":[],"versionCreatedAt":"2024-07-22 11:57:36","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4658479","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4658479","identity":"rs-4658479","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
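The noise experiments reported in Tables 7 to 9 perturb the Station 1 series with Gaussian noise at a fixed signal-to-noise ratio (SNR = 30). The sketch below shows one common way to inject such noise. It is an illustration, not the authors' code. It assumes the SNR is expressed in decibels, SNR_dB = 10 log10(P_signal / P_noise), with power taken as the mean squared value of the series, and that the noise is additive white Gaussian noise. The function names `add_gaussian_noise` and `measured_snr_db` and the synthetic series are hypothetical.

```python
import math
import random


def add_gaussian_noise(series, snr_db, rng=None):
    """Return a copy of `series` with additive white Gaussian noise.

    Assumption: SNR is given in decibels,
    SNR_dB = 10 * log10(P_signal / P_noise),
    where power is the mean squared value of the series.
    """
    rng = random.Random() if rng is None else rng
    signal_power = sum(x * x for x in series) / len(series)
    # Convert the dB ratio back to a linear power ratio, then to a std-dev.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in series]


def measured_snr_db(clean, noisy):
    """Empirical SNR (dB) of `noisy` relative to its clean original."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # Hypothetical stand-in for a normalized hourly AQI series with a daily cycle.
    aqi = [0.5 + 0.3 * math.sin(2 * math.pi * t / 24) for t in range(5000)]
    noisy = add_gaussian_noise(aqi, snr_db=30, rng=random.Random(0))
    print(f"target 30 dB, measured {measured_snr_db(aqi, noisy):.2f} dB")
```

Under this convention a smaller SNR means stronger noise. At 30 dB the noise power is one thousandth of the signal power.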
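Tables 7 to 9 report errors as mean ± standard deviation, and Table 7 also reports IA. The section does not define IA. The sketch below assumes that IA is Willmott's index of agreement, a common choice in air quality forecasting. It also assumes that the ± value is the sample standard deviation over repeated training runs. Both are assumptions, and the function names and numbers are hypothetical.

```python
import statistics


def mse(observed, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement d in [0, 1]; 1 means perfect agreement.

    Assumption: this is the quantity reported in the "IA" column.
    """
    o_mean = statistics.fmean(observed)
    num = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    den = sum(
        (abs(p - o_mean) + abs(o - o_mean)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 - num / den


def summarize(scores):
    """Format repeated-run scores as 'mean ± sample standard deviation'."""
    return f"{statistics.fmean(scores):.6f} ± {statistics.stdev(scores):.3E}"


if __name__ == "__main__":
    observed = [0.42, 0.47, 0.51, 0.49, 0.45]
    # Hypothetical predictions from three repeated training runs.
    runs = [
        [0.43, 0.46, 0.50, 0.50, 0.44],
        [0.41, 0.48, 0.52, 0.48, 0.46],
        [0.42, 0.47, 0.49, 0.50, 0.45],
    ]
    print("MSE:", summarize([mse(observed, p) for p in runs]))
    print("IA: ", summarize([index_of_agreement(observed, p) for p in runs]))
```

IA lies in [0, 1], with 1 indicating perfect agreement. This is why both a lower MSE and a higher IA indicate better predictions in the tables.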