Machine learning ARIMA for modelling and forecasting variations in earth’s surface phenomenon using sparse time series satellite data: a case study of sea surface salinity in the Nigerian coastal zone

doi:10.21203/rs.3.rs-4056329/v1

2024 · doi:10.21203/rs.3.rs-4056329/v1

Research Article

Opeyemi Ajibola-James, Francis I. Okeke

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-4056329/v1. This work is licensed under a CC BY 4.0 License. Status: Posted (Version 1).

Abstract

The tropical coasts, particularly the Nigerian coastal zone, have been traditionally undersampled using appropriate in situ methods and understudied using appropriate remote sensing techniques despite the proliferation of satellite missions for earth observation.
The contemporary all-weather satellite observations of phenomena of interest are characterized by relatively sparse time series data that discourage their utilization as input in building efficient machine learning (ML) models for both exploratory and predictive purposes. Additionally, data-poor areas usually have difficulties meeting the multiple predictor variable requirement of building appropriate multivariate ML regression models. We utilized a relatively sparse sea surface salinity (SSS) dataset from the Soil Moisture Active Passive Mission (SMAP) satellite products (Jan., 2016-Dec., 2021) for this study. We determined the accuracy and variability of the relatively sparse SSS data for the study area of approximately 6.5° × 4.5°. We built ML autoregressive integrated moving average (ARIMA) models and determined and validated the best model for modelling (Jan., 2016-Dec., 2020) and forecasting (Jan.-Dec., 2021) Earth’s surface phenomenon (ESP) using relatively sparse SSS data as a case study. We show root mean squared differences (RMSDs) of 0.1279 psu and 0.1162 psu for the modelling and forecasting data accuracy, respectively. We show a standard deviation (SD) of 0.2528 for the interannual SSS variability (iSSSv). For the best ML ARIMA model, we show the modelling accuracy with an R-squared (R²) of 0.8345281 and its validation with a mean absolute percentage error (MAPE) of 0.7779%, and the forecasting accuracy with a root mean squared error (RMSE) of 0.9850 psu and its validation with a MAPE of 2.7670%. The relatively low SD value suggests a relatively stable iSSSv along the Nigerian coastal zone. The R² and MAPE results suggest relatively high modelling and prediction accuracy.
The results imply that relatively sparse satellite time series data of at least 60 epochs (hourly, daily, weekly, monthly or yearly observations) can be utilized for building a relatively accurate ML ARIMA model for modelling and forecasting variations in any ESP in any geographical area.

Keywords: Earth’s surface phenomenon; sea surface salinity; machine learning; ARIMA; variations modelling; time series forecasting

1. Introduction

Like sea surface salinity (SSS), every ESP is characterized by both spatial and temporal variations. The magnitude and frequency of such variations are usually driven by several factors. In some cases, such variations are associated with some risks to humankind and the environment, which is characterized by various species of plants, animals and microorganisms. In the case of changes in SSS on a global spatial scale, evaporation, precipitation, and river outflow are among the principal drivers (Dinnat et al., 2019). However, changes in SSS on a local spatial scale in the tropics, particularly along the Nigerian coastal zone, have been attributed to three important factors, namely, wind speed, high wind speed and sea level anomalies (Ajibola-James, 2023). The implications of spatial and temporal anomalies in SSS along coastal zones, particularly on relatively small (local or national) spatial scales, include the increasing risk of upstream seawater intrusion.
More often than not, the risk is associated with socioeconomic and environmental problems such as (a) the relatively high cost of tidal river water treatment for domestic and industrial purposes, (b) the threat to a sustainable freshwater supply for household consumption (Sneath, 2023) coupled with exposure to high blood pressure that may result from drinking water containing relatively high salt concentrations, (c) a decrease in the viability of the agricultural sector that can achieve an optimum yield of sensitive plants such as paddy rice and horticultural crops (CGIARCSA, 2016; Trung et al., 2016), and (d) disturbed natural ecosystems that cannot support species diversity and composition. It should be noted that relatively low-salinity water is important for establishing an enabling natural environment for industrial growth and economic development in coastal areas, particularly for the manufacturing and food processing industries. Therefore, proper modelling and forecasting of the ESP, including SSS changes along coastal zones, are crucial for providing useful early warning information for mitigating any such future risks (Ajibola-James, 2023; Ajibola-James et al., 2023).

Prior to the advent of remote sensing technologies that lend themselves to the observation of specific ESP, the traditional approach to such surface data acquisition was in situ measurement. However, remoteness and large spatial extent limit conventional in situ measurements, while the seasonal cloud cover of dynamically important regions limits the applications of the predominant optical satellite surface observations.
In the case of SSS observations, the launch of different all-weather satellite missions focused on sea surface observations marked a paradigm shift to global satellite observations of SSS and other relevant sea surface variables such as high wind speed, wind speed, and sea surface temperature (Ajibola-James, 2023). These missions include the European Space Agency (ESA) Soil Moisture and Ocean Salinity (SMOS) satellite, launched in 2009 with a microwave imaging radiometer using aperture synthesis (MIRAS) on board; the subsequent National Aeronautics and Space Administration (NASA) Aquarius in 2011; and the Soil Moisture Active Passive Mission (SMAP) in 2015, which uses L-band (1.4 GHz) radiometry to measure SSS at approximately 0.2 practical salinity unit (psu) accuracy. The increasing development of contemporary all-weather satellite missions signifies the relative importance of their datasets for various applications, particularly for ML modelling and forecasting of ESP at spatial scales ranging from local to global.

ML is a subset of artificial intelligence that enables computers to make relatively accurate predictions based on previous learning by ML models. ML is a method of data analysis that involves building systems (models and algorithms) that can learn from data without being explicitly programmed, identifying patterns, and making decisions with minimal human intervention (Ajibola-James, 2023). To make the model selection process simpler for forecasting, ML entails a variety of strategies for identifying patterns and relationships in the data (Chan-Lau, 2017). A notable advantage of ML models and algorithms is their increasing ability to handle the time component of relatively large amounts of data (complex structured, semistructured and unstructured datasets with several characteristics, including volume, velocity, veracity, value and validity) in predictive studies.
A time series is a sequence of data collected over a specific period of time. The time scale component of a data series may be every minute, hourly, daily, monthly or yearly. When only one of the time scales is involved, it is regarded as a single seasonality. Any situation involving datasets with more than one of the time scales (for example, hourly and daily; hourly, daily and monthly; or daily, monthly and yearly) is called multiple seasonality. Time series forecasting has become a significant part of ML since there are many prediction problems with time components (Ajibola-James, 2023).

In terms of trade-offs, Chan-Lau (2017) considered various ML methods from two categorical perspectives, namely, ‘flexibility’ and ‘interpretability’. He opines that the latter should be given priority over the former and hence suggests a selection of relevant ML methods in decreasing order of interpretability: least absolute shrinkage and selection operator (LASSO) regressions, least squares (LS), generalized additive models (GAM), trees (T), support vector machines (SVM), and methods combining different base learning methods such as bagging and boosting. He further argues that the predictive power and interpretability of a linear regression model that improves fit by including a large number of independent variables are negatively affected. To alleviate and/or possibly overcome such effects, he proposed two types of linear models based on methods such as ‘subset selection’ and ‘shrinkage’. Typical examples of ML models that use the latter approach are L0-regularized regression (L0) and LASSO models. These models are considered sparse learning models, which can assist in eliminating the least important set of predictor variables to optimize the forecast accuracy. The ARIMA model, in which the predictors consist of lags of the dependent variable and/or lags of the forecast errors, may be considered an example of an ML model that utilizes the former method.
The relative advantage of the ARIMA model for time series modelling and prediction is that it does not require predictor (independent) variables to fit new (predicted) values. Therefore, the costs (in terms of the amount of data input, data processing time, and computer hardware) of implementing it are relatively low (Ajibola-James, 2023). The ARIMA model, which usually seeks to describe data autocorrelations by providing complementary approaches to a problem, is one of the most widely used methods for time series forecasting. The differenced autoregressive model is combined with the moving average model to form a typical ARIMA model, which consists of three technical parts. The AR component of ARIMA indicates that the time series has been regressed on its own past data. The MA component of ARIMA denotes that the forecast error is a linear combination of previous errors. The I part of ARIMA shows that the data values have been replaced with differenced values, where d denotes the order of differencing, to obtain stationary data, as required by the assumption of the ARIMA model. With this combination approach, the ARIMA model is effective at fitting past data and forecasting future points in a time series (Kotu & Deshpande, 2019).

In the applications of the ARIMA model, a widely used approach is known as the Box–Jenkins principle, which consists of three iterative steps, namely, model identification, parameter estimation, and diagnostic checking (Box & Jenkins, 1970). The main goal and fundamental rule of the model identification phase is to produce stationary time series data that have a constant mean and variance to comply with a basic requirement for time series forecasting, which is also one of the basic assumptions of the ARIMA model (Hyndman & Khandakar, 2008; Hyndman & Athanasopoulos, 2018). This implies that a time series should exhibit some theoretical autocorrelation (stationarity or white noise) qualities if it is derived from an ARIMA process.
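As a rough illustration of the differencing (I) step and of how autocorrelation is estimated from a series, the sketch below computes first-order differences and a sample autocorrelation function in plain Python. The series values and the helper names `difference` and `sample_acf` are hypothetical and are not taken from the study, which performed its modelling in R.

```python
from statistics import fmean


def difference(series, lag=1):
    """Return the differences y[t] - y[t - lag] of a sequence."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sample_acf(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (biased estimator)."""
    n = len(series)
    mean = fmean(series)
    centred = [y - mean for y in series]
    denom = sum(c * c for c in centred)
    acf = []
    for k in range(max_lag + 1):
        num = sum(centred[t] * centred[t - k] for t in range(k, n))
        acf.append(num / denom)
    return acf


# Hypothetical series with a linear trend (nonstationary in the mean).
trend_series = [30.0 + 0.5 * t for t in range(24)]

# First-order differencing (d = 1) removes the linear trend.
diffed = difference(trend_series, lag=1)

# The autocorrelation of the undifferenced, trending series declines slowly.
acf_raw = sample_acf(trend_series, max_lag=3)
print("first differences (head):", diffed[:3])
print("ACF of raw series, lags 0-3:", [round(r, 3) for r in acf_raw])
```

In this toy case every first difference equals the slope of the trend, which is why one round of differencing is enough to obtain a series with a constant mean.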
Such a stationary time series can be visually represented by autocorrelation function (ACF) and partial autocorrelation function (PACF) plots that do not show any exponential decay. Consequently, testing time series data for the presence of either white noise (stationarity) or a unit root (nonstationarity) is a required criterion in time series analysis. In this regard, Box & Jenkins (1970) suggest using the ACF and PACF of sample data as the fundamental tools to determine the order of ARIMA models. The ACF has been used to determine whether time series data are stationary, while the PACF has been used to test time series datasets for seasonality as part of the data preparation process for deploying ARIMA models (Fattah et al., 2018; Benvenuto, 2020; Hyndman & Athanasopoulos, 2021). The bars of the ACF plot of a stationary time series approach zero relatively rapidly, but those of the ACF plot of nonstationary data decline slowly (Hyndman & Athanasopoulos, 2021). However, a credible test for stationarity cannot be achieved by utilizing only the ACF plot, which is an informal test based solely on visual analysis of the series. In this regard, the augmented Dickey-Fuller (ADF) test, a relatively credible and commonly used method (Cheung & Lai, 1995) that offers objective metric values for testing time series for stationarity, has been suggested. The Dickey-Fuller (DF) value, that is, the test statistic of the ADF test, and its corresponding p value can easily be interpreted without prejudice to determine the stationarity of time series data (Ajibola-James, 2023).

In the parlance of ML, it is generally believed that a relatively large amount of historical data is required to successfully build and test a relatively accurate and reliable model for both classification and forecasting purposes.
This is essentially because a small sample size is related to overfitting (a condition that predisposes a model that performs very well on small amounts of training data to fail in predicting a new task on new samples), which usually inhibits the development of a useful model (Raudys & Jain, 1991; Liu & Gillies, 2016; Zhao et al., 2017; Nguyen et al., 2018; Ajibola-James, 2023). Consequently, the contemporary all-weather satellite observations of phenomena of interest that are characterized by relatively sparse time series data discourage their utilization as input in building efficient ML models for such purposes. The tropical coasts, particularly the Nigerian coastal zone, have been traditionally undersampled using appropriate in situ methods and are understudied using remote sensing techniques (Ajibola-James, 2023). More often than not, such data-poor areas have difficulties meeting the multiple predictor variable requirement of building appropriate multivariate ML regression models. Despite the relative advantages of using ML ARIMA for modelling ESP, our knowledge of its accuracy in fitting new values when built with sparse time series satellite data is still limited, particularly in such data-poor areas. Consequently, the objectives of this paper are to (i) determine the accuracy of relatively sparse SSS data (Jan., 2016-Dec., 2020 and Jan.-Dec., 2021) for the study area; (ii) determine the interannual variability of the SSS data (Jan., 2016-Dec., 2020); and (iii) construct ML ARIMA models, and determine and validate the best model for modelling (Jan., 2016-Dec., 2020) and forecasting (Jan.-Dec., 2021) the ESP using relatively sparse SSS data as a case study.

2. Study Area

The location adopted for this experimental study was the Nigerian coastal zone, which comprises the immediate maritime area (IMA) and the contiguous Exclusive Economic Zone (EEZ) and reaches approximately 200 nautical miles (370 km) offshore of the Nigerian continental shelf; this zone should not extend beyond the limits of approximately 350 nautical miles in accordance with the provisions of Article 76(8) of the 1982 United Nations Convention on the Law of the Sea (UNCLOS) (United Nations, undated). The IMA was established for the purpose of this study; its offset ranged from 58 to 100 km between the shoreline and the edge of the observation points in the contiguous EEZ (Figure 1). To significantly reduce the effect of the error associated with satellite SSS data acquisitions close to land masses on the data accuracy, as observed by Boutin et al. (2016), the IMA was excluded from the study area. The study area was restricted to 278 data observation points in the contiguous EEZ of approximately 295,027.4 km² (Figure 1).

In the area, the mean monthly rainfall ranges from approximately 28 mm in January to approximately 374 mm in September (Zabbey et al., 2019), while the mean daily temperature ranges from 25–36°C (298.15–309.15 K) depending on the time of day and the month of the year (Usoro, 2010). Several rivers, including the Niger, Forcados, Nun, Ase, Imo, Warri, Bonny, and Sombreiro Rivers, discharge freshwater to the coastal region of Nigeria. Given the actual evaporation of 1,000 mm per annum, a total runoff of 1,700–2,000 mm, and an additional flow of 50–60 km³ calculated for the water balance of the Niger system, a total of 250 km³ per year eventually discharges into the Gulf of Guinea (Golitzen et al., 2005; Ajibola-James, 2023; Ajibola-James et al., 2023).

3. Materials and Methods

3.1 Satellite Observations and Map

This study utilized the SMAP satellite SSS time series dataset, which was retrieved from NASA's SMAP online repository managed by NASA’s Jet Propulsion Laboratory, JPL (JPL, 2020), in netCDF-4 (network Common Data Form, version 4) file format. Tables 3.1 (a) and (b) provide more information on the data. The base map material used for the study area was sourced from Ajibola-James (2023) and modified as appropriate (Figure 1).

Table 3.1 (a): Satellite dataset retrieved for the study and the sources

Data Name | Data Variable | Observation Period; Temporal and Spatial Resolutions | Source and Metadata URL
SMAP | SSS; SSS Uncertainty | Jan., 2016 to Dec., 2021; Monthly; 0.25° (Lat.) × 0.25° (Lon.) | JPL (2020), https://doi.org/10.5067/SMP50-3TMCS

Table 3.1 (b): Quantity, quality and epochs of the dataset analysed for the study

Data Name | Data Variable | Observation (Obs.) Period | Obs./Time | Total Obs. | RMSD
SMAP | SSS | Jan., 2016 to Dec., 2020 | 278 | 16680 | 0.1279 psu
SMAP | SSS | Jan. to Dec., 2021 | 278 | 3336 | 0.1162 psu

3.2 Data Preparation

Prior to the modelling and prediction tasks of the study, the appropriate data preparation tasks (data extraction, cleaning and selection) were implemented using automatic (scripted) procedures. The dataset was automatically extracted from the netCDF (.nc and .nc4) files into comma-separated values (.csv) files by executing a Python 3.10.2 script with the glob, netCDF4, pandas, numpy and xarray libraries in the Spyder IDE (Integrated Development Environment) 5.2.2 software.
The data cleaning, which involved rigorous supervised-automatic deletion of the observation records with null values and outliers induced by radio frequency interference (RFI) and land contamination in the dataset stored in the .csv file, was achieved through three consecutive tasks: (a) automatic deletion of null values by executing a Python script with the pandas, numpy, csv and xarray libraries in the IDE; (b) visual identification and verification of outliers by overlaying each of the monthly SSS observations in the .csv files on Google Earth Pro to ascertain their proximity to land and tendency for land contamination; and (c) automatic deletion of the predetermined outliers by using their concatenated location coordinates as criteria for executing a Python script with the same libraries and IDE that were utilized in (a) above. The selection of a total of 278 appropriate satellite observation points for analysis in this study, which constitute the study area (Figure 1), was achieved by executing a Python script with the pandas, numpy, csv and xarray libraries in the IDE. The points were imported and merged with the base map using the overlay function in ArcMap 10.4.1 (Ajibola-James et al., 2023; Ajibola-James, 2023).

3.3 Data accuracy and variability

The accuracy of the satellite SSS data was computed in Microsoft Excel by using the SSS uncertainty data (the difference between in situ SSS and satellite SSS) that were downloaded with the SSS data as the only input (see Table 3.1 (a)). To compute the accuracy of the modelling data, the SSS uncertainty data of 16680 observation points were entered in column A in Excel, occupying the range A2:A16681. The sum of squares was computed in cell C2 with the formula =SUMSQ(A2:A16681). The mean squared difference (MSD) was computed in cell D2 with the formula =C2/16680, and the RMSD was finally computed with the formula =SQRT(D2).
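The spreadsheet steps above (sum of squares, mean squared difference, square root) can be sketched in Python; the same sketch also computes a sample standard deviation for the five mean annual SSS values listed in Table 3.3 of this subsection. The `uncertainty` values are hypothetical placeholders, since the study's 16680 and 3336 uncertainty values are not reproduced in the text, and `rmsd` is an illustrative helper name. The sample (n − 1) standard deviation is assumed because that is what the R function sd() computes.

```python
import math
from statistics import stdev


def rmsd(differences):
    """Root mean squared difference: sqrt(sum(d^2) / n)."""
    values = list(differences)
    sum_sq = sum(d * d for d in values)   # spreadsheet step: SUMSQ(range)
    msd = sum_sq / len(values)            # mean squared difference
    return math.sqrt(msd)                 # spreadsheet step: SQRT(msd)


# Hypothetical SSS uncertainty values in psu (NOT the study's data).
uncertainty = [0.10, -0.15, 0.12, -0.08, 0.14]
print(f"RMSD = {rmsd(uncertainty):.4f} psu")

# Mean annual SSS values for 2016-2020 as listed in Table 3.3.
annual_sss = [33.15872, 33.12886, 32.79823, 32.55897, 33.02366]
sd_sss = stdev(annual_sss)                # sample SD, divisor n - 1
print(f"SD = {sd_sss:.4f}")
```

With the Table 3.3 values, the sample standard deviation comes out at about 0.2528, in line with the interannual variability reported in the paper.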
The same procedure was replicated for computing the accuracy of the forecasting data using 3336 observation points. See Table 3.1 (b) for details of the input datasets.

Table 3.3: Dataframe for computing interannual variability in SSS

Year | SSS
2016 | 33.15872
2017 | 33.12886
2018 | 32.79823
2019 | 32.55897
2020 | 33.02366

The interannual variability of the SSS data was determined by utilizing the MLmetrics library to compute the SD, a universal measure of variability, in R 4.1.3/R-studio 2022.02.3-492 software. The mean annual SSS values for 2016 to 2020 were first loaded by running

    data_obs_sss <- read.csv(file.choose(), header = TRUE, stringsAsFactors = FALSE)

The dataframe (Table 3.3) was then produced by running

    data_sss <- data_obs_sss[, c("year", "sss")]

and vectorized by running

    sss_2016_2020 <- data_sss$sss

The SD was finally computed by running

    sd(sss_2016_2020)

3.4 Autoregressive Integrated Moving Average Model and Algorithm

In the application of ML methods for modelling and forecasting variations in SSS as an ESP, ARIMA models and algorithms were built primarily with the forecast library 8.17.0 in R 4.1.3/R-studio 2022.02.3-492 software. Other complementary libraries, such as tseries and MLmetrics, were also used in this process. Model fitting and selection were achieved with the auto.arima() function. The function helps to determine the best model for given input data based on relevant model evaluation criteria. The function employs a variant of the Hyndman-Khandakar method, which combines unit root testing, Akaike information criterion (AIC) minimization, the Bayesian information criterion (BIC) and maximum likelihood estimation (MLE) to generate ARIMA models (Hyndman & Khandakar, 2008; Hyndman & Athanasopoulos, 2018). The most widely used criteria are the AIC and BIC (Rahman & Hasan, 2017; Suleiman & Sani, 2020). The function performs intuitive parameter estimation and provides information on the best ARIMA model parameters.
At the inception of the ML modelling task, the dataframe, df, containing 60 monthly epochs (Jan., 2016-Dec., 2020) of the SSS data was transformed from "function" to "time series" to satisfy one of the basic assumptions of the ARIMA model. The time series data were assessed for stationarity utilizing both visual and metric approaches. The former involved the inspection of autocorrelation function (ACF) and partial autocorrelation function (PACF) plot patterns, while the latter was characterized by hypothesis testing using augmented Dickey-Fuller (ADF) test metrics. The following hypotheses and decision rule were adopted for the ADF test:

    H0: the series has a unit root (nonstationary)
    H1: the series is stationary

where H0 is the null hypothesis and H1 is the alternative hypothesis. If the p value is ≤ 0.05, H0 is rejected in favour of H1. Given that the computed p value = 0.1769, which is > 0.05, H0 (nonstationary) could not be rejected. To achieve "stationarity", another basic assumption of the ARIMA model, first-order differences in the data were used. The ADF test metrics were also used to reassess the differenced data. Given that the computed p value = 0.01, which is < 0.05, H0 (nonstationary) was rejected in favour of H1 (stationary).

The best ARIMA model together with the most appropriate parameters was identified using the auto.arima function, which produced mymodel_train from the training data, Outcome_SSS, by running

    mymodel_train <- auto.arima(Outcome_SSS, ic='aic', trace=TRUE, approximation=FALSE)    (1)

The Ljung-Box (Portmanteau) test was performed to assess the residuals of the auto.arima model based on the following hypotheses and decision rule:

    H0: the residuals are white noise (no remaining autocorrelation)
    H1: the residuals are not white noise (autocorrelated)

If the p value is < 0.05, H0 is rejected (Hyndman & Khandakar, 2008). Given that the computed p value = 0.4522, which is > 0.05, H0 could not be rejected, which indicates that the residuals of the model are consistent with white noise. Having confirmed this for (1), the model was used as input for building the user-defined forecasting model, myforecast_train, by running

    myforecast_train <- forecast(mymodel_train, level=c(95), h=1*12)    (2)

where level is the confidence level and h is the number of monthly forecasts. Therefore, the SSS values were predicted 12 months ahead using the model in (2), built with the parameter h=1*12. The graph of the SSS values predicted by the model was generated, after running (2) successfully, by running

    autoplot(myforecast_train)    (3)

The modelling accuracy was computed in terms of R², rsq, by running

    sss_obs1 <- myforecast_accuracy_Outcome_SSS$x
    sss_pred1 <- myforecast_accuracy_Outcome_SSS$fitted
    rss <- sum((sss_pred1 - sss_obs1) ^ 2)
    tss <- sum((sss_obs1 - mean(sss_obs1)) ^ 2)
    rsq <- 1 - rss/tss
    rsq    (4)

while the MAPE used for validating the above modelling accuracy was obtained by running

    myforecast_accuracy_Outcome_SSS <- Arima(Outcome_SSS, model=mymodel_train)
    accuracy(myforecast_accuracy_Outcome_SSS)    (5)

where Outcome_SSS is the input time series SSS data and mymodel_train is the ML ARIMA model trained with the input time series SSS data. Because (4) reads the fitted object created in (5), the first line of (5) is run before (4).

The forecasting accuracy in terms of the RMSE was computed, and validated by computing the MAPE, with the MLmetrics library for the best ML ARIMA model by running

    RMSE(sss_pred1, sss_obs1)    (6)

and

    MAPE(sss_pred1, sss_obs1)*100    (7)

immediately after running (4) successfully, where sss_pred1 is here the predicted SSS value and sss_obs1 is the actual satellite SSS for January-December 2021.
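The R² computation in (4) is one minus the ratio of the residual sum of squares to the total sum of squares. A minimal Python sketch of the same arithmetic follows; the observed and fitted values are hypothetical, and `r_squared` is an illustrative name rather than a function used in the study.

```python
from statistics import fmean


def r_squared(observed, fitted):
    """Coefficient of determination: 1 - RSS / TSS."""
    mean_obs = fmean(observed)
    rss = sum((f - o) ** 2 for o, f in zip(observed, fitted))  # residual SS
    tss = sum((o - mean_obs) ** 2 for o in observed)           # total SS
    return 1.0 - rss / tss


# Hypothetical observed and fitted SSS values in psu (NOT the study's data).
observed = [33.0, 33.4, 33.9, 34.2, 33.1, 32.5]
fitted = [33.1, 33.3, 33.8, 34.0, 33.3, 32.7]
print(f"R^2 = {r_squared(observed, fitted):.4f}")
```

A perfect fit gives R² = 1, while a model that always returns the mean of the observations gives R² = 0.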
3.5 Determination and Validation of ARIMA Model Accuracy for Modelling and Forecasting SSS

In subsection 3.4, the accuracy of the built ML ARIMA model for modelling variations in SSS was computed by using the R² performance metric, which represents the amount of variation explained by the ML model. The forecasting accuracy was determined with the RMSE, a measure of accuracy that reveals the magnitude of the difference between the predicted and observed (actual) values. The validation of the modelling and forecasting accuracy of the best ML model in relation to error estimation, which is also known as residual variation, was also computed in terms of MAPE, a measure of the absolute percentage difference between predicted and observed values. In general, the greater the R² value is, the greater the amount of variation explained by the ML model. Conversely, lower values of MAPE and RMSE indicate relatively good accuracy of forecasts made by the model. In terms of the interpretation of the error metrics in real-world applications, the MAPE seems to be the most versatile because it is computed in percentage (%) units. In addition, what should be considered an acceptable accuracy level seems to be properly documented for the MAPE. In this regard, a MAPE less than 10% is considered to indicate "high prediction accuracy" (Lewis, 1982; Ağbulut et al., 2021b; Ajibola-James, 2023). It should be underscored that the true test of an ML time series model’s performance is in accurately forecasting new (future) values. This is usually determined by the value of its performance metrics in forecasting new target values that are not included in the model’s training datasets.

4. Results and Discussion

4.1 Data accuracy

The accuracies of the relatively sparse SSS data over a geographical area of approximately 6.5° × 4.5° in terms of the RMSD are 0.1279 psu and 0.1162 psu for the modelling dataset and forecasting dataset, respectively.
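For readers who wish to compare these RMSD values with the approximately 0.2 psu accuracy quoted for SMAP in the Introduction, the relative margin can be taken as (requirement − RMSD) / requirement. A short Python sketch of that arithmetic follows; `margin_percent` is an illustrative name, and treating 0.2 psu as the reference value is an assumption carried over from the text.

```python
def margin_percent(rmsd, requirement=0.2):
    """Percentage by which `rmsd` falls below `requirement`."""
    return (requirement - rmsd) / requirement * 100.0


# RMSD values reported for the modelling and forecasting datasets (psu).
for label, value in [("modelling", 0.1279), ("forecasting", 0.1162)]:
    print(f"{label}: {margin_percent(value):.2f}% below 0.2 psu")
```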
The two RMSD values show a relatively high level of accuracy, exceeding the SMAP mission’s accuracy requirement of 0.2 psu by substantial margins of approximately 36.05% and 41.9%, respectively. It should be noted that the relatively high accuracy was achieved by the rigorous supervised automatic data cleaning approach, which primarily involved deletion of the outliers induced by RFI and land contamination in the satellite dataset. This implies that the data preparation technique can reasonably affect the accuracy of the input dataset in a modelling and predictive study.

4.2 Interannual variability

The interannual variability in the SSS data in terms of the SD is 0.2528. This shows that the iSSSv is relatively stable (predictable) given that it is approximately 74.72% less than 1 SD. This result shows that the dataset could be considered a viable input for the ARIMA model. However, to achieve "stationarity", a basic assumption of the ARIMA model, the first-order differences of the data were taken, as mentioned earlier in section 3.4. This implies that the order of differencing applied to a given input dataset for ML ARIMA modelling is a function of the SD value. Consequently, data variability assessment using the SD value should be considered an essential aspect of exploratory data analysis (EDA) in the process of building ML ARIMA models.

4.3 Determination and Validation of the Best ARIMA Model

The best ML ARIMA model with the most appropriate parameters, which scored the minimum AIC value of 81.80972, was ARIMA(0,1,2)(0,1,1)[12].
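The final comparison step, choosing the candidate with the smallest AIC, can be sketched in Python as follows. The candidate labels and AIC values are those reported in this subsection; entries reported as Inf are represented with `math.inf` so that they can never be selected. This only mimics the last comparison and is not the search procedure that auto.arima() itself performs.

```python
import math

# Candidate models and AIC values as reported in Section 4.3.
candidates = {
    "ARIMA(2,1,2)(1,1,1)[12]": math.inf,
    "ARIMA(0,1,0)(0,1,0)[12]": 93.4323,
    "ARIMA(0,1,1)(0,1,1)[12]": 81.91669,
    "ARIMA(0,1,1)(0,1,0)[12]": 88.61325,
    "ARIMA(0,1,1)(1,1,1)[12]": math.inf,
    "ARIMA(0,1,1)(1,1,0)[12]": 82.80056,
    "ARIMA(0,1,2)(0,1,1)[12]": 81.80972,
    "ARIMA(0,1,2)(0,1,0)[12]": 89.46024,
}

# The preferred model is the one with the minimum AIC.
best_model = min(candidates, key=candidates.get)
print(best_model, candidates[best_model])
```

The margin over the runner-up, ARIMA(0,1,1)(0,1,1)[12], is only about 0.11 AIC units, so the two candidates are close under this criterion.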
The model was automatically determined from a variety of candidate ARIMA models and the allied AIC values computed, which include the following:

    ARIMA(2,1,2)(1,1,1)[12] : Inf
    ARIMA(0,1,0)(0,1,0)[12] : 93.4323
    ARIMA(0,1,1)(0,1,1)[12] : 81.91669
    ARIMA(0,1,1)(0,1,0)[12] : 88.61325
    ARIMA(0,1,1)(1,1,1)[12] : Inf
    ARIMA(0,1,1)(1,1,0)[12] : 82.80056
    ARIMA(0,1,2)(0,1,1)[12] : 81.80972
    ARIMA(0,1,2)(0,1,0)[12] : 89.46024

In Figure 4.3, the "Training" side shows the result of using 60 monthly epochs (Jan., 2016-Dec., 2020) of the data to train the best ML ARIMA model for modelling variations in the SSS, while the adjoining "Forecast" side shows the result of 12 monthly epochs of the SSS forecast. The results of the preliminary automatic model selection task show that the AIC metric is an efficient approach for determining the best ML ARIMA model with the most appropriate parameters. The result of the modelling accuracy assessment performed with R² is 0.8345281, while the result of its validation with MAPE is 0.7779%. The relatively high R² value shows that the ML ARIMA model explained a relatively large amount of variation, while the relatively low MAPE value shows that the ML ARIMA model has a relatively high modelling accuracy.

4.4 Determination and Validation of Forecasting Accuracy of the Best ARIMA Model

Table 4.4: Results of time series forecasting of SSS for 12 months ahead using the trained ARIMA model and the observed (actual) satellite SSS values

Forecast Period (2021) | ML ARIMA SSS (psu) | Observed SSS (psu)
January | 33.11587 | 32.74971
February | 33.51149 | 33.16754
March | 33.73193 | 33.11268
April | 33.84112 | 33.10024
May | 34.15953 | 33.57706
June | 34.77918 | 33.23043
July | 35.00569 | 34.00293
August | 35.03537 | 34.29564
September | 34.49434 | 33.31446
October | 33.25123 | 31.69396
November | 32.02951 | 30.93628
December | 32.26378 | 31.1894

According to Table 4.4 and Figure 4.4, the SSS values predicted by the best ML ARIMA model for the entire 12 months are greater than the observed satellite SSS values.
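The forecast errors can be recomputed directly from Table 4.4. The Python sketch below computes the RMSE, the MAPE (assuming the convention mean(|observed − predicted| / |observed|) × 100) and the mean forecast-minus-observed difference; the function and variable names are illustrative and are not those used in the study's R scripts.

```python
import math

# Forecast and observed SSS (psu), January-December 2021, from Table 4.4.
forecast = [33.11587, 33.51149, 33.73193, 33.84112, 34.15953, 34.77918,
            35.00569, 35.03537, 34.49434, 33.25123, 32.02951, 32.26378]
observed = [32.74971, 33.16754, 33.11268, 33.10024, 33.57706, 33.23043,
            34.00293, 34.29564, 33.31446, 31.69396, 30.93628, 31.1894]


def rmse(pred, obs):
    """Root mean squared error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def mape(pred, obs):
    """Mean absolute percentage error, in percent."""
    return sum(abs((o - p) / o) for p, o in zip(pred, obs)) / len(obs) * 100.0


# Mean signed error: a positive value means the forecasts run high.
mean_bias = sum(p - o for p, o in zip(forecast, observed)) / len(observed)

print(f"RMSE = {rmse(forecast, observed):.4f} psu")
print(f"MAPE = {mape(forecast, observed):.4f} %")
print(f"mean bias = {mean_bias:.4f} psu")
```

Reporting the mean signed difference alongside the RMSE and MAPE makes the direction of the error visible, which the two magnitude-based metrics alone do not show.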
This implies that the model tends to overstate the SSS values in a predictive study, and the accuracy of such predicted values should be properly validated using appropriate interpretable metric(s). The forecasting accuracy of the best ML ARIMA model in terms of the RMSE is 0.9850, while its validation in terms of MAPE gives 2.7670%. Given that the RMSE is relatively difficult to interpret for such applications because it is built from squared errors, the MAPE was used to validate the forecasting accuracy. The relatively low MAPE, which is less than one third of 10%, shows that the best ARIMA model has a relatively high forecasting accuracy.
5. Conclusion and Recommendations
5.1 Conclusion
The use of sparse satellite time series SSS data from the Nigerian coastal zone as a case study of ML ARIMA for modelling and forecasting variations in ESP yields encouraging results. These imply that relatively sparse satellite time series data of at least 60 epochs (hourly, daily, weekly, monthly or yearly) can be productively used to build a relatively accurate ML ARIMA model for modelling and forecasting variations in any ESP in any geographical area. A relative advantage of the time series model is that it does not require predictor (independent) variables to model variation and fit new (predicted) values. In this regard, the costs of implementing it (in terms of the amount of data input, data processing time, and computer hardware) are relatively low and affordable. The variation modelling accuracy, validated with a MAPE of 0.7779%, is more than 3 times better than the forecasting accuracy, validated with a MAPE of 2.7670%.
It should be underscored that such a difference in accuracy (in which the accuracy of the former exceeds that of the latter) is normal in such applications of ML models because the observed data used to validate the accuracy of the latter are new to the ML model.
5.2 Recommendations
Considering the relatively high accuracy of the ML ARIMA model coupled with its relatively low costs of implementation, the following are highly recommended.
(1) The ML model and its algorithm should be updated and adopted by stakeholders (particularly government agencies and aquatic entrepreneurs) as early warning decision support tools that will enable them to take proactive and sustainable preventive measures against any current and future risks that any ESP may pose to humans and the environment. For example, the ML model built with SSS data can serve as a decision support tool for providing early warning information on the risk that upstream seawater intrusion poses to the drinking water supply, people's health, the yield of sensitive plants such as rice and horticultural crops, and the environment.
(2) Further studies comparing this ML model with one built on a relatively large number of monthly mean SSS satellite observations should be encouraged as more satellite observations become available, given the apparent relative advantages such observations have (a) for building and improving the accuracy of such ML training and testing models and (b) over the traditional approach to time series forecasting.
(3) Additionally, appropriate local and global funding that will facilitate prompt execution of the recommendations in (1) and (2) above should be equitably provided, as soon as possible, to reliable but relatively marginalized individual researchers, private research organizations and private/public research institutions in the geospatial and related industries in Nigeria.
References
Ağbulut, Ü., Gürel, A. E., & Sarıdemir, S. (2021b).
Experimental investigation and prediction of performance and emission responses of a CI engine fuelled with different metal-oxide based nanoparticles–diesel blends using different machine learning algorithms. Energy, 215, 119076.
Ajibola-James, O. (2023). Assessment of sea surface salinity variability along Nigerian coastal zone using machine learning – 2012-2021 [Doctoral thesis, University of Nigeria, Nsukka, Enugu Campus].
Ajibola-James, O., & Okeke, F. I. (2024). An approach for good modelling and forecasting of sea surface salinity in a coastal zone using machine learning LASSO regression models built with sparse satellite time-series datasets. Research Square. Preprint. https://doi.org/10.21203/rs.3.rs-4016353/v1
Ajibola-James, O., Okeke, F. I., & Ojinnaka, O. C. (2023). Assessment of variability of sea surface salinity using integrated all-weather satellite data in a tropical coast (Nigerian coastal zone). Research Square. Preprint. https://doi.org/10.21203/rs.3.rs-3449318/v1
Anyikwa, O. B., & Martinez, N. (2012). Continental Shelf Act, 2012. The International Maritime Law Institute, IMO. https://imli.org/wp-content/uploads/2021/03/Obiora-Bede-Anyikwa.pdf
Benvenuto, D., Giovanetti, M., Vassallo, L., Angeletti, S., & Ciccozzi, M. (2020). Application of the ARIMA model on the COVID-2019 epidemic dataset. Data in Brief, 29. https://doi.org/10.1016/j.dib.2020.105340
Boutin, J., Chao, Y., Asher, W. E., Delcroix, T., Drucker, R., Drushka, K., Kolodziejczyk, N., Lee, T., Reul, N., Reverdin, G., Schanze, J., Soloviev, A., Yu, L., Anderson, J., Brucker, L., Dinnat, E., Santos-Garcia, A., Jones, W., Maes, C., Meissner, T., Tang, W., Vinogradova, N., & Ward, B. (2016). Satellite and in situ salinity: understanding near-surface stratification and subfootprint variability. Bulletin of the American Meteorological Society, 97(8), 1391–1407. https://doi.org/10.1175/bams-d-15-00032.1
Box, G. E. P., & Jenkins, G. (1970). Time series analysis, forecasting and control. San Francisco: Holden-Day.
CGIAR Research Centers in Southeast Asia. (2016). The drought and salinity intrusion in the Mekong River Delta of Vietnam. https://cgspace.cgiar.org/rest/bitstreams/78534/retrieve/
Chan-Lau, J. A. (2017). Lasso Regressions and Forecasting Models in Applied Stress Testing. International Monetary Fund (IMF) Working Paper, WP/17/108. https://www.imf.org/~/media/Files/Publications/WP/2017/wp17108.ashx
Cheung, Y.-W., & Lai, K. S. (1995). Lag order and critical values of the Augmented Dickey-Fuller test. Journal of Business & Economic Statistics, 13(3), 277–280. https://doi.org/10.2307/1392187
Dinnat, E. P., Le Vine, D. M., Boutin, J., Meissner, T., & Lagerloef, G. (2019). Remote sensing of sea surface salinity: Comparison of satellite and in situ observations and impact of retrieval parameters. Remote Sensing, 11(7). https://doi.org/10.3390/rs11070750
Fattah, J., Ezzine, L., Aman, Z., el Moussami, H., & Lachhab, A. (2018). Forecasting of demand using ARIMA model. International Journal of Engineering Business Management, 10. https://doi.org/10.1177/1847979018808673
Golitzen, K. G. (Ed.), Andersen, I., Dione, O., Jarosewich-Holder, M., & Olivry, J. (2005). The Niger River Basin: A vision for sustainable management. World Bank, Washington, DC. https://doi.org/10.1596/978-0-8213-6203-7
Hyndman, R. J., & Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R. Journal of Statistical Software, 27(3), 1–22. https://doi.org/10.18637/jss.v027.i03
Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and practice, 2nd edition, OTexts. Melbourne, Australia. Retrieved August 31, 2022, from https://otexts.com/fpp2/
Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and practice, 3rd edition, OTexts. Melbourne, Australia. Retrieved August 31, 2022, from https://otexts.com/fpp3/
Joint Propulsion Laboratory. (2020). JPL CAP SMAP Sea Surface Salinity Products (PO.DAAC; Version V5.0) [Dataset]. JPL, CA, USA. Retrieved July 10, 2022, from https://doi.org/10.5067/SMP50-3TMCS
Kotu, V., & Deshpande, B. (2019). Time Series Forecasting. Data Science, Elsevier, 395–445. https://doi.org/10.1016/B978-0-12-814761-0.00012-5
Lewis, C. D. (1982). Industrial and business forecasting methods: A practical guide to exponential smoothing and curve fitting. London: Butterworth Scientific.
Liu, R., & Gillies, D. F. (2016). Overfitting in linear feature extraction for classification of high-dimensional image data. Pattern Recognition, 53, 73–86. https://doi.org/10.1016/j.patcog.2015.11.015
Nguyen, P. T. B., Koedsin, W., McNeil, D., & Van, T. P. D. (2018). Remote sensing techniques to predict salinity intrusion: Application for a data-poor area of the coastal Mekong Delta, Vietnam. International Journal of Remote Sensing, 39(20), 6676–6691. https://doi.org/10.1080/01431161.2018.1466071
Rahman, A., & Hasan, M. M. (2017). Modelling and forecasting of carbon dioxide emissions in Bangladesh using Autoregressive Integrated Moving Average (ARIMA) Models. Open Journal of Statistics, 7, 560–566. https://doi.org/10.4236/ojs.2017.74038
Raudys, S. J., & Jain, A. K. (1991). Small sample size effects in statistical pattern recognition: Recommendations for practitioners. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(3), 252–264. https://doi.org/10.1109/34.75512
Sneath, S. (2023, September 23). Louisiana: New Orleans declares emergency over saltwater intrusion in drinking water. The Guardian. https://www.theguardian.com/us-news/2023/sep/22/louisiana-drought-drinking-water-mississippi-river-saltwater-new-orleans
Suleiman, S., & Sani, M. (2020). Application of ARIMA and Artificial Neural Networks Models for daily cumulative confirmed Covid-19 prediction in Nigeria. Equity Journal of Science and Technology, 7(2), 83–90.
https://www.equijost.com/fulltext/14-1594712555.pdf?1681453732
Trung, N. H., Hoanh, C. H., Tuong, T. P., Hien, X. H., Tri, L. Q., Minh, V. Q., Nhan, D. K., Vu, P. T., & Tri, V. P. D. (2016). Climate Change Affecting Land Use in the Mekong Delta: Adaptation of Rice-Based Cropping Systems (CLUES) Theme 5: Integrated Adaptation Assessment of Bac Lieu Province and Development of Adaptation Master Plan. https://www.researchgate.net/publication/301612048_Climate_change_affecting_land_use_in_the_Mekong_Delta_Adaptation_of_rice-based_cropping_systems_CLUES_ISBN_978-1-925436-36-5
United Nations. (n.d.). United Nations Convention on the Law of the Sea. https://www.un.org/depts/los/convention_agreements/texts/unclos/unclos_e.pdf
Usoro, E. (2010). Encyclopedia of the World's coastal landforms, 1, p. 949. London.
Zabbey, N., Giadom, F. D., & Babatunde, B. B. (2019). Nigerian coastal environments. In C. Sheppard (Ed.), World Seas: An environmental evaluation (pp. 835–854). Elsevier. https://doi.org/10.1016/B978-0-12-805068-2.00042-5
Zhao, J., Temimi, M., & Ghedira, H. (2017). Remotely sensed sea surface salinity in the hypersaline Arabian Gulf: Application to landsat 8 OLI data. Estuarine, Coastal and Shelf Science, 187, 168–177. https://doi.org/10.1016/j.ecss.2017.01.008
Additional Declarations
The authors declare no competing interests.
Period\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"9.131736526946108%\" valign=\"top\"\u003e\n \u003cp\u003e\u003cstrong\u003eObs./\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eTime\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"9.131736526946108%\" valign=\"top\"\u003e\n \u003cp\u003e\u003cstrong\u003eTotal Obs.\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"17.065868263473053%\" valign=\"top\"\u003e\n \u003cp\u003e\u003cstrong\u003eRMSD\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"11.676646706586826%\" valign=\"top\"\u003e\n \u003cp\u003eSMAP\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.94610778443114%\" valign=\"top\"\u003e\n \u003cp\u003eSSS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.047904191616766%\" valign=\"top\"\u003e\n \u003cp\u003eJan., 2016 to Dec., 2020\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"9.131736526946108%\" valign=\"top\"\u003e\n \u003cp\u003e278\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"9.131736526946108%\" valign=\"top\"\u003e\n \u003cp\u003e16680\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"17.065868263473053%\" valign=\"top\"\u003e\n \u003cp\u003e0.1279 psu\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"11.676646706586826%\" valign=\"top\"\u003e\n \u003cp\u003eSMAP\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.94610778443114%\" valign=\"top\"\u003e\n \u003cp\u003eSSS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.047904191616766%\" valign=\"top\"\u003e\n \u003cp\u003eJan. 
to Dec., 2021\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"9.131736526946108%\" valign=\"top\"\u003e\n \u003cp\u003e278\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"9.131736526946108%\" valign=\"top\"\u003e\n \u003cp\u003e3336\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"17.065868263473053%\" valign=\"top\"\u003e\n \u003cp\u003e0.1162 psu\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cbr\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3.2 Data Preparation\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003ePrior to the modelling and prediction tasks of the study, the appropriate data preparation tasks (data extraction, cleaning and selection) were implemented using automatic (scripted) procedures. The dataset was automatically extracted from the netCDF (network Common Data Form; .nc and .nc4) files into comma-separated values (.csv) files by executing a Python 3.10.2 script with the \u003cem\u003eglob\u003c/em\u003e, \u003cem\u003enetCDF4\u003c/em\u003e, \u003cem\u003epandas\u003c/em\u003e, \u003cem\u003enumpy\u003c/em\u003e and \u003cem\u003exarray\u003c/em\u003e libraries in the Spyder IDE (Integrated Development Environment) 5.2.2 software. 
The data cleaning, which involved rigorous supervised-automatic deletion of the observation records with null values and outliers induced by radio frequency interference (RFI) and land contamination in the dataset stored in the .csv file, was achieved through three consecutive tasks: (a) automatic deletion of null values by executing a Python script with the \u003cem\u003epandas\u003c/em\u003e, \u003cem\u003enumpy\u003c/em\u003e, \u003cem\u003ecsv\u003c/em\u003e and \u003cem\u003exarray\u003c/em\u003e libraries in the IDE; (b) visual identification and verification of outliers by overlaying each of the monthly SSS observations in the .csv files on Google Earth Pro to ascertain their proximity to land and tendency for land contamination; and (c) automatic deletion of the predetermined outliers by using their concatenated location coordinates as criteria for executing a Python script with the same libraries and IDE that were utilized in (a) above. The selection of the 278 appropriate satellite observation points analysed in this study, which constitute the study area (Figure 1), was achieved by executing a Python script with the \u003cem\u003epandas\u003c/em\u003e, \u003cem\u003enumpy\u003c/em\u003e, \u003cem\u003ecsv\u003c/em\u003e and \u003cem\u003exarray\u003c/em\u003e libraries in the IDE. The points were imported and merged with the base map using the overlay function in ArcMap 10.4.1 (Ajibola-James et al., 2023; Ajibola-James, 2023).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3.3 Data accuracy and variability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe accuracy of the satellite SSS data was computed in Microsoft Excel by using the SSS uncertainty data (the difference between in situ SSS and satellite SSS), which were downloaded together with the SSS data, as the only input. See \u003cstrong\u003eTable 3.1 (a)\u003c/strong\u003e. 
To compute the accuracy of the modelling data, the SSS uncertainty values of the 16680 observations (278 points \u0026times; 60 months) were entered in column A of an Excel worksheet, where they occupied the range A2:A16681. The sum of squares was computed in cell C2 with the formula =SUMSQ(A2:A16681). The mean squared difference (MSD) was computed in cell D2 with the formula =C2/16680, and the RMSD was finally computed with the formula =SQRT(D2). The same procedure was replicated for computing the accuracy of the forecasting data using 3336 observations (278 points \u0026times; 12 months). See \u003cstrong\u003eTable 3.1 (b)\u003c/strong\u003e for details of the input datasets.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 3.3:\u0026nbsp;\u003c/strong\u003eDataframe for computing interannual variability\u0026nbsp;in\u0026nbsp;SSS\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e\u003cstrong\u003eYear\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e\u003cstrong\u003eSSS\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e2016\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e33.15872\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e2017\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e33.12886\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e2018\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e32.79823\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e2019\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e32.55897\u003c/p\u003e\n 
\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e2020\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003e33.02366\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cbr\u003e\u003c/p\u003e\n\u003cp\u003eThe interannual variability of the SSS data was determined by utilizing the \u003cem\u003eMLmetrics\u003c/em\u003e library to compute the SD, a universal measure of variability, in R 4.1.3/RStudio 2022.02.3-492 software. After the mean annual SSS values for 2016 to 2020 were uploaded to the software by running data_obs_sss \u0026lt;- read.csv(file.choose(), header = TRUE, stringsAsFactors = FALSE), the dataframe produced (\u003cstrong\u003eTable 3.3\u003c/strong\u003e) by running data_sss \u0026lt;- data_obs_sss[, c(\u0026quot;year\u0026quot;, \u0026quot;sss\u0026quot;)] was vectorized by running sss_2016_2020 \u0026lt;- data_sss$sss. The SD was finally computed by running sd(sss_2016_2020).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3.4 Autoregressive Integrated Moving Average Model and Algorithm\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn applying ML methods to model and forecast variations in SSS as an ESP, the ARIMA models and algorithms were built primarily with the \u003cem\u003eforecast\u003c/em\u003e library 8.17.0 in R 4.1.3/RStudio 2022.02.3-492 software. Other complementary libraries, such as \u003cem\u003etseries\u003c/em\u003e and \u003cem\u003eMLmetrics\u003c/em\u003e, were also used in this process. Model fitting and selection were achieved with the \u003cem\u003eauto.arima()\u003c/em\u003e function. The function helps to determine the best model for given input data based on relevant model evaluation criteria. 
The function employs a variant of the Hyndman-Khandakar method, which combines unit root testing, Akaike information criterion (AIC) minimization, the Bayesian information criterion (BIC) and maximum likelihood estimation (MLE) to generate ARIMA models (Hyndman \u0026amp; Khandakar, 2008; Hyndman \u0026amp; Athanasopoulos, 2018). The most widely used criteria are the AIC and BIC (Rahman \u0026amp; Hasan, 2017; Suleiman \u0026amp; Sani, 2020). The function performs intuitive parameter estimation and provides information on the best ARIMA model parameter.\u003c/p\u003e\n\u003cp\u003eAt the inception of the ML modelling task, the dataframe, df, containing 60 monthly epochs (Jan. 2016-Dec. 2020) of the SSS data was transformed from \u0026quot;function\u0026quot; to \u0026ldquo;time series\u0026rdquo; to satisfy one of the basic assumptions of the ARIMA model. The time series data were assessed for stationarity utilizing both visual and metric approaches. The former involved the inspection of autocorrelation function (ACF) and partial autocorrelation function (PACF) plot patterns, while the latter was characterized by hypothesis testing using augmented Dickey-Fuller (ADF) test metrics. 
The following hypotheses and assumptions (decision rules) were adopted for the ADF test:\u003c/p\u003e\n\u003cp\u003eH\u003csub\u003e0\u003c/sub\u003e: The series has a unit root (nonstationary)\u003c/p\u003e\n\u003cp\u003eH\u003csub\u003e1\u003c/sub\u003e: The series is stationary\u003c/p\u003e\n\u003cp\u003ewhere H\u003csub\u003e0\u003c/sub\u003e is the null hypothesis and H\u003csub\u003e1\u003c/sub\u003e is the alternative hypothesis.\u003c/p\u003e\n\u003cp\u003eIf the \u003cem\u003ep value\u003c/em\u003e is \u0026le; 0.05, H\u003csub\u003e0\u003c/sub\u003e is rejected in favour of H\u003csub\u003e1\u003c/sub\u003e.\u003c/p\u003e\n\u003cp\u003eGiven that the computed \u003cem\u003ep value\u003c/em\u003e = 0.1769, which is \u0026gt; 0.05, H\u003csub\u003e0\u003c/sub\u003e could not be rejected, and the series was treated as nonstationary. To achieve \u0026ldquo;stationarity\u0026rdquo;, another basic assumption of the ARIMA model, first-order differences of the data were taken. The ADF test was also used to reassess the differenced data. Given that the computed \u003cem\u003ep value\u003c/em\u003e = 0.01, which is \u0026lt; 0.05, H\u003csub\u003e0\u003c/sub\u003e was rejected in favour of H\u003csub\u003e1\u003c/sub\u003e, and the differenced series was treated as stationary. 
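\u003c/p\u003e
\u003cp\u003eThe differencing step can be illustrated with a minimal Python sketch. It is not the authors' R workflow: the helper name \u003cem\u003edifference\u003c/em\u003e and the toy series are assumptions made purely for illustration. A lag-1 difference removes a linear trend, and a lag-12 difference removes a pattern that repeats every 12 months, which is the role played by the differencing orders d and D in a seasonal ARIMA model for monthly data.\u003c/p\u003e

```python
def difference(series, lag=1):
    # Return series[t] - series[t - lag] for every t where both terms exist.
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Toy monthly series (hypothetical values, not SSS data): a linear trend of
# +0.5 per month plus a pattern that repeats every 12 months.
pattern = [0, 1, 2, 3, 2, 1, 0, -1, -2, -3, -2, -1]
series = [0.5 * t + pattern[t % 12] for t in range(36)]

# Lag-12 (seasonal) differencing cancels the repeating pattern and leaves
# only the constant contribution of the trend over 12 months.
seasonal_diff = difference(series, lag=12)

# Lag-1 (first-order) differencing of that result removes what is left.
both_diff = difference(seasonal_diff, lag=1)

print(len(series), len(seasonal_diff), len(both_diff))
```

\u003cp\u003eEach differencing pass shortens the series by its lag, which matters when only 60 monthly epochs are available.\u003c/p\u003e
\u003cp\u003e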
The best ARIMA model, together with its most appropriate parameters, was identified by applying the \u003cem\u003eauto.arima\u003c/em\u003e function to the training data, \u003cem\u003eOutcome_SSS\u003c/em\u003e, and storing the result as \u003cstrong\u003emymodel_train\u003c/strong\u003e by running\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003emymodel_train\u003c/strong\u003e \u0026lt;- auto.arima(Outcome_SSS, ic=\u0026apos;aic\u0026apos;, trace=TRUE, approximation=FALSE) \u0026nbsp; \u0026nbsp;(1)\u003c/p\u003e\n\u003cp\u003eThe Ljung-Box (portmanteau) test was performed to assess whether the residuals of the \u003cem\u003eauto.arima\u003c/em\u003e model behave as white noise, based on the following hypotheses and assumptions (decision rule):\u003c/p\u003e\n\u003cp\u003eH\u003csub\u003e0\u003c/sub\u003e: The residuals are white noise (no autocorrelation)\u003c/p\u003e\n\u003cp\u003eH\u003csub\u003e1\u003c/sub\u003e: The residuals are not white noise (autocorrelated)\u003c/p\u003e\n\u003cp\u003eIf the \u003cem\u003ep value\u003c/em\u003e is \u0026ge; 0.05, H\u003csub\u003e0\u003c/sub\u003e is not rejected (Hyndman \u0026amp; Khandakar, 2008).\u003c/p\u003e\n\u003cp\u003eGiven that the computed \u003cem\u003ep value\u003c/em\u003e = 0.4522, which is \u0026gt; 0.05, H\u003csub\u003e0\u003c/sub\u003e was not rejected; the residuals are consistent with white noise, which indicates that the fitted model is adequate. 
Having confirmed the stationarity of (1), it was used as input for building the user-defined forecasting model, \u003cstrong\u003emyforecast_train\u003c/strong\u003e\u003cstrong\u003e,\u003c/strong\u003e\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003egiven by running\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003emyforecast_train\u003c/strong\u003e \u0026lt;- forecast(mymodel_train, level=c(95), h=1*12) \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; (2)\u003c/p\u003e\n\u003cp\u003ewhere level is the confidence level and h is the number of monthly forecasts. Therefore, the SSS values were predicted 12 months ahead using the model and (2) built with parameter combinations h=1*12. The graph of the SSS values predicted by the model was generated by running\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eautoplot\u003c/em\u003e\u003c/strong\u003e(myforecast_train) \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;(3)\u003c/p\u003e\n\u003cp\u003eafter running (2) successfully.\u0026nbsp;The\u0026nbsp;modelling\u0026nbsp;accuracy was computed in terms of R\u003csup\u003e2\u003c/sup\u003e, \u003cstrong\u003ersq\u003c/strong\u003e by running\u003c/p\u003e\n\u003cp\u003esss_obs1 \u0026lt;- myforecast_accuracy_Outcome_SSS$x\u003c/p\u003e\n\u003cp\u003esss_pred1 \u0026lt;- myforecast_accuracy_Outcome_SSS$fitted\u003c/p\u003e\n\u003cp\u003erss \u0026lt;- sum((sss_pred1 -\u0026nbsp;sss_obs1) ^ 2)\u003c/p\u003e\n\u003cp\u003etss \u0026lt;- sum((sss_obs1\u0026nbsp;- mean(sss_obs1)) ^ 2)\u003c/p\u003e\n\u003cp\u003ersq \u0026lt;- 1 - rss/tss \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ersq \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; 
\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u003c/strong\u003e(4)\u003c/p\u003e\n\u003cp\u003ewhile the MAPE reported by running\u003c/p\u003e\n\u003cp\u003emyforecast_accuracy_Outcome_SSS \u0026lt;- Arima(Outcome_SSS, model=mymodel_train)\u003c/p\u003e\n\u003cp\u003eaccuracy(myforecast_accuracy_Outcome_SSS) \u0026nbsp; \u0026nbsp;(5)\u003c/p\u003e\n\u003cp\u003ewas utilized for validating the modelling accuracy above, where Outcome_SSS is the input time series SSS data and mymodel_train is the ML ARIMA model trained on those data. The object myforecast_accuracy_Outcome_SSS also supplies the observed ($x) and fitted ($fitted) values used in (4), so it must be created before (4) is run.\u003c/p\u003e\n\u003cp\u003eThe forecasting accuracy in terms of the \u003cstrong\u003eRMSE\u003c/strong\u003e was computed and validated by computing the \u003cstrong\u003eMAPE\u003c/strong\u003e with the \u003cem\u003eMLmetrics\u003c/em\u003e library for the best ML ARIMA model by running\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eRMSE\u003c/strong\u003e(sss_pred1, sss_obs1) \u0026nbsp; \u0026nbsp;(6)\u003c/p\u003e\n\u003cp\u003eand\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMAPE\u003c/strong\u003e(sss_pred1, sss_obs1)*100 \u0026nbsp; \u0026nbsp;(7)\u003c/p\u003e\n\u003cp\u003eimmediately after running (4) successfully. For this step, sss_pred1 holds the predicted SSS values and sss_obs1 holds the actual satellite SSS values for January-December 2021, which implies that the two vectors were reassigned after (4), where they hold the in-sample fitted and observed values.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3.5 Determination and Validation of ARIMA Model Accuracy for Modelling and Forecasting 
SSS\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn subsection 3.4, the accuracy of the built ML ARIMA model for\u0026nbsp;modelling\u0026nbsp;variations in SSS was computed by using the R\u003csup\u003e2\u003c/sup\u003e performance metric, which represents the amount of variation explained by the ML model. The forecasting accuracy was determined with the RMSE, a measure of accuracy that reveals the magnitude of the difference between the predicted and observed (actual) values. The validation of the modelling and forecasting accuracy of the best ML model in relation to error estimation, which is also known as residual variation, was also computed in terms of MAPE, a good measure of the absolute percentage difference between predicted and observed values. In general, the greater the R\u003csup\u003e2\u003c/sup\u003e value is, the greater the amount of variation explained by the ML model. Conversely, lower values of MAPE and RMSE indicate relatively good accuracy of forecasts made by the model. In terms of the interpretation of the error metrics in real-world applications, the MAPE seems to be the most versatile because it is usually computed in percentage (%) units. In addition, what should be considered an acceptable accuracy level seems to be properly documented for the MAPE. In this regard, a MAPE less than 10% is considered to indicate \u0026ldquo;high prediction accuracy\u0026rdquo; (Lewis, 1982; Ağbulut et al., 2021b; Ajibola-James, 2023). It should be underscored that the true test of an ML time series model\u0026rsquo;s performance is in accurately forecasting new (future) values. This is usually determined by the value of its performance metrics in forecasting new target values that are not included in the model\u0026rsquo;s training datasets.\u003c/p\u003e"},{"header":"4. 
Results And Discussion","content":"\u003cp\u003e\u003cstrong\u003e4.1 Data accuracy\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe accuracies of the relatively sparse SSS data over a geographical area of approximately 6.5\u0026deg; \u0026times; 4.5\u0026deg; in terms of the RMSD are 0.1279 psu and 0.1162 psu for the modelling dataset and forecasting dataset, respectively. The two RMSD values are lower than the SMAP mission\u0026rsquo;s accuracy requirement of 0.2 psu by substantial margins of approximately 36.05% and 41.9%, respectively, which indicates a relatively high level of accuracy. It should be noted that the relatively high accuracy was achieved by the rigorous supervised automatic data cleaning approach, which primarily involved deletion of the outliers induced by RFI and land contamination in the satellite dataset. This implies that the data preparation technique can appreciably affect the accuracy of the input dataset in a modelling and predictive study.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e4.2 Interannual variability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe interannual variability in the SSS data in terms of the SD is 0.2528 psu. This shows that the iSSSv is relatively stable (predictable), given that the SD is approximately 74.72% less than 1 psu. This result shows that the dataset could be considered a viable input for the ARIMA model. However, to achieve \u0026ldquo;stationarity\u0026rdquo;, a basic assumption of the ARIMA model, the first-order differences of the data were taken, as mentioned earlier in section 3.4. This suggests that the order of differencing required for a given input dataset in ML ARIMA modelling may depend on the SD value. 
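\u003c/p\u003e
\u003cp\u003eThe figures quoted in sections 4.1 and 4.2 can be re-derived with a short Python sketch. This is an illustrative cross-check only, not the Excel and R procedure described in section 3.3; the helper name \u003cem\u003emargin_percent\u003c/em\u003e is an assumption introduced here, and the input values are copied from Table 3.1 (b) and Table 3.3.\u003c/p\u003e

```python
import statistics

# Mean annual SSS values (psu) for 2016-2020, copied from Table 3.3.
annual_sss = [33.15872, 33.12886, 32.79823, 32.55897, 33.02366]

# Sample standard deviation (n - 1 denominator), the same definition
# that R's sd() uses.
sd_sss = statistics.stdev(annual_sss)


def margin_percent(value, requirement=0.2):
    # Percentage by which value falls below the requirement (both in psu).
    return (requirement - value) / requirement * 100


# RMSD values (psu) copied from Table 3.1 (b).
margin_modelling = margin_percent(0.1279)
margin_forecasting = margin_percent(0.1162)

print(round(sd_sss, 4))
print(round(margin_modelling, 2), round(margin_forecasting, 2))
```

\u003cp\u003eUnder these inputs the sketch gives an SD close to 0.2528 and margins close to 36.05% and 41.9%, in agreement with the values reported above.\u003c/p\u003e
\u003cp\u003e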
Consequently, data variability assessment using the SD value should be considered an essential aspect of exploratory data analysis (EDA) in the process of building ML ARIMA models.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e4.3 Determination and Validation of the Best ARIMA Model\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe best ML ARIMA model with the most appropriate parameter that scored the minimum AIC value of 81.80972\u0026nbsp;was\u0026nbsp;ARIMA(0,1,2)(0,1,1)[12]. It was automatically determined from a variety of options of ARIMA models and the allied AIC values computed, which include\u0026nbsp;the following:\u003c/p\u003e\n\u003cp\u003eARIMA(2,1,2)(1,1,1)[12] \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;: Inf\u003c/p\u003e\n\u003cp\u003eARIMA(0,1,0)(0,1,0)[12] \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;: 93.4323\u003c/p\u003e\n\u003cp\u003eARIMA(0,1,1)(0,1,1)[12] \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;: 81.91669\u003c/p\u003e\n\u003cp\u003eARIMA(0,1,1)(0,1,0)[12]: 88.61325\u003c/p\u003e\n\u003cp\u003eARIMA(0,1,1)(1,1,1)[12] \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;: Inf\u003c/p\u003e\n\u003cp\u003eARIMA(0,1,1)(1,1,0)[12] \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;: 82.80056\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eARIMA(0,1,2)(0,1,1)[12]\u003c/strong\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; : \u003cstrong\u003e81.80972\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eARIMA(0,1,2)(0,1,0)[12] \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; 
\u0026nbsp; \u0026nbsp;: 89.46024\u003c/p\u003e\n\u003cp\u003eIn \u003cstrong\u003eFigure 4.3\u003c/strong\u003e, the \u0026ldquo;Training\u0026rdquo; side shows the result of using 60 monthly epochs (Jan., 2016-Dec., 2020) of the data to train the best ML ARIMA\u0026nbsp;model\u0026nbsp;for\u0026nbsp;modelling\u0026nbsp;variations in the SSS, while the adjoining \u0026ldquo;Forecast\u0026rdquo; side shows the result of 12 monthly epochs of\u0026nbsp;the\u0026nbsp;SSS forecast.\u003c/p\u003e\n\u003cp\u003eThe\u0026nbsp;results\u0026nbsp;of the preliminary automatic model selection\u0026nbsp;task show\u0026nbsp;that the AIC metric is an efficient approach for determining the best ML ARIMA model with the most appropriate\u0026nbsp;parameters. The result of the modelling accuracy assessment performed with R\u003csup\u003e2\u003c/sup\u003e is 0.8345281, while the result of its validation with MAPE is 0.7779%. The relatively high R\u003csup\u003e2\u003c/sup\u003e value shows that the ML ARIMA model explained a relatively large amount of variation, while the relatively low MAPE value shows that the ML ARIMA model has a relatively high modelling accuracy.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e4.4 Determination and Validation of Forecasting Accuracy of the Best ARIMA Model\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 4.4\u003c/strong\u003e: Results of time series forecasting of SSS for 12 months ahead using the trained ARIMA model and the observed (actual) satellite SSS values\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e\u003cstrong\u003eForecast Period\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e(2021)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e\u003cstrong\u003eML ARIMA SSS\u003c/strong\u003e\u003c/p\u003e\n 
\u003cp\u003e\u003cstrong\u003e(psu)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e\u003cstrong\u003eObserved SSS\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e(psu)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eJanuary\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e33.11587\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e32.74971\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eFebruary\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e33.51149\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e33.16754\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eMarch\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e33.73193\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e33.11268\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eApril\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e33.84112\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e33.10024\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eMay\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e34.15953\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e33.57706\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eJune\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e34.77918\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
valign=\"top\"\u003e\n \u003cp\u003e33.23043\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eJuly\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e35.00569\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e34.00293\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eAugust\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e35.03537\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e34.29564\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eSeptember\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e34.49434\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e33.31446\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eOctober\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e33.25123\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e31.69396\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNovember\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e32.02951\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e30.93628\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eDecember\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e32.26378\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e31.1894\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cbr\u003e\u003c/p\u003e\n\u003cp\u003eAccording to 
\u003cstrong\u003eTable 4.4\u003c/strong\u003e and \u003cstrong\u003eFigure 4.4,\u003c/strong\u003e the SSS values predicted by the best ML ARIMA model are higher than the observed satellite SSS values in all 12 months. This suggests that the model tends to overstate the SSS values in a predictive study, and the accuracy of such predicted values should be properly validated using appropriate interpretable metric(s). The forecasting accuracy of the best ML ARIMA model computed in terms of the RMSE is 0.9850 psu, while the result of its validation in terms of MAPE is 2.7670%. Given that the RMSE is relatively difficult to interpret for such applications due to the squared nature of the measured error, the MAPE was utilized to validate the forecasting accuracy. The relatively low MAPE, which is less than one-third of the 10% threshold, shows that the best ARIMA model has a relatively high forecasting accuracy.\u003c/p\u003e"},{"header":"5.\tConclusion and Recommendations","content":"\u003cp\u003e\u003cstrong\u003e5.1 Conclusion\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe use of sparse satellite time series SSS data from the Nigerian coastal zone as a case study for ML ARIMA for modelling and forecasting variations in ESP yields encouraging results, which suggest that relatively sparse satellite time series data from at least 60 epochs (hourly, daily, weekly, monthly or yearly) can be productively utilized for building a relatively accurate ML ARIMA model for modelling and forecasting variations in any ESP in any geographical area. A relative advantage of the time series model is that it does not require predictor (independent) variables to model variation and fit new (predicted) values. In this regard, the costs (in terms of the amount of data input, data processing time, and computer hardware) of implementing it are relatively low and affordable. 
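\u003c/p\u003e
\u003cp\u003eAs a cross-check on the forecasting figures summarized here, the RMSE and MAPE can be recomputed directly from the 12 pairs of values in Table 4.4. The Python sketch below is an assumption-laden illustration rather than the authors' R code: it uses the textbook definitions of RMSE and MAPE, and the mean error (\u003cem\u003ebias\u003c/em\u003e) is an extra quantity added here to summarize the direction of the errors.\u003c/p\u003e

```python
import math

# Forecast and observed SSS (psu) for January-December 2021, from Table 4.4.
forecast = [33.11587, 33.51149, 33.73193, 33.84112, 34.15953, 34.77918,
            35.00569, 35.03537, 34.49434, 33.25123, 32.02951, 32.26378]
observed = [32.74971, 33.16754, 33.11268, 33.10024, 33.57706, 33.23043,
            34.00293, 34.29564, 33.31446, 31.69396, 30.93628, 31.18940]

errors = [f - o for f, o in zip(forecast, observed)]
n = len(errors)

# Root mean square error, in psu.
rmse = math.sqrt(sum(e * e for e in errors) / n)

# Mean absolute percentage error, relative to the observed values.
mape = sum(abs(e) / o for e, o in zip(errors, observed)) / n * 100

# Mean error: a positive value means the forecasts are too high on average.
bias = sum(errors) / n

print(round(rmse, 3), round(mape, 2), round(bias, 2))
```

\u003cp\u003eWith these inputs the recomputed RMSE and MAPE land close to the reported 0.9850 and 2.7670%, and every monthly error is positive, consistent with the over-prediction noted in section 4.4.\u003c/p\u003e
\u003cp\u003e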
The variation modelling accuracy, validated with a MAPE of 0.7779%, is better than the forecasting accuracy, validated with a MAPE of 2.7670%: the forecasting MAPE is roughly 3.6 times the modelling MAPE. Such a difference, in which the modelling accuracy exceeds the forecasting accuracy, is normal in applications of ML models because the observed data used to validate the forecasts are new to the ML model.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e5.2 Recommendations\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eConsidering the relatively high accuracy of the ML ARIMA model coupled with its relatively low costs of implementation, the following are highly recommended.\u003c/p\u003e\n\u003col\u003e\n \u003cli\u003eThe ML model and its algorithm should be updated and adopted by stakeholders (particularly government agencies and aquatic entrepreneurs) as early warning decision support tools that will enable them to take proactive and sustainable preventive measures against any current and future risks that any ESP may pose to humans and the environment. 
For example, the ML model built with SSS data can serve as a decision support tool for providing early warning information on the risk of upstream seawater intrusion to the drinking water supply, people\u0026rsquo;s health, sensitive plants such as rice and horticultural crop yield, and the environment.\u003c/li\u003e\n \u003cli\u003eFurther studies on the comparative assessment of the ML model with the one utilizing a relatively large number of monthly mean SSS satellite observations should be encouraged, as more satellite data observations are available due to the apparent relative advantages they have (a) for building and improving the accuracy of such ML training and testing models and (b) over the traditional approach to time series forecasting.\u003c/li\u003e\n \u003cli\u003eAdditionally, appropriate local and global funding that will facilitate prompt execution of the recommendations in (1) and (2) above should be equitably provided to reliable but relatively marginalized individual researchers, private research organizations and private/public research institutions in the geospatial and related industries in Nigeria as soon as possible.\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"References","content":"\u003col\u003e\n \u003cli\u003eAğbulut, \u0026Uuml;., G\u0026uuml;rel, A. E., \u0026amp; Sarıdemir S. (2021b). Experimental investigation and prediction of performance and emission responses of a CI engine fuelled with different metal-oxide based nanoparticles\u0026ndash;diesel blends using different machine learning algorithms. \u003cem\u003eEnergy\u003c/em\u003e, 215:119076.\u003c/li\u003e\n \u003cli\u003eAjibola-James, O. (2023). \u003cem\u003eAssessment of sea surface salinity variability along Nigerian coastal zone using machine learning \u0026ndash; 2012-2021\u003c/em\u003e [Doctoral thesis, University of Nigeria, Nsukka, Enugu Campus].\u003c/li\u003e\n \u003cli\u003eAjibola-James, O., \u0026amp; Okeke, F. I. (2024). 
An approach for good modelling and forecasting of sea surface salinity in a coastal zone using machine learning LASSO regression models built with sparse satellite time-series datasets. \u003cem\u003eResearch Square\u003c/em\u003e. Preprint. https://doi.org/10.21203/rs.3.rs-4016353/v1\u003c/li\u003e\n \u003cli\u003eAjibola-James, O., Okeke, F. I., \u0026amp; Ojinnaka, O. C. (2023). Assessment of variability of sea surface salinity using integrated all-weather satellite data in a tropical coast (Nigerian coastal zone). \u003cem\u003eResearch Square\u003c/em\u003e. Preprint. https://doi.org/10.21203/rs.3.rs-3449318/v1\u003c/li\u003e\n \u003cli\u003eAnyikwa, O. B., \u0026amp; Martinez, N. (2012). \u003cem\u003eContinental Shelf Act, 2012\u003c/em\u003e. The International Maritime Law Institute, IMO. https://imli.org/wp-content/uploads/2021/03/Obiora-Bede-Anyikwa.pdf\u003c/li\u003e\n \u003cli\u003eBenvenuto, D., Giovanetti, M., Vassallo, L., Angeletti, S., \u0026amp; Ciccozzi, M. (2020). Application of the ARIMA model on the COVID-2019 epidemic dataset. \u003cem\u003eData in Brief, 29\u003c/em\u003e. https://doi.org/10.1016/j.dib.2020.105340\u003c/li\u003e\n \u003cli\u003eBoutin, J., Chao, Y., Asher, W. E., Delcroix, T., Drucker, R., Drushka, K., Kolodziejczyk, N., Lee, T., Reul, N., Reverdin, G., Schanze, J., Soloviev, A., Yu, L., Anderson, J., Brucker, L., Dinnat, E., Santos-Garcia, A., Jones, W., Maes, C., Meissner, T., Tang, W., Vinogradova, N., \u0026amp; Ward, B. (2016). Satellite and in situ salinity: understanding near-surface stratification and subfootprint variability. \u003cem\u003eBulletin of the American Meteorological Society, 97\u003c/em\u003e(8), 1391\u0026ndash;1407. https://doi.org/10.1175/bams-d-15-00032.1\u003c/li\u003e\n \u003cli\u003eBox, G.E.P. \u0026amp; Jenkins, G. (1970). \u003cem\u003eTime series analysis, forecasting and control\u003c/em\u003e. 
San Francisco: Holden-Day.\u003c/li\u003e\n \u003cli\u003eCGIAR Research Centers in Southeast Asia. (2016). \u003cem\u003eThe drought and salinity intrusion in the Mekong River Delta of Vietnam.\u003c/em\u003e https://cgspace.cgiar.org/rest/bitstreams/78534/retrieve/\u003c/li\u003e\n \u003cli\u003eChan-Lau, J. A. (2017). Lasso Regressions and Forecasting Models in Applied Stress Testing. \u003cem\u003eInternational Monetary Fund (IMF) Working Paper\u003c/em\u003e, WP/17/108. https://www.imf.org/~/media/Files/Publications/WP/2017/wp17108.ashx\u003c/li\u003e\n \u003cli\u003eCheung, Y.-W., \u0026amp; Lai, K. S. (1995). Lag order and critical values of the Augmented Dickey-Fuller test. \u003cem\u003eJournal of Business \u0026amp; Economic Statistics, 13\u003c/em\u003e(3), 277\u0026ndash;280. https://doi.org/10.2307/1392187\u003c/li\u003e\n \u003cli\u003eDinnat, E. P., Le Vine, D. M., Boutin, J., Meissner, T., \u0026amp; Lagerloef, G. (2019). Remote sensing of sea surface salinity: Comparison of satellite and in situ observations and impact of retrieval parameters. \u003cem\u003eRemote Sensing, 11\u003c/em\u003e(7). https://doi.org/10.3390/rs11070750\u003c/li\u003e\n \u003cli\u003eFattah, J., Ezzine, L., Aman, Z., el Moussami, H., \u0026amp; Lachhab, A. (2018). Forecasting of demand using ARIMA model. \u003cem\u003eInternational Journal of Engineering Business Management, 10\u003c/em\u003e. https://doi.org/10.1177/1847979018808673\u003c/li\u003e\n \u003cli\u003eGolitzen, K. G. (Ed.), Andersen, I., Dione, O., Jarosewich-Holder, M., \u0026amp; Olivry, J. (2005). \u003cem\u003eThe Niger River Basin: A vision for sustainable management\u003c/em\u003e. World Bank, Washington, DC. https://doi.org/10.1596/978-0-8213-6203-7\u003c/li\u003e\n \u003cli\u003eHyndman, R. J., \u0026amp; Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R. \u003cem\u003eJournal of Statistical Software, 27\u003c/em\u003e(3), 1\u0026ndash;22. 
https://doi.org/10.18637/jss.v027.i03\u003c/li\u003e\n \u003cli\u003eHyndman, R.J., \u0026amp; Athanasopoulos, G. (2018). \u003cem\u003eForecasting: Principles and practice\u003c/em\u003e (2nd ed.). OTexts, Melbourne, Australia. Retrieved August 31, 2022, from https://otexts.com/fpp2/\u003c/li\u003e\n \u003cli\u003eHyndman, R.J., \u0026amp; Athanasopoulos, G. (2021). \u003cem\u003eForecasting: Principles and practice\u003c/em\u003e (3rd ed.). OTexts, Melbourne, Australia. Retrieved August 31, 2022, from https://otexts.com/fpp3/\u003c/li\u003e\n \u003cli\u003eJet Propulsion Laboratory. (2020). \u003cem\u003eJPL CAP SMAP Sea Surface Salinity Products\u003c/em\u003e (PO.DAAC; Version V5.0) [Dataset]. JPL, CA, USA. Retrieved July 10, 2022, from https://doi.org/10.5067/SMP50-3TMCS\u003c/li\u003e\n \u003cli\u003eKotu, V., \u0026amp; Deshpande, B. (2019). Time Series Forecasting. \u003cem\u003eData Science, Elsevier,\u003c/em\u003e 395\u0026ndash;445. https://doi.org/10.1016/B978-0-12-814761-0.00012-5\u003c/li\u003e\n \u003cli\u003eLewis, C. D. (1982). \u003cem\u003eIndustrial and business forecasting methods: A practical guide to exponential smoothing and curve fitting\u003c/em\u003e. London: Butterworth Scientific.\u003c/li\u003e\n \u003cli\u003eLiu, R., \u0026amp; Gillies, D. F. (2016). Overfitting in linear feature extraction for classification of high-dimensional image data. \u003cem\u003ePattern Recognition, 53\u003c/em\u003e, 73\u0026ndash;86. https://doi.org/10.1016/j.patcog.2015.11.015\u003c/li\u003e\n \u003cli\u003eNguyen, P. T. B., Koedsin, W., McNeil, D., \u0026amp; Van, T. P. D. (2018). Remote sensing techniques to predict salinity intrusion: Application for a data-poor area of the coastal Mekong Delta, Vietnam. \u003cem\u003eInternational Journal of Remote Sensing, 39\u003c/em\u003e(20), 6676\u0026ndash;6691. 
https://doi.org/10.1080/01431161.2018.1466071\u003c/li\u003e\n \u003cli\u003eRahman, A., \u0026amp; Hasan, M. M. (2017). Modelling and forecasting of carbon dioxide emissions in Bangladesh using Autoregressive Integrated Moving Average (ARIMA) Models. \u003cem\u003eOpen Journal of Statistics, 7\u003c/em\u003e, 560\u0026ndash;566. https://doi.org/10.4236/ojs.2017.74038\u003c/li\u003e\n \u003cli\u003eRaudys, S. J., \u0026amp; Jain, A. K. (1991). Small sample size effects in statistical pattern recognition: Recommendations for practitioners. \u003cem\u003eIEEE Transactions on Pattern Analysis and Machine Intelligence, 13\u003c/em\u003e(3), 252\u0026ndash;264. https://doi.org/10.1109/34.75512\u003c/li\u003e\n \u003cli\u003eSneath, S. (2023, September 23). \u003cem\u003eLouisiana: New Orleans declares emergency over saltwater intrusion in drinking water\u003c/em\u003e. The Guardian. https://www.theguardian.com/us-news/2023/sep/22/louisiana-drought-drinking-water-mississippi-river-saltwater-new-orleans\u003c/li\u003e\n \u003cli\u003eSuleiman, S., \u0026amp; Sani, M. (2020). Application of ARIMA and Artificial Neural Networks Models for daily cumulative confirmed Covid-19 prediction in Nigeria. \u003cem\u003eEquity Journal of Science and Technology, 7\u003c/em\u003e(2), 83\u0026ndash;90. https://www.equijost.com/fulltext/14-1594712555.pdf?1681453732\u003c/li\u003e\n \u003cli\u003eTrung, N. H., Hoanh, C. H., Tuong, T. P., Hien, X. H., Tri, L. Q., Minh, V. Q., Nhan, D. K., Vu, P. T., \u0026amp; Tri, V. P. D. (2016). \u003cem\u003eClimate Change Affecting Land Use in the Mekong Delta: Adaptation of Rice-Based Cropping Systems (CLUES) Theme 5: Integrated Adaptation Assessment of Bac Lieu Province and Development of Adaptation Master Plan\u003c/em\u003e. 
https://www.researchgate.net/publication/301612048_Climate_change_affecting_land_use_in_the_Mekong_Delta_Adaptation_of_rice-based_cropping_systems_CLUES_ISBN_978-1-925436-36-5\u003c/li\u003e\n \u003cli\u003eUnited Nations. (n.d.). \u003cem\u003eUnited Nations Convention on the Law of the Sea.\u0026nbsp;\u003c/em\u003ehttps://www.un.org/depts/los/convention_agreements/texts/unclos/unclos_e.pdf\u003c/li\u003e\n \u003cli\u003eUsoro, E. (2010). \u003cem\u003eEncyclopedia of the World\u0026rsquo;s coastal landforms, 1\u003c/em\u003e, p. 949. London.\u003c/li\u003e\n \u003cli\u003eZabbey, N., Giadom, F. D., \u0026amp; Babatunde, B. B. (2019). Nigerian coastal environments. In C. Sheppard (Ed.), \u003cem\u003eWorld Seas: An environmental evaluation\u0026nbsp;\u003c/em\u003e(pp. 835\u0026ndash;854). Elsevier. https://doi.org/10.1016/B978-0-12-805068-2.00042-5\u003c/li\u003e\n \u003cli\u003eZhao, J., Temimi, M., \u0026amp; Ghedira, H. (2017). Remotely sensed sea surface salinity in the hypersaline Arabian Gulf: Application to landsat 8 OLI data. \u003cem\u003eEstuarine, Coastal and Shelf Science, 187\u003c/em\u003e, 168\u0026ndash;177. 
https://doi.org/10.1016/j.ecss.2017.01.008\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":true,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Earth’s surface phenomenon, sea surface salinity, machine learning arima, variations modelling, time series forecasting","lastPublishedDoi":"10.21203/rs.3.rs-4056329/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4056329/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThe tropical coasts, particularly the Nigerian coastal zone, have been traditionally undersampled using appropriate in situ methods and understudied using appropriate remote sensing techniques despite the proliferation of satellite missions for earth observation. The contemporary all-weather satellite observations of phenomena of interest are characterized by relatively sparse time series data that discourage their utilization as input in building efficient machine learning (ML) models for both exploratory and predictive purposes. 
Additionally, data-poor areas usually have difficulty meeting the multiple predictor variable requirement for building appropriate multivariate ML regression models. We utilized a relatively sparse sea surface salinity (SSS) dataset from the Soil Moisture Active Passive (SMAP) mission satellite products (Jan. 2016 to Dec. 2021) for this study. We determined the accuracy and variability of the relatively sparse SSS data for the study area, which spans approximately 6.5° × 4.5°. We built ML autoregressive integrated moving average (ARIMA) models and determined and validated the best model for modelling (Jan. 2016 to Dec. 2020) and forecasting (Jan. to Dec. 2021) Earth’s surface phenomenon (ESP) using relatively sparse SSS data as a case study. We show root mean squared differences (RMSDs) of 0.1279 psu and 0.1162 psu for modelling and forecasting data accuracy, respectively. We show a standard deviation (SD) of 0.2528 for the interannual SSS variability (iSSSv). For the best ML ARIMA model, we show the modelling accuracy with an R-squared (R\u003csup\u003e2\u003c/sup\u003e) of 0.8345281 and its validation with a mean absolute percentage error (MAPE) of 0.7779%, and the forecasting accuracy with a root mean squared error (RMSE) of 0.9850 psu and its validation with a MAPE of 2.7670%. The relatively low SD value suggests a relatively stable iSSSv along the Nigerian coastal zone. The R\u003csup\u003e2\u003c/sup\u003e and MAPE results suggest relatively high modelling and prediction accuracy. 
The results imply that relatively sparse satellite time series data of at least 60 epochs (hourly, daily, weekly, monthly or yearly observations) can be utilized for building a relatively accurate ML ARIMA model for modelling and forecasting variations in any ESP in any geographical area.\u003c/p\u003e","manuscriptTitle":"Machine learning ARIMA for modelling and forecasting variations inearth’s surface phenomenon using sparse time series satellite data—acase study of sea surface salinity in the Nigerian coastal zone","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-03-11 18:32:47","doi":"10.21203/rs.3.rs-4056329/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"3d79f149-7cc2-41fd-9d89-85ff589af93b","owner":[],"postedDate":"March 11th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2024-03-11T18:32:47+00:00","versionOfRecord":[],"versionCreatedAt":"2024-03-11 18:32:47","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4056329","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4056329","identity":"rs-4056329","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
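The forecast validation reported with Table 4.4 (an RMSE and a MAPE computed from predicted and observed SSS) can be illustrated with a minimal sketch. It uses only the July to December rows of Table 4.4 that are visible in this extract, and it assumes that the first value in each row is the forecast and the second is the satellite observation, which matches the statement that every prediction exceeds its observation. The names `ROWS`, `rmse` and `mape` are hypothetical, not the authors' code. Because only six of the twelve months are used, the output is not expected to reproduce the reported full-year values of 0.9850 psu and 2.7670%.

```python
import math

# (month, forecast SSS, observed SSS) in psu, taken from the visible part of Table 4.4.
# Assumption: column order is forecast first, observation second.
ROWS = [
    ("July", 35.00569, 34.00293),
    ("August", 35.03537, 34.29564),
    ("September", 34.49434, 33.31446),
    ("October", 33.25123, 31.69396),
    ("November", 32.02951, 30.93628),
    ("December", 32.26378, 31.1894),
]


def rmse(pred, obs):
    """Root mean squared error, in the same unit as the data (psu)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def mape(pred, obs):
    """Mean absolute percentage error, in percent of the observed value."""
    return 100.0 * sum(abs(p - o) / abs(o) for p, o in zip(pred, obs)) / len(obs)


pred = [p for _, p, _ in ROWS]
obs = [o for _, _, o in ROWS]

# Positive bias in every month means the model overstates SSS over this period.
overstated = all(p > o for p, o in zip(pred, obs))

print(f"all forecasts above observations: {overstated}")
print(f"RMSE (Jul-Dec only): {rmse(pred, obs):.4f} psu")
print(f"MAPE (Jul-Dec only): {mape(pred, obs):.4f} %")
```

The MAPE is unit-free and scaled by the observed value, which is why it is easier to compare against a fixed threshold such as 10% than the RMSE, whose size depends on the magnitude of the SSS values.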
