A Framework for Forecasting Demand of General Time Series Data Using Regression Models and Machine Learning

Islam M. Hammam, Amin K. El-Kharbotly, Yomna M. Sadek

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6515650/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 04 Nov, 2025; the published version appears in Scientific Reports.
Version 1 posted 26

Abstract
Accurate demand forecasting is essential for modern business success, as product demand follows various patterns throughout its life cycle and becomes increasingly complex due to consumer fluctuations. This paper presents a statistical demand forecasting framework that integrates both classical and machine learning methods to predict demand patterns across different phases of the product life cycle, focusing on the declining phase.
Machine learning techniques are leveraged for their ability to handle complex data patterns. The framework allows each method to be applied individually or combined into an ensemble model. A grid search algorithm is utilized to optimize the weights of each forecasting technique, improving the ensemble model's performance based on the tested data. Validation across five datasets demonstrates the framework's effectiveness, with results showing that the ensemble model outperforms traditional approaches when dealing with mixed demand patterns.

Subjects: Physical sciences/Engineering/Mechanical engineering; Physical sciences/Mathematics and computing/Statistics
Keywords: Mixed Data Patterns, XGBoost Algorithm, Ensemble Model, Product Life Cycle, Operations Management

1 Introduction
Demand forecasting is the process of accurately estimating the demand for a product by considering various independent input variables and their relationship with the demand. It is the cornerstone of supply chain management and of predicting the product life cycle. Many factors that affect demand are random, uncertain, or fuzzy, and have a nonlinear relation with the demand. This makes it challenging to establish precise mathematical models [1]. For decades, time series forecasting has been studied across different fields such as statistics, econometrics, mathematics, and engineering. Despite their good forecasting results, classical statistical methods (such as the ARIMA family) are limited to linear and near-linear assumptions. Artificial intelligence can develop algorithms that improve their performance through experience. Machine learning, as a subfield of artificial intelligence, can make decisions, predictions, and forecasts based on historical data without the limitations of linear assumptions. Machine learning does not need to be programmed explicitly for a certain task [2].
Instead, it provides an effective solution where traditional approaches may fall short, allowing prediction or decision-making based solely on data-driven information.

Research on forecasting methods has utilized both regression models and machine learning algorithms, comparing their performance with common benchmark models such as the Autoregressive Integrated Moving Average (ARIMA). For instance, Villegas et al. [1] employed a Support Vector Machine (SVM) to choose the most suitable prediction model from several predictive models for scenarios that involve unstable demand over a short period, while Ji et al. [3] introduced a three-stage hybrid forecasting method based on clustering, Extreme Gradient Boosting (XGBoost), and ARIMA. It was tested on multi-featured e-commerce datasets alongside other models and showed exceptional performance compared to traditional and machine learning methodologies. Pin Li and Jin-Suo Zhang [4] developed a hybrid model that combines ARIMA with XGBoost to forecast China's energy supply security. They compared the accuracy of their ARIMA-XGBoost hybrid model against an ARIMA-only approach based on mean absolute percentage error (MAPE) results, which were lower than 4.5%. As a result, they concluded that the hybrid model was more precise and closer to actual outcomes. Yan Wang and Yuankai Guo [5] decomposed a historical stock dataset using the discrete wavelet transform (DWT) into a partial dataset and an error-related dataset, and used the grid search algorithm to optimize the XGBoost parameters and construct the GSXGB model. They then combined an ARIMA model with GSXGB in a hybrid model to forecast the different dataset portions. Among all candidate models (ARIMA, XGBoost, GSXGB, DWT-ARIMA-XGBoost, and DWT-ARIMA-GSXGB), the last one showed better accuracy and generalization ability according to the simulation results.
A data-driven analytics framework was developed by Wenhan Fu and Chen-Fu Chien [6] for predicting the demand for intermittent electronics components. To counteract discontinuous demand patterns, temporal aggregation and a combination forecast using the Syntetos-Boylan approximation, ARIMA, and a Recurrent Neural Network (RNN) were employed. The findings indicate that this integrated approach with temporal aggregation can effectively facilitate flexible decision-making to support supply chain innovation in electronics. Similarly, Ping Jiang and Ranran Li [7] proposed a composite model for forecasting electricity demand. Their modeling concept exhibited an impressive ability to detect seasonal relationships within electricity demand data as well as superior accuracy compared to benchmark models. Yanzhi Duan and Sensheng Li [8] tested the XGBoost algorithm for forecasting short-term urban daily gas demand against other machine learning models, considering some features affecting the demand. XGBoost returned excellent results against multiple regression, random forest, and Support Vector Machine (SVM). Aswanuwath, L [9] introduced a combination model to predict daily electricity peak loads. The proposal utilized variational mode decomposition (VMD) and the fast Fourier transform (FFT) for data decomposition and identification of seasonal patterns. An empirical mode decomposition algorithm was integrated within VMD to establish the optimal decomposition level required by the model. To capture the input variables that most affect forecasting accuracy, stepwise regression combined with similar-day selection was used during variable selection. This improved prediction performance while reducing computation time and the required neural network structure.

O. Ozdemir and C. Yozgatligil [10] compared the forecasting performance of various machine learning methods (Random Forest, Support Vector Regression, XGBoost, Bayesian Neural Networks (BNN), RNN, Long Short-Term Memory (LSTM), and Feedforward Neural Networks (FFNN)), traditional statistical methods (Naive and Seasonal Naive methods, S/ARIMA, Exponential Smoothing, TBATS, Bayesian Exponential Smoothing Models with Trend Modifications, and STL Decomposition), and hybrid approaches that combine statistical and machine learning methods. The authors aimed to create a comprehensive forecasting guide that considers different methods against various time domains. The findings indicate that machine learning methods generally outperform traditional statistical methods in forecasting accuracy, particularly for complex time series data. However, the study also notes that the hybrid approach is not always the best option, and that model performance can be influenced by the characteristics of the time series being analyzed. E. Çaglayan-Akay and K.H. Topal [11] evaluated the forecasting performance of various models for predicting electricity consumption in Türkiye. The research specifically aims to compare traditional single models, such as SARIMA, with hybrid models that combine linear and nonlinear forecasting techniques, including recent models developed by Wang et al. and Khashei & Bijari. The findings indicate that the Khashei & Bijari hybrid model is particularly effective in capturing the complexities of the electricity consumption series, outperforming both the Zhang model and the single models. The study concludes that traditional models may not adequately address the nonlinear characteristics of the data, highlighting the importance of using hybrid approaches for more accurate forecasting.
The previous research presents single and combined forecasting techniques that handle different demand patterns separately (trend, seasonal, linear, and nonlinear patterns). However, no attention was directed towards forecasting mixed demand patterns. In response, a demand forecasting framework is proposed that uses both regression models and machine learning models to forecast the demand of general time series data. The framework is also used to forecast demand along different phases of the product life cycle as an important application, focusing on the decline stage as the main challenge among the phases. The regression model addresses linear patterns in the data, while the machine learning model addresses nonlinear demand patterns. In the present work, the framework integrates the outcomes of the regression model and the machine learning technique through a weighted average ensemble (WAE) algorithm to benefit from the strengths of both models for mixed patterns. This framework will help forecast different phases of the product life cycle. The performance of the models is evaluated using the most popular metrics: mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and mean absolute percentage error (MAPE).

2 The Developed Forecasting Framework
The proposed framework starts from the assumption that the data to be forecast primarily follows a linear pattern. The data then undergoes tests to identify seasonal and stationary patterns. Subsequently, the framework determines the optimal regression model for generating forecasts. Once the forecasts are generated, the residuals are scrutinized to identify any patterns that the regression model may have missed. The absence of patterns validates the fundamental assumption of data linearity.
If any patterns are identified in the residuals, the framework proceeds to create forecasts using XGBoost, followed by the development of the ensemble model to assess its performance against the regression and XGBoost models. However, the framework has certain limitations. It is more suited to medium-to-large time series datasets with smooth and erratic demand patterns. It was not tested against very small datasets (fewer than 100 observations) or those characterized by intermittent and lumpy demand patterns. This is because the XGBoost model is typically more data-hungry and may not perform optimally with very small datasets [12]. The framework returns the forecast of the best model based on the error measures. The proposed framework is detailed below and is depicted in Fig. 1.

2.1 Data Preparation
The first step of the framework is data cleaning, indexing, and formatting. Subsequently, two tests are conducted to identify stationarity and seasonality in the demand pattern. This step is pivotal for making the right model selection in the subsequent step of the framework. Stationarity tests assess whether the statistical properties of a time series remain constant over time. This framework uses two of the most common methods: the Augmented Dickey-Fuller (ADF) test and rolling statistics. In the ADF test, the p-value helps determine whether the data is stationary; if the p-value is below the threshold of 0.05, the null hypothesis of non-stationarity is rejected, indicating the data is stationary. Rolling statistics involve calculating a moving average or moving variance over a defined window of time, allowing a visual assessment of stationarity by observing whether the mean and variance remain constant over time. If the two results conflict, seasonal decomposition (SD) breaks the tie. The existence of trend and/or seasonal patterns indicates data non-stationarity.
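The two stationarity checks described above can be sketched as follows. This is a minimal, dependency-free illustration rather than the authors' code: the names `is_stationary_adf` and `rolling_stats` and the window length are hypothetical, and the ADF p-value is assumed to come from a library routine (for example `adfuller` in statsmodels), which is not called here.

```python
from statistics import mean, pvariance

ALPHA = 0.05  # significance threshold quoted in the text


def is_stationary_adf(p_value, alpha=ALPHA):
    """ADF decision rule: reject the non-stationarity null when p < alpha."""
    return p_value < alpha


def rolling_stats(series, window):
    """Return (rolling means, rolling variances) over a sliding window."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    means, variances = [], []
    for start in range(len(series) - window + 1):
        chunk = series[start:start + window]
        means.append(mean(chunk))
        variances.append(pvariance(chunk))
    return means, variances


# Toy example (made-up data): a steadily rising series has a drifting
# rolling mean, which is the visual cue for non-stationarity.
trend = [float(t) for t in range(24)]
means, variances = rolling_stats(trend, window=12)
```

In practice the rolling means and variances would be plotted against time; a mean or variance that drifts across windows suggests non-stationarity.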
Using multiple tests increases robustness and reduces the risk of model misidentification. Seasonality tests identify recurring patterns or fluctuations that occur at regular intervals along the dataset. These tests use several methods: seasonal decomposition (SD), the autocorrelation function (ACF), and the partial autocorrelation function (PACF). If the analysis confirms seasonality in the data, a periodogram is used to identify the seasonal period. Afterwards, the data is divided into sets. For the regression models, the data is divided into two sets following common practice: training and testing sets of 80% and 20% respectively [13]. For the XGBoost algorithm, the data is divided into training, validation, and testing sets of 65%, 15%, and 20% respectively, to keep the same test set size while leaving enough data for training.

2.2 The Regression Models
The regression models are statistical models used for predicting future values in a time series by modeling the data's underlying patterns, including autoregression, integration (differencing), and moving averages based on past observations [14]. The ARIMA model stands out as one of the most widely used forecasting models and is considered a benchmark model in many studies [15], [16]. Based on the tests conducted during the preparation phase, the framework selects the most appropriate regression model. For stationary, non-seasonal data, both the autoregressive (AR) and autoregressive moving average (ARMA) models are utilized. For non-stationary, non-seasonal data, the autoregressive integrated moving average (ARIMA) model is chosen. For non-stationary, seasonal data, the seasonal autoregressive integrated moving average (SARIMA) model is selected. The models and their parameter selection are explained in detail by Box et al. [14].
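The data split and the model-family selection rule above can be sketched as follows. This is an illustrative sketch under stated assumptions: the split is assumed to be chronological (no shuffling), which the text does not state explicitly, and the function names `chronological_split` and `select_regression_family` are hypothetical.

```python
def chronological_split(series, fractions):
    """Split a series into consecutive blocks without shuffling.

    `fractions` gives the share of every block except the last,
    which receives whatever remains.
    """
    n = len(series)
    parts, start = [], 0
    for frac in fractions:
        end = start + int(round(n * frac))
        parts.append(series[start:end])
        start = end
    parts.append(series[start:])
    return parts


def select_regression_family(stationary, seasonal):
    """Map the two test outcomes to the model family named in the text."""
    if stationary and not seasonal:
        return "AR/ARMA"
    if not stationary and not seasonal:
        return "ARIMA"
    if not stationary and seasonal:
        return "SARIMA"
    # The text does not describe the stationary-and-seasonal case.
    raise ValueError("combination not covered by the described rules")


data = list(range(100))  # placeholder observations
train, test = chronological_split(data, [0.80])
xgb_train, xgb_val, xgb_test = chronological_split(data, [0.65, 0.15])
```

With these fractions the last 20% of the series is the test set in both splits, so the regression and XGBoost forecasts are scored on the same observations.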
For the ARIMA and SARIMA models, the model is constructed twice, as M1 and M2: once with the assistance of the stepwise function and once without it. The results of both are compared to select the model with the best performance based on the objective function of minimum AIC.

2.3 Regression Model Diagnostics
This step is crucial in the framework, as it examines the residuals of the model to confirm or reject the initial hypothesis of data linearity. If the analysis of the residuals indicates a clear pattern, this is an indicator that the data has a nonlinear pattern. Because the regression model handles only the linear and seasonal patterns of the data (if they exist), the framework then uses the XGBoost algorithm to produce new forecasts. In this context, the “plot_diagnostics” function from the “statsmodels” library in Python is used to visualize the diagnostic plots for the fitted time series model. The four key diagnostic plots that “plot_diagnostics” generates are the standardized residuals over time, the histogram plus estimated density of standardized residuals, the normal Q-Q plot, and the correlogram.

2.4 The XGBoost Forecasting Model
XGBoost is an advanced machine learning technique based on the gradient boosting framework, proposed by Chen and Guestrin [12] as an improved Gradient Boosting Decision Tree (GBDT). XGBoost is the machine learning technique selected in this study because of its high performance in forecasting nonlinear tabular datasets of various sizes and because it consistently achieves state-of-the-art results [17]. The hyperparameters of the proposed XGBoost model are optimized through a grid search method, enhancing its overall performance. The model passes through six main steps:
a. Features extraction: Creating features using lag variables is a common technique employed in time series analysis to capture the temporal dependencies between past and future observations of a time series.
The fundamental concept involves generating new features, which correspond to columns in a dataset, to represent the values of the time series at different time lags. After examining the ACF and PACF of the time series, significant lags were tested to create lagged features, also known as time lags or lag variables.
b. Splitting the data: The dataset is divided into train, validation, and test sets as explained in Section 2.1. The train set is utilized for model training and algorithm parameter tuning. Subsequently, the validation set is employed to assess the model's performance based on the selected objective function, which in this case is RMSE. It also facilitates early stopping for the XGBoost model, enabling training to halt if the performance on the validation set begins to deteriorate. This serves to prevent overfitting and saves training time by avoiding unnecessary iterations.
c. Hyperparameters selection: There are around 35 main hyperparameters for the XGBoost model. They can be categorized into three types: General Parameters, Booster Parameters, and Learning Task Parameters [18]. In this study six hyperparameters were selected, as the most important and common ones [17]. Table 1 lists them: 1. Learning rate. 2. Maximum depth of the tree. 3. Number of trees. 4. Colsample_bytree. 5. Reg_alpha. 6. Subsample.
d. Setting hyperparameters grid: This is accomplished using the “GridSearchCV” function from Python's scikit-learn library over a defined grid. A grid of values is specified for each hyperparameter to reduce the tuning time. Table 1 shows the selected grid for each hyperparameter and the value selected for the dataset. It also shows the effect of the selected hyperparameter values on the model.
e. Hyperparameters tuning: Using the defined hyperparameters and grids, the cross validation in this proposed framework is done through 3 folds.
The optimal hyperparameters, shown in Table 1, are selected based on their performance, with the minimum MSE as the evaluation metric.
f. Feature importance: This method is used to ascertain which features in a dataset have the most significant impact on the target variable in a predictive modeling scenario. It aids in understanding the relationship between features and the target variable. The F-score (or frequency) is a straightforward measure used in this study for the importance of the features; it shows how often a feature contributes to partitioning the data.
Once the hyperparameters of the model are set, the forecast is obtained.

Table 1 XGBoost hyperparameters

| Hyperparameter | Grid | Value | Comment |
|---|---|---|---|
| Learning rate | [0.01, 0.1] | 0.1 | Moderate value, gives relatively gradual learning, prevents overfitting |
| Maximum depth of the tree | [3, 4, 5] | 5 | Moderate depth gives complex patterns without overfitting |
| Number of trees | [100, 200, 300] | 200 | Moderate boosting rounds with low learning rate for better learning |
| Colsample_bytree | [0.8, 1] | 0.8 | Moderate value, increases robustness, reduces risk of overfitting |
| Reg_alpha | [0.001, 0.01] | 0.001 | Slight regularization, promotes sparsity and prevents overfitting |
| Subsample | [0.6, 0.8] | 0.6 | Low to moderate value, increases robustness and randomness |

2.5 The Weighted Average Ensemble (WAE) Model
Besides the results of the regression model and the XGBoost algorithm, the WAE is used to give forecasts that combine both results. According to Brownlee [18], “The weighted average or weighted sum ensemble is an extension over voting ensembles”. Each model is assigned a weight determined through a grid search process, with weights ranging from 0 to 1 in increments of 0.1. The framework systematically explores the weight combinations to find the one that minimizes the root mean square error (RMSE).

2.6 Models Evaluation
To measure the performance of the models used in this study, the test set of each dataset is compared to its forecast.
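The weight search of Section 2.5 and the error metrics used for evaluation can be sketched together as below. This is a hedged sketch, not the authors' implementation: it assumes the two weights sum to 1 (consistent with the weight pairs reported later, such as 0.3 and 0.7), it reports MAPE in percent, and the function names and toy numbers are hypothetical.

```python
from math import sqrt


def mse(actual, forecast):
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    return sqrt(mse(actual, forecast))


def mae(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error in percent; assumes no zero actuals."""
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def wae_grid_search(actual, reg_forecast, xgb_forecast, steps=10):
    """Try regression weights 0.0, 0.1, ..., 1.0 and keep the lowest-RMSE blend.

    Assumption: the XGBoost weight is 1 minus the regression weight.
    """
    best = None
    for i in range(steps + 1):
        w_reg = i / steps
        w_xgb = 1.0 - w_reg
        blend = [w_reg * r + w_xgb * x
                 for r, x in zip(reg_forecast, xgb_forecast)]
        score = rmse(actual, blend)
        if best is None or score < best[0]:
            best = (score, w_reg, w_xgb, blend)
    return best


# Toy data (made up): one model over-forecasts, the other under-forecasts.
actual = [10.0, 12.0, 14.0, 16.0]
reg_forecast = [12.0, 14.0, 16.0, 18.0]
xgb_forecast = [9.0, 11.0, 13.0, 15.0]
score, w_reg, w_xgb, blend = wae_grid_search(actual, reg_forecast, xgb_forecast)
```

Because the candidate weights include 0 and 1, the blend selected on the data used for the search can never have a higher RMSE than either individual model on that same data.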
The framework uses the most popular error metrics: MSE, RMSE, MAE, and MAPE. The best candidate model is the one that results in the minimum error.

3 Computational Results and Discussion
The proposed framework was applied to forecast demand for five datasets obtained from [19], [20]. The datasets were selected to represent the different demand patterns for the different stages of the product life cycle (PLC) curve. The conventional curve comprises four primary stages: introduction, growth, maturity, and decline. The focal points of this research are the challenging maturity and decline stages, characterized by shifts in the linear trend.

3.1 Data Sets Under Study
Five datasets (DS1, DS2, DS3, DS4, and DS5) were selected for this study to represent three main demand patterns: linear, nonlinear, and mixed (the pattern varies between linear and nonlinear), as shown in Fig. 2. These datasets simulate three specific phases of the product life cycle:
Phase1 – Initial Demand: represents the introduction and the growth stages.
Phase2 – Saturated Demand: represents the maturity phase.
Phase3 – Diminishing Demand: represents the declining phase.
DS1 represents the entire PLC curve, with the test set being part of the diminishing demand phase; the whole phase starts from the star mark shown in Fig. 2.a. DS2 is a segment of the maturity phase, with a test set similar to the DS1 case. DS3 captures the transition from the growth to the maturity stage, with the test set located in the saturated demand phase. DS4 exclusively covers the maturity phase, while DS5 focuses on the introduction and growth stages, with the test set in the initial demand phase. Figure 2 shows the five datasets, each divided into a train set and a test set of 80% and 20% respectively. Next, the stationarity test was conducted, revealing that four sets (DS1, DS2, DS3, and DS5) are non-stationary, whereas DS4 is stationary.
Next, the seasonality test was conducted on the non-stationary datasets, revealing that DS1, DS2, and DS3 have seasonal patterns. According to the proposed framework, the first three sets are forecast using SARIMA, whereas DS4 is forecast using AR and ARMA. For DS5, the ARIMA model is selected. These results are tabulated in Table 2.

Table 2 Stationarity and seasonality tests results with regression model selection

| Data set | P-Value | Stationary? | Seasonal? | Seasonal Period | Model used |
|---|---|---|---|---|---|
| DS1 | 0.4886 | No | Yes | 12 | SARIMA |
| DS2 | 0.9986 | No | Yes | 12 | SARIMA |
| DS3 | 0.4103 | No | Yes | 12 | SARIMA |
| DS4 | 3.4570e-05 | Yes | No | - | AR & ARMA |
| DS5 | 0.1142 | No | No | - | ARIMA |

3.2 Framework Results for DS1
The results of applying the proposed framework to DS1 are presented and discussed in full detail. DS1 represents Phase3 in the product life cycle curve, which is the declining phase.

3.2.1 The Regression Model
According to the proposed framework, there is an initial hypothesis of data linearity. When applying the regression models to DS1, the auto_arima function from the ‘pmdarima’ library is employed to automatically find the best combination of regression model hyperparameters based on the objective function of minimum AIC. This results in two sets of combinations: Model One (M1), using the stepwise function, and Model Two (M2), without the stepwise function. The selected models and their respective hyperparameters, along with the corresponding AIC and BIC values, are displayed in Table 3. Figure 3.a shows the forecasts for DS1 using the regression model. The figure shows only the test portion of the dataset, with the actual data (in blue) and the forecast (in green). The curves obtained from this step are used to assess the performance of the forecasting models on the datasets.
Table 3 Results of regression models hyperparameters for DS1

| Selected Model | M1 Hyperparameters | AIC | BIC | M2 Hyperparameters | AIC | BIC |
|---|---|---|---|---|---|---|
| SARIMA | (0,1,2) (1,0,1) [12] | 4245.2 | 4262.9 | (2,1,0) (1,0,1) [12] intercept | 4270.7 | 4292.0 |

3.2.2 Diagnostics of Regression Model Results
The results obtained by the regression model are now tested to accept or reject the initial assumption of data linearity using the "plot_diagnostics" function. Figure 4 illustrates the outcomes of this step for DS1, featuring four distinct graphs:
Standardized residuals over time: the plot shows that the model could not handle the patterns in the data.
Histogram plus estimated density of standardized residuals, with a Normal (0,1) density plotted for reference: because the two curves are not identical, there is a pattern in the residuals.
Normal Q-Q plot, with Normal reference line: all the blue dots should fall in line with the red line if there is no pattern in the residuals. The graph suggests a skewed distribution.
Correlogram (i.e., ACF plot): displays a correlation at lag 5. This implies that there is some pattern in the residual errors that is not explained by the regression model.
From the interpretation of the graphs, the residual errors have a pattern, which indicates nonlinearity in the dataset. Hence, the XGBoost model is used to handle the nonlinear data.

3.2.3 The XGBoost Forecasting Model
In this study, the F-score of each lag was observed, as depicted in Fig. 5. The graph considers both the frequency of a feature's appearance in tree nodes and the average gain (the improvement in model performance) contributed by splits involving that feature. Lag 5 is the most important feature, while other lags, such as lag 1, also exhibit a high F-score value, indicating a strong time dependency within the data. The results obtained from the XGBoost model following the previously conducted hyperparameter tuning step are displayed in Fig. 3.b.
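The lagged features whose importance is discussed above can be built with a short helper such as the one below. This is a minimal sketch with made-up numbers: the function name `make_lag_features` is hypothetical, and the choice of lags 1 and 5 simply mirrors the lags mentioned in the text rather than the full feature set used in the study.

```python
def make_lag_features(series, lags):
    """Build (X, y) where each row of X holds the series values at the given lags.

    Rows start at index max(lags) so that every requested lag is available.
    """
    if not lags or min(lags) < 1:
        raise ValueError("lags must be positive integers")
    max_lag = max(lags)
    X, y = [], []
    for t in range(max_lag, len(series)):
        X.append([series[t - lag] for lag in lags])
        y.append(series[t])
    return X, y


# Toy demand series (made up) with lag-1 and lag-5 features.
demand = [10, 12, 13, 15, 18, 21, 25, 30]
X, y = make_lag_features(demand, lags=[1, 5])
```

Each row of `X` can then be passed to a tabular learner such as XGBoost, with `y` as the target; the first `max(lags)` observations are dropped because their lagged values do not exist.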
The curve illustrates the proposed XGBoost model's capability to handle declining data and overcome the level change observed in the test set.

3.2.4 The WAE Model
Although the visual comparison indicates the superiority of the XGBoost model over the SARIMA model for DS1, the framework proceeds with its subsequent steps to generate forecasts using the WAE model. As identified earlier during the model diagnostic stage, the analysis revealed the presence of mixed patterns in the data for DS1, encompassing both linearity and nonlinearity. The WAE model is specifically designed to enhance performance when dealing with such diverse data patterns. The WAE model assigned weights of 0.3 and 0.7 to the SARIMA and XGBoost models respectively. Forecast results are shown in Fig. 3.c (in black). The figure shows a competitive result achieved by the WAE model. Notably, the weight of XGBoost is larger than that of the SARIMA model, influenced by the level change in the test set.

3.2.5 Models Evaluation
Performance measures for the three models are calculated and summarized in Table 4. A comparison of these models reveals that the developed WAE model, with a weight distribution of 0.3 for SARIMA and 0.7 for XGBoost, achieves the lowest errors in forecasting the decline phase of the product life cycle. It is followed by the XGBoost model, with SARIMA trailing as the least performing model.

Table 4 Models evaluation results for DS1

| DS1 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|
| SARIMA | 6,980,822 | 2,642 | 2,019 | 0.14% |
| XGBoost | 3,307,421 | 1,818 | 1,400 | 0.08% |
| WAE | 2,721,807 | 1,649 | 1,325 | 0.08% |

3.3 Framework Results for DS2
DS2 resembles the early stage of Phase3, the start of the declining stage in the product life cycle. As shown in Table 2, DS2 has a seasonal pattern, so the SARIMA model is used for forecasting. Figure 6a shows the forecast (in green) versus the test portion of the dataset (in red).
The diagnostic stage reveals a pattern in the residuals, prompting the framework to proceed with generating forecasts using the XGBoost model, shown in Fig. 6b. The results of both models are close. Based on the grid search for the weights of the WAE model, equal weights were given to both models, and the forecast is shown in Fig. 6c (in black). Table 5 further demonstrates that the WAE model exhibits the best MSE and RMSE, with the SARIMA model closely trailing due to the limited impact of nonlinearity in this case.

Table 5 Models evaluation results for DS2

| DS2 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|
| Regressive | 32,665,469,191,440 | 5,715,371 | 4,762,220 | 0.32% |
| XGBoost | 32,705,637,806,908 | 5,718,884 | 4,945,515 | 0.33% |
| WAE | 32,241,403,606,754 | 5,678,151 | 4,825,182 | 0.32% |

3.4 Framework Results for DS3
DS3 represents Phase1, wherein the curve progresses from the introduction stage through growth and ultimately reaches maturity, with the target of the forecast being the maturity phase. Applying the SARIMA model resulted in the forecasts shown in Fig. 7a (in green). The residual analysis reveals no clear pattern in the data, confirming the linearity assumption. While the framework stops at this stage, we proceed with the following stages of the framework for validation purposes. Although the framework would not normally produce forecasts with XGBoost (in red) or the WAE model (in black) in this case, the results are given in Fig. 7. The grid search process for the WAE model determines the optimal weights as 100% for SARIMA and 0% for XGBoost. This matches our hypothesis that datasets with a linear pattern are best handled by the regression models, and that the machine learning model is not promising for linear data. The values in Table 6 confirm that SARIMA is the best candidate for DS3.

Table 6 Models evaluation results for DS3

| DS3 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|
| Regressive | 8.23 | 2.87 | 2.28 | 0.02% |
| XGBoost | 173.97 | 13.19 | 10.23 | 0.09% |
| WAE | 8.23 | 2.87 | 2.28 | 0.02% |

3.5 Framework Results for DS4 and DS5
DS4 and DS5 are two cases of erratic demand patterns. DS4 represents another instance of Phase2 that exclusively portrays the maturity stage, while DS5 represents Phase1, or the end of the introduction stage where the curve starts to take off. DS5 presents an additional challenge due to a significant shift in demand levels toward the end of the period under study. After obtaining results from the regression models, displayed in Fig. 8.a for DS4 and Fig. 9.a for DS5 (both in green), residual analysis reveals the presence of nonlinearity in the data, suggesting the need to proceed further in the framework stages. The XGBoost forecast demonstrates the notable ability of the model to predict such data, as shown in Fig. 8.b and Fig. 9.b (in red). The WAE model, in those cases, is constructed with a weight of 0 for the regression models and 1 for the XGBoost model. As expected, the model evaluation results presented in Tables 7 and 8 indicate that XGBoost exhibits the best performance in both cases.

Table 7 Models evaluation results for DS4

| DS4 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|
| Regressive | 122,274 | 349.6 | 302.06 | 1.18% |
| XGBoost | 3,549 | 59.57 | 44.39 | 0.13% |
| WAE | 3,549 | 59.57 | 44.39 | 0.13% |

Table 8 Models evaluation results for DS5

| DS5 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|
| Regressive | 3510.13 | 59.25 | 55.81 | 0.63% |
| XGBoost | 54.57 | 7.39 | 5.65 | 0.06% |
| WAE | 54.57 | 7.39 | 5.65 | 0.06% |

4 Conclusion and Future Work
In this research, a framework is proposed for forecasting general time series data, employing an ensemble model that combines two methods: regression methods and machine learning algorithms. The output of the regression model selected from the ARIMA family and that of XGBoost, as the machine learning technique, are combined to generate a new forecast.
Initially, the framework assumes data linearity, and the appropriate ARIMA-family model is chosen for forecasting. The results are then evaluated using residual diagnostic tools to assess the validity of the linearity assumption. If the analysis reveals any pattern in the residuals, XGBoost and the WAE model, which searches over weighting schemes to average the results of the regression model and the XGBoost model, are employed for forecasting. The models are then compared using several error metrics. The results demonstrate the framework's ability to provide accurate forecasts for linear, nonlinear, and mixed patterns. Notably, the WAE model improves on the output of the individual models where the patterns are mixed. This is because the weight search enables the WAE model to prioritize and leverage the most effective model. As the impact of each pattern may vary depending on the data or on the way the data is split, the WAE model conducts a grid search to find the combination of weights that minimizes the RMSE. This is supported by the results obtained from DS1 and DS2, which represent mixed-pattern cases: each received a different weight for its regression model depending on the presence of linear and seasonal patterns. For DS3, residual analysis indicates no clear pattern, and the SARIMA model makes accurate predictions, confirming the initial hypothesis of data linearity. Conversely, for DS4 and DS5, the ARIMA family fails to predict the existing pattern. XGBoost and the WAE model perform well due to the nonlinearity of the data, with a weight of 0 assigned to the ARIMA model and 1 to the XGBoost model for minimum RMSE. The tested data sets also show the effectiveness of the framework in forecasting different stages of the product life cycle (PLC) curve, with better performance than the candidate models. This demonstrates that the WAE model, which combines the strengths of ARIMA and XGBoost, offers several advantages for forecasting.
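The weight search summarized above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `rmse` and `search_wae_weight`, the grid resolution `n_steps`, the assumption that the two weights sum to one, and the toy numbers are all hypothetical choices made for the example.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def search_wae_weight(y_true, pred_reg, pred_xgb, n_steps=21):
    """Grid-search the regression weight w in [0, 1] that minimizes RMSE.

    The ensemble forecast is w * pred_reg + (1 - w) * pred_xgb.
    n_steps=21 gives a grid spacing of 0.05 (an illustrative choice).
    """
    pred_reg = np.asarray(pred_reg, dtype=float)
    pred_xgb = np.asarray(pred_xgb, dtype=float)
    best_w, best_rmse = None, np.inf
    for w in np.linspace(0.0, 1.0, n_steps):
        score = rmse(y_true, w * pred_reg + (1.0 - w) * pred_xgb)
        if score < best_rmse:  # keep the first weight reaching the minimum
            best_w, best_rmse = float(w), score
    return best_w, best_rmse


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the paper.
    actual = [10.0, 12.0, 14.0, 16.0]
    reg = [10.0, 12.0, 14.0, 16.0]  # regression forecast matches exactly
    xgb = [11.0, 11.0, 15.0, 15.0]  # machine learning forecast is off by 1
    w, err = search_wae_weight(actual, reg, xgb)
    print(f"best regression weight: {w:.2f}, RMSE: {err:.4f}")
```

In practice the weight would presumably be selected on held-out data rather than on the data used to fit either model, so that the reported test errors remain comparable across models.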
ARIMA models excel at capturing autocorrelation and seasonality in time series data, while XGBoost is a potent machine learning algorithm capable of capturing complex nonlinear relationships and interactions between variables. By combining the two models, we can potentially achieve improved accuracy, reduced bias, greater adaptability, and reduced overfitting. Moreover, the developed framework serves as a valuable tool for forecasting the various stages of the product life cycle curve, an essential industrial concern and an example of a mixed pattern. Using the tested data sets, we simulate different stages and scenarios of the product life cycle to assess the framework's effectiveness at each stage. This provides a strong tool for planning product expansion strategies. The framework is also well suited to industrial applications with continuous demand patterns and large data sets, where it can produce accurate forecasts. This includes industries such as home appliances, automobile manufacturing, pharmaceuticals, and the food industry. While the framework has been tested with data of varying shapes and patterns, further data sets could be examined to validate its ability to provide accurate forecasts for a wider range of time series. Although XGBoost demonstrated effective forecasting results for the nonlinear patterns in this study, other machine learning algorithms could be considered to compare performance and select the most suitable method.

Declarations

Data Availability
The data sets used and analyzed during the study are publicly available and can also be shared upon request.

Author Contribution
Islam wrote the main manuscript. Yomna and Amin provided technical guidance and supervision. All authors reviewed the manuscript.

Competing Interests
No conflict of interest.

References

[1] M. A. Villegas, D. J. Pedregal, and J. R. Trapero, “A support vector machine for model selection in demand forecasting applications,” Comput Ind Eng, vol. 121, pp. 1–7, Jul. 2018, doi: 10.1016/j.cie.2018.04.042.
[2] M. Bertolini, D. Mezzogori, M. Neroni, and F. Zammori, “Machine Learning for industrial applications: A comprehensive literature review,” Aug. 01, 2021, Elsevier Ltd. doi: 10.1016/j.eswa.2021.114820.
[3] S. Ji, X. Wang, W. Zhao, and D. Guo, “An application of a three-stage XGboost-based model to sales forecasting of a cross-border e-commerce enterprise,” Math Probl Eng, vol. 2019, 2019, doi: 10.1155/2019/8503252.
[4] P. Li and J. S. Zhang, “A new hybrid method for china’s energy supply security forecasting based on ARIMA and xgboost,” Energies (Basel), vol. 11, no. 7, 2018, doi: 10.3390/en11071687.
[5] Wang Yan and Guo Yuankai, “Forecasting method of stock market volatility in time series data based on mixed model of ARIMA and XGBoost,” Oct. 2019.
[6] W. Fu and C. F. Chien, “UNISON data-driven intermittent demand forecast framework to empower supply chain resilience and an empirical study in electronics distribution,” Comput Ind Eng, vol. 135, pp. 940–949, Sep. 2019, doi: 10.1016/j.cie.2019.07.002.
[7] P. Jiang, R. Li, N. Liu, and Y. Gao, “A novel composite electricity demand forecasting framework by data processing and optimized support vector machine,” Appl Energy, vol. 260, Feb. 2020, doi: 10.1016/j.apenergy.2019.114243.
[8] Y. Duan, S. Li, S. Chen, Q. Tan, C. Chen, and M. Wang, “Forecasting the short-term urban gas daily demand in winter based on the XGBoost algorithm,” in IOP Conference Series: Earth and Environmental Science, IOP Publishing Ltd, Mar. 2021. doi: 10.1088/1755-1315/675/1/012150.
[9] L. Aswanuwath, W. Pannakkong, J. Buddhakulsomsiri, J. Karnjana, and V. N. Huynh, “A Hybrid Model of VMD-EMD-FFT, Similar Days Selection Method, Stepwise Regression, and Artificial Neural Network for Daily Electricity Peak Load Forecasting,” Energies (Basel), vol. 16, no. 4, Feb. 2023, doi: 10.3390/en16041860.
[10] O. Ozdemir and C. Yozgatligil, “Forecasting performance of machine learning, time series, and hybrid methods for low- and high-frequency time series,” Stat Neerl, vol. 78, no. 2, pp. 441–474, May 2024, doi: 10.1111/stan.12326.
[11] E. Çağlayan-Akay and K. H. Topal, “Forecasting Turkish electricity consumption: A critical analysis of single and hybrid models,” Energy, vol. 305, Oct. 2024, doi: 10.1016/j.energy.2024.132115.
[12] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, Aug. 2016, pp. 785–794. doi: 10.1145/2939672.2939785.
[13] Jason Brownlee, “Train-Test Split for Evaluating Machine Learning Algorithms - MachineLearningMastery.com.” Accessed: Jul. 21, 2024. [Online]. Available: https://machinelearningmastery.com/train-test-split-for-evaluating-machine-learning-algorithms/
[14] G. E. P. Box, G. M. Jenkins, and G. C. Reinsel, Time series analysis: forecasting and control. John Wiley, 2008.
[15] Ö. G. Ali, S. Sayin, T. van Woensel, and J. Fransoo, “SKU demand forecasting in the presence of promotions,” Expert Syst Appl, vol. 36, no. 10, pp. 12340–12348, Dec. 2009, doi: 10.1016/j.eswa.2009.04.052.
[16] L. Menculini et al., “Comparing Prophet and Deep Learning to ARIMA in Forecasting Wholesale Food Prices,” Forecasting, vol. 3, no. 3, pp. 644–662, Sep. 2021, doi: 10.3390/forecast3030040.
[17] K. Y. Liu, “SUPPLY CHAIN ANALYTICS Concepts, Techniques and Applications.”
[18] Jason Brownlee, “Weighted Average Ensemble for Deep Learning Neural Networks.” Accessed: Mar. 15, 2024. [Online]. Available: https://machinelearningmastery.com/weighted-average-ensemble-with-python/
[19] “GitHub - awesomedata/awesome-public-datasets: A topic-centric list of HQ open datasets.” Accessed: Jul. 29, 2024. [Online]. Available: https://github.com/awesomedata/awesome-public-datasets
[20] “Find Open Datasets and Machine Learning Projects | Kaggle.” Accessed: Jul. 30, 2024. [Online]. Available: https://www.kaggle.com/datasets

Additional Declarations
No competing interests reported.

Editorial history at the journal: first submitted 23 Apr, 2025; submission checks completed 25 Apr, 2025; editor assigned 28 Apr, 2025; editor invited 02 May, 2025; reviewers invited 10 Jun, 2025; reviews received 10–16 Jun, 2025; editorial decision (revision requested) 17 Jun, 2025.
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-6515650","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":469738562,"identity":"0e2ecc64-8fe9-4d51-9a57-73d9c0dce2f1","order_by":0,"name":"Islam M. Hammam","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABKElEQVRIiWNgGAWjYBACNgbmBhjb8AGyOB4tjHAtxgZEaWFA0mImQZQWPv6DjZ9uMGyTl3dv3lbxcUctA3//GQOGD2WHGfj5F2B3mERis3QOw23DjWeOld2ceeY4g8SNHAPGGecOM0jOeIBDC2MDSAvjxhk5Zrd5244xMNzgMWDmbTvMYHDjAHYt/AebfwO12G+c/8as+C9Qi/z5MwbMf4Fa7HFpYUhsA9mSOF+Cx4yZsa2GweBAjgGQAbSFvwG7FonENuscg9vJG3jSiiV72w7wGN5IKzjYcy6dR+IG9hCT7z98+HZOxW3b+e2HN3742VYnJ3f+8MYHP8qs5fj7sTsMAoBxaACRP8wDIkFsHgaJBDxaQNZBHF6HJMSPz5ZRMApGwSgYQQAAQllhpGENNtsAAAAASUVORK5CYII=","orcid":"","institution":"Ain Shams University","correspondingAuthor":true,"prefix":"","firstName":"Islam","middleName":"M.","lastName":"Hammam","suffix":""},{"id":469738563,"identity":"fb79243c-ea6c-41c3-a895-ada5b8a57d7c","order_by":1,"name":"Amin K. El-Kharbotly","email":"","orcid":"","institution":"Ain Shams University","correspondingAuthor":false,"prefix":"","firstName":"Amin","middleName":"K.","lastName":"El-Kharbotly","suffix":""},{"id":469738564,"identity":"a9801e68-e894-48f1-9027-40b2cf55336c","order_by":2,"name":"Yomna M. 
Sadek","email":"","orcid":"","institution":"Ain Shams University","correspondingAuthor":false,"prefix":"","firstName":"Yomna","middleName":"M.","lastName":"Sadek","suffix":""}],"badges":[],"createdAt":"2025-04-23 22:08:18","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6515650/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6515650/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41598-025-23352-w","type":"published","date":"2025-11-04T15:57:43+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":84503749,"identity":"59dd73aa-ad9b-4f35-93dc-81625af08420","added_by":"auto","created_at":"2025-06-12 17:50:50","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":454851,"visible":true,"origin":"","legend":"\u003cp\u003eThe Developed Framework\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-6515650/v1/df7ab6a4736bf25eb2bc950c.png"},{"id":84504596,"identity":"20df7091-9bce-4d4a-9fd3-59363ba4da23","added_by":"auto","created_at":"2025-06-12 18:06:50","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":899597,"visible":true,"origin":"","legend":"\u003cp\u003eThe five data sets under study are divided into train and test sets\u003c/p\u003e","description":"","filename":"22.png","url":"https://assets-eu.researchsquare.com/files/rs-6515650/v1/630c5872f63a030155bbc60f.png"},{"id":84504285,"identity":"8994218f-66a6-42bf-8d56-f9736e3b8b40","added_by":"auto","created_at":"2025-06-12 17:58:50","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":485664,"visible":true,"origin":"","legend":"\u003cp\u003eForecast results for 
DS1\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-6515650/v1/76627a0860cdb068d0254ca7.png"},{"id":84503752,"identity":"188ccfc8-f487-4725-be79-ea5a9e6f4f5d","added_by":"auto","created_at":"2025-06-12 17:50:50","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":361168,"visible":true,"origin":"","legend":"\u003cp\u003eResidual analysis using plot_diagnostics for DS 1\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-6515650/v1/21491309f0ef7ae650b91bc0.png"},{"id":84503757,"identity":"cc636161-da8b-40fe-aec1-b173f5e0e3ff","added_by":"auto","created_at":"2025-06-12 17:50:50","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":72246,"visible":true,"origin":"","legend":"\u003cp\u003eFeature Importance for DS1\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-6515650/v1/3dcbefa7d5d9b6dbb55de629.png"},{"id":84503762,"identity":"d9c99d87-346f-4dd3-b89f-d90e4aa1c94f","added_by":"auto","created_at":"2025-06-12 17:50:50","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":356728,"visible":true,"origin":"","legend":"\u003cp\u003eForecast results for DS2\u003c/p\u003e","description":"","filename":"6.png","url":"https://assets-eu.researchsquare.com/files/rs-6515650/v1/82616ef212d60bb550a2b747.png"},{"id":84503759,"identity":"3c873e92-816c-49f5-9083-55388736963d","added_by":"auto","created_at":"2025-06-12 17:50:50","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":788801,"visible":true,"origin":"","legend":"\u003cp\u003eForecast results 
forDS3\u003c/p\u003e","description":"","filename":"7.png","url":"https://assets-eu.researchsquare.com/files/rs-6515650/v1/43e088efd1fbdfeb45a74d21.png"},{"id":84503754,"identity":"b0007e38-27fe-40f6-9c50-d87e7da429f2","added_by":"auto","created_at":"2025-06-12 17:50:50","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":380383,"visible":true,"origin":"","legend":"\u003cp\u003eForecast results for DS4\u003c/p\u003e","description":"","filename":"8.png","url":"https://assets-eu.researchsquare.com/files/rs-6515650/v1/93c179b27b72fde29878c18a.png"},{"id":84503763,"identity":"ae6acecf-f0d5-4d1b-8efa-1fd4d9c20fc7","added_by":"auto","created_at":"2025-06-12 17:50:50","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":352320,"visible":true,"origin":"","legend":"\u003cp\u003eForecast results for DS5\u003c/p\u003e","description":"","filename":"9.png","url":"https://assets-eu.researchsquare.com/files/rs-6515650/v1/31737183b6d2891ba7c4450e.png"},{"id":95564150,"identity":"c0dfa4ad-4279-42f4-97ba-629dc07645c1","added_by":"auto","created_at":"2025-11-10 16:08:25","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":5248573,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6515650/v1/bee259e6-5fc9-487b-a5bd-34bd6db9e6af.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"A Framework for Forecasting Demand of General Time Series Data Using Regression Models and Machine Learning","fulltext":[{"header":"1 Introduction","content":"\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eDemand forecasting is the process of accurately estimating the demand for a product by considering various independent input variables and their relationship with the demand. 
It is the cornerstone of supply chain management and prediction of product life cycle. Many factors that affect the demand are random, uncertain, fuzzy, and have a nonlinear relation with the demand. This makes it challenging to establish precise mathematical models [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. For decades, time series forecasting has been studied across different fields such as statistics, econometrics, mathematics, engineering, \u0026hellip;etc. Despite their good results in forecasting, novel statistical methods (like ARIMA family) are limited to the scope of linear and near linear assumptions. Artificial intelligence can develop algorithms that can improve performance by experience. Machine learning as a subfield of artificial intelligence can make decisions, predictions and forecasting based on historical data without the limitations of linear assumptions. Machine learning does not need to be programmed explicitly for a certain task [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. Instead, it provides an effective solution where traditional approaches may fall short, allowing prediction or decision-making based solely on data-driven information.\u003c/p\u003e \u003cp\u003eResearch on forecasting methods has been conducted to utilize both regression models and machine learning algorithms, comparing their performance with common benchmark models such as Autoregression Integrated Moving Average (ARIMA). For instance, Villegas et al. [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e] employed Support Vector Machine (SVM) to choose the most suitable prediction model from several predictive models for scenarios that involve unstable demand in a short period, while Ji et al. 
[\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e] introduced a three-stage hybrid forecasting method based on Clustering, Extreme gradient Boosting (XGBoost), and ARIMA which was tested against multi-featured e-commerce datasets along with other models showing exceptional performance compared to traditional and machine learning methodologies. Pin Li and Jin-Suo Zhang [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e] developed a hybrid model that combines ARIMA with XGBoost to forecast China's energy supply security. They compared the accuracy of their ARIMA-XGBoost hybrid model against an ARIMA-only approach based on mean absolute percentage error (MAPE) results, which were lower than 4.5%. As a result, they concluded that the hybrid model was more precise and closer to actual outcomes. Yan Wang and Yuankai Guo [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e] decomposed the stock historical data set using discrete wavelet transform (DWT) into - a partial data set and an error-related dataset \u0026ndash; with the used of the grid search algorithm to optimize the XGBoost parameters and construct the GSXGB model. Combining an ARIMA model with GSXGB in a hybrid model to forecast different dataset portions. Among all candidate models of ARIMA, XGBoost, GSXGB, DWT-ARIMA-XGBoost and DWT-ARIMA-GSXGB, the last one showed better accuracy and generalization ability according to the simulation results.\u003c/p\u003e \u003cp\u003eA data-driven analytics framework was developed by Wenhan Fu and Chen-Fu Chien [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e] for predicting the demands of intermittent electronics components. To counteract discontinuous demand patterns, temporal aggregation and a combination forecast using Syntetos-Boylan approximation, ARIMA, and Recurrent Neural Network (RNN) were employed. 
The findings indicate that this integrated approach with temporal aggregation can effectively facilitate flexible decision-making to support supply chain innovation in electronics. Similarly, Ping Jiang and Ranran Li [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e] proposed a composite model for forecasting electricity demand. Their modeling concept exhibited an impressive ability to detect seasonal relationships within electricity demand data as well as superior performance accuracy compared to benchmark models. Yanzhi Duan and Sensheng Li [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e] tested the result of XGBoost algorithm for forecasting short term urban gas daily demand against other machine learning models considering some features affecting the demand. The XGBoost returned excellent results against multi regression, random forest, and Support Vector Machine (SVM). Aswanuwath, L [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e] introduced a combination model to predict daily electricity peak loads. The proposal utilized variational mode decomposition (VMD) and fast Fourier transform (FFT). These methods were employed for data decomposition and identification of seasonal patterns. An empirical mode decomposition algorithm was integrated within VMD, serving to establish the optimal level of disintegration required by the model. In order to capture important input variables that would impact forecasting accuracy, stepwise regression combined with similar-day selection methodologies were used during variable selection. This resulted in improved prediction performance while reducing computation time along with minimizing neural network structure requirements as well.\u003c/p\u003e \u003cp\u003eO. Ozdemir and C. 
Yozgatligil[\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e] compared the forecasting performance of various machine learning methods - Random Forest, Support Vector Regression, XGBoost, Bayesian Neural Networks (BNN), RNN, Long Short-Term Memory (LSTM), and Feedforward Neural Networks (FFNN) - and traditional statistical - Naive and Seasonal Naive Methods, S/ARIMA, Exponential Smoothing, TBATS, Bayesian Exponential Smoothing Models with Trend Modifications, and STL Decomposition \u0026ndash; and hybrid approaches that is a combinations of statistical and machine learning methods. The authors aim to create a comprehensive forecasting guide that considers different methods against various time domains. The findings indicate that machine learning methods generally outperform traditional statistical methods in forecasting accuracy, particularly for complex time series data. However, it also notes that the hybrid approach is not always the best option, and that the performance of models can be influenced by the characteristics of the time series being analyzed. E. \u0026Ccedil;aglayan-Akay and K.H. Topal[\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e] evaluated the forecasting performance of various models for predicting electricity consumption in T\u0026uuml;rkiye The research specifically aims to compare traditional single models, such as SARIMA, with hybrid models that combine linear and nonlinear forecasting techniques, including recent models developed by Wang et al. and Khashei \u0026amp; Bijari. The findings indicate that the Khashei \u0026amp; Bijari hybrid model is particularly effective in capturing the complexities of the electricity consumption series, outperforming both the Zhang model and the single models. 
The study concludes that traditional models may not adequately address the nonlinear characteristics of the data, highlighting the importance of using hybrid approaches for more accurate forecasting.\u003c/p\u003e \u003cp\u003eIt is evident that the previous research exhibits single and combined forecasting techniques that handle different demand patterns separately (trend, seasonal, linear, and nonlinear patterns). However, no attention was directed towards forecasting mixed demand patterns. In response to this, a demand forecasting framework has been proposed that uses both regression models and machine learning models to forecast the demand of general time series data. The framework is also utilized to forecast demand along different phases of the product life cycle as an important application, focusing the stage of decline as the main challenge among the other phases. The regression model addresses linear patterns in the data, while the machine learning model addresses non-linear demand pattern. In the present work, a framework integrates the outcomes of the regression model and the machine learning techniques through a weighted average ensemble (WAE) algorithm to benefit from the strengths of both models for mixed patterns. This framework will help forecast different phases of the product life cycle. The performance of the models will be evaluated using the most popular metrics; mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and mean absolute percentage error (MAPE).\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e"},{"header":"2 The Developed Forecasting Framework","content":"\u003cdiv\u003e\n \u003cp\u003eThe proposed framework commences by the assumption that the data to be forecast primarily follows a linear pattern. Then, data undergoes tests to grasp seasonal and stationary patterns. Subsequently, the framework determines the optimal regression model for generating forecasts. 
Once the forecasts are generated, the residuals are scrutinized to identify any patterns that the regression model may have missed. The absence of patterns validates the fundamental assumption of data linearity. If any patterns are identified in the data, the framework proceeds to create forecasts using XGBoost, followed by the development of the ensemble model to assess its performance against regression and XGBoost models. However, the framework has certain limitations. The framework is more suited for medium-to-large time-series datasets with smooth and erratic demand patterns. It was not tested against very small-sized datasets – less than 100 observations – or those characterized by intermittent and lumpy demand patterns. This is because the XGBoost model is typically more data-hungry and may not perform optimally with very small datasets [12]. The framework returns the forecast of the best model based on the error measures. The proposed framework is detailed below and is depicted in Fig. 1.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec3\"\u003e\n \u003ch2\u003e2.1 Data Preparation\u003c/h2\u003e\n \u003cdiv\u003e\n \u003cp\u003eGenerally, the first step of the framework is data cleaning, indexing, and formatting. Subsequently, two tests are conducted to identify stationarity and seasonality demand patterns. This step is pivotal for making the right model selection in the subsequent step of the framework.\u003c/p\u003e\n \u003cp\u003eStationarity tests assess if a time series data's statistical properties remain constant over time. This framework uses two of the most common methods; Augmented Dickey Fuller test (ADF) and rolling statistics. In the ADF test, the p-value helps determine whether the data is stationary; if the p-value is below a critical threshold of 0.05, the null hypothesis of non-stationarity is rejected, indicating the data is stationary. 
Rolling statistics involve calculating a moving average or moving variance over a defined window of time, allowing for a visual assessment of stationarity by observing whether the mean and variance remain constant over time. If there is a conflict in both results, the seasonal decomposition (SD) breaks the tie. The existence of trend and/or seasonal patterns indicates data non-stationarity. Using multiple tests increases robustness and reduces the risk of model misidentification.\u003c/p\u003e\n \u003cp\u003eSeasonality tests identify the recurring patterns or fluctuations that occur at regular intervals along the data set. These tests consider various methods; seasonal decomposition (SD), autocorrelation function (ACF) and partial autocorrelation function (PACF). If the analysis proves seasonality in data, a periodogram is used to identify the seasonal period.\u003c/p\u003e\n \u003cp\u003eAfterwards, data is divided into sets. For the regression models, data is divided into two sets based on the common practice: training and testing sets in percentages 80% and 20% respectively [13]. For the XGBoost algorithm, the data is divided into training, validation, and testing sets in percentages 65%, 15% and 20% respectively to maintain the same size for the test data and have enough data for training process as well.\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec4\"\u003e\n \u003ch2\u003e2.2 The Regression Models\u003c/h2\u003e\n \u003cdiv\u003e\n \u003cp\u003eThe regression models are statistical models that are used for predicting future values in time series by modeling the data's underlying patterns, including autoregression, integration (differencing), and moving averages based on past observations [14]. The ARIMA model stands out as one of the most widely used models for forecasting and is considered as a benchmark model in many studies [15], [16]. 
Based on the tests conducted during the preparation phase, the framework selects the most appropriate regression model among. For stationary, non-seasonal data, both the autoregressive (AR) and autoregressive moving average (ARMA) models are utilized. For non-stationary, non-seasonal data, the autoregressive integrated moving average (ARIMA) model is chosen. Meanwhile, for non-stationary, seasonal data patterns, the seasonal autoregressive integrated moving average (SARIMA) model is selected. The explanation of the models and their parameter selection were detailed by Box et al. [14]. However, for ARIMA and SARIMA models, the model is constructed twice as M1 and M2: once with the assistance of the stepwise function and another without it. The results of both are compared to selecting the model with the best performance based on the objective function of minimum AIC.\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec5\"\u003e\n \u003ch2\u003e2.3 Regression Model Diagnostics\u003c/h2\u003e\n \u003cdiv\u003e\n \u003cp\u003eThis step is very crucial in the framework as it examines the residuals of the model to examine the initial hypothesis of data linearity to confirm or reject it. If the analysis of the residuals indicates a clear pattern, this is an indicator that the data has a nonlinear pattern. As the regression model managed to handle only the linear and seasonal patterns of the data – if existed-the framework utilizes the XGBoost algorithm to perform new forecasts. In this context, the “plot_diagnostics” function from the “statsmodels” library in Python is used to visualize the diagnostic plots for the fitted time series model. 
The four key diagnostic plots that “plot_diagnostics” generate are standardized residuals over time, histogram plus estimated density of standardized residuals, normal Q-Q plot, and correlogram.\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec6\"\u003e\n \u003ch2\u003e2.4 The XGBoost Forecasting Model\u003c/h2\u003e\n \u003cdiv\u003e\n \u003cp\u003eXGBoost is an advanced machine learning technique based on the gradient boosting framework that was proposed by Chen and Carlos [12] as an improved Gradient Boosting Decision Tree (GBDT). XGBoost is the selected machine learning technique in this study due to its high performance in forecasting nonlinear tabular datasets of various sizes and it consistently achieves state-of-the-art results [17]. The proposed XGBoost model employs hyperparameters, with parameter optimization carried out through a grid search method, enhancing its overall performance. The model passes through five main steps:\u003c/p\u003e\n \u003c/div\u003e\n \u003cp\u003e\u003cem\u003ea. Features extraction\u003c/em\u003e: Creating features using lag variables is a common technique employed in time series analysis to capture the temporal dependencies between past and future observations of a time series. The fundamental concept involves generating new features, which correspond to columns in a dataset, to represent the values of the time series at different time lags. After examining the ACF and PACF of the time series, significant lags were tested to create lagged features, also known as time lags or lag variables.\u003c/p\u003e\n \u003cp\u003e\u003cem\u003eb. Splitting the data\u003c/em\u003e: The data set is divided into train, validation, and test sets as explained in section 2.1. The train set is utilized for model training and algorithm parameter tuning. Subsequently, the validation set is employed to assess the model's performance based on the selected objective function, which in this case is RMSE. 
It also facilitates early stopping for the XGBoost model, enabling training to halt if the performance on the validation set begins to deteriorate. This serves to prevent overfitting and saves training time by avoiding unnecessary iterations.\u003c/p\u003e\n \u003cp\u003e\u003cem\u003ec. Hyperparameter selection\u003c/em\u003e: There are around 35 main hyperparameters for the XGBoost model. They can be categorized into three types: General Parameters, Booster Parameters, and Learning Task Parameters [18]. In this study, six hyperparameters were selected as the most important and most commonly tuned ones [17]. Table 1 lists the hyperparameters.\u003c/p\u003e\n \u003cp\u003e1. Learning rate.\u003cbr\u003e2. Maximum depth of the tree.\u003cbr\u003e3. Number of trees.\u003cbr\u003e4. Colsample_bytree.\u003cbr\u003e5. Reg_alpha.\u003cbr\u003e6. Subsample.\u003cbr\u003e\u003c/p\u003e\n \u003cp\u003e\u003cem\u003ed. Setting the hyperparameter grid\u003c/em\u003e: This is accomplished using the “GridSearchCV” function from Python's scikit-learn library over a defined grid. A small grid of candidate values is specified for each hyperparameter to reduce the tuning time. Table 1 shows the selected grid for each hyperparameter (column 2), the value selected for the data set, and the effect of that value on the model.\u003c/p\u003e\n \u003cp\u003e\u003cem\u003ee. Hyperparameter tuning\u003c/em\u003e: Using the defined hyperparameters and grids, cross-validation in the proposed framework is performed with 3 folds. The optimal hyperparameters, shown in Table 1, are selected based on their performance, with the minimum MSE as the evaluation metric.\u003c/p\u003e\n \u003cp\u003e\u003cem\u003ef. Feature importance\u003c/em\u003e: This is a method utilized to ascertain which features in a dataset exert the most significant impact on the target variable in a predictive modeling scenario. 
It aids in comprehending the relationship between features and the target variable. F-score (or frequency) is a straightforward measure used in this study for measuring the importance of the features to show how often a feature contributes to partitioning the data.\u003c/p\u003e\n \u003cp\u003eOnce the hyperparameters of the model are set, the forecast is obtained.\u003c/p\u003e\n \u003cdiv\u003e\n \u003ctable id=\"Tab1\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 1\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eXGBoost hyperparameters\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e\u003cem\u003eHyperparameter\u003c/em\u003e\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e\u003cem\u003eGrid\u003c/em\u003e\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e\u003cem\u003eValue\u003c/em\u003e\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e\u003cem\u003eComment\u003c/em\u003e\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLearning rate\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e[0.01, 0.1]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003emoderate value, gives relatively gradual learning, prevents overfitting\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMaximum depth of the tree\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e[3, 4, 5]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eModerate 
depth captures complex patterns without overfitting\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNumber of trees\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e[100, 200, 300]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e200\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eModerate boosting rounds with low learning rate for better learning\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eColsample_bytree\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e[0.8, 1]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eModerate value, increases robustness, reduces risk of overfitting\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eReg_alpha\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e[0.001, 0.01]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSlight regularization, promotes sparsity and prevents overfitting\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSubsample\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e[0.6, 0.8]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLow to moderate value, increases robustness and randomness\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec7\"\u003e\n 
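As a minimal sketch of steps a, d, and e of Section 2.4, and not the authors' implementation, the following Python snippet builds lagged features from a series and enumerates the hyperparameter grid of Table 1. The toy series, the lag choice [1, 5], the function names, and the dictionary keys (chosen to follow common XGBoost parameter naming) are assumptions made for illustration; no model is trained here.

```python
from itertools import product

# Grid values copied from Table 1. The dictionary keys follow common
# XGBoost parameter naming and are an assumption of this sketch.
PARAM_GRID = {
    "learning_rate": [0.01, 0.1],
    "max_depth": [3, 4, 5],
    "n_estimators": [100, 200, 300],
    "colsample_bytree": [0.8, 1],
    "reg_alpha": [0.001, 0.01],
    "subsample": [0.6, 0.8],
}


def make_lag_features(series, lags):
    """Return (X, y); each row of X holds the values at the given lags before y."""
    max_lag = max(lags)
    X, y = [], []
    for t in range(max_lag, len(series)):
        X.append([series[t - lag] for lag in lags])
        y.append(series[t])
    return X, y


def expand_grid(grid):
    """Enumerate every hyperparameter combination in the grid."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


# Hypothetical demand series, used only to show the shape of the features.
demand = [10, 12, 13, 15, 14, 16, 18, 17, 19, 21]
X, y = make_lag_features(demand, lags=[1, 5])

candidates = expand_grid(PARAM_GRID)
n_fits = len(candidates) * 3  # one fit per candidate per fold (3 folds)
```

With the grid of Table 1 this gives 144 candidate settings, or 432 model fits under 3-fold cross-validation, which illustrates why a small grid keeps the tuning time manageable.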
\u003ch2\u003e2.5 The Weighted Average Ensemble (WAE) Model\u003c/h2\u003e\n \u003cdiv\u003e\n \u003cp\u003eBesides the results of the regression model and the XGBoost algorithm, the WAE is used to give forecasts that combine both results. According to Brownlee [18], “The weighted average or weighted sum ensemble is an extension over voting ensembles”. Each model is assigned a weight determined through a grid search, with weights ranging from 0 to 1 in increments of 0.1. The framework systematically searches for the combination of weights that minimizes the root mean square error (RMSE).\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec8\"\u003e\n \u003ch2\u003e2.6 Models Evaluation\u003c/h2\u003e\n \u003cdiv\u003e\n \u003cp\u003eTo measure the performance of the models used in this study, the test set of each dataset is compared to its forecast. The framework uses the most popular error metrics: MSE, RMSE, MAE, and MAPE. The best candidate model is the one that results in minimum error.\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e"},{"header":"3 Computational Results and Discussion","content":"\u003cp\u003e\u003c/p\u003e\n\u003cp\u003eThe proposed framework was applied to forecast demand for five data sets obtained from [19], [20]. The datasets were selected to represent the different demand patterns for the different stages of the product life cycle curve. The conventional curve comprises four primary stages: introduction, growth, maturity, and decline. 
The focal points of this research are the challenging maturity and decline stages, characterized by shifts in the linear trend.\u003c/p\u003e\n\u003cp\u003e\u003c/p\u003e\n\u003cdiv id=\"Sec10\"\u003e\n \u003ch2\u003e3.1 Data Sets Under Study\u003c/h2\u003e\n \u003cdiv\u003e\n \u003cp\u003eFive data sets (DS1, DS2, DS3, DS4, and DS5) were selected for this study to represent three main demand patterns: linear, nonlinear, and mixed (the pattern varies between linear and nonlinear), as shown in Fig. 2. These datasets simulate three specific phases of the product life cycle:\u003c/p\u003e\n \u003c/div\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003ePhase1 – Initial Demand: represents the introduction and the growth stages.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003ePhase2 – Saturated Demand: represents the maturity phase.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003ePhase3 – Diminishing Demand: represents the declining phase.\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cdiv\u003e\n \u003cp\u003eDS1 represents the entire product life cycle (PLC) curve, with the test set forming part of the diminishing demand phase, which starts from the star mark shown in Fig. 2.a. DS2 is a segment of the maturity phase, with a test set similar to that of DS1. DS3 captures the transition from the growth to the maturity stage, with the test set located in the saturated demand phase. DS4 exclusively covers the maturity phase, while DS5 focuses on the introduction and growth stages, with the test set in the initial demand phase.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eFigure 2 shows the five datasets, each divided into a train set and a test set in proportions of 80% and 20%. Next, the stationarity test was conducted, revealing that four sets (DS1, DS2, DS3, and DS5) are non-stationary, whereas DS4 is stationary. 
Next, the seasonality test was conducted on the non-stationary data sets to reveal that DS1, DS2, and DS3 have seasonal trends. According to the proposed framework, forecasting for the first three sets will take place using SARIMA, whereas DS4 will be forecasted using AR and ARMA. For DS5, the selection is the ARIMA model. These results are tabulated in Table 2.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv\u003e\u0026nbsp;\u003ctable id=\"Tab2\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 2\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eStationarity and seasonality tests results with regression model selection\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"7\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eData set\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eP-Value\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eStationary?\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSeasonal?\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSeasonal Period\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eModel used\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDS1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.4886\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSARIMA\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDS2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.9986\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSARIMA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDS3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.4103\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSARIMA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDS4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3.4570e-05\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAR \u0026amp; ARMA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003eDS5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.1142\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eARIMA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec11\"\u003e\n \u003ch2\u003e3.2 Framework Results for DS1\u003c/h2\u003e\n \u003cdiv\u003e\n \u003cp\u003eThe results of applying the proposed framework on DS1 are presented and discussed in full detail. DS1 represents Phase3 in the product lifecycle curve, which is the declining phase.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec12\"\u003e\n \u003ch2\u003e3.2.1 The Regression Model\u003c/h2\u003e\n \u003cdiv\u003e\n \u003cp\u003eAccording to the proposed framework, there is an initial hypothesis of data linearity. When applying the regression models on DS1, the auto_arima function from the ‘pmdarima’ library is employed to automatically find the best combinations of regression model hyperparameters based on the objective function of minimum AIC, resulting in two sets of combinations: Model One (M1), obtained with the stepwise function, and Model Two (M2), obtained without it. The selected models and their respective hyperparameters, along with the corresponding AIC and BIC values, are displayed in Table\u0026nbsp;3.\u003c/p\u003e\n \u003cp\u003eFigure 3.a shows the forecasts for DS1 using the regression model. The figure shows only the test portion of the data set, with the actual data (in blue) and the forecast (in green). 
The curves obtained from this step have been utilized to assess the performance of forecasting models with the datasets.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv\u003e\u0026nbsp;\u003ctable id=\"Tab3\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 3\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eResults of regression models hyperparameters for DS1\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"7\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSelected Model\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eM1 Hyperparameters\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eAIC\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eBIC\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eM2 Hyperparameters\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eAIC\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eBIC\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSARIMA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e(0,1,2) (1,0,1) [12]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4245.2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4262.9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e(2,1,0) (1,0,1) [12] intercept\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4270.7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4292.0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003c/div\u003e\n \u003cdiv 
id=\"Sec13\"\u003e\n \u003ch2\u003e3.2.2 Diagnostics of Regression Model Results\u003c/h2\u003e\n \u003cdiv\u003e\n \u003cp\u003eThe results obtained by the regression model are now tested to accept or reject the initial assumption of data linearity using the \"plot_diagnostics\" function. Figure 4 illustrates the outcomes of this step for DS1, featuring four distinct graphs:\u003c/p\u003e\n \u003c/div\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eStandardized residuals over time show that the model could not handle the patterns in the data.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eHistogram plus estimated density of standardized residuals, with a Normal (0,1) density plotted for reference. Because the two curves are not identical, there is a pattern in the residuals.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eNormal Q-Q plot, with Normal reference line: for residuals with no pattern, all the blue dots should fall on the red line. The graph suggests a skewed distribution.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eCorrelogram (i.e., ACF plot): displays a correlation at lag 5, which implies that some pattern in the residual errors is not explained by the regression model.\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cdiv\u003e\n \u003cp\u003eFrom the interpretation of the graphs, the residual errors have a pattern, and this indicates nonlinearity in the data set. Hence, the XGBoost model is used to handle the nonlinear data.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec14\"\u003e\n \u003ch2\u003e3.2.3 The XGBoost Forecasting Model\u003c/h2\u003e\n \u003cp\u003eIn this study, the F-score of each lag was observed, as depicted in Fig.\u0026nbsp;5. The graph considers both the frequency of a feature's appearance in tree nodes and the average gain (the improvement in model performance) contributed by splits involving that feature. 
Lag 5 represents the most important feature, while other lags, such as lag 1, exhibit a high F-score value, indicating a strong time dependency within the data.\u003c/p\u003e\n \u003cp\u003eThe results obtained from the XGBoost model following the previously conducted hyperparameter tuning step are displayed in Fig. 3.b. The curve illustrates the proposed XGBoost model's capability to handle declining data and overcome the level change observed in the test set.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec15\"\u003e\n \u003ch2\u003e3.2.4 The WAE Model\u003c/h2\u003e\n \u003cdiv\u003e\n \u003cp\u003eDespite the visual comparison indicating the superiority of the XGBoost model over the SARIMA model for DS1, the framework proceeds with its subsequent steps to generate forecasts using the WAE model. As identified earlier during the model diagnostic stage, the analysis revealed the presence of mixed patterns in the data for DS1, encompassing both linearity and nonlinearity. The WAE model is specifically designed to enhance performance when dealing with such mixed data patterns. The WAE model assigned weights of 0.3 and 0.7 to the SARIMA and XGBoost models, respectively. Forecast results are shown in Fig. 3.c (in black). The figure shows a competitive result achieved by the WAE model. Notably, the weight of XGBoost is larger than that of the SARIMA model, influenced by the level change in the test set.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec16\"\u003e\n \u003ch2\u003e3.2.5 Models Evaluation\u003c/h2\u003e\n \u003cdiv\u003e\n \u003cp\u003ePerformance measures for the three models are calculated and summarized in Table 4. A comparison of these models reveals that the developed WAE model, with a weight distribution of 0.3 for SARIMA and 0.7 for XGBoost, exhibits a remarkable ability to forecast the decline phase of the product life cycle. 
This is followed by the XGBoost model, with SARIMA trailing as the least performing model.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv\u003e\u0026nbsp;\u003ctable id=\"Tab4\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 4\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eModels evaluation results for DS1\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"5\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eDS1\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eMSE\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eRMSE\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eMAE\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eMAPE\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSARIMA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e6,980,822\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2,642\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2,019\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.14%\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eXGBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3,307,421\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1,818\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1,400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.08%\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eWAE\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd align=\"left\"\u003e\n \u003cp\u003e2,721,807\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1,649\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1,325\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.08%\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec17\"\u003e\n \u003ch2\u003e3.3 Framework Results for DS2\u003c/h2\u003e\n \u003cdiv\u003e\n \u003cp\u003eDS2 resembles the early stage of Phase3, the start of the declining stage in the product life cycle. As shown in Table\u0026nbsp;2, DS2 has a seasonal pattern. The SARIMA model is used for forecasting. Figure\u0026nbsp;6a shows the forecast (in green) versus the test portion of the dataset (in red). The diagnostic stage reveals a pattern in residuals, prompting the framework to proceed with generating forecasts using the XGBoost model shown in Fig.\u0026nbsp;6b. It is noticed that the results of both models are close. Based on the grid search for the weights of the WAE model, equal weights were given to both models and the forecast is in Fig.\u0026nbsp;6c (in black). Table\u0026nbsp;5 further demonstrates that the WAE model exhibits the best performance, with the SARIMA model closely trailing due to the limited impact of nonlinearity in this case.\u003c/p\u003e\n \u003cp\u003eTable 5. 
Models evaluation results for DS2\u003c/p\u003e\n \u003c/div\u003e\n \u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eMSE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eRMSE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eMAE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eMAPE\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eRegressive\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e32,665,469,191,440\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e5,715,371\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e4,762,220\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.32%\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eXGBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e32,705,637,806,908\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e5,718,884\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e4,945,515\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.33%\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eWAE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e32,241,403,606,754\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e5,678,151\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e4,825,182\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.32%\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003cdiv\u003e\n \u003cdiv 
align=\"left\"\u003e\u003cbr\u003e\u003c/div\u003e\u003ch2\u003e3.4 Framework Results for DS3\u003c/h2\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec18\"\u003e\n \u003cdiv\u003e\n \u003cp\u003eDS3 represents Phase1, wherein the curve progresses from the introduction stage through growth and ultimately reaches maturity, with the target of the forecast being the maturity phase. Applying the SARIMA model resulted in the forecasts shown in Fig. 7a (in green). The residual analysis reveals no clear pattern in the data, confirming the linearity assumption. Although the framework stops at this stage, we proceed with its remaining stages for validation. Under the framework, no forecasts would be generated with XGBoost (in red) or the WAE model (in black); their results are nevertheless given in Fig. 7. The grid search process for the WAE model determines the optimal weights as 100% for SARIMA and 0% for XGBoost. This matches our hypothesis that data sets with a linear pattern are best handled by the regression models, and that the machine learning model is not promising for linear data. The values in Table 6 confirm that SARIMA is the best candidate for DS3.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv\u003e\n \u003cdiv align=\"left\"\u003eTable 6. 
Models evaluation results for DS3\u003c/div\u003e\n \u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eDS3\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eMSE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eRMSE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eMAE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eMAPE\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eRegressive\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e8.23\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e2.87\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e2.28\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.02%\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eXGBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e173.97\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e13.19\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e10.23\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.09%\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eWAE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e8.23\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e2.87\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e2.28\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.02%\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec19\"\u003e\n \u003ch2\u003e3.5 Framework Results for DS4 and DS5\u003c/h2\u003e\n \u003cdiv\u003e\n 
\u003cp\u003eDS4 and DS5 are two cases of erratic demand patterns. DS4 represents another instance of Phase2 that exclusively focuses on portraying the maturity stage, while DS5 represents Phase1 or the end of the introduction stage where the curve starts to take off. DS5 presents an additional challenge due to a significant shift in demand levels toward the end of the period under study. After obtaining results from the regression models, displayed in Figs.\u0026nbsp;8.a for DS4 and 9.a for DS5 (both in green), residual analysis reveals nonlinearity in the data, suggesting the need to proceed further in the framework stages.\u003c/p\u003e\n \u003cp\u003eThe XGBoost forecast demonstrates the notable ability of the model to predict such data, as shown in Fig.\u0026nbsp;8.b and Fig.\u0026nbsp;9.b (in red). The WAE model, in those cases, is constructed with a weight of 0 for the regression models and 1 for the XGBoost model. As expected, the model evaluation results presented in Tables\u0026nbsp;7 and 8 indicate that XGBoost exhibits the best performance in both cases.\u003c/p\u003e\n \u003c/div\u003e\n \u003cp\u003eTable 7. 
Models evaluation results for DS4\u003c/p\u003e\n \u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"486\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eDS4\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eMSE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eRMSE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eMAE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eMAPE\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eRegressive\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e122,274\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e349.6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e302.06\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e1.18%\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eXGBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e3,549\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e59.57\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e44.39\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.13%\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eWAE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e3,549\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e59.57\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e44.39\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.13%\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003cp\u003eTable 8. 
Models evaluation results for DS5\u003c/p\u003e\n \u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"486\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eDS5\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eMSE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eRMSE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eMAE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eMAPE\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eRegressive\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e3510.13\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e59.25\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e55.81\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.63%\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eXGBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e54.57\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e7.39\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e5.65\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.06%\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eWAE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e54.57\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e7.39\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e5.65\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.06%\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e"},{"header":"4 Conclusion and Future Work","content":"\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eIn this research, a framework is proposed for forecasting general time 
series data, employing an ensemble model that combines two families of methods: regression models and machine learning algorithms. The outputs of regression models selected from the ARIMA family and of XGBoost, the machine learning technique, are combined to generate a new forecast. Initially, the framework assumes data linearity, and the appropriate ARIMA method is chosen for data forecasting. The results are then evaluated using residual diagnostic tools to assess the validity of data linearity.\u003c/p\u003e \u003cp\u003eIf the residual analysis reveals any remaining pattern in the data, XGBoost and the WAE model - which utilizes different weighting schemes to average the results of the regression model and XGBoost model - are employed for forecasting. The models are then compared using various error metrics. The results demonstrate the framework's ability to provide accurate forecasting results for linear, nonlinear and mixed patterns. Notably, performance improves when the WAE model is compared with the output of the existing models. This is because the weight searching algorithm enables the WAE model to prioritize and leverage the most effective models. As the impact of each pattern may vary depending on the data or the splitting concept, the WAE model can conduct a grid search to find the combination of weights that minimizes the RMSE. This is supported by the results obtained from DS1 and DS2, which represent mixed-pattern cases, where each had a different weight for its regression model depending on the presence of linear and seasonal patterns. For DS3, residual analysis indicates no clear pattern, and the SARIMA model makes accurate predictions, confirming the initial hypothesis of data linearity. Conversely, for DS4, the ARIMA family fails to predict the existing pattern. XGBoost and the WAE model perform well due to the nonlinearity of the data, with a weight of 0 assigned to the ARIMA model and 1 to the XGBoost model to minimize the RMSE. 
The tested data sets also show the effectiveness of the framework in forecasting different stages of the PLC curve, with better performance than the candidate models.\u003c/p\u003e \u003cp\u003eThis demonstrates that the WAE model, which combines the strengths of ARIMA and XGBoost, offers several advantages for forecasting. ARIMA models excel at capturing autocorrelation and seasonality in time series data, while XGBoost is a potent machine learning algorithm capable of capturing complex nonlinear relationships and interactions between variables. By amalgamating the two models, we can potentially achieve improved accuracy, reduced bias, greater adaptability, and reduced overfitting.\u003c/p\u003e \u003cp\u003eMoreover, the developed framework serves as a valuable tool for forecasting various stages of the product life cycle curve as an essential industrial aspect and an exemplification of a mixed model pattern. Using the tested datasets, we simulate diverse stages with different scenarios of the product life cycle to assess the framework's effectiveness at each stage. This provides a strong tool for launching product expansion strategies. The framework also excels in industrial applications related to continuous demand patterns and big data, where it is capable of producing optimal forecasts. This includes industries such as home appliances, automobile manufacturing, pharmaceuticals, and the food industry.\u003c/p\u003e \u003cp\u003eWhile the framework has been tested with data of varying shapes and patterns, further datasets could be examined to validate its ability to provide optimal forecasts for a wider range of time series datasets. 
Although XGBoost demonstrated effective forecasting results for the nonlinear patterns in this study, other machine learning algorithms could be considered to compare performance and select the most suitable method.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eData Availability\u003c/p\u003e\n\u003cp\u003eThe data sets used and analyzed during the study are publicly available and can also be shared upon request\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contribution\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIslam wrote the main manuscript. Yomna and Amin provided technical guidance and supervision. All authors reviewed the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting Interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNo conflict of interest\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eM. A. Villegas, D. J. Pedregal, and J. R. Trapero, \u0026ldquo;A support vector machine for model selection in demand forecasting applications,\u0026rdquo; \u003cem\u003eComput Ind Eng\u003c/em\u003e, vol. 121, pp. 1\u0026ndash;7, Jul. 2018, doi: 10.1016/j.cie.2018.04.042.\u003c/li\u003e\n\u003cli\u003eM. Bertolini, D. Mezzogori, M. Neroni, and F. Zammori, \u0026ldquo;Machine Learning for industrial applications: A comprehensive literature review,\u0026rdquo; Aug. 01, 2021, \u003cem\u003eElsevier Ltd\u003c/em\u003e. doi: 10.1016/j.eswa.2021.114820.\u003c/li\u003e\n\u003cli\u003eS. Ji, X. Wang, W. Zhao, and D. Guo, \u0026ldquo;An application of a three-stage XGboost-based model to sales forecasting of a cross-border e-commerce enterprise,\u0026rdquo; \u003cem\u003eMath Probl Eng\u003c/em\u003e, vol. 2019, 2019, doi: 10.1155/2019/8503252.\u003c/li\u003e\n\u003cli\u003eP. Li and J. S. 
Zhang, \u0026ldquo;A new hybrid method for china\u0026rsquo;s energy supply security forecasting based on ARIMA and xgboost,\u0026rdquo; \u003cem\u003eEnergies (Basel)\u003c/em\u003e, vol. 11, no. 7, 2018, doi: 10.3390/en11071687.\u003c/li\u003e\n\u003cli\u003eWang Yan and Guo Yuankai, \u0026ldquo;Forecasting method of stock market volatility in time series data based on mixed model of ARIMA and XGBoost,\u0026rdquo; Oct. 2019.\u003c/li\u003e\n\u003cli\u003eW. Fu and C. F. Chien, \u0026ldquo;UNISON data-driven intermittent demand forecast framework to empower supply chain resilience and an empirical study in electronics distribution,\u0026rdquo; \u003cem\u003eComput Ind Eng\u003c/em\u003e, vol. 135, pp. 940\u0026ndash;949, Sep. 2019, doi: 10.1016/j.cie.2019.07.002.\u003c/li\u003e\n\u003cli\u003eP. Jiang, R. Li, N. Liu, and Y. Gao, \u0026ldquo;A novel composite electricity demand forecasting framework by data processing and optimized support vector machine,\u0026rdquo; \u003cem\u003eAppl Energy\u003c/em\u003e, vol. 260, Feb. 2020, doi: 10.1016/j.apenergy.2019.114243.\u003c/li\u003e\n\u003cli\u003eY. Duan, S. Li, S. Chen, Q. Tan, C. Chen, and M. Wang, \u0026ldquo;Forecasting the short-term urban gas daily demand in winter based on the XGBoost algorithm,\u0026rdquo; in \u003cem\u003eIOP Conference Series: Earth and Environmental Science\u003c/em\u003e, IOP Publishing Ltd, Mar. 2021. doi: 10.1088/1755-1315/675/1/012150.\u003c/li\u003e\n\u003cli\u003eL. Aswanuwath, W. Pannakkong, J. Buddhakulsomsiri, J. Karnjana, and V. N. Huynh, \u0026ldquo;A Hybrid Model of VMD-EMD-FFT, Similar Days Selection Method, Stepwise Regression, and Artificial Neural Network for Daily Electricity Peak Load Forecasting,\u0026rdquo; \u003cem\u003eEnergies (Basel)\u003c/em\u003e, vol. 16, no. 4, Feb. 2023, doi: 10.3390/en16041860.\u003c/li\u003e\n\u003cli\u003eO. Ozdemir and C. 
Yozgatligil, \u0026ldquo;Forecasting performance of machine learning, time series, and hybrid methods for low- and high-frequency time series,\u0026rdquo; \u003cem\u003eStat Neerl\u003c/em\u003e, vol. 78, no. 2, pp. 441\u0026ndash;474, May 2024, doi: 10.1111/stan.12326.\u003c/li\u003e\n\u003cli\u003eE. \u0026Ccedil;ağlayan-Akay and K. H. Topal, \u0026ldquo;Forecasting Turkish electricity consumption: A critical analysis of single and hybrid models,\u0026rdquo; \u003cem\u003eEnergy\u003c/em\u003e, vol. 305, Oct. 2024, doi: 10.1016/j.energy.2024.132115.\u003c/li\u003e\n\u003cli\u003eT. Chen and C. Guestrin, \u0026ldquo;XGBoost: A scalable tree boosting system,\u0026rdquo; in \u003cem\u003eProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining\u003c/em\u003e, Association for Computing Machinery, Aug. 2016, pp. 785\u0026ndash;794. doi: 10.1145/2939672.2939785.\u003c/li\u003e\n\u003cli\u003eJason Brownlee, \u0026ldquo;Train-Test Split for Evaluating Machine Learning Algorithms - MachineLearningMastery.com.\u0026rdquo; Accessed: Jul. 21, 2024. [Online]. Available: https://machinelearningmastery.com/train-test-split-for-evaluating-machine-learning-algorithms/\u003c/li\u003e\n\u003cli\u003eG. E. P. Box, G. M. Jenkins, and G. C. Reinsel, \u003cem\u003eTime series analysis\u003c/em\u003e\u003cem\u003e \u003c/em\u003e\u003cem\u003e: forecasting and control\u003c/em\u003e. John Wiley, 2008.\u003c/li\u003e\n\u003cli\u003e\u0026Ouml;. G. Ali, S. Sayin, T. van Woensel, and J. Fransoo, \u0026ldquo;SKU demand forecasting in the presence of promotions,\u0026rdquo; \u003cem\u003eExpert Syst Appl\u003c/em\u003e, vol. 36, no. 10, pp. 12340\u0026ndash;12348, Dec. 2009, doi: 10.1016/j.eswa.2009.04.052.\u003c/li\u003e\n\u003cli\u003eL. Menculini \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Comparing Prophet and Deep Learning to ARIMA in Forecasting Wholesale Food Prices,\u0026rdquo; \u003cem\u003eForecasting\u003c/em\u003e, vol. 3, no. 3, pp. 
644\u0026ndash;662, Sep. 2021, doi: 10.3390/forecast3030040.\u003c/li\u003e\n\u003cli\u003eK. Y. Liu, \u0026ldquo;SUPPLY CHAIN ANALYTICS Concepts, Techniques and Applications.\u0026rdquo;\u003c/li\u003e\n\u003cli\u003eJason Brownlee, \u0026ldquo;Weighted Average Ensemble for Deep Learning Neural Networks.\u0026rdquo; Accessed: Mar. 15, 2024. [Online]. Available: https://machinelearningmastery.com/weighted-average-ensemble-with-python/\u003c/li\u003e\n\u003cli\u003e\u0026ldquo;GitHub - awesomedata/awesome-public-datasets: A topic-centric list of HQ open datasets.\u0026rdquo; Accessed: Jul. 29, 2024. [Online]. Available: https://github.com/awesomedata/awesome-public-datasets\u003c/li\u003e\n\u003cli\u003e\u0026ldquo;Find Open Datasets and Machine Learning Projects | Kaggle.\u0026rdquo; Accessed: Jul. 30, 2024. [Online]. Available: https://www.kaggle.com/datasets\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Mixed Data Patterns, XGBoost Algorithm, Ensemble Model, Product Life Cycle, Operations 
Management","lastPublishedDoi":"10.21203/rs.3.rs-6515650/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6515650/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eAccurate demand forecasting is essential for modern business success, as product demand follows various patterns throughout its life cycle and becomes increasingly complex due to consumer fluctuations. This paper presents a statistical demand forecasting framework that integrates both classical and machine learning methods to predict demand patterns across different phases of the product life cycle, focusing on the declining phase. Machine learning techniques are leveraged for their ability to handle complex data patterns. The framework allows each method to be applied individually or combined into an ensemble model. A grid search algorithm is utilized to optimize the weights of each forecasting technique, improving the ensemble model's performance based on the tested data. 
Validation across five datasets demonstrates the framework's effectiveness, with results showing that the ensemble model outperforms traditional approaches when dealing with mixed demand patterns.\u003c/p\u003e","manuscriptTitle":"A Framework for Forecasting Demand of General Time Series Data Using Regression Models and Machine Learning","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-06-12 17:50:44","doi":"10.21203/rs.3.rs-6515650/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2025-06-17T10:05:23+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-06-16T09:47:18+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-06-16T02:21:40+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-06-13T03:04:25+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-06-12T23:55:11+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"63853248222277477594154119001544631660","date":"2025-06-11T09:58:35+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"112948235179537346692655209408284479389","date":"2025-06-11T06:35:39+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"267777301768882532355213107272900366849","date":"2025-06-11T03:16:56+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"260455869166954035991954697000700722220","date":"2025-06-11T02:49:38+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"65779580019298699232354154928564024783","date":"2025-06-11T02:40:19+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"46364761631765123662246759123103385055","date":"2025-06-11T01:50:23+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"111944990905743574074464915406540895961","date":"2025
-06-11T01:26:15+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"277534652470711425792490306940584453011","date":"2025-06-11T00:11:19+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"318334191017422646277977181290884444200","date":"2025-06-10T22:09:51+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"135251793921867860576720568907643509676","date":"2025-06-10T21:12:48+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-06-10T20:19:19+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"169526208283230848124109500895735471230","date":"2025-06-10T18:39:10+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"96463853986962795691511349522623376545","date":"2025-06-10T18:05:49+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"170293514437857092286552384205304865784","date":"2025-06-10T17:57:39+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"325076183191001591043053448773507038705","date":"2025-06-10T17:34:41+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"136221771252317119246555235540017555693","date":"2025-06-10T17:33:18+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-06-10T17:24:54+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-05-02T04:35:08+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-04-28T05:04:17+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-04-25T04:25:58+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific Reports","date":"2025-04-23T22:04:09+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email 
protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"e32786a3-3d70-4801-bda9-37138f9309ba","owner":[],"postedDate":"June 12th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":49877516,"name":"Physical sciences/Engineering/Mechanical engineering"},{"id":49877517,"name":"Physical sciences/Mathematics and computing/Statistics"}],"tags":[],"updatedAt":"2025-11-10T16:04:40+00:00","versionOfRecord":{"articleIdentity":"rs-6515650","link":"https://doi.org/10.1038/s41598-025-23352-w","journal":{"identity":"scientific-reports","isVorOnly":false,"title":"Scientific Reports"},"publishedOn":"2025-11-04 15:57:43","publishedOnDateReadable":"November 4th, 2025"},"versionCreatedAt":"2025-06-12 17:50:44","video":"","vorDoi":"10.1038/s41598-025-23352-w","vorDoiUrl":"https://doi.org/10.1038/s41598-025-23352-w","workflowStages":[]},"version":"v1","identity":"rs-6515650","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6515650","identity":"rs-6515650","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
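The conclusion describes the WAE model as a weighted average of the regression forecast and the XGBoost forecast, with a grid search choosing the weights that minimize the RMSE. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the 11-point weight grid, the constraint that the two weights sum to 1, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def search_weights(actual, regression_forecast, xgboost_forecast, steps=11):
    """Grid-search the regression weight w in [0, 1]; XGBoost gets 1 - w.

    Returns (w_regression, w_xgboost, best_rmse). The grid resolution
    `steps` is a hypothetical choice, not a value taken from the paper.
    """
    regression_forecast = np.asarray(regression_forecast, dtype=float)
    xgboost_forecast = np.asarray(xgboost_forecast, dtype=float)
    best = None
    for w in np.linspace(0.0, 1.0, steps):
        blended = w * regression_forecast + (1.0 - w) * xgboost_forecast
        score = rmse(actual, blended)
        if best is None or score < best[2]:
            best = (float(w), float(1.0 - w), score)
    return best


# Toy, made-up hold-out data: the regression forecast misses the pattern,
# while the second forecast tracks it, loosely mimicking the DS4 situation.
actual = [100.0, 120.0, 90.0, 130.0]
regression_forecast = [110.0, 110.0, 110.0, 110.0]
xgboost_forecast = [100.0, 120.0, 90.0, 130.0]

w_reg, w_xgb, best_rmse = search_weights(
    actual, regression_forecast, xgboost_forecast
)
print(w_reg, w_xgb, best_rmse)
```

In this toy case the search assigns the regression forecast a weight of 0 and the other forecast a weight of 1, which is the kind of outcome reported for DS4 and DS5. When the two forecasts err in opposite directions, an interior weight can win instead, which is the mixed-pattern situation described for DS1 and DS2.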

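Tables 7 and 8 report MSE, RMSE, MAE and MAPE for each model. The sketch below computes these four metrics using their standard textbook definitions, which are assumed here because this section does not spell out the formulas used. It then checks that the reported RMSE values agree with the square roots of the reported MSE values. The function name and the 0.1 tolerance are illustrative choices; the table values are copied from Tables 7 and 8.

```python
import math


def forecast_errors(actual, predicted):
    """Return MSE, RMSE, MAE and MAPE (in percent) for two sequences.

    MAPE divides by the actual values, so this sketch assumes that
    no actual value is zero.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}


# Made-up example values, for illustration only.
example = forecast_errors([100.0, 200.0], [110.0, 190.0])
print(example)

# (MSE, RMSE) pairs as reported in Tables 7 and 8. The WAE rows are
# identical to the XGBoost rows, so they are not repeated here.
reported = {
    "DS4 Regressive": (122274.0, 349.6),
    "DS4 XGBoost": (3549.0, 59.57),
    "DS5 Regressive": (3510.13, 59.25),
    "DS5 XGBoost": (54.57, 7.39),
}

# RMSE should equal sqrt(MSE) up to the rounding used in the tables.
consistent = {
    name: abs(math.sqrt(mse) - rmse_value) < 0.1
    for name, (mse, rmse_value) in reported.items()
}
print(consistent)
```

All four reported pairs pass this check within the chosen tolerance, so the MSE and RMSE columns are internally consistent.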