A White-box Approach to Forecasting Petrol Prices in Import-Dependent Economies base on Machine Learning and Explainable Artificial Intelligence | Research Square
Research Article
Melanie Maliti, Aaron Zimba
This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-9222514/v1 This work is licensed under a CC BY 4.0 License. Status: Under Review. Version 1 posted 14
Abstract
Fuel price volatility presents significant economic and policy challenges in import-dependent economies such as Zambia. This study implements and compares four machine learning models: Ridge Linear Regression, Random Forest, Support Vector Regression and Long Short-Term Memory, using the MAE, RMSE, R² and DM statistic as the performance evaluation metrics.
The dataset used in this research consisted of daily petrol prices as the dependent variable and the international oil price, ZMW/US$ exchange rate, excise duty, VAT, and inflation as the independent variables, covering the period from January 2011 to September 2025. The performance evaluation revealed that Ridge Linear Regression consistently outperformed the other models, scoring an MAE of 0.0510, an RMSE of 0.2356, an R² of 0.9927, a DM statistic of -2.6793, and a p-value of 0.0074. Explainable AI (XAI) techniques, including SHAP values, feature importance, and partial dependence plots, were integrated to enhance interpretability. The XAI results indicate that excise duty, VAT, the ZMW/US$ exchange rate, and the international oil price are the dominant drivers of petrol price movements, while inflation plays a limited direct role.
Keywords: Machine Learning, Explainable AI, Petrol Price Forecasting, Ridge Regression
1. INTRODUCTION
1.1. Background to the Study
In Zambia, fuel prices have fluctuated significantly in recent years. These fluctuations reflect changes in domestic policy, such as subsidy removals, as well as global oil market dynamics. The retail fuel price statistics published by the Energy Regulation Board (ERB) [1] show that the price of petrol rose sharply from ZMW 7.64 per litre in 2011 (post-rebasing equivalent) to ZMW 13.70 by the end of 2016 and further to ZMW 34.98 by the first quarter of 2025. These hikes are mainly attributed to spikes in international crude oil prices caused by pandemics, such as COVID-19, and geopolitical disruptions, such as the Russia-Ukraine war [2]. The 2021 government directive to remove subsidies on fossil fuels was passed to redirect funds towards other developmental needs in the country.
This directive resulted in an increase in the excise duty on petrol from 0.64 ZMW/L to 2.07 ZMW/L and in the VAT rate from 0% to 16%. The result was a series of large spikes in the petrol price, with major effects on the cost of living, transportation expenses, and overall economic stability, as petroleum is a common energy source across the production, manufacturing, and distribution industries in Zambia [3]. An economic survey showed that low-income earners lost about 29.9% of their average income to the subsidy removal, while high-income earners lost about 12%, even though high-income earners consume more fuel [4]. Furthermore, an investigation into the effects of fuel prices on Zambia’s economic growth suggested a significant negative correlation between oil price hikes and the economic growth of the country. Its results revealed that spikes in oil prices not only increase production costs but also aggravate inflation, which can worsen the fiscal balance because the government's attempts to mitigate these consequences require directing more resources into oil imports [5]. A working paper by the Bank of Zambia (BOZ) found that the long-run effect of petroleum prices on inflation is significant and estimated at 0.26%, while in the short run, changes in petroleum prices exert a significant but modest positive effect on inflation, estimated at 0.03% with a one-month lag [6]. These results confirm that fuel prices play a key role in the economic growth of the country and are a key driver of inflation. The ERB determined that the main factors influencing the price of fuel in the country are the International Oil Price (IOP) and the Zambian Kwacha to United States Dollar (ZMW/US$) exchange rate.
ERB further added that other factors that influence the price of fuel include taxes, levies, charges, and fees for pumping and processing [7]. To maintain price stability, the ERB applies a mechanism during all price reviews such that there is a threshold of 2.5% for fuel price adjustments. What this implies is that the fuel prices will remain unchanged if they fluctuate by less than 2.5% on average during each price review. In the attempt to curb the fuel crises, the government has employed multiple mitigation initiatives, such as the revision of the fuel pricing cycle from 60 days to 30 days as of January 2022, to align the fuel prices with the consumption period of the quantity that has been procured [8]. Another initiative includes the introduction of open access on the TAZAMA pipeline to allow all eligible stakeholders to transport fuel from Dar-Es-Salaam to Ndola using the available pipeline [9]. The ERB also introduced petroleum products pricing rules to foster transparency and predictability of petroleum prices to further strengthen business planning. The Energy Regulation Board (ERB) is a regulatory institution responsible for the regulation of petroleum product prices by applying the Cost-Plus pricing model (CPM). The CPM is essentially based on total production costs plus a reasonable profit margin to ensure fairness and sustainability in their enforcement [6]. The CPM is made up of the wholesale price to OMC, terminal fee, marking fee, excise duty, transport cost, OMC margin, dealer margin, ERB fees, Strategic Reserve Fund (SRF), and Value Added Tax (VAT). Zambia’s petroleum price build-up comprises multiple interrelated factors, each contributing to fluctuations in the uniform retail pump price. Among these, international oil prices remain highly volatile, and forecasts of commodities such as crude oil are often associated with wide error bands due to their inherent unpredictability [10]. 
Given Zambia’s complete dependence on petroleum imports and its exposure to exchange rate fluctuations and other external shocks, forecasting petrol prices presents significant challenges [11]. In the past, fuel price forecasting mainly employed traditional time series models such as the Autoregressive Integrated Moving Average (ARIMA). Despite their wide usage and valuable insights, these traditional time series models often struggle to capture complex, non-linear patterns within real-world datasets [12]. Machine learning (ML) techniques, on the other hand, offer more flexible methods for modelling real-world complexities and often provide more accurate forecasts. A study comparing ARIMA against Artificial Intelligence (AI) algorithms revealed that the AI algorithms tend to display better prediction performance in most applications, with ARIMA recording a Mean Absolute Error (MAE) of 0.1927 while the AI algorithms recorded MAE values of less than 0.1 [13]. While ML models often outperform traditional ones, they are seen as ‘black boxes’ because they do not clearly explain how their predictions are derived. A useful solution to the black box problem is explainable AI (XAI), a set of tools and methods used to make machine learning model predictions more transparent and interpretable. Many studies have shown that traditional ‘black box’ ML models deliver high predictive accuracy but give little insight into the specific outputs generated; that is, they do not answer the ‘why’ questions. XAI provides explanations by applying feature importance across all the predictions, which helps to give the reasoning behind each single forecast [14].
Petrol pricing in Zambia is not merely a statistical forecasting problem, but an energy-system outcome shaped by import dependence, regulatory pricing architecture, and exchange rate exposure. Because Zambia is a fully import-dependent fuel economy operating under a cost-plus pricing model, its petrol prices reflect structured regulatory mechanisms rather than purely market-clearing dynamics. Consequently, forecasting must account for both global market shocks and domestic fiscal interventions embedded in the administrative pricing framework. This study therefore situates artificial intelligence (AI) within the broader energy-system context, recognizing that predictive performance is inseparable from institutional pricing structures.
1.2. Related Works
1.2.1. Machine Learning Approaches to Fuel and Energy Forecasting
Recent literature demonstrates a growing shift from traditional econometric models toward machine learning (ML) approaches in fuel and energy forecasting. A study by Mohammad Abdulaziz Alwadi compared classical time-series models (ARIMA, SARIMA) with ML and deep learning models for fuel sales forecasting. The study found that Random Forest (RF) outperformed ARIMA and SARIMA in predictive accuracy, with a coefficient of determination of R² = 0.999, followed by the LSTM-based models. While the findings highlight the strength of ensemble methods, the evaluation relied primarily on R² and the MSE, limiting the robustness of the performance assessment. Furthermore, model interpretability, such as the use of XAI, was not explored [15]. Similarly, a regional study on South African fuel pricing evaluated feedforward neural networks (FFNN), recurrent neural networks (LSTM and GRU), and convolutional neural networks (CNN), using simple linear regression as a baseline model [16]. The LSTM-based RNN model demonstrated competitive performance, particularly under volatile exchange rate conditions.
However, the study was constrained by a relatively small dataset (180 monthly observations) and focused exclusively on the Basic Fuel Price (BFP) component, excluding fiscal and tax structures. These studies collectively suggest that nonlinear ML models can outperform traditional econometric approaches in fuel-related forecasting. However, limitations remain regarding dataset scale, evaluation breadth, and policy-variable integration.
1.2.2. Traditional Time-Series Models in Fuel Price Forecasting
ARIMA-based approaches continue to be widely used in petroleum price forecasting. A Malaysian study employing weekly petrol price data (174 observations) demonstrated that ARIMA(14,1,14) outperformed lower-order specifications in short-term forecasts [17]. However, the authors acknowledged that ARIMA performance deteriorates over longer horizons and under structural shifts. Traditional models assume linearity and stationarity, which may not hold in deregulated or subsidy-reforming environments. In addition, these models do not easily accommodate complex nonlinear interactions between exchange rates, fiscal policy adjustments, and international oil price shocks. This limitation motivates the exploration of ML frameworks capable of modelling nonlinear dynamics and regime shifts.
1.2.3. Ensemble Learning in Energy Systems Forecasting
Beyond fuel pricing, ensemble learning methods have shown strong predictive performance in broader energy forecasting applications. A 2025 study on marine engine fuel consumption (MEFC) prediction compared RF, Gradient Boosting, XGBoost, SVR, and linear models using MSE, R², and Kling–Gupta Efficiency (KGE). Random Forest achieved the lowest test MSE (0.69), a robust test R² (0.9867), and a high KGE (0.9681), making it the most appropriate of the compared models for MEFC modelling. XGBoost followed closely, with a test MSE of 0.75 and a test R² of 0.9856 [18].
These findings reinforce the capacity of ensemble models to capture nonlinear dependencies in energy-related datasets. However, interpretability mechanisms were not integrated, limiting policy transparency. 1.2.4. Explainable Artificial Intelligence (XAI) in Energy Modelling The growing use of ML in energy systems has intensified concerns regarding interpretability. SHAP and LIME have emerged as dominant post-hoc explanation tools. A comparative evaluation of SHAP and LIME using decision trees, logistic regression, LGBM, and SVC demonstrated that explanation stability is highly sensitive to model choice and feature collinearity [19]. The study introduced the Normalized Movement Rate (NMR) metric to quantify ranking instability, revealing that feature importance attribution varies significantly across model classes. Importantly, the authors cautioned against over-reliance on standalone XAI outputs and recommended complementary diagnostic tools to improve robustness. These insights underscore the necessity of combining SHAP with permutation-based feature importance and partial dependence analysis to mitigate interpretability instability. 1.2.5. Identified Research Gaps and Contribution Despite the growing adoption of machine learning techniques in fuel and energy forecasting, several important gaps remain in the literature. Firstly, many studies rely on relatively small monthly datasets, limiting the robustness and generalizability of predictive models, particularly in volatile pricing environments. Furthermore, fiscal policy variables such as excise duty adjustments, VAT changes, and subsidy reforms are frequently excluded, even though they materially influence retail fuel prices in regulated and import-dependent economies. It can be noted that model evaluation often depends on limited performance metrics without formal statistical comparison of the forecast's accuracy. 
While XAI tools such as SHAP are increasingly being applied, their results are rarely complemented with additional robustness diagnostics, raising concerns about the stability of the interpretations, especially under feature collinearity. Finally, there remains limited empirical evidence from Sub-Saharan African economies, where exchange rate exposure and subsidy reforms play a dominant role in price formation. This study advances the literature by:
(i) constructing a high-frequency daily fuel pricing dataset (2011–2025, approximately 5,387 observations) for a Sub-Saharan African import-dependent economy;
(ii) applying a comparative ML framework (Ridge, RF, SVR, LSTM) with formal statistical forecast comparison using the Diebold–Mariano test;
(iii) integrating triangulated XAI diagnostics (SHAP, feature importance, PDP) to disentangle fiscal policy shocks from global market drivers;
(iv) demonstrating how regulated cost-plus pricing structures structurally favour regularised linear models over nonlinear architectures.
To our knowledge, this is the first study to combine daily-frequency ML forecasting with XAI interpretability in a Sub-Saharan regulated energy market.
2. METHODS AND MODELS
2.1. Conceptual Framework
The conceptual framework for this study illustrates the relationships between the main factors influencing petrol price fluctuations, the analytical techniques applied, and the intended outputs and impacts. It integrates both economic theory and data-driven artificial intelligence approaches to support robust and interpretable petrol price forecasting in Zambia. This research contributes to both the academic research body and Zambia’s energy regulation landscape. The conceptual framework is illustrated in the figure below:
This study proposes a hybrid ML-XAI forecasting model that is designed to enhance the accuracy, interpretability, and usability of petroleum price predictions in Zambia.
The proposed model addresses the limitations identified in previous studies, particularly the reliance on single-variable time series models, limited forecasting horizons, and lack of transparency in predictive mechanisms.
2.2. Model Specification
2.2.1. Linear and Regularized Regression Models
Let \(y_t \in \mathbb{R}\) denote the petrol price at time \(t\) and let \(X_t = (x_{1t}, x_{2t}, \dots, x_{kt})\) represent the vector of independent variables (international oil price, exchange rate, excise duty, VAT and inflation).
a. Multivariate Linear Regression (MLR)
MLR is a statistical technique used to predict a dependent variable from multiple independent variables. It models the linear relationship between the independent variables and the dependent variable [20]. The standard linear regression model is defined as:
$$y_t = X_t'\beta + \epsilon_t$$
Where \(\beta \in \mathbb{R}^k\) is the parameter vector and \(\epsilon_t \sim N(0, \sigma^2)\) is the disturbance term. However, MLR is highly affected by multicollinearity among the variables. In this study, it is very likely that variables such as the international oil price and the ZMW/US$ exchange rate are highly correlated.
b. Ridge Regression
Ridge linear regression mitigates multicollinearity by adding a penalty term proportional to the squared magnitude of the coefficients. This shrinks the coefficients without setting them to zero, which reduces model complexity and variance and improves prediction accuracy on new data [21]. Ridge regression introduces an L2 regularization penalty:
$$\hat{\beta}_{ridge} = \arg\min_{\beta}\left\{\sum_{t=1}^{n}(y_t - X_t'\beta)^2 + \lambda\|\beta\|_2^2\right\}$$
Where \(\lambda \ge 0\) is the shrinkage parameter controlling the bias-variance trade-off.
2.2.2.
Random Forest
Random Forest is an ensemble learning method constructed from \(B\) decision trees:
$$\hat{f}(x) = \frac{1}{B}\sum_{b=1}^{B} T_b(x)$$
Where each tree \(T_b(x)\) is trained on a randomly selected subset of the original training data and a randomly chosen subset of the features. This combination of bagging and feature randomness helps to reduce overfitting by making individual trees less likely to make the same errors on the data. By averaging the predictions from multiple trees, RF produces more robust and accurate forecasts than a single decision tree [22].
2.2.3. Support Vector Regression (SVR)
Support Vector Regression extends Support Vector Machines to regression tasks by minimizing structural risk under an \(\epsilon\)-insensitive loss function. The optimization problem is defined as:
$$\min_{\omega, b, \xi, \xi^{*}} \; \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{n}(\xi_i + \xi_i^{*})$$
Subject to:
$$y_i - (\omega' x_i + b) \le \epsilon + \xi_i$$
$$(\omega' x_i + b) - y_i \le \epsilon + \xi_i^{*}$$
$$\xi_i, \xi_i^{*} \ge 0$$
Where: \(C\) controls the penalty for deviations, \(\epsilon\) defines the width of the insensitive tube, and \(\xi_i, \xi_i^{*}\) are slack variables. SVR forms a tube of radius \(\epsilon\) around the estimated function: errors whose absolute value is below this threshold are ignored, whether they fall above or below the function. Points outside the tube are therefore penalized, while those within it receive no penalty [23].
2.2.4. Long Short-Term Memory (LSTM) Networks
Long Short-Term Memory (LSTM) networks are a type of recurrent neural network (RNN) under the family of deep learning models.
The internal gating mechanism is defined by:
$$f_t = \sigma(W_f[h_{t-1}, x_t] + b_f)$$
$$i_t = \sigma(W_i[h_{t-1}, x_t] + b_i)$$
$$\tilde{c}_t = \tanh(W_c[h_{t-1}, x_t] + b_c)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$o_t = \sigma(W_o[h_{t-1}, x_t] + b_o)$$
$$h_t = o_t \odot \tanh(c_t)$$
Where: \(f_t\) is the forget gate, \(i_t\) is the input gate, \(o_t\) is the output gate, \(\tilde{c}_t\) is the candidate cell state, \(c_t\) is the cell state, and \(h_t\) is the hidden state. The key principle of the LSTM centres on two components: the data and the control of the data. The data component prepares the candidate data signals, while the control component prepares the throttle signals through the input, output and forget gates, which regulate how information is stored, updated and discarded [24].
2.3. Explainable AI Framework
To ensure interpretability, the best-performing model was subjected to feature attribution analysis using SHAP and permutation importance methods.
2.3.1. Shapley additive explanations (SHAP)
SHAP is an XAI technique that quantifies the contribution of every input feature in ML-based prediction models. SHAP is based on cooperative game theory, which considers all possible feature combinations to guarantee a fair distribution of feature importance. For a model \(f(x)\), the SHAP decomposition is:
$$f(x) = \varphi_0 + \sum_{j=1}^{k}\varphi_j$$
Where: \(\varphi_0\) is the expected model output and \(\varphi_j\) represents the marginal contribution of feature \(j\). The idea of SHAP values is to assign each feature a value that represents its contribution to the difference between the actual prediction and the prediction that would have been made in the absence of that feature [14].
2.3.2. Feature Importance
Feature importance, in relation to ML, describes how much covariates contribute to a prediction model’s accuracy.
A commonly used method of measuring feature importance is the permutation-based approach, which is computed as the increase in prediction error after randomly permuting feature \(j\) [25]:
$$FI_j = E\left[L\left(f\left(X_{perm(j)}\right), y\right)\right] - E\left[L\left(f\left(X\right), y\right)\right]$$
Where \(L(\cdot)\) is the loss function and \(X_{perm(j)}\) denotes the dataset with feature \(j\) permuted. This approach quantifies the model’s reliance on each predictor by measuring the degradation in predictive performance, so a larger \(FI_j\) indicates a more important feature.
2.4. Performance Evaluation Metrics
To ensure robust comparison across linear, ensemble, kernel-based, and deep learning models, predictive performance was evaluated using scale-dependent, variance-based, and statistical forecast comparison metrics. The \(MAE\) and \(RMSE\) capture scale-dependent forecast accuracy, while \(R^2\) evaluates explanatory strength. The Diebold–Mariano test provides statistical validation of forecast dominance. Together, these metrics ensure comprehensive evaluation of predictive performance in a volatile, policy-sensitive energy pricing environment. Let \(y_t\) denote the observed petrol price at time \(t\), \(\hat{y}_t\) the model prediction, and \(n\) the number of observations in the evaluation set.
2.4.1. Mean Absolute Error (MAE)
Mean Absolute Error measures the average magnitude of prediction errors without considering direction:
$$MAE = \frac{1}{n}\sum_{t=1}^{n}|y_t - \hat{y}_t|$$
MAE provides an interpretable measure of average deviation in price units and is less sensitive to extreme outliers than RMSE.
2.4.2.
Root Mean Squared Error (RMSE)
Root Mean Squared Error penalizes larger deviations more heavily due to squaring:
$$RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n}(y_t - \hat{y}_t)^2}$$
Because fuel prices may experience sudden structural shifts, RMSE is particularly relevant for capturing the impact of large forecast errors during volatile periods.
2.4.3. Coefficient of Determination (R²)
The coefficient of determination evaluates the proportion of variance explained by the model:
$$R^2 = 1 - \frac{\sum_{t=1}^{n}(y_t - \hat{y}_t)^2}{\sum_{t=1}^{n}(y_t - \bar{y})^2}$$
Where \(\bar{y}\) denotes the sample mean of the observed prices. Higher \(R^2\) values indicate stronger explanatory power.
2.4.4. Diebold–Mariano test
While error metrics assess absolute performance, statistical comparison of competing forecasts was conducted using the Diebold–Mariano test. Let the forecast errors from two competing models be:
$$e_{1,t} = y_t - \hat{y}_{1,t}, \quad e_{2,t} = y_t - \hat{y}_{2,t}$$
Define the loss differential:
$$d_t = L(e_{1,t}) - L(e_{2,t})$$
Where \(L(\cdot)\) is the chosen loss function. The DM statistic is computed as:
$$DM = \frac{\bar{d}}{\sqrt{\widehat{Var}(\bar{d})}}$$
Where:
$$\bar{d} = \frac{1}{n}\sum_{t=1}^{n} d_t$$
Under the null hypothesis:
$$H_0: E(d_t) = 0$$
there is no statistically significant difference in forecast accuracy between the two models. The Diebold–Mariano test compares each model against a Naive (random walk) benchmark, which is defined as a no-change forecast where the next period’s petrol price equals the previous observed price. This benchmark is standard in time-series evaluation and provides a robust baseline for assessing predictive improvement.
2.5.
Research design
This research adopted a data-driven predictive modelling design with an experimental quantitative approach. The experimental design allows for comparison of four ML algorithms (Ridge LR, RF, SVR and LSTM), using standardized accuracy metrics (RMSE, MAE, R² and the DM test) to ensure objectivity and reproducibility. Furthermore, the application of Explainable AI (XAI) techniques adds an exploratory analytical layer which enables the interpretation of model outputs and identification of key economic and market drivers. Together, this integrated methodology enhances the validity of the conclusions drawn, as model predictions are evaluated systematically. It also supports reliability through cross-validation and standardized performance metrics. The research design is illustrated in the figure below:
a. Data Collection
This research uses data from ERB online datasets, energy sector reports and monthly fuel price publications, and from the fortnightly time series statistics published by BOZ. The data collected from ERB sources include petrol pump prices (the dependent variable), excise duty and VAT from January 2011 to September 2025. The data is aggregated at a daily frequency, which enables high-resolution analysis of market behaviour and allows the machine learning models to better capture volatility. The dataset created is illustrated in the table below:
Table 3.2 Petrol Price Dataset
| Date | ZMW/US$ | IOP (US$/Barrel) | Excise Duty (ZMW/L) | VAT (%/L) | Inflation (%) | Petrol Price (ZMW) |
|---|---|---|---|---|---|---|
| 31/08/2025 | 23.61 | 68.83 | 2.07 | 16% | 0.4% | 28 |
| 01/09/2025 | 23.62 | 68.68 | 2.07 | 16% | 0.5% | 29.18 |
| 02/09/2025 | 23.69 | 68.53 | 2.07 | 16% | 0.5% | 29.18 |
b. Exploratory Data Analysis (EDA)
EDA will be conducted to uncover structural patterns, variable relationships, and data anomalies. Time-series visualizations such as line charts will be used to examine long-term fuel price trends and detect volatility spikes.
Correlation heatmaps will be used to highlight the relationships between fuel prices and explanatory variables such as crude oil prices and exchange rates. Boxplots will be used to identify outliers and assess the distributional properties of the variables. Decomposition plots will separate the time series into trend, seasonality, and residual components, offering insights into cyclical behaviours. Tools such as Matplotlib, Seaborn, and Pandas will facilitate the visual analysis process. This stage will provide not only descriptive insights, but also guidance for feature selection in the modelling phase.
c. Data Preprocessing & Feature Engineering
Given the complexity of fuel price data, preprocessing is essential to ensure data quality and integrity. Missing values will be addressed using interpolation and imputation techniques, ensuring continuity in the time series without distorting underlying trends. Outliers will be identified using boxplots and corrected or capped to reduce their influence on model training. To maintain comparability, all variables will be expressed in consistent units, and feature scaling using standardization will be applied to ensure balanced feature contributions. Thereafter, the dataset will be split into training (70%), validation (15%) and testing (15%) subsets to enable performance evaluation. Feature engineering will then be applied to enhance the dataset's predictive power. Lag variables (1-day, 7-day and 30-day) will be created to capture the delayed effects on fuel prices of exchange rate movements and of policy interventions such as subsidy removal. Moving averages and exponential smoothing terms will account for short-term trends and reduce the volatility caused by daily fluctuations.
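The lag, moving-average and train/validation/test steps described above can be sketched in a few lines. This is a minimal illustration only, not the study's implementation: the function names `make_lag_features` and `chronological_split`, the 7-day window, and the synthetic price series are all assumptions introduced here, and the split is assumed to be chronological (no shuffling). For brevity the features are built before splitting; because each feature uses only past values, this does not leak future prices into earlier rows.

```python
def make_lag_features(prices, lags=(1, 7, 30), window=7):
    """Build one row per day with lagged prices and a trailing moving average.

    Rows start once every lag and the full window are available. The moving
    average covers days t-window .. t-1, so it never includes the target day.
    """
    start = max(max(lags), window)
    rows = []
    for t in range(start, len(prices)):
        row = {f"lag_{k}": prices[t - k] for k in lags}
        row[f"ma_{window}"] = sum(prices[t - window:t]) / window
        row["target"] = prices[t]
        rows.append(row)
    return rows


def chronological_split(rows, train_pct=70, val_pct=15):
    """Split rows in time order into train / validation / test subsets."""
    n = len(rows)
    i = n * train_pct // 100
    j = n * (train_pct + val_pct) // 100
    return rows[:i], rows[i:j], rows[j:]


# Hypothetical daily price series, used purely for illustration.
prices = [20.0 + 0.1 * t for t in range(130)]
rows = make_lag_features(prices)
train_set, val_set, test_set = chronological_split(rows)
```

Keeping the split in time order means the validation and test rows always come after the training rows, which mirrors how a forecast would be used in practice.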
Seasonal and cyclical indicators will be extracted to identify recurring fluctuations (such as monthly or quarterly petrol price adjustments), and categorical features representing regulatory interventions (subsidies, changes in excise duty) will be encoded using one-hot encoding. This step is critical to ensure that the models capture both temporal dependencies and external policy-driven dynamics influencing fuel prices.
d. Model Development & Implementation
The proposed models are developed and implemented in Python because it is open source and efficient for data analysis and machine learning tasks. The Pandas and NumPy libraries will be used for data manipulation, Scikit-learn for the development of the traditional ML algorithms (Ridge Linear Regression, SVR and RF), TensorFlow for the development of the deep learning model (LSTM), and Matplotlib and Seaborn for data visualization and performance diagnostics. During the model implementation phase, each model is coded and trained on the petrol price dataset and will undergo hyperparameter tuning to optimize performance and reduce overfitting. Grid search cross-validation will be used to tune RF (number of trees, maximum depth) and SVR (kernel type, regularization parameter C). The LSTM model will be trained with the Adam optimizer, and hyperparameters such as the learning rate, number of hidden layers, neuron count, dropout rate, and training epochs will be tuned iteratively. The models will be evaluated on the training and validation subsets to ensure convergence, stability, and generalization before final testing, using a time-series split through the TimeSeriesSplit class in the Scikit-learn library.
e.
Model Evaluation and Selection
The predictive performance of the models will be evaluated using three statistical metrics: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the Coefficient of Determination (R²). In addition to these standard error metrics, the Diebold-Mariano (DM) test will be employed to statistically compare the forecasting accuracy of competing models. This study uses the naive (random walk) forecast defined in Section 2.4.4 as the baseline model for the DM test, as it typically produces competitive performance and provides a solid baseline for comparison with more complicated algorithms [26]. Based on the outcomes of these evaluations, the models will be ranked across all performance metrics, with the top-performing model selected for further explainability analysis.
f. Explainable AI Integration
To enhance transparency and interpretability, this research design applies Explainable Artificial Intelligence (XAI) techniques to the best-performing model to uncover the relationships between the variables and petrol prices. As an initial step, feature importance analysis will be conducted to identify and rank the most influential variables (ZMW/US$ exchange rate, international crude oil price, inflation rate, excise duty, VAT). Thereafter, SHAP analysis and Partial Dependence Plots (PDPs) will be used to further interpret the model's behaviour. SHAP will quantify each variable’s contribution to predicted petrol prices across time, while PDPs will visualize the marginal effects of individual predictors, highlighting potential non-linearities and interaction effects [27].
3. RESULTS AND DISCUSSIONS
The results were interpreted within the institutional structure of Zambia’s regulated energy system. Unlike deregulated fuel markets where seasonal demand patterns drive pricing, Zambia’s cost-plus mechanism dampens stochastic fluctuations and embeds fiscal decisions directly into the pump price.
3.1. Exploratory Data Analysis
a.
Petrol Price Trend from 2011–2025: The petrol price trend line graph from 2011 to 2025 shows a continuous upward price movement with distinct periods of stability and abrupt price hikes, particularly after 2021.

Figure 3.1.1: Petrol Price Trend

From 2011 to around 2016, petrol prices remained relatively stable with gradual increases. From 2017, the trend shifts upward more noticeably, and from 2021 onwards, petrol prices rise sharply. These hikes are in line with heightened global crude oil volatility, the significant depreciation of the Zambian Kwacha, and shifts in Zambia’s fuel subsidy and taxation policies. The years 2022 to 2024 show more pronounced volatility, which coincides with the change in the price adjustment cycle from 60 days to 30 days (monthly). The second half of 2025 shows slight declines in the petrol price, which are primarily due to the stabilization of the ZMW/US$ exchange rate. However, compared with previous years, petrol prices remain at very high levels.

b. Correlation Heat Map: The correlation heatmap (Figure 3.1.2) shows the strength of the relationships between the target variable (the petrol price) and the key drivers, as well as among the key drivers themselves. It reveals that petrol prices in Zambia are strongly associated with the exchange rate and taxation variables. The ZMW/US$ exchange rate recorded the highest correlation with petrol prices, at 0.93. This is consistent with a substantial effect of currency depreciation on the cost of imported fuel, which ultimately increases the price of petrol. Similarly, excise duty and VAT each exhibited a strong positive correlation of 0.85 with the petrol pump price. This points to the critical role of fiscal policy in petrol price determination and mitigation strategies.
In contrast, the relationship between petrol prices and the International Oil Price (IOP) was weak and slightly negative, with a correlation of -0.08. This suggests that the immediate impact of international oil prices may be moderated by domestic pricing mechanisms. Inflation showed a low correlation of 0.17 with fuel price movements, which suggests that the general inflation rate does not directly determine petrol price adjustments in Zambia.

c. Box plot: The box plot (Figure 3.1.3) shows differences in the variability of the variables influencing petrol prices, giving a visual summary of their distribution, spread, and outliers. The International Oil Price (IOP) displayed the widest spread, in line with the volatility of global oil markets and the variance in crude oil prices. The domestic tax components, excise duty and VAT, exhibited minimal variability. This reflects their policy-determined and relatively stable nature, as fiscal policies do not change frequently. The ZMW/US$ exchange rate also showed a noticeable spread, consistent with the currency volatility experienced in the Zambian economy. Petrol prices themselves fell within a moderate range, suggesting that the price determination mechanisms employed by the ERB dampen the effects of global price volatility. Inflation showed relatively low variability with a few outliers, which further reinforces its limited direct role in petrol price movements.

d. Seasonal Decomposition: The seasonal decomposition graph splits the petrol price time series into its underlying components (the trend, the seasonal pattern, and the residual fluctuations) to provide insight into how petrol prices change over time.
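To make the three components concrete, the following is a minimal sketch of a classical additive decomposition: the trend is a centred moving average, the seasonal component is the average detrended value at each position in the cycle, and the residual is what remains. It is not the authors' code. The synthetic price series, the 12-observation period, and the helper name `additive_decompose` are assumptions made purely for illustration.

```python
import numpy as np


def additive_decompose(y, period):
    """Classical additive decomposition: y = trend + seasonal + residual.

    Hypothetical helper for illustration; `period` must be even here.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Centred moving average for an even period: half weight on the two
    # end points of the window, full weight on the points in between.
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    half = period // 2
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    # Seasonal effect: mean detrended value at each position in the cycle,
    # re-centred so the effects sum to zero over one period.
    detrended = y - trend
    effects = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    effects -= effects.mean()
    seasonal = np.resize(effects, n)  # repeat the cycle along the series
    residual = y - trend - seasonal
    return trend, seasonal, residual


rng = np.random.default_rng(0)
months = np.arange(120)
# Synthetic, administratively set price (assumed): steady drift plus one
# discrete policy jump, with no seasonal cycle built in.
price = 8.0 + 0.1 * months + 5.0 * (months >= 90) + rng.normal(0.0, 0.05, 120)

trend, seasonal, residual = additive_decompose(price, period=12)
print("seasonal amplitude:", round(float(np.ptp(seasonal)), 3))
print("trend range:", round(float(np.nanmax(trend) - np.nanmin(trend)), 3))
```

In this toy series the seasonal amplitude is small relative to the range of the trend. The single policy jump is smoothed by the moving average, so part of it shows up as larger residuals around the change point and a little leaks into the seasonal averages.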
Figure 3.1.4 illustrates the seasonal decomposition of petrol prices in Zambia. The decomposition reveals that fuel prices exhibit a clear upward long-term trend with no meaningful seasonal pattern. The trend component shows steady growth throughout the study period, with pronounced acceleration after 2020, reflecting currency depreciation, global oil market disruptions, and policy changes. The seasonal component remains flat, indicating the absence of recurring monthly or annual patterns, which is consistent with Zambia’s administrative fuel pricing framework rather than market-driven seasonality. Residual fluctuations remain modest in the early years but become increasingly volatile after 2018, particularly between 2020 and 2024, when global and domestic shocks intensified. These results indicate that petrol prices are primarily influenced by long-term structural factors and irregular shocks rather than predictable seasonal cycles.

e. Actual Vs Ridge Rolling Forecast: Figure 3.1.5 shows how closely the Ridge rolling forecast overlaps the actual petrol price movements. This indicates a strong alignment between the values predicted by the Ridge LR model and the actual values, suggesting that the model is well suited to this forecasting task.

3.2. Performance Evaluation

The four ML models and the rolling Ridge forecast were evaluated and compared using the MAE, RMSE, R² and the Diebold-Mariano test.
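As a reading aid, the three error metrics named above can be sketched from their standard definitions. The arrays below are made-up numbers, not the study's data, and the function names are illustrative. The second forecast shows how R² becomes negative when a forecast fits worse than simply predicting the mean of the observed values.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the forecast errors."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def rmse(y_true, y_pred):
    """Root Mean Squared Error: penalises large errors more than MAE."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_true = np.asarray(y_true, dtype=float)
    ss_res = np.sum((y_true - np.asarray(y_pred)) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Illustrative prices in ZMW per litre (assumed values).
actual = np.array([30.0, 31.0, 32.0, 33.0, 34.0])
close = actual + np.array([0.1, -0.1, 0.0, 0.1, -0.1])  # small errors
stale = np.full(5, 28.0)  # forecast stuck at an outdated price level

for name, pred in [("close", close), ("stale", stale)]:
    print(name, round(mae(actual, pred), 4), round(rmse(actual, pred), 4),
          round(r2(actual, pred), 4))
```

For the stale forecast the residual sum of squares (90) is nine times the total sum of squares (10), so R² is 1 - 9 = -8. A negative R² therefore signals a forecast whose errors exceed the spread of the data around its own mean.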
The results of the evaluation are presented in the tables below:

Table 3.2.1 Performance Evaluation Results of the ML Models

| ML model | MAE | RMSE | R² | DM statistic | p-value |
|---|---|---|---|---|---|
| Ridge LR | 0.0510 | 0.2356 | 0.9927 | -2.6793 | 0.0074 |
| RF | 3.6034 | 4.3177 | -1.1106 | 30.9154 | 0.0000 |
| SVR | 5.7043 | 6.7710 | -5.0699 | 28.1765 | 0.0000 |
| LSTM | 1.2736 | 1.4630 | 0.6760 | 30.7323 | 0.0000 |

Table 3.2.2 Rolling Forecast Evaluation Results

| ML model | MAE | RMSE | R² | DM statistic | p-value |
|---|---|---|---|---|---|
| Rolling Ridge | 0.0576 | 0.2477 | 0.9920 | -2.4655 | 0.0137 |

Across all evaluation metrics, Ridge regression demonstrated clear superiority. It achieved the lowest MAE (0.0510) and RMSE (0.2356), indicating minimal average deviation and limited extreme forecast errors, while explaining 99.27% of price variance (R² = 0.9927). The rolling Ridge specification maintained comparable performance (R² = 0.9920), which supports its temporal robustness. By contrast, the LSTM delivered moderate performance (R² = 0.6760) but with materially higher error magnitudes, suggesting only partial capture of temporal dependencies. More critically, Random Forest and SVR produced negative R² values (-1.1106 and -5.0699), meaning their forecasts performed worse than a simple mean benchmark and failed to represent the structural pricing relationships embedded within Zambia’s regulated fuel system.

The Diebold–Mariano results reinforce this hierarchy. The Ridge model yielded a statistically significant negative DM statistic (-2.6793, p = 0.0074), indicating superior accuracy relative to the naive random walk benchmark. In contrast, the nonlinear models generated large positive DM values with near-zero p-values, indicating systematically inferior predictive performance relative to the same benchmark.

These findings suggest that petrol pricing in regulated energy systems follows administratively structured linear relationships rather than stochastic nonlinear dynamics. Policy variables change discretely through regulatory adjustments rather than through continuous probabilistic processes.
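The comparison above can be pictured with a small sketch: an expanding-window (rolling) Ridge forecast is set against a naive random walk forecast, and the two error series are compared with a one-step-ahead Diebold–Mariano statistic. Everything here is an assumption for illustration: the synthetic drivers, the penalty `alpha=1.0`, the squared-error loss, and the helper names `ridge_fit` and `dm_test`. The paper does not state its loss function or forecast horizon, and for horizons beyond one step the variance term would need an autocorrelation-robust estimate.

```python
import math

import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge fit with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def dm_test(err_model, err_bench):
    """One-step-ahead Diebold-Mariano test under squared-error loss.

    A negative statistic favours the model over the benchmark.
    """
    d = np.asarray(err_model) ** 2 - np.asarray(err_bench) ** 2
    stat = d.mean() / math.sqrt(d.var(ddof=1) / len(d))
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # two-sided, N(0, 1)
    return float(stat), float(p_value)


rng = np.random.default_rng(42)
n = 300
# Synthetic drivers (assumed, not the study's data): two random walks
# standing in for the exchange rate and the oil price.
fx = 20.0 + np.cumsum(rng.normal(0.0, 0.5, n))
oil = 80.0 + np.cumsum(rng.normal(0.0, 1.0, n))
X = np.column_stack([fx, oil])
price = 2.0 + 0.8 * fx + 0.1 * oil + rng.normal(0.0, 0.05, n)

start = 100  # first forecast origin
ridge_err, naive_err = [], []
for t in range(start, n):
    # Refit on all observations before t, then predict period t.
    coef, intercept = ridge_fit(X[:t], price[:t], alpha=1.0)
    ridge_err.append(price[t] - (X[t] @ coef + intercept))
    # Random walk benchmark: the next price equals the latest observed price.
    naive_err.append(price[t] - price[t - 1])

stat, p_value = dm_test(ridge_err, naive_err)
print(f"DM statistic = {stat:.2f}, p-value = {p_value:.4g}")
```

Because the synthetic price is a linear function of the drivers, the Ridge errors are much smaller than the random walk errors and the statistic comes out strongly negative. The sign convention matches the one used in the tables: negative values favour the model, positive values favour the benchmark.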
Regularisation in Ridge regression suppresses noise while preserving the structural relationships inherent in the cost-plus pricing framework. In such institutional contexts, model simplicity aligned with pricing architecture outperforms algorithmic complexity.

3.3. Explainable AI (XAI) Analysis

To enhance interpretability of the best-performing model, a triangulated XAI framework was applied to the Ridge regression specification using coefficient-based feature importance, SHAP (SHapley Additive exPlanations), and partial dependence plots (PDPs). This multi-method approach distinguishes between the structural contribution, marginal impact, and functional sensitivity of predictors within Zambia’s regulated fuel pricing system.

a. Feature Importance Analysis

The table below presents the feature importance analysis results for the key influencers of the petrol price in Zambia.

Table 3.3.1 Feature Importance Analysis Results

| Feature | Coefficient |
|---|---|
| IOP | 0.0115 |
| ZMW/US$ | 0.0039 |
| Excise Duty | 0.0034 |
| VAT | 0.0034 |
| Inflation | 0.0023 |

Figure 3.3.1 shows the relative contribution of each investigated petrol price driver to the model’s predictive performance. Features with higher importance scores are the ones that the model relies on most to minimize forecast errors. The analysis revealed that the IOP (international oil price) is the strongest predictor of petrol price changes, followed by the exchange rate (ZMW/US$). These results are consistent with the top determinants of fuel prices reported by the ERB. These two variables exert the most influence because Zambia imports fuel and is therefore sensitive to global oil markets and currency fluctuations. Excise Duty and VAT also contribute to price movements but at a smaller magnitude, reflecting their role as fiscal policy components in fuel pricing. Inflation has the weakest coefficient, indicating that general macroeconomic conditions have a limited direct impact on short-term petrol price variations [28].

b.
SHAP Analysis

The table below presents the SHAP analysis results for the key influencers of the petrol price in Zambia.

Table 3.3.2 SHAP Analysis Results

| Feature | Mean absolute SHAP value |
|---|---|
| IOP | 0.0043 |
| ZMW/US$ | 0.0100 |
| Excise Duty | 0.0135 |
| VAT | 0.0135 |
| Inflation | 0.0015 |

Figure 3.3.2 further illustrates the results of the SHAP analysis, which gives insight into the model’s behaviour by showing both the relative importance of the key influencer variables and the direction of their influence on predicted petrol prices. The SHAP results show that Excise Duty and VAT are the strongest drivers of petrol price predictions, indicating that tax-related policy changes have large impacts on the model’s output. This result is consistent with the ERB’s explanations for the fuel price hikes experienced after 2021 due to subsidy removals. The exchange rate (ZMW/US$) also has a notable influence, reflecting Zambia’s dependence on imported fuel, as explained earlier. The IOP has a moderate effect, aligning with the role of global oil prices in determining baseline fuel costs. Inflation has the weakest impact, suggesting it plays only an indirect role in short-term price changes.

c. Partial Dependence Plots

The partial dependence plots show how changes in each predictor variable influence the model’s predicted petrol price when all the other variables are held constant. Figure 3.3.3 suggests that only a few variables meaningfully drive prediction changes, while others exhibit minimal or negligible influence due to limited variation or low model sensitivity. The ZMW/US$ exchange rate and IOP display clear positive and near-linear relationships with petrol price forecasts, consistent with their persistent role in price transmission. In contrast, excise duty and VAT show relatively flat curves due to limited temporal variation; however, this does not imply insignificance.
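A short sketch may help relate the three diagnostics for a linear model such as Ridge. With features treated as independent, the SHAP value of feature j for one observation is the coefficient times that observation's deviation from the feature mean, and the partial dependence curve is a straight line whose slope is the coefficient. The two synthetic features, the coefficients, and the helper names are assumptions chosen for illustration. The sketch does not attempt to reproduce the values reported in the tables above.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
# Synthetic, unit-free inputs (assumed): one driver that moves a little in
# every period and one policy variable that changes once, in a single step.
smooth = rng.normal(0.0, 0.2, n)
step = np.where(np.arange(n) < 400, 0.0, 1.0)
X = np.column_stack([smooth, step])

coef = np.array([0.5, 0.4])  # assumed fitted coefficients of a linear model
intercept = 1.0


def predict(data):
    """Prediction of the assumed linear model."""
    return intercept + data @ coef


# SHAP values of a linear model under feature independence:
# phi_ij = coef_j * (x_ij - mean_j); each row sums to f(x_i) minus the
# average prediction.
shap_values = coef * (X - X.mean(axis=0))
mean_abs_shap = np.abs(shap_values).mean(axis=0)


def partial_dependence(j, grid):
    """Average prediction with feature j forced to each value in `grid`."""
    values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, j] = v
        values.append(predict(X_mod).mean())
    return np.array(values)


pd_smooth = partial_dependence(0, np.array([0.0, 1.0]))
pd_step = partial_dependence(1, np.array([0.0, 1.0]))
print("coefficients:  ", coef)
print("mean |SHAP|:   ", mean_abs_shap.round(3))
print("PDP slopes:    ", pd_smooth[1] - pd_smooth[0], pd_step[1] - pd_step[0])
```

In this toy setup the smooth driver has the larger coefficient, yet the step variable has the larger mean absolute SHAP value, because SHAP also reflects how far each feature moves away from its average. How steep a partial dependence line looks depends on the range of values plotted. Read this way, a flat-looking curve or a modest coefficient for a rarely changing variable is not by itself evidence that the variable is unimportant.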
Rather, it indicates that these variables change infrequently but induce substantial discrete price adjustments when modified. Inflation again exhibits weak functional influence, reinforcing its secondary role [29].

While coefficient-based feature importance suggests that the International Oil Price (IOP) strongly influences long-term price formation, SHAP values reveal that discrete fiscal policy changes (Excise Duty and VAT adjustments) generate the largest marginal impacts during prediction periods. This apparent divergence reflects the difference between structural contribution and marginal shock attribution. IOP shapes the underlying cost baseline over time, whereas tax adjustments introduce discrete level shifts that significantly alter short-run predictions. Partial dependence plots appear flat for tax variables due to limited temporal variability; however, SHAP captures their large effect at policy change points. This triangulation suggests that fiscal shocks matter more during transition periods, while global oil prices dominate continuous trend formation.

Policy Implications for Energy Regulators

The findings imply that fiscal instruments (Excise Duty and VAT) are the most powerful short-run levers affecting petrol prices. The ERB’s 2.5% price adjustment threshold may dampen minor fluctuations but does not offset discrete fiscal shocks. Policymakers should therefore consider targeted compensatory measures when implementing tax reforms. Additionally, the strong performance of Ridge regression suggests that a regularised linear forecasting model could be operationalised within the ERB for short-term predictive planning and stress testing under exchange rate shock scenarios. From a social protection perspective, since tax variables disproportionately affect retail prices, targeted subsidies for vulnerable households may be more efficient than broad fuel price suppression.

4.
CONCLUSION

The empirical results demonstrate that Ridge regression consistently outperforms the other models implemented for forecasting petrol prices in Zambia. The model achieved the lowest forecast errors and explained over 99% of the variance in the observed data. While the LSTM captured certain temporal dynamics, its predictive accuracy remained substantially below that of Ridge regression. The poor performance of RF and SVR, including negative R² values, indicates that increased model complexity does not necessarily translate into improved performance within regulated energy pricing environments.

Unlike deregulated markets where seasonality reflects demand cycles (e.g., winter heating demand in OECD economies), Zambia’s administrative pricing eliminates recurring seasonal patterns. The flat seasonal component therefore points to institutional pricing dominating market seasonality. The transition from a 60-day to a 30-day pricing cycle likely increased price transmission speed, contributing to higher volatility after 2021. The Ridge model’s stability during this transition indicates robustness to structural regime shifts.

The dominance of regularised linear modelling suggests that administratively structured energy pricing mechanisms favour models aligned with institutional cost-plus formulations. In such environments, policy-induced level shifts and exchange rate movements drive price dynamics more than nonlinear stochastic interactions. These findings extend beyond Zambia, offering insight into forecasting within other regulated energy markets.

This study makes three principal contributions. First, it integrates daily-frequency machine learning forecasting with formal Diebold–Mariano statistical comparison and triangulated explainable AI diagnostics, enhancing both predictive rigor and interpretability.
Furthermore, it demonstrates that fiscal policy instruments, particularly excise duty and VAT, together with exchange rate exposure, exert a stronger influence on domestic petrol prices than global oil price fluctuations within a regulated, import-dependent economy. Practically, this study provides a deployable Ridge-based forecasting framework suitable for regulatory application within Zambia’s energy pricing system.

Future research should expand the modelling framework to incorporate additional cost components such as inland transportation, storage and distribution margins, freight and insurance costs, and geopolitical shock indicators. Provincial-level heterogeneity and distribution zone effects may also yield further insight [30]. Extending the framework to other petroleum products, electricity tariffs [31], public transport fares [32], and staple food items [33] would enhance understanding of cost-of-living transmission effects. Hybrid statistical–deep learning architectures may also be explored to evaluate performance under structural regime changes.

Overall, the integration of machine learning with explainable AI in this study provides a transparent and policy-relevant forecasting approach capable of supporting evidence-based fuel pricing decisions in regulated energy systems [34].

Declarations

Conflict of Interest
The authors declare no conflict of interest.

Ethical Approval
Not Applicable.

Consent to Participate
Not Applicable.

Consent to Publish
Not Applicable.

Funding
This research received no external funding.

Author Contribution
M.M conceptualised the study, conducted data analysis, developed the models, and wrote the manuscript. A.Z hosted the research at his institution, provided critical support and insights, and reviewed the manuscript. Both authors reviewed and approved the final version of the manuscript.

Data Availability
The datasets used in this study are derived from publicly accessible sources including ERB and BOZ.
The compiled dataset supporting this study is available at: https://doi.org/10.5281/zenodo.19364812

References

Petroleum - Energy Regulation Board. Accessed: Oct. 16, 2025. [Online]. Available: https://www.erb.org.zm/petroleum

Ari A, Granados CM. The Energy Price Shock: Impact, Policy Responses, and Reform Options. United Kingdom; 2023.

ZAMBIA. Fuel Subsidy Removal in Zambia.

Chulu Sefuka, Haabazoka L. Investigating the effects of fuel prices on Zambia’s economic growth. World J Adv Res Reviews. Apr. 2025;26(1):716–36. doi: 10.30574/wjarr.2025.26.1.1048.

Wakumelo M. Fuel Prices and Inflation in Zambia.

Energy Regulation Board-ANNUAL REPORT. | 2024 VISION, MISSION AND VALUES. [Online]. Available:

Banda A, Malama T. The Effects of Monthly Price Adjustments of Fuel by Energy Regulations Board (ERB) on Petroleum Companies in Zambia. Am J Industrial Bus Manage. 2025;15(04):661–80. doi: 10.4236/ajibm.2025.154032.

Elijah E, Sichone C, END OF YEAR PRESS BRIEFING MEMBERS OF THE PRESS ERB. MANAGEMENT AND STAFF LADIES AND GENTLEMEN INTRODUCTION.

Baffes J, Bank W, Kose MA, Ohnsorge F, Stocker M. The Great Plunge in Oil Prices: Causes, Consequences, and Policy Responses, 2015. [Online]. Available: http://ssrn.com/abstract=2624398

Sharma B, Shrestha A. Petroleum dependence in developing countries with an emphasis on Nepal and potential keys, Jan. 01, 2023, Elsevier Ltd. doi: 10.1016/j.esr.2023.101053.

Lu S. Research on GDP Forecast Analysis Combining BP Neural Network and ARIMA Model, Comput. Intell. Neurosci., vol. 2021, 2021. doi: 10.1155/2021/1026978.

Alizadegan H, Rashidi Malki B, Radmehr A, Karimi H, Ilani MA. Comparative study of long short-term memory (LSTM), bidirectional LSTM, and traditional machine learning approaches for energy consumption prediction, Energy Exploration & Exploitation, vol. 43, no. 1, pp. 281–301, Jan. 2025. doi: 10.1177/01445987241269496.

Adadi A, Berrada M. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access. 2018;6:52138–60. doi: 10.1109/ACCESS.2018.2870052.

Alwadi MA. Fuel Sales Price Forecasting using Time Series, Machine Learning, and Deep Learning Models, Engineering, Technology and Applied Science Research, vol. 15, no. 3, pp. 22360–22366, Jun. 2025. doi: 10.48084/etasr.10348.

Kingwill R, Brink WH. Evaluating the effectiveness of neural network techniques in the forecasting of South African basic fuel prices, 2019. [Online]. Available: https://scholar.sun.ac.za

Sokkalingam R, Sarpong-Streetor RMNY, Othman M, Daud H, Owusu DA. Forecasting Petroleum Fuel Price in Malaysia by ARIMA Model, in Springer Proceedings in Complexity, Springer Science and Business Media B.V., 2021, pp. 671–678. doi: 10.1007/978-981-16-4513-6_58.

Hoang AT et al. Explainable machine learning-based prediction of fuel consumption in ship main engines using operational data, Brodogradnja, vol. 76, no. 4, pp. 1–24, Dec. 2025. doi: 10.21278/brod76405.

Salih AM, et al. A Perspective on Explainable Artificial Intelligence Methods: SHAP and LIME. Adv Intell Syst. Jan. 2025;7(1). doi: 10.1002/aisy.202400304.

Maulud D, Abdulazeez AM. A Review on Linear Regression Comprehensive in Machine Learning, Journal of Applied Science and Technology Trends, vol. 1, no. 2, pp. 140–147, Dec. 2020. doi: 10.38094/jastt1457.

Golam Kibria BM. More than hundred (100) estimators for estimating the shrinkage parameter in a linear and generalized linear ridge regression models. J Econometrics Stat. 2022;2(2):233–52. doi: 10.47509/JES.2022.v02i02.06.

Salman HA, Kalakech A, Steiti A. Random Forest Algorithm Overview, Babylonian Journal of Machine Learning, vol. 2024, pp. 69–79, Jun. 2024. doi: 10.58496/bjml/2024/007.

Support Vector Regression.

Sherstinsky A. Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network. Jul. 2023. doi: 10.1016/j.physd.2019.132306.

Främling K. Feature Importance versus Feature Influence and What It Signifies for Explainable AI. [Online]. Available: https://www.umu.se/personal/kary-framling/

D. Pajila (Assistant Professor-Senior Grade), Bg. Sheena (Assistant Professor), and S. R. Subramanian (Associate Professor). A Comprehensive Survey on Naive Bayes Algorithm: Advantages, Limitations and Applications.

Ponce-Bobadilla AV, Schmitt V, Maier CS, Mensing S, Stodtmann S. Practical guide to SHAP analysis: Explaining supervised machine learning model predictions in drug development. Clin Transl Sci. Nov. 2024;17(11). doi: 10.1111/cts.70056.

Saarela M, Jauhiainen S. Comparison of feature importance measures as explanations for classification models. SN Appl Sci. Feb. 2021;3(2). doi: 10.1007/s42452-021-04148-9.

Molnar C, et al. Relating the Partial Dependence Plot and Permutation Feature Importance to the Data Generating Process. In Communications in Computer and Information Science. Springer Science and Business Media Deutschland GmbH; 2023. pp. 456–79. doi: 10.1007/978-3-031-44064-9_24.

Chen J, et al. City- and county-level spatio-temporal energy consumption and efficiency datasets for China from 1997 to 2017. Sci Data. Dec. 2022;9(1). doi: 10.1038/s41597-022-01240-6.

Lee MHL, et al. A Comparative Study of Forecasting Electricity Consumption Using Machine Learning Models. Mathematics. Apr. 2022;10(8). doi: 10.3390/math10081329.

Amicosante A, Avenali A, D’Alfonso T, Giagnorio M, Manno A, Matteucci G. Predicting costs of local public bus transport services through machine learning methods.

THE. HORIZON REVIEW ISSUE 2.

Mkhize MM. Development of a dimensionless model for simulating key parameters in solar distillation systems. Discover Energy. Jan. 2026;6(1). doi: 10.1007/s43937-025-00108-1.

Additional Declarations

No competing interests reported.
Editorial history at journal: first submitted and submission checks completed 01 Apr, 2026; editor assigned 03 Apr, 2026; reviewers invited 10 Apr, 2026; reviewers agreed 25 Apr to 28 Apr, 2026; reviews received 28 Apr to 08 May, 2026.
Figure legends

Figure 1: Conceptual Framework (Figure 2.1 in the manuscript)
Figure 2: Research Stages (Figure 2.5.1)
Figure 3: Petrol Price Trend (Figure 3.1.1)
Figure 4: Correlation Heatmap (Figure 3.1.2)
Figure 5: Box plot (Figure 3.1.3)
Figure 6: Seasonal Decomposition (Figure 3.1.4)
Figure 7: Actual Vs Ridge Rolling (Figure 3.1.5)
Figure 8: Feature Importance Analysis (Figure 3.3.1)
Figure 9: SHAP Summary Plot (Figure 3.3.2)
Figure 10: Partial Dependence Plots (Figure 3.3.3)
A study comparing ARIMA against Artificial Intelligence (AI) algorithms revealed that the AI algorithms tend to display better prediction performance in most applications, with ARIMA recording a Mean Absolute Error (MAE) of 0.1927 while the AI algorithms recorded MAE values of less than 0.1 [13].\u003c/p\u003e\n\u003cp\u003eWhile ML models often outperform traditional ones, they are seen as \u0026lsquo;black boxes\u0026rsquo; because they do not clearly explain how their predictions are derived. A useful interpretability solution to the black box problem is the use of explainable AI (XAI). XAI refers to a set of tools and methods that are used to make machine learning model predictions more transparent and interpretable. Many studies have revealed that traditional \u0026lsquo;black box\u0026rsquo; ML models deliver high predictive accuracy; however, these models give little insight into how specific outputs are generated. That is, they do not answer the \u0026lsquo;why\u0026rsquo; questions or explain why the output results are the way they are. XAI provides explanations through the application of feature importance across all the predictions, which helps to give the reasoning behind each single forecast [14].\u003c/p\u003e\n\u003cp\u003ePetrol pricing in Zambia is not merely a statistical forecasting problem, but an energy-system outcome shaped by import dependence, regulatory pricing architecture, and exchange rate exposure. Because Zambia is a fully import-dependent fuel economy operating under a cost-plus pricing model, its petrol prices reflect structured regulatory mechanisms rather than purely market-clearing dynamics. Consequently, forecasting models must account for both global market shocks and domestic fiscal interventions embedded in the administrative pricing framework. 
This study therefore situates artificial intelligence (AI) within the broader energy-system context, recognizing that predictive performance is inseparable from institutional pricing structures.\u003c/p\u003e\n\u003cp\u003e1.2.\u0026nbsp;Related Works\u003c/p\u003e\n\u003cp\u003e1.2.1.\u0026nbsp;\u0026nbsp;Machine Learning Approaches to Fuel and Energy Forecasting\u003c/p\u003e\n\u003cp\u003eRecent literature demonstrates a growing shift from traditional econometric models toward machine learning (ML) approaches in fuel and energy forecasting. A study by Mohammad Abdulaziz Alwadi compared classical time-series models (ARIMA, SARIMA) with ML and deep learning models for fuel sales forecasting. The study found that Random Forest (RF) outperformed ARIMA and SARIMA in predictive accuracy, scoring a coefficient of determination of R\u0026sup2; = 0.999, followed by the LSTM-based models. While the findings highlight the superiority of ensemble methods, the evaluation relied primarily on R\u0026sup2; and the MSE, limiting robustness in the performance assessment. Furthermore, model interpretability, such as the use of XAI, was not explored [15].\u003c/p\u003e\n\u003cp\u003eSimilarly, a regional study on South African fuel pricing evaluated feedforward neural networks (FFNN), recurrent neural networks (LSTM and GRU), and convolutional neural networks (CNN) and used simple linear regression as a baseline model [16]. The LSTM-based RNN model demonstrated competitive performance, particularly under volatile exchange rate conditions. However, the study was constrained by a relatively small dataset (180 monthly observations) and focused exclusively on the Basic Fuel Price (BFP) component, excluding fiscal and tax structures.\u003c/p\u003e\n\u003cp\u003eThese studies collectively suggest that nonlinear ML models can outperform traditional econometric approaches in fuel-related forecasting. 
However, limitations remain regarding dataset scale, evaluation breadth, and policy-variable integration.\u003c/p\u003e\n\u003cp\u003e1.2.2.\u0026nbsp;\u0026nbsp;Traditional Time-Series Models in Fuel Price Forecasting\u003c/p\u003e\n\u003cp\u003eARIMA-based approaches continue to be widely used in petroleum price forecasting. A Malaysian study employing weekly petrol price data (174 observations) demonstrated that ARIMA(14,1,14) outperformed lower-order specifications in short-term forecasts [17]. However, the authors acknowledged that ARIMA performance deteriorates over longer horizons and under structural shifts.\u003c/p\u003e\n\u003cp\u003eTraditional models assume linearity and stationarity, which may not hold in deregulated or subsidy-reforming environments. In addition, these models do not easily accommodate complex nonlinear interactions between exchange rates, fiscal policy adjustments, and international oil price shocks. This limitation motivates the exploration of ML frameworks capable of modelling nonlinear dynamics and regime shifts.\u003c/p\u003e\n\u003cp\u003e1.2.3.\u0026nbsp;\u0026nbsp;Ensemble Learning in Energy Systems Forecasting\u003c/p\u003e\n\u003cp\u003eBeyond fuel pricing, ensemble learning methods have shown strong predictive performance in broader energy forecasting applications. A 2025 study on marine engine fuel consumption (MEFC) prediction compared RF, Gradient Boosting, XGBoost, SVR, and linear models using MSE, R\u0026sup2;, and Kling\u0026ndash;Gupta Efficiency (KGE). Random Forest achieved the lowest test MSE (0.69), a robust testing R\u0026sup2; (0.9867), and a high KGE (0.9681), making it the most appropriate of the compared models for MEFC modelling. XGBoost followed closely, with a test MSE of 0.75 and a testing R\u0026sup2; of 0.9856 [18]. These findings reinforce the capacity of ensemble models to capture nonlinear dependencies in energy-related datasets. 
However, interpretability mechanisms were not integrated, limiting policy transparency.\u003c/p\u003e\n\u003cp\u003e1.2.4.\u0026nbsp;\u0026nbsp;Explainable Artificial Intelligence (XAI) in Energy Modelling\u003c/p\u003e\n\u003cp\u003eThe growing use of ML in energy systems has intensified concerns regarding interpretability. SHAP and LIME have emerged as dominant post-hoc explanation tools. A comparative evaluation of SHAP and LIME using decision trees, logistic regression, LGBM, and SVC demonstrated that explanation stability is highly sensitive to model choice and feature collinearity [19]. The study introduced the Normalized Movement Rate (NMR) metric to quantify ranking instability, revealing that feature importance attribution varies significantly across model classes. Importantly, the authors cautioned against over-reliance on standalone XAI outputs and recommended complementary diagnostic tools to improve robustness. These insights underscore the necessity of combining SHAP with permutation-based feature importance and partial dependence analysis to mitigate interpretability instability.\u003c/p\u003e\n\u003cp\u003e1.2.5.\u0026nbsp;\u0026nbsp;Identified Research Gaps and Contribution\u003c/p\u003e\n\u003cp\u003eDespite the growing adoption of machine learning techniques in fuel and energy forecasting, several important gaps remain in the literature. Firstly, many studies rely on relatively small monthly datasets, limiting the robustness and generalizability of predictive models, particularly in volatile pricing environments. Furthermore, fiscal policy variables such as excise duty adjustments, VAT changes, and subsidy reforms are frequently excluded, even though they materially influence retail fuel prices in regulated and import-dependent economies. In addition, model evaluation often depends on a limited set of performance metrics, without formal statistical comparison of forecast accuracy. 
While XAI tools such as SHAP are increasingly being applied, their results are rarely complemented with additional robustness diagnostics, raising concerns about the stability of the interpretations, especially under feature collinearity. Finally, there remains limited empirical evidence from Sub-Saharan African economies, where exchange rate exposure and subsidy reforms play a dominant role in price formation.\u003c/p\u003e\n\u003cp\u003eThis study advances the literature by:\u003c/p\u003e\n\u003col\u003e\n \u003cli\u003eConstructing a high-frequency daily fuel pricing dataset (2011\u0026ndash;2025 ~ 5387 observations) for a Sub-Saharan African import-dependent economy.\u003c/li\u003e\n \u003cli\u003eApplying a comparative ML framework (Ridge, RF, SVR, LSTM) with formal statistical forecast comparison using the Diebold\u0026ndash;Mariano test.\u003c/li\u003e\n \u003cli\u003eIntegrating triangulated XAI diagnostics (SHAP, feature importance, PDP) to disentangle fiscal policy shocks from global market drivers.\u003c/li\u003e\n \u003cli\u003eDemonstrating how regulated cost-plus pricing structures structurally favour regularised linear models over nonlinear architectures.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eTo our knowledge, this is the first study to combine daily-frequency ML forecasting with XAI interpretability in a Sub-Saharan regulated energy market.\u003c/p\u003e"},{"header":"2. METHODS AND MODELS","content":"\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\n\u003ch2\u003e2.1. Conceptual Framework\u003c/h2\u003e\n\u003cp\u003eThe conceptual framework for this study illustrates the relationships between the main factors influencing petrol price fluctuations, the analytical techniques applied, and the intended outputs and impacts. It integrates both economic theory and data-driven artificial intelligence approaches to support robust and interpretable petrol price forecasting in Zambia. 
This research contributes to both the academic research body and Zambia\u0026rsquo;s energy regulation landscape. The conceptual framework is demonstrated in the figure below:\u003c/p\u003e\n\u003cp\u003eThis study proposes a hybrid ML-XAI forecasting model that is designed to enhance the accuracy, interpretability, and usability of petroleum price predictions in Zambia. The proposed model addresses the limitations identified in previous studies, particularly the reliance on single-variable time series models, limited forecasting horizons, and lack of transparency in predictive mechanisms.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\n\u003ch2\u003e2.2. Model Specification\u003c/h2\u003e\n\u003cdiv id=\"Sec11\" class=\"Section3\"\u003e\n\u003ch2\u003e2.2.1. Linear and Regularized Regression Models\u003c/h2\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eLet \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{y}_{t}\\in\\:\\mathbb{R}\\)\u003c/span\u003e\u003c/span\u003e denote the petrol price at time t and let \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{X}_{t}=\\left({x}_{1t},\\:{x}_{2t},\\:\\dots\\:,\\:{x}_{kt}\\right)\\)\u003c/span\u003e\u003c/span\u003e represent the vector of independent variables (international oil price, exchange rate, excise duty, VAT and inflation).\u003c/p\u003e\n\u003cp\u003ea. Multivariate Linear Regression (MLR)\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eMLR is a statistical technique used to predict a dependent variable from multiple independent variables. 
It models the linear relationship between the independent variables and the dependent variable [\u003cspan class=\"CitationRef\"\u003e20\u003c/span\u003e].\u003c/p\u003e\n\u003cp\u003eThe standard linear regression model is defined as:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equa\" class=\"mathdisplay\"\u003e$$\\:{y}_{t}={X}_{t}^{{\\prime\\:}}\\beta\\:+\\:{\\epsilon\\:}_{t}$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eWhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\beta\\:\\in\\:{\\mathbb{R}}^{k}\\)\u003c/span\u003e\u003c/span\u003e is the parameter vector and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\epsilon\\:}_{t}\\:\\sim\\:N(0,{\\sigma\\:}^{2})\\)\u003c/span\u003e\u003c/span\u003e is the disturbance term.\u003c/p\u003e\n\u003cp\u003eHowever, MLR is sensitive to multicollinearity among the predictors. In this study, variables such as the international oil price and the ZMW/US$ exchange rate are very likely to be highly correlated.\u003c/p\u003e\n\u003cp\u003eb. Ridge Regression\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eRidge linear regression mitigates multicollinearity by adding a penalty term proportional to the sum of the squared coefficients. 
This shrinks the coefficients towards zero without setting them exactly to zero, which reduces model variance and can improve prediction accuracy on new data [\u003cspan class=\"CitationRef\"\u003e21\u003c/span\u003e].\u003c/p\u003e\n\u003cp\u003eRidge regression introduces an L2 regularization penalty:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equb\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equb\" class=\"mathdisplay\"\u003e$$\\:{\\widehat{\\beta\\:}}_{ridge}=\\underset{\\beta\\:}{\\mathrm{arg\\,min}}\\left\\{{\\sum\\:}_{t=1}^{n}{({y}_{t}-{X}_{t}^{{\\prime\\:}}\\beta\\:)}^{2}+\\lambda\\:{\\parallel\\:\\beta\\:\\parallel\\:}_{2}^{2}\\right\\}\\:$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eWhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\lambda\\:\\ge\\:0\\)\u003c/span\u003e\u003c/span\u003e is the shrinkage parameter controlling the bias-variance trade-off.\u003c/p\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec12\" class=\"Section3\"\u003e\n\u003ch2\u003e2.2.2. Random Forest\u003c/h2\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eRandom Forest is an ensemble learning method constructed from \u003cem\u003eB\u003c/em\u003e decision trees:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equc\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equc\" class=\"mathdisplay\"\u003e$$\\:\\widehat{f}\\left(x\\right)=\\frac{1}{B}{\\sum\\:}_{b=1}^{B}{T}_{b}\\left(x\\right)$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eWhere each tree \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{T}_{b}\\left(x\\right)\\)\u003c/span\u003e\u003c/span\u003e is trained on a randomly selected subset of the original training data and a randomly chosen subset of the features. This process, known as bagging and feature randomness, helps to reduce overfitting by ensuring that individual trees are less likely to make the same errors on the data. 
By averaging the predictions from multiple trees, RF produces more robust and accurate forecasts compared to a single decision tree [\u003cspan class=\"CitationRef\"\u003e22\u003c/span\u003e].\u003c/p\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec13\" class=\"Section3\"\u003e\n\u003ch2\u003e2.2.3. Support Vector Regression (SVR)\u003c/h2\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eSupport Vector Regression extends Support Vector Machines to regression tasks by minimizing structural risk under an \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\epsilon\\:\\)\u003c/span\u003e\u003c/span\u003e-insensitive loss function.\u003c/p\u003e\n\u003cp\u003eThe optimization problem is defined as:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equd\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equd\" class=\"mathdisplay\"\u003e$$\\:\\underset{\\omega\\:,\\:b,\\:\\xi\\:,\\:{\\xi\\:}^{\\ast\\:}}{\\mathrm{min}}\\:\\frac{1}{2}{\\parallel\\:\\omega\\:\\parallel\\:}^{2}+C{\\sum\\:}_{i=1}^{n}({\\xi\\:}_{i}+\\:{\\xi\\:}_{i}^{\\ast\\:})\\:$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003eSubject to:\u003c/p\u003e\n\u003cdiv id=\"Eque\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Eque\" class=\"mathdisplay\"\u003e$$\\:{y}_{i\\:}-\\left({\\omega\\:}^{{\\prime\\:}}{x}_{i}+b\\right)\\le\\:\\epsilon\\:+{\\xi\\:}_{i}$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equf\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equf\" class=\"mathdisplay\"\u003e$$\\:\\left({\\omega\\:}^{{\\prime\\:}}{x}_{i}+b\\right)-{y}_{i\\:}\\le\\:\\epsilon\\:+{\\xi\\:}_{i}^{\\ast\\:}$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equg\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equg\" class=\"mathdisplay\"\u003e$$\\:{\\xi\\:}_{i},{\\xi\\:}_{i}^{\\ast\\:}\\ge\\:0$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003eWhere:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:C\\)\u003c/span\u003e \u003c/span\u003e 
controls the penalty for deviations,\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:\\epsilon\\:\\)\u003c/span\u003e \u003c/span\u003e defines the width of the insensitive tube,\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:{\\xi\\:}_{i},{\\xi\\:}_{i}^{\\ast\\:}\\:\\)\u003c/span\u003e \u003c/span\u003eare slack variables\u003c/p\u003e\n\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eBy forming a flexible tube of minimal radius around the estimate function, absolute values of errors below a certain threshold are ignored above and below the function. Therefore, all points outside the tube are penalized, while those within receive no penalty [\u003cspan class=\"CitationRef\"\u003e23\u003c/span\u003e].\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec14\" class=\"Section3\"\u003e\n\u003ch2\u003e2.2.4. 
Long Short-Term Memory (LSTM) Networks\u003c/h2\u003e\n\u003cp\u003eLong Short-Term Memory (LSTM) networks are a type of recurrent neural network (RNN) under the family of deep learning models.\u003c/p\u003e\n\u003cp\u003eThe internal gating mechanism is defined by:\u003c/p\u003e\n\u003cdiv id=\"Equh\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equh\" class=\"mathdisplay\"\u003e$$\\:{f}_{t}=\\sigma\\:({W}_{f}\\left[{h}_{t-1},\\:{x}_{t}\\right]+{b}_{f})$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equi\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equi\" class=\"mathdisplay\"\u003e$$\\:{i}_{t}=\\sigma\\:({W}_{i}\\left[{h}_{t-1},\\:{x}_{t}\\right]+{b}_{i})$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equj\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equj\" class=\"mathdisplay\"\u003e$$\\:{\\stackrel{\\sim}{c}}_{t}=tanh({W}_{c}\\left[{h}_{t-1},\\:{x}_{t}\\right]+{b}_{c})$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equk\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equk\" class=\"mathdisplay\"\u003e$$\\:{c}_{t}={f}_{t}⨀{c}_{t-1}+{i}_{t}⨀{\\stackrel{\\sim}{c}}_{t}$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equl\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equl\" class=\"mathdisplay\"\u003e$$\\:{h}_{t}={o}_{t}⨀tanh\\left({c}_{t}\\right)$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003eWhere:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:{f}_{t}\\)\u003c/span\u003e \u003c/span\u003e is the forget gate,\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:{i}_{t}\\)\u003c/span\u003e \u003c/span\u003e is the input gate,\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:{c}_{t}\\:\\)\u003c/span\u003e \u003c/span\u003eis the cell 
state,\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:{o}_{t}=\\sigma\\:({W}_{o}\\left[{h}_{t-1},\\:{x}_{t}\\right]+{b}_{o})\\)\u003c/span\u003e \u003c/span\u003e is the output gate,\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:{h}_{t}\\)\u003c/span\u003e \u003c/span\u003e is the hidden state.\u003c/p\u003e\n\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThe key principle of the LSTM centres on two components: data and the control of data. The data component prepares the candidate data signals, while the control component prepares the throttle signals through input, output and forget gates, which regulate how information is stored, updated and discarded [\u003cspan class=\"CitationRef\"\u003e24\u003c/span\u003e].\u003c/p\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\n\u003ch2\u003e2.3. Explainable AI Framework\u003c/h2\u003e\n\u003cp\u003eTo ensure interpretability, the best-performing model was subjected to feature attribution analysis using SHAP and permutation importance methods.\u003c/p\u003e\n\u003cdiv id=\"Sec16\" class=\"Section3\"\u003e\n\u003ch2\u003e2.3.1. Shapley Additive Explanations (SHAP)\u003c/h2\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eSHAP is an XAI technique that quantifies the contribution of every input feature in ML-based prediction models. 
SHAP is based on cooperative game theory, which considers all conceivable feature combinations to guarantee a fair distribution of feature importance.\u003c/p\u003e\n\u003cp\u003eFor a model \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:f\\left(x\\right)\\)\u003c/span\u003e\u003c/span\u003e, the SHAP decomposition is:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equm\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equm\" class=\"mathdisplay\"\u003e$$\\:f\\left(x\\right)={\\varphi\\:}_{0}+{\\sum\\:}_{j=1}^{k}{\\varphi\\:}_{j}$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eWhere:\u003c/p\u003e\n\u003c/div\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:{\\varphi\\:}_{0}\\:\\)\u003c/span\u003e \u003c/span\u003e is the expected model output,\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:{\\varphi\\:}_{j}\\:\\)\u003c/span\u003e \u003c/span\u003e represents the marginal contribution of feature j.\u003c/p\u003e\n\u003c/li\u003e\n\u003c/ul\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eThe idea of SHAP values is to assign each feature a value that represents its average marginal contribution, taken over all feature combinations, to the difference between the actual prediction and the expected model output [\u003cspan class=\"CitationRef\"\u003e14\u003c/span\u003e].\u003c/p\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec17\" class=\"Section3\"\u003e\n\u003ch2\u003e2.3.2. Feature Importance\u003c/h2\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eFeature importance, in relation to ML, describes how much each covariate contributes to a prediction model\u0026rsquo;s accuracy. 
A commonly used method is permutation-based feature importance, which is computed as the increase in prediction error after randomly permuting feature \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:j\\)\u003c/span\u003e\u003c/span\u003e [\u003cspan class=\"CitationRef\"\u003e25\u003c/span\u003e]:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equn\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equn\" class=\"mathdisplay\"\u003e$$\\:F{I}_{j}=E\\left[L\\left(f\\left({X}_{perm\\left(j\\right)}\\right),y\\right)\\right]-E\\left[L\\left(f\\left(X\\right),y\\right)\\right]$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eWhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:L(\\bullet\\:)\\)\u003c/span\u003e\u003c/span\u003e is the loss function and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{X}_{perm\\left(j\\right)}\\)\u003c/span\u003e\u003c/span\u003e denotes the dataset with feature \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:j\\)\u003c/span\u003e\u003c/span\u003e permuted, so that a larger positive value indicates a more important feature.\u003c/p\u003e\n\u003cp\u003eThis approach quantifies the model\u0026rsquo;s reliance on each predictor by measuring degradation in predictive performance.\u003c/p\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec18\" class=\"Section2\"\u003e\n\u003ch2\u003e2.4. Performance Evaluation Metrics\u003c/h2\u003e\n\u003cp\u003eTo ensure robust comparison across linear, ensemble, kernel-based, and deep learning models, predictive performance was evaluated using scale-dependent, variance-based, and statistical forecast comparison metrics. 
The \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:MAE\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:RMSE\\)\u003c/span\u003e\u003c/span\u003e capture scale-dependent forecast accuracy, while \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{R}^{2}\\)\u003c/span\u003e\u003c/span\u003e evaluates explanatory strength. The Diebold\u0026ndash;Mariano test provides statistical validation of forecast dominance. Together, these metrics ensure comprehensive evaluation of predictive performance in a volatile, policy-sensitive energy pricing environment.\u003c/p\u003e\n\u003cp\u003eLet \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{y}_{t}\\)\u003c/span\u003e\u003c/span\u003e denote the observed petrol price at time \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:t\\)\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\widehat{y}}_{t}\\)\u003c/span\u003e\u003c/span\u003e the model prediction, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:n\\)\u003c/span\u003e\u003c/span\u003e the number of observations in the evaluation set.\u003c/p\u003e\n\u003cdiv id=\"Sec19\" class=\"Section3\"\u003e\n\u003ch2\u003e2.4.1. 
Mean Absolute Error (MAE)\u003c/h2\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eMean Absolute Error measures the average magnitude of prediction errors without considering direction:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equo\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equo\" class=\"mathdisplay\"\u003e$$\\:MAE=\\frac{1}{n}{\\sum\\:}_{t=1}^{n}|{y}_{t}-{\\widehat{y}}_{t}|$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eMAE provides an interpretable measure of average deviation in price units and is less sensitive to extreme outliers.\u003c/p\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec20\" class=\"Section3\"\u003e\n\u003ch2\u003e2.4.2. Root Mean Squared Error (RMSE)\u003c/h2\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eRoot Mean Squared Error penalizes larger deviations more heavily due to squaring:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equp\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equp\" class=\"mathdisplay\"\u003e$$\\:RMSE=\\sqrt{\\frac{1}{n}{\\sum\\:}_{t=1}^{n}{({y}_{t}-{\\widehat{y}}_{t})}^{2}}$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eBecause fuel prices may experience sudden structural shifts, RMSE is particularly relevant for capturing the impact of large forecast errors during volatile periods.\u003c/p\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec21\" class=\"Section3\"\u003e\n\u003ch2\u003e2.4.3. 
Coefficient of Determination (R\u0026sup2;)\u003c/h2\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eThe coefficient of determination evaluates the proportion of variance explained by the model:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equq\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equq\" class=\"mathdisplay\"\u003e$$\\:{R}^{2}=1-\\frac{{\\sum\\:}_{t=1}^{n}{({y}_{t}-{\\widehat{y}}_{t})}^{2}}{{\\sum\\:}_{t=1}^{n}{({y}_{t}-\\overline{y})}^{2}}$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eWhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\overline{y}\\)\u003c/span\u003e\u003c/span\u003e denotes the sample mean of the observed prices.\u003c/p\u003e\n\u003cp\u003eHigher \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{R}^{2}\\)\u003c/span\u003e\u003c/span\u003e values indicate stronger explanatory power.\u003c/p\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec22\" class=\"Section3\"\u003e\n\u003ch2\u003e2.4.4. 
Diebold\u0026ndash;Mariano test\u003c/h2\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eWhile error metrics assess absolute performance, statistical comparison of competing forecasts was conducted using the Diebold\u0026ndash;Mariano test.\u003c/p\u003e\n\u003cp\u003eLet the forecast errors from two competing models be:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equr\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equr\" class=\"mathdisplay\"\u003e$$\\:{e}_{1,t}={y}_{t}-{\\widehat{y}}_{1,t},\\:{e}_{2,t}={y}_{t}-{\\widehat{y}}_{2,t}$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eDefine the loss differential:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equs\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equs\" class=\"mathdisplay\"\u003e$$\\:{d}_{t}=L\\left({e}_{1,t}\\right)-L\\left({e}_{2,t}\\right)$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eWhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:L\\left(\\cdot\\right)\\)\u003c/span\u003e\u003c/span\u003e is the chosen loss function.\u003c/p\u003e\n\u003cp\u003eThe DM statistic is computed as:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equt\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equt\" class=\"mathdisplay\"\u003e$$\\:DM=\\frac{\\bar{d}}{\\sqrt{\\widehat{\\mathrm{Var}}\\left(\\bar{d}\\right)}}$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eWhere:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equu\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equu\" class=\"mathdisplay\"\u003e$$\\:\\bar{d}=\\frac{1}{n}{\\sum\\:}_{t=1}^{n}{d}_{t}$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eUnder the null hypothesis:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equv\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equv\" 
class=\"mathdisplay\"\u003e$$\\:{H}_{0}:E\\left({d}_{t}\\right)=0$$\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003ethere is no statistically significant difference in forecast accuracy between the two models.\u003c/p\u003e\n\u003cp\u003eThe Diebold\u0026ndash;Mariano test compares each model against a Naive (random walk) benchmark, which is defined as a no-change forecast where the next period\u0026rsquo;s petrol price equals the previous observed price. This benchmark is standard in time-series evaluation and provides a robust baseline for assessing predictive improvement.\u003c/p\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec23\" class=\"Section2\"\u003e\n\u003ch2\u003e2.5. Research design\u003c/h2\u003e\n\u003cp\u003eThis research adopted a data-driven predictive modelling design with an experimental quantitative approach. The experimental design allows for comparison of four ML algorithms (Ridge LR, RF, SVR and LSTM), using standardized accuracy metrics (RMSE, MAE, R\u0026sup2; and the DM test) to ensure objectivity and reproducibility. Furthermore, the application of Explainable AI (XAI) techniques adds an exploratory analytical layer which enables the interpretation of model outputs and identification of key economic and market drivers. Together, this integrated methodology enhances the validity of conclusions drawn, as model predictions are evaluated systematically. It also supports the reliability through cross-validation and standardized performance metrics. The research design is illustrated in the figure below:\u003c/p\u003e\n\u003cp\u003ea. Data Collection\u003c/p\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eThis research uses data from ERB online datasets, energy sector reports and monthly fuel price publications and from BOZ published fortnightly time series statistics. 
The data collected from ERB sources include petrol pump prices (the dependent variable), excise duty and VAT from January 2011 to September 2025. The data is aggregated at a daily frequency, which enables high-resolution analysis of market behaviour and allows the machine learning models to better capture volatility. The dataset created is illustrated in the table below:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"gridtable\"\u003e\n\u003ctable id=\"Tab1\" border=\"1\"\u003e\u003ccaption\u003e\n\u003cdiv class=\"CaptionNumber\"\u003eTable 3.2\u003c/div\u003e\n\u003cdiv class=\"CaptionContent\"\u003e\n\u003cp\u003ePetrol Price Dataset\u003c/p\u003e\n\u003c/div\u003e\n\u003c/caption\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eDate\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eZMW/US$\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eIOP (US$/Barrel)\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eExcise Duty (ZMW/L)\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eVAT (%/L)\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eInflation (%)\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003ePetrol Price (ZMW)\u003c/p\u003e\n\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e31/08/2025\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e23.61\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e68.83\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e2.07\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e16%\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.4%\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd 
align=\"left\"\u003e\n\u003cp\u003e28\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e01/09/2025\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e23.62\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e68.68\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e2.07\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e16%\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.5%\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e29.18\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e02/09/2025\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e23.69\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e68.53\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e2.07\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e16%\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.5%\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e29.18\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"Section2\"\u003e\u0026nbsp;\u003c/div\u003e\n\u003cdiv class=\"Section2\"\u003eb. Exploratory Data Analysis (EDA)\u003cbr /\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eEDA will be conducted to uncover structural patterns, variable relationships, and data anomalies. Time-series visualizations such as line charts will be used to examine long-term fuel price trends and detect volatility spikes. 
Correlation heatmaps will be used to highlight the relationships between fuel prices and explanatory variables such as crude oil prices and exchange rates. Boxplots will be used to identify outliers and assess the distributional properties of the variables. Decomposition plots will separate the time series into trend, seasonality, and residual components, offering insights into cyclical behaviours. Tools such as Matplotlib, Seaborn, and Pandas will facilitate the visual analysis process. This stage will provide not only descriptive insights, but also guidance for feature selection in the modelling phase.\u003c/p\u003e\n\u003cp\u003ec. Data Preprocessing \u0026amp; Feature Engineering\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eGiven the complexity of fuel price data, preprocessing is essential to ensure data quality and integrity. Missing values will be addressed using interpolation and imputation techniques, ensuring continuity in the time series without distorting underlying trends. Outliers will be identified using boxplots and corrected or capped to reduce their influence on the model training. To maintain comparability, all variables will be expressed in consistent units and feature scaling using standardization will be applied to ensure balanced feature contributions. Thereafter, the dataset will be split into training (70%), validation (15%) and testing (15%) subsets to enable performance evaluation.\u003c/p\u003e\n\u003cp\u003eFeature engineering will then be applied to enhance the dataset\u0026rsquo;s predictive power. Lag variables will be created (1-day, 7-day and 30-day) to capture the delayed effects of exchange rate movements or policy interventions such as subsidy removal on fuel prices. Moving averages and exponential smoothing terms will account for short-term trends and reduce volatility caused by daily fluctuations. 
Seasonal and cyclical indicators will be extracted to identify recurring fluctuations (such as monthly or quarterly petrol price adjustments), and categorical features representing regulatory interventions (subsidies, changes in excise duty) will be encoded using one-hot encoding. This step is critical to ensure that the models capture both temporal dependencies and external policy-driven dynamics influencing fuel prices.\u003c/p\u003e\n\u003cp\u003ed. Model Development \u0026amp; Implementation\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eThe proposed models are developed and implemented using the Python programming environment because it is an open-source resource and is efficient for data analysis and machine learning tasks. The Pandas and NumPy libraries will be used for data manipulation, Scikit-learn for the development of traditional ML algorithms (Ridge Linear Regression, SVR and RF) and TensorFlow for the development of deep learning models (LSTM). The Matplotlib and Seaborn packages will be used for data visualization and performance diagnostics.\u003c/p\u003e\n\u003cp\u003eDuring the model implementation phase, each model is coded and trained on the petrol price dataset and will undergo hyperparameter tuning to optimize performance and reduce overfitting. Grid search cross-validation will be used for RF and SVR to identify the number of trees, maximum depth, kernel type and the regularization parameter (C). The LSTM model will be trained with the Adam optimizer, with the learning rate, number of hidden layers, neuron count, dropout rate, and training epochs tuned iteratively. The models will be evaluated on the training and validation subsets to ensure convergence, stability, and generalization before final testing using the time-series split approach through the TimeSeriesSplit class in the Scikit-learn library.\u003c/p\u003e\n\u003cp\u003ee. 
Model Evaluation and Selection\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eThe predictive performance of the models will be evaluated using three statistical metrics: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the Coefficient of Determination (R\u0026sup2;) score. In addition to the standard error metrics, the Diebold\u0026ndash;Mariano (DM) test will be employed to statistically compare the forecasting accuracy of competing models. This study uses a naive (random walk) forecast as the baseline model for the DM test, as it typically produces competitive performance and provides a solid baseline for comparison with more complicated algorithms [\u003cspan class=\"CitationRef\"\u003e26\u003c/span\u003e]. Based on the outcomes of these evaluations, the models will be ranked across all performance metrics, with the top-performing model selected for further explainability analysis.\u003c/p\u003e\n\u003cp\u003ef. Explainable AI Integration\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eTo enhance transparency and interpretability, this research design applies Explainable Artificial Intelligence (XAI) techniques to the best-performing model to uncover the relationships between the variables and petrol prices. As an initial step, feature importance analysis will be conducted to identify and rank the most influential variables (ZMW/US$ exchange rate, international crude oil price, inflation rate, excise duty, VAT). Thereafter, SHAP analysis and Partial Dependence Plots (PDPs) will be used to further interpret the model\u0026rsquo;s behaviour. SHAP will quantify each variable\u0026rsquo;s contribution to predicted petrol prices across time, while PDPs will visualize the marginal effects of individual predictors, highlighting potential non-linearities and interaction effects [\u003cspan class=\"CitationRef\"\u003e27\u003c/span\u003e].\u003c/p\u003e\n\u003c/div\u003e\n\u003c/div\u003e"},{"header":"3. 
RESULTS AND DISCUSSIONS","content":"\u003cp\u003eThe results were interpreted within the institutional structure of Zambia\u0026rsquo;s regulated energy system. Unlike deregulated fuel markets where seasonal demand patterns drive pricing, Zambia\u0026rsquo;s cost-plus mechanism dampens stochastic fluctuations and embeds fiscal decisions directly into the pump price.\u003c/p\u003e\n\u003cp\u003e3.1. Exploratory Data Analysis\u003c/p\u003e\n\u003cp\u003ea. Petrol Price Trend from 2011\u0026ndash;2025: The petrol price trend line graph from 2011 to 2025 shows a continuous upward price movement with distinct periods of stability and abrupt price hikes, particularly after 2021.\u003c/p\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eFigure \u003cspan class=\"InternalRef\"\u003e3.1\u003c/span\u003e\u003cem\u003e.1: Petrol Price Trend\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eFrom 2011 to around 2016, the petrol prices remained relatively stable with gradual increases. From 2017, the trend shifts upward more noticeably, and from 2021 onwards, petrol prices rise sharply. These hikes are in line with the heightened global crude oil volatility, the significant exchange rate depreciation of the Zambian Kwacha, and shifts in Zambia\u0026rsquo;s fuel subsidy and taxation policies. The years 2022 to 2024 show more pronounced volatility, which can be marked by the change in the price adjustment period from quarterly to monthly. The second half of 2025 illustrates slight declines in the petrol price which are primarily due to the stabilization of the ZMW/US$ exchange rate. However, compared with previous years, the petrol prices are still at very high levels.\u003c/p\u003e\n\u003cp\u003eb. 
Correlation Heat Map: The correlation heatmap demonstrates the strength of the relationships between the target variable (the petrol price) and the key drivers, as well as the strength of the relationship between the key drivers.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eThe correlation heat map revealed that petrol prices in Zambia are strongly influenced by the exchange rate and taxation variables. The ZMW/US$ exchange rate recorded the highest correlation with the petrol prices of 0.93. This confirms that there is a substantial effect of currency depreciation on the cost of the imported fuel, which ultimately increases the price of petrol. Similarly, excise duty and VAT both exhibited strong and equal positive correlations with the petrol pump price of 0.85 each. This confirms the critical role of fiscal policy in petrol price determination and mitigation strategies. In contrast, the relationship between petrol prices and the International Oil Price (IOP) was seen to be weak and slightly negative, recording a correlation of -0.08. This suggests that the immediate impact of international oil prices may be moderated by the domestic pricing mechanisms. The inflation showed a low correlation with the fuel price movements of 0.17. This means that the general inflation rates do not directly determine petrol price adjustments in Zambia.\u003c/p\u003e\n\u003cp\u003ec. Box plot: The boxplot is used to show differences in the variability of the variables influencing petrol prices. The figure below provides a visual summary of the distribution, spread, and presence of outliers across the key variables influencing petrol prices in Zambia:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eThe International Oil Price (IOP) displayed the widest spread. This is in line with the volatility in the global oil markets and the variance in crude oil prices. 
However, the domestic tax components, such as excise duty and VAT, exhibited minimal variability. This result reflects their policy-determined and relatively stable nature, as fiscal policies do not change frequently. The ZMW/US$ exchange rate also showed a noticeable spread, which is consistent with the currency volatility that has been experienced in the Zambian economy. The petrol prices themselves fell within a moderate range, suggesting that the price determination mechanisms employed by ERB dampen the effects of the global price volatility. Inflation showed relatively low variability with a few outliers. This further reinforces its limited direct role in the petrol price movements.\u003c/p\u003e\n\u003cp\u003ed. Seasonal Decomposition: The seasonal decomposition graph splits the petrol price time series into its underlying components, which are the trend, seasonal pattern, and residual fluctuations, to provide insight on how the petrol prices change over time. The figure below illustrates the seasonal decomposition of the petrol prices in Zambia:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n\u003cp\u003eThe seasonal decomposition of the petrol price series reveals that fuel prices in Zambia exhibit a clear upward long-term trend with no meaningful seasonal pattern. The trend component shows steady growth throughout the study period, with pronounced acceleration after 2020, reflecting currency depreciation, global oil market disruptions, and policy changes. The seasonal component remains flat, indicating the absence of recurring monthly or annual patterns, which is consistent with Zambia\u0026rsquo;s administrative fuel pricing framework rather than market-driven seasonality. Residual fluctuations remain modest in the early years but become increasingly volatile after 2018, particularly between 2020 and 2024, when global and domestic shocks intensified. 
These results confirm that petrol prices are primarily influenced by long-term structural factors and irregular shocks rather than predictable seasonal cycles.\u003c/p\u003e\n\u003cp\u003ee. Actual Vs Ridge Rolling Forecast: The figure showed how closely the ridge rolling forecast overlaps the actual petrol price movements. This indicates a strong alignment between the predicted values from the ridge LR model and the actual values, confirming that the model is indeed ideal for this forecasting task.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec25\" class=\"Section2\"\u003e\n\u003ch2\u003e3.2. Performance Evaluation\u003c/h2\u003e\n\u003cp\u003eThe four ML and the ridge rolling forecast models were evaluated and compared using the MAE, RMSE, R\u0026sup2; and the Diebold Mariano Test. The results of the evaluation are presented in the table below:\u0026nbsp;\u003c/p\u003e\n\u003cdiv class=\"gridtable\"\u003e\n\u003ctable id=\"Tab2\" border=\"1\"\u003e\u003ccaption\u003e\n\u003cdiv class=\"CaptionNumber\"\u003eTable \u003cspan class=\"InternalRef\"\u003e3.2\u003c/span\u003e.1\u003c/div\u003e\n\u003cdiv class=\"CaptionContent\"\u003e\n\u003cp\u003ePerformance Evaluation Results of the ML Models\u003c/p\u003e\n\u003c/div\u003e\n\u003c/caption\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eML MODEL\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eMAE\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eRMSE\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eR\u0026sup2;\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eDM statistic\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eP-Value\u003c/p\u003e\n\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRidge LR\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" 
char=\".\"\u003e\n\u003cp\u003e0.0510\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.2356\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.9927\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e-2.6793\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.0074\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRF\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e3.6034\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e4.3177\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e-1.1106\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e30.9154\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.0000\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eSVM\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e5.7043\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e6.7710\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e-5.0699\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e28.1765\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.0000\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eLSTM\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e1.2736\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e1.4630\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" 
char=\".\"\u003e\n\u003cp\u003e0.6760\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e30.7323\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.0000\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\u003cdiv class=\"gridtable\"\u003e\n\u003cdiv class=\"colspec\" align=\"left\"\u003e\u0026nbsp;\u003c/div\u003e\n\u003cdiv class=\"colspec\" align=\"left\"\u003e\u0026nbsp;\u0026nbsp;\u003c/div\u003e\n\u003ctable id=\"Tab3\" border=\"1\"\u003e\u003ccaption\u003e\n\u003cdiv class=\"CaptionNumber\"\u003eTable \u003cspan class=\"InternalRef\"\u003e3.2\u003c/span\u003e.2\u003c/div\u003e\n\u003cdiv class=\"CaptionContent\"\u003e\n\u003cp\u003eRolling Forecast Evaluation Results\u003c/p\u003e\n\u003c/div\u003e\n\u003c/caption\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eML MODEL\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eMAE\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eRMSE\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eR\u0026sup2;\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eDM statistic\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eP-Value\u003c/p\u003e\n\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRolling Ridge\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e0.0576\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e0.2477\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e0.9920\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e-2.4655\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd 
align=\"left\"\u003e\n\u003cp\u003e0.0137\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eAcross all evaluation metrics, the Ridge regression demonstrated clear and decisive superiority. It achieved the lowest MAE (0.0510) and RMSE (0.2356), indicating minimal average deviation and limited extreme forecast errors, while explaining 99.27% of price variance (R\u0026sup2; = 0.9927). The rolling Ridge specification maintained comparable performance (R\u0026sup2; = 0.9920), confirming temporal robustness. By contrast, the LSTM delivered moderate performance (R\u0026sup2; = 0.6760) but with materially higher error magnitudes, suggesting only partial capture of temporal dependencies. More critically, Random Forest and SVR produced negative R\u0026sup2; values (\u0026minus;1.1106 and \u0026minus;5.0699), meaning their forecasts performed worse than a simple mean benchmark and failed to represent the structural pricing relationships embedded within Zambia\u0026rsquo;s regulated fuel system.\u003c/p\u003e\n\u003cp\u003eThe Diebold\u0026ndash;Mariano results reinforce this hierarchy. The Ridge model yielded a statistically significant negative DM statistic (\u0026minus;2.6793, p\u0026thinsp;=\u0026thinsp;0.0074), confirming superior accuracy relative to the naive random walk benchmark. In contrast, the nonlinear models generated large positive DM values with near-zero p-values, indicating systematically inferior predictive performance.\u003c/p\u003e\n\u003cp\u003eThese findings suggest that petrol pricing in regulated energy systems follows administratively structured linear relationships rather than stochastic nonlinear dynamics. Policy variables change discretely through regulatory adjustments rather than continuous probabilistic processes. Regularisation in Ridge regression suppresses noise while preserving the structural relationships inherent in the cost-plus pricing framework. 
In such institutional contexts, model simplicity aligned with pricing architecture outperforms algorithmic complexity.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec26\" class=\"Section2\"\u003e\n\u003ch2\u003e3.3. Explainable AI (XAI) Analysis\u003c/h2\u003e\n\u003cp\u003eTo enhance interpretability of the best-performing model, a triangulated XAI framework was applied to the Ridge regression specification using coefficient-based feature importance, SHAP (Shapley Additive Explanations), and partial dependence plots (PDPs). This multi-method approach enables distinction between structural contribution, marginal impact, and functional sensitivity of predictors within Zambia\u0026rsquo;s regulated fuel pricing system.\u003c/p\u003e\n\u003cp\u003ea. Feature Importance Analysis\u003c/p\u003e\n\u003cp\u003eThe table below presents the feature importance analysis results of the key influencers of the petrol price in Zambia.\u003c/p\u003e\n\u003cdiv class=\"gridtable\"\u003e\n\u003ctable id=\"Tab4\" border=\"1\"\u003e\u003ccaption\u003e\n\u003cdiv class=\"CaptionNumber\"\u003eTable \u003cspan class=\"InternalRef\"\u003e3.3\u003c/span\u003e.1\u003c/div\u003e\n\u003cdiv class=\"CaptionContent\"\u003e\n\u003cp\u003eFeature Importance Analysis Results\u003c/p\u003e\n\u003c/div\u003e\n\u003c/caption\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eFEATURE\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eCOEFFICIENT\u003c/p\u003e\n\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eIOP\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.0115\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eZMW/US$\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" 
char=\".\"\u003e\n\u003cp\u003e0.0039\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eExcise Duty\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.0034\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eVAT\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.0034\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eInflation\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.0023\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eThe figure below shows the relative contribution of each investigated petrol price driver to the model\u0026rsquo;s predictive performance. Features with higher importance scores are the ones that the model relies on most to minimize forecast errors.\u003c/p\u003e\n\u003cp\u003eThe analysis revealed that the IOP (international oil price) is the strongest predictor of the petrol price changes, followed by the exchange rate (ZMW/US$). These results are consistent with the ERB reported top determinants of fuel prices. These two variables exert the most influence because Zambia imports fuel and is therefore sensitive to global oil markets and currency fluctuations. Excise Duty and VAT also contribute to price movements but at a smaller magnitude, reflecting their role as fiscal policy components in fuel pricing. Inflation has the weakest coefficient, indicating that general macroeconomic conditions have a limited direct impact on short-term petrol price variations [\u003cspan class=\"CitationRef\"\u003e28\u003c/span\u003e].\u003c/p\u003e\n\u003cp\u003eb. 
SHAP Analysis\u003c/p\u003e\n\u003cp\u003eThe table below presents the SHAP analysis results of the key influencers of the petrol price in Zambia.\u003c/p\u003e\n\u003cdiv class=\"gridtable\"\u003e\n\u003ctable id=\"Tab5\" border=\"1\"\u003e\u003ccaption\u003e\n\u003cdiv class=\"CaptionNumber\"\u003eTable \u003cspan class=\"InternalRef\"\u003e3.3\u003c/span\u003e.2\u003c/div\u003e\n\u003cdiv class=\"CaptionContent\"\u003e\n\u003cp\u003eSHAP Analysis results\u003c/p\u003e\n\u003c/div\u003e\n\u003c/caption\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eFEATURE\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eMEAN_ABSOLUTE_SHAP VALUE\u003c/p\u003e\n\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eIOP\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.0043\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eZMW/US$\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.0100\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eExcise Duty\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.0135\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eVAT\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.0135\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eInflation\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.0015\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eThe figure below further illustrates the results of the SHAP analysis which gives 
insights into the model's behaviour by demonstrating both the relative importance of the key influencer variables and the direction of their influence on predicted petrol prices.\u003c/p\u003e\n\u003cp\u003eThe SHAP results show that Excise Duty and VAT are the strongest drivers of petrol price predictions, indicating that tax-related policy changes have large impacts on the model\u0026rsquo;s output. This result is consistent with the ERB\u0026rsquo;s explanations for the fuel price hikes experienced after 2021 following the removal of subsidies. The exchange rate (ZMW/US$) also has a notable influence, reflecting Zambia\u0026rsquo;s dependence on imported fuel, as explained earlier. The IOP has a moderate effect, aligning with the role of global oil prices in determining baseline fuel costs. Inflation has the weakest impact, suggesting it plays only an indirect role in short-term price changes.\u003c/p\u003e\n\u003cp\u003ec. Partial Dependence Plots\u003c/p\u003e\n\u003cp\u003eThe partial dependence plots show how changes in each predictor variable influence the model\u0026rsquo;s predicted petrol price when all the other variables are held constant. The figure below suggests that only a few variables meaningfully drive prediction changes, while others exhibit minimal or negligible influence due to limited variation or low model sensitivity.\u003c/p\u003e\n\u003cp\u003eThe ZMW/US$ exchange rate and IOP display clear positive and near-linear relationships with petrol price forecasts, confirming their persistent role in price transmission. In contrast, excise duty and VAT show relatively flat curves due to limited temporal variation; however, this does not imply insignificance. Rather, it indicates that these variables change infrequently but induce substantial discrete price adjustments when modified. Inflation again exhibits weak functional influence, reinforcing its secondary role 
[\u003cspan class=\"CitationRef\"\u003e29\u003c/span\u003e].\u003c/p\u003e\n\u003cp\u003eWhile coefficient-based feature importance suggests that the International Oil Price (IOP) strongly influences long-term price formation, SHAP values reveal that discrete fiscal policy changes (Excise Duty and VAT adjustments) generate the largest marginal impacts during prediction periods. This apparent divergence reflects the difference between structural contribution and marginal shock attribution. IOP shapes the underlying cost baseline over time, whereas tax adjustments introduce discrete level shifts that significantly alter short-run predictions. Partial dependence plots appear flat for tax variables due to limited temporal variability; however, SHAP captures their large effect during policy change points. This triangulation confirms that fiscal shocks matter more during transition periods, while global oil prices dominate continuous trend formation.\u003c/p\u003e\n\u003cp\u003ePolicy Implications for Energy Regulators\u003c/p\u003e\n\u003cp\u003eThe findings imply that fiscal instruments (Excise Duty and VAT) are the most powerful short-run levers affecting petrol prices. The ERB\u0026rsquo;s 2.5% price adjustment threshold may dampen minor fluctuations but does not offset discrete fiscal shocks. Policymakers should therefore consider targeted compensatory measures when implementing tax reforms. Additionally, the strong performance of Ridge regression suggests that a regularised linear forecasting model could be operationalised within the ERB for short-term predictive planning and stress testing under exchange rate shock scenarios. From a social protection perspective, since tax variables disproportionately affect retail prices, targeted subsidies for vulnerable households may be more efficient than broad fuel price suppression.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"4. 
CONCLUSION","content":"\u003cp\u003eThe empirical results demonstrate that Ridge regression consistently outperforms the other models implemented in forecasting petrol prices in Zambia. The model achieved the lowest forecast errors and explained over 99% of the variance in the observed data. While LSTM captured certain temporal dynamics, its predictive accuracy remained substantially below that of Ridge regression. The poor performance of RF and SVR, including negative R\u0026sup2; values, indicates that increased model complexity does not necessarily translate into improved performance within regulated energy pricing environments.\u003c/p\u003e \u003cp\u003eUnlike deregulated markets where seasonality reflects demand cycles (e.g., winter heating demand in OECD economies), Zambia\u0026rsquo;s administrative pricing eliminates recurring seasonal patterns. The flat seasonal decomposition therefore confirms institutional pricing dominance over market seasonality. The transition from a 60-day to a 30-day pricing cycle likely increased price transmission speed, contributing to higher volatility after 2021. The Ridge model\u0026rsquo;s stability during this transition indicates robustness to structural regime shifts.\u003c/p\u003e \u003cp\u003eThe dominance of regularised linear modelling suggests that administratively structured energy pricing mechanisms favour models aligned with institutional cost-plus formulations. In such environments, policy-induced level shifts and exchange rate movements drive price dynamics more than nonlinear stochastic interactions. These findings extend beyond Zambia, offering insight into forecasting within other regulated energy markets.\u003c/p\u003e \u003cp\u003eThis study makes three principal contributions. First, it integrates daily-frequency machine learning forecasting with formal Diebold\u0026ndash;Mariano statistical comparison and triangulated explainable AI diagnostics, enhancing both predictive rigour and interpretability. 
Furthermore, it demonstrates that fiscal policy instruments, particularly excise duty and VAT, alongside exchange rate exposure, exert a stronger influence on domestic petrol prices than global oil price fluctuations within a regulated, import-dependent economy. Practically, this study provides a deployable Ridge-based forecasting framework suitable for regulatory application within Zambia\u0026rsquo;s energy pricing system.\u003c/p\u003e \u003cp\u003eFuture research should expand the modelling framework to incorporate additional cost components such as inland transportation, storage and distribution margins, freight and insurance costs, and geopolitical shock indicators. Provincial-level heterogeneity and distribution zone effects may also yield further insight [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]. Extending the framework to other petroleum products, electricity tariffs [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e], public transport fares [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e], and staple food items [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e] would enhance understanding of cost-of-living transmission effects. 
Hybrid statistical\u0026ndash;deep learning architectures may also be explored to evaluate performance under structural regime changes.\u003c/p\u003e \u003cp\u003eOverall, the integration of machine learning with explainable AI in this study provides a transparent and policy-relevant forecasting approach capable of supporting evidence-based fuel pricing decisions in regulated energy systems [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e].\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003ch2\u003eConflict of Interest\u003c/h2\u003e \u003cp\u003eThe authors declare no conflict of interest.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eEthical Approval\u003c/strong\u003e \u003cp\u003eNot Applicable.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eConsent to Participate\u003c/strong\u003e \u003cp\u003eNot Applicable.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eConsent to Publish\u003c/strong\u003e \u003cp\u003eNot Applicable.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e \u003cp\u003eThis research received no external funding.\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eM.M conceptualised the study, conducted data analysis, developed the models, and wrote the manuscript. A.Z hosted the research at his institution, provided critical support and insights, and reviewed the manuscript. Both authors reviewed and approved the final version of the manuscript.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe datasets used in this study are derived from publicly accessible sources including ERB and BOZ. The compiled dataset supporting this study is available at: https://doi.org/10.5281/zenodo.19364812\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003ePetroleum - Energy Regulation Board. Accessed: Oct. 16, 2025. [Online]. 
Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.erb.org.zm/petroleum\u003c/span\u003e\u003cspan address=\"https://www.erb.org.zm/petroleum\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAri A, Granados CM. The Energy Price Shock-Impact. Policy Responses, and Reform Options United Kingdom; 2023.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZAMBIA.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFuel Subsidy Removal in Zambia.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChulu Sefuka, Haabazoka L. Investigating the effects of fuel prices on Zambia\u0026rsquo;s economic growth. World J Adv Res Reviews. Apr. 2025;26(1):716\u0026ndash;36. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.30574/wjarr.2025.26.1.1048\u003c/span\u003e\u003cspan address=\"10.30574/wjarr.2025.26.1.1048\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWakumelo M. Fuel Prices and Inflation in Zambia.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEnergy Regulation Board-ANNUAL REPORT. | 2024 VISION, MISSION AND VALUES. [Online]. Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e\u003c/span\u003e\u003cspan address=\"http://www.erb.org.zm\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBanda A, Malama T. The Effects of Monthly Price Adjustments of Fuel by Energy Regulations Board (ERB) on Petroleum Companies in Zambia. Am J Industrial Bus Manage. 2025;15(04):661\u0026ndash;80. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.4236/ajibm.2025.154032\u003c/span\u003e\u003cspan address=\"10.4236/ajibm.2025.154032\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eElijah E, Sichone C, END OF YEAR PRESS BRIEFING MEMBERS OF THE PRESS ERB. MANAGEMENT AND STAFF LADIES AND GENTLEMEN INTRODUCTION.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBaffes J, Bank W, Kose MA, Ohnsorge F, Stocker M. The Great Plunge in Oil Prices: Causes, Consequences, and Policy Responses, 2015. [Online]. Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://ssrn.com/abstract=2624398\u003c/span\u003e\u003cspan address=\"http://ssrn.com/abstract=2624398\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSharma B, Shrestha A. Petroleum dependence in developing countries with an emphasis on Nepal and potential keys, Jan. 01, 2023, \u003cem\u003eElsevier Ltd\u003c/em\u003e. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.esr.2023.101053\u003c/span\u003e\u003cspan address=\"10.1016/j.esr.2023.101053\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLu S. Research on GDP Forecast Analysis Combining BP Neural Network and ARIMA Model, \u003cem\u003eComput. Intell. Neurosci.\u003c/em\u003e, vol. 2021, 2021. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1155/2021/1026978\u003c/span\u003e\u003cspan address=\"10.1155/2021/1026978\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlizadegan H, Rashidi Malki B, Radmehr A, Karimi H, Ilani MA. 
Comparative study of long short-term memory (LSTM), bidirectional LSTM, and traditional machine learning approaches for energy consumption prediction, \u003cem\u003eEnergy Exploration \u0026amp; Exploitation\u003c/em\u003e, vol. 43, no. 1, pp. 281\u0026ndash;301, Jan. 2025, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1177/01445987241269496\u003c/span\u003e\u003cspan address=\"10.1177/01445987241269496\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAdadi A, Berrada M. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access. 2018;6:52138\u0026ndash;60. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/ACCESS.2018.2870052\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2018.2870052\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlwadi MA. Fuel Sales Price Forecasting using Time Series, Machine Learning, and Deep Learning Models, \u003cem\u003eEngineering, Technology and Applied Science Research\u003c/em\u003e, vol. 15, no. 3, pp. 22360\u0026ndash;22366, Jun. 2025, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.48084/etasr.10348\u003c/span\u003e\u003cspan address=\"10.48084/etasr.10348\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKingwill R, Brink WH. Evaluating the effectiveness of neural network techniques in the forecasting of South African basic fuel prices, 2019. [Online]. 
Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://scholar.sun.ac.za\u003c/span\u003e\u003cspan address=\"https://scholar.sun.ac.za\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSokkalingam R, Sarpong-Streetor RMNY, Othman M, Daud H, Owusu DA. Forecasting Petroleum Fuel Price in Malaysia by ARIMA Model, in \u003cem\u003eSpringer Proceedings in Complexity\u003c/em\u003e, Springer Science and Business Media B.V., 2021, pp. 671\u0026ndash;678. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/978-981-16-4513-6_58\u003c/span\u003e\u003cspan address=\"10.1007/978-981-16-4513-6_58\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHoang AT et al. Dec., Explainable machine learning-based prediction of fuel consumption in ship main engines using operational data, \u003cem\u003eBrodogradnja\u003c/em\u003e, vol. 76, no. 4, pp. 1\u0026ndash;24, 2025, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.21278/brod76405\u003c/span\u003e\u003cspan address=\"10.21278/brod76405\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSalih AM, et al. A Perspective on Explainable Artificial Intelligence Methods: SHAP and LIME. Adv Intell Syst. Jan. 2025;7(1). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1002/aisy.202400304\u003c/span\u003e\u003cspan address=\"10.1002/aisy.202400304\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMaulud D, Abdulazeez AM. 
A Review on Linear Regression Comprehensive in Machine Learning, \u003cem\u003eJournal of Applied Science and Technology Trends\u003c/em\u003e, vol. 1, no. 2, pp. 140\u0026ndash;147, Dec. 2020, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.38094/jastt1457\u003c/span\u003e\u003cspan address=\"10.38094/jastt1457\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGolam Kibria BM. More than hundred (100) estimators for estimating the shrinkage parameter in a linear and generalized linear ridge regression models. J Econometrics Stat. 2022;2(2):233\u0026ndash;52. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.47509/JES.2022.v02i02.06\u003c/span\u003e\u003cspan address=\"10.47509/JES.2022.v02i02.06\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSalman HA, Kalakech A, Steiti A. Random Forest Algorithm Overview, \u003cem\u003eBabylonian Journal of Machine Learning\u003c/em\u003e, vol. 2024, pp. 69\u0026ndash;79, Jun. 2024. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.58496/bjml/2024/007\u003c/span\u003e\u003cspan address=\"10.58496/bjml/2024/007\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSupport Vector Regression.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSherstinsky A. Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network. Jul. 2023. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.physd.2019.132306\u003c/span\u003e\u003cspan address=\"10.1016/j.physd.2019.132306\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFr\u0026auml;mling K. Feature Importance versus Feature Influence and What It Signifies for Explainable AI ⋆. [Online]. Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.umu.se/personal/kary-framling/\u003c/span\u003e\u003cspan address=\"https://www.umu.se/personal/kary-framling/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eD. Pajila Assistant Professor-Senior Grade, Bg. Sheena Assistant Professor D, Associate Professor D, Professor, editors. and S. R. Subramanian Associate Professor, A Comprehensive Survey on Naive Bayes Algorithm: Advantages, Limitations and Applications.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePonce-Bobadilla AV, Schmitt V, Maier CS, Mensing S, Stodtmann S. Practical guide to SHAP analysis: Explaining supervised machine learning model predictions in drug development. Clin Transl Sci. Nov. 2024;17(11). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/cts.70056\u003c/span\u003e\u003cspan address=\"10.1111/cts.70056\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSaarela M, Jauhiainen S. Comparison of feature importance measures as explanations for classification models. SN Appl Sci. Feb. 2021;3(2). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s42452-021-04148-9\u003c/span\u003e\u003cspan address=\"10.1007/s42452-021-04148-9\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMolnar C, et al. Relating the Partial Dependence Plot and Permutation Feature Importance to the Data Generating Process. in Communications in Computer and Information Science. Springer Science and Business Media Deutschland GmbH; 2023. pp. 456\u0026ndash;79. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/978-3-031-44064-9_24\u003c/span\u003e\u003cspan address=\"10.1007/978-3-031-44064-9_24\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen J, et al. City- and county-level spatio-temporal energy consumption and efficiency datasets for China from 1997 to 2017. Sci Data. Dec. 2022;9(1). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41597-022-01240-6\u003c/span\u003e\u003cspan address=\"10.1038/s41597-022-01240-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLee MHL, et al. A Comparative Study of Forecasting Electricity Consumption Using Machine Learning Models. Mathematics. Apr. 2022;10(8). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/math10081329\u003c/span\u003e\u003cspan address=\"10.3390/math10081329\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAmicosante A, Avenali A, \u0026rsquo; Alfonso TD, Giagnorio M, Manno A, Matteucci G. 
Predicting costs of local public bus transport services through machine learning methods.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTHE. HORIZON REVIEW ISSUE 2.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMkhize MM. Development of a dimensionless model for simulating key parameters in solar distillation systems. Discover Energy. Jan. 2026;6(1). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s43937-025-00108-1\u003c/span\u003e\u003cspan address=\"10.1007/s43937-025-00108-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"journal":{"title":"Discover Artificial Intelligence"},"keywords":"Machine Learning, Explainable AI, Petrol Price Forecasting, Ridge Regression","lastPublishedDoi":"10.21203/rs.3.rs-9222514/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"postedDate":"April 20th, 2026","status":"under-review"},"version":"v1","identity":"rs-9222514"}}}