Forecasting Indonesian Goods and Services Imports Using Machine Learning: A Comparative Evaluation of Model Performance

doi:10.21203/rs.3.rs-6691034/v1


2025 · doi:10.21203/rs.3.rs-6691034/v1

preprint OA: closed


Research Article

Ferdinand Gusleo, Rossi Passarella

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-6691034/v1. This work is licensed under a CC BY 4.0 License. Status: Posted (Version 1).

Abstract

Accurate forecasting of import values is crucial for effective economic planning and policy-making in emerging economies like Indonesia. Traditional forecasting methods often face challenges in capturing the complex, non-linear dynamics inherent in macroeconomic time series data. This study evaluates the performance of three prominent Machine Learning (ML) models, namely Support Vector Regression (SVR), Random Forest, and Decision Tree, for forecasting Indonesian goods and services imports.
Utilising historical macroeconomic time series data for Indonesia spanning 1970–2023, the models were trained and rigorously evaluated using standard metrics, including mean squared error (MSE), mean absolute error (MAE), and the coefficient of determination (R²). To address the limitation of the relatively small original dataset size, data augmentation via linear interpolation was explored, and the models' prediction accuracy for the year 2023 was specifically assessed. The results indicate that SVR demonstrated superior performance compared to Random Forest and Decision Tree based on the evaluation metrics and achieved the highest accuracy in predicting the 2023 import value, particularly after data interpolation was applied, which generally improved point prediction accuracy. The findings suggest that ML models, especially SVR, are effective and promising tools for enhancing the precision of Indonesian import forecasting. This research provides valuable empirical evidence for policymakers and practitioners seeking to leverage advanced computational techniques for improved economic forecasting and planning in an emerging market context while also highlighting considerations related to data characteristics and augmentation strategies for future methodological advancements.

Keywords: Macroeconomics, Theoretical Computer Science, Indonesia Imports, Forecasting, Machine Learning, Time Series Analysis, Comparative Study, Macroeconomic Data

1. Introduction

International trade stands as a cornerstone of the global economy, significantly influencing the trajectory of national development. For countries like Indonesia, export and import activities are not merely transactions; they are fundamental drivers of economic growth and crucial determinants of competitiveness in the international arena [ 1 ], [ 2 ].
The interplay between economic growth and foreign trade is multifaceted and dynamic, yet the critical role of trade in fostering both short-term stability and long-term development is undeniable [ 2 ]. Imports, often viewed primarily through the lens of trade balance, play a more nuanced role in a nation's economic structure. The ratio of imports to GDP, for instance, can reveal a country's reliance on essential capital goods or raw materials for its domestic production processes, directly impacting national productivity and overall output [ 3 ]. While the discourse sometimes highlights potential negative economic impacts, evidence suggests that in certain contexts, the stimulative effect of imports on economic growth can be substantial, sometimes even surpassing that of exports [ 4 ]. The precise impact of imports is highly dependent on the nature of the goods being imported and their integration into domestic production chains [ 5 ]. Given the profound impact of imports, accurate forecasting of import values is indispensable for effective economic planning and policy-making at the national level. Precise projections empower fiscal authorities and central banks to formulate more robust monetary and fiscal strategies and enable timely policy interventions in response to evolving economic conditions [ 6 ]. In the complex domain of economic forecasting, identifying the most influential variables is key to developing accurate and reliable predictive models [ 7 ]. Recent advancements in artificial intelligence (AI) and ML technologies have opened new avenues for applying sophisticated computational models to economic analysis and forecasting [ 8 ]. ML, in particular, offers significant capabilities in macroeconomic forecasting by effectively capturing the non-linear relationships and inherent uncertainties present in economic data [ 9 ], [ 10 ]. Empirical studies support the potential of ML in this domain. 
For instance, research focusing on predicting the imports of goods in 28 European countries between 2010 and 2019 utilised various ML techniques [ 14 ]. This study demonstrated that ML models, such as gradient-boosted trees and random forests, were more efficient in predicting import values when compared to traditional linear methods, like generalised linear models. The evaluation in [ 14 ] showed that the Gradient Boosted Trees model yielded the best accuracy based on standard error metrics, underscoring the effectiveness of advanced ML approaches for this specific forecasting task. Furthermore, the study provided insights into key variables influencing imports, contributing valuable understanding for policymakers and highlighting the capability of ML models to predict and reveal underlying relationships. This work by [ 14 ] also noted its contribution in addressing a gap concerning the application of ML in European import prediction, thereby paving the way for similar analyses in other contexts. Despite the potential of ML demonstrated in studies like [ 14 ], macroeconomic forecasting, especially with complex time series data like import values, presents considerable challenges. Integrating relevant economic indicators while filtering out noise, alongside managing the sheer complexity and volume of historical data, can hinder model accuracy [ 11 ]. To navigate these challenges and enhance prediction performance, a common and effective strategy involves a comparative evaluation of various ML models [ 12 ]. By testing and comparing different algorithms, researchers can identify the most optimal approach tailored to the specific characteristics of the data. Forecasting accuracy is not always directly correlated with model complexity; simpler models can sometimes yield comparable or even superior results with lower prediction errors [ 13 ]. 
Building upon the critical need for accurate import forecasting, recognising the potential of advanced computational methods like ML (as supported by prior research), and acknowledging the inherent challenges, this research aims to develop and evaluate ML-based models specifically for predicting the value of Indonesian goods and services imports. Beyond generating accurate forecasts through a comparative approach, the study also seeks to offer explanations for the key factors driving import fluctuations within the Indonesian economic context. This investigation is particularly relevant given the pivotal role of imports as a significant macroeconomic variable in Indonesia [ 3 ] and aims to contribute to better economic planning and policy-making.

2. Data and Method

This section describes the data sources used, including how the data were collected and prepared for analysis, and how the acquisition process supports their validity and reliability. The method part discusses the approach applied in data processing, including the specific analytical techniques (such as classification or other statistical methods) and the justification for choosing them. It also covers the data pre-processing stages, from initial filtering to the final stage of data preparation before further analysis.

2.1. Data

This study uses classified time series data sourced from the World Bank [ 15 ]. The World Bank acts as a provider of financial and technical assistance to developing countries. Since 2010, the institution has provided open access through a web API that allows users, both individuals and organisations, to easily download comprehensive development datasets from countries around the world.
By doing so, the World Bank not only supports development projects but also facilitates more in-depth research and analysis through the availability of transparent and accessible data [ 16 ]. The dataset contains 54 rows of annual data covering the period from 1970 to 2023, with five independent variables and one dependent variable. The challenge with this dataset is that it is relatively small for building a prediction model. The small number of rows increases the risk of overfitting, especially in models such as Random Forest, which are complex enough to fit noise rather than patterns that truly reflect the relationship between variables. In addition, the limited variation in the data may leave the model with a limited understanding of potential future economic variations, potentially reducing accuracy when it is applied to data beyond the training range. These factors should therefore be considered when interpreting the prediction results and when using the model to support decision-making in economic policy. The data were acquired from the World Bank separately for each variable. After all the required variables had been obtained, they were combined into the single dataset used in this research. Descriptions of the data used are given in Table 1.
Table 1. Description of features in the dataset

| Features | Description |
|---|---|
| Imports of goods and services (% of GDP) | Percentage of import value of goods and services to GDP |
| Exports of goods and services (% of GDP) | Percentage of the value of exports of goods and services to GDP |
| Official exchange rate (LCU per US$, period average) | Annual average official exchange rate between the local currency and the US Dollar |
| Foreign direct investment, net inflows (% of GDP) | Equity of direct investment in the economy |
| GDP per capita (current US$) | Value of GDP per capita in US Dollars |
| Inflation, consumer prices (annual %) | Annual inflation rate based on changes in consumer prices |

Figure 1 shows the trend of imports of goods and services as a percentage of Indonesia's Gross Domestic Product (GDP) over the period 1960–2020. The value of imports fluctuated during this period, with a peak of 43.2% in 1998. This spike is thought to be due to the riots that occurred in May 1998, which impacted the country's economic conditions and international trade. In general, imports trended upwards from 1960 to 2000 and declined thereafter, reaching 19.6% in 2020. A significant decline occurred in the 2010–2020 period, when the value of imports fell from 28.8% in 2010 to 19.6% in 2020. Meanwhile, Fig. 2 shows that the distribution of imports of goods and services in Indonesia during the period 1970–2023 is asymmetrical and right-skewed. This indicates that most years during the period had below-average import values, while a few years had much higher import values. Import values are concentrated in the range of 15–30% of GDP, with the peak of the distribution (mode) occurring in the range of 20–25%. However, there were some years with very high import values, reaching more than 40% of GDP, as was the case in 1998, indicating economic turmoil in that period.
The average imports of goods and services during 1970–2023 was 23.7% of GDP, slightly above the distribution centre, indicating that Indonesia, in general, has a high degree of economic openness to international trade. The variation in import values is substantial, with several extreme values both above and below the average, reflecting the dynamics of the Indonesian economy, which is influenced by various macroeconomic factors and trade policies. The heatmap in Fig. 3 shows how imports of goods and services correlate with the other macroeconomic variables. Imports have a strong positive correlation (0.90) with exports of goods and services, indicating that import and export levels tend to move in the same direction. Imports also have a moderate positive correlation with inflation (0.51), suggesting that a high inflation rate tends to coincide with higher imports, whereas the correlation with the exchange rate is positive but very weak (0.08), so the data show almost no linear association between the exchange rate and imports. On the other hand, imports have a moderate negative correlation (-0.57) with foreign direct investment, indicating that an increase in foreign direct investment tends to be followed by a decrease in imports of goods and services. Imports have only a weak correlation with economic growth (GDP per capita), with a correlation value of -0.25. This indicates that import levels do not have a strong linear relationship with economic growth, and import dynamics are more influenced by other factors such as trade policy, exchange rates, and global economic conditions. Based on the heatmap analysis, the use of ML models, especially nonlinear models such as Random Forest, may be a more suitable approach than the usual linear regression model.
This is because nonlinear models have a better ability to capture the complexity of the relationship between the predictor variables and the target variable, imports of goods and services. In addition, the heatmap indicates the potential for very high multicollinearity between features, which needs to be considered in model selection and in the interpretation of regression results.

2.2. Data Pre-processing

After the data have been collected from the World Bank website, the next stage is pre-processing, which is important for improving the accuracy of model predictions [ 17 ]. The first step is data filtering: the collected dataset covers many countries, whereas this study focuses on Indonesia only, so the data are filtered to keep the Indonesian records. The time span used in this study is 1970 to 2023. Although the original data cover the period 1960–2023, the first 10 rows (1960–1969) were dropped, so the analysis uses data starting from 1970. An examination of the dataset found no missing values, so no imputation was needed. The study then uses a fuzzy logic approach as part of the feature engineering strategy to classify the five main variables that affect imports, namely exports, exchange rates, foreign direct investment, GDP per capita, and inflation. This approach was chosen because fuzzy logic can represent uncertainty in macroeconomic data, where the relationship between variables is not always linear and deterministic. Unlike classification methods based on fixed thresholds, fuzzy logic allows linguistic inference rules to be formulated from domain knowledge, thus approximating the way economists think qualitatively.
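As a rough illustration of what such a fuzzy feature could look like, the sketch below maps one numeric series to the three linguistic sets the paper uses (low, medium, high) and label-encodes the dominant set. The paper does not specify its membership functions or breakpoints, so the piecewise-linear shapes anchored at the series minimum, median and maximum, the helper names `memberships` and `fuzzy_codes`, and the sample values are all assumptions made for illustration. The IF-THEN inference rules are not reproduced here.

```python
from statistics import median

# Label encoding for the three linguistic categories (assumed ordering).
LABEL_CODES = {"low": 0, "medium": 1, "high": 2}


def _clamp(value):
    """Restrict a membership degree to the interval [0, 1]."""
    return max(0.0, min(1.0, value))


def memberships(x, lo, mid, hi):
    """Return the degree to which x belongs to each fuzzy set.

    'low' falls linearly from 1 at lo to 0 at mid, 'high' rises linearly
    from 0 at mid to 1 at hi, and 'medium' takes whatever is left over.
    """
    low = _clamp((mid - x) / (mid - lo))
    high = _clamp((x - mid) / (hi - mid))
    medium = 1.0 - max(low, high)
    return {"low": low, "medium": medium, "high": high}


def fuzzy_codes(values):
    """Label-encode each value by its dominant fuzzy set."""
    lo, mid, hi = min(values), median(values), max(values)
    if not lo < mid < hi:
        raise ValueError("need distinct minimum, median and maximum")
    codes = []
    for x in values:
        degrees = memberships(x, lo, mid, hi)
        label = max(degrees, key=degrees.get)  # ties resolve to the first key
        codes.append(LABEL_CODES[label])
    return codes


# Made-up inflation-like values, not taken from the World Bank dataset.
sample = [3.0, 5.0, 6.0, 9.0, 58.0]
encoded = fuzzy_codes(sample)
```

In this sketch the encoded column would be appended next to the original numeric column rather than replacing it, which matches the paper's description of fuzzy features as additions to the dataset.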
Each macroeconomic variable is transformed into three fuzzy sets: low, medium and high. This fuzzification is based on the historical data distribution and on references from the economic literature. Next, fuzzy inference rules are constructed in IF-THEN form, such as 'If inflation is high and investment is high, then imports are high' or 'If the exchange rate is high and exports are high, then imports are low.' These rules are formed systematically by combining a theoretical understanding of the relationships between variables with empirical observations of historical trends in the data. The fuzzy output, in the form of linguistic categories, is converted to numerical form through label encoding to ensure compatibility with ML models that require numerical data. The encoded features do not replace the original numerical features but are added as new ones, thus enriching the data representation. Each instance in the dataset therefore has a numerical version and a categorical version of the same economic variable, reflecting both quantitative and conceptual dimensions. The rationale for using fuzzy logic is to integrate economic domain insights into ML processes, which often rely on purely statistical relationships. This approach can be considered a form of knowledge-guided feature augmentation, where fuzzy features introduce an interpretative structure that can potentially help models capture complex and non-explicit relationships between variables. Although this method is unconventional in economic time series prediction, the addition of fuzzy features may also increase the capacity of the model to distinguish between different economic conditions more sharply, thus supporting more robust short- and long-term prediction accuracy.

2.3. Method

According to Fig. 4, the research begins by collecting data from the World Bank.
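The data collection step might look roughly like the sketch below, which builds a request URL for the public World Bank API (v2) and parses the two-element JSON response it returns. The paper does not list the indicator codes it used; the codes in `INDICATORS` are our assumption based on the feature names in Table 1 and should be checked against the World Bank catalogue. The helper names and the sample payload values are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.worldbank.org/v2"

# Assumed mapping from Table 1 feature names to World Bank indicator codes.
INDICATORS = {
    "imports_pct_gdp": "NE.IMP.GNFS.ZS",
    "exports_pct_gdp": "NE.EXP.GNFS.ZS",
    "exchange_rate": "PA.NUS.FCRF",
    "fdi_pct_gdp": "BX.KLT.DINV.WD.GD.ZS",
    "gdp_per_capita": "NY.GDP.PCAP.CD",
    "inflation": "FP.CPI.TOTL.ZG",
}


def build_url(indicator, country="IDN", start=1970, end=2023):
    """Build the request URL for one indicator and one country."""
    query = urlencode(
        {"format": "json", "date": f"{start}:{end}", "per_page": 100},
        safe=":",
    )
    return f"{API_ROOT}/country/{country}/indicator/{indicator}?{query}"


def parse_series(payload):
    """Turn a decoded [metadata, records] response into {year: value}."""
    _metadata, records = payload
    return {
        int(record["date"]): record["value"]
        for record in records or []
        if record["value"] is not None
    }


def fetch_series(indicator):
    """Download one indicator for Indonesia (requires network access)."""
    with urlopen(build_url(indicator)) as response:
        return parse_series(json.load(response))


# Offline example with a made-up payload in the same shape as the API response.
sample_payload = [
    {"page": 1, "pages": 1},
    [{"date": "2023", "value": 20.0}, {"date": "2022", "value": None}],
]
sample_series = parse_series(sample_payload)
```

Fetching each indicator separately and merging the resulting dictionaries by year mirrors the paper's description of acquiring each variable on its own before combining them into one dataset.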
The data collected includes macroeconomic indicators that are believed to affect the value of imports. Once the data is obtained, a pre-processing stage is performed with several steps, including data standardisation and transformation to ensure a uniform scale. In this study, outliers in the dataset were left unmodified to maintain the original characteristics of the data. The data exploration stage is then carried out to understand the characteristics and patterns in the dataset by visualising trends and distributions, and by examining the correlation between the variables and the value of imports. This study uses three ML models to predict import values and then compares their performance. The models are SVR, Random Forest, and Decision Tree. These three models were chosen based on the advantages of each algorithm in handling non-linear data such as economic data. SVR is used because it is widely applied to time series prediction and has a regularisation mechanism that can reduce the risk of overfitting, especially when the number of samples is limited, as is often the case in time series prediction. SVR has clearer advantages in making predictions using small-sized and non-linear data [ 18 ]. Random Forest is used because it has high flexibility in performing nonlinear regression and is able to capture complex interactions between variables without requiring strict statistical assumptions as in conventional econometric models. In addition, this model is also resistant to overfitting and relatively easy to optimise because it has only a few hyperparameters that can be adjusted for each prediction task through a specific training approach and the use of cross-validation to objectively evaluate model performance [ 19 ].
A Decision Tree is used because it is highly adaptive to data structure and characteristics, such as sparsity and smoothness, without requiring strict distribution assumptions. These characteristics make it relevant to time series data that are generally time-dependent and have non-linear patterns. Theoretically, Decision Tree has also been shown to be statistically consistent, even in the context of large-scale predictive models with an increasing number of predictors. In addition, this model is able to adjust the complexity of the tree structure to the information contained in the data through pruning mechanisms and setting the depth of the tree [ 20 ]. In the data-splitting step, the data is divided into a training set (1970–2012), used to train the model, and a testing set (2013–2023), which contains data the model has never seen. This temporal data-splitting strategy is applied to prevent data leakage: the model learns only from past information and gets no clues from future data. This approach is important for time series data because it maintains the chronological order of the data and reflects real-world prediction conditions, where model outputs should be generated based on historical information without interference from later data [ 21 ]. The data are then standardised so that each variable has the same scale. After standardisation, the model is trained on the training set and tested on the test set, and its performance is evaluated using MAE, MSE, and R-squared (R²). After the model is trained and tested, it is tested again by predicting the import value in 2023, in order to assess its ability to predict future data. The 2023 import value is excluded, while the other variables are still used as inputs to predict imports in 2023.
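The chronological split, standardisation and evaluation metrics described above can be sketched in plain Python as follows. This is a minimal illustration under our own assumptions, not the authors' pipeline: the row layout (`year`, `imports`), the helper names, and the synthetic values are hypothetical, and the scaler statistics are computed on the training years only so that no information from 2013–2023 leaks into training.

```python
from statistics import mean, pstdev


def temporal_split(rows, last_train_year=2012):
    """Split rows chronologically: train up to last_train_year, test after."""
    train = [r for r in rows if r["year"] <= last_train_year]
    test = [r for r in rows if r["year"] > last_train_year]
    return train, test


def fit_scaler(values):
    """Compute mean and standard deviation on training values only."""
    sigma = pstdev(values)
    return mean(values), sigma if sigma > 0 else 1.0


def transform(values, scaler):
    """Standardise values with previously fitted statistics."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]


def mse(y_true, y_pred):
    return mean((a - b) ** 2 for a, b in zip(y_true, y_pred))


def mae(y_true, y_pred):
    return mean(abs(a - b) for a, b in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - y_bar) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic rows for 1970-2023; the values are placeholders, not real data.
rows = [{"year": y, "imports": float(y % 7)} for y in range(1970, 2024)]
train, test = temporal_split(rows)
scaler = fit_scaler([r["imports"] for r in train])
train_scaled = transform([r["imports"] for r in train], scaler)
test_scaled = transform([r["imports"] for r in test], scaler)

# A small constant error on a low-variance target gives a negative R².
flat_truth = [1.0, 1.1, 1.2]
offset_pred = [1.3, 1.4, 1.5]
```

The last two lists show that MSE and R² can disagree: the mean squared error is only 0.09, yet R² is strongly negative because the target itself barely varies. This is worth keeping in mind when the two metrics move in opposite directions.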
The 2023 prediction is then compared with the actual value to measure the accuracy of the model. This comparison is used to analyse and evaluate the factors that affect the prediction of import values.

3. Results and Discussion

This section discusses the results of an analysis of the performance of various ML models in predicting import values. The models compared include SVR, Random Forest, and Decision Tree. The analysis is based on evaluation metrics such as MSE, MAE, and R².

3.1. Training and Analysis of Model Results

The models used in this study were trained on historical import data from 1970 to 2022. Manual tuning was used as the hyperparameter tuning method to determine the best parameter combination for each ML model. The parameters used in each model are listed in Table 2.

Table 2. Model parameters

| Model | Parameters |
|---|---|
| SVR | Kernel = linear; C = 3.1; Gamma = scale; Epsilon = 0.001 |
| Random Forest | N_estimators = 1; Max_depth = 10; Min_samples_split = 2; Min_samples_leaf = 2 |
| Decision Tree | Max_depth = 10; Min_samples_split = 5; Min_samples_leaf = 5; Max_leaf_nodes = 2 |

The performance of each model on the evaluation metrics can be seen in Table 3.

Table 3. Model performance

| Model | MSE | MAE | R² |
|---|---|---|---|
| SVR | 2.8645 | 1.3639 | 0.5956 |
| Random Forest | 3.6817 | 1.6650 | 0.4803 |
| Decision Tree | 7.1311 | 2.1963 | -0.0065 |

According to the evaluation results in Table 3, SVR has the best performance on all three metrics, with an MSE of 2.8645, an MAE of 1.3639, and an R² of 0.5956. SVR is thought to excel because it is able to handle non-linear complexity and has good built-in regularisation that helps avoid overfitting. The Random Forest model performed quite well, with an MSE of 3.6817, an MAE of 1.6650, and an R² of 0.4803.
The Decision Tree model, however, performed less well, with an MSE of 7.1311, an MAE of 2.1963, and an R² of -0.0065, which means it failed to explain the variation in the data. Once evaluated, the models were tested by predicting the import value in 2023; the results are in Table 4.

Table 4. Import prediction results for 2023

| Model | 2023 import prediction (%) | Actual value of imports 2023 (%) | Prediction distance from actual value (percentage points) |
|---|---|---|---|
| SVR | 20.4031 | 19.5690 | 0.8341 |
| Random Forest | 17.3933 | 19.5690 | 2.1757 |
| Decision Tree | 20.6067 | 19.5690 | 1.0377 |

In Table 4, the prediction of each model is compared with the actual value of imports in 2023, 19.5690% of GDP, to assess the accuracy of each model. The SVR model has the best result, with the smallest prediction distance from the actual value, 0.8341 percentage points. The Decision Tree is fairly close behind SVR, with a distance of 1.0377 percentage points. Random Forest has the largest prediction distance from the actual value, 2.1757 percentage points. These results are still considered unsatisfactory, especially for the Random Forest model, which shows a fairly large deviation from the actual value. Therefore, the data were enriched through upsampling with linear interpolation, increasing the amount of data to 100 rows over the time span 1970 to 2023. The additional data aims to improve the quality of model training by enriching the representation of available time patterns. The prediction models are then tested again using the interpolated data, and their performance is compared with the previous results to evaluate the effect of the additional data on prediction accuracy. Several parameters were changed in each model, as shown in Table 5.
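The upsampling step described above could look roughly like the following sketch. The paper states only that linear interpolation was used to grow the dataset from 54 to 100 rows; resampling onto 100 evenly spaced time points between 1970 and 2023 is our assumption about how that count was reached, and `upsample_linear` is a hypothetical helper rather than the authors' code.

```python
def upsample_linear(xs, ys, n_points):
    """Resample (xs, ys) onto n_points evenly spaced x values.

    xs must be strictly increasing. Each new y is obtained by linear
    interpolation between the two original points that bracket the new x.
    """
    if n_points < 2 or len(xs) < 2 or len(xs) != len(ys):
        raise ValueError("need at least two points and matching lengths")
    step = (xs[-1] - xs[0]) / (n_points - 1)
    new_xs, new_ys = [], []
    j = 0  # index of the left bracketing point
    for i in range(n_points):
        # Pin the final point to the last original x to avoid rounding drift.
        x = xs[-1] if i == n_points - 1 else xs[0] + i * step
        while j < len(xs) - 2 and xs[j + 1] < x:
            j += 1
        x0, x1 = xs[j], xs[j + 1]
        t = (x - x0) / (x1 - x0)
        new_xs.append(x)
        new_ys.append(ys[j] + t * (ys[j + 1] - ys[j]))
    return new_xs, new_ys


# Placeholder series: 54 annual points with made-up values.
years = list(range(1970, 2024))
values = [20.0 + (y % 5) for y in years]
dense_years, dense_values = upsample_linear(years, values, 100)
```

Because every interpolated value lies on a straight line between two observed neighbours, the new rows add no independent information, which is one reason such augmentation can change evaluation metrics without reflecting a real gain in predictive skill.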
Table 5. Model parameters with additional data

| Model | Parameters |
|---|---|
| SVR | Kernel = linear; C = 3.0; Gamma = scale; Epsilon = 0.15 |
| Random Forest | N_estimators = 1; Max_depth = 10; Min_samples_split = 2; Min_samples_leaf = 1 |
| Decision Tree | Max_depth = 5; Min_samples_split = 5; Min_samples_leaf = 10; Max_leaf_nodes = 10 |

Using the new parameters (Table 5) together with the additional data, the performance of each model on the evaluation metrics is shown in Table 6.

Table 6. Model performance with additional data and new parameters

| Model | MSE | MAE | R² |
|---|---|---|---|
| SVR | 0.0172 | 0.1274 | 0.5491 |
| Random Forest | 0.1663 | 0.3580 | -3.3529 |
| Decision Tree | 0.3165 | 0.5275 | -7.2817 |

Based on the evaluation results in Table 6, the performance of each model after adding data has improved in terms of MSE and MAE, which indicates that the prediction error is generally smaller. However, the R² value decreased in all models. This decrease is thought to be due to the change in data distribution caused by linear interpolation, which reduces the natural variability of the original data and leaves the models less able to explain a proportion of the total variation in the data. In other words, although the models were more accurate in predicting individual values, the global relationship between the independent and target variables became weaker. Once evaluated, the models trained on the augmented data were tested by predicting the import value in 2023; the results are in Table 7. According to Table 7, after the 2023 import value is predicted using the additional data, all models show an increase in accuracy, characterised by a smaller prediction distance from the actual value. The SVR model still produces the most accurate prediction, with a difference of 0.1854 percentage points from the actual value.
The accuracy of the Random Forest and Decision Tree models also improved, with a prediction distance of 0.8667% and 0.8855%, respectively.

Table 7. 2023 import prediction results with additional data and new parameters

| Model | 2023 import prediction (%) | Actual value of imports 2023 (%) | Prediction distance from actual value (%) |
|---|---|---|---|
| SVR | 19.3836 | 19.5690 | 0.1854 |
| Random Forest | 18.7023 | 19.5690 | 0.8667 |
| Decision Tree | 18.6835 | 19.5690 | 0.8855 |

3.2. Implications of the Findings

The findings of this study offer valuable practical and theoretical implications for economic forecasting and policy-making in Indonesia. On a practical level, the demonstration that ML models, particularly SVR, can achieve a high level of accuracy in predicting Indonesian import values using readily available macroeconomic data is significant. This provides policymakers and economic planners with a potentially powerful tool for enhancing the precision of national economic forecasts. More accurate import predictions can lead to more informed decisions regarding fiscal planning, monetary policy adjustments, and trade strategies, enabling more proactive and effective responses to economic dynamics. The identification of SVR as the best-performing model among those tested offers a concrete recommendation for practitioners seeking to implement ML-based import forecasting systems in the Indonesian context. Furthermore, the study's approach to identifying influential factors (as indicated by the correlation analysis and fuzzy logic integration) contributes to a deeper understanding of the underlying drivers of import fluctuations, which is crucial for designing targeted economic interventions. From a theoretical standpoint, this research contributes to the growing body of literature on the application of ML in macroeconomic forecasting, specifically focusing on an emerging market economy like Indonesia.
By conducting a comparative analysis of different ML algorithms for time series prediction in this context, the study provides empirical evidence supporting the feasibility and effectiveness of these advanced computational methods beyond developed economies, building upon prior work such as [ 14 ]. The results also highlight important considerations regarding data characteristics, such as the impact of data size and augmentation techniques (like linear interpolation) on model performance and interpretability (evidenced by the R² changes), offering insights for future methodological development in macroeconomic time series forecasting with ML. The exploration of integrating domain knowledge through fuzzy logic feature engineering also presents an avenue for further research into hybrid economic-ML modelling approaches.

3.3. Limitations

Despite contributing valuable insights into applying ML for Indonesian import value prediction, this study is subject to several limitations that warrant consideration when interpreting the results and planning future research. Firstly, a significant constraint is the relatively small size of the original historical dataset used (54 annual observations spanning from 1970 to 2023). While adequate for initial model training and comparison, a limited number of data points inherently restricts the models' ability to fully capture the diverse complexities and long-term dynamics of macroeconomic variables. A small sample size can also increase the risk of overfitting, particularly with more complex models, potentially limiting the generalisability of the findings to unseen future data beyond the analysis period. Secondly, the methodology included data augmentation through linear interpolation to increase the dataset size for training.
While this technique aimed to enrich data representation, as noted in the results, it appeared to reduce the natural variability within the data, which might explain the observed decrease in R² values after interpolation. This could imply that while individual point predictions might improve, the models' capacity to explain the overall variance and capture the true volatility inherent in real-world economic time series might be affected. Thirdly, the analysis relied on a specific set of five macroeconomic independent variables. Although chosen based on economic relevance, the dynamics of Indonesian imports are influenced by a broader array of factors, including global economic conditions, specific trade agreements and policies, and potential structural shifts in the economy. The exclusion of these or other potentially relevant variables might limit the predictive power and the models' ability to fully account for all drivers of import fluctuations. Furthermore, the comparison was limited to three specific ML models: SVR, Random Forest, and Decision Tree. While these models offer diverse approaches to non-linear regression, exploring other relevant time series forecasting techniques, including traditional econometric models (e.g., ARIMA variants with exogenous variables) or deep learning architectures (e.g., LSTMs), could potentially yield different or superior performance outcomes. Additionally, the use of manual hyperparameter tuning, as opposed to more systematic methods like grid search or randomised search with cross-validation, might mean that the best achievable performance of the evaluated models was not fully reached. Consequently, while the SVR model showed promising performance within this study's scope and data, particularly in predicting the 2023 value after data augmentation, the generalisability and long-term robustness of the models should be interpreted with caution due to these inherent data and methodological limitations.
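As an illustration of the more systematic tuning mentioned above, the following minimal sketch combines an exhaustive grid search with expanding-window cross-validation, so that each fold is evaluated only on years later than its training data. It is not the procedure used in this study: the moving-average forecaster is a toy stand-in for SVR, Random Forest, or Decision Tree, the series is synthetic rather than the World Bank data, and the function names, fold sizes, and candidate grid are all hypothetical.

```python
from statistics import mean


def expanding_window_splits(n_obs, n_splits, min_train):
    """Yield (train_idx, test_idx) pairs that never train on future data."""
    fold = (n_obs - min_train) // n_splits
    if fold < 1:
        raise ValueError("not enough observations for the requested splits")
    for k in range(n_splits):
        train_end = min_train + k * fold
        yield list(range(train_end)), list(range(train_end, train_end + fold))


def moving_average_forecast(history, window):
    """Toy forecaster (assumption): predict the mean of the last `window` values."""
    return mean(history[-window:])


def cv_mae(series, window, n_splits=4, min_train=30):
    """Mean absolute error of one-step-ahead forecasts over all folds."""
    errors = []
    for train_idx, test_idx in expanding_window_splits(len(series), n_splits, min_train):
        history = [series[i] for i in train_idx]
        for i in test_idx:
            errors.append(abs(series[i] - moving_average_forecast(history, window)))
            history.append(series[i])  # walk forward one year at a time
    return mean(errors)


def grid_search(series, windows):
    """Score every candidate hyperparameter value and return the best one."""
    scores = {w: cv_mae(series, w) for w in windows}
    return min(scores, key=scores.get), scores


# Synthetic stand-in for 54 annual observations (NOT the World Bank data).
series = [20.0 + 0.1 * t + (3.0 if t % 7 == 0 else 0.0) for t in range(54)]
best_window, scores = grid_search(series, windows=[1, 2, 3, 5, 8])
print(best_window, scores)
```

With 54 observations, a minimum training size of 30, and four folds, each fold tests on six consecutive years. The same loop structure would apply to any model with a fit-and-predict interface; only the forecaster and the grid would change.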
Future research should aim to address these points by incorporating larger and more diverse datasets, exploring alternative data augmentation or variable selection strategies, evaluating a wider range of forecasting models, and employing more exhaustive hyperparameter optimisation techniques to enhance the reliability and applicability of import prediction models.

4. Conclusion

This study has examined and compared the performance of three ML algorithms, namely SVR, Random Forest, and Decision Tree, in predicting the value of Indonesian imports using macroeconomic indicator data from 1970 to 2023. Model accuracy was evaluated using the MSE, MAE, and coefficient of determination (R²) metrics, and the prediction results for 2023 were then compared with actual data. The results indicate that the SVR model produces predictions closest to the actual values and also performs best on the evaluation metrics, which is consistent with the model's ability to capture complex and nonlinear relationships between economic variables. The addition of data through interpolation was shown to improve the accuracy of the 2023 import value prediction in all models tested. The SVR model produces the prediction with the smallest deviation, 0.1854%. Meanwhile, the Random Forest and Decision Tree models also showed improved performance, with prediction distances of 0.8667% and 0.8855% from the actual value, respectively. However, the relatively small size of the original dataset may affect the generalisability of the models, including SVR, despite their good historical performance. The limited data size could cause a model to overfit to local patterns or noise, degrading prediction performance on new data. Therefore, further validation with additional or external data is necessary to ensure the stability of the model in the context of long-term prediction.
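For readers who wish to check the reported figures, the sketch below gives plain-Python definitions of the three evaluation metrics named above and recomputes the 2023 prediction distances from the values in Table 7. The function and variable names are illustrative assumptions, not the authors' code, and the prediction distance is taken here to be the absolute difference between the predicted and actual values.

```python
def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Values copied from Table 7 (imports of goods and services, % of GDP).
ACTUAL_2023 = 19.5690
PREDICTED_2023 = {
    "SVR": 19.3836,
    "Random Forest": 18.7023,
    "Decision Tree": 18.6835,
}

# Prediction distance, assumed to be |actual - predicted|, rounded to 4 decimals.
distances = {
    model: round(abs(ACTUAL_2023 - value), 4)
    for model, value in PREDICTED_2023.items()
}
print(distances)
```

Under that assumption the computed distances agree with the 0.1854, 0.8667, and 0.8855 reported for SVR, Random Forest, and Decision Tree. Because the underlying values are shares of GDP, these distances are differences in percentage points.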
For further development, this research can be extended by integrating richer and more varied data, including global economic indicators as well as other external variables such as social conditions, politics, and international trade policies. In addition, hybrid or ensemble approaches that combine the performance of multiple algorithms can be explored to improve the accuracy and generalisability of the model.

Declarations

Competing Interests
The authors have no relevant financial or non-financial interests to disclose.

Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Acknowledgement
We would like to thank our friends who helped us in completing this study.

References

[1] H. Panta, M. L. Devkota, and D. Banjade, “Exports and Imports-Led Growth: Evidence from a Small Developing Economy,” J. Risk Financ. Manag., vol. 15, no. 1, p. 11, Jan. 2022, doi: 10.3390/jrfm15010011.
[2] T. Kircicek and G. Ozparlak, “The essential role of international trade on economic growth,” Pressacademia, Dec. 2023, doi: 10.17261/Pressacademia.2023.1826.
[3] X. Wan, S. A. Ajaz Kazmi, and C. Yeewong, “Manufacturing, Exports, and Sustainable Growth: Evidence from Developing Countries,” Sustain., vol. 14, no. 3, pp. 1–22, 2022, doi: 10.3390/su14031646.
[4] M. Stojanović, I. Božić-Miljković, J. Obradović, and L. Dimitrijević, “The impact of imports and exports on economic growth: Panel data analysis,” Ekonomika, vol. 69, no. 4, pp. 69–80, 2023, doi: 10.5937/ekonomika2304069S.
[5] E. Velaj and E. Bezhani, “The Impact of Import and Export to GDP Growth – The Case of Albania,” Rev. Econ. Financ., vol. 20, pp. 791–796, 2022, doi: 10.55365/1923.x2022.20.89.
[6] M. A. Khan et al., “Application of Machine Learning Algorithms for Sustainable Business Management Based on Macro-Economic Data: Supervised Learning Techniques Approach,” Sustain., vol. 14, no. 16, 2022, doi: 10.3390/su14169964.
[7] A. Panagiotelis, G. Athanasopoulos, R. J. Hyndman, B. Jiang, and F. Vahid, “Macroeconomic forecasting for Australia using a large number of predictors,” Int. J. Forecast., vol. 35, no. 2, pp. 616–633, Apr. 2019, doi: 10.1016/j.ijforecast.2018.12.002.
[8] V. S. Kumar, “Artificial Intelligence in Economic Analysis: An Overview of Techniques, Applications and Challenges,” Asian J. Econ. Financ. Manag., vol. 6, no. 1, pp. 388–396, Dec. 2024, doi: 10.56557/ajefm/2024/v6i1246.
[9] P. Goulet Coulombe, M. Leroux, D. Stevanovic, and S. Surprenant, “How is machine learning useful for macroeconomic forecasting?,” J. Appl. Econom., vol. 37, no. 5, pp. 920–964, 2022, doi: 10.1002/jae.2910.
[10] K. Maehashi and M. Shintani, “Macroeconomic forecasting using factor models and machine learning: an application to Japan,” J. Jpn. Int. Econ., vol. 58, p. 101104, Dec. 2020, doi: 10.1016/j.jjie.2020.101104.
[11] W. Li and K. L. E. Law, “Deep Learning Models for Time Series Forecasting: A Review,” IEEE Access, vol. 12, pp. 92306–92327, 2024, doi: 10.1109/ACCESS.2024.3422528.
[12] G. N. Jul, “Machine Learning for Economic Forecasting: An Application to China’s GDP Growth,” pp. 1–40, 2024.
[13] C. Heaton, N. Ponomareva, and Q. Zhang, “Forecasting models for the Chinese macroeconomy: the simpler the better?,” Empir. Econ., vol. 58, no. 1, pp. 139–167, Jan. 2020, doi: 10.1007/s00181-019-01788-0.
[14] A. Costantiello, L. Laureti, and A. Leogrande, “Open Access Estimation and Machine Learning Prediction of Imports of Goods in European Countries in the Period 2010-2019,” Am. J. Humanit. Soc. Sci. Res., no. 7, pp. 188–205, 2021.
[15] World Bank, “World Bank.” [Online]. Available: https://data.worldbank.org/
[16] A. Mishra, “Accessing the World Bank open data programmatically,” XRDS Crossroads, ACM Mag. Students, vol. 18, no. 2, pp. 44–45, Dec. 2011, doi: 10.1145/2043236.2043253.
[17] O. Sami, Y. Elsheikh, and F. Almasalha, “The Role of Data Pre-processing Techniques in Improving Machine Learning Accuracy for Predicting Coronary Heart Disease,” Int. J. Adv. Comput. Sci. Appl., vol. 12, no. 6, 2021, doi: 10.14569/IJACSA.2021.0120695.
[18] Y. J. Chen, J. A. Lin, Y. M. Chen, and J. H. Wu, “Financial Forecasting with Multivariate Adaptive Regression Splines and Queen Genetic Algorithm-Support Vector Regression,” IEEE Access, vol. 7, pp. 112931–112938, 2019, doi: 10.1109/ACCESS.2019.2927277.
[19] G. Dudek, “A Comprehensive Study of Random Forest for Short-Term Load Forecasting,” Energies, vol. 15, no. 20, 2022, doi: 10.3390/en15207547.
[20] J. M. Klusowski and P. M. Tian, “Large Scale Prediction with Decision Trees,” arXiv:2104.13881v5 [stat.ML], 13 Nov. 2023, pp. 1–47.
[21] M. A. Morid, O. R. L. I. U. Sheng, and J. Dunbar, “Time Series Prediction Using Deep Learning Methods in Healthcare,” vol. 14, no. 1, 2023, doi: 10.1145/3531326.

Additional Declarations
The authors declare no competing interests.
{"props":{"pageProps":{"initialData":{"identity":"rs-6691034","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":458195902,"identity":"21bf48b2-bb7c-4d0c-94d0-34c8afe11b84","order_by":0,"name":"Ferdinand Gusleo","email":"","orcid":"https://orcid.org/0009-0001-6275-8138","institution":"Sriwijaya University","correspondingAuthor":false,"prefix":"","firstName":"Ferdinand","middleName":"","lastName":"Gusleo","suffix":""},{"id":458195903,"identity":"7ecf58e3-3d59-44b3-82c8-5e2f7297cd7d","order_by":1,"name":"Rossi Passarella","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABIElEQVRIie2QMUvDQBTH7zi4Lk9vTWjQr3AlkFqo9qtEAnWJ0tGpnBSapeCafgs/gMODDC4F14yWgC4ZEpwKAb2IYoWL4OZwv+HucXc/7v8eIRbLP8Y7/CxAfGyyXahyflGAfxWu+qtCJO4dm5Rh76Godvdj4D3kVdXMPT+/zF6vZ+RYKHaTG5TRKg7c1fMUOIRsvV5mEORXU3cjySBFujgxKBJjTgAzHSxk7EChVuKAKknoHaFLUzD5+FLUDb4BF0+MQTMHP439WiuTTiUPZR8QgTv6F+AMpBNLVyvnXcooLYO+h5FWtgva9uJsykArTpRm5l6G4qKoSzw7EiLKiJ7YRCRtsGZ8epskW9PE5HdJ1f6FjsQM738qFovFYungHRJbWPib5btsAAAAAElFTkSuQmCC","orcid":"https://orcid.org/0000-0002-7243-0451","institution":"Sriwijaya University","correspondingAuthor":true,"prefix":"","firstName":"Rossi","middleName":"","lastName":"Passarella","suffix":""}],"badges":[],"createdAt":"2025-05-18 
10:01:38","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-6691034/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6691034/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":83222936,"identity":"9bc060c4-3fe6-4459-be71-7b311424cc1a","added_by":"auto","created_at":"2025-05-21 10:41:58","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":88049,"visible":true,"origin":"","legend":"\u003cp\u003eVariable graph of imports of goods and services (% of GDP)\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-6691034/v1/54eb832c9bae42194648b790.png"},{"id":83223289,"identity":"ec638ea9-1427-4f1c-9126-1ae584736779","added_by":"auto","created_at":"2025-05-21 10:49:59","extension":"jpeg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":183025,"visible":true,"origin":"","legend":"\u003cp\u003eVariable Distribution of Imports of Goods and Services (% of GDP)\u003c/p\u003e","description":"","filename":"floatimage2.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6691034/v1/580e05640253da08ce1b04be.jpeg"},{"id":83223288,"identity":"ec79cdb3-10d6-4d8d-b32d-b572a845a5c5","added_by":"auto","created_at":"2025-05-21 10:49:59","extension":"jpeg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":243019,"visible":true,"origin":"","legend":"\u003cp\u003eThe result of heatmap analysis is the correlation between several macroeconomic variables in the dataset 
used.\u003c/p\u003e","description":"","filename":"floatimage3.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6691034/v1/9b0d78cb42dfdbef442aada3.jpeg"},{"id":83222938,"identity":"71934bea-886c-4054-b4b1-94ad19e1399e","added_by":"auto","created_at":"2025-05-21 10:41:59","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":53105,"visible":true,"origin":"","legend":"\u003cp\u003eFlowchart of the study conducted\u003c/p\u003e","description":"","filename":"floatimage41.png","url":"https://assets-eu.researchsquare.com/files/rs-6691034/v1/a8f4c278c51665f4e837cf65.png"},{"id":83223292,"identity":"9b3db22d-c6a2-458d-9047-1631e225a720","added_by":"auto","created_at":"2025-05-21 10:50:04","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1162176,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6691034/v1/8aa461dc-1f82-48d8-ad34-ec0c47ff5bd5.pdf"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003e\u003cstrong\u003eForecasting Indonesian Goods and Services Imports Using Machine Learning: A Comparative Evaluation of Model Performance\u003c/strong\u003e\u003c/p\u003e","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eInternational trade stands as a cornerstone of the global economy, significantly influencing the trajectory of national development. For countries like Indonesia, export and import activities are not merely transactions; they are fundamental drivers of economic growth and crucial determinants of competitiveness in the international arena [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e], [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. 
The interplay between economic growth and foreign trade is multifaceted and dynamic, yet the critical role of trade in fostering both short-term stability and long-term development is undeniable [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eImports, often viewed primarily through the lens of trade balance, play a more nuanced role in a nation's economic structure. The ratio of imports to GDP, for instance, can reveal a country's reliance on essential capital goods or raw materials for its domestic production processes, directly impacting national productivity and overall output [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. While the discourse sometimes highlights potential negative economic impacts, evidence suggests that in certain contexts, the stimulative effect of imports on economic growth can be substantial, sometimes even surpassing that of exports [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. The precise impact of imports is highly dependent on the nature of the goods being imported and their integration into domestic production chains [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eGiven the profound impact of imports, accurate forecasting of import values is indispensable for effective economic planning and policy-making at the national level. Precise projections empower fiscal authorities and central banks to formulate more robust monetary and fiscal strategies and enable timely policy interventions in response to evolving economic conditions [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. In the complex domain of economic forecasting, identifying the most influential variables is key to developing accurate and reliable predictive models [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. 
Recent advancements in artificial intelligence (AI) and ML technologies have opened new avenues for applying sophisticated computational models to economic analysis and forecasting [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. ML, in particular, offers significant capabilities in macroeconomic forecasting by effectively capturing the non-linear relationships and inherent uncertainties present in economic data [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e], [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eEmpirical studies support the potential of ML in this domain. For instance, research focusing on predicting the imports of goods in 28 European countries between 2010 and 2019 utilised various ML techniques [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. This study demonstrated that ML models, such as gradient-boosted trees and random forests, were more efficient in predicting import values when compared to traditional linear methods, like generalised linear models. The evaluation in [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] showed that the Gradient Boosted Trees model yielded the best accuracy based on standard error metrics, underscoring the effectiveness of advanced ML approaches for this specific forecasting task. Furthermore, the study provided insights into key variables influencing imports, contributing valuable understanding for policymakers and highlighting the capability of ML models to predict and reveal underlying relationships. 
This work by [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] also noted its contribution in addressing a gap concerning the application of ML in European import prediction, thereby paving the way for similar analyses in other contexts.\u003c/p\u003e \u003cp\u003eDespite the potential of ML demonstrated in studies like [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e], macroeconomic forecasting, especially with complex time series data like import values, presents considerable challenges. Integrating relevant economic indicators while filtering out noise, alongside managing the sheer complexity and volume of historical data, can hinder model accuracy [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. To navigate these challenges and enhance prediction performance, a common and effective strategy involves a comparative evaluation of various ML models [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. By testing and comparing different algorithms, researchers can identify the most optimal approach tailored to the specific characteristics of the data. Forecasting accuracy is not always directly correlated with model complexity; simpler models can sometimes yield comparable or even superior results with lower prediction errors [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eBuilding upon the critical need for accurate import forecasting, recognising the potential of advanced computational methods like ML (as supported by prior research), and acknowledging the inherent challenges, this research aims to develop and evaluate ML-based models specifically for predicting the value of Indonesian goods and services imports. 
Beyond generating accurate forecasts through a comparative approach, the study also seeks to offer explanations for the key factors driving import fluctuations within the Indonesian economic context. This investigation is particularly relevant given the pivotal role of imports as a significant macroeconomic variable in Indonesia [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e] and aims to contribute to better economic planning and policy-making\u003c/p\u003e"},{"header":"2. Data and Method","content":"\u003cp\u003eThis section describes the data sources used, including the process of collecting and preparing the data to make it suitable for analysis. It also describes the data acquisition process to ensure its validity and reliability.\u003c/p\u003e \u003cp\u003eIn the methods section, the study discusses the type of approach applied in data processing. The description includes specific analytical techniques, such as classification or other statistical methods, that support the justification for choosing a particular method. This section also illustrates the data pre-processing stages, from pre-processing to the final stage of data preparation before further analysis.\u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1. Data\u003c/h2\u003e \u003cp\u003eThis study uses classified time series data sourced from the World Bank [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. The World Bank acts as a provider of financial and technical assistance to developing countries. Since 2010, the institution has taken the innovative step of providing open access through a web API that allows users, both individuals and organizations\u0026mdash; to easily download comprehensive development datasets from countries around the world. 
By doing so, the World Bank not only supports development projects but also facilitates more in-depth research and analysis through the availability of transparent and accessible data [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe dataset contains 54 rows of data covering the period from 1970 to 2023, and it includes 5 independent variables and 1 dependent variable. The challenge with this dataset is that it is relatively small in size to build a prediction model. The small number of rows in this dataset can increase the risk of overfitting, especially in models such as Random Forest, which is a complex model that captures noise rather than patterns that truly reflect the relationship between variables. In addition, the limited variation in the data may cause the model to have a limited understanding of potential future economic variations, potentially reducing accuracy when applied to data beyond the training range. Therefore, these factors should be considered when interpreting the prediction results and when using the model to support decision-making in economic policy.\u003c/p\u003e \u003cp\u003eThe data acquisition process from the World Bank was conducted separately for each variable. After obtaining all the required variables, they were combined into a single dataset that will be used in this research process. 
Information and descriptions about the data used can be seen in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDescription of features in the dataset\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFeatures\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDescription\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eImports of goods and services (% of GDP)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePercentage of import value of goods and services to GDP\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eExports of goods and services (% of GDP)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePercentage of the value of exports of goods and services to GDP\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOfficial exchange rate (LCU per US\u003cspan\u003e$\u003c/span\u003e, period average)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAnnual average official exchange rate between the local currency and the US Dollar\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eForeign direct investment, net inflows (% of GDP)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEquity of direct investment in the economy\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGDP per capita (current US\u003cspan\u003e$\u003c/span\u003e)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eValue of GDP per capita in US Dollars\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eInflation, consumer prices (annual %)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAnnual inflation rate based on changes in consumer prices\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e shows the trend of imports of goods and services as a percentage of a country's Gross Domestic Product (GDP) over the period 1960\u0026ndash;2020. The value of imports fluctuated during this period, with a peak of 43.2% in 1998. This spike is thought to be due to the riots that occurred in May 1998, which impacted the country's economic conditions and international trade. In general, the trend of imports showed an increase from 1960 to 2000, but thereafter, it tended to decline until it reached 19.6% in 2020. 
A significant decline occurred in the 2010\u0026ndash;2020 period, where the value of imports fell from 28.8% in 2010 to 19.6% in 2020.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eMeanwhile, Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e shows that the distribution of imports of goods and services in Indonesia during the period 1970\u0026ndash;2023 tends to be asymmetrical, with a right-skewed distribution. This evidence indicates that most years during the period had below-average import values, while there were some years with much higher import values. Import values are concentrated in the range of 15\u0026ndash;30% of GDP, with the peak of the distribution (mode) occurring in the range of 20\u0026ndash;25%. However, there were some years with very high import values, reaching more than 40% of GDP, as was the case in 1998, indicating economic turmoil in that period. The average imports of goods and services during 1970\u0026ndash;2023 was 23.7% of GDP, slightly above the distribution centre, indicating that Indonesia, in general, has a high degree of economic openness to international trade. The variation in import values is substantial, reflected by several extreme values, both above and below the average, reflecting the dynamics of the Indonesian economy, which is influenced by various macroeconomic factors and trade policies.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eAs for the heatmap visualisation results as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, it can be seen that imports of goods and services have a high correlation with several other macroeconomic variables in Indonesia. Imports of goods and services have a strong positive correlation (0.90) with exports of goods and services, indicating that import and export levels tend to move in the same direction. 
In addition, import of goods and services also has a moderate positive correlation with the exchange rate (0.08) and inflation (0.51), indicating that exchange rate depreciation and a high inflation rate may encourage an increase in imports. On the other hand, import of goods and services has a moderate negative correlation (-0.57) with foreign direct investment, indicating that an increase in foreign direct investment tends to be followed by a decrease in imports of goods and services. However, import of goods and services has a weak correlation with economic growth (GDP per capita), with a correlation value of only \u0026minus;\u0026thinsp;0.25. This indicates that import levels do not have a strong linear relationship with economic growth, and import dynamics are more influenced by other factors such as trade policy, exchange rates, and global economic conditions. Based on the heatmap analysis, the use of ML models, especially nonlinear models such as Random Forest, may be a more suitable approach than the usual linear regression model. This is because nonlinear models have a better ability to capture the complexity of the relationship between predictor variables and the target variable\u0026mdash;the import of goods and services. In addition, information from the heatmap also indicates the potential for very high multicollinearity between features, which needs to be considered in model selection and interpretation of regression results.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2. Data Pre-processing\u003c/h2\u003e \u003cp\u003eAfter collecting data from the World Bank website, the next stage is the pre-processing stage. This pre-processing stage is important to improve the accuracy of model predictions [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. 
The first step taken is data filtering; the dataset collected consists of various countries, and the data used in this study only focuses on the country of Indonesia. Therefore, data filtering is needed to take data from Indonesia only. Based on the available data, the time span used in this study is from 1970 to 2023. Although the original data covers the period 1960\u0026ndash;2023, the researcher decided to truncate the first 10 rows, so the analysis was only conducted on data starting from 1970. Based on the examination of the available data, it was found that there were no missing values in the dataset used in this study. Therefore, there is no need to fill in or impute missing data. Then, this study uses the fuzzy logic approach as part of the feature engineering strategy to classify the five main variables that affect imports, namely exports, exchange rates, foreign direct investment, GDP per capita, and inflation. This approach was chosen because fuzzy logic has the advantage of representing uncertainty in macroeconomic data, where the relationship between variables is not always linear and deterministic. Different from fixed threshold-based classification methods, fuzzy logic allows the formulation of domain knowledge-based linguistic inference rules, thus approximating the way economists think qualitatively.\u003c/p\u003e \u003cp\u003eEach macroeconomic variable is transformed into three fuzzy sets: low, medium and high. This process is done through a fuzzification approach based on historical data distribution and economic literature references. 
Next, fuzzy inference rules are constructed in IF-THEN form, such as \u0026lsquo;If inflation is high and investment is high, then imports are high\u0026rsquo; or \u0026lsquo;If the exchange rate is high and exports are high, then imports are low.\u0026rsquo; These rules are formed systematically by combining a theoretical understanding of the relationships between variables with empirical observations of historical trends in the data.\u003c/p\u003e \u003cp\u003eThe fuzzy output, in the form of linguistic categories, is converted to numerical form through label encoding to ensure compatibility with ML models that require numerical data. The encoded features do not replace the original numerical features but are added as new ones, thus enriching the data representation. Each instance in the dataset therefore has a numerical version and a categorical version of the same economic variable, reflecting both quantitative and conceptual dimensions.\u003c/p\u003e \u003cp\u003eThe rationale for using fuzzy logic is to integrate economic domain insights into ML processes, which often rely on purely statistical relationships. This approach can be considered a form of knowledge-guided feature augmentation, where fuzzy features introduce an interpretative structure that can potentially help models capture complex and non-explicit relationships between variables. Although this method is unconventional in economic time series prediction, the addition of fuzzy features is intended to increase the capacity of the model to distinguish between different economic conditions more sharply, thus supporting more robust short- and long-term prediction accuracy.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.3. Method\u003c/h2\u003e \u003cp\u003eAs shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e, the research begins by collecting data from the World Bank. 
The collected data include macroeconomic indicators that are believed to affect the value of imports. Once the data are obtained, they are pre-processed in several steps, including standardisation and transformation to ensure a uniform scale. Outliers in the dataset were left unchanged to preserve the original characteristics of the data. Data exploration is then carried out to understand the characteristics and patterns in the dataset by visualising the series and their distributions, and by examining the correlation between each variable and the value of imports.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThis study uses three ML models to predict import values and then compares their performance. The models are SVR, Random Forest, and Decision Tree. These three models were chosen for the strengths of each algorithm in handling non-linear data such as economic data.\u003c/p\u003e \u003cp\u003eSVR is used because it is widely applied to time series prediction and has a regularisation mechanism that can reduce the risk of overfitting, especially when the number of data samples is limited, as is often the case in time series prediction. SVR has clear advantages when making predictions from small, non-linear datasets [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eRandom Forest is used because it is highly flexible in performing nonlinear regression and is able to capture complex interactions between variables without requiring the strict statistical assumptions of conventional econometric models. 
In addition, this model is resistant to overfitting and relatively easy to optimise because it has only a few hyperparameters, which can be adjusted for each prediction task and evaluated objectively using cross-validation [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eDecision Tree is used because it adapts well to data structure and characteristics, such as sparsity and smoothness, without requiring strict distributional assumptions. These characteristics make it relevant to time series data, which are generally time-dependent and non-linear. Theoretically, Decision Tree has also been shown to be statistically consistent, even in large-scale predictive models with an increasing number of predictors. In addition, this model can adjust the complexity of the tree structure to the information contained in the data through pruning and by limiting the depth of the tree [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eFor data splitting, the data are divided into a training set (1970\u0026ndash;2012), used to train the model, and a testing set (2013\u0026ndash;2023), containing data that the model has never seen. This temporal split is applied to prevent data leakage: the model learns only from past information and gets no clues from future data. This approach is important for time series data because it maintains the chronological order of the data and reflects real-world prediction conditions, where model outputs should be generated from historical information without interference from later data [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. 
The data are then standardised so that each variable has the same scale. After standardisation, each model is trained on the training set and then tested on the test set, and its performance is evaluated using MAE, MSE, and R-squared (R\u003csup\u003e2\u003c/sup\u003e).\u003c/p\u003e \u003cp\u003eAfter the model is trained and tested, it is tested again by predicting the import value in 2023, to assess its ability to predict future data. The 2023 import value is withheld, while the other variables are still used as inputs to predict imports in 2023. The prediction is then compared with the actual value to measure the accuracy of the model. This is used to analyse the factors that affect the prediction of import values and evaluate them.\u003c/p\u003e \u003c/div\u003e"},{"header":"3. Results and Discussion","content":"\u003cp\u003eThis section discusses the performance of the ML models in predicting import values. The models compared are SVR, Random Forest, and Decision Tree. The analysis is based on the evaluation metrics MSE, MAE, and R\u0026sup2;.\u003c/p\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e3.1. Training and Analysis of Model Results\u003c/h2\u003e \u003cp\u003eThe models used in this study were trained using a dataset of historical import values from 1970 to 2022. Hyperparameters were tuned manually to determine the best parameter combination for each ML model used. 
The parameters used in each model are listed in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eModel Parameters\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c5\" namest=\"c2\"\u003e \u003cp\u003e\u003cem\u003eParameter\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSVR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eKernel\u0026thinsp;=\u0026thinsp;linear\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eC\u0026thinsp;=\u0026thinsp;3.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eGamma\u0026thinsp;=\u0026thinsp;Scale\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eEpsilon\u0026thinsp;=\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003eRandom Forest\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eN_estimators\u0026thinsp;=\u0026thinsp;1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMax_Depth\u0026thinsp;=\u0026thinsp;10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMin_samples_split\u0026thinsp;=\u0026thinsp;2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMin_samples_leaf\u0026thinsp;=\u0026thinsp;2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDecision Tree\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMax_Depth\u0026thinsp;=\u0026thinsp;10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMin_samples_split\u0026thinsp;=\u0026thinsp;5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMin_samples_leaf\u0026thinsp;=\u0026thinsp;5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMax_leaf_nodes\u0026thinsp;=\u0026thinsp;2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe performance of each model on the evaluation metrics is shown in Table\u0026nbsp;3.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eModel Performance\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" 
colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMSE\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMAE\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eR\u0026sup2;\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSVR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2.8645\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1.3639\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.5956\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRandom Forest\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e3.6817\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1.6650\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.4803\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDecision Tree\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e7.1311\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2.1963\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e 
\u003cp\u003e\u0026minus;\u0026thinsp;0.0065\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eAccording to the evaluation results in Table\u0026nbsp;3, SVR performs best, with an MSE of 2.8645, an MAE of 1.3639, and an R\u0026sup2; of 0.5956. This is attributed to its ability to handle non-linear complexity and to its built-in regularisation, which helps avoid overfitting. The Random Forest model performed moderately well, with an MSE of 3.6817, an MAE of 1.6650, and an R\u0026sup2; of 0.4803. The Decision Tree model performed worst, with an MSE of 7.1311, an MAE of 2.1963, and an R\u0026sup2; of \u0026minus;0.0065, which means it failed to explain the variation in the data.\u003c/p\u003e \u003cp\u003eOnce evaluated, the models were tested by predicting the import value in 2023; the results are shown in Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eImport Prediction Results for 2023\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e 
\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2023 Import Prediction (%)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eActual Value of Import 2023 (%)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePrediction Distance from Actual Value (%)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSVR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e20.4031\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003e19.5690\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.8341\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRandom Forest\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e17.3933\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e2.1757\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDecision Tree\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e20.6067\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1.0377\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eIn Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e, the prediction of each model is compared with the actual 2023 import value of 19.5690% to assess its accuracy. 
The SVR model gives the best result, with the smallest prediction distance from the actual value, 0.8341%. The Decision Tree is close behind, with a distance of 1.0377%. Random Forest has the largest prediction distance from the actual value, 2.1757%.\u003c/p\u003e \u003cp\u003eThese results are still considered unsatisfactory, especially for the Random Forest model, which deviates considerably from the actual value. Therefore, the data were enriched by upsampling with linear interpolation, increasing the dataset to 100 rows over the period 1970 to 2023. The additional data aim to improve model training by enriching the representation of the available temporal patterns. The prediction models are then tested again using the interpolated data, and their performance is compared with the previous results to evaluate the effect of the additional data on prediction accuracy.\u003c/p\u003e \u003cp\u003eSeveral parameters were changed in each model, as shown in Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eModel Parameters with Additional Data\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" 
colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c5\" namest=\"c2\"\u003e \u003cp\u003e\u003cem\u003eParameter\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSVR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eKernel\u0026thinsp;=\u0026thinsp;linear\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eC\u0026thinsp;=\u0026thinsp;3.0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eGamma\u0026thinsp;=\u0026thinsp;Scale\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eEpsilon\u0026thinsp;=\u0026thinsp;0.15\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRandom Forest\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eN_estimators\u0026thinsp;=\u0026thinsp;1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMax_Depth\u0026thinsp;=\u0026thinsp;10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMin_samples_split\u0026thinsp;=\u0026thinsp;2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMin_samples_leaf\u0026thinsp;=\u0026thinsp;1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDecision Tree\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMax_Depth\u0026thinsp;=\u0026thinsp;5\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMin_samples_split\u0026thinsp;=\u0026thinsp;5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMin_samples_leaf\u0026thinsp;=\u0026thinsp;10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMax_leaf_nodes\u0026thinsp;=\u0026thinsp;10\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eWith the new parameters (Table \u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e) and the additional data, the performance of each model on the evaluation metrics is shown in Table\u0026nbsp;\u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab6\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 6\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eModel performance with additional data and new parameters\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMSE\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMAE\u003c/p\u003e \u003c/th\u003e 
\u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eR\u0026sup2;\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSVR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.0172\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.1274\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.5491\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRandom Forest\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.1663\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.3580\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e-3.3529\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDecision Tree\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.3165\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.5275\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e-7.2817\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eBased on the evaluation results in Table\u0026nbsp;\u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e, the performance of each model after adding data has improved in terms of MSE and MAE values, which indicates that the prediction error is generally smaller. However, the R\u0026sup2; value actually decreased in all models. 
This decrease in R\u0026sup2; values is thought to result from the change in data distribution caused by linear interpolation, which reduces the natural variability of the original data and leaves the models less able to explain the total variation in the data. In other words, although the models were more accurate in predicting individual values, the global relationship between the independent and target variables became weaker.\u003c/p\u003e \u003cp\u003eOnce evaluated, the models trained on the augmented data were tested by predicting the import value in 2023; the results are shown in Table\u0026nbsp;\u003cspan refid=\"Tab7\" class=\"InternalRef\"\u003e7\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eAccording to Table\u0026nbsp;\u003cspan refid=\"Tab7\" class=\"InternalRef\"\u003e7\u003c/span\u003e, when the 2023 import value is predicted using the additional data, all models show improved accuracy, characterised by a smaller prediction distance from the actual value. The SVR model still produces the most accurate prediction, with a difference of 0.1854% from the actual value. 
The accuracy of the Random Forest and Decision Tree models also improved, with prediction distances of 0.8667% and 0.8855%, respectively.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab7\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 7\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003e2023 Import Prediction Results with Additional Data and New Parameters\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2023 Import Prediction (%)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eActual Value of Import 2023 (%)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePrediction Distance from Actual Value (%)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSVR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e19.3836\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003e19.5690\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.1854\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRandom Forest\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e18.7023\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.8667\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDecision Tree\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e18.6835\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.8855\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e3.2. Implications of the Findings\u003c/h2\u003e \u003cp\u003eThe findings of this study offer valuable practical and theoretical implications for economic forecasting and policy-making in Indonesia. On a practical level, the demonstration that ML models, particularly SVR, can achieve a high level of accuracy in predicting Indonesian import values using readily available macroeconomic data is significant. This provides policymakers and economic planners with a potentially powerful tool for enhancing the precision of national economic forecasts. More accurate import predictions can lead to more informed decisions regarding fiscal planning, monetary policy adjustments, and trade strategies, enabling more proactive and effective responses to economic dynamics. The identification of SVR as the best-performing model among those tested offers a concrete recommendation for practitioners seeking to implement ML-based import forecasting systems in the Indonesian context. 
Furthermore, the study's approach to identifying influential factors (as indicated by the correlation analysis and fuzzy logic integration) contributes to a deeper understanding of the underlying drivers of import fluctuations, which is crucial for designing targeted economic interventions.\u003c/p\u003e \u003cp\u003eFrom a theoretical standpoint, this research contributes to the growing body of literature on the application of ML in macroeconomic forecasting, specifically focusing on an emerging market economy like Indonesia. By conducting a comparative analysis of different ML algorithms for time series prediction in this context, the study provides empirical evidence supporting the feasibility and effectiveness of these advanced computational methods beyond developed economies, building upon prior work such as [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. The results also highlight important considerations regarding data characteristics, such as the impact of data size and augmentation techniques (like linear interpolation) on model performance and interpretability (evidenced by the R\u0026sup2; changes), offering insights for future methodological development in macroeconomic time series forecasting with ML. The exploration of integrating domain knowledge through fuzzy logic feature engineering also presents an avenue for further research into hybrid economic-ML modelling approaches.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e3.3. Limitations\u003c/h2\u003e \u003cp\u003eDespite contributing valuable insights into applying ML to Indonesian import value prediction, this study is subject to several limitations that warrant consideration when interpreting the results and planning future research.\u003c/p\u003e \u003cp\u003eFirstly, a significant constraint is the relatively small size of the original historical dataset used (54 annual observations spanning 1970 to 2023). 
While adequate for initial model training and comparison, a limited number of data points inherently restricts the models' ability to fully capture the diverse complexities and long-term dynamics of macroeconomic variables. A small sample size can also increase the risk of overfitting, particularly with more complex models, potentially limiting the generalisability of the findings to unseen future data beyond the analysis period.\u003c/p\u003e \u003cp\u003eSecondly, the methodology included data augmentation through linear interpolation to increase the dataset size for training. While this technique aimed to enrich data representation, as noted in the results, it appeared to reduce the natural variability within the data, since interpolated points lie on straight lines between observed values, which might explain the observed decrease in R\u0026sup2; values after interpolation. This could imply that, while individual point predictions might improve, the models' capacity to explain the overall variance and capture the true volatility inherent in real-world economic time series is weakened.\u003c/p\u003e \u003cp\u003eThirdly, the analysis relied on a specific set of five macroeconomic independent variables. Although these variables were chosen for their economic relevance, Indonesian imports are influenced by a broader array of factors, including global economic conditions, specific trade agreements and policies, and potential structural shifts in the economy. Excluding these or other potentially relevant variables might limit the models' predictive power and their ability to account for all drivers of import fluctuations.\u003c/p\u003e \u003cp\u003eFurthermore, the comparison was limited to three specific ML models: SVR, Random Forest, and Decision Tree. 
While these models offer diverse approaches to non-linear regression, other time series forecasting techniques, including traditional econometric models (e.g., ARIMA variants with exogenous variables) or deep learning architectures (e.g., LSTMs), could yield different or superior performance. Additionally, because hyperparameters were tuned manually rather than with more systematic methods such as grid search or randomised search with cross-validation, the evaluated models may not have reached their best attainable performance.\u003c/p\u003e \u003cp\u003eConsequently, while the SVR model showed promising performance within this study's scope and data, particularly in predicting the 2023 value after data augmentation, the generalisability and long-term robustness of the models should be interpreted with caution given these data and methodological limitations. Future research should address these points by incorporating larger and more diverse datasets, exploring alternative data augmentation or variable selection strategies, evaluating a wider range of forecasting models, and employing more exhaustive hyperparameter optimisation techniques to enhance the reliability and applicability of import prediction models.\u003c/p\u003e \u003c/div\u003e"},{"header":"4. Conclusion","content":"\u003cp\u003eThis study examined and compared the performance of three ML algorithms, namely SVR, Random Forest, and Decision Tree, in predicting the value of Indonesian imports using macroeconomic indicator data from 1970 to 2023. Model accuracy was evaluated using the MSE, MAE, and coefficient of determination (R\u0026sup2;) metrics, and the prediction results for 2023 were then compared with actual data. 
The results indicate that the SVR model produces predictions closest to the actual values and also performs best on the evaluation metrics, which is consistent with its ability to capture complex, nonlinear relationships between economic variables. Augmenting the data through linear interpolation improved the accuracy of the 2023 import value prediction for all models tested. The SVR model produced the smallest deviation from the actual value, 0.1854%, while the Random Forest and Decision Tree models deviated by 0.8667% and 0.8855%, respectively. However, the relatively small size of the original dataset may limit the generalisability of all models, including SVR, despite their good historical performance. With so few observations, a model can overfit to local patterns or noise, which degrades prediction performance on new data. Further validation with additional or external data is therefore necessary to confirm the stability of the models for long-term prediction. Future work can extend this research by integrating richer and more varied data, including global economic indicators and other external variables such as social conditions, politics, and international trade policies. 
In addition, hybrid or ensemble approaches that combine the performance of multiple algorithms can be explored to improve the accuracy and generalisability of the model.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003ch2\u003eCompeting Interests\u003c/h2\u003e \u003cp\u003eThe authors have no relevant financial or non-financial interests to disclose\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e \u003cp\u003eThe authors declare that no funds, grants, or other support were received during the preparation of this manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgement\u003c/h2\u003e \u003cp\u003eWe would like to thank our friends who helped us in completing this study.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eH. Panta, M. L. Devkota, and D. Banjade, \u0026ldquo;Exports and Imports-Led Growth: Evidence from a Small Developing Economy,\u0026rdquo; \u003cem\u003eJ. Risk Financ. Manag.\u003c/em\u003e, vol. 15, no. 1, p. 11, Jan. 2022, doi: 10.3390/jrfm15010011.\u003c/li\u003e\n\u003cli\u003eT. Kircicek and G. Ozparlak, \u0026ldquo;The essential role of international trade on economic growth,\u0026rdquo; \u003cem\u003ePressacademia\u003c/em\u003e, Dec. 2023, doi: 10.17261/Pressacademia.2023.1826.\u003c/li\u003e\n\u003cli\u003eX. Wan, S. A. Ajaz Kazmi, and C. Yeewong, \u0026ldquo;Manufacturing, Exports, and Sustainable Growth: Evidence from Developing Countries,\u0026rdquo; \u003cem\u003eSustain.\u003c/em\u003e, vol. 14, no. 3, pp. 1\u0026ndash;22, 2022, doi: 10.3390/su14031646.\u003c/li\u003e\n\u003cli\u003eM. Stojanović, I. Božić-Miljković, J. Obradović, and L. Dimitrijević, \u0026ldquo;The impact of imports and exports on economic growth: Panel data analysis,\u0026rdquo; \u003cem\u003eEkonomika\u003c/em\u003e, vol. 69, no. 4, pp. 69\u0026ndash;80, 2023, doi: 10.5937/ekonomika2304069S.\u003c/li\u003e\n\u003cli\u003eE. Velaj and E. 
Bezhani, \u0026ldquo;The Impact of Import and Export to GDP Growth \u0026ndash; The Case of Albania,\u0026rdquo; \u003cem\u003eRev. Econ. Financ.\u003c/em\u003e, vol. 20, pp. 791\u0026ndash;796, 2022, doi: 10.55365/1923.x2022.20.89.\u003c/li\u003e\n\u003cli\u003eM. A. Khan \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Application of Machine Learning Algorithms for Sustainable Business Management Based on Macro-Economic Data: Supervised Learning Techniques Approach,\u0026rdquo; \u003cem\u003eSustain.\u003c/em\u003e, vol. 14, no. 16, 2022, doi: 10.3390/su14169964.\u003c/li\u003e\n\u003cli\u003eA. Panagiotelis, G. Athanasopoulos, R. J. Hyndman, B. Jiang, and F. Vahid, \u0026ldquo;Macroeconomic forecasting for Australia using a large number of predictors,\u0026rdquo; \u003cem\u003eInt. J. Forecast.\u003c/em\u003e, vol. 35, no. 2, pp. 616\u0026ndash;633, Apr. 2019, doi: 10.1016/j.ijforecast.2018.12.002.\u003c/li\u003e\n\u003cli\u003eV. S. Kumar, \u0026ldquo;Artificial Intelligence in Economic Analysis: An Overview of Techniques, Applications and Challenges,\u0026rdquo; \u003cem\u003eAsian J. Econ. Financ. Manag.\u003c/em\u003e, vol. 6, no. 1, pp. 388\u0026ndash;396, Dec. 2024, doi: 10.56557/ajefm/2024/v6i1246.\u003c/li\u003e\n\u003cli\u003eP. Goulet Coulombe, M. Leroux, D. Stevanovic, and S. Surprenant, \u0026ldquo;How is machine learning useful for macroeconomic forecasting?,\u0026rdquo; \u003cem\u003eJ. Appl. Econom.\u003c/em\u003e, vol. 37, no. 5, pp. 920\u0026ndash;964, 2022, doi: 10.1002/jae.2910.\u003c/li\u003e\n\u003cli\u003eK. Maehashi and M. Shintani, \u0026ldquo;Macroeconomic forecasting using factor models and machine learning: an application to Japan,\u0026rdquo; \u003cem\u003eJ. Jpn. Int. Econ.\u003c/em\u003e, vol. 58, p. 101104, Dec. 2020, doi: 10.1016/j.jjie.2020.101104.\u003c/li\u003e\n\u003cli\u003eW. Li and K. L. E. Law, \u0026ldquo;Deep Learning Models for Time Series Forecasting: A Review,\u0026rdquo; \u003cem\u003eIEEE Access\u003c/em\u003e, vol. 
12, pp. 92306\u0026ndash;92327, 2024, doi: 10.1109/ACCESS.2024.3422528.\u003c/li\u003e\n\u003cli\u003eG. N. Jul, \u0026ldquo;Machine Learning for Economic Forecasting: An Application to China\u0026rsquo;s GDP Growth,\u0026rdquo; pp. 1\u0026ndash;40, 2024.\u003c/li\u003e\n\u003cli\u003eC. Heaton, N. Ponomareva, and Q. Zhang, \u0026ldquo;Forecasting models for the Chinese macroeconomy: the simpler the better?,\u0026rdquo; \u003cem\u003eEmpir. Econ.\u003c/em\u003e, vol. 58, no. 1, pp. 139\u0026ndash;167, Jan. 2020, doi: 10.1007/s00181-019-01788-0.\u003c/li\u003e\n\u003cli\u003eA. Costantiello, L. Laureti, and A. Leogrande, \u0026ldquo;Open Access Estimation and Machine Learning Prediction of Imports of Goods in European Countries in the Period 2010-2019,\u0026rdquo; \u003cem\u003eAm. J. Humanit. Soc. Sci. Res.\u003c/em\u003e, no. 7, pp. 188\u0026ndash;205, 2021.\u003c/li\u003e\n\u003cli\u003eWorld Bank, \u0026ldquo;World Bank.\u0026rdquo; [Online]. Available: https://data.worldbank.org/\u003c/li\u003e\n\u003cli\u003eA. Mishra, \u0026ldquo;Accessing the World Bank open data programmatically,\u0026rdquo; \u003cem\u003eXRDS Crossroads, ACM Mag. Students\u003c/em\u003e, vol. 18, no. 2, pp. 44\u0026ndash;45, Dec. 2011, doi: 10.1145/2043236.2043253.\u003c/li\u003e\n\u003cli\u003eO. Sami, Y. Elsheikh, and F. Almasalha, \u0026ldquo;The Role of Data Pre-processing Techniques in Improving Machine Learning Accuracy for Predicting Coronary Heart Disease,\u0026rdquo; \u003cem\u003eInt. J. Adv. Comput. Sci. Appl.\u003c/em\u003e, vol. 12, no. 6, 2021, doi: 10.14569/IJACSA.2021.0120695.\u003c/li\u003e\n\u003cli\u003eY. J. Chen, J. A. Lin, Y. M. Chen, and J. H. Wu, \u0026ldquo;Financial Forecasting with Multivariate Adaptive Regression Splines and Queen Genetic Algorithm-Support Vector Regression,\u0026rdquo; \u003cem\u003eIEEE Access\u003c/em\u003e, vol. 7, pp. 112931\u0026ndash;112938, 2019, doi: 10.1109/ACCESS.2019.2927277.\u003c/li\u003e\n\u003cli\u003eG. 
Dudek, \u0026ldquo;A Comprehensive Study of Random Forest for Short-Term Load Forecasting,\u0026rdquo; \u003cem\u003eEnergies\u003c/em\u003e, vol. 15, no. 20, 2022, doi: 10.3390/en15207547.\u003c/li\u003e\n\u003cli\u003eJ. M. Klusowski and P. M. Tian, \u0026ldquo;Large Scale Prediction with Decision Trees arXiv : 2104 . 13881v5 [ stat . ML ] 13 Nov 2023,\u0026rdquo; pp. 1\u0026ndash;47.\u003c/li\u003e\n\u003cli\u003eM. A. Morid, O. R. L. I. U. Sheng, and J. Dunbar, \u0026ldquo;Time Series Prediction Using Deep Learning Methods in Healthcare,\u0026rdquo; vol. 14, no. 1, 2023, doi: 10.1145/3531326.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"Sriwijaya University","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Indonesia Imports, Forecasting, Machine Learning, Time Series Analysis, Comparative Study, Macroeconomic Data","lastPublishedDoi":"10.21203/rs.3.rs-6691034/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6691034/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eAccurate forecasting of import values is crucial for effective economic planning and policy-making in emerging economies 
like Indonesia. Traditional forecasting methods often face challenges in capturing the complex, non-linear dynamics inherent in macroeconomic time series data. This study evaluates the performance of three prominent Machine Learning (ML) models\u0026mdash;Support Vector Regression (SVR), Random Forest, and Decision Tree\u0026mdash;for forecasting Indonesian goods and services imports. Utilising historical macroeconomic time series data for Indonesia spanning 1970\u0026ndash;2023, the models were trained and rigorously evaluated using standard metrics, including mean squared error (MSE), mean absolute error (MAE), and the coefficient of determination (R\u003csup\u003e2\u003c/sup\u003e). To address the limitation of the relatively small original dataset size, data augmentation via linear interpolation was explored, and the models' prediction accuracy for the year 2023 was specifically assessed. The results indicate that SVR demonstrated superior performance compared to Random Forest and Decision Tree based on the evaluation metrics and achieved the highest accuracy in predicting the 2023 import value, particularly after data interpolation was applied, which generally improved point prediction accuracy. The findings suggest that ML models, especially SVR, are effective and promising tools for enhancing the precision of Indonesian import forecasting. 
This research provides valuable empirical evidence for policymakers and practitioners seeking to leverage advanced computational techniques for improved economic forecasting and planning in an emerging market context while also highlighting considerations related to data characteristics and augmentation strategies for future methodological advancements.\u003c/p\u003e","manuscriptTitle":"Forecasting Indonesian Goods and Services Imports Using Machine Learning: A Comparative Evaluation of Model Performance","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-05-21 10:41:52","doi":"10.21203/rs.3.rs-6691034/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"d2a3394e-3bca-4dc3-b805-d76edbfb4f87","owner":[],"postedDate":"May 21st, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":48684191,"name":"Macroeconomics"},{"id":48684192,"name":"Theoretical Computer Science"}],"tags":[],"updatedAt":"2025-05-21T10:41:52+00:00","versionOfRecord":[],"versionCreatedAt":"2025-05-21 10:41:52","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6691034","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6691034","identity":"rs-6691034","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
