Crop yield forecasting in Senegal: application of Machine Learning methods

Ndèye Khady Guissé SECK, Ablaye NGOM, Papa NGOM, Kandioura NOBA

Research Article (Research Square preprint). This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-6269900/v1. This work is licensed under a CC BY 4.0 License. Status: Under Review. Version 1.

Abstract

Two approaches are generally used to predict crop yields. The first is based on Machine Learning methods and the second on mechanistic models. In this study, the robustness of Machine Learning methods in predicting groundnut, millet and cotton yields in Senegal is assessed. These methods are stepwise Multiple Linear regression, Least Absolute Shrinkage and Selection Operator (LASSO) regression and Random Forest regression. These prediction models were tested using a collection of historical agricultural and climatic data for Senegal from 1980 to 2021.
Analysis of agricultural trends reveals marked inter-annual variability depending on the crop and period: variability was low between 1990 and 2000 for groundnuts and millet and between 2011 and 2021 for cotton, and high between 2000 and 2010 for groundnuts and cotton and between 1980 and 1990 for millet. Overall, area, production and yields fluctuate widely depending on the periods and crops studied. On the test dataset, the yield prediction models performed well for groundnut and millet, with high R² for LASSO regression (0.96 and 0.98) and stepwise multiple regression (0.93 and 0.98). Performance was weaker for cotton, for which Random Forest gave a very low R² (0.01). The LASSO regression gave the lowest values of RMSE and MAE. Overall, it is the best model for predicting groundnut, millet and cotton yields in Senegal.

Keywords: Prediction, Machine Learning, Models, Agricultural yields, Climate change

INTRODUCTION

Crop modelling has been developed since the 1980s and 1990s to predict crop yields. It is crucial for improving agricultural production (Torquebiau, 2015) and ensuring global food security. The low productivity of the agricultural sector is linked to many factors, including climatic disturbances (Noba et al., 2014). Numerous models have been proposed and validated to date (Ansarifar et al., 2021). These models make it possible to measure the links between climate and agriculture by transforming climate data (temperature, precipitation, etc.) into agronomic variables (crop yields, biomass, etc.) (Sultan et al., 2015). Most studies on crop yield prediction fall into two categories: process-based crop models, also known as mechanistic models, and data-driven machine learning models, which are mainly based on collected historical data (Sultan et al., 2015; Chipanshi et al., 2015; Maestrini et al., 2022; Chang et al., 2023).
In Senegal, different forecasting approaches have been used to study the effect of climate change on agricultural yields. The majority of this research relies on mechanistic methods to make predictions (Garcia, 2015; Kouakou, 2013). However, mechanistic models require a considerable amount of input data to become operational (Basso et al., 2013). In addition, these models have limited predictive performance due to the high variability of environmental conditions, which is associated with model structure and parameter decisions, beyond the spatio-temporal variability of yields observed over a wide area (Chang et al., 2023). Moreover, most mechanistic models implemented in software such as DSSAT, APSIM, SARRAH, etc. use daily data for model calibration. Machine Learning models, on the other hand, reduce the computational load (Bergez et al., 2023) and make it possible to produce easy, accurate and up-to-date forecasts (Reisi-gahrouei et al., 2019; Mirani et al., 2021). These models are very useful for high-resolution simulations over a vast area (Sultan et al., 2015; Bergez et al., 2023). They are able to implicitly account for additive effects and interactions of different parameters, which allows them to outperform most crop models in terms of prediction (Chang et al., 2023). With monthly data of sufficient quantity and quality, Machine Learning models can be effectively calibrated to provide accurate predictions. Thus, this study aims to assess the robustness of Machine Learning methods, namely stepwise Multiple Linear regression, LASSO regression and Random Forest regression, in predicting the yield of Senegal's main food and cash crops (groundnut, millet and cotton).

1 MATERIALS AND METHODS

1.1 Presentation of the study area

The study was carried out in Senegal, covering the period from 1980 to 2021.
Senegal is located in the extreme west of the African continent, between 12°5 and 16°5 north latitude and 11°5 and 17°5 west longitude. Covering an area of 196,722 km², it is bordered to the north by Mauritania, to the east by Mali, to the south by Guinea and Guinea Bissau, to the west by Gambia, and by the Atlantic Ocean along a 500 km coastline. Dakar (550 km²), the capital, is a peninsula in the far west. Senegal is subdivided into six eco-geographical (agro-ecological) zones. The delimitation of these zones is based on a combination of biophysical and socio-economic factors. The agro-ecological zones are: the Groundnut Basin, Casamance, the Niayes zone, Eastern Senegal, the River Valley and the sylvopastoral zone (Ferlo) (Fig. 1). Senegal has a dry tropical climate with two seasons: a dry season from November to June and a rainy season from July to October.

1.2 Data source and parameters used

Quantitative data were used in this study. These include agricultural data for groundnuts, millet and cotton from 1980 to 2021 (42 years), covering areas, production and yields, from the Direction de l'Analyse, de la Prévision et des Statistiques Agricoles (DAPSA). Climatic data such as rainfall, temperature and humidity were also obtained for the same period from the Agence Nationale de l'Aviation Civile et de la Météorologie du Sénégal (ANACIM). Additional data were extracted from online databases (FAOSTAT: https://www.fao.org/faostat/en/#home and Data Access Viewer (DAV): https://power.larc.nasa.gov/data-access-viewer/). These data cover all regions of Senegal. A total of 11 parameters were used to calibrate the model. These are area, production and yield for the agricultural parameters, and annual cumulative rainfall, maximum monthly rainfall from June to October, mean annual temperature, maximum monthly temperature (Tmax), minimum monthly temperature (Tmin), relative humidity and root zone soil wetness (RZSW) for the climatic parameters.
The variable year is mainly a time parameter. Yield (\(Y\)) is the response variable and the other variables are explanatory variables (\(X_i\)).

1.3 Machine Learning methods

1.3.1 Stepwise Multiple Linear Regression

In statistics, Multiple Linear Regression (MLR) is commonly used to find the combination of predictors \(X_i\) (\(i = 1, \cdots, p\); explanatory variables) that best explains the dependent variable \(Y\) (response variable). It aims to establish links between a dependent variable and several independent variables. The original form of linear regression, known as the method of least squares, was first introduced by Adrien-Marie Legendre in 1805 and published by Johann Carl Friedrich Gauss in 1809 (Abbas et al., 2020). Stepwise regression is a classical variable selection method that identifies and selects specific groups of important explanatory variables (Ajith et al., 2023). It is used to select the best regression variables from a large number of independent variables (Abhinaya et al., 2021).
The MLR model can be described as follows:

\(y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \epsilon_i, \quad i = 1, \cdots, n,\) (1)

where \(x_{ij}\) denotes the \(i\)-th observation of the explanatory variable \(X_j\). This equation can be written in matrix form as:

\(Y = X\beta + \epsilon,\) (2)

where:
\(Y\) is an observed random variable, called the variable to be explained, \(Y = (y_1, \cdots, y_n)\);
\(n\) is the number of observations;
\(X_1, \cdots, X_p\) are explanatory variables or regressors;
\(\beta_0, \beta_1, \cdots, \beta_p\) are unknown real parameters to be estimated, called regression parameters or regression coefficients;
\(\epsilon_i\) are unobserved random variables independent of the \(X_j\), known as errors or noise, on which certain additional conditions are imposed.

To estimate the regression coefficients \(\beta\), the ordinary least squares (OLS) method is applied. This involves minimizing the sum of squared errors, that is, the sum of squared deviations between observed and predicted values. For multiple regression, this method estimates \(\beta\) by the value that minimizes

\((Y - X\beta)^T (Y - X\beta),\) (3)

hence the estimator of \(\beta\), \(\widehat{\beta}\), is:

\(\widehat{\beta} = (X^T X)^{-1} X^T Y,\) (4)

where \(X^T\) is the transpose of \(X\) and \((X^T X)^{-1}\) is the inverse of the matrix \(X^T X\).

1.3.2 Least Absolute Shrinkage and Selection Operator regression (LASSO regression)

Inspired by ridge penalized regression, the LASSO (Least Absolute Shrinkage and Selection Operator) regression was developed in 1996 by Robert Tibshirani.
This method both shrinks the size of the coefficients and performs variable selection: it penalizes the regression model with a term called the \(L_1\) norm, which is the sum of the absolute values of the coefficients, and thereby shrinks the regression coefficients towards zero (Tibshirani, 1996). The penalty has the effect of forcing certain coefficient estimates, whose contribution to the model is minor, to be exactly equal to zero (Tibshirani, 1996). A clear advantage of LASSO regression over ridge regression is that it produces simpler, easier-to-interpret models that incorporate only a reduced set of predictors (Abhinaya et al., 2021). The statistical model takes the form of Eq. (1). When \(p\) is large or the variables are linearly dependent, least-squares estimators can fail. Indeed, the Ordinary Least Squares (OLS) method used to minimize the residual sum of squares has some major drawbacks (Hammami et al., 2012). If the number of independent variables is high, or if the explanatory variables are highly correlated, the variance of the least-squares coefficient estimates can be unacceptably high, leading to poor interpretability and accuracy (Hammami et al., 2012; Tibshirani, 1996). The penalized (or constrained) LASSO method therefore restricts the space over which the least-squares criterion is minimized. The penalty (constraint) of the LASSO method is given by:

\(p(\beta) = \|\beta\|_1,\) (5)

\(\|\beta\|_1 = \sum_{i=1}^{p} |\beta_i|.\) (6)

The estimator of \(\beta\), \(\widehat{\beta}\), by the LASSO method is defined by:

\(\widehat{\beta}_{Lasso} = \arg\min_{\beta} \left\{ \|y - X\beta\|_2^2 + \lambda \sum_{i=1}^{p} |\beta_i| \right\},\) (7)

where \(p\) is the number of explanatory variables, \(X\) is the matrix of explanatory variables, \(y\) is the vector of the response variable and \(\lambda \ge 0\) is the tuning (or shrinkage) parameter.

1.3.3 Random Forest regression

Random Forest is one of the most advanced and powerful machine learning algorithms, introduced in 2001 by Breiman. It is a non-parametric algorithm that constructs a set of decision trees, each of which is calibrated using bootstrap sampling (Ruiz-Álvarez et al., 2021; Gaál, 2012). The features used for each split in the trees are selected from a random subsample of the feature set (Ruiz-Álvarez et al., 2021). A Random Forest can be described as a collection of tree-structured predictors, an advanced version of bagging to which randomness has been added (Breiman, 2001). Random Forest splits each node using the best among a subset of randomly chosen predictors at that node (Ok et al., 2012). The CART algorithm is used to create the trees (Breiman, 2001). Random Forest uses two parameters: the number of regression trees to be grown (n_tree) and the number of randomly sampled features at each split (m_try) (Ruiz-Álvarez et al., 2021). Once the model has been trained, the prediction is obtained as follows:

\(\widehat{Y}_i = \frac{1}{M} \sum_{m=1}^{M} T_m(x_i), \quad i = 1, \cdots, n,\) (8)

where \(M\) is the number of trees, \(T_m\) is a single decision tree and \(x_i\) is the vector of predictors for observation \(i\).

1.4 Model performance measurement

Model performance was tested using various statistical measures.
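The averaging step of Eq. (8) can be illustrated with a minimal sketch. It is written in Python with the standard library only, not with the R randomForest package used in the study, and it makes strong simplifying assumptions: each tree is a one-split tree (a "stump"), the data are synthetic, and the names `fit_stump`, `fit_forest` and `predict` are hypothetical helpers introduced for illustration. It shows bootstrap sampling, the random choice of `m_try` candidate features per split, and the average over `n_trees` trees.

```python
import random
from statistics import mean


def fit_stump(rows, ys, features):
    """Fit a one-split regression tree using only the candidate features."""
    best = None
    for j in features:
        values = sorted({row[j] for row in rows})
        for threshold in values[:-1]:  # keeps both sides of the split non-empty
            left = [y for row, y in zip(rows, ys) if row[j] <= threshold]
            right = [y for row, y in zip(rows, ys) if row[j] > threshold]
            left_mean, right_mean = mean(left), mean(right)
            sse = (sum((y - left_mean) ** 2 for y in left)
                   + sum((y - right_mean) ** 2 for y in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    if best is None:  # no valid split: predict the sample mean
        constant = mean(ys)
        return lambda row: constant
    _, j, threshold, left_mean, right_mean = best
    return lambda row: left_mean if row[j] <= threshold else right_mean


def fit_forest(rows, ys, n_trees=25, m_try=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and m_try random features."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        features = rng.sample(range(n_features), m_try)  # random feature subset
        trees.append(fit_stump([rows[i] for i in idx], [ys[i] for i in idx], features))
    return trees


def predict(trees, row):
    """Eq. (8): average of the individual tree predictions."""
    return sum(tree(row) for tree in trees) / len(trees)


# Synthetic data (not from the study): feature 0 drives the response, feature 1 is noise.
rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
ys = [500.0 if i < 10 else 900.0 for i in range(20)]
trees = fit_forest(rows, ys, n_trees=25, m_try=2, seed=1)
low = predict(trees, [5.0, 0.0])
high = predict(trees, [15.0, 0.0])
```

Because each prediction is an average of leaf means, it always lies between the smallest and largest observed response, which is one reason tree ensembles cannot extrapolate beyond the training range.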
The use of several measures enabled us to evaluate the performance of a single model and to compare several models. In this study, model performance was tested using the coefficient of determination (\(R^2\)), the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE).

Coefficient of determination (\(R^2\)):

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (Y_i - \widehat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$ (9)

\(R^2\) corresponds to the proportion of the variability of \(Y\) explained by the model. In multiple regression models, it corresponds to the squared correlation between the observed values and the values predicted by the model. The closer \(R^2\) is to 1, the better the model's prediction.

Root Mean Square Error (RMSE):

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \widehat{Y}_i)^2}{n}}$$ (10)

RMSE measures the average difference between the values predicted by a statistical model and the actual values. An RMSE of zero indicates a perfect model fit. The lower the RMSE, the better the model and its predictions.

Mean Absolute Error (MAE):

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| Y_i - \widehat{Y}_i \right|$$ (11)

MAE measures the average absolute difference between the actual values and the values predicted by the regression model. The lower the MAE, the better the model's predictions.

In Eqs. (9) to (11), \(Y_i\) is the actual value, \(\widehat{Y}_i\) is the predicted value, \(\bar{Y}\) is the mean of the actual values, and \(i = 1, 2, 3, \cdots, n\).

1.5 Data processing and analysis

R software version 4.2.0 was used to analyze and model the agricultural and climatic data. The general methodology involves data visualization and yield modeling.
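As an illustration of the three performance measures of Section 1.4 (Eqs. 9 to 11), the following sketch computes them on a small synthetic example. It is a Python analogue given for clarity only; the study computed these measures in R with the caret and Metrics packages, and the yield values below are invented for the example.

```python
import math


def r_squared(observed, predicted):
    """Eq. (9): 1 - residual sum of squares / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Eq. (10): square root of the mean squared error."""
    n = len(observed)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Eq. (11): mean of the absolute errors."""
    n = len(observed)
    return sum(abs(y - y_hat) for y, y_hat in zip(observed, predicted)) / n


# Synthetic yields in kg/ha (illustrative values, not from the study).
observed = [800.0, 900.0, 1000.0, 1100.0]
predicted = [820.0, 880.0, 1030.0, 1090.0]

r2_value = r_squared(observed, predicted)    # 1 - 1800 / 50000 = 0.964
rmse_value = rmse(observed, predicted)       # sqrt(450), about 21.21
mae_value = mae(observed, predicted)         # 80 / 4 = 20.0
```

For any set of errors the RMSE is at least as large as the MAE, with equality only when all absolute errors are equal, which gives a quick consistency check when reading reported values.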
The ggplot2 package was used to plot production, area sown and yields of groundnuts, millet and cotton against year in Senegal, together with their trend curves. The trend curve is generated using a simple linear regression model fitted with the lm function, under the null hypothesis H0: "there is no significant linear relationship between area sown, production or yield and the year variable". Before implementing the models, the createDataPartition function in the caret package was used to randomly divide the dataset into a training dataset and a test dataset. A 60/40 split between training and test data was used due to the limited sample size. This balance allows for sufficient training data while maintaining a reasonable test set for model evaluation. Additionally, 10-fold cross-validation on the training set helps mitigate the impact of the smaller training data size and supports a robust performance assessment. The k-fold cross-validation was carried out using the caret package. The MASS package was used to fit the stepwise Multiple Linear Regression model. The R package most commonly used to perform constrained linear regression (LASSO regression in this study) is glmnet; the model is fitted using a function also called glmnet(). To implement Random Forest regression in R, a set of packages is required: dplyr (data manipulation), randomForest (model implementation), randomForestExplainer (graphics) and ggplot2 (graphics). The caret and Metrics packages were also used for performance measurement.

2 RESULTS

2.1 Analysis of trends in agricultural parameters

2.1.1 Change in area sown in hectares by crop

An increasing trend in the area sown to groundnuts was noted over the period studied, whereas for millet and cotton the trend was downwards, with very large annual fluctuations (Fig. 2).
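The trend test described in Section 1.5 (a simple linear regression on year, tested against H0 of no linear relationship) can be sketched as follows. This is a Python illustration under stated assumptions, not the R lm call used in the study: the series is synthetic, `linear_trend` is a hypothetical helper, and instead of computing a p-value the sketch compares the t statistic of the slope with the two-sided 5% critical value of Student's t distribution for 40 degrees of freedom (about 2.021, for 42 yearly observations), which is equivalent to checking p < 0.05.

```python
import math


def linear_trend(years, values):
    """Least-squares slope, intercept and t statistic of the slope."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, values))
    std_error = math.sqrt(residual_ss / (n - 2) / sxx)  # standard error of the slope
    return slope, intercept, slope / std_error


# Synthetic series (not from the study): +10 units per year plus an alternating wobble.
years = list(range(1980, 2022))                       # 42 years, 1980 to 2021
values = [500.0 + 10.0 * (year - 1980) + (40.0 if (year - 1980) % 2 == 0 else -40.0)
          for year in years]

slope, intercept, t_stat = linear_trend(years, values)

T_CRITICAL_DF40 = 2.021  # two-sided 5% critical value, 40 degrees of freedom
trend_is_significant = abs(t_stat) > T_CRITICAL_DF40
```

A significant slope only indicates a linear trend over the whole period; it says nothing about the within-decade amplitudes discussed in the text, which are computed separately as the difference between the maximum and minimum of each period.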
The p-values (p) are all less than α = 0.05 (significance threshold), so there is a significant relationship between the area sown variable and the year variable. The raw data show significant variation for cotton, groundnuts and millet. There was little variation in the area sown to cotton between 2011 and 2021, with an amplitude of 17,183 ha. On the other hand, there was a large change between 2000 and 2010, with a variation of 33,261 ha. The most drastic drop in area was seen in 2019, down to 16,511 ha, while the largest increase was seen in 1997, at 54,439 ha. For groundnuts, the smallest amplitude was obtained in the period from 1990 to 2000, i.e. 511,778 ha, indicating little variation in the data. The greatest variation was between 2000 and 2010, at 670,730 ha, indicating a large change in area over this period. Cultivated areas fell drastically in 1998 and 2003, to 519,168 and 524,843 ha respectively. In 2017, however, there was a dramatic increase in the area sown to groundnuts, to 1,254,058 ha. For millet, the smallest amplitude was obtained in the period from 2011 to 2021, i.e. 307,069 ha, indicating little variation in the area sown during this period. The largest amplitude (510,032 ha) was recorded in the period from 1980 to 1990, indicating high variability in area. The years 2004 and 2007 were marked by significant decreases in the area sown, to 686,929 and 686,892 ha respectively. The largest area sown was in 1985, at 1,337,805 ha.

2.1.2 Change in production in tonnes by type of crop

There is an upward trend in groundnut and millet production. For cotton, on the other hand, the trend is downwards (Fig. 3). The p-values (p) for groundnuts and cotton are less than α = 0.05, so there is a significant relationship between the production variable and the year variable for groundnuts and cotton. However, the probability value associated with the trend for millet is greater than 0.05.
The raw data show that for cotton, the lowest amplitude (13,697 t) was recorded in the period from 2011 to 2021, indicating low production variability. The greatest variability was recorded between 1980 and 1990, with an amplitude of 38,531 t. Production is marked by a series of declines over the years, with drastic falls in 2016 and 2018 (15,121 and 15,160 t respectively). However, a significant increase was seen in 1984, with 59,495 t. Millet production is variable: the period from 1990 to 2000 was marked by low variability in production, with a range of 367,652 t, while high variability was observed in the period from 2011 to 2021, with a range of 735,862 t. Recurring declines were observed, notably in 2007, with a significant drop to 318,822 t, while a significant increase was noted in 2020, to 1,144,855 t. Groundnut production varied little between 1990 and 2000 (497,612 t), while it varied significantly between 2011 and 2021 (1,269,958 t). Production fell drastically in 2002, to 260,723 t, before rising sharply to 1,797,486 t in 2020.

2.1.3 Changes in yield (kg/ha) by type of crop

Yields change over time for groundnuts, millet and cotton (Fig. 4). The probability value associated with the trend for groundnuts and millet is less than 0.05. This shows that there is a significant relationship between groundnut and millet yields and the year variable, unlike for cotton (p > 0.05). Cotton yields varied little between 2000 and 2010, with a range of 283 kg/ha, whereas the period from 1980 to 1990 was marked by high variability, with a range of 583 kg/ha. The lowest cotton yield was recorded in 1998 at 604 kg/ha, while the highest was recorded in 1984 at 1,284 kg/ha. Groundnut yields remain variable throughout the period studied.
A small variation in yields was recorded in the period from 1990 to 2000, with a range of 455 kg/ha, while a large variation was observed between 2011 and 2021, with a range of 868 kg/ha. After the low groundnut yields recorded in 1980 and 2002 (489 kg/ha and 320 kg/ha respectively), a considerable increase was noted in 2020 (1,467 kg/ha). Millet yields are increasing more or less steadily. The period from 1990 to 2000 was marked by low yield variability, with a range of 245 kg/ha, while high yield variability was observed in the period from 2011 to 2021, with a range of 548 kg/ha. The lowest yield was observed in 1983 with 425 kg/ha, while the highest yield was obtained in 2020 with 1,119 kg/ha.

2.2 Crop yield prediction

2.2.1 With the stepwise Multiple Linear Model

Overall, groundnut, millet and cotton yields in Senegal are influenced by production and area (Table 1). For groundnut, cumulative rainfall, relative humidity and root zone soil wetness (RZSW) were also retained in the model. Among these climatic variables, RZSW had a positive effect on groundnut yield, unlike cumulative rainfall and relative humidity, whose coefficients are negative. For millet, mean annual temperature, relative humidity and RZSW also influence yield. Unlike production and relative humidity, which have positive effects on millet yield, area, mean annual temperature and RZSW have negative coefficients. The variables year, cumulative rainfall, mean annual temperature and minimum monthly temperature (Tmin) also had an effect on cotton yield. Production, cumulative rainfall and mean annual temperature have positive coefficients for cotton yield.

Table 1: Stepwise Multiple Linear Model

Groundnut
Variable                   Estimate     Std. Error   t value    Pr(>|t|)    Signif.
(Intercept)                7.11e+02     1.83e+02       3.882    0.000926    ***
Production                 9.66e-04     3.65e-05      26.464    < 2e-16     ***
Area sown                 -9.36e-04     5.16e-05     -18.154    6.80e-14    ***
Cumulative rainfall       -9.76e-02     4.22e-02      -2.314    0.031416    *
Relative humidity         -9.76e+00     5.88e+00      -1.66     0.112473
RZSW                       2.05e+03     9.00e+02       2.272    0.034253    *

Millet
Variable                   Estimate     Std. Error   t value    Pr(>|t|)    Signif.
(Intercept)                2.08e+03     7.71e+02       2.703    0.0137      *
Production                 1.15e-03     6.16e-05      18.675    3.98e-14    ***
Area sown                 -7.85e-04     7.42e-05     -10.581    1.21e-09    ***
Mean annual temperature   -3.84e+01     2.37e+01      -1.62     0.121
Relative humidity          1.04e+01     5.67e+00       1.831    0.082       .
RZSW                      -2.03e+03     8.46e+02      -2.405    0.026       *

Cotton
Variable                   Estimate     Std. Error   t value    Pr(>|t|)    Signif.
(Intercept)                6.46e+03     1.58e+03       4.101    0.000609    ***
Year                      -4.12e+00     8.96e-01      -4.6      0.000195    ***
Production                 2.42e-02     1.17e-03      20.61     1.84e-14    ***
Area sown                 -2.27e-02     1.35e-03     -16.79     7.46e-13    ***
Cumulative rainfall        1.56e-01     3.27e-02       4.782    0.00013     ***
Mean annual temperature    9.82e+01     2.58e+01       3.811    0.001181    **
Tmin                      -1.69e+01     8.07e+00      -2.088    0.050482    .

Note: the significance codes in the table follow the usual R convention: *** p < 0.001, ** p < 0.01, * p < 0.05, . p < 0.1.

2.2.2 With the LASSO regression model

Threshold lambda values for groundnut, millet and cotton are shown with an upper and lower limit (Fig. 5, left panel). Two specific lambda values are highlighted by vertical dotted lines. The lambda value was chosen from this range. The tables (Fig. 5, right panel) show the climatic and agricultural variables that are important for predicting groundnut, millet and cotton yields in Senegal. Among these variables, year, production, area, cumulative rainfall, Tmax, Tmin and RZSW were found to be the most influential and were used to build the groundnut forecasting model. Of these, production and Tmax had a positive effect on yield, unlike the other variables. For millet, production, area, mean monthly rainfall, mean annual temperature, Tmax, Tmin, relative humidity and RZSW are important variables for predicting yield. Unlike the other variables, production, year, Tmin and Tmax have positive effects on millet yield. For cotton, the variables that influence yield are: year, production, area, cumulative rainfall, mean annual temperature, Tmax, Tmin and relative humidity.
Among these variables, production, cumulative rainfall and mean annual temperature have a positive effect on cotton yield.

2.2.3 With Random Forest regression model

The percentage increase in mean square error (%IncMSE) and the increase in node purity (IncNodePurity) are measures commonly used in Random Forest algorithms to assess the importance of variables in the model. Figure 6 shows the ranking of the relative importance of climatic and agricultural variables in predicting the yield of groundnut, millet and cotton crops in Senegal. Variables with higher %IncMSE values are considered more important in the model, contributing more to reducing error or increasing purity when splitting nodes in the trees. Production, area, cumulative rainfall and year are the most important predictors of groundnut yield. For millet, the results showed that production, year, average monthly rainfall and relative humidity were the most important variables for predicting yields. For cotton, the variables with the greatest impact on yield are production, cumulative rainfall and year.

2.3 Performance evaluation of the three models

To compare the stepwise Multiple Linear regression, the LASSO regression and the Random Forest regression, R², RMSE and MAE were calculated (Table 2). Almost all the models show a very good quality of fit. The coefficients of determination R² for groundnut, millet and cotton are respectively 0.93, 0.98 and 0.70 for the stepwise Multiple Linear regression; 0.96, 0.98 and 0.70 for the LASSO regression; and 0.74, 0.87 and 0.01 for the Random Forest regression. For the latter, the R² obtained for cotton is very low, showing that the model performs poorly in predicting cotton yields in Senegal.
The RMSE and MAE values for the three regression methods presented in Table 2 show that the LASSO regression has the lowest values of RMSE (63.21, 21.19 and 79.55) and MAE (44.63, 16.62 and 59.04) respectively for groundnut, millet and cotton, which represents a better result for the LASSO regression. The performance of the groundnut, millet and cotton yield prediction equations was tested by comparing the predicted values with the observed values for the period 1980 to 2021, which are presented in Fig. 7. It shows that, overall, the three models studied show satisfactory goodness of fit (R² close to 1) for the test data, except for cotton with the Random Forest regression. Most of the models showed little variation between observed and predicted yields. A large proportion of the points are closely clustered around the reference line and are within the 95% confidence interval. However, a large variability was observed in the prediction of cotton yield by Random Forest regression. These results indicate that the prediction accuracy of the LASSO method is better than that of the stepwise method and Random Forest for groundnut, millet and cotton.

Table 2: Comparison of prediction performance

             Stepwise Multiple Linear regression   LASSO regression          Random Forest regression
Crop         R²      RMSE     MAE                  R²      RMSE     MAE      R²      RMSE      MAE
Groundnut    0.93    77.09    54.52                0.96    63.21    44.63    0.74    187.83    142.17
Millet       0.98    26.30    21.05                0.98    21.19    16.62    0.87     66.74     51.58
Cotton       0.70    80.51    60.55                0.70    79.55    59.04    0.01    159.70    126.76

3 Discussion

Agriculture is a sector that is highly dependent on the climate. The Sahel is among the regions most affected by climate trends and variability, particularly Senegal, where the majority of crops depend on the climate (Alhassane et al., 2013). During the study period, the area sown to groundnuts increased, while a decrease was observed for millet and cotton. Groundnut and millet production also increased. Cotton production, on the other hand, decreased.
An increase in groundnut, millet and cotton yields has also been observed. These variations could be due to the fact that rainfall in Senegal varies according to climatic zone (Faye et al., 2017; Mballo et al., 2021). The 1970s and 1980s were marked by severe drought in the Sahel, which had a negative impact on crop yields (Paturel et al., 1997; Sarr, 2006). This may explain the low and variable yields observed during the study period. The results of this study are consistent with those of Faye et al. (2018), who showed that rainfall alone does not determine agricultural production. This is why, in some years, rainfall is high and yields are low. The LASSO model gives better results for predicting groundnut, millet and cotton yields. LASSO is a method that combines regularisation and variable selection by imposing a constraint on the sum of the absolute values of the coefficients. It reduces certain coefficients to zero, thereby selecting only those variables whose coefficient remains non-zero, with the aim of minimising the prediction error. The purpose of shrinkage is to prevent overfitting caused by collinearity or high dimensionality of covariates, though it may perform poorly with highly correlated datasets (Hou et al., 2018; Utazirubanda et al., 2021). Random Forest regression performed poorly in predicting cotton yields in Senegal, although the goodness of fit was good for groundnuts and millet. Stepwise Multiple Linear regression also performed well, with a very good model fit for all the crops studied. On a larger scale, these results confirm the work of Singh et al. (2019) on wheat, which revealed a higher prediction accuracy of the LASSO model compared to the stepwise Multiple Linear model using meteorological indices. Similarly, Kumar et al. (2021), in their research on wheat yield prediction, also showed that the LASSO model was more efficient than the stepwise Multiple Linear model. On the other hand, studies by Abhinaya et al. (2021) on groundnut revealed a better fit of the data with the stepwise Multiple Linear model. This discrepancy could be explained by the presence of multicollinearity, which can lead to unstable and unreliable coefficient estimates due to the high correlation between the predictors. This multicollinearity, resolved by the stepwise Multiple Linear model, enabled it to provide a better fit to the data. In Senegal, work by Sarr & Sultan (2023) using machine learning methods to predict groundnut, millet, maize and sorghum yields revealed that Random Forest regression was among the best for predicting crop yields. The difference from the results of this study may be due to the different approaches and types of data used. In fact, this study was based only on agricultural and climatic data, whereas Sarr & Sultan (2023) used three combinations of input data to predict crop yields: satellite data (NDVI only), climate data only, and a combination of satellite and climate data.

Conclusion

This study assessed the robustness of yield prediction models for groundnut, millet and cotton using Machine Learning methods. The results showed an increase in the area sown to groundnuts and a decrease in the area sown to millet and cotton. In terms of production, an increase was also noted for groundnuts and millet. Cotton production, on the other hand, decreased. An upward trend in groundnut, millet and cotton yields was also observed. Compared with the stepwise Multiple Linear regression and the Random Forest regression, the LASSO regression gave the best results, with a coefficient of determination (R²) of 0.96 for groundnuts, 0.98 for millet and 0.70 for cotton in Senegal. It also gave the lowest RMSE and MAE values. This study is a contribution to the development of an operational decision support system for food security in Senegal.
The promising results obtained encourage further work in this direction, in close collaboration with national agricultural research institutions and farmers' organisations. It would also be interesting to extend the study to other crops that are important for food security in Senegal.

Declarations

Conflict of interest: The authors did not receive support from any organization for the submitted work. The authors declare that they have no conflict of interest.

Funding statement: No funding was received for conducting this study.

Ethics, Consent to Participate, and Consent to Publish declarations: not applicable.

Author Contribution

N.K.G.S. wrote the entire text of the manuscript and produced the analyses and graphs with the help of A.N. and P.N. A.N. helped to draw up the graphs. P.N. took part in the data analysis. K.N. participated in the correction of the manuscript. All authors reviewed the manuscript.

Data Availability

The data used in this manuscript include agricultural data of groundnuts, millet and cotton from 1980 to 2021 (42 years), covering areas, production and yields from the Direction de l'Analyse, de la Prévision et des Statistiques Agricoles (DAPSA). Climatic data such as rainfall, temperature and humidity were also obtained for the same periods from the Agence Nationale de l'Aviation Civile et de la Météorologie du Sénégal (ANACIM). Additional data (rainfall and root zone soil wetness (RZSW), respectively) were extracted from online databases (FAOSTAT: https://www.fao.org/faostat/en/#home and Data Access Viewer (DAV): https://power.larc.nasa.gov/data-access-viewer/). Data from DAPSA and ANACIM cannot be shared without prior authorisation from these organisations.

References

Abbas F., Afzaal H., Farooque A.A., Tang S. (2020). Crop Yield Prediction through Proximal Sensing and Machine Learning Algorithms. Agronomy, 10, 1046. https://doi.org/10.3390/agronomy10071046
Abhinaya D., Patil S. G., Dheebakaran Ga., Djanaguiraman M., Arockia Stephen Raj. (2021).
Use of Statistical Models in Predicting Groundnut Yield in Relation to Weather Parameters. Madras Agricultural Journal, 108. https://doi.org/10.29321/MAJ.10.000546
Ajith S., Debnath M.K., Gupta D.S., Basak P. (2023). Application of statistical and machine learning models in combination with stepwise regression for predicting rapeseed-mustard yield in Northern districts of West Bengal. International Journal of Statistics and Applied Mathematics, 8, 141–149. https://doi.org/10.22271/maths.2023.v8.i3b.1004
Alhassane A., Salack S., Ly M., Lona I., Traoré S.B., Sarr B. (2013). Evolution of agro-climatic risks related to the recent trends of the rainfall regime over the Sudano-Sahelian region of West Africa. Sécheresse, 24, 282–293. https://doi.org/10.1684/sec.2013.0400
Ansarifar J., Wang L., Archontoulis S.V. (2021). An interaction regression model for crop yield prediction. Scientific Reports, 11, 17754.
Bergez J.E., Constantin J., Debaeke Ph., Raynal H., Plassin S. (2023). Modelling climate change impacts on agricultural systems. Chapter taken from: Nendel C. (Eds.), Burleigh Dodds Science, pp. 3–38.
Basso B., Cammarano D., Carfagna E. (2013). Review of crop yield forecasting methods and early warning systems. In: Report Presented to First Meeting of the Scientific Advisory Committee of the Global Strategy to Improve Agricultural and Rural Statistics. FAO, Headquarters, Rome, Italy, 18–19 July.
Breiman L. (2001). Random Forests. Machine Learning, 45, 5–32.
Chang Y., Latham J., Licht M., Wang L. (2023). A data-driven crop model for maize yield prediction. Communications Biology, 6, 439.
Chipanshi A., Zhang Y., Kouadio L., Newlands N., Davidson A., Hill H., Warren R., Qian B., Daneshfar B., Bedard F., Reichert G. (2015). Evaluation of the Integrated Canadian Crop Yield Forecaster (ICCYF) model for in-season prediction of crop yield across the Canadian agricultural landscape. Agricultural and Forest Meteorology, 206, 137–150.
https://doi.org/10.1016/j.agrformet.2015.03.007
Direction de l'Analyse de la Prévision et des Statistiques Agricoles (DAPSA) (2021). Rapport de l’Enquête Agricole Annuelle (EAA) 2020-2021.
Gaál M. (2012). Modelling the impact of climate change on the Hungarian wine regions using random forest. Applied Ecology and Environmental Research, 10, 121–140. https://doi.org/10.15666/aeer/1002_121140
Garcia L. (2015). Impact du changement climatique sur les rendements du mil et de l'arachide au Sénégal: Approche par expérimentation virtuelle (Doctoral dissertation, Montpellier SupAgro).
Faye A. (2018). Climat et agriculture au Sénégal : Analyse économique de la disponibilité de l’eau d’irrigation dans un contexte de variabilité des précipitations dans les niayes. Thèse de doctorat, Université Cheikh-Anta-Diop de Dakar, Faculté des Sciences Économiques et de Gestion, Formation doctorale : Économie et changement climatique, 217 p.
Faye M., Fall A., Faye G., Van Hecke E. (2018). La variabilité pluviométrique et ses incidences sur les rendements agricoles dans la région des Terres Neuves du Sénégal oriental. Belgeo. https://doi.org/10.4000/belgeo.22083
Hammami D., Lee T.S., Ouarda T.B.M.J., Lee J. (2012). Predictor selection for downscaling GCM data with LASSO. Journal of Geophysical Research: Atmospheres, 117, 2012JD017864. https://doi.org/10.1029/2012JD017864
Hou J., Paravati A., Hou J., Xu R., Murphy J. (2018). High‐dimensional variable selection and prediction under competing risks with application to SEER‐Medicare linked data. Statistics in Medicine, 37(24), 3486–3502. https://doi.org/10.1002/sim.7822
Kumar, S.D. Attri, K.K. Singh (2021). Comparison of Lasso and stepwise regression technique for wheat yield prediction. Journal of Agrometeorology, 21, 188–192. https://doi.org/10.54386/jam.v21i2.231
Maestrini B., Mimić G., Van Oort P.A.J., Jindo K., Brdar S., Athanasiadis I.N., Van Evert F.K. (2022). Mixing process-based and data-driven approaches in yield prediction.
European Journal of Agronomy, 139, 126569. https://doi.org/10.1016/j.eja.2022.126569
Mballo I., Sy O., Gaye D., Sané B. (2021). Variabilité pluviométrique et développement de l’activité agricole dans la région de Kolda (Sénégal). Dynamiques environnementales, 101–126. https://doi.org/10.4000/dynenviron.6013
Noba K., Ngom A., Guèye M., Bassène C., Kane M., Diop I., Ndoye F., Mbaye M.S., Kane A., Tidiane Ba A. (2014). L’arachide au Sénégal : état des lieux, contraintes et perspectives pour la relance de la filière. OCL, 21, D205. https://doi.org/10.1051/ocl/2013039
Nouvelle Alliance pour la Sécurité Alimentaire et la Nutrition (NASAN) (2022).
Ok A.O., Akar O., Gungor O. (2012). Evaluation of random forest method for agricultural crop classification. European Journal of Remote Sensing, 45, 421–432. https://doi.org/10.5721/EuJRS20124535
Paturel J.E., Servat É., Lubès-Niel H., Delattre M.O. (1997). Variabilité climatique et analyse de séries pluviométriques de longue durée en Afrique de l'Ouest et centrale non sahélienne. Comptes Rendus de l'Académie des Sciences-Series IIA-Earth and Planetary Science, 325(10), 779–782. https://doi.org/10.1016/S1251-8050(97)82756-5
Ruiz-Álvarez M., Gomariz-Castillo F., Alonso-Sarría F. (2021). Evapotranspiration Response to Climate Change in Semi-Arid Areas: Using Random Forest as Multi-Model Ensemble Method. Water, 13, 222.
Sarr A.B., Sultan B. (2023). Predicting crop yields in Senegal using machine learning methods. International Journal of Climatology, 43, 1817–1838. https://doi.org/10.1002/joc.7947
Sarr B. (2006). INSTAT+ en bref Manuel d’utilisation destiné aux Ingénieurs en agrométéorologie et en météorologie aéronautique, CILSS, Centre régional Agrhymet, 74 p.
Singh K.N., Singh K.K., Kumar S., Panwar S., Gurung B. (2019). Forecasting crop yield through weather indices through LASSO. The Indian Journal of Agricultural Sciences, 89. https://doi.org/10.56093/ijas.v89i3.87602
Kouakou P.K. (2013).
Amélioration de la prévision des rendements du mil (Pennisetum glaucum (L.) R. Br.) au Sénégal par l’utilisation de modèles de culture : prise en compte de la sensibilité à la photopériode des variétés et de la fertilité dans les parcelles d’agriculteurs. Université Cheikh Anta Diop de Dakar, Thèse de doctorat.
Sultan B., Roudier P., Traoré S. (2015). Chapitre 10. Les impacts du changement climatique sur les rendements agricoles en Afrique de l’Ouest. In: Les sociétés rurales face aux changements climatiques et environnementaux en Afrique de l’Ouest (eds Sultan B, Lalou R, Amadou Sanni M, Oumarou A, Soumaré MA), pp. 209–225. IRD Éditions.
Torquebiau E. (2015). Changement climatique et agricultures du monde. Collection Agricultures et défis du monde, Cirad-AFD. Editions Quae, 328 pages.
Tibshirani R. (1996). Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58, 267–288. https://www.jstor.org/stable/2346178
Utazirubanda J.C., M. León T., Ngom P. (2021). Variable selection with Group LASSO approach: Application to Cox regression with frailty model. Communications in Statistics-Simulation and Computation, 50(3), 881–901. https://doi.org/10.1080/03610918.2019.1571605
Reisi-gahrouei O., Homayouni S., Mcnairn H., Hosseini M. (2019). Crop biomass estimation using multi regression analysis and neural networks from multitemporal L-band polarimetric synthetic aperture radar data. 1161. https://doi.org/10.1080/01431161.2019.1594436
Mirani A., Memon M.S., Chohan R., Wagan A.A., Qabulio M. (2021). Machine learning in agriculture: A review. LUME, 10, 5. https://doi.org/10.3390/s18082674

Additional Declarations

No competing interests reported.
Editorial decision: Revision requested 19 May, 2025
Reviews received at journal 16 May, 2025
Reviews received at journal 11 May, 2025
Reviewers agreed at journal 07 May, 2025
Reviewers agreed at journal 06 May, 2025
Reviewers agreed at journal 06 May, 2025
Reviewers agreed at journal 06 May, 2025
Reviews received at journal 23 Apr, 2025
Reviewers agreed at journal 15 Apr, 2025
Reviewers agreed at journal 15 Apr, 2025
Reviewers invited by journal 02 Apr, 2025
Editor invited by journal 01 Apr, 2025
Editor assigned by journal 29 Mar, 2025
Submission checks completed at journal 29 Mar, 2025
First submitted to journal 20 Mar, 2025
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-6269900","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":446125343,"identity":"a0600010-ec31-4a7d-b500-50d7a2ec06e4","order_by":0,"name":"Ndèye Khady Guissé SECK","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA9klEQVRIiWNgGAWjYBACCQkgTuABMdnYf3wAUyRoYZCcAaKYidECYbIxSIO1EtIiObv54Y0HMnb2/OzHEoxtfm2T52NmYPzwMQe3FmmZY8YWCTzJiTN70g4k5/bdNmxjZmCWnLkNtxY5iQQzoF+YEwxusDcczu25zQjUwsbMi1dL+jeglnp7+xvsjc2WPbftCWqRlsgB2XKYcYME22Fmhh+3EwlqkZyRUwz0y/HEGWfS0hh7G24ntzEzNuP1i8SN9I03f/ZU2/O3HzNj+PHntu389uaDHz7i0QIGjD0wRhuYbCCgHgR+wBh/iFA8CkbBKBgFIw4AANFUS08dqytiAAAAAElFTkSuQmCC","orcid":"","institution":"Cheikh Anta Diop University","correspondingAuthor":true,"prefix":"","firstName":"Ndèye","middleName":"Khady Guissé","lastName":"SECK","suffix":""},{"id":446125344,"identity":"a5ea3302-085c-4cdd-93ff-e4e551cfb3ae","order_by":1,"name":"Ablaye NGOM","email":"","orcid":"","institution":"Université Cheikh Anta Diop de Dakar","correspondingAuthor":false,"prefix":"","firstName":"Ablaye","middleName":"","lastName":"NGOM","suffix":""},{"id":446125345,"identity":"14f3016d-426d-49e0-8393-3a9f1e533aab","order_by":2,"name":"Papa NGOM","email":"","orcid":"","institution":"Université Cheikh Anta Diop de Dakar","correspondingAuthor":false,"prefix":"","firstName":"Papa","middleName":"","lastName":"NGOM","suffix":""},{"id":446125349,"identity":"911749dc-ef42-4930-b707-9421d437183d","order_by":3,"name":"Kandioura 
NOBA","email":"","orcid":"","institution":"Université Cheikh Anta Diop de Dakar","correspondingAuthor":false,"prefix":"","firstName":"Kandioura","middleName":"","lastName":"NOBA","suffix":""}],"badges":[],"createdAt":"2025-03-20 13:08:20","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6269900/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6269900/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":81253139,"identity":"0983a761-6a3c-4a78-86cb-82e58b9d5264","added_by":"auto","created_at":"2025-04-24 03:51:36","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":60288,"visible":true,"origin":"","legend":"\u003cp\u003eSenegal's geographical location and agro-ecological zones\u003c/p\u003e","description":"","filename":"1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6269900/v1/78aa615e57569d133826b853.jpg"},{"id":81253138,"identity":"533682db-34ad-41e6-af63-3868c7602bd3","added_by":"auto","created_at":"2025-04-24 03:51:36","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":50489,"visible":true,"origin":"","legend":"\u003cp\u003eTrends of areas sown in Senegal by crop and year\u003c/p\u003e","description":"","filename":"2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6269900/v1/ecf411bf0ab644feb734c248.jpg"},{"id":81253521,"identity":"d70c8ff7-cb74-4a84-90f9-c697b1768fff","added_by":"auto","created_at":"2025-04-24 03:59:36","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":53406,"visible":true,"origin":"","legend":"\u003cp\u003eTrends in groundnut, millet and cotton production in Senegal over the 
years\u003c/p\u003e","description":"","filename":"3.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6269900/v1/35ebcc35966533bc4138ceb9.jpg"},{"id":81253522,"identity":"4a709fb0-95f5-4824-afe4-a2a992bc04f7","added_by":"auto","created_at":"2025-04-24 03:59:36","extension":"jpg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":58766,"visible":true,"origin":"","legend":"\u003cp\u003eEvolution du rendement de l’arachide, du mil et du coton au Sénégal en fonction des années\u003c/p\u003e","description":"","filename":"4.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6269900/v1/da37666f057d23f3b247bb30.jpg"},{"id":81253146,"identity":"b46b5d8b-8fe4-4431-afc4-aa8746d136a6","added_by":"auto","created_at":"2025-04-24 03:51:36","extension":"jpg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":152384,"visible":true,"origin":"","legend":"\u003cp\u003eCross-validation for minimum mean square error (\u003cstrong\u003eright\u003c/strong\u003e) and LASSO regression coefficients for variable selection (\u003cstrong\u003eleft\u003c/strong\u003e) for groundnut, millet and cotton.\u003c/p\u003e","description":"","filename":"5.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6269900/v1/52724af41beedd04ccfd76fc.jpg"},{"id":81253145,"identity":"da1e248a-8940-4f14-ad2b-4cbf0e7dcf21","added_by":"auto","created_at":"2025-04-24 03:51:36","extension":"jpg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":41548,"visible":true,"origin":"","legend":"\u003cp\u003ePercentage increase in mean square error (%IncMSE)\u003c/p\u003e","description":"","filename":"6.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6269900/v1/1f109812dc49d34bc258b463.jpg"},{"id":81253524,"identity":"ebb49b77-caf0-45f2-8bc6-d9797299d03a","added_by":"auto","created_at":"2025-04-24 03:59:36","extension":"jpg","order_by":7,"title":"Figure 
7","display":"","copyAsset":false,"role":"figure","size":68807,"visible":true,"origin":"","legend":"\u003cp\u003eComparison between observed yield and predicted yield using Multiple Linear regression (\u003cstrong\u003eA\u003c/strong\u003e), LASSO regression (\u003cstrong\u003eB\u003c/strong\u003e) and Random Forest regression (\u003cstrong\u003eC\u003c/strong\u003e) for groundnut, millet and cotton.\u003c/p\u003e","description":"","filename":"7.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6269900/v1/e674c6c134212e59bcc73507.jpg"},{"id":81255667,"identity":"3433cc0c-dcf5-493d-907a-6e7bfd72e822","added_by":"auto","created_at":"2025-04-24 04:23:37","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1505605,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6269900/v1/ea23f877-05d5-4cb5-b3c3-84a29430a84f.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Crop yield forecasting in Senegal: application of Machine Learning methods","fulltext":[{"header":"INTRODUCTION","content":"\u003cp\u003eCrop modelling has been developed since the 1980s and 1990s to predict crop yields. It is crucial for improving agricultural production (Torquebiau, \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2015\u003c/span\u003e) and ensuring global food security. The low productivity of the agricultural sector is linked to many factors, including climatic disturbances (Noba et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). Numerous models have been proposed and validated to date (Ansarifar et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). These models offer the possibility of measuring the links between climate and agriculture by transforming climate data such as temperature, precipitation, etc. 
into agronomic variables such as crop yields, biomass, etc. (Sultan et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2015\u003c/span\u003e). Most studies on crop yield prediction fall into two categories: processed-based crop models, also known as mechanistic models, and data-driven machine learning models, which are mainly based on collected historical data (Sultan et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Chipanshi et al., \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Maestrini et al., \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Chang et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2023\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eIn Senegal, different forecasting approaches have been used to study the effect of climate change on agricultural yields. The majority of this research focuses on the use of mechanistic methods to make predictions (Garcia, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Kouakou, \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2013\u003c/span\u003e). However, mechanistic models require a fairly considerable amount of input information in order to make the model operational (Basso et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2013\u003c/span\u003e). In addition, these models have limited predictive performance due to the high variability of environmental conditions, which is associated with model structure and parameter decisions, beyond the spatio-temporal variability of yields observed over a wide area (Chang et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Moreover, most mechanistic models implemented in software such as DSSAT, APSIM, SARRAH etc. use daily data for model calibration. 
Machine Learning models, on the other hand, reduce the computational load (Bergez et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), and make it possible to produce easy, accurate and up-to-date forecasts (Reisi-gahrouei et al., \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Mirani et al. \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). These models are very useful for high-resolution simulations over a vast area (Sultan et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Bergez et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). They are able to implicitly account for additive effects and interactions of different parameters, which allows them to outperform most crop models in terms of prediction (Chang et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). With sufficient monthly data in terms of quantity and quality, Machine Learning models can be effectively calibrated to provide accurate predictions. Thus, this study aims to assess the robustness of Machine Learning methods such as Stepwise Multiple regression, LASSO regression and Random Forest regression in predicting the yield of Senegal's main food and cash crops (groundnut, millet and cotton).\u003c/p\u003e"},{"header":"1 MATERIALS AND METHODS","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e1.1 Presentation of the study area\u003c/h2\u003e \u003cp\u003eThe study was carried out in Senegal, covering the period from 1980 to 2021. Senegal is located in the extreme west of the African continent, between 12\u0026deg;5 and 16\u0026deg;5 north latitude and 11\u0026deg;5 and 17\u0026deg;5 west longitude. 
Covering an area of 196,722 km2, it is bordered to the north by Mauritania, to the east by Mali, to the south by Guinea and Guinea Bissau, to the west by Gambia, and by the Atlantic Ocean along a 500 km coastline. Dakar (550 km2), the capital, is a peninsula in the far west.\u003c/p\u003e \u003cp\u003eSenegal is subdivided into six eco-geographical (agro-ecological) zones. The delimitation of these zones is based on a combination of biophysical and socio-economic factors. The agro-ecological zones are: the Groundnut Basin, Casamance, the Niayes zone, Eastern Senegal, the River Valley and the sylvopastoral zone (Ferlo) (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Senegal has a dry tropical climate with two seasons: a dry season from November to June and a rainy season from July to October.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003e1.2 Data source and parameters used\u003c/h3\u003e\n\u003cp\u003eQuantitative data were used in this study. These include agricultural data of groundnuts, millet and cotton from 1980 to 2021 (42 years), covering areas, production and yields from the Direction de l'Analyse, de la Pr\u0026eacute;vision et des Statistiques Agricoles (DAPSA). Climatic data such as rainfall, temperature and humidity were also obtained for the same periods from the Agence Nationale de l'Aviation Civile et de la M\u0026eacute;t\u0026eacute;orologie du S\u0026eacute;n\u0026eacute;gal (ANACIM). 
Additional data were extracted from online databases (FAOSTAT: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.fao.org/faostat/en/#home\u003c/span\u003e\u003cspan address=\"https://www.fao.org/faostat/en/#home\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e and Data Access Viewer (DAV): \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://power.larc.nasa.gov/data-access-viewer/\u003c/span\u003e\u003cspan address=\"https://power.larc.nasa.gov/data-access-viewer/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). All these comprehensive data cover all regions of Senegal.\u003c/p\u003e \u003cp\u003eA total of 11 parameters were used to calibrate the model. These are area, production and yield for the agricultural parameters, and annual cumulative rainfall, maximum monthly rainfall from June to October, mean annual temperature, maximum monthly temperature (Tmax), minimum monthly temperature (Tmin), relative humidity and root zone soil wetness (RZSW) for the climatic parameters. The variable year is mainly a time parameter. 
Yield \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\left(Y\\right)\\)\u003c/span\u003e\u003c/span\u003e is the response variable and the other variables are explanatory variables (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{X}_{i}\\)\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e\n\u003ch3\u003e1.3 Machine Learning methods\u003c/h3\u003e\n\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e1.3.1 Stepwise Multiple Linear Regression\u003c/h2\u003e \u003cp\u003eIn statistics, Multiple Linear Regression (MLR) is commonly used to find the combination of predictors \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{X}_{i}\\)\u003c/span\u003e\u003c/span\u003e (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:i=1,\\cdots\\:p\\::\\:\\text{e}\\text{x}\\text{p}\\text{l}\\text{a}\\text{n}\\text{a}\\text{t}\\text{o}\\text{r}\\text{y}\\:\\text{v}\\text{a}\\text{r}\\text{i}\\text{a}\\text{b}\\text{l}\\text{e}\\text{s}\\)\u003c/span\u003e\u003c/span\u003e) that best explains the dependent variable \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:Y\\)\u003c/span\u003e\u003c/span\u003e (response variable). It aims to establish links between a dependent variable and several independent variables. The original form of linear regression, known as the method of least squares, was first introduced by Adrien-Marie Legendre in 1805 and published by Johann Carl Friedrich Gauss in 1809 (Abbas et al., \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Stepwise regression is a classical variable selection method that identifies and selects specific groups of important explanatory variables (Ajith et al., \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). 
It is used to select the best regression variables from a large number of independent variables (Abhinaya et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). The MLR model can be described as follows:\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:Y=\\:{\\beta\\:}_{0}+{\\beta\\:}_{1}{X}_{1}+\\:{\\beta\\:}_{2}{X}_{2}+\\:\\cdots\\:{\\beta\\:}_{P}{X}_{P}+\\:{\\epsilon\\:}_{i}\\:,\\:\\:\\:i=1,\\cdots\\:,\\:\\:p,\\:\\)\u003c/span\u003e \u003c/span\u003e ( 1 )\u003c/p\u003e \u003cp\u003ethis equation can be written in matrix form as:\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:Y=X\\beta\\:+\\epsilon\\:\\)\u003c/span\u003e \u003c/span\u003e, ( 2 )\u003c/p\u003e \u003cp\u003ewhere :\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:Y\\)\u003c/span\u003e \u003c/span\u003e is an observed random variable, called the variable to be explained, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:Y=({y}_{1},\\cdots\\:,{y}_{n})\\)\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:n\\)\u003c/span\u003e\u003c/span\u003e is the number of observations,\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:{X}_{1},\\cdots\\:,\\:{X}_{p}\\)\u003c/span\u003e \u003c/span\u003e are explanatory variables or regressors,\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:{\\beta\\:}_{0},{\\beta\\:}_{1}\\cdots\\:,\\:{\\beta\\:}_{p}\\:\\)\u003c/span\u003e \u003c/span\u003eare unknown real parameters to be estimated, called regression 
parameters or regression coefficients,\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:{\\epsilon\\:}_{i}\\)\u003c/span\u003e \u003c/span\u003e are unobserved random variables independent of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{X}_{i}\\)\u003c/span\u003e\u003c/span\u003e, known as errors or noise, to which certain additional conditions are imposed.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eTo estimate the regression coefficients \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\beta\\:\\)\u003c/span\u003e\u003c/span\u003e, the ordinary least squares (OLS) method is applied. This involves minimizing the sum of squared errors. In other words, it corresponds to minimizing the sum of squared deviations between observed and predicted values. For multiple regression, this method estimates \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\beta\\:\\)\u003c/span\u003e\u003c/span\u003e by the value that minimizes\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:{\\left(Y-X\\beta\\:\\right)}^{T}(Y-X\\beta\\:)\\)\u003c/span\u003e \u003c/span\u003e, ( 3 )\u003c/p\u003e \u003cp\u003ehence the estimator of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\beta\\:\\)\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\widehat{\\beta\\:}\\)\u003c/span\u003e\u003c/span\u003e, is:\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:\\widehat{\\beta\\:}={\\left({X}^{T}X\\right)}^{-1}{X}^{T}Y,\\)\u003c/span\u003e \u003c/span\u003e ( 4 )\u003c/p\u003e \u003cp\u003ewhere \u003cspan 
class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{X}^{T}\\:\\)\u003c/span\u003e\u003c/span\u003eis the transpose of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:X\\)\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\left({X}^{T}X\\right)}^{-1}\\)\u003c/span\u003e\u003c/span\u003e is the inverse of the matrix \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\left({X}^{T}X\\right)\\)\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003e1.3.2 Least Absolute Shrinkage and Selection Operator regression (LASSO regression)\u003c/h3\u003e\n\u003cp\u003eInspired by Ridge's penalized regression, the LASSO (Least Absolute Selection and Shrinkage Operator) regression was developed in 1996 by Robert Tibshirani. This method both shrinks the size of the coefficients and performs variable selection by reducing the regression coefficients to zero and penalizing the regression model with a penalty term called norm \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{L}_{1}\\)\u003c/span\u003e\u003c/span\u003e, which is the sum of the absolute coefficients (Tibshirani, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e1996\u003c/span\u003e). The penalty has the effect of forcing certain coefficient estimates, whose contribution to the model is minor, to be exactly equal to zero (Tibshirani, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e1996\u003c/span\u003e). A clear advantage of LASSO regression over Ridge regression is that it produces simpler, easier-to-interpret models that incorporate only a reduced set of predictors (Abhinaya et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2021\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThe statistical model takes the form of Eq.\u0026nbsp;(1). 
When \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:p\\)\u003c/span\u003e\u003c/span\u003e is large or the variables are linearly dependent, least-squares estimators can fail. Indeed, the Ordinary Least Squares (OLS) method used to minimize residual squared errors in regressions has some major drawbacks (Hammami et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2012\u003c/span\u003e). If the number of independent variables is high, or if the explanatory variables are highly correlated, the variance of the least-squares coefficient estimates can be unacceptably high, leading to a lack of interpretation and accuracy (Hammami et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2012\u003c/span\u003e; Tibshirani, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e1996\u003c/span\u003e). Thus, the penalized (or constrained) LASSO method consists in restricting the space over which this penalty criterion is minimized. The penalty (constraint) of the LASSO method is given by:\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:p\\left(\\beta\\:\\right)={‖\\beta\\:‖}_{1},\\:\\)\u003c/span\u003e \u003c/span\u003e ( 5 )\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{‖\\beta\\:‖}_{1}=\\:\\sum\\:_{i=1}^{p}\\left|{\\beta\\:}_{i}\\right|\\)\u003c/span\u003e\u003c/span\u003e. 
( 6 )\u003c/h2\u003e \u003cp\u003eThe estimator of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\beta\\:}\\)\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\widehat{\\beta\\:}\\)\u003c/span\u003e\u003c/span\u003e, by the LASSO method is defined by:\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:{\\widehat{\\beta\\:}}_{Lasso}={arg}\\underset{\\beta\\:}{{min}}\\left\\{{‖y-X\\beta\\:‖}_{2}^{2}+\\lambda\\:\\sum\\:_{i=1}^{p}\\left|{\\beta\\:}_{i}\\right|\\right\\},\\)\u003c/span\u003e \u003c/span\u003e ( 7 )\u003c/p\u003e \u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:p\\)\u003c/span\u003e\u003c/span\u003e is the number of explanatory variables, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:X\\)\u003c/span\u003e\u003c/span\u003e is the matrix of explanatory variables, y is the vector of the response variable and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\lambda\\:\\ge\\:0\\:\\)\u003c/span\u003e\u003c/span\u003e is the tuning (or shrinkage) parameter, which controls the strength of the penalty.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003e1.3.3 Random Forest regression\u003c/h3\u003e\n\u003cp\u003eRandom Forest is a widely used machine learning algorithm introduced by Breiman in 2001. It is a non-parametric algorithm that constructs a set of decision trees, each of which is trained on a bootstrap sample of the data (Ruiz-Álvarez et al., \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Ga\u0026aacute;l, \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2012\u003c/span\u003e). 
The features used for each split in the trees are selected from a random subsample of the feature set (Ruiz-Álvarez et al., \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). A Random Forest can be described as a collection of tree-structured predictors, an extension of bagging to which additional randomness has been added (Breiman, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2001\u003c/span\u003e). Random Forest splits each node using the best among a subset of randomly chosen predictors at that node (Ok et al., \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2012\u003c/span\u003e). The CART algorithm is used to create trees (Breiman, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2001\u003c/span\u003e). Random Forest uses two main parameters: the number of regression trees (n_tree) to be grown and the number of randomly sampled features at each split (m_try) (Ruiz-Álvarez et al., \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). 
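\u003c/p\u003e \u003cp\u003eThe roles of bootstrap sampling, random feature selection and the two parameters n_tree and m_try can be illustrated with a deliberately small Python sketch. It is not the implementation used in this study (which relied on the R packages listed in Section 1.5): each tree is reduced to a single split (a stump) rather than a full CART tree, the ensemble prediction is the plain average over trees, and all function names are hypothetical.\u003c/p\u003e

```python
import random
from statistics import mean


def fit_stump(rows, targets, feature_ids):
    """Fit a one-split regression tree using only the features in feature_ids."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for j in feature_ids:
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, targets) if r[j] <= t]
            right = [y for r, y in zip(rows, targets) if r[j] > t]
            if not left or not right:
                continue  # a split must leave observations on both sides
            lm, rm = mean(left), mean(right)
            sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:
        # No valid split: fall back to the mean of the bootstrap sample.
        m = mean(targets)
        return lambda x: m
    _, j, t, lm, rm = best
    return lambda x: lm if x[j] <= t else rm


def fit_forest(rows, targets, n_tree=50, m_try=1, seed=0):
    """Grow n_tree stumps, each on a bootstrap sample with m_try random features."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    trees = []
    for _ in range(n_tree):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        feats = rng.sample(range(p), m_try)         # random subset of features
        trees.append(fit_stump([rows[i] for i in idx],
                               [targets[i] for i in idx], feats))
    return trees


def predict(trees, x):
    """Ensemble prediction: average of the individual tree predictions."""
    return mean(tree(x) for tree in trees)
```

\u003cp\u003eBecause each stump has only one node, drawing the feature subset once per tree coincides here with drawing it once per split; in a full Random Forest the subset is redrawn at every node.\u003c/p\u003e \u003cp\u003e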
Once the model has been trained, the prediction can be obtained as follows:\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:\\widehat{Y}=\\:\\frac{1}{M}\\sum\\:_{m=1}^{M}{T}_{m}\\left({y}_{i}\\right),\\:\\:i=1,\\cdots\\:,\\:p,\\:\\)\u003c/span\u003e \u003c/span\u003e( 8 )\u003c/p\u003e \u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:M\\)\u003c/span\u003e\u003c/span\u003e is the number of trees, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{T}_{m}\\)\u003c/span\u003e\u003c/span\u003e is a single decision tree and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{y}_{i}\\:\\)\u003c/span\u003e\u003c/span\u003eis the vector of predictors.\u003c/p\u003e\n\u003ch3\u003e1.4 Model performance measurement\u003c/h3\u003e\n\u003cp\u003eModel performance was tested using various statistical measures. The use of several measures enabled us to evaluate the performance of a single model and to compare several models. 
In this study, model performance was tested using: coefficient of determination (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{R}^{2}\\)\u003c/span\u003e\u003c/span\u003e), Root Mean Square Error (RMSE) and Mean Absolute Error (MAE).\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eCoefficient of determination (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{R}^{2}\\)\u003c/span\u003e\u003c/span\u003e):\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv id=\"Equa\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$\\:{R}^{2}=1-\\frac{{\\sum\\:}_{i=1}^{n}{({Y}_{i}-{\\widehat{Y}}_{i})}^{2}}{{\\sum\\:}_{i=1}^{n}{({Y}_{i}-{\\stackrel{-}{Y}}_{i})}^{2}}$$\u003c/div\u003e \u003c/div\u003e \u003c/p\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e( 9 )\u003c/h2\u003e \u003cp\u003eThe \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{R}^{2}\\)\u003c/span\u003e\u003c/span\u003ecorresponds to the proportion of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:Y\\)\u003c/span\u003e\u003c/span\u003e variability explained by the model. In multiple regression models, it corresponds to the squared correlation between the observed values and the values predicted by the model. 
The closer the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{R}^{2}\\)\u003c/span\u003e\u003c/span\u003e is to 1, the better the model's prediction.\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eRoot Mean Square Error (RMSE):\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv id=\"Equb\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equb\" name=\"EquationSource\"\u003e\n$$\\:RMSE=\\sqrt{\\frac{{\\sum\\:}_{i=1}^{n}{({Y}_{i}-{\\widehat{Y}}_{i})}^{2}}{n}}$$\u003c/div\u003e \u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e( 10 )\u003c/h2\u003e \u003cp\u003eRMSE is a metric that measures the average difference between the predicted values of a statistical model and the actual values. An RMSE value of zero indicates a perfect model fit. The lower the RMSE, the better the model and its predictions.\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eMean Absolute Error (MAE):\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:MAE=\\frac{1}{n}\\sum\\:_{i=1}^{n}\\left|{Y}_{i}-{\\widehat{Y}}_{i}\\right|\\)\u003c/span\u003e\u003c/span\u003e ( 11 )\u003c/h2\u003e \u003cp\u003eThe Mean Absolute Error (MAE) measures the average absolute magnitude between actual values and the values predicted by the regression model. 
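\u003c/p\u003e \u003cp\u003eThe three measures in Eqs. (9) to (11) can be computed directly from paired observed and predicted values. The following Python sketch is illustrative only: the study computed these measures in R, and the function names and the example numbers below are hypothetical.\u003c/p\u003e

```python
from math import sqrt


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared deviation."""
    n = len(y_true)
    return sqrt(sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute deviations."""
    n = len(y_true)
    return sum(abs(y - yh) for y, yh in zip(y_true, y_pred)) / n


# Hypothetical observed and predicted values (not data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
print(r2_score(observed, predicted))  # approximately 0.2
print(rmse(observed, predicted))      # 1.0
print(mae(observed, predicted))       # 0.5
```

\u003cp\u003eIn this example a single large error of 2 units gives an RMSE (1.0) twice the MAE (0.5), which illustrates that RMSE weights large deviations more heavily than MAE.\u003c/p\u003e \u003cp\u003e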
The lower the MAE, the better the model's predictions.\u003c/p\u003e \u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{Y}_{i}\\)\u003c/span\u003e\u003c/span\u003e is the actual value; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\widehat{Y}}_{i}\\)\u003c/span\u003e\u003c/span\u003e is the predicted value; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\stackrel{-}{Y}}_{i}\\)\u003c/span\u003e\u003c/span\u003e is the mean value of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{Y}_{i}\\)\u003c/span\u003e\u003c/span\u003e; and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:i=\\text{1,2},3,\\cdots\\:,n\\)\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e1.5 Data processing and analysis\u003c/h2\u003e \u003cp\u003eR software version \u003cem\u003e4.2.0\u003c/em\u003e is used to analyze and model agricultural and climatic data. The general methodology involves data visualization and yield modeling. The \u003cem\u003eggplot2\u003c/em\u003e package was used to create graphs to visualize production, area sown and yields of groundnuts, millet and cotton as a function of years in Senegal, as well as their trend curves. 
The trend curve is generated using the simple linear regression model with the \u003cem\u003elm\u003c/em\u003e function under the null hypothesis H\u003csub\u003e0\u003c/sub\u003e: \u0026ldquo;\u003cem\u003ethere is no significant linear relationship between each of the variables sown area, production and yield and the year variable\u0026rdquo;.\u003c/em\u003e\u003c/p\u003e \u003cp\u003eBefore implementing the models, the \u003cem\u003ecreateDataPartition\u003c/em\u003e function in the \u003cem\u003ecaret\u003c/em\u003e package was used to randomly divide the dataset into a training dataset and a test dataset. A 60\u0026thinsp;\u0026minus;\u0026thinsp;40% split between training and test data was used due to the limited sample size. This balance allows for sufficient training data while maintaining a reasonable test set for model evaluation. Additionally, 10-fold cross-validation on the training set helps mitigate the impact of the smaller training data size, giving a more robust performance assessment. The k-fold cross-validation was carried out using the \u003cem\u003ecaret\u003c/em\u003e package. The \u003cem\u003eMASS\u003c/em\u003e package was used to fit the stepwise multiple linear regression model. The R package most commonly used to perform constrained linear regression (LASSO regression in this study) is \u003cem\u003eglmnet\u003c/em\u003e. The model is implemented using a function also called \u003cem\u003eglmnet().\u003c/em\u003e To implement Random Forest regression in R, the following packages were used: \u003cem\u003edplyr\u003c/em\u003e (data manipulation), \u003cem\u003erandomForest\u003c/em\u003e (model implementation), \u003cem\u003erandomForestExplainer\u003c/em\u003e (graphics) and \u003cem\u003eggplot2\u003c/em\u003e (graphics). 
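\u003c/p\u003e \u003cp\u003eThe 60/40 partition followed by 10-fold cross-validation on the training part can be sketched in a few lines of Python. This is a hedged illustration, not the study's code: it uses simple random sampling, whereas \u003cem\u003ecreateDataPartition\u003c/em\u003e in \u003cem\u003ecaret\u003c/em\u003e balances the partition on the outcome variable, and the sample size of 42 assumes one observation per year from 1980 to 2021. The function names are hypothetical.\u003c/p\u003e

```python
import random


def train_test_split_indices(n, train_frac=0.6, seed=42):
    """Shuffle the indices 0..n-1 and cut them into a training and a test part."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = round(train_frac * n)
    return idx[:n_train], idx[n_train:]


def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        validation = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, validation


# Assumed sample size: one observation per year, 1980 to 2021.
train_idx, test_idx = train_test_split_indices(42)
for fold_train, fold_val in k_fold_indices(train_idx, k=10):
    # A model would be fitted on fold_train and scored on fold_val here.
    pass
```

\u003cp\u003eWith 42 observations this leaves 25 for training and 17 for testing, so each of the 10 validation folds holds only 2 or 3 observations; fold-level scores are therefore noisy, and it is their average that is informative.\u003c/p\u003e \u003cp\u003e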
The \u003cem\u003ecaret\u003c/em\u003e and \u003cem\u003eMetrics\u003c/em\u003e packages were also used for performance measurement.\u003c/p\u003e \u003c/div\u003e"},{"header":"2 RESULTS","content":"\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Analysis of trends in agricultural parameters\u003c/h2\u003e \u003cdiv id=\"Sec17\" class=\"Section3\"\u003e \u003ch2\u003e2.1.1 Change in area sown in hectares by crop\u003c/h2\u003e \u003cp\u003eAn increasing trend in the area sown to groundnuts over the period studied was noted, whereas for millet and cotton, the trend was downwards with very large annual fluctuations (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). The p-values (p) are all less than α\u0026thinsp;=\u0026thinsp;0.05 (significance threshold), so there is a significant relationship between the area sown variable and the year variable.\u003c/p\u003e \u003cp\u003eObservation of the raw data shows significant variation for cotton, groundnuts and millet. There was little variation in cotton land reserves between 2011 and 2021, with an amplitude of 17,183 ha. On the other hand, there was a large change between 2000 and 2010, with a variation of 33,261 ha. The most drastic drop in area was seen in 2019, down to 16,511 ha, while the largest increase was seen in 1997, at 54,439 ha.\u003c/p\u003e \u003cp\u003eFor groundnuts, the smallest amplitude was obtained in the period from 1990 to 2000, i.e. 511,778 ha, indicating little variation in the data. The greatest variation was between 2000 and 2010, at 670,730 ha, indicating a large change in area over this period. Cultivated areas fell drastically in 1998 and 2003, to 519,168 and 524,843 ha respectively. In 2017, however, there was a dramatic increase in the area set aside for groundnuts, to 1,254,058 ha.\u003c/p\u003e \u003cp\u003eFor millet, the smallest area is obtained in the period from 2011 to 2021, i.e. 
307,069 ha, indicating little variation in the area sown during this period. However, the largest amplitude (510,032 ha) is recorded in the period from 1980 to 1990, indicating high variability in area. The years 2004 and 2007 were marked by significant decreases in the area sown, to 686,929 and 686,892 ha respectively. The largest area sown was in 1985, at 1,337,805 ha.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003e2.1.2 Change in production in tonnes by type of crop\u003c/h2\u003e \u003cp\u003eThere is an upward trend in groundnut and millet production. For cotton, on the other hand, the trend is downwards (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). The p-values (p) for groundnuts and cotton are less than α\u0026thinsp;=\u0026thinsp;0.05, so there is a significant relationship between the production variable and the year variable for groundnuts and cotton. For millet, however, the p-value associated with the trend is greater than 0.05, so the trend is not significant.\u003c/p\u003e \u003cp\u003eRaw data show that for cotton, the lowest amplitude (13,697 t) is recorded in the period from 2011 to 2021, indicating low production variability. The greatest variability was recorded between 1980 and 1990, with an amplitude of 38,531 t. Production is marked by a series of declines over the years, with drastic falls in 2016 and 2018 (15,121 and 15,160 t respectively). However, a significant increase was seen in 1984, with 59,495 t.\u003c/p\u003e \u003cp\u003eMillet production is variable: the period from 1990 to 2000 was marked by low variability in production, with a range of 367,652 t, while high variability was observed in the period from 2011 to 2021, with a range of 735,862 t. 
Recurring declines were observed, notably in 2007, with a significant drop to 318,822 t, while a significant increase was noted in 2020, to 1,144,855 t.\u003c/p\u003e \u003cp\u003eGroundnut production varied little between 1990 and 2000 (range of 497,612 t), while it varied significantly between 2011 and 2021 (range of 1,269,958 t). Production fell drastically in 2002, to 260,723 t, before rising sharply to 1,797,486 t in 2020.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003e2.1.3 Changes in yield (kg/ha) by type of crop\u003c/h2\u003e \u003cp\u003eYields change over time for groundnuts, millet and cotton (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). The p-value associated with the trend for groundnuts and millet is less than 0.05, which shows a significant relationship between groundnut and millet yields and the year variable, unlike for cotton (p\u0026gt;0.05).\u003c/p\u003e \u003cp\u003eCotton yields varied little between 2000 and 2010, with a range of 283 kg/ha, whereas the period from 1980 to 1990 was marked by high variability, with a range of 583 kg/ha. The lowest cotton yield was recorded in 1998 at 604 kg/ha, while a significant increase in cotton yield was recorded in 1984 at 1,284 kg/ha.\u003c/p\u003e \u003cp\u003eGroundnut yields remain variable throughout the period studied. A small variation in yields is recorded in the period from 1990 to 2000 with a range of 455 kg/ha, while a large variation in yields is observed between 2011 and 2021 with a range of 868 kg/ha. After the low groundnut yields recorded in 1980 and 2002 (489 kg/ha and 320 kg/ha respectively), a considerable increase was noted in 2020 (1,467 kg/ha).\u003c/p\u003e \u003cp\u003eMillet yields are increasing more or less steadily. 
The period from 1990 to 2000 was marked by low yield variability, with a range of 245 kg/ha, while high yield variability was achieved in the period from 2011 to 2021, with a range of 548 kg/ha. The lowest yield was observed in 1983 with 425 kg/ha, while the highest yield was obtained in 2020 with 1,119 kg/ha.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003e2.2 Crop yield prediction\u003c/h2\u003e \u003cp\u003e \u003cb\u003e2.2.1 With the stepwise Multiple Linear Model\u003c/b\u003e \u003c/p\u003e\u003cp\u003eOverall, groundnut, millet and cotton yields in Senegal are influenced by production and area (Table\u0026nbsp;1). For groundnut, cumulative rainfall, relative humidity and root zone soil moisture (RZSW) also had an effect on yield. The RZSW variable had a positive effect on groundnut yield, unlike the other variables. For millet, mean annual temperature, relative humidity and RZSW also influence yield. Unlike production and relative humidity, which have positive effects on millet yield, area, mean annual temperature and RZSW have negative coefficients.\u003c/p\u003e \u003cp\u003eThe variables year, cumulative rainfall, mean annual temperature and mean monthly temperature (Tmin) also had an effect on cotton yield. 
Cumulative rainfall and production have positive effects on cotton yield.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003eTable 1: Stepwise Multiple Linear Model\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Taba\" border=\"1\"\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eGroundnut\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eEstimate\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eStd. 
Error\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003et value\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003ePr(\u0026gt;|t|)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e(Intercept)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e7.11e\u0026thinsp;+\u0026thinsp;02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.83e\u0026thinsp;+\u0026thinsp;02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e3.882\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.000926 ***\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eProduction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e9.66e-04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3.65e-05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e26.464\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;2e-16 ***\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eArea sown\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-9.36e-04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e5.16e-05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-18.154\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e6.80e-14 ***\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003eCumulative rainfall\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-9.76e-02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e4.22e-02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-2.314\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.031416 *\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRelative humidity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-9.76e\u0026thinsp;+\u0026thinsp;00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e5.88e\u0026thinsp;+\u0026thinsp;00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-1.66\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.112473\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRZSW\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2.05e\u0026thinsp;+\u0026thinsp;03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e9.00e\u0026thinsp;+\u0026thinsp;02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e2.272\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.034253 *\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eMillet\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eEstimate\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eStd. Error\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003et value\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003ePr(\u0026gt;|t|)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e(Intercept)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2.08e\u0026thinsp;+\u0026thinsp;03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e7.71e\u0026thinsp;+\u0026thinsp;02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e2.703\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.0137 *\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eProduction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1.15e-03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e6.16e-05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e18.675\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e3.98e-14 ***\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eArea sown\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-7.85e-04\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c3\"\u003e \u003cp\u003e7.42e-05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-10.581\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1.21e-09 ***\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMean annual temperature\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-3.84e\u0026thinsp;+\u0026thinsp;01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2.37e\u0026thinsp;+\u0026thinsp;01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-1.62\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.121\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRelative humidity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1.04e\u0026thinsp;+\u0026thinsp;01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e5.67e\u0026thinsp;+\u0026thinsp;00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1.831\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.082 .\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRZSW\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-2.03e\u0026thinsp;+\u0026thinsp;03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e8.46e\u0026thinsp;+\u0026thinsp;02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-2.405\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.026 *\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eCotton\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eEstimate\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eStd. Error\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003et value\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003ePr(\u0026gt;|t|)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e(Intercept)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e6.46e\u0026thinsp;+\u0026thinsp;03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.58e\u0026thinsp;+\u0026thinsp;03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e4.101\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.000609 ***\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYear\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-4.12e\u0026thinsp;+\u0026thinsp;00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e8.96e-01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c4\"\u003e \u003cp\u003e-4.6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.000195 ***\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eProduction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2.42e-02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.17e-03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e20.61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1.84e-14 ***\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eArea sown\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-2.27e-02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.35e-03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-16.79\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e7.46e-13 ***\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCumulative rainfall\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1.56e-01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3.27e-02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e4.782\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.00013 ***\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMean annual temperature\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e9.82e\u0026thinsp;+\u0026thinsp;01\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2.58e\u0026thinsp;+\u0026thinsp;01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e3.811\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.001181 **\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTmin\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-1.69e\u0026thinsp;+\u0026thinsp;01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e8.07e\u0026thinsp;+\u0026thinsp;00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-2.088\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.050482 .\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"5\"\u003eNote: *** (**, *) means that the appropriate explanatory variable has an effect at the 1% (5%, 10%) significance level.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e\u003ch2\u003e2.2.2 With the LASSO regression model\u003c/h2\u003e \u003cp\u003eThreshold lambda values for groundnut, millet and cotton are shown with an upper and lower limit (Fig.\u0026nbsp;5, left panel). Two specific lambda values are highlighted by vertical dotted lines. The lambda value was chosen from this range. The tables (Fig.\u0026nbsp;5, right panel) show the climatic and agricultural variables that are important for predicting groundnut, millet and cotton yields in Senegal. Among these variables, year, production, area, cumulative rainfall, Tmax, Tmin and RZSW were found to be the most influential and were used to build the groundnut forecasting model. However, production and Tmax had a positive effect on yield, unlike the other variables. 
On the other hand, production, area, mean monthly rainfall, mean annual temperature, Tmax, Tmin, relative humidity and RZSW are important variables for predicting millet yield. Unlike the other variables, production, year, Tmin and Tmax have positive effects on millet yield. For cotton, the variables that influence yield are: year, production, area, cumulative rainfall, mean annual temperature, Tmax, Tmin and relative humidity. Among these variables, production, cumulative rainfall and mean annual temperature have a positive effect on cotton yield.\u003c/p\u003e \u003ch2\u003e2.2.3 With Random Forest regression model\u003c/h2\u003e \u003cp\u003eThe percentage increase in mean square error (%IncMSE) and the increase in node purity (IncNodePurity) are measures commonly used in Random Forest algorithms, particularly in the context of assessing the importance of variables in the model. Figure\u0026nbsp;6 shows the ranking of the relative importance of climatic and agricultural variables in predicting the yield of groundnut, millet and cotton crops in Senegal. Variables with higher %IncMSE values are considered more important in the model, contributing more to reducing error or increasing purity when making decisions in the tree. Production, area, cumulative rainfall and year are the most important predictors of groundnut yield. For millet, the results showed that production, year, average monthly rainfall and relative humidity were the most important variables for predicting yields. For cotton, the variables with the greatest impact on yield are production, cumulative rainfall and year.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003e2.3 Performance evaluation of the three models\u003c/h2\u003e \u003cp\u003eTo compare the best results between the stepwise Multiple Linear regression, the LASSO regression and the Random Forest regression, R\u003csup\u003e2\u003c/sup\u003e, RMSE and MAE were calculated (Table\u0026nbsp;2). 
Almost all the models show a very good quality of fit. The coefficients of determination R\u003csup\u003e2\u003c/sup\u003e for groundnut, millet and cotton are respectively 0.93, 0.98 and 0.70 for the stepwise Multiple Linear regression; 0.96, 0.98 and 0.70 for the LASSO regression; and 0.74, 0.87 and 0.01 for the Random Forest regression. For the latter, the R\u003csup\u003e2\u003c/sup\u003e obtained for cotton is very low, showing that the model performs poorly in predicting cotton yields in Senegal. The RMSE and MAE statistical indicators for the three regression methods presented in Table\u0026nbsp;2 show that the LASSO regression has the lowest values of RMSE (63.21, 21.19 and 79.55) and MAE (44.63, 16.62 and 59.04), respectively, for groundnut, millet and cotton, which represents a better result for the LASSO regression.\u003c/p\u003e \u003cp\u003eThe performance of the groundnut, millet and cotton yield prediction equations was tested by comparing the predicted values with the observed values for the period 1980 to 2021, which are presented in Fig.\u0026nbsp;7. It shows that, overall, the three models studied show satisfactory goodness of fit (R\u003csup\u003e2\u003c/sup\u003e close to 1) for all the test data except for cotton with the Random Forest regression. Most of the models showed little variation between observed and predicted yields. A large proportion of the points are closely clustered around the reference line and are within the 95% confidence interval. However, a large variability was observed in the prediction of cotton yield by Random Forest regression. 
Taken together, these results indicate that the prediction accuracy of the LASSO method is better than that of the stepwise method and Random Forest for groundnut, millet and cotton.\u003c/p\u003e \u003cdiv id=\"Sec23\" class=\"Section3\"\u003e \u003ch2\u003eTable 2: Comparison of prediction performance\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Tabg\" border=\"1\"\u003e \u003ccolgroup cols=\"10\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c7\" namest=\"c5\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c10\" namest=\"c8\"\u003e\u0026nbsp;\u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003e\u003cb\u003eCrop\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e \u003cp\u003eStepwise Multiple Linear Regression\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"3\" nameend=\"c7\" namest=\"c5\"\u003e \u003cp\u003eLASSO\u003c/p\u003e \u003cp\u003eRegression\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"3\" nameend=\"c10\" namest=\"c8\"\u003e \u003cp\u003eRandom Forest\u003c/p\u003e \u003cp\u003eRegression\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eR\u003c/b\u003e\u003csup\u003e\u003cb\u003e2\u003c/b\u003e\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eRMSE\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003eMAE\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003eR\u003c/b\u003e\u003csup\u003e\u003cb\u003e2\u003c/b\u003e\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003eRMSE\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003eMAE\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eR\u003c/b\u003e\u003csup\u003e\u003cb\u003e2\u003c/b\u003e\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003eRMSE\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e\u003cb\u003eMAE\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGroundnut\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e77.09\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e54.52\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e63.21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e44.63\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.74\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e187.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e142.17\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMillet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e26.30\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e21.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e21.19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e16.62\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e66.74\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e51.58\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003eCotton\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.70\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e80.51\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e60.55\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.70\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e79.55\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e59.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e159.70\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e126.76\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e "},{"header":"3 Discussion","content":"\u003cp\u003eAgriculture is a sector that is highly dependent on the climate. The Sahel is most affected by climate trends and variability, particularly in Senegal where the majority of crops depend on the climate (Alhassane et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2013\u003c/span\u003e). During the study period, the areas set aside for groundnuts increased, while a decrease was observed for millet and cotton. Groundnut and millet production also increased. Cotton production, on the other hand, decreased. An increase in groundnut, millet and cotton yields was also observed. These variations could be due to the fact that rainfall in Senegal varies according to climatic zone (Faye \u003cem\u003eet al.\u003c/em\u003e, 2017; Mballo et al., \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). 
The 1970s and 1980s were marked by severe drought in the Sahel, which had a negative impact on crop yields (Paturel et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e1997\u003c/span\u003e; Sarr, \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2006\u003c/span\u003e). This explains the low and varied yields observed during the study period. The results of this study are consistent with those of Faye et al. (\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2018\u003c/span\u003e), who showed that rainfall alone does not determine agricultural production. This is why, in some years, rainfall is high and yields are low.\u003c/p\u003e \u003cp\u003eThe LASSO model gives better results for predicting groundnut, millet and cotton yields. LASSO is a method that combines regularisation and variable selection by imposing a constraint on the sum of the absolute values of the coefficients. It reduces certain coefficients to zero, thereby selecting only those variables whose coefficient remains non-zero, with the aim of minimising the prediction error. The purpose of shrinkage is to prevent overfitting caused by collinearity or high dimensionality of covariates, though it may perform poorly with highly correlated datasets (Hou et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Utazirubanda et al., \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Random Forest regression performed poorly in predicting cotton yields in Senegal, although the goodness of fit was good for groundnuts and millet. Multiple linear stepwise regression also performed well, with a very good model fit for all the crops studied. On a larger scale, these results confirm the work of Singh et al. 
(\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2019\u003c/span\u003e) on wheat, which revealed a higher prediction accuracy of the LASSO model compared to the stepwise Multiple Linear model using meteorological indices. Similarly, Kumar et al. (\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2021\u003c/span\u003e), in their research on wheat yield prediction, also showed that the LASSO model was more efficient than the stepwise Multiple Linear model. On the other hand, studies by Abhinaya et al. (\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2021\u003c/span\u003e) on groundnut revealed a better fit of the data with the stepwise Multiple Linear model. This discrepancy could be explained by the presence of multicollinearity, which can also lead to unstable and unreliable coefficient estimates due to the high correlation between the predictors. This multicollinearity, resolved by the stepwise Multiple Linear model, enabled it to provide a better fit to the data. In Senegal, work by Sarr \u0026amp; Sultan (\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) using machine learning methods to predict groundnut, millet, maize and sorghum yields revealed that Random Forest regression was among the best for predicting crop yields. The disparity with the results of this study is likely due to the different approaches and types of data used. In fact, this study was based only on agricultural and climatic data, whereas Sarr \u0026amp; Sultan (\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) used three combinations of input data. These were satellite data (NDVI only), climate data only and a combination of satellite and climate data to predict crop yields.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThis study assessed the robustness of yield prediction models for groundnut, millet and cotton using Machine Learning methods. 
The results showed an increase in the area set aside for groundnuts and a decrease in the area set aside for millet and cotton. In terms of production, an increase was also noted for groundnuts and millet. Cotton production, on the other hand, decreased. An upward trend in groundnut, millet and cotton yields was also observed. The LASSO regression performed best, with a coefficient of determination (R\u0026sup2;) of 0.96 for groundnuts, 0.98 for millet and 0.70 for cotton in Senegal, compared with the Multiple stepwise regression and the Random Forest regression. It also gave the lowest RMSE and MAE values. This study is a contribution to the development of an operational decision support system for food security in Senegal. The promising results obtained encourage further work in this direction, in close collaboration with national agricultural research institutions and farmers' organisations. It would also be interesting to extend the study to other crops that are important for food security in Senegal.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003cstrong\u003eConflict of interest:\u003c/strong\u003e \u003cp\u003e\u0026bull; The authors did not receive support from any organization for the submitted work. The authors declare that they have no conflict of interest.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding statement:\u003c/h2\u003e \u003cp\u003eNo funding was received for conducting this study.\u003c/p\u003e \u003cp\u003e\u003cb\u003eEthics, Consent to Participate, and Consent to Publish declarations\u003c/b\u003e: not applicable.\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eN.K.G.S. wrote the entire text of the manuscript and produced the analyses and graphs with the help of A.N. and P.N. A.N. helped to draw up the graphs. 
P.N. took part in data analysis. K.N. participated in the correction of the manuscript. All authors reviewed the manuscript.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe data used in this manuscript include agricultural data of groundnuts, millet and cotton from 1980 to 2021 (42 years), covering areas, production and yields from the Direction de l'Analyse, de la Pr\u0026eacute;vision et des Statistiques Agricoles (DAPSA). Climatic data such as rainfall, temperature and humidity were also obtained for the same periods from the Agence Nationale de l'Aviation Civile et de la M\u0026eacute;t\u0026eacute;orologie du S\u0026eacute;n\u0026eacute;gal (ANACIM). Additional data (rainfall and root zone soil wetness (RZSW) respectively) were extracted from online databases (FAOSTAT: https://www.fao.org/faostat/en/#home and Data Access Viewer (DAV): https://power.larc.nasa.gov/data-access-viewer/). Data from DAPSA and ANACIM cannot be shared without prior authorisation from these organisations.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAbbas F., Afzaal H., Farooque A.A., Tang S. (2020). Crop Yield Prediction through Proximal Sensing and Machine Learning Algorithms. Agronomy, 10, 1046. https://doi.org/10.3390/agronomy10071046\u003c/li\u003e\n\u003cli\u003eAbhinaya D., Patil S. G., Dheebakaran Ga., Djanaguiraman M., Arockia Stephen Raj. (2021). Use of Statistical Models in Predicting Groundnut Yield in Relation to Weather Parameters. Madras Agricultural Journal, 108. https://doi.org/10.29321/MAJ.10.000546\u003c/li\u003e\n\u003cli\u003eAjith S., Debnath M.K., Gupta D.S., Basak P. (2023). Application of statistical and machine learning models in combination with stepwise regression for predicting rapeseed-mustard yield in Northern districts of West Bengal. International Journal of Statistics and Applied Mathematics, 8, 141\u0026ndash;149. 
https://doi.org/10.22271/maths.2023.v8.i3b.1004 \u003c/li\u003e\n\u003cli\u003eAlhassane A., Salack S., Ly M., Lona I., Traor\u0026eacute; S.B., Sarr B. (2013). Evolution of agro-climatic risks related to the recent trends of the rainfall regime over the Sudano-Sahelian region of West Africa. S\u0026eacute;cheresse, 24, 282\u0026ndash;293. https://doi.org/10.1684/sec.2013.0400\u003c/li\u003e\n\u003cli\u003eAnsarifar J., Wang L., Archontoulis S.V. (2021). An interaction regression model for crop yield prediction. Scientific Reports, 11, 17754.\u003c/li\u003e\n\u003cli\u003eBergez J.E., Constantin J., Debaeke Ph., Raynal H. and Plassin S., (2023). Modelling climate change impacts on agricultural systems. Chapter taken from: Nendel C. (Eds.), Burleigh Dodds Science, pp. 3\u0026ndash;38.\u003c/li\u003e\n\u003cli\u003eBasso, B., Cammarano, D., Carfagna, E. (2013). Review of crop yield forecasting methods and early warning systems. In: Report Presented to First Meeting of the Scientific Advisory Committee of the Global Strategy to Improve Agricultural and Rural Statistics. FAO, Headquarters, Rome, Italy, 18\u0026ndash;19 July.\u003c/li\u003e\n\u003cli\u003eBreiman L., (2001). Random Forests. Machine Learning, 45, 5\u0026ndash;32.\u003c/li\u003e\n\u003cli\u003eChang Y., Latham J., Licht M., Wang L. (2023). A data-driven crop model for maize yield prediction. Communications Biology, 6, 439.\u003c/li\u003e\n\u003cli\u003eChipanshi A., Zhang Y., Kouadio L., Newlands N., Davidson A., Hill H., Warren R., Qian B., Daneshfar B., Bedard F., Reichert G. (2015). Evaluation of the Integrated Canadian Crop Yield Forecaster (ICCYF) model for in-season prediction of crop yield across the Canadian agricultural landscape. Agricultural and Forest Meteorology, 206, 137\u0026ndash;150. https://doi.org/10.1016/j.agrformet.2015.03.007\u003c/li\u003e\n\u003cli\u003eDirection de l\u0026apos;Analyse de la Pr\u0026eacute;vision et des Statistiques Agricoles (DAPSA), (2021). 
Rapport de l\u0026rsquo;Enqu\u0026ecirc;te Agricole Annuelle (EAA) 2020-2021.\u003c/li\u003e\n\u003cli\u003eGa\u0026aacute;l M. (2012). Modelling the impact of climate change on the Hungarian wine regions using random forest. Applied Ecology and Environmental Research, 10, 121\u0026ndash;140. https://doi.org/10.15666/aeer/1002_121140\u003c/li\u003e\n\u003cli\u003eGarcia, L. (2015). Impact du changement climatique sur les rendements du mil et de l\u0026apos;arachide au S\u0026eacute;n\u0026eacute;gal: Approche par exp\u0026eacute;rimentation virtuelle (Doctoral dissertation, Montpellier SupAgro).\u003c/li\u003e\n\u003cli\u003eFaye A. (2018). Climat et agriculture au S\u0026eacute;n\u0026eacute;gal : Analyse \u0026eacute;conomique de la disponibilit\u0026eacute; de l\u0026rsquo;eau d\u0026rsquo;irrigation dans un contexte de variabilit\u0026eacute; des pr\u0026eacute;cipitations dans les niayes. Th\u0026egrave;se de doctorat, Universit\u0026eacute; Cheikh-Anta-Diop de Dakar, Facult\u0026eacute; des Sciences \u0026Eacute;conomiques et de Gestion, Formation doctorale : \u0026Eacute;conomie et changement climatique, 217p.\u003c/li\u003e\n\u003cli\u003eFaye M., Fall A., Faye G., Van Hecke E. (2018). La variabilit\u0026eacute; pluviom\u0026eacute;trique et ses incidences sur les rendements agricoles dans la r\u0026eacute;gion des Terres Neuves du S\u0026eacute;n\u0026eacute;gal oriental. Belgeo. https://doi.org/10.4000/belgeo.22083\u003c/li\u003e\n\u003cli\u003eHammami D., Lee T.S., Ouarda T.B.M.J., Lee J. (2012). Predictor selection for downscaling GCM data with LASSO. Journal of Geophysical Research: Atmospheres, 117, 2012JD017864. https://doi.org/10.1029/2012JD017864\u003c/li\u003e\n\u003cli\u003eHou, J., Paravati, A., Hou, J., Xu, R., \u0026amp; Murphy, J. (2018). High‐dimensional variable selection and prediction under competing risks with application to SEER‐Medicare linked data. Statistics in medicine, 37(24), 3486-3502. 
https://doi.org/10.1002/sim.7822\u003c/li\u003e\n\u003cli\u003eKumar, S.D. Attri, K.K. Singh (2021). Comparison of Lasso and stepwise regression technique for wheat yield prediction. Journal of Agrometeorology, 21, 188\u0026ndash;192. https://doi.org/10.54386/jam.v21i2.231\u003c/li\u003e\n\u003cli\u003eMaestrini B., Mimić G., Van Oort P.A.J., Jindo K., Brdar S., Athanasiadis I.N., Van Evert F.K. (2022). Mixing process-based and data-driven approaches in yield prediction. European Journal of Agronomy, 139, 126569. https://doi.org/10.1016/j.eja.2022.126569\u003c/li\u003e\n\u003cli\u003eMballo I., Sy O., Gaye D., San\u0026eacute; B. (2021). Variabilit\u0026eacute; pluviom\u0026eacute;trique et d\u0026eacute;veloppement de l\u0026rsquo;activit\u0026eacute; agricole dans la r\u0026eacute;gion de Kolda (S\u0026eacute;n\u0026eacute;gal). Dynamiques environnementales, 101\u0026ndash;126. https://doi.org/10.4000/dynenviron.6013\u003c/li\u003e\n\u003cli\u003eNoba K., Ngom A., Gu\u0026egrave;ye M., Bass\u0026egrave;ne C., Kane M., Diop I., Ndoye F., Mbaye M.S., Kane A., Tidiane Ba A. (2014). L\u0026rsquo;arachide au S\u0026eacute;n\u0026eacute;gal : \u0026eacute;tat des lieux, contraintes et perspectives pour la relance de la fili\u0026egrave;re. OCL, 21, D205. https://doi.org/10.1051/ocl/2013039\u003c/li\u003e\n\u003cli\u003eNouvelle Alliance pour la S\u0026eacute;curit\u0026eacute; Alimentaire et la Nutrition (NASAN), (2022).\u003c/li\u003e\n\u003cli\u003eOk A.O., Akar O., Gungor O. (2012). Evaluation of random forest method for agricultural crop classification. European Journal of Remote Sensing, 45, 421\u0026ndash;432. https://doi.org/10.5721/EuJRS20124535\u003c/li\u003e\n\u003cli\u003ePaturel, J. E., Servat, \u0026Eacute;., Lub\u0026egrave;s-Niel, H., \u0026amp; Delattre, M. O. (1997). 
Variabilit\u0026eacute; climatique et analyse de s\u0026eacute;ries pluviom\u0026eacute;triques de longue dur\u0026eacute;e en Afrique de l\u0026apos;Ouest et centrale non sah\u0026eacute;lienne. Comptes Rendus de l\u0026apos;Acad\u0026eacute;mie des Sciences-Series IIA-Earth and Planetary Science, 325(10), 779-782. https://doi.org/10.1016/S1251-8050(97)82756-5\u003c/li\u003e\n\u003cli\u003eRuiz-Álvarez M., Gomariz-Castillo F., Alonso-Sarr\u0026iacute;a F. (2021). Evapotranspiration Response to Climate Change in Semi-Arid Areas: Using Random Forest as Multi-Model Ensemble Method. Water, 13, 222.\u003c/li\u003e\n\u003cli\u003eSarr A.B., Sultan B. (2023). Predicting crop yields in Senegal using machine learning methods. International Journal of Climatology, 43, 1817\u0026ndash;1838. https://doi.org/10.1002/joc.7947\u003c/li\u003e\n\u003cli\u003eSarr, B. (2006). INSTAT+ en bref Manuel d\u0026rsquo;utilisation destin\u0026eacute; aux Ing\u0026eacute;nieurs en agrom\u0026eacute;t\u0026eacute;orologie et en m\u0026eacute;t\u0026eacute;orologie a\u0026eacute;ronautique, CILSS, Centre r\u0026eacute;gional Agrhymet, 74 p.\u003c/li\u003e\n\u003cli\u003eSingh K.N., Singh K.K., Kumar S., Panwar S., Gurung B. (2019). Forecasting crop yield through weather indices through LASSO. The Indian Journal of Agricultural Sciences, 89. https://doi.org/10.56093/ijas.v89i3.87602\u003c/li\u003e\n\u003cli\u003eKouakou P., K. (2013). Am\u0026eacute;lioration de la pr\u0026eacute;vision des rendements du mil (Pennisetum glaucum (L.) R. Br.) au S\u0026eacute;n\u0026eacute;gal par l\u0026rsquo;utilisation de mod\u0026egrave;les de culture : prise en compte de la sensibilit\u0026eacute; \u0026agrave; la photop\u0026eacute;riode des vari\u0026eacute;t\u0026eacute;s et de la fertilit\u0026eacute; dans les parcelles d\u0026rsquo;agriculteurs. Universit\u0026eacute; Cheikh Anta Diop de Dakar, Th\u0026egrave;se de doctorat. \u003c/li\u003e\n\u003cli\u003eKumar, S.D. Attri, K.K. Singh (2021). 
Comparison of Lasso and stepwise regression technique for wheat yield prediction. Journal of Agrometeorology, 21, 188–192. https://doi.org/10.54386/jam.v21i2.231
Sultan B., Roudier P., Traoré S. (2015). Chapitre 10. Les impacts du changement climatique sur les rendements agricoles en Afrique de l’Ouest. In: Les sociétés rurales face aux changements climatiques et environnementaux en Afrique de l’Ouest (eds Sultan B, Lalou R, Amadou Sanni M, Oumarou A, Soumaré MA), pp. 209–225. IRD Éditions.
Torquebiau E. (2015). Changement climatique et agricultures du monde. Collection Agricultures et défis du monde, Cirad-AFD. Editions Quae, 328 pages.
Tibshirani R. (1996). Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58, 267–288. https://www.jstor.org/stable/2346178
Utazirubanda, J. C., M. León, T., & Ngom, P. (2021). Variable selection with Group LASSO approach: Application to Cox regression with frailty model. Communications in Statistics-Simulation and Computation, 50(3), 881–901. https://doi.org/10.1080/03610918.2019.1571605
Reisi-gahrouei, O., Homayouni, S., Mcnairn, H., & Hosseini, M. (2019). Crop biomass estimation using multi regression analysis and neural networks from multitemporal L-band polarimetric synthetic aperture radar data. 1161. https://doi.org/10.1080/01431161.2019.1594436
Mirani, A., Memon, M. S., Chohan, R., Wagan, A. A., & Qabulio, M. (2021). Machine learning in agriculture: A review. LUME, 10, 5. https://doi.org/10.3390/s18082674

Keywords: Prediction, Machine Learning Models, Agricultural yields, Climate change
Journal: Discover Agriculture (submitted 2025-03-20; revision requested 2025-05-19). Posted April 24th, 2025. License: CC BY 4.0.

