Estimation of the satellite-derived Leaf Area Index of spring wheat using machine learning approaches

Research Article (Research Square)

Pratibha Prakash, Swadhina Koley, Soora Naresh Kumar, Ramesh Chand Harit, and 2 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4685508/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

The study focuses on the estimation of Leaf Area Index (LAI) for smallholder farms of less than 1 acre in semi-arid regions, particularly in Bundelkhand, India. Accurate LAI estimation is crucial for optimizing crop management practices, enhancing yield predictions, and improving the sustainability of agricultural operations.
This study evaluates the efficiency of different machine learning algorithms in deriving LAI from Sentinel-2 and Landsat-8 data, with a focus on spring wheat across two growing seasons (2020–2021 and 2021–2022) in six villages in the Bundelkhand region of India. Three machine learning approaches, Random Forest (RF), Support Vector Machine (SVM), and XGBoost, were employed for LAI estimation. Validation against ground-truth LAI measurements was carried out using Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Pearson's correlation coefficient (R), and Multiplicative Bias (MBias). Results indicate that RF and SVM with Radial Basis Function (SVM-RBF) achieved the highest accuracy for both Sentinel-2 and Landsat-8 data. For Sentinel-2, RF and SVM-RBF both achieved an R-value of 0.94, with RMSE of 0.40 and MAE of 0.29 and 0.30, respectively. RF showed a slight overestimation (MBias = 1.02), while SVM-RBF had a perfect MBias of 1.00. XGBoost also performed well (R = 0.94), though with slightly higher RMSE (0.43) and MAE (0.33), and an MBias of 0.88, indicating slight underestimation. SVM linear had lower performance metrics (R = 0.84, RMSE = 0.62, MAE = 0.48, MBias = 1.02). For Landsat-8, RF and SVM-RBF also showed strong performance (R = 0.94), with RF achieving RMSE of 0.38 and MAE of 0.28, and SVM-RBF achieving the lowest RMSE of 0.37 and MAE of 0.29. Both had near-perfect MBias values (RF = 1.00, SVM-RBF = 0.99). XGBoost displayed a high R-value (0.93) but higher error metrics (RMSE = 0.40, MAE = 0.30, MBias = 1.01). SVM linear underperformed (R = 0.78, RMSE = 0.69, MAE = 0.53, MBias = 0.98). Overall, RF and SVM-RBF consistently outperformed SVM linear and XGBoost across both satellite datasets.

Keywords: Landsat-8 OLI, Sentinel-2 MSI, Leaf Area Index, Spring wheat, Machine learning

Introduction

Continuous crop monitoring is essential for making informed crop management and yield optimization decisions.
Leaf Area Index (LAI) is a crucial biophysical variable for crop monitoring: by continuously monitoring LAI, farmers and researchers can track crop growth and health, detect stress, and make informed decisions about crop management practices. The Global Climate Observing System lists LAI as an Essential Climate Variable and a key variable for models studying vegetation-atmosphere interactions (Baret et al., 2013; GCOS, 2024). Information-based management can help to improve crop yield and quality, lower management costs, and increase the sustainability of agricultural production. Remote sensing data are well suited to estimating LAI because they are available at repeated time intervals and cover large areas on the ground (Filgueiras et al., 2019). Estimating biophysical variables is a crucial aspect of remote sensing applications, as it allows non-destructive estimation of these variables across extensive areas (Mourad et al., 2020). LAI is a dimensionless canopy structure parameter, defined as the one-sided leaf area per unit ground area, and is a key biophysical variable in agricultural and environmental studies (Jonckheere et al., 2004). It provides information on the amount of light intercepted by the crop canopy (Ganguly et al., 2012; Cui et al., 2018). LAI can be measured most accurately using direct methods, i.e., destructive sampling of leaves and measurement with field instruments. However, these measurement methods have their limitations, as they are labour-intensive, costly, and time-consuming (Raj et al., 2021). Many studies have estimated LAI using satellite remote sensing data with reasonable accuracies (Tripathi et al., 2013; Djamai et al., 2018; Xie et al., 2019; Mourad et al., 2020; Filipponi, 2021; Kang et al., 2021; Sun et al., 2021).
Estimated LAI is an important biophysical variable for yield estimation and can be incorporated into crop simulation models using data assimilation methods (Dente et al., 2008; Huang et al., 2019). Estimation of LAI from optical remote sensing data can be categorized into two methods: (1) empirical relationships between satellite-derived vegetation indices and ground-measured LAI and (2) physical model-based inversion. Empirical methods are simpler and computationally lighter than model-based approaches and provide an acceptable level of accuracy in LAI estimation (Atzberger, 2004; Cui et al., 2018; Pasqualotto et al., 2019). One drawback of empirical methods is their limited applicability beyond local scales, as the established relationships are specific to particular locations. Additionally, this approach necessitates multiple calibrations with ground-based observational data (Sun et al., 2022). The physical model-based approach uses a canopy radiative transfer model (RTM), and the most widely preferred RTM is PROSAIL (Jacquemoud et al., 2009). Based on interactions between radiation, canopy components, and the soil surface, RTM inversion is carried out using reflectance and auxiliary variables. The RTM inversion is achieved using a Look-up Table (LUT) or machine learning approaches. This method has shown strong potential for estimating biophysical variables (Verrelst et al., 2015; Xie et al., 2019). The physical model-based inversion method is employed to generate global-scale LAI products from Moderate Resolution Imaging Spectroradiometer (MODIS) data, albeit at coarse spatial and temporal resolution. The MODIS sensors, aboard the Terra (launched in 1999) and Aqua (launched in 2002) satellites, provide NDVI at a temporal resolution of 16 days and a spatial resolution of 250 m.
MODIS data are widely used in vegetation dynamics and monitoring studies at global and regional scales; for example, the LAI of mountain grassland has been estimated by radiative transfer model inversion with an RMSE of 1.62 m²/m² (e.g., Beck et al., 2006; Ren et al., 2008; Mkhabela et al., 2011; Pasolli et al., 2015; Dubey et al., 2018; Prasad et al., 2021). However, these products are of limited use in areas where agricultural land holdings are very small. Earth observation satellites such as Landsat and Sentinel, which have spatial resolutions in the range of 10–60 m, can be used in areas of small land holdings. The Landsat program of the United States Geological Survey (USGS) was launched in 1972 and provides time series of satellite images at a spatial resolution of 30 m and a revisit period of 16 days. By employing statistical and radiative transfer model (RTM) inversion techniques, Landsat-8 imagery is able to provide precise estimates of specific leaf area (SLA) at both regional and global scales (Ganguly et al., 2012; Ali et al., 2017). The model inversion approach using multitemporal optical data from Landsat effectively derives LAI for various crop types, showing seasonal variations consistent with crop phenological stages (Gonzalez-Sanpedro et al., 2008). An improvement in spatial and temporal resolution is provided by the European Space Agency (ESA) Sentinel satellites, launched in 2015. A constellation of two satellites, Sentinel-2A and Sentinel-2B, provides imagery at 10 m to 60 m spatial resolution with a combined revisit time of 5 days. Data from these sensors have been used in different studies for vegetation analysis and for estimating crop biophysical parameters such as leaf area index and canopy chlorophyll content (Zheng et al., 2015; Onojeghuo et al., 2018; Kowalski et al., 2020; Nihar et al., 2022).
The presence of red-edge bands makes Sentinel-2 data of particular interest for LAI retrieval from winter wheat (Xie et al., 2019). Sentinel-2 Band 8A (narrow near-infrared) is more accurate for LAI estimation in cotton, tomato, and wheat, while Band 9 (water vapour) shows a high correlation with LAI, facilitating more accurate agricultural monitoring than traditional vegetation indices. Traditional inversion techniques minimize a cost function, which requires excessive computation time to achieve high retrieval accuracy (Kimes et al., 2009; Wang et al., 2017). Machine learning algorithms for LAI retrieval can improve prediction accuracies and spatial consistencies (Houborg et al., 2018).

Given the potential application of satellite-estimated LAI for crop management, this study aims to validate satellite-estimated LAI for spring wheat. It proposes a framework for retrieving LAI from Landsat-8 and Sentinel-2 satellite data using three machine learning methods, i.e., Random Forest (RF), Support Vector Machine (SVM) and XGBoost, and compares the field-observed LAI with the satellite-derived estimates.

Materials and method

Study area

This study was conducted for spring wheat (Triticum aestivum L.) in farmers' fields in six villages situated in the Jhansi district of Uttar Pradesh and the Niwari district of Madhya Pradesh, India (Fig. 1). The study site lies in the semi-arid, drought-prone and water-scarce Bundelkhand region of Central India. Spring wheat is a major crop grown in the winter season in both the Jhansi and Niwari districts. These two districts are located between 78° 15' E and 79° 24' E longitude and between 25° 15' N and 25° 45' N latitude, at a mean elevation of 285 m. Field observations of LAI were taken over two years (2020–2021 and 2021–2022) during three crop growth stages.
Dataset and Methodology

Ground observations

Ground-observed LAI data were collected from farmers' fields in six villages: Pathari, Kothkhera, Nayagaon, and Durgapura, situated in the Jhansi district of Uttar Pradesh, and Mathurapura and Ramnagar, situated in the Niwari district of Madhya Pradesh. LAI data were collected in 175 farmers' fields in year 1 (2020–2021) and in 131 farmers' fields in year 2 (2021–2022), at three stages of crop growth (Table 1). Farms in the Jhansi and Niwari districts are, on average, small to semi-medium land holdings. The lower number of fields in the second year is due to 44 farmers switching from wheat to green peas. LAI was assessed using the LICOR 2000 plant canopy analyzer, a portable device employing indirect and non-contact estimation techniques (Asner et al., 2003). The calculation of LAI involves radiation readings obtained via a fish-eye optical sensor. Canopy light interception is gauged at five angles, both above and below the canopy, and a radiative transfer model is employed to determine LAI (Chaurasia et al., 2011).

Satellite data acquisition and processing

Imagery from the Sentinel-2 and Landsat-8 satellites was downloaded for the crop growth periods (Table 1). Landsat-8 data, provided by the USGS, have a spatial resolution of 30 m and a temporal resolution of 16 days. Sentinel-2 data, provided by the ESA, have a finer spatial resolution (10 m) and temporal resolution (5 days). The LAI estimation was carried out using all the bands except for the thermal band present in Landsat. Additional bands that give the mean sun-sensor geometry, i.e., Solar Zenith Angle (SZA) and View Zenith Angle (VZA), were also used. The absolute value of the cosine of each of these sun-sensor geometry angles was extracted and used for model building. The overall methodology for LAI estimation using Sentinel-2 and Landsat-8 data is represented in Fig. 2.
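The sun-sensor geometry predictors described above can be sketched in a few lines. This is only an illustration, assuming the zenith angles are supplied in degrees; the function name `angle_features` and the keys `cos_sza` and `cos_vza` are hypothetical and do not come from the study.

```python
import math

def angle_features(sza_deg, vza_deg):
    """Return the absolute cosines of the solar and view zenith angles.

    Assumption: both angles are given in degrees.
    """
    return {
        "cos_sza": abs(math.cos(math.radians(sza_deg))),
        "cos_vza": abs(math.cos(math.radians(vza_deg))),
    }

# Example: sun 60 degrees from zenith, sensor looking straight down (nadir)
features = angle_features(sza_deg=60.0, vza_deg=0.0)
print(features)
```

Values computed this way would simply be appended to the reflectance bands as two extra predictor columns.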
Table 1 Dates of recording LAI in farmers' fields and dates of acquisition of Sentinel-2 and Landsat-8 imagery for LAI estimation

| Crop year | Stage | Date of ground observation | Sentinel-2 acquisition | Landsat-8 acquisition |
| --- | --- | --- | --- | --- |
| Year 1 | Vegetative | - | - | - |
| Year 1 | Anthesis | 23 to 25-Jan-2021 | 21-Jan-2021 | 20-Jan-2021 |
| Year 1 | Dough | 8 to 10-Mar-2021 | 7-Mar-2021 | 9-Mar-2021 |
| Year 2 | Vegetative | 22 to 25-Dec-2021 | 22-Dec-2021 | 22-Dec-2021 |
| Year 2 | Anthesis | 23 to 26-Feb-2022 | 22-Feb-2022 | 24-Feb-2022 |
| Year 2 | Dough | 21 to 26-Mar-2022 | 22-Mar-2022 | 28-Mar-2022 |

Machine learning approach

Machine learning algorithms comprise a collection of methods designed to autonomously discern patterns within data and use these patterns for regression tasks. They represent a progression beyond linear regression models and can operate effectively on intricate, non-linear datasets. In regression, the objective is to predict a value from input data (Belgiu et al., 2016; Cooner et al., 2016). These algorithms are trained on a dataset and then use the acquired knowledge to make predictions on new, unseen data (Mahesh, 2020).

i) Random Forest (RF)

The RF algorithm is a prevalent choice for regression due to its robustness, speed, and accuracy (Mudi et al., 2022; Goyal et al., 2022). It combines multiple decision trees, each trained on a subset of the data (Sun et al., 2022), and aggregates their outputs, by averaging for regression or by majority voting for classification (Breiman, 2001). Growing many trees from bootstrapped samples diminishes overfitting and enhances model accuracy, and the out-of-bag error gives an estimate of generalisation performance (Maxwell et al., 2018): individual trees are trained on in-bag samples, while the out-of-bag samples are used to compute this error.
The final prediction is made by aggregating the predictions of all trees, by averaging for regression or by majority or weighted voting for classification. Compared with individual decision trees, RF has several advantages, including improved accuracy, robustness to outliers, the ability to handle high-dimensional data and large datasets, and reduced overfitting (Rodriguez-Galiano et al., 2012).

ii) Support Vector Machine (SVM)

SVM, introduced by Vapnik in 1979, is a non-parametric method that has been extensively employed in numerous studies (Cervantes et al., 2020). Using one of several kernels, SVM generates an optimal decision boundary, also known as a hyperplane. Various kernel alternatives are available to fit the hyperplane (Kavzoglu et al., 2009). The radial basis function (RBF) and polynomial kernels are two frequently used examples, and they have been reported to improve classification accuracy in remote sensing imagery (Steinwart et al., 2006; Chakraborty et al., 2016). Two important parameters must be set when implementing the SVM algorithm: cost (C) and gamma (γ). The C parameter controls the complexity of the decision boundary: the greater the C value, the more complex the boundary. The C value therefore needs to be chosen carefully, as a high C value can result in overfitting of the model. The γ parameter determines the shape of the hyperplane (Ghosh et al., 2014).

iii) Extreme Gradient Boosting (XGBoost)

Extreme Gradient Boosting (XGBoost), created by Chen and Guestrin (2016), is also a popular and powerful machine learning algorithm for regression problems. The algorithm iteratively constructs a series of decision trees. After each iteration, XGBoost modifies the prediction according to the loss function, which measures the difference between the predicted and actual values.
By adjusting the parameters of the decision trees, XGBoost attempts to minimise the loss function. This process is repeated until the loss function stops decreasing or the specified number of iterations is reached. In this way, XGBoost starts from a model with a significant bias error and iteratively increases prediction accuracy by reducing the loss.

Variable importance and Hyperparameter tuning

In machine learning, variable importance quantifies how much each predictor variable contributes to the model relative to the others. It assesses the relationship of predictors with the target variable and how these predictors affect the model outcome (Chen et al., 2020). Hyperparameters are settings of a machine learning algorithm that must be chosen (tuned) before the final model is trained, rather than being learned from the data. Each ML algorithm has different hyperparameters to tune. In the current study, the hyperparameters of each ML algorithm were tuned before the final model was used for prediction. The Random Forest was tuned for the ntrees and mtry parameters: ntrees is the number of trees to be grown, whereas mtry is the number of variables considered at each split. In the case of SVM, the cost (C) parameter was tuned for both the linear and RBF kernels, and gamma (γ) was tuned only for the RBF kernel. XGBoost was tuned for the number of iterations (nrounds), the maximum depth of the tree (max_depth), the learning rate (eta) and the minimum loss reduction (gamma). The nrounds parameter determines the number of boosting iterations, whereas max_depth is the maximum depth of each tree. The eta parameter controls the learning rate, and gamma is the minimum loss reduction needed to make a further partition on a leaf node of the tree (Kavzoglu et al., 2022).
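The roles of nrounds and eta can be illustrated with a toy gradient-boosting regressor. The sketch below is not XGBoost and is not the study's code: it uses squared-error loss, trees of depth 1 (decision stumps), a single predictor, no regularisation and no gamma threshold, and it runs on synthetic data. All function and variable names are hypothetical.

```python
import math
import random

def fit_stump(x, residuals):
    """Find the threshold on x whose two leaf means best fit the residuals."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, left_mean, right_mean)
    return best[1:]

def boost(x, y, nrounds, eta):
    """Fit nrounds stumps, each to the residuals left by the previous ones."""
    base = sum(y) / len(y)          # initial, heavily biased prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(nrounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        thr, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((thr, left_mean, right_mean))
        # eta shrinks each correction (learning rate)
        pred = [pi + eta * (left_mean if xi <= thr else right_mean)
                for xi, pi in zip(x, pred)]
    return base, stumps

def predict(model, x, eta):
    base, stumps = model
    return [base + eta * sum(lm if xi <= thr else rm for thr, lm, rm in stumps)
            for xi in x]

def rmse(y, p):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))

# Synthetic data only (NOT the study's measurements)
rng = random.Random(0)
x = [rng.random() for _ in range(100)]
y = [6 * xi ** 2 + rng.gauss(0, 0.3) for xi in x]
x_train, y_train, x_test, y_test = x[:70], y[:70], x[70:], y[70:]  # 70:30 split

# Small grid over two of the hyperparameters named in the text
results = {}
for nrounds in (10, 100):
    for eta in (0.02, 0.3):
        model = boost(x_train, y_train, nrounds, eta)
        results[(nrounds, eta)] = (
            rmse(y_train, predict(model, x_train, eta)),
            rmse(y_test, predict(model, x_test, eta)),
        )

for key, (train_err, test_err) in sorted(results.items()):
    print(key, round(train_err, 3), round(test_err, 3))
```

With a small eta, many more rounds are needed before the training error falls, which is why the two parameters are usually tuned together. In practice the grid would be scored by cross-validation on the training data rather than on the held-out test split, so that the test set remains an independent check.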
LAI estimation from satellite imagery

Machine learning based modelling was carried out for LAI prediction using all available bands of Sentinel-2 and Landsat-8 together with the additional SZA and VZA bands. The hyperparameters of the models were tuned, and the final models were built using the best-performing hyperparameter values. The dataset was split in a 70:30 ratio, where 70% of the observed LAI values were used to train the model and 30% were used to test it. Model accuracy was evaluated based on the Pearson correlation coefficient (R), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Multiplicative Bias (MBias).

Statistical evaluation and validation of satellite-estimated LAI

Statistical properties of the ground-observed LAI values, namely the mean (Eq. 1), standard deviation (SD) (Eq. 2), and coefficient of variation (CV) (Eq. 3), were compared with those of the satellite-estimated LAI. The ground-observed values are represented by \(X_{Observed}\) and the satellite-derived values by \(X_{estimated}\); in Eqs. 1 to 3, \(X_i\) stands for either series and \(N\) is the number of values.

$$Mean = \frac{\sum_{i=1}^{N} X_{i}}{N} \tag{1}$$

$$SD = \sqrt{\frac{\sum_{i=1}^{N} \left(X_{i} - Mean\right)^{2}}{N}} \tag{2}$$

$$CV = \frac{SD}{Mean} \tag{3}$$

The validation step was performed using Pearson's correlation coefficient (R), Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and multiplicative bias (MBias) between \(X_{Observed}\) and \(X_{estimated}\), where \(n\) is the number of observed-estimated pairs. R (Eq. 4) is a commonly used statistical metric that gives the agreement between observed and estimated data, with a score of +1 indicating perfect positive agreement and a score of -1 indicating perfect inverse agreement.

$$R = \frac{n \sum X_{Observed} X_{estimated} - \left(\sum X_{Observed}\right)\left(\sum X_{estimated}\right)}{\sqrt{\left[n \sum X_{Observed}^{2} - \left(\sum X_{Observed}\right)^{2}\right]\left[n \sum X_{estimated}^{2} - \left(\sum X_{estimated}\right)^{2}\right]}} \tag{4}$$

RMSE (Eq. 5) and MAE (Eq. 6) are the most widely used statistical metrics for measuring model performance. The magnitude of deviation between observed and estimated data is given by RMSE and MAE, with a perfect score of 0 indicating no deviation.

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(X_{Observed,i} - X_{estimated,i}\right)^{2}} \tag{5}$$

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left|X_{Observed,i} - X_{estimated,i}\right| \tag{6}$$

The Multiplicative Bias (Eq. 7) quantifies the ratio between estimated and observed values. An ideal estimation yields a value of 1, while underestimation produces values below 1 and overestimation results in values exceeding 1 (Moazami et al., 2013).

$$MBias = \frac{\sum_{i=1}^{n} X_{estimated,i}}{\sum_{i=1}^{n} X_{Observed,i}} \tag{7}$$

Results and discussion

Parameter tuning and variable importance

The machine learning models were tuned over different hyperparameters to select the best values for Sentinel-2 and Landsat-8 data (Fig. 3). For RF, the mtry and ntrees parameters were tuned; the final mtry values were 10 and 6, and the final ntrees values were 500 and 1500, for Sentinel-2 and Landsat-8, respectively. The SVM-RBF algorithm was tuned for the sigma (the RBF kernel parameter, denoted γ in the Methods) and cost parameters; the final sigma value was 0.03125 for both sensors, with cost values of 64 and 32 for Sentinel-2 and Landsat-8, respectively. For the SVM-linear algorithm, the cost parameter was tuned, with the best values being 22.62742 and 181.0193 for Sentinel-2 and Landsat-8, respectively. For XGBoost, nrounds, max_depth, eta, and gamma were tuned. The values finally selected were 100 for nrounds and 5 for max_depth for both sensors, with eta values of 0.02 and 0.04 and gamma values of 0.01 and 0.09 for Sentinel-2 and Landsat-8, respectively.

Optimum band selection is very important for LAI estimation (He et al., 2020).
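As an aside on the statistics defined in Eqs. 1 to 7, the short sketch below computes them with the Python standard library. It assumes the population form of the standard deviation (division by N) and uses made-up LAI values purely for illustration; the function names are hypothetical and this is not the study's code.

```python
import math

def summary(values):
    """Mean, standard deviation (population form, an assumption) and CV (Eqs. 1-3)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean, sd, sd / mean

def validation_metrics(observed, estimated):
    """Pearson R, RMSE, MAE and multiplicative bias (Eqs. 4-7)."""
    n = len(observed)
    sx, sy = sum(observed), sum(estimated)
    sxy = sum(o * e for o, e in zip(observed, estimated))
    sxx = sum(o * o for o in observed)
    syy = sum(e * e for e in estimated)
    r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    rmse = math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)
    mae = sum(abs(o - e) for o, e in zip(observed, estimated)) / n
    mbias = sy / sx  # > 1 means overestimation, < 1 underestimation
    return {"R": r, "RMSE": rmse, "MAE": mae, "MBias": mbias}

# Made-up LAI values, for illustration only
observed = [1.0, 2.0, 3.0, 4.0]
estimated = [1.1, 1.9, 3.2, 3.8]

metrics = validation_metrics(observed, estimated)
print(metrics)
print(summary(observed))
```

In this toy case the over- and underestimates cancel, so MBias is 1 even though RMSE and MAE are non-zero, which is why the bias is reported alongside the error metrics rather than instead of them.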
Hence, the variable importance, which provides information about the contribution of each predictor variable to the model, was calculated (Fig. 4). In the case of Sentinel-2 (Fig. 4a), the SZA band, which represents the solar zenith angle, has the highest importance. After SZA, Bands 6 and 7 of the Sentinel-2 MSI, which measure reflectance in the red-edge region of the spectrum, were the most important. Red-edge bands in Sentinel-2 are strongly linked to LAI and can be used for retrieving LAI with an RMSE of 0.6 for various crop types (Delegido et al., 2011; He et al., 2020). These bands are commonly considered important for LAI estimation because they provide information about the chlorophyll content and health of the vegetation (Xu et al., 2019). Apart from Bands 6 and 7, Band 8, the near-infrared (NIR) band, was also important in predicting LAI. The NIR band is significant in vegetation studies because of its high reflectance, attributed to the strong scattering by individual leaves and entire plants in this spectral region. In the case of Landsat-8 (Fig. 4b), bands B6 and B7, which are SWIR bands, showed high importance. The NIR band showed lower importance for Landsat-8 than for Sentinel-2. The SWIR is sensitive to plant water content, and LAI is sensitive to the SWIR region, mostly in the 1000 nm to 1400 nm range (Jacquemoud et al., 2009). The importance of SWIR in the prediction of LAI has been demonstrated by various studies (e.g., Srinet et al., 2019; Abebe et al., 2022). After the SWIR bands, the green and red spectral regions were also found to be important in LAI estimation. The red and green spectral regions mostly indicate the leaf development pattern of the crops, as reflectance in these regions mostly depends on the leaf growth process (Motohka et al., 2010).
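The text does not state how the importance scores in Fig. 4 were computed. One common, model-agnostic way to obtain such a ranking is permutation importance, sketched below as a swapped-in illustration: each predictor column is shuffled in turn and the resulting increase in RMSE is recorded. The model, the feature names and the data are all hypothetical and synthetic.

```python
import math
import random

def rmse(y, p):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))

def permutation_importance(predict, rows, y, feature_names, rng):
    """Increase in RMSE when one feature column is shuffled, per feature."""
    baseline = rmse(y, [predict(r) for r in rows])
    importance = {}
    for name in feature_names:
        shuffled = [r[name] for r in rows]
        rng.shuffle(shuffled)
        # Copy each row, replacing only the shuffled feature
        permuted = [dict(r, **{name: v}) for r, v in zip(rows, shuffled)]
        importance[name] = rmse(y, [predict(r) for r in permuted]) - baseline
    return importance

# Hypothetical "fitted" model standing in for RF, SVM or XGBoost
def toy_model(row):
    return 5.0 * row["red_edge"] + 1.0 * row["green"]  # ignores "blue"

rng = random.Random(42)
names = ["red_edge", "green", "blue"]
rows = [{n: rng.random() for n in names} for _ in range(200)]
y = [toy_model(r) for r in rows]  # synthetic target, not measured LAI

scores = permutation_importance(toy_model, rows, y, names, rng)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(name, round(score, 3))
```

A predictor the model never uses gets a score of zero, while predictors the model relies on heavily produce a large error increase when shuffled.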
LAI estimation using satellite data and machine learning algorithm and comparison with ground observed data The evaluation of Leaf Area Index (LAI) estimation using Sentinel-2 and Landsat-8 satellite data, leveraging different machine learning algorithms, reveals several significant findings. These results, summarized in Table 2 , indicate distinct performance patterns across the algorithms: Random Forest (RF), Support Vector Machine with Radial Basis Function (SVM-RBF), Support Vector Machine with a linear kernel (SVM linear), and XGBoost. The LAI estimated from the Sentinel data using the Random Forest (RF) and SVM-RBF both achieved a high R-value of 0.94, suggesting a robust correlation between the field-observed and satellite-estimated LAI. They also recorded an RMSE of 0.40 and a close MAE (RF: 0.29, SVM-RBF: 0.30). The MBias values for RF and SVM-RBF were 1.02 and 1.00, respectively, indicating minimal overestimation by RF and perfect estimation by SVM-RBF. XGBoost also demonstrated a high R-value of 0.94, though slightly higher RMSE (0.43) and MAE (0.33) compared to RF and SVM-RBF. Its MBias was 0.88, showing a slight underestimation. SVM linear had a noticeably lower R-value of 0.84, coupled with higher RMSE (0.62) and MAE (0.48), indicating less accuracy. Its MBias was 1.02, indicating a slight overestimation. The LAI estimated from the Landsat 8 data using RF and SVM-RBF both achieved an R-value of 0.94, reflecting a strong relationship between observed and predicted LAI. The RMSE for RF was 0.38 and MAE 0.28, while SVM-RBF recorded the lowest RMSE of 0.37 and an MAE of 0.29. Their MBias values were 1.00 for RF and 0.99 for SVM-RBF, indicating almost perfect estimation. XGBoost had an R-value of 0.93, with RMSE and MAE values of 0.40 and 0.30, respectively. Its MBias was 1.01, reflecting a slight overestimation. SVM linear showed the lowest performance with an R-value of 0.78, higher RMSE (0.69), and MAE (0.53). 
The MBias for SVM linear was 0.98, indicating a slight underestimation. Overall, the results indicate that RF and SVM-RBF consistently outperformed SVM linear and XGBoost across both satellite datasets. RF and SVM-RBF not only showed higher R-values but also lower RMSE and MAE values, which are critical for accurate LAI estimation. The MBias metric further underscored the precision of SVM-RBF, particularly with its perfect or near-perfect values, whereas RF showed slight overestimation tendencies. XGBoost, while comparable in terms of R-value, displayed marginally higher error metrics and variable MBias. SVM linear consistently underperformed across all metrics. These findings align with existing literature, which also recognizes RF and SVM (especially RBF kernel) as highly effective for vegetation parameter estimation, including LAI, in various ecosystems such as grasslands and forests (Omer et al., 2016 ; Srinet et al., 2019 ; Shen et al., 2022 ). The limitations drawn from the results are that the study relies on the availability of satellite data, which may not coincide perfectly with the ground observation dates. This mismatch can introduce discrepancies due to changes in crop conditions over time. The machine learning models were trained and validated using data from specific regions (Jhansi and Niwari). These models may not generalize well to different regions with different climatic, soil, and crop conditions without additional local calibration. The satellite data can be affected by atmospheric conditions such as cloud cover, haze, and aerosols, which can distort the reflectance values and impact the accuracy of LAI estimation. While atmospheric correction techniques are used, they may not fully eliminate these effects. The accuracy of the machine learning models depends on the quality and quantity of ground truth data. Limited or unevenly distributed ground truth data can reduce the robustness of the models. 
LAI is influenced by a multitude of factors, including plant species, growth stages, and environmental conditions. Capturing this complexity requires highly detailed and temporally frequent data, which might not always be feasible.

Table 2 Statistical metrics derived for predicted LAI for Sentinel-2 and Landsat-8 using different machine learning algorithms

| Satellite | ML algorithm | R | RMSE | MAE | MBias |
| --- | --- | --- | --- | --- | --- |
| Sentinel-2 | RF | 0.94 | 0.40 | 0.29 | 1.02 |
| Sentinel-2 | SVM-RBF | 0.94 | 0.40 | 0.30 | 1.00 |
| Sentinel-2 | SVM-Linear | 0.84 | 0.62 | 0.48 | 1.02 |
| Sentinel-2 | XGBoost | 0.94 | 0.43 | 0.33 | 0.88 |

Conclusion

The present study focuses on validating the Leaf Area Index (LAI) estimated from Sentinel-2 and Landsat-8 satellite data against observed LAI measurements collected from smallholder farmers' fields in the districts of Jhansi and Niwari. The LAI estimation employed three machine learning approaches, Random Forest (RF), Support Vector Machine (SVM), and XGBoost, using all available spectral bands from the satellites. Additionally, the solar zenith angle (SZA) and view zenith angle (VZA) bands were integrated to enhance the accuracy of the LAI estimates. The key finding of the study is that the SVM and RF-based models showed a strong correlation with the observed LAI data and lower error rates than the other methods. Specifically, the SVM approach was highly effective for Sentinel-2 data, while the RF approach excelled with Landsat-8 data. The variable importance analysis identified the red-edge and near-infrared (NIR) bands in Sentinel-2 as crucial for accurate LAI estimation. The red-edge band is significant due to its sensitivity to chlorophyll content and plant health, while the NIR band is important because of the high scattering of light by plant structures, which is indicative of biomass and leaf area. The Shortwave Infrared (SWIR) bands in Landsat-8, along with the red and green bands, were found to be highly important.
The SWIR bands are sensitive to water content in plants, making them valuable for assessing plant health and stress. The red and green bands provide essential information on chlorophyll absorption and overall vegetation vigor. The inclusion of SZA and VZA bands helped to account for the variations in sunlight angles and sensor viewing angles, which can affect the reflectance values recorded by the satellites. This integration improved the robustness of the LAI estimates, making the models more reliable under varying observational conditions. This study demonstrates that machine learning approaches, particularly SVM for Sentinel-2 (R-value of 0.94, MBias of 1.00) and RF for Landsat-8 (R-value of 0.94, MBias of 1.00), can provide highly accurate LAI estimates. These models leverage the specific strengths of different spectral bands to capture detailed information about plant health, biomass, and moisture content. The findings have significant implications for smallholder farmers in regions similar to Jhansi and Niwari. Accurate LAI estimation can help farmers monitor crop health, optimize resource use (such as water and fertilizers), and improve crop yield predictions. By validating these models with ground-truth data, the study provides a robust framework for using satellite data and machine learning in agricultural management and decision-making processes. This approach can be scaled and adapted to other regions and crop types, offering a powerful tool for enhancing agricultural productivity and sustainability through precise and timely information on crop health and growth. However, addressing the limitations mentioned is essential for further improving the accuracy and applicability of these models.
Declarations

Conflict of interest
The authors declare that they have no known financial or personal conflicts of interest.

Funding information
This research did not receive any financial support from any external organization.

Author Contribution
P. P.: research conceptualization; data curation; primary analysis; writing the original draft. S. K.: partial analysis; writing and editing the manuscript. S. N. K.: research conceptualization; manuscript editing; overall supervision. R. C. H.: field data collection and supervision. J. K. G.: field data collection. R. K.: manuscript editing.

Acknowledgement
The first author gratefully acknowledges the Indian Council of Agricultural Research (ICAR) – Indian Agricultural Research Institute for providing the opportunity to carry out doctoral study. The authors also extend their gratitude to the National Aeronautics and Space Administration (NASA) and the European Space Agency (ESA) for generously providing the satellite data essential for this research.

Data availability
The Sentinel-2 multispectral data were accessed and analysed in the Google Earth Engine cloud computing platform using the link https://developers.google.com/earth-engine/datasets/catalog/sentinel-2, and the Landsat-8 data were accessed through https://developers.google.com/earth-engine/datasets/catalog/landsat-8.

References

Abebe, G., Tadesse, T. and Gessesse, B., 2022. Estimating Leaf Area Index and biomass of sugarcane based on Gaussian process regression using Landsat 8 and Sentinel 1A observations. International Journal of Image and Data Fusion, pp.1-31.
Asner, G.P., Scurlock, J.M. and Hicke, J.A., 2003. Global synthesis of leaf area index observations: implications for ecological and remote sensing studies. Global ecology and biogeography, 12(3), pp.191-205.
Atzberger, C., 2004. Object-based retrieval of biophysical canopy variables using artificial neural nets and radiative transfer models. Remote sensing of environment, 93(1-2), pp.53-67.
Baret, F., Hagolle, O., Geiger, B., Bicheron, P., Miras, B., Huc, M., Berthelot, B., Niño, F., Weiss, M., Samain, O. and Roujean, J.L., 2007. LAI, fAPAR and fCover CYCLOPES global products derived from VEGETATION: Part 1: Principles of the algorithm. Remote sensing of environment, 110(3), pp.275-286.
Baret, F., Weiss, M., Lacaze, R., Camacho, F., Makhmara, H., Pacholcyzk, P., & Smets, B. (2013). GEOV1: LAI and FAPAR essential climate variables and FCOVER global time series capitalizing over existing products. Part 1: Principles of development and production. Remote sensing of environment, 137, 299-309.
Beck, P.S., Atzberger, C., Høgda, K.A., Johansen, B. and Skidmore, A.K., 2006. Improved monitoring of vegetation dynamics at very high latitudes: A new method using MODIS NDVI. Remote sensing of Environment, 100(3), pp.321-334.
Belgiu, M. and Drăguţ, L., 2016. Random forest in remote sensing: A review of applications and future directions. ISPRS journal of photogrammetry and remote sensing, 114, pp.24-31.
Breiman, L., 2001. Random forests. Machine learning, 45, pp.5-32.
Chaurasia, S., Nigam, R., Bhattacharya, B.K., Sridhar, V.N., Mallick, K., Vyas, S.P., Patel, N.K., Mukherjee, J., Shekhar, C., Kumar, D. and Singh, P., 2011. Development of regional wheat VI-LAI models using Resourcesat-1 AWiFS data. Journal of Earth System Science, 120(6), p.1113.
Chen, R.C., Dewi, C., Huang, S.W. and Caraka, R.E., 2020. Selecting critical features for data classification based on machine learning methods. Journal of Big Data, 7(1), p.52.
Chen, T. and Guestrin, C., 2016, August. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794).
Cui, Z. and Kerekes, J.P., 2018. Potential of red edge spectral bands in future Landsat satellites on agroecosystem canopy green leaf area index retrieval. Remote Sensing, 10(9), p.1458.
Dente, L., Satalino, G., Mattia, F.
and Rinaldi, M., 2008. Assimilation of leaf area index derived from ASAR and MERIS data into CERES-Wheat model to map wheat yield. Remote sensing of Environment, 112(4), pp.1395-1407.
Djamai, N. and Fernandes, R., 2018. Comparison of SNAP-derived Sentinel-2A L2A product to ESA product over Europe. Remote Sensing, 10(6), p.926.
Dubey, S.K., Gavli, A.S., Yadav, S.K., Sehgal, S. and Ray, S.S., 2018. Remote sensing-based yield forecasting for sugarcane (Saccharum officinarum L.) crop in India. Journal of the Indian Society of Remote Sensing, 46, pp.1823-1833.
Filipponi, F., 2021. Comparison of LAI Estimates from High Resolution Satellite Observations Using Different Biophysical Processors. In Biology and Life Sciences Forum (Vol. 3, No. 1, p. 5). Multidisciplinary Digital Publishing Institute.
Ganguly, S., Nemani, R.R., Zhang, G., Hashimoto, H., Milesi, C., Michaelis, A., Wang, W., Votava, P., Samanta, A., Melton, F. and Dungan, J.L., 2012. Generating global leaf area index from Landsat: Algorithm formulation and demonstration. Remote Sensing of Environment, 122, pp.185-202.
GCOS ECV. (2024). Essential climate variables. GCOS. https://gcos.wmo.int/en/essential-climate-variables
Ghosh, A. and Joshi, P.K., 2014. A comparison of selected classification algorithms for mapping bamboo patches in lower Gangetic plains using very high resolution WorldView 2 imagery. International Journal of Applied Earth Observation and Geoinformation, 26, pp.298-311.
Gozdowski, D., Stępień, M., Panek, E., Varghese, J., Bodecka, E., Rozbicki, J. and Samborski, S., 2020. Comparison of winter wheat NDVI data derived from Landsat 8 and active optical sensor at field scale. Remote Sensing Applications: Society and Environment, 20, p.100409.
He, L., Ren, X., Wang, Y., Liu, B., Zhang, H., Liu, W., Feng, W. & Guo, T. (2020). Comparing methods for estimating leaf area index by multi-angular remote sensing in winter wheat. Scientific Reports, 10(1), 13943.
Huang, J., Gómez-Dans, J.L., Huang, H., Ma, H., Wu, Q., Lewis, P.E., Liang, S., Chen, Z., Xue, J.H., Wu, Y. and Zhao, F., 2019. Assimilation of remote sensing into crop growth models: Current status and perspectives. Agricultural and forest meteorology, 276, p.107609.
Jacquemoud, S., Verhoef, W., Baret, F., Bacour, C., Zarco-Tejada, P.J., Asner, G.P., François, C. and Ustin, S.L., 2009. PROSPECT+ SAIL models: A review of use for vegetation characterization. Remote sensing of environment, 113, pp.S56-S66.
Jonckheere, I., Fleck, S., Nackaerts, K., Muys, B., Coppin, P., Weiss, M. and Baret, F., 2004. Review of methods for in situ leaf area index determination: Part I. Theories, sensors and hemispherical photography. Agricultural and forest meteorology, 121(1-2), pp.19-35.
Kang, Y., Ozdogan, M., Gao, F., Anderson, M.C., White, W.A., Yang, Y., Yang, Y. and Erickson, T.A., 2021. A data-driven approach to estimate leaf area index for Landsat images over the contiguous US. Remote Sensing of Environment, 258, p.112383.
Kavzoglu, T. and Colkesen, I., 2009. A kernel functions analysis for support vector machines for land cover classification. International Journal of Applied Earth Observation and Geoinformation, 11(5), pp.352-359.
Kavzoglu, T. and Teke, A., 2022. Advanced hyperparameter optimization for improved spatial prediction of shallow landslides using extreme gradient boosting (XGBoost). Bulletin of Engineering Geology and the Environment, 81(5), p.201.
Kowalski, K., Senf, C., Hostert, P. and Pflugmacher, D., 2020. Characterizing spring phenology of temperate broadleaf forests using Landsat and Sentinel-2 time series. International Journal of Applied Earth Observation and Geoinformation, 92, p.102172.
Maxwell, A.E., Warner, T.A. and Fang, F., 2018. Implementation of machine-learning classification in remote sensing: An applied review. International Journal of Remote Sensing, 39(9), pp.2784-2817.
Mkhabela, M.S., Bullock, P., Raj, S., Wang, S.
and Yang, Y., 2011. Crop yield forecasting on the Canadian Prairies using MODIS NDVI data. Agricultural and Forest Meteorology, 151(3), pp.385-393.
Moazami, S., Golian, S., Kavianpour, M.R. and Hong, Y., 2013. Comparison of PERSIANN and V7 TRMM Multi-satellite Precipitation Analysis (TMPA) products with rain gauge data over Iran. International journal of remote sensing, 34(22), pp.8156-8171.
Motohka, T., Nasahara, K.N., Oguma, H. and Tsuchida, S., 2010. Applicability of green-red vegetation index for remote sensing of vegetation phenology. Remote Sensing, 2(10), pp.2369-2387.
Mourad, R., Jaafar, H., Anderson, M. and Gao, F., 2020. Assessment of leaf area index models using harmonized Landsat and Sentinel-2 surface reflectance data over a semi-arid irrigated landscape. Remote Sensing, 12(19), p.3121.
Mudi, S., Paramanik, S., Behera, M.D., Prakash, A.J., Deep, N.R., Kale, M.P., Kumar, S., Sharma, N., Pradhan, P., Chavan, M. and Roy, P.S., 2022. Moderate resolution LAI prediction using Sentinel-2 satellite data and indirect field measurements in Sikkim Himalaya. Environmental Monitoring and Assessment, 194(12), p.897.
Nihar, A., Patel, N.R., Pokhariyal, S. and Danodia, A., 2022. Sugarcane crop type discrimination and area mapping at field scale using Sentinel images and machine learning methods. Journal of the Indian Society of Remote Sensing, pp.1-9.
Onojeghuo, A.O., Blackburn, G.A., Wang, Q., Atkinson, P.M., Kindred, D. and Miao, Y., 2018. Mapping paddy rice fields by applying machine learning algorithms to multi-temporal Sentinel-1A and Landsat data. International journal of remote sensing, 39(4), pp.1042-1067.
Pasqualotto, N., Bolognesi, S.F., Belfiore, O.R., Delegido, J., D’Urso, G. and Moreno, J., 2019, October. Canopy chlorophyll content and LAI estimation from Sentinel-2: Vegetation indices and Sentinel-2 Level-2A automatic products comparison. In 2019 IEEE International Workshop on Metrology for Agriculture and Forestry (MetroAgriFor) (pp.
301-306). IEEE.
Prasad, N.R., Patel, N.R. and Danodia, A., 2021. Cotton Yield Estimation Using Phenological Metrics Derived from Long-Term MODIS Data. Journal of the Indian Society of Remote Sensing, 49, pp.2597-2610.
Raj, R., Walker, J. P., Pingale, R., Nandan, R., Naik, B., & Jagarlapudi, A. (2021). Leaf area index estimation using top-of-canopy airborne RGB images. International Journal of Applied Earth Observation and Geoinformation, 96, 102282.
Ren, J., Chen, Z., Zhou, Q. and Tang, H., 2008. Regional yield estimation for winter wheat with MODIS-NDVI data in Shandong, China. International Journal of Applied Earth Observation and Geoinformation, 10(4), pp.403-413.
Rodriguez-Galiano, V.F., Ghimire, B., Rogan, J., Chica-Olmo, M. and Rigol-Sanchez, J.P., 2012. An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS journal of photogrammetry and remote sensing, 67, pp.93-104.
Srinet, R., Nandy, S. and Patel, N.R., 2019. Estimating leaf area index and light extinction coefficient using Random Forest regression algorithm in a tropical moist deciduous forest, India. Ecological Informatics, 52, pp.94-102.
Sun, Y., Qin, Q., Ren, H. and Zhang, Y., 2021. Decameter cropland LAI/FPAR estimation from Sentinel-2 imagery using Google Earth Engine. IEEE Transactions on Geoscience and Remote Sensing, 60, pp.1-14.
Sun, Y., Qin, Q., Ren, H., Zhang, T. and Chen, S., 2019. Red-edge band vegetation indices for leaf area index estimation from Sentinel-2/MSI imagery. IEEE Transactions on Geoscience and Remote Sensing, 58(2), pp.826-840.
Trimble Navigation Limited (2012) GreenSeeker® Handheld Crop Sensor. Available at: https://agriculture.trimble.com/product/greenseeker-handheld-crop-sensor/.
Tripathi, R., Sahoo, R.N., Gupta, V.K., Sehgal, V.K. and Sahoo, P.M., 2013. Developing Vegetation Health Index from biophysical variables derived using MODIS satellite data in the Trans-Gangetic Plains of India.
Emirates Journal of Food and Agriculture, pp.376-384.
Verrelst, J., Rivera, J.P., Veroustraete, F., Muñoz-Marí, J., Clevers, J.G., Camps-Valls, G. and Moreno, J., 2015. Experimental Sentinel-2 LAI estimation using parametric, non-parametric and physical retrieval methods–A comparison. ISPRS Journal of Photogrammetry and Remote Sensing, 108, pp.260-272.
Xie, Q., Dash, J., Huete, A., Jiang, A., Yin, G., Ding, Y., Peng, D., Hall, C.C., Brown, L., Shi, Y. and Ye, H., 2019. Retrieval of crop biophysical parameters from Sentinel-2 remote sensing imagery. International Journal of Applied Earth Observation and Geoinformation, 80, pp.187-195.
Xu, N., Tian, J., Tian, Q., Xu, K. and Tang, S., 2019. Analysis of vegetation red edge with different illuminated/shaded canopy proportions and to construct normalized difference canopy shadow index. Remote Sensing, 11(10), p.1192.
Zheng, B., Myint, S.W., Thenkabail, P.S. and Aggarwal, R.M., 2015. A support vector machine to identify irrigated crop types using time-series Landsat NDVI data. International Journal of Applied Earth Observation and Geoinformation, 34, pp.103-112.
Ganguly, S., Nemani, R., Zhang, G., Hashimoto, H., Milesi, C., Michaelis, A., Wang, W., Votava, P., Samanta, A., Melton, F., Dungan, J., Vermote, E., Gao, F., Knyazikhin, Y., & Myneni, R. (2012). Generating global Leaf Area Index from Landsat: Algorithm formulation and demonstration. Remote Sensing of Environment, 122, 185-202. https://doi.org/10.1016/J.RSE.2011.10.032.
Ali, A., Darvishzadeh, R., & Skidmore, A. (2017). Retrieval of Specific Leaf Area From Landsat-8 Surface Reflectance Data Using Statistical and Physical Models. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 10, 3529-3536. https://doi.org/10.1109/JSTARS.2017.2690623.
Gonzalez-Sanpedro, M., Toan, T., Moreno, J., Kergoat, L., & Rubio, E. (2008). Seasonal variations of leaf area index of agricultural fields retrieved from Landsat data.
Remote Sensing of Environment, 112, 810-824. https://doi.org/10.1016/J.RSE.2007.06.018.
Kimes, D. S., Knyazikhin, Y., Privette, J. L., Abuelgasim, A. A., & Gao, F. (2000). Inversion methods for physically‐based models. Remote Sensing Reviews, 18(2-4), 381-439.
Wang, T., Xiao, Z., & Liu, Z. (2017). Performance Evaluation of Machine Learning Methods for Leaf Area Index Retrieval from Time-Series MODIS Reflectance Data. Sensors (Basel, Switzerland), 17. https://doi.org/10.3390/s17010081.
Houborg, R., & McCabe, M. (2018). A hybrid training approach for leaf area index estimation via Cubist and random forests machine-learning. ISPRS Journal of Photogrammetry and Remote Sensing, 135, 173-188. https://doi.org/10.1016/J.ISPRSJPRS.2017.10.004.
Cervantes, J., García, F., Rodríguez-Mazahua, L., & Chau, A. (2020). A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing, 408, 189-215. https://doi.org/10.1016/j.neucom.2019.10.118.
Vapnik, V. (1979). Estimation of dependences based on empirical data. Springer-Verlag.
Steinwart, I., Hush, D., & Scovel, C. (2006). An Explicit Description of the Reproducing Kernel Hilbert Spaces of Gaussian RBF Kernels. IEEE Transactions on Information Theory, 52, 4635-4643. https://doi.org/10.1109/TIT.2006.881713.
Chakraborty, D., Sarkar, A., & Maulik, U. (2016). A new isotropic locality improved kernel for pattern classifications in remote sensing imagery. Spatial Statistics, 17, 71-82. https://doi.org/10.1016/J.SPASTA.2016.04.003.
Delegido, J., Verrelst, J., Alonso, L., & Moreno, J. (2011). Evaluation of Sentinel-2 Red-Edge Bands for Empirical Estimation of Green LAI and Chlorophyll Content. Sensors (Basel, Switzerland), 11, 7063-7081. https://doi.org/10.3390/s110707063.
Shen, B., Ding, L., Ma, L., Li, Z., Pulatov, A., Kulenbekov, Z., Chen, J., Mambetova, S., Hou, L., Xu, D., Wang, X., & Xin, X. (2022).
Modeling the Leaf Area Index of Inner Mongolia Grassland Based on Machine Learning Regression Algorithms Incorporating Empirical Knowledge. Remote Sensing, 14, 4196. https://doi.org/10.3390/rs14174196.
Srinet, R., Nandy, S., & Patel, N. (2019). Estimating leaf area index and light extinction coefficient using Random Forest regression algorithm in a tropical moist deciduous forest, India. Ecological Informatics, 52, 94-102. https://doi.org/10.1016/J.ECOINF.2019.05.008.
Omer, G., Mutanga, O., Abdel-Rahman, E., & Adam, E. (2016). Empirical Prediction of Leaf Area Index (LAI) of Endangered Tree Species in Intact and Fragmented Indigenous Forests Ecosystems Using WorldView-2 Data and Two Robust Machine Learning Algorithms. Remote Sensing, 8, 324. https://doi.org/10.3390/rs8040324.
Siegmann, B., & Jarmer, T. (2015). Comparison of different regression models and validation techniques for the assessment of wheat leaf area index from hyperspectral data. International Journal of Remote Sensing, 36, 4519-4534. https://doi.org/10.1080/01431161.2015.1084438.

Additional Declarations
No competing interests reported.
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-4685508","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":337898199,"identity":"9b40df7d-6409-497c-9ed7-e1f5ef4fe2c9","order_by":0,"name":"Pratibha Prakash","email":"","orcid":"","institution":"ICAR – Indian Agricultural Research Institute","correspondingAuthor":false,"prefix":"","firstName":"Pratibha","middleName":"","lastName":"Prakash","suffix":""},{"id":337898200,"identity":"6a68fe61-378d-47b7-9c88-588d116ca088","order_by":1,"name":"Swadhina Koley","email":"","orcid":"","institution":"ICAR – Indian Agricultural Research Institute","correspondingAuthor":false,"prefix":"","firstName":"Swadhina","middleName":"","lastName":"Koley","suffix":""},{"id":337898201,"identity":"af4665ac-247d-4964-9a7a-95c34c347212","order_by":2,"name":"Soora Naresh Kumar","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABBklEQVRIiWNgGAWjYLACxoYEBIcfRCQUkKJFsgGkxYAULQYHwCRu1ebsZx8+/LkjLXFt+9nHHz5U3JM3Pr868cMDAwZ5frEDWLVY9qQbG/OeyUncdibdTHLGmWLDbTfebpYAOsxw5uwErFoMDqSxSTO2VSRuAzKYedsSGLfdOLsBpCXB4DYOLeefsf/8CdJy/hnzZ6AW+80zzm7+gVfLjTQ2Bt42oMNupDFIA7UkbuDv3YbflhvPmIEq04y33XjGBvRLQvKMG7zbLBIMJHD75Xwa48efbcmy286nMQNDLMG2v//s5ps/Kmzk+aWxa8ECJMAqJYhVDgL8B0hRPQpGwSgYBSMAAACp+2ZlVd6oIwAAAABJRU5ErkJggg==","orcid":"","institution":"ICAR – Indian Agricultural Research 
Institute","correspondingAuthor":true,"prefix":"","firstName":"Soora","middleName":"Naresh","lastName":"Kumar","suffix":""},{"id":337898202,"identity":"1e480af1-0480-4899-9055-589c676ead77","order_by":3,"name":"Ramesh Chand Harit","email":"","orcid":"","institution":"ICAR – Indian Agricultural Research Institute","correspondingAuthor":false,"prefix":"","firstName":"Ramesh","middleName":"Chand","lastName":"Harit","suffix":""},{"id":337898203,"identity":"c21183cf-37ee-4559-a9b5-bf42f634b26e","order_by":4,"name":"Jitender Kumar Gupta","email":"","orcid":"","institution":"ICAR – Indian Agricultural Research Institute","correspondingAuthor":false,"prefix":"","firstName":"Jitender","middleName":"Kumar","lastName":"Gupta","suffix":""},{"id":337898204,"identity":"64725833-ed4e-4d6c-8bf7-4eb7c5676146","order_by":5,"name":"Ravi Kumar","email":"","orcid":"","institution":"ICAR-National Bureau of Soil Survey \u0026 Landuse Planning","correspondingAuthor":false,"prefix":"","firstName":"Ravi","middleName":"","lastName":"Kumar","suffix":""}],"badges":[],"createdAt":"2024-07-04 09:40:41","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4685508/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4685508/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":62101432,"identity":"dbc958bd-316f-4798-8f07-950192344397","added_by":"auto","created_at":"2024-08-09 09:46:26","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":426309,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eStudy area map showing Jhansi and Niwari districts with village locations\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"Fig1.png","url":"https://assets-eu.researchsquare.com/files/rs-4685508/v1/c17cb4dc3251a101d521632a.png"},{"id":62101429,"identity":"1c8ec6c4-76c4-46e2-bf76-93f7f4798c68","added_by":"auto","created_at":"2024-08-09 
09:46:25","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":142707,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFlowchart illustrating for LAI and NDVI estimation from satellite imagery and validation from observed data\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"Fig2.png","url":"https://assets-eu.researchsquare.com/files/rs-4685508/v1/048c09e782dc2aad264a922f.png"},{"id":62101437,"identity":"5322923c-5a9e-439e-ae7b-ad69d3575268","added_by":"auto","created_at":"2024-08-09 09:46:27","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":417277,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ea: Parameter tuning for RF and SVM algorithms\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eb: Parameter tuning for XGBoost algorithm for Sentinel (left), and Landsat (right)\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"Fig3.png","url":"https://assets-eu.researchsquare.com/files/rs-4685508/v1/81728cf42c88802dc5588780.png"},{"id":62101428,"identity":"1f2e61d2-135d-41b0-96df-1ae00150b806","added_by":"auto","created_at":"2024-08-09 09:46:25","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":27726,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eVariable importance curve for a) Sentinel-2 b) Landsat-8\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"Fig4.png","url":"https://assets-eu.researchsquare.com/files/rs-4685508/v1/b9c1a5a6759a85451dbc24f0.png"},{"id":62101434,"identity":"e901d0e8-a325-4988-8551-0996212ca3d4","added_by":"auto","created_at":"2024-08-09 09:46:27","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":684583,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eScatter plot showing predicted and estimated LAI a) 
Sentinel-2 b) Landsat and different machine learning algorithms\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"Fig5.png","url":"https://assets-eu.researchsquare.com/files/rs-4685508/v1/207d2719db76b379fe05a571.png"},{"id":64461132,"identity":"da5d0956-588b-438c-bfdf-607c7721e8d8","added_by":"auto","created_at":"2024-09-13 12:47:25","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2150926,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4685508/v1/0a0dc089-9d64-40f3-9c53-efdc3fc2219c.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Estimation of the satellite-derived Leaf Area Index of spring wheat using machine learning approaches","fulltext":[{"header":"Introduction","content":"\u003cp\u003eContinuous crop monitoring is essential for making informed crop management and yield optimization decisions. Leaf Area Index (LAI) is a crucial biophysical variable used for crop monitoring, and farmers and researchers can track crop growth and health and detect stress to make informed decisions about crop management practices by continuously monitoring LAI. The Global Climate Observing System listed LAI as an Essential Climate Variable and a key variable for models studying vegetation-atmosphere interactions (Baret et al. \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2013\u003c/span\u003e, GCOS 2024). Information-based management can help to improve crop yield and quality, lower management costs, and increase the sustainability of agricultural production. Remote sensing data comes in handy in estimating LAI as these data are available at repetitive time scales covering large areas on the ground (Filgueiras et al., 2019). 
Estimating biophysical variables is a crucial aspect of remote sensing applications, as it allows for non-destructive estimation of these variables across extensive areas (Mourad et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). LAI is a dimensionless canopy structure parameter and is described as an area of one side leaf per unit ground area and is a key biophysical variable in agricultural and environmental studies (Jonckheere et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2004\u003c/span\u003e). It provides information on the amount of light intercepted by the crop canopy (Ganguly et al., \u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e2012\u003c/span\u003e; Cui et al., 2018). LAI can be measured most accurately using a direct method, which is destructive sampling of leaves and using field instruments. However, these measurement methods have their limitations as they are labour-intensive, non-economical and time-consuming (Raj et al., \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Many studies have estimated LAI using satellite remote sensing data with reasonable accuracies (Tripathi et al., \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e2013\u003c/span\u003e; Djamai et al., 2018; Xie et al., \u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Mourad et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Filipponi, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Kang et al., \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Sun et al., \u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). 
Estimated LAI is an important biophysical variable for yield estimation and can be assimilated in the crop simulation model using data assimilation methods (Dente et al., \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2008\u003c/span\u003e; Huang et al., \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2019\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eEstimation of LAI from optical remote sensing data can be categorized into two methods: (1) empirical relationships between satellite-derived LAI (vegetation index) and ground-measured LAI and (2) physical model-based inversion. Empirical methods are simple and computationally easier as compared to the model-based approaches and provide an acceptable level of accuracy in LAI estimation (Atzberger, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2004\u003c/span\u003e; Cui et al., 2018; Pasqualotto et al., \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). One drawback of empirical methods is their limited applicability to local scales, as the established relationships are specific to particular locations. Additionally, this approach necessitates multiple calibrations with ground-based observational data (Sun et al., 2022). The physical model-based approach uses the canopy radiative transfer model (RTM), and the most preferred RTM method is PROSAIL (Jacquemoud et al., \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2009\u003c/span\u003e). Based on interactions between radiation, canopy components, and soil surface, RTM inversion is carried out utilising reflectance and auxiliary variables. The RTM inversion is achieved using a Look-up Table (LUT) and machine learning approaches. This method has shown strong potential for biophysical variables estimation (Verrelst et al., \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Xie et al., \u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). 
The physical model-based inversion method is employed to generate global-scale Leaf Area Index products from Moderate Resolution Imaging Spectroradiometer (MODIS) data, albeit at a coarse spatial and temporal resolution.\u003c/p\u003e \u003cp\u003eMODIS satellites, launched in the years 1999 (terra) and 2002 (aqua), provide NDVI at a temporal resolution of 16 days and spatial resolution of 250m. The MODIS data are widely used in vegetation dynamics and monitoring studies at global and at regional levels scale such as the mountain grassland leaf area index is (estimated using the inversion radiative transfer model with decent RMSE 1.62 (m2/m2) (e.g., Beck et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2006\u003c/span\u003e; Ren et al., \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e2008\u003c/span\u003e; Mkhabela et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2011\u003c/span\u003e; Pasolli et al., 2015; Dubey et al., \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Prasad et al., \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). However, these products are of limited use in areas where agricultural land holding are very small. The earth observation satellites such as Landsat and Sentinel which have a spatial resolution in the range of 10\u0026ndash;60 m can be used in area of small land holdings. The Landsat program by the United States Geological Survey (USGS) was launched in 1972 and provides time series satellite images at spatial resolution of 30 m and a revisit period of 16 days. 
By employing statistical and radiative transfer model (RTM) inversion techniques, Landsat-8 imagery is able to provide precise estimates of specific leaf area (SLA) at both regional and global scales (Ganguly et al., \u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e2012\u003c/span\u003e, Ali et al., \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e2017\u003c/span\u003e) The LAI model inversion approach using multitemporal optical data from Landsat effectively derives leaf area LAI for various crop types, showing consistent seasonal variations with crop phenological stages (Gonzalez-Sanpedro et al., \u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e2008\u003c/span\u003e). An improvement in the spatial and temporal resolution is provided by the European Space Agency (ESA) Sentinel satellites, launched in 2015. A constellation of two satellites Sentinel 2A and Sentinel 2B provides imagery at 10m to 60 m spatial resolution at 5-day revisit time between the two satellites. These satellite sensors data have been used in different studies for vegetation analysis and estimation of crop biophysical parameters such as leaf area index, canopy chlorophyll content (Zheng et al., \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Onojeghuo et al., \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Kowalski et al., \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Nihar et al., \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). The presence of Red edge bands makes Sentinel 2 data of particular interest for LAI retrieval from winter wheat (Xie et al., \u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). 
Sentinel-2 Band 8A (narrow near-infrared) is more accurate for Leaf Area Index estimation in cotton, tomato, and wheat, while Band 9 (water vapour) shows a high correlation with LAI, facilitating more accurate agricultural monitoring than traditional vegetation indices.\u003c/p\u003e \u003cp\u003eThe traditionally used inversion techniques involve the minimization of a cost function, which requires excessive computation time to achieve high retrieval accuracy (Kimes et al., 2009; Wang et al., \u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). Machine learning algorithms for LAI retrieval can improve prediction accuracies and spatial consistencies (Houborg et al., 2018). Given the potential application of satellite-estimated LAI for crop management, this study aims at validating satellite-estimated LAI for spring wheat crops. This study proposes a framework for retrieving LAI from Landsat-8 and Sentinel-2 satellite data using three machine learning methods, i.e., Random Forest (RF), Support Vector Machine (SVM) and XGBoost. In addition, the field-observed LAI is compared with the satellite-derived LAI.\u003c/p\u003e"},{"header":"Materials and method","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eStudy area\u003c/h2\u003e \u003cp\u003eThis study was conducted for spring wheat (\u003cem\u003eTriticum aestivum\u003c/em\u003e L.) in farmers\u0026rsquo; fields in six villages situated in the Jhansi district of Uttar Pradesh and the Niwari district of Madhya Pradesh, India (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). The study site lies in the semi-arid, drought-prone and water-scarce Bundelkhand region of Central India. Spring wheat is a major crop grown in the winter season in both the Jhansi and Niwari districts of the Bundelkhand region. 
These two districts are located between 78\u0026deg; 15\u0026rsquo; E and 79\u0026deg; 24\u0026rsquo; E longitude and 25\u0026deg; 15\u0026rsquo; N and 25\u0026deg; 45\u0026rsquo; N latitude at a mean elevation of 285 m. The field observations for LAI were taken over two years (2020\u0026ndash;2021 and 2021\u0026ndash;2022) during three crop growth stages.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003eDataset and Methodology\u003c/h2\u003e \u003cdiv id=\"Sec5\" class=\"Section3\"\u003e \u003ch2\u003eGround observations\u003c/h2\u003e \u003cp\u003eThe ground-observed LAI data were collected from farmers\u0026rsquo; fields in six villages, viz., Pathari, Kothkhera, Nayagaon, and Durgapura in the Jhansi district of Uttar Pradesh, and Mathurapura and Ramnagar in the Niwari district of Madhya Pradesh. LAI data were collected in 175 farmers\u0026rsquo; fields in year 1 (2020\u0026ndash;2021) and in 131 farmers\u0026rsquo; fields in year 2 (2021\u0026ndash;2022), at three stages of crop growth (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). The farms in the Jhansi and Niwari districts are small to semi-medium land holdings. The lower number of fields in the second year is due to 44 farmers shifting from wheat to green peas. LAI was assessed using the LICOR 2000 plant canopy analyzer, a portable device employing indirect and non-contact estimation techniques (Asner et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2003\u003c/span\u003e). The calculation of LAI involves radiation readings obtained via a fish-eye optical sensor. 
Canopy light interception is gauged at five angles, both above and below the canopy, and a radiative transfer model is employed to determine LAI (Chaurasia et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2011\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003eSatellite data acquisition and processing\u003c/h2\u003e \u003cp\u003eImagery from two satellites, Sentinel-2 and Landsat-8, was downloaded for the crop growth periods (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Landsat-8 satellite data provided by the USGS have a spatial resolution of 30 m and a temporal resolution of 16 days. Sentinel-2 data provided by the ESA have a finer spatial (10 m) and temporal (5 days) resolution. The LAI estimation was carried out using all the bands except for the thermal band present in Landsat. Additional bands that give the mean sun-sensor geometry, i.e., Solar Zenith Angle (SZA) and View Zenith Angle (VZA), were also used. The absolute value of the cosine of each of these sun-sensor angles was extracted and used for model building. 
The overall methodology for LAI estimation using Sentinel-2 and Landsat-8 data is represented in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDates of recording LAI in farmer\u0026rsquo;s field and dates of acquisition of Sentinel-2 and Landsat-8 imagery for LAI estimation\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eCrop year\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eStage\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eDate of ground observation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e \u003cp\u003eDate of Acquisition\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eSentinel-2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eLandsat-8\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" 
morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eYear 1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eVegetative\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAnthesis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e23 to 25-Jan-2021\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e21-Jan-2021\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e20-Jan-2021\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDough\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e8 to 10-Mar-2021\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e7-Mar-2021\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e9-Mar-2021\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eYear 2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eVegetative\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e22 to 25-Dec-2021\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e22-Dec-2021\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e22-Dec-2021\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAnthesis\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e23 to 26-Feb-2022\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e22-Feb-2022\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e24-Feb-2022\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDough\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e21 to 26-Mar-2022\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e22-Mar-2022\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e28-Mar-2022\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eMachine learning approach\u003c/h3\u003e\n\u003cp\u003eMachine learning algorithms comprise a collection of methods designed to autonomously discern patterns within data and utilize these patterns for regression tasks. They represent a progression beyond linear regression models, capable of operating effectively on intricate and non-linear datasets. In regression, the objective is to predict a value based on input data (Belgiu et al., 2016; Cooner et al., 2016). These algorithms are trained on a dataset and subsequently use what they have learned to make predictions on new, unseen data (Mahesh, 2020).\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003ei) Random Forest (RF)\u003c/h2\u003e \u003cp\u003eThe RF algorithm stands out as a prevalent choice for regression due to its robustness, speed, and accuracy (Mudi et al., \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Goyal et al., 2022). 
It operates by combining multiple trees trained on data subsets (Sun et al., 2022) and aggregating their predictions (Breiman, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2001\u003c/span\u003e). By growing multiple decision trees from bootstrapped samples, RF diminishes overfitting and enhances model accuracy (Maxwell et al., \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). Individual trees are trained using in-bag samples, while the out-of-bag samples are used to estimate the out-of-bag error, which provides a measure of the model's generalisation performance. The final prediction is made by aggregating the predictions of all trees: by majority or weighted voting for classification, and by averaging for regression. When compared to individual decision trees, RF has several advantages, including improved accuracy, robustness to outliers, the ability to handle high-dimensional data and large datasets, and reduced overfitting (Rodriguez-Galiano et al., \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e2012\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003eii) Support Vector Machine (SVM)\u003c/h2\u003e \u003cp\u003eSVM, introduced by Vapnik in 1979, is a non-parametric method that has been extensively employed in numerous studies (Cervantes et al., 2020). Based on an optimization procedure and a chosen kernel, SVM generates an optimal decision boundary, also known as a hyperplane. Various kernel options are available for fitting the hyperplane (Kavzoglu et al., 2009). 
The radial basis function (RBF) and polynomial kernels are two commonly employed kernels, and they have been shown to improve classification accuracy for remote sensing imagery (Steinwart et al., \u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e2006\u003c/span\u003e; Chakraborty et al., \u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e2016\u003c/span\u003e). Cost (the C parameter) and gamma (γ) are two important parameters that must be set when implementing the SVM algorithm. The C parameter controls the complexity of the decision boundary: the greater the C value, the more complex the decision boundary. The C value therefore needs to be chosen carefully, as a high C value can result in overfitting of the model. The γ parameter determines the shape of the hyperplane (Ghosh et al., 2014).\u003c/p\u003e \u003cdiv id=\"Sec10\" class=\"Section3\"\u003e \u003ch2\u003eiii) Extreme Gradient Boosting (XGBoost)\u003c/h2\u003e \u003cp\u003eExtreme Gradient Boosting (XGBoost), created by Chen et al. (2016), is also a popular and powerful machine learning algorithm for regression problems. The algorithm iteratively constructs a series of decision trees. After each iteration, XGBoost updates the prediction according to the loss function, which measures the difference between the predicted and actual values. Each new tree is fitted so as to reduce the remaining loss. This process is repeated until the loss stops decreasing or the specified number of iterations is reached. 
Starting from an initial model with a large bias error, XGBoost iteratively improves prediction accuracy by minimising the loss function.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eVariable importance and Hyperparameter tuning\u003c/h2\u003e \u003cp\u003eIn machine learning, variable importance quantifies how much each predictor variable contributes to the model relative to the other variables. It assesses the relationship of predictors with the target variable and how these predictors affect the model outcome (Chen et al., \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Hyperparameters are the parameters of a machine learning algorithm that must be set before the final model is trained. Each ML algorithm has different hyperparameters that need to be tuned, for example, the number of trees (ntrees) and the number of variables considered at each split (mtry) in RF.\u003c/p\u003e \u003cp\u003eIn the current study, the hyperparameters of each ML algorithm were tuned before the final model prediction. The Random Forest was tuned for the ntrees and mtry parameters. The ntrees parameter is the number of trees to be grown, whereas mtry determines the number of variables considered at each split. In the case of SVM, the cost (C) parameter was tuned for both the linear and RBF kernels, and gamma (γ) only for the RBF kernel. XGBoost was tuned for the number of iterations (nrounds), maximum depth of the tree (max_depth), learning rate (eta) and minimum loss reduction (gamma). The nrounds parameter determines the number of boosting iterations used in building the model, whereas max_depth is the maximum depth of the trees. 
The eta parameter controls the learning rate, and gamma is the minimum loss reduction needed to make a further partition on a leaf node of the tree (Kavzoglu et al., 2022).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eLAI estimation from satellite imagery\u003c/h2\u003e \u003cp\u003eThe machine learning-based modelling for LAI prediction was carried out using all available bands of Sentinel-2 and Landsat-8 together with the additional SZA and VZA bands. The hyperparameters of the models were tuned, and the final model was built using the best-performing hyperparameter values. The dataset was split in a 70:30 ratio, with 70% of the observed LAI data used to train the model and 30% used to test it. The model accuracy was evaluated based on the Pearson correlation coefficient (R), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Multiplicative Bias (MBias).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eStatistical evaluation and validation of satellite-estimated LAI\u003c/h2\u003e \u003cp\u003eStatistical properties such as the mean (Eq.\u0026nbsp;\u003cspan refid=\"Equ1\" class=\"InternalRef\"\u003e1\u003c/span\u003e), standard deviation (SD) (Eq.\u0026nbsp;\u003cspan refid=\"Equ2\" class=\"InternalRef\"\u003e2\u003c/span\u003e), and coefficient of variation (CV) (Eq.\u0026nbsp;3) of the ground-observed LAI values were compared with those of the satellite-estimated LAI. 
Here, the ground-observed values are represented by X\u003csub\u003eObserved\u003c/sub\u003e and the satellite-derived values are represented by X\u003csub\u003eestimated\u003c/sub\u003e.\u003cdiv id=\"Equ1\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$\\:Mean=\\:\\frac{\\sum\\:{X}_{Observed\\:or\\:estimated}}{N}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ2\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$$\\:SD=\\:\\sqrt{\\frac{\\sum\\:{\\left({X}_{Observed\\:or\\:estimated}-{Mean}_{obs\\:or\\:est.}\\right)}^{2}}{N}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cdiv id=\"Sec14\" class=\"Section3\"\u003e \u003ch2\u003eCV = \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\frac{SD}{Mean}\\)\u003c/span\u003e\u003c/span\u003e (3)\u003c/h2\u003e \u003cp\u003eThe validation step was performed using statistical measures, namely Pearson\u0026rsquo;s correlation coefficient (R), Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and multiplicative bias (MBias), computed between X\u003csub\u003eObserved\u003c/sub\u003e and X\u003csub\u003eestimated\u003c/sub\u003e. 
R (Eq.\u0026nbsp;\u003cspan refid=\"Equ3\" class=\"InternalRef\"\u003e4\u003c/span\u003e) is a commonly used statistical metric that measures the linear agreement between observed and estimated data, with a score of +1 indicating perfect positive agreement and a score of -1 indicating perfect inverse agreement.\u003cdiv id=\"Equ3\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ3\" name=\"EquationSource\"\u003e\n$$\\:R=\\:\\frac{n\\left(\\sum\\:{X}_{Observed}{X}_{estimated}\\right)-\\left(\\sum\\:{X}_{Observed}\\right)\\left(\\sum\\:{X}_{estimated}\\right)}{\\sqrt{\\left[n\\sum\\:{{X}_{Observed}}^{2}-{\\left(\\sum\\:{X}_{Observed}\\right)}^{2}\\right]\\left[n\\sum\\:{{X}_{estimated}}^{2}-{\\left(\\sum\\:{X}_{estimated}\\right)}^{2}\\right]}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e4\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eRMSE (Eq.\u0026nbsp;\u003cspan refid=\"Equ4\" class=\"InternalRef\"\u003e5\u003c/span\u003e) and MAE (Eq.\u0026nbsp;\u003cspan refid=\"Equ5\" class=\"InternalRef\"\u003e6\u003c/span\u003e) are the most widely used statistical metrics for measuring model performance. 
The magnitude of deviation between observed and estimated data is given by RMSE and MAE, with a perfect score of 0 indicating no deviation.\u003cdiv id=\"Equ4\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ4\" name=\"EquationSource\"\u003e\n$$\\:RMSE=\\sqrt{\\frac{1}{n}\\:\\sum\\:_{i=1}^{n}{\\left({X}_{Observed}-{X}_{estimated}\\:\\right)}^{2}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e5\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ5\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ5\" name=\"EquationSource\"\u003e\n$$\\:MAE=\\:\\frac{1}{n}\\:\\sum\\:_{i=1}^{n}\\left|{X}_{Observed}-{X}_{estimated\\:}\\:\\right|\\:$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e6\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eThe Multiplicative Bias (Eq.\u0026nbsp;\u003cspan refid=\"Equ6\" class=\"InternalRef\"\u003e7\u003c/span\u003e) quantifies the ratio between estimated and observed values. 
An ideal estimation yields a value of 1, while underestimation produces values below 1, and overestimation results in values exceeding 1 (Moazami et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2013\u003c/span\u003e).\u003cdiv id=\"Equ6\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ6\" name=\"EquationSource\"\u003e\n$$\\:MBias=\\:\\frac{{\\sum\\:}_{i=1}^{n}{X}_{estimated\\:}}{{\\sum\\:}_{i=1}^{n}{X}_{Observed}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e7\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"Results and discussion","content":"\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eParameter tuning and variable importance\u003c/h2\u003e \u003cp\u003eThe machine learning models were tuned over different hyperparameters to select the best hyperparameter values for Sentinel-2 and Landsat-8 data (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e3\u003c/span\u003e). The RF algorithm's mtry and ntrees parameters were tuned; the final mtry values were 10 and 6, and the final ntrees values were 500 and 1500, for Sentinel-2 and Landsat-8, respectively. The SVM-RBF algorithm was tuned for the sigma and cost parameters, with a final sigma value of 0.03125 for both sensors and cost values of 64 and 32 for Sentinel-2 and Landsat-8, respectively. For the SVM-linear algorithm, the cost parameter was tuned, with the best values being 22.62742 and 181.0193 for Sentinel-2 and Landsat-8, respectively. The XGBoost algorithm's parameters, including nrounds, max depth, eta, and gamma, were adjusted. 
The final values selected were 100 for nrounds, 5 for max depth, and 0.02 and 0.04 for eta, with corresponding gamma values of 0.01 and 0.09 for Sentinel-2 and Landsat-8, respectively.\u003c/p\u003e \u003cp\u003eOptimum band selection is very important for LAI estimation (He et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Hence, the variable importance, which provides information about the contribution of each predictor variable to the model, was calculated (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e4\u003c/span\u003e). In the case of Sentinel-2 (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e4\u003c/span\u003ea), the SZA band, which represents the solar zenith angle, has the highest importance. Next to SZA, Bands 6 and 7 of the Sentinel-2 MSI, which measure reflectance in the red-edge region of the spectrum, were the most important. Red-edge bands in Sentinel-2 are strongly linked to LAI and can be used for retrieving LAI with an RMSE of 0.6 for various crop types (Delegido et al., 2011; He et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). These bands are commonly considered important for LAI estimation because they provide information about the chlorophyll content and health of the vegetation (Xu et al., \u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). Apart from Bands 6 and 7, Band 8, which is the NIR band, was also found to be important in predicting LAI. The Near-Infrared (NIR) band holds significance in vegetation studies due to its high reflectance, attributed to the strong scattering by individual leaves and entire plants in this spectral region.\u003c/p\u003e \u003cp\u003eIn the case of Landsat-8 (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e4\u003c/span\u003eb), B6 and B7, which are SWIR bands, showed high importance. 
The NIR band in the case of Landsat-8 showed low importance as compared to Sentinel-2. The SWIR is sensitive to plant water content, and LAI is sensitive to the SWIR region, mostly in the 1000 nm to 1400 nm range (Jacquemoud et al., \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2009\u003c/span\u003e). The importance of SWIR in the prediction of LAI has been demonstrated by various studies (e.g., Srinet et al., \u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Abebe et al., \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). The green and red spectral regions were also found to be important in LAI estimation after SWIR bands. The red and green spectral regions mostly indicate the leaf development pattern of the crops, as reflectance in these regions mostly depends on the leaf growth process (Motohka et al., \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2010\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eLAI estimation using satellite data and machine learning algorithm and comparison with ground observed data\u003c/h2\u003e \u003cp\u003eThe evaluation of Leaf Area Index (LAI) estimation using Sentinel-2 and Landsat-8 satellite data, leveraging different machine learning algorithms, reveals several significant findings. 
These results, summarized in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, indicate distinct performance patterns across the algorithms: Random Forest (RF), Support Vector Machine with Radial Basis Function (SVM-RBF), Support Vector Machine with a linear kernel (SVM linear), and XGBoost.\u003c/p\u003e \u003cp\u003eThe LAI estimated from the Sentinel data using the Random Forest (RF) and SVM-RBF both achieved a high R-value of 0.94, suggesting a robust correlation between the field-observed and satellite-estimated LAI. They also recorded an RMSE of 0.40 and a close MAE (RF: 0.29, SVM-RBF: 0.30). The MBias values for RF and SVM-RBF were 1.02 and 1.00, respectively, indicating minimal overestimation by RF and perfect estimation by SVM-RBF. XGBoost also demonstrated a high R-value of 0.94, though slightly higher RMSE (0.43) and MAE (0.33) compared to RF and SVM-RBF. Its MBias was 0.88, showing a slight underestimation. SVM linear had a noticeably lower R-value of 0.84, coupled with higher RMSE (0.62) and MAE (0.48), indicating less accuracy. Its MBias was 1.02, indicating a slight overestimation.\u003c/p\u003e \u003cp\u003eThe LAI estimated from the Landsat 8 data using RF and SVM-RBF both achieved an R-value of 0.94, reflecting a strong relationship between observed and predicted LAI. The RMSE for RF was 0.38 and MAE 0.28, while SVM-RBF recorded the lowest RMSE of 0.37 and an MAE of 0.29. Their MBias values were 1.00 for RF and 0.99 for SVM-RBF, indicating almost perfect estimation. XGBoost had an R-value of 0.93, with RMSE and MAE values of 0.40 and 0.30, respectively. Its MBias was 1.01, reflecting a slight overestimation. SVM linear showed the lowest performance with an R-value of 0.78, higher RMSE (0.69), and MAE (0.53). 
The MBias for SVM linear was 0.98, indicating a slight underestimation.\u003c/p\u003e \u003cp\u003eOverall, the results indicate that RF and SVM-RBF consistently outperformed SVM linear and XGBoost across both satellite datasets. RF and SVM-RBF not only showed higher R-values but also lower RMSE and MAE values, which are critical for accurate LAI estimation. The MBias metric further underscored the precision of SVM-RBF, particularly with its perfect or near-perfect values, whereas RF showed slight overestimation tendencies. XGBoost, while comparable in terms of R-value, displayed marginally higher error metrics and variable MBias. SVM linear consistently underperformed across all metrics. These findings align with existing literature, which also recognizes RF and SVM (especially with the RBF kernel) as highly effective for vegetation parameter estimation, including LAI, in various ecosystems such as grasslands and forests (Omer et al., \u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Srinet et al., \u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Shen et al., \u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThis study has several limitations. It relies on the availability of satellite data, whose acquisition dates may not coincide perfectly with the ground observation dates. This mismatch can introduce discrepancies due to changes in crop conditions over time. The machine learning models were trained and validated using data from specific regions (Jhansi and Niwari). These models may not generalize well to regions with different climatic, soil, and crop conditions without additional local calibration. The satellite data can be affected by atmospheric conditions such as cloud cover, haze, and aerosols, which can distort the reflectance values and impact the accuracy of LAI estimation. 
While atmospheric correction techniques are used, they may not fully eliminate these effects. The accuracy of the machine learning models depends on the quality and quantity of ground truth data. Limited or unevenly distributed ground truth data can reduce the robustness of the models. LAI is influenced by a multitude of factors including plant species, growth stages, and environmental conditions. Capturing this complexity requires highly detailed and temporally frequent data, which might not always be feasible.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eStatistical metrics for LAI predicted from Sentinel-2 and Landsat-8 data using different machine learning algorithms\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSatellite\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eML algorithm\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eR\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e 
\u003cp\u003eRMSE\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMAE\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eMBias\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003e\u003cb\u003eSentinel-2\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.40\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.29\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1.02\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSVM-RBF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.40\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.30\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSVM-Linear\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.62\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c6\"\u003e \u003cp\u003e1.02\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eXGBoost\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.43\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.33\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThis study validated the Leaf Area Index (LAI) estimated from Sentinel-2 and Landsat-8 satellite data against observed LAI measurements collected from smallholder farmers' fields in the districts of Jhansi and Niwari. The LAI estimation employed three machine learning approaches: Random Forest (RF), Support Vector Machine (SVM), and XGBoost, utilizing all available spectral bands from the satellites. Additionally, the solar zenith angle (SZA) and view zenith angle (VZA) bands were integrated to enhance the accuracy of the LAI estimates. The key finding was that the SVM- and RF-based models showed a strong correlation with the observed LAI data and lower errors than the other methods. Specifically, the SVM approach was highly effective for Sentinel-2 data, while the RF approach excelled with Landsat-8 data. The variable importance analysis identified the red-edge and Near Infrared (NIR) bands in Sentinel-2 as crucial for accurate LAI estimation. 
The red-edge band is significant due to its sensitivity to chlorophyll content and plant health, while the NIR band is important because of the high scattering of light by plant structures, which is indicative of biomass and leaf area. The Shortwave Infrared (SWIR) bands in Landsat-8, along with the red and green bands, were found to be highly important. The SWIR bands are sensitive to water content in plants, making them valuable for assessing plant health and stress. The red and green bands provide essential information on chlorophyll absorption and overall vegetation vigor. The inclusion of SZA and VZA bands helped to account for the variations in sunlight angles and sensor viewing angles, which can affect the reflectance values recorded by the satellites. This integration improved the robustness of the LAI estimates, making the models more reliable under varying observational conditions.\u003c/p\u003e \u003cp\u003eThis study demonstrates that machine learning approaches, particularly SVM for Sentinel-2 (R-value of 0.94, MBias of 1.00) and RF for Landsat-8 (R-value of 0.94, MBias of 1.00), can provide highly accurate LAI estimates. These models leverage the specific strengths of different spectral bands to capture detailed information about plant health, biomass, and moisture content. The integration of additional bands like SZA and VZA further enhances the accuracy and reliability of these estimates under various observational conditions. The findings have significant implications for smallholder farmers in regions similar to Jhansi and Niwari. Accurate LAI estimation can help farmers monitor crop health, optimize resource use (such as water and fertilizers), and improve crop yield predictions. By validating these models with ground-truth data, the study provides a robust framework for using satellite data and machine learning in agricultural management and decision-making processes. 
This approach can be scaled and adapted to other regions and crop types, offering a powerful tool for enhancing agricultural productivity and sustainability through precise and timely information on crop health and growth. However, addressing the limitations mentioned is essential for further improving the accuracy and applicability of these models.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eConflict of interest\u003c/h2\u003e\n\u003cp\u003eThe authors declare that they have no known financial or personal conflicts of interest.\u003c/p\u003e\n\u003ch2\u003eFunding information\u003c/h2\u003e\n\u003cp\u003eThis research did not receive any financial support from any external organization.\u003c/p\u003e\n\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\n\u003cp\u003eP. P.: research conceptualization; data curation; primary analysis; writing the original draft. S. K.: partial analysis; writing and editing manuscript; S. N. K.: research conceptualization, manuscript editing, overall supervision. R. C. H.: field data collection and supervision. J. K. G.: field data collection. R. K.: manuscript editing.\u003c/p\u003e\n\u003ch2\u003eAcknowledgement\u003c/h2\u003e\n\u003cp\u003eThe first author gratefully acknowledges the Indian Council of Agricultural Research (ICAR) \u0026ndash; Indian Agricultural Research Institute for providing the opportunity for carrying out doctoral study. 
The authors also extend their gratitude to the National Aeronautics and Space Administration (NASA) and the European Space Agency (ESA) for generously providing the satellite data essential for this research.\u003c/p\u003e\n\u003ch2\u003eData availability\u003c/h2\u003e\n\u003cp\u003eThe Sentinel-2 multispectral data were accessed and analysed in the Google Earth Engine cloud computing platform using the link \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://developers.google.com/earth-engine/datasets/catalog/sentinel-2\u003c/span\u003e\u003c/span\u003e, and the Landsat-8 data were accessed through \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://developers.google.com/earth-engine/datasets/catalog/landsat-8\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAbebe, G., Tadesse, T. and Gessesse, B., 2022. Estimating Leaf Area Index and biomass of sugarcane based on Gaussian process regression using Landsat 8 and Sentinel 1A observations. \u003cem\u003eInternational Journal of Image and Data Fusion\u003c/em\u003e, pp.1-31.\u003c/li\u003e\n\u003cli\u003eAsner, G.P., Scurlock, J.M. and Hicke, J.A., 2003. Global synthesis of leaf area index observations: implications for ecological and remote sensing studies. \u003cem\u003eGlobal ecology and biogeography\u003c/em\u003e, \u003cem\u003e12\u003c/em\u003e(3), pp.191-205.\u003c/li\u003e\n\u003cli\u003eAtzberger, C., 2004. Object-based retrieval of biophysical canopy variables using artificial neural nets and radiative transfer models. \u003cem\u003eRemote sensing of environment\u003c/em\u003e, \u003cem\u003e93\u003c/em\u003e(1-2), pp.53-67.\u003c/li\u003e\n\u003cli\u003eBaret, F., Hagolle, O., Geiger, B., Bicheron, P., Miras, B., Huc, M., Berthelot, B., Ni\u0026ntilde;o, F., Weiss, M., Samain, O. and Roujean, J.L., 2007. 
LAI, fAPAR and fCover CYCLOPES global products derived from VEGETATION: Part 1: Principles of the algorithm. \u003cem\u003eRemote sensing of environment\u003c/em\u003e, \u003cem\u003e110\u003c/em\u003e(3), pp.275-286.\u003c/li\u003e\n\u003cli\u003eBaret, F., Weiss, M., Lacaze, R., Camacho, F., Makhmara, H., Pacholcyzk, P., \u0026amp; Smets, B. (2013). GEOV1: LAI and FAPAR essential climate variables and FCOVER global time series capitalizing over existing products. Part1: Principles of development and production. Remote sensing of environment, 137, 299-309.\u003c/li\u003e\n\u003cli\u003eBeck, P.S., Atzberger, C., H\u0026oslash;gda, K.A., Johansen, B. and Skidmore, A.K., 2006. Improved monitoring of vegetation dynamics at very high latitudes: A new method using MODIS NDVI. \u003cem\u003eRemote sensing of Environment\u003c/em\u003e, \u003cem\u003e100\u003c/em\u003e(3), pp.321-334.\u003c/li\u003e\n\u003cli\u003eBelgiu, M. and Drăguţ, L., 2016. Random forest in remote sensing: A review of applications and future directions. \u003cem\u003eISPRS journal of photogrammetry and remote sensing\u003c/em\u003e, \u003cem\u003e114\u003c/em\u003e, pp.24-31.\u003c/li\u003e\n\u003cli\u003eBreiman, L., 2001. Random forests. \u003cem\u003eMachine learning\u003c/em\u003e, \u003cem\u003e45\u003c/em\u003e, pp.5-32.\u003c/li\u003e\n\u003cli\u003eChaurasia, S., Nigam, R., Bhattacharya, B.K., Sridhar, V.N., Mallick, K., Vyas, S.P., Patel, N.K., Mukherjee, J., Shekhar, C., Kumar, D. and Singh, P., 2011. Development of regional wheat VI-LAI models using Resourcesat-1 AWiFS data. \u003cem\u003eJournal of Earth System Science\u003c/em\u003e, \u003cem\u003e120\u003c/em\u003e(6), p.1113.\u003c/li\u003e\n\u003cli\u003eChen, R.C., Dewi, C., Huang, S.W. and Caraka, R.E., 2020. Selecting critical features for data classification based on machine learning methods. \u003cem\u003eJournal of Big Data\u003c/em\u003e, \u003cem\u003e7\u003c/em\u003e(1), p.52.\u003c/li\u003e\n\u003cli\u003eChen, T. 
and Guestrin, C., 2016, August. Xgboost: A scalable tree boosting system. In \u003cem\u003eProceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining\u003c/em\u003e (pp. 785-794).\u003c/li\u003e\n\u003cli\u003eCui, Z. and Kerekes, J.P., 2018. Potential of red edge spectral bands in future landsat satellites on agroecosystem canopy green leaf area index retrieval. \u003cem\u003eRemote Sensing\u003c/em\u003e, \u003cem\u003e10\u003c/em\u003e(9), p.1458.\u003c/li\u003e\n\u003cli\u003eDente, L., Satalino, G., Mattia, F. and Rinaldi, M., 2008. Assimilation of leaf area index derived from ASAR and MERIS data into CERES-Wheat model to map wheat yield. \u003cem\u003eRemote sensing of Environment\u003c/em\u003e, \u003cem\u003e112\u003c/em\u003e(4), pp.1395-1407.\u003c/li\u003e\n\u003cli\u003eDjamai, N. and Fernandes, R., 2018. Comparison of SNAP-derived Sentinel-2A L2A product to ESA product over Europe. \u003cem\u003eRemote Sensing\u003c/em\u003e, \u003cem\u003e10\u003c/em\u003e(6), p.926.\u003c/li\u003e\n\u003cli\u003eDubey, S.K., Gavli, A.S., Yadav, S.K., Sehgal, S. and Ray, S.S., 2018. Remote sensing-based yield forecasting for sugarcane (Saccharum officinarum L.) crop in India. \u003cem\u003eJournal of the Indian Society of Remote Sensing\u003c/em\u003e, \u003cem\u003e46\u003c/em\u003e, pp.1823-1833. \u003c/li\u003e\n\u003cli\u003eFilipponi, F., 2021. Comparison of LAI Estimates from High Resolution Satellite Observations Using Different Biophysical Processors. In \u003cem\u003eBiology and Life Sciences Forum\u003c/em\u003e (Vol. 3, No. 1, p. 5). Multidisciplinary Digital Publishing Institute. \u003c/li\u003e\n\u003cli\u003eGanguly, S., Nemani, R.R., Zhang, G., Hashimoto, H., Milesi, C., Michaelis, A., Wang, W., Votava, P., Samanta, A., Melton, F. and Dungan, J.L., 2012. Generating global leaf area index from Landsat: Algorithm formulation and demonstration. 
\u003cem\u003eRemote Sensing of Environment\u003c/em\u003e, \u003cem\u003e122\u003c/em\u003e, pp.185-202.\u003c/li\u003e\n\u003cli\u003eGCOS ECV. (2024). Essential climate variables. GCOS. https://gcos.wmo.int/en/essential-climate-variables\u003c/li\u003e\n\u003cli\u003eGhosh, A. and Joshi, P.K., 2014. A comparison of selected classification algorithms for mapping bamboo patches in lower Gangetic plains using very high resolution WorldView 2 imagery. \u003cem\u003eInternational Journal of Applied Earth Observation and Geoinformation\u003c/em\u003e, \u003cem\u003e26\u003c/em\u003e, pp.298-311.\u003c/li\u003e\n\u003cli\u003eGozdowski, D., Stępień, M., Panek, E., Varghese, J., Bodecka, E., Rozbicki, J. and Samborski, S., 2020. Comparison of winter wheat NDVI data derived from Landsat 8 and active optical sensor at field scale. \u003cem\u003eRemote Sensing Applications: Society and Environment\u003c/em\u003e, \u003cem\u003e20\u003c/em\u003e, p.100409.\u003c/li\u003e\n\u003cli\u003eHe, L., Ren, X., Wang, Y., Liu, B., Zhang, H., Liu, W., Feng, W. \u0026amp; Guo, T. (2020). Comparing methods for estimating leaf area index by multi-angular remote sensing in winter wheat. Scientific Reports, 10(1), 13943.\u003c/li\u003e\n\u003cli\u003eHuang, J., G\u0026oacute;mez-Dans, J.L., Huang, H., Ma, H., Wu, Q., Lewis, P.E., Liang, S., Chen, Z., Xue, J.H., Wu, Y. and Zhao, F., 2019. Assimilation of remote sensing into crop growth models: Current status and perspectives. \u003cem\u003eAgricultural and forest meteorology\u003c/em\u003e, \u003cem\u003e276\u003c/em\u003e, p.107609.\u003c/li\u003e\n\u003cli\u003eJacquemoud, S., Verhoef, W., Baret, F., Bacour, C., Zarco-Tejada, P.J., Asner, G.P., Fran\u0026ccedil;ois, C. and Ustin, S.L., 2009. PROSPECT+ SAIL models: A review of use for vegetation characterization. 
\u003cem\u003eRemote sensing of environment\u003c/em\u003e, \u003cem\u003e113\u003c/em\u003e, pp.S56-S66.\u003c/li\u003e\n\u003cli\u003eJonckheere, I., Fleck, S., Nackaerts, K., Muys, B., Coppin, P., Weiss, M. and Baret, F., 2004. Review of methods for in situ leaf area index determination: Part I. Theories, sensors and hemispherical photography. \u003cem\u003eAgricultural and forest meteorology\u003c/em\u003e, \u003cem\u003e121\u003c/em\u003e(1-2), pp.19-35.\u003c/li\u003e\n\u003cli\u003eKang, Y., Ozdogan, M., Gao, F., Anderson, M.C., White, W.A., Yang, Y., Yang, Y. and Erickson, T.A., 2021. A data-driven approach to estimate leaf area index for Landsat images over the contiguous US. \u003cem\u003eRemote Sensing of Environment\u003c/em\u003e, \u003cem\u003e258\u003c/em\u003e, p.112383.\u003c/li\u003e\n\u003cli\u003eKavzoglu, T. and Colkesen, I., 2009. A kernel functions analysis for support vector machines for land cover classification. \u003cem\u003eInternational Journal of Applied Earth Observation and Geoinformation\u003c/em\u003e, \u003cem\u003e11\u003c/em\u003e(5), pp.352-359.\u003c/li\u003e\n\u003cli\u003eKavzoglu, T. and Teke, A., 2022. Advanced hyperparameter optimization for improved spatial prediction of shallow landslides using extreme gradient boosting (XGBoost). \u003cem\u003eBulletin of Engineering Geology and the Environment\u003c/em\u003e, \u003cem\u003e81\u003c/em\u003e(5), p.201.\u003c/li\u003e\n\u003cli\u003eKowalski, K., Senf, C., Hostert, P. and Pflugmacher, D., 2020. Characterizing spring phenology of temperate broadleaf forests using Landsat and Sentinel-2 time series. \u003cem\u003eInternational Journal of Applied Earth Observation and Geoinformation\u003c/em\u003e, \u003cem\u003e92\u003c/em\u003e, p.102172.\u003c/li\u003e\n\u003cli\u003eMaxwell, A.E., Warner, T.A. and Fang, F., 2018. Implementation of machine-learning classification in remote sensing: An applied review. 
\u003cem\u003eInternational Journal of Remote Sensing\u003c/em\u003e, \u003cem\u003e39\u003c/em\u003e(9), pp.2784-2817.\u003c/li\u003e\n\u003cli\u003eMkhabela, M.S., Bullock, P., Raj, S., Wang, S. and Yang, Y., 2011. Crop yield forecasting on the Canadian Prairies using MODIS NDVI data. \u003cem\u003eAgricultural and Forest Meteorology\u003c/em\u003e, \u003cem\u003e151\u003c/em\u003e(3), pp.385-393.\u003c/li\u003e\n\u003cli\u003eMoazami, S., Golian, S., Kavianpour, M.R. and Hong, Y., 2013. Comparison of PERSIANN and V7 TRMM Multi-satellite Precipitation Analysis (TMPA) products with rain gauge data over Iran. \u003cem\u003eInternational journal of remote sensing\u003c/em\u003e, \u003cem\u003e34\u003c/em\u003e(22), pp.8156-8171.\u003c/li\u003e\n\u003cli\u003eMotohka, T., Nasahara, K.N., Oguma, H. and Tsuchida, S., 2010. Applicability of green-red vegetation index for remote sensing of vegetation phenology. \u003cem\u003eRemote Sensing\u003c/em\u003e, \u003cem\u003e2\u003c/em\u003e(10), pp.2369-2387\u003c/li\u003e\n\u003cli\u003eMourad, R., Jaafar, H., Anderson, M. and Gao, F., 2020. Assessment of leaf area index models using harmonized landsat and sentinel-2 surface reflectance data over a semi-arid irrigated landscape. \u003cem\u003eRemote Sensing\u003c/em\u003e, \u003cem\u003e12\u003c/em\u003e(19), p.3121.\u003c/li\u003e\n\u003cli\u003eMudi, S., Paramanik, S., Behera, M.D., Prakash, A.J., Deep, N.R., Kale, M.P., Kumar, S., Sharma, N., Pradhan, P., Chavan, M. and Roy, P.S., 2022. Moderate resolution LAI prediction using Sentinel-2 satellite data and indirect field measurements in Sikkim Himalaya. \u003cem\u003eEnvironmental Monitoring and Assessment\u003c/em\u003e, \u003cem\u003e194\u003c/em\u003e(12), p.897.\u003c/li\u003e\n\u003cli\u003eNihar, A., Patel, N.R., Pokhariyal, S. and Danodia, A., 2022. Sugarcane crop type discrimination and area mapping at field scale using sentinel images and machine learning methods. 
\u003cem\u003eJournal of the Indian Society of Remote Sensing\u003c/em\u003e, pp.1-9.\u003c/li\u003e\n\u003cli\u003eOnojeghuo, A.O., Blackburn, G.A., Wang, Q., Atkinson, P.M., Kindred, D. and Miao, Y., 2018. Mapping paddy rice fields by applying machine learning algorithms to multi-temporal Sentinel-1A and Landsat data. \u003cem\u003eInternational journal of remote sensing\u003c/em\u003e, \u003cem\u003e39\u003c/em\u003e(4), pp.1042-1067.\u003c/li\u003e\n\u003cli\u003ePasqualotto, N., Bolognesi, S.F., Belfiore, O.R., Delegido, J., D\u0026rsquo;Urso, G. and Moreno, J., 2019, October. Canopy chlorophyll content and LAI estimation from Sentinel-2: Vegetation indices and Sentinel-2 Level-2A automatic products comparison. In \u003cem\u003e2019 IEEE International Workshop on Metrology for Agriculture and Forestry (MetroAgriFor)\u003c/em\u003e (pp. 301-306). IEEE. \u003c/li\u003e\n\u003cli\u003ePrasad, N.R., Patel, N.R. and Danodia, A., 2021. Cotton Yield Estimation Using Phenological Metrics Derived from Long-Term MODIS Data. \u003cem\u003eJournal of the Indian Society of Remote Sensing\u003c/em\u003e, \u003cem\u003e49\u003c/em\u003e, pp.2597-2610.\u003c/li\u003e\n\u003cli\u003eRaj, R., Walker, J. P., Pingale, R., Nandan, R., Naik, B., \u0026amp; Jagarlapudi, A. (2021). Leaf area index estimation using top-of-canopy airborne RGB images. International Journal of Applied Earth Observation and Geoinformation, 96, 102282.\u003c/li\u003e\n\u003cli\u003eRen, J., Chen, Z., Zhou, Q. and Tang, H., 2008. Regional yield estimation for winter wheat with MODIS-NDVI data in Shandong, China. \u003cem\u003eInternational Journal of Applied Earth Observation and Geoinformation\u003c/em\u003e, \u003cem\u003e10\u003c/em\u003e(4), pp.403-413.\u003c/li\u003e\n\u003cli\u003eRodriguez-Galiano, V.F., Ghimire, B., Rogan, J., Chica-Olmo, M. and Rigol-Sanchez, J.P., 2012. An assessment of the effectiveness of a random forest classifier for land-cover classification. 
\u003cem\u003eISPRS journal of photogrammetry and remote sensing\u003c/em\u003e, \u003cem\u003e67\u003c/em\u003e, pp.93-104.\u003c/li\u003e\n\u003cli\u003eSrinet, R., Nandy, S. and Patel, N.R., 2019. Estimating leaf area index and light extinction coefficient using Random Forest regression algorithm in a tropical moist deciduous forest, India. \u003cem\u003eEcological Informatics\u003c/em\u003e, \u003cem\u003e52\u003c/em\u003e, pp.94-102. \u003c/li\u003e\n\u003cli\u003eSun, Y., Qin, Q., Ren, H. and Zhang, Y., 2021. Decameter cropland LAI/FPAR estimation from sentinel-2 imagery using google earth engine. \u003cem\u003eIEEE Transactions on Geoscience and Remote Sensing\u003c/em\u003e, \u003cem\u003e60\u003c/em\u003e, pp.1-14.\u003c/li\u003e\n\u003cli\u003eSun, Y., Qin, Q., Ren, H., Zhang, T. and Chen, S., 2019. Red-edge band vegetation indices for leaf area index estimation from Sentinel-2/MSI imagery. \u003cem\u003eIEEE Transactions on Geoscience and Remote Sensing\u003c/em\u003e, \u003cem\u003e58\u003c/em\u003e(2), pp.826-840.\u003c/li\u003e\n\u003cli\u003eTrimble Navigation Limited (2012) GreenSeeker\u0026reg; Handheld Crop Sensor. Available at: https://agriculture.trimble.com/product/greenseeker-handheld-crop-sensor/.\u003c/li\u003e\n\u003cli\u003eTripathi, R., Sahoo, R.N., Gupta, V.K., Sehgal, V.K. and Sahoo, P.M., 2013. Developing Vegetation Health Index from biophysical variables derived using MODIS satellite data in the Trans-Gangetic Plains of India. \u003cem\u003eEmirates Journal of Food and Agriculture\u003c/em\u003e, pp.376-384.\u003c/li\u003e\n\u003cli\u003eVerrelst, J., Rivera, J.P., Veroustraete, F., Mu\u0026ntilde;oz-Mar\u0026iacute;, J., Clevers, J.G., Camps-Valls, G. and Moreno, J., 2015. Experimental Sentinel-2 LAI estimation using parametric, non-parametric and physical retrieval methods\u0026ndash;A comparison. \u003cem\u003eISPRS Journal of Photogrammetry and Remote Sensing\u003c/em\u003e, \u003cem\u003e108\u003c/em\u003e, pp.260-272. 
\u003c/li\u003e\n\u003cli\u003eXie, Q., Dash, J., Huete, A., Jiang, A., Yin, G., Ding, Y., Peng, D., Hall, C.C., Brown, L., Shi, Y. and Ye, H., 2019. Retrieval of crop biophysical parameters from Sentinel-2 remote sensing imagery. \u003cem\u003eInternational Journal of Applied Earth Observation and Geoinformation\u003c/em\u003e, \u003cem\u003e80\u003c/em\u003e, pp.187-195.\u003c/li\u003e\n\u003cli\u003eXu, N., Tian, J., Tian, Q., Xu, K. and Tang, S., 2019. Analysis of vegetation red edge with different illuminated/shaded canopy proportions and to construct normalized difference canopy shadow index. \u003cem\u003eRemote Sensing\u003c/em\u003e, \u003cem\u003e11\u003c/em\u003e(10), p.1192.\u003c/li\u003e\n\u003cli\u003eZheng, B., Myint, S.W., Thenkabail, P.S. and Aggarwal, R.M., 2015. A support vector machine to identify irrigated crop types using time-series Landsat NDVI data. \u003cem\u003eInternational Journal of Applied Earth Observation and Geoinformation\u003c/em\u003e, \u003cem\u003e34\u003c/em\u003e, pp.103-112.\u003c/li\u003e\n\u003cli\u003eGanguly, S., Nemani, R., Zhang, G., Hashimoto, H., Milesi, C., Michaelis, A., Wang, W., Votava, P., Samanta, A., Melton, F., Dungan, J., Vermote, E., Gao, F., Knyazikhin, Y., \u0026amp; Myneni, R. (2012). Generating global Leaf Area Index from Landsat: Algorithm formulation and demonstration. Remote Sensing of Environment, 122, 185-202. https://doi.org/10.1016/J.RSE.2011.10.032.\u003c/li\u003e\n\u003cli\u003eAli, A., Darvishzadeh, R., \u0026amp; Skidmore, A. (2017). Retrieval of Specific Leaf Area From Landsat-8 Surface Reflectance Data Using Statistical and Physical Models. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 10, 3529-3536. https://doi.org/10.1109/JSTARS.2017.2690623.\u003c/li\u003e\n\u003cli\u003eGonzalez-Sanpedro, M., Toan, T., Moreno, J., Kergoat, L., \u0026amp; Rubio, E. (2008). Seasonal variations of leaf area index of agricultural fields retrieved from Landsat data. 
Remote Sensing of Environment, 112, 810-824. https://doi.org/10.1016/J.RSE.2007.06.018.\u003c/li\u003e\n\u003cli\u003eKimes, D. S., Knyazikhin, Y., Privette, J. L., Abuelgasim, A. A., \u0026amp; Gao, F. (2000). Inversion methods for physically‐based models. Remote Sensing Reviews, 18(2-4), 381-439.\u003c/li\u003e\n\u003cli\u003eWang, T., Xiao, Z., \u0026amp; Liu, Z. (2017). Performance Evaluation of Machine Learning Methods for Leaf Area Index Retrieval from Time-Series MODIS Reflectance Data. Sensors (Basel, Switzerland), 17. https://doi.org/10.3390/s17010081.\u003c/li\u003e\n\u003cli\u003eHouborg, R., \u0026amp; Mccabe, M. (2018). A hybrid training approach for leaf area index estimation via Cubist and random forests machine-learning. Isprs Journal of Photogrammetry and Remote Sensing, 135, 173-188. https://doi.org/10.1016/J.ISPRSJPRS.2017.10.004.\u003c/li\u003e\n\u003cli\u003eCervantes, J., Garc\u0026iacute;a, F., Rodr\u0026iacute;guez-Mazahua, L., \u0026amp; Chau, A. (2020). A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing, 408, 189-215. https://doi.org/10.1016/j.neucom.2019.10.118.\u003c/li\u003e\n\u003cli\u003eVapnik, V. (1979). Estimation of dependences based on empirical data. Springer-verlag.\u003c/li\u003e\n\u003cli\u003eSteinwart, I., Hush, D., \u0026amp; Scovel, C. (2006). An Explicit Description of the Reproducing Kernel Hilbert Spaces of Gaussian RBF Kernels. IEEE Transactions on Information Theory, 52, 4635-4643. https://doi.org/10.1109/TIT.2006.881713.\u003c/li\u003e\n\u003cli\u003eChakraborty, D., Sarkar, A., \u0026amp; Maulik, U. (2016). A new isotropic locality improved kernel for pattern classifications in remote sensing imagery. spatial statistics, 17, 71-82. https://doi.org/10.1016/J.SPASTA.2016.04.003.\u003c/li\u003e\n\u003cli\u003eDelegido, J., Verrelst, J., Alonso, L., \u0026amp; Moreno, J. (2011). 
Evaluation of Sentinel-2 Red-Edge Bands for Empirical Estimation of Green LAI and Chlorophyll Content. Sensors (Basel, Switzerland), 11, 7063 - 7081. https://doi.org/10.3390/s110707063.\u003c/li\u003e\n\u003cli\u003eShen, B., Ding, L., Ma, L., Li, Z., Pulatov, A., Kulenbekov, Z., Chen, J., Mambetova, S., Hou, L., Xu, D., Wang, X., \u0026amp; Xin, X. (2022). Modeling the Leaf Area Index of Inner Mongolia Grassland Based on Machine Learning Regression Algorithms Incorporating Empirical Knowledge. Remote. Sens., 14, 4196. https://doi.org/10.3390/rs14174196.\u003c/li\u003e\n\u003cli\u003eSrinet, R., Nandy, S., \u0026amp; Patel, N. (2019). Estimating leaf area index and light extinction coefficient using Random Forest regression algorithm in a tropical moist deciduous forest, India. Ecol. Informatics, 52, 94-102. https://doi.org/10.1016/J.ECOINF.2019.05.008.\u003c/li\u003e\n\u003cli\u003eOmer, G., Mutanga, O., Abdel-Rahman, E., \u0026amp; Adam, E. (2016). Empirical Prediction of Leaf Area Index (LAI) of Endangered Tree Species in Intact and Fragmented Indigenous Forests Ecosystems Using WorldView-2 Data and Two Robust Machine Learning Algorithms. Remote. Sens., 8, 324. https://doi.org/10.3390/rs8040324.\u003c/li\u003e\n\u003cli\u003eSiegmann, B., \u0026amp; Jarmer, T. (2015). Comparison of different regression models and validation techniques for the assessment of wheat leaf area index from hyperspectral data. International Journal of Remote Sensing, 36, 4519 - 4534. 
https://doi.org/10.1080/01431161.2015.1084438.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Landsat-8 OLI, Sentinel-2 MSI, Leaf Area Index, Spring wheat, Machine learning","lastPublishedDoi":"10.21203/rs.3.rs-4685508/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4685508/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThe study focuses on the estimation of Leaf Area Index (LAI) for smallholder farms less than 1 acre in semi-arid regions, particularly in Bundelkhand, India. Accurate LAI estimation is crucial for optimizing crop management practices, enhancing yield predictions, and improving the sustainability of agricultural operations. This study evaluates the efficiency of different machine learning algorithms in deriving LAI from Sentinel-2 and Landsat-8 data, with a focus on spring wheat across two growing seasons (2020\u0026ndash;2021 and 2021\u0026ndash;2022) in six villages in the Bundelkhand region of India. Three machine learning approaches\u0026mdash;Random Forest (RF), Support Vector Machine (SVM), and XGBoost\u0026mdash;were employed for LAI estimation. Validation against ground-truth LAI measurements was carried out using Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Pearson\u0026rsquo;s correlation coefficient (R), and Multiplicative Bias (MBias). Results indicate that RF and SVM with Radial Basis Function (SVM-RBF) achieved the highest accuracy for both Sentinel-2 and Landsat-8 data. For Sentinel-2, RF and SVM-RBF both achieved an R-value of 0.94, with RMSE of 0.40 and MAE of 0.29 and 0.30, respectively. 
RF showed a slight overestimation (MBias\u0026thinsp;=\u0026thinsp;1.02), while SVM-RBF had a perfect MBias of 1.00. XGBoost also performed well (R\u0026thinsp;=\u0026thinsp;0.94), though with slightly higher RMSE (0.43) and MAE (0.33), and an MBias of 0.88, indicating slight underestimation. SVM linear had lower performance metrics (R\u0026thinsp;=\u0026thinsp;0.84, RMSE\u0026thinsp;=\u0026thinsp;0.62, MAE\u0026thinsp;=\u0026thinsp;0.48, MBias\u0026thinsp;=\u0026thinsp;1.02). For Landsat-8, RF and SVM-RBF also showed strong performance (R\u0026thinsp;=\u0026thinsp;0.94), with RF achieving RMSE of 0.38 and MAE of 0.28, and SVM-RBF achieving the lowest RMSE of 0.37 and MAE of 0.29. Both had near-perfect MBias values (RF\u0026thinsp;=\u0026thinsp;1.00, SVM-RBF\u0026thinsp;=\u0026thinsp;0.99). XGBoost displayed a high R-value (0.93) but higher error metrics (RMSE\u0026thinsp;=\u0026thinsp;0.40, MAE\u0026thinsp;=\u0026thinsp;0.30, MBias\u0026thinsp;=\u0026thinsp;1.01). SVM linear underperformed (R\u0026thinsp;=\u0026thinsp;0.78, RMSE\u0026thinsp;=\u0026thinsp;0.69, MAE\u0026thinsp;=\u0026thinsp;0.53, MBias\u0026thinsp;=\u0026thinsp;0.98). Overall, RF and SVM-RBF consistently outperformed SVM linear and XGBoost across both satellite datasets.\u003c/p\u003e","manuscriptTitle":"Estimation of the satellite-derived Leaf Area Index of spring wheat using machine learning approaches","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-08-09 09:46:08","doi":"10.21203/rs.3.rs-4685508/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"4732654a-ee06-4fe1-a475-e476819282de","owner":[],"postedDate":"August 9th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2024-09-24T07:38:33+00:00","versionOfRecord":[],"versionCreatedAt":"2024-08-09 09:46:08","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4685508","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4685508","identity":"rs-4685508","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}