Classification and Prediction of Growth Conditions in Food Barley Fields Using UAV Multispectral Images and Machine Learning Approaches

Kento Mio, Rongling Ye, Osamu Watanabe

Research Square preprint. This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9011430/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1 posted (latest version).

Abstract

Spatial variability in barley maturation complicates both preharvest classification for food-grade production and accurate conventional field-based assessment. Therefore, we classified barley growth into food-grade-oriented categories and developed a predictive framework integrating UAV-based multispectral imagery with machine learning.
Grain quality metrics were first analyzed using principal component analysis and k-means clustering to define three physiologically distinct growth levels. Multi-temporal vegetation indices (NDVI, GNDVI, and NDRE) were extracted from UAV imagery, and key predictors were selected using Lasso regularization. Comparisons of Random Forest (RF), XGBoost, and Support Vector Machine (SVM) models indicated that tree-based ensembles achieved high classification accuracy (>0.9), with late-season NDVI identified as the most influential predictor. Spatial mapping using the RF model revealed pronounced inter-field variability and identified zones of incomplete maturation associated with lodging. Overall, integrating grain quality traits with UAV-based spectral monitoring allows accurate field-scale classification of food-grade barley growth and informs site-specific harvest management.

Subject areas: Biological sciences/Computational biology and bioinformatics; Biological sciences/Ecology; Earth and environmental sciences/Ecology; Earth and environmental sciences/Environmental sciences; Physical sciences/Mathematics and computing; Biological sciences/Plant sciences

Keywords: Barley; UAV; Machine learning; Prediction model framework; Spatial prediction

Introduction

Sustainable agricultural production requires real-time monitoring of growth throughout the cultivation period. However, in recent years, the escalating uncertainty driven by extreme climate variability has exceeded the capacity of traditional, experience-based management to mitigate adverse impacts (Hossain et al., 2020; Omotoso et al., 2023; Teku, 2025). Barley (Hordeum vulgare) is the fourth most widely produced cereal worldwide (Yasuda, 2009).
Although barley is primarily used worldwide for animal feed and malting, it occupies a distinct role in East Asian food cultures, particularly in Japan and South Korea. There it is a staple in dishes such as barley-mixed rice and in various processed foods. Owing to its higher water-soluble dietary fiber content relative to rice, barley is recognized as a medium-glycemic index (GI) food (Atkinson et al., 2021). It suppresses postprandial hyperglycemia and improves the intestinal environment through gut microbiota fermentation (Aoe et al., 2017; Mio et al., 2023; Miyamoto et al., 2018). Consequently, consumer demand for this grain as a functional, health-promoting food is steadily rising.

The production and distribution of food-grade cultivars require consistently high yields and suitable physical characteristics, such as grain thickness, width, and moisture content, to ensure processing suitability and functional efficacy. Crop yield and quality are shaped by a complex interplay of endogenous and exogenous factors, including climatic conditions and soil responsiveness to nitrogen fertilization. However, these parameters are typically assessed post-harvest or during processing, a timeline that precludes in-season intervention to mitigate quality degradation. Inter-annual fluctuations in these traits are intensifying due to recent climatic shifts. There is therefore an urgent need for field-level, real-time growth monitoring throughout the cultivation period to support timely cultivation management.

Conventional crop monitoring methods are labor-intensive, time-consuming, and often rely on destructive sampling (Ali et al., 2017). These limitations hinder the assessment of spatiotemporal growth dynamics across large areas. Remote sensing (RS) technologies using unmanned aerial vehicles (UAVs) and satellites have emerged as effective tools for addressing this issue.
RS enables high-frequency, non-destructive data acquisition over broad areas, facilitating precise evaluation of crop status and disease risk. A substantial body of research has demonstrated the effectiveness of RS technologies for modeling barley growth and yield. UAV-based multispectral and RGB imagery, as well as integrated UAV–satellite frameworks, have been widely applied to estimate leaf area index and predict grain productivity. Reported R² values frequently exceed 0.80 (Duffková et al., 2022; Ganeva et al., 2023; Herzig et al., 2021; Perich et al., 2023).

Despite these advances, high yield prediction accuracy does not necessarily translate into reliable assessment of food-grade quality traits. Substantial evidence suggests that barley yield and quality are not consistently positively correlated. For example, genetic analyses have shown that quality-related quantitative trait loci (QTLs) often act independently of yield-related QTLs (Kochevenko et al., 2018). Likewise, increased nitrogen fertilization may enhance yield but reduce grain density and test weight (Habiyaremye et al., 2021). Consequently, predictive models that rely solely on yield data carry high uncertainty when estimating the specific quality parameters required for food-grade standards.

To address these uncertainties, machine learning (ML) models capable of capturing complex interactions between environmental variables and crop outcomes have been increasingly adopted. Because ML algorithms do not assume linear relationships, they can achieve high predictive performance when many agricultural factors interact non-linearly. Several studies coupling spectral data from UAVs or satellites with ML have successfully estimated tiller density, nitrogen use efficiency, and nitrogen concentration (Hu et al., 2022; Li et al., 2025; Yan et al., 2025).
Although challenges regarding model interpretability persist, ML remains a powerful tool for constructing accurate prediction models from multivariate UAV-derived data, provided that sufficient data are available.

This study aimed to classify barley growth into food-grade-oriented categories and to develop a reliable predictive model by integrating UAV-based RS data with machine learning. The model links quality parameters with phenology-specific vegetation indices to capture complex relationships between growth and grain quality. This approach is expected to provide a framework for accurately predicting key quality traits, thereby supporting targeted management strategies in barley cultivation.

Materials and Methods

Study Site and Data Acquisition

The study was conducted in eight barley–soybean double-cropping fields in Miyada Village, Kamiina District, Nagano Prefecture, Japan (35°45’26” N, 137°55’59” E) (Fig. 1). To monitor the crop during its ripening and maturation phases, aerial surveys were conducted on April 23, May 10, May 24, and June 1, 2024. All flights were performed at approximately 10:00 AM using the integrated multispectral camera of a UAV (Mavic 3 Multispectral, DJI Inc.). The UAV was flown at an altitude of 100 m with 80% forward and side overlap. To ensure temporal consistency, the automated missions followed identical flight paths and imaging parameters on each survey date.

The acquired multispectral images were processed in Pix4Dmapper (Pix4D S.A.) to generate high-resolution orthomosaics. For radiometric calibration, raw digital numbers were converted into reflectance maps for four spectral bands: green (G), red (R), red edge (RE), and near-infrared (NIR). A standardized reflectance panel (Quantomics Co., Ltd.) was imaged immediately before each flight to account for variations in ambient light.
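The panel-based calibration and the normalized-difference indices used in this study (NDVI, GNDVI, and NDRE) can be illustrated together. The sketch below is a minimal, hypothetical example. It assumes a linear, zero-offset sensor response, so that one panel of known reflectance fixes a gain for each band on each flight. Pix4Dmapper's actual radiometric correction is more elaborate, and the panel reflectance and digital numbers here are invented for illustration.

```python
def dn_to_reflectance(dn_values, panel_dn, panel_reflectance):
    """One-point panel calibration for a single band and a single flight.

    Assumes a linear, zero-offset sensor response, so that
    reflectance = DN * (panel_reflectance / panel_DN).
    """
    if panel_dn <= 0:
        raise ValueError("panel DN must be positive")
    gain = panel_reflectance / panel_dn
    return [dn * gain for dn in dn_values]


def normalized_difference(a, b):
    """(a - b) / (a + b), the form shared by NDVI, GNDVI and NDRE."""
    if a + b == 0:
        raise ValueError("a + b must be non-zero")
    return (a - b) / (a + b)


def vegetation_indices(g, r, re, nir):
    """NDVI, GNDVI and NDRE from reflectances of one pixel or one averaged cell."""
    return {
        "NDVI": normalized_difference(nir, r),
        "GNDVI": normalized_difference(nir, g),
        "NDRE": normalized_difference(nir, re),
    }


# Hypothetical panel readings (panel reflectance assumed to be 0.5 in every band)
# and hypothetical digital numbers for one pixel.
panel = {"G": 20000, "R": 20000, "RE": 20000, "NIR": 20000}
pixel_dn = {"G": 3200, "R": 2000, "RE": 10000, "NIR": 18000}
refl = {band: dn_to_reflectance([pixel_dn[band]], panel[band], 0.5)[0] for band in pixel_dn}
print(vegetation_indices(refl["G"], refl["R"], refl["RE"], refl["NIR"]))
```

The panel is imaged before every flight, so the gain absorbs changes in illumination between survey dates. This is what makes indices from different dates comparable.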
On June 1, 2024, barley was harvested from 46 sampling points across the experimental fields, arranged on a 15 m square grid. Approximately five plants were collected at each location. To align these physical samples with the remote sensing datasets, 5 m² regions of interest (ROIs) were delineated on the vegetation index maps at the coordinates of each sampling site. Mean pixel values within these ROIs were then extracted to establish a direct correspondence with the quality parameters detailed in subsequent sections. Prior to analysis, the harvested samples were dried in a forced-air oven at 60°C for 72 hours to stabilize moisture and halt biological activity.

To capture spatial heterogeneity within the fields, orthomosaic images were partitioned into a systematic hexagonal grid using QGIS (version 3.34, QGIS Development Team). Each hexagonal cell had a side length of 5 m. A hexagonal tessellation was selected to mitigate sampling bias. Unlike a rectangular grid, it places all adjacent cell centroids at the same distance from one another (Birch et al., 2007). For each polygon, mean reflectance values were extracted across all spectral bands to represent local canopy characteristics.

The multispectral sensor recorded data in four discrete wavebands with the following center wavelengths and full width at half maximum: G, 560 ± 16 nm; R, 650 ± 16 nm; RE, 730 ± 16 nm; and NIR, 860 ± 16 nm. The following vegetation indices (VIs) were calculated from the spatially averaged reflectance data to evaluate the growth status and physiological condition of the barley:

Normalized Difference Vegetation Index (NDVI):
$$NDVI=\frac{NIR-R}{NIR+R} \tag{1}$$

Green Normalized Difference Vegetation Index (GNDVI):
$$GNDVI=\frac{NIR-G}{NIR+G} \tag{2}$$

Normalized Difference Red Edge Index (NDRE):
$$NDRE=\frac{NIR-RE}{NIR+RE} \tag{3}$$

Analysis of Barley Grains

Following the 72-hour drying period at 60°C, the harvested samples underwent comprehensive quality analysis.
To evaluate morphological characteristics, 30 kernels were randomly selected from each sampling location. Their length, width, and thickness were measured with digital calipers. Thousand-kernel weight (TKW) was determined by weighing 100 kernels in triplicate; the mean of these measurements was multiplied by ten to obtain the final TKW.

Grain vitreousness was assessed using a specialized cracker (RN-840; Kett Electric Laboratory, Tokyo, Japan). Kernels were cut transversely, and the internal structure was visually inspected to determine the percentage of vitreous grains per sample.

To quantify moisture content (MC, %), the barley was ground for 60 seconds in a commercial mill. The resulting flour was then dried at 120°C for two hours until a constant mass was achieved. The mass was recorded before and after drying on a precision electronic balance, and MC was calculated using the following gravimetric equation:

$$MC=\frac{W_{w}-W_{d}}{W_{w}} \tag{4}$$

where \(W_{w}\) is the wet weight of the sample before drying and \(W_{d}\) is the dry weight after reaching a constant weight.

Data Analysis and Model Development

The predictive model for food-grade barley growth was developed in three stages:
(i) characterization and classification, in which grain quality metrics were quantified and growth levels were categorized through cluster analysis;
(ii) dataset configuration and feature selection, in which a machine learning dataset was constructed, supplemented by data augmentation, and key vegetation indices were identified via regularization;
(iii) model implementation and evaluation, in which multiple ML algorithms were deployed and their predictive performance compared.

The complete analytical workflow is illustrated in Fig. 2. All statistical computations, data processing, and predictive modeling were conducted in R (version 4.5.0; R Core Team).
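The TKW scaling and the gravimetric moisture calculation in Eq. 4 reduce to simple arithmetic, sketched below with hypothetical weights. Eq. 4 yields a fraction, so the function multiplies it by 100 to report MC in percent, as the text specifies.

```python
from statistics import fmean


def thousand_kernel_weight(weights_of_100_kernels_g):
    """TKW in grams: mean weight of the 100-kernel replicates multiplied by 10."""
    return fmean(weights_of_100_kernels_g) * 10


def moisture_content_percent(wet_g, dry_g):
    """Wet-basis moisture content in percent.

    Eq. 4 gives the moisture fraction (Ww - Wd) / Ww; it is multiplied by 100 here.
    """
    if wet_g <= 0 or not 0 <= dry_g <= wet_g:
        raise ValueError("expected wet weight > 0 and 0 <= dry weight <= wet weight")
    return (wet_g - dry_g) / wet_g * 100


# Hypothetical measurements for one sampling point.
tkw = thousand_kernel_weight([4.21, 4.35, 4.29])
mc = moisture_content_percent(wet_g=5.000, dry_g=4.400)
print(f"TKW = {tkw:.2f} g, MC = {mc:.1f} %")  # TKW = 42.83 g, MC = 12.0 %
```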
Characterization and Classification of Food-Grade Barley Growth Levels

A multivariate analytical approach was used to define discrete growth categories from the six quality parameters measured at each sampling site. A distance matrix was first constructed using the vegdist function (package 'vegan'). To ensure compatibility with Euclidean geometry for subsequent ordination, a quasi-Euclidean transformation was applied using the quasieuclid function of the 'ade4' package. Principal component analysis (PCA) was performed with the dudi.pca function ('ade4') to reduce the dimensionality of the quality traits.

k-means clustering was applied to the scores of the first two principal components (PC1 and PC2). The optimal number of clusters (k) was determined with the elbow method, which evaluates the within-cluster sum of squares across a range of k values. The results indicated that a three-cluster configuration (k = 3) provided the most stable and representative partition of the data (Supplementary Fig. S1).

Based on these multidimensional profiles, three physiological growth categories were defined to serve as ground-truth labels for subsequent modeling:
Level 0 (Poor Growth): a high frequency of immature grains and insufficient grain filling, representing areas with significant developmental delay.
Level 1 (Sub-optimal Growth): marginally elevated moisture content and suboptimal grain morphometry, specifically a significant reduction in grain length relative to the optimal threshold.
Level 2 (Optimal Growth): superior grain quality, including standardized dimensions and successful physiological maturation.

The specific quality characteristics associated with each level are described in the Results section. These categorical levels were used as the target variables for supervised machine learning.
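The elbow procedure applied to the PC1–PC2 scores can be sketched as follows. This is an illustrative implementation of Lloyd's k-means with k-means++ seeding and several restarts. It runs on synthetic two-dimensional scores rather than the study's data, since the analysis itself was done in R. The seeds, restart count, and cluster layout are assumptions.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j])) for p in points]


def kmeans_wss(points, k, rng, max_iter=100):
    """One k-means run (k-means++ seeding, Lloyd updates); returns the WSS."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=d2, k=1)[0])
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep an empty cluster's centroid unchanged.
            new.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    labels = assign(points, centroids)
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def wss_curve(points, k_values, n_init=10, seed=0):
    """Best (lowest) WSS over n_init restarts for each k."""
    rng = random.Random(seed)
    return {k: min(kmeans_wss(points, k, rng) for _ in range(n_init)) for k in k_values}


# Synthetic PC1/PC2 scores: three groups of 20 samples (hypothetical layout).
data_rng = random.Random(42)
centers = [(-3.0, 0.0), (0.0, 2.5), (3.0, 0.0)]
scores = [(cx + data_rng.gauss(0, 0.4), cy + data_rng.gauss(0, 0.4)) for cx, cy in centers for _ in range(20)]
wss = wss_curve(scores, range(1, 6))
for k, value in wss.items():
    print(f"k = {k}: WSS = {value:.1f}")
```

The optimal WSS never increases as k grows; the elbow is the k beyond which further reductions become small. For these synthetic data, WSS drops sharply from k = 2 to k = 3 and only slightly from k = 3 to k = 4, so the elbow falls at k = 3.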
Data Augmentation and Spatial Correspondence

To construct a robust machine learning dataset, data augmentation was performed following the methodology of Morishita and Ishitsuka (2020, 2021). This methodology assumes that barley quality characteristics remain relatively uniform within a small spatial neighborhood. Using QGIS, a circular buffer with a 1.5 m radius was generated around the GPS coordinates of each ground-truth sampling point (Fig. 3). Previous studies have shown that circular buffers of 1.0 to 1.5 m effectively expand agricultural datasets (Morishita & Ishitsuka, 2020, 2021). Because this study used a similar multispectral sensor operated at the same flight altitude (100 m), a 1.5 m radius was deemed appropriate.

UAV-derived multispectral images were resampled to a spatial resolution of 0.25 m per pixel. Three vegetation indices (NDVI, GNDVI, and NDRE) were then calculated for each pixel within each buffer. Each pixel was labeled with the growth category (Level 0, 1, or 2) assigned to its central sampling point. This augmentation expanded the initial ground-truth set to 5,309 labeled instances, providing a sufficiently large and diverse dataset to capture the spatial variability needed for effective machine learning training.

Feature Selection via Lasso Multinomial Logistic Regression

Because the UAV surveys were multi-temporal, the initial dataset was high-dimensional, with substantial potential for multicollinearity among vegetation indices across observation dates. To identify the most influential predictors for each growth level while preventing overfitting, we applied a multinomial logistic regression model with Lasso (L1) regularization using the 'glmnet' package.
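The L1 penalty sets coefficients exactly to zero through the soft-thresholding operator, which coordinate-descent solvers of the kind used in glmnet apply in their coefficient updates. The sketch below applies the operator once to invented coefficient values. A single application reproduces the lasso solution only in the idealized case of least squares with an orthonormal design, so this is an illustration of the mechanism, not the fitted model. The surviving features are then ranked by absolute coefficient, mirroring the selection of the VIs with the largest coefficients. Feature names such as NDVI_0601 (NDVI on June 1) are hypothetical labels.

```python
def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0), the proximal map of lam * |b|.

    Any value with |z| <= lam is set exactly to zero.
    """
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def top_k_features(coefs, k=6):
    """Non-zero features ranked by absolute coefficient, largest first."""
    nonzero = [(name, b) for name, b in coefs.items() if b != 0.0]
    nonzero.sort(key=lambda item: abs(item[1]), reverse=True)
    return [name for name, _ in nonzero[:k]]


# Hypothetical coefficient values for one growth level (not results from this study).
raw = {"NDVI_0601": -2.4, "NDVI_0524": 1.8, "GNDVI_0524": -1.1, "NDRE_0524": 0.9,
       "GNDVI_0601": 0.35, "NDRE_0601": -0.2, "NDVI_0423": 0.1}
shrunk = {name: soft_threshold(b, lam=0.4) for name, b in raw.items()}
print(top_k_features(shrunk))  # ['NDVI_0601', 'NDVI_0524', 'GNDVI_0524', 'NDRE_0524']
```

With λ = 0.4, the three smallest coefficients fall inside [-λ, λ] and are removed entirely. Fewer than six features therefore survive in this toy case.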
Lasso performs automated variable selection by minimizing the negative log-likelihood plus a penalty term. The penalty shrinks the coefficients of less informative variables to zero, thereby isolating the most critical spectral features. The objective function \(J(\beta)\) to be minimized is defined as:

$$J(\beta)=-\left[\frac{1}{N}\sum_{i=1}^{N}\ell_{i}(\beta)\right]+\lambda\sum_{j=1}^{p}\left|\beta_{j}\right| \tag{5}$$

where:
- \(\ell_{i}(\beta)\) is the log-likelihood of the \(i\)th observation under the multinomial distribution;
- \(N\) is the total number of observations;
- \(p\) is the number of predictors (vegetation indices across multiple survey dates);
- \(\beta\) denotes the regression coefficients;
- \(\lambda\) is the regularization parameter that controls model sparsity.

The optimal \(\lambda\) was determined via 10-fold cross-validation by selecting \(\lambda_{min}\), the value that minimized the mean cross-validated error. Because the L1 penalty drives the coefficients of less important variables to exactly zero, it performs feature selection automatically. From this regularized set, the six VIs with the highest absolute coefficients for each growth level were identified. These were used as the final explanatory variables for the subsequent comparative evaluation of machine learning models.

Machine Learning Model Construction and Evaluation

To predict the classified growth categories, three machine learning algorithms were implemented and their classification performance compared: Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Support Vector Machine (SVM). The growth levels (Levels 0, 1, and 2) served as the categorical target variable. The top six vegetation indices identified via Lasso regularization served as explanatory variables.
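Eq. 5 can be evaluated directly for a toy problem. The sketch below rests on several assumptions about how the single-sum notation of Eq. 5 extends to the multinomial case:
- each growth level has its own intercept and coefficient vector;
- class probabilities come from the softmax of the linear scores;
- the penalty sums |β| over all levels and predictors;
- intercepts are not penalized.

All data and coefficients are hypothetical.

```python
import math


def softmax(scores):
    """Class probabilities from linear scores (max subtracted for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def lasso_multinomial_objective(X, y, intercepts, betas, lam):
    """Eq. 5: mean negative multinomial log-likelihood plus an L1 penalty.

    X: feature vectors; y: class indices; intercepts[k]: intercept of class k;
    betas[k][j]: coefficient of feature j for class k (intercepts unpenalized).
    """
    log_lik = 0.0
    for x, label in zip(X, y):
        scores = [b0 + sum(b * xj for b, xj in zip(beta_k, x))
                  for b0, beta_k in zip(intercepts, betas)]
        log_lik += math.log(softmax(scores)[label])
    penalty = lam * sum(abs(b) for beta_k in betas for b in beta_k)
    return -log_lik / len(X) + penalty


# Hypothetical data: two index values per observation, growth levels 0-2.
X = [[0.8, 0.6], [0.5, 0.4], [0.3, 0.2]]
y = [2, 1, 0]
intercepts = [0.0, 0.0, 0.0]
betas = [[-2.0, -1.0], [0.0, 0.5], [2.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

print(lasso_multinomial_objective(X, y, intercepts, zeros, lam=0.1))   # equals ln 3
print(lasso_multinomial_objective(X, y, intercepts, betas, lam=0.0))
print(lasso_multinomial_objective(X, y, intercepts, betas, lam=0.1))
```

With all coefficients at zero, every class receives probability 1/3, so the objective equals ln 3 ≈ 1.099. Raising λ from 0 to 0.1 adds exactly 0.1 times the sum of absolute coefficients (here 6.5). This penalty pressure is what pushes uninformative coefficients to zero.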
The augmented dataset was partitioned into training (80%) and validation (20%) sets. To ensure model stability and optimize hyperparameters, 5-fold cross-validation was performed within the training set. For the SVM model, features were scaled before training so that differences in the ranges of the spectral indices would not bias support vector optimization.

Predictive performance was evaluated in a multiclass classification framework using four metrics widely used in remote sensing and machine learning: accuracy, precision, recall, and F1-score (Powers, 2011; Sokolova & Lapalme, 2009). For a given growth level k, the following quantities were derived from the confusion matrix:
True Positive (\(TP_k\)): an instance of level k correctly classified as k.
False Positive (\(FP_k\)): an instance of another level incorrectly classified as k.
False Negative (\(FN_k\)): an instance of level k incorrectly classified as another level.
True Negative (\(TN_k\)): an instance of another level correctly classified as not k.

Based on these quantities, the following metrics were calculated to quantify predictive performance across the three growth levels:

$$Accuracy=\frac{\sum_{k=1}^{n}TP_{k}}{N} \tag{6}$$
$$Precision_{k}=\frac{TP_{k}}{TP_{k}+FP_{k}} \tag{7}$$
$$Recall_{k}=\frac{TP_{k}}{TP_{k}+FN_{k}} \tag{8}$$
$$F1_{k}=2\times\frac{Precision_{k}\times Recall_{k}}{Precision_{k}+Recall_{k}} \tag{9}$$

Accuracy is the overall proportion of correctly classified instances across all growth levels, where n is the number of classes (Levels 0, 1, and 2) and N is the total number of observations in the validation set. Precision for level k is the proportion of instances predicted as level k that truly belong to it. Recall is the proportion of actual level-k instances that the model correctly identifies.
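Equations 6–9 can be computed directly from paired true and predicted labels. The following is a minimal sketch on a hypothetical set of ten validation labels. It also computes support-weighted averages across levels, one common way of summarizing per-class metrics; the source does not state which averaging was used for Table 1.

```python
def classification_report(y_true, y_pred, classes=(0, 1, 2)):
    """Accuracy (Eq. 6) and per-class precision, recall and F1 (Eqs. 7-9)."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    report = {}
    for k in classes:
        tp = sum(t == k and p == k for t, p in pairs)
        fp = sum(t != k and p == k for t, p in pairs)
        fn = sum(t == k and p != k for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[k] = {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}
    return accuracy, report


def weighted_mean(report, metric):
    """Average of a per-class metric weighted by class support (true count)."""
    total = sum(r["support"] for r in report.values())
    return sum(r[metric] * r["support"] for r in report.values()) / total


# Hypothetical validation labels for growth levels 0, 1 and 2.
y_true = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 1, 2, 2, 0]
acc, rep = classification_report(y_true, y_pred)
print("accuracy:", acc)
for level, metrics in rep.items():
    print(level, {m: round(v, 3) for m, v in metrics.items()})
print("weighted recall:", weighted_mean(rep, "recall"))
```

Support-weighted recall always equals accuracy, because \(\sum_k \frac{n_k}{N}\cdot\frac{TP_k}{n_k}=\frac{\sum_k TP_k}{N}\). The identical Accuracy and Recall values reported for each model in Table 1 are consistent with support-weighted averaging. This is an inference, not something stated by the authors.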
The F1-score, the harmonic mean of precision and recall, was calculated to provide a balanced assessment that accounts for class-specific trade-offs between the two. All metrics were calculated for each growth level to evaluate each algorithm comprehensively.

Learning Curve Analysis

Learning curve analyses were conducted to evaluate how the volume of training data affects classification performance and model stability. Learning curves are a standard diagnostic for monitoring the training progression and generalization of machine learning and deep learning models (Mohr & van Rijn, 2024). Training subsets were generated by stepwise subsampling at 10% intervals. At each increment, a subset was randomly drawn from the training pool and used to retrain the RF, XGBoost, and SVM models. Accuracy was calculated on both the training and validation sets at each step to evaluate each model's susceptibility to overfitting and its sensitivity to data volume. This analysis determined whether the augmented dataset was sufficient to reach a performance plateau or whether further data acquisition would be required to improve predictive stability.

Model Interpretability via SHAP Analysis

Unlike traditional statistical methods, machine learning models such as RF, XGBoost, and SVM are often considered "black boxes," making the logic of their predictions difficult to interpret (Benos et al., 2021). To address this limitation and ensure agronomic interpretability, we employed the SHAP (SHapley Additive exPlanations) framework (Lundberg & Lee, 2017). SHAP is a game-theoretic approach that assigns each feature an importance value for a specific prediction. It provides a unified measure of feature contributions that satisfies the properties of local accuracy, missingness, and consistency. We applied the Kernel SHAP method and visualized the results with the 'shapviz' package in R.
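Kernel SHAP estimates Shapley values through a weighted linear regression over sampled feature coalitions. For a model with only a few inputs, the same quantities can be computed exactly by enumerating every coalition. Doing so makes the local-accuracy property concrete: the contributions sum to the difference between the prediction for an instance and the prediction for the reference. The sketch below assumes a single reference point in place of the background dataset that Kernel SHAP averages over. It uses a hypothetical scoring function whose inputs are named after the study's indices for illustration only.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of model f at instance x.

    A feature absent from a coalition is set to its baseline value (a single
    reference point). Cost grows as 2^p, so this is feasible only for small p.
    """
    p = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(p)]
        return f(z)

    phi = []
    for i in range(p):
        others = [j for j in range(p) if j != i]
        total = 0.0
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for s in combinations(others, size):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


# Hypothetical score for one growth level; inputs are [NDVI_0601, GNDVI_0524, NDRE_0524].
def score(z):
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]


x = [0.6, 0.5, 0.4]
baseline = [0.4, 0.45, 0.35]
phi = exact_shapley(score, x, baseline)
print([round(v, 4) for v in phi])
# Local accuracy: the two printed numbers agree.
print(round(sum(phi), 6), round(score(x) - score(baseline), 6))
```

The second input enters the toy score only linearly, so its Shapley value is simply its coefficient times its deviation from the reference (-1.0 × 0.05 = -0.05). The interaction between the first and third inputs, by contrast, is shared between them.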
Because Kernel SHAP is model-agnostic, it enables a consistent comparison of feature importance across algorithms. SHAP summary plots were used to visualize both the global importance and the directional effect of each vegetation index on the classification of barley growth levels. This allowed us to verify whether the models' decision-making was consistent with established physiological principles of crop development.

Field-wide Spatial Prediction and Mapping

To visualize the spatial distribution of growth, a systematic hexagonal grid (5 m side length) was generated across the study area using QGIS. The mean values of the critical VIs identified via Lasso regularization were extracted for each hexagonal cell. The trained machine learning models were then applied to these spatial datasets to predict and map the growth category (Level 0, 1, or 2) of each grid cell across the experimental fields.

Results

Classification of barley samples based on quality traits

The classification results derived from PCA and k-means clustering of the barley quality metrics (Supplementary Fig. S2) are illustrated in Fig. 4A. PC1 accounted for 60.47% of the total variance and PC2 for 16.05%, together capturing over 76% of the variance in the dataset. The factor loadings showed that TKW, grain thickness, and grain width were associated with the negative direction of PC1, whereas grain length loaded strongly and positively on PC2 (Fig. 4B, C).

Figure 4D compares barley quality traits among the three identified clusters. The red cluster was characterized by high vitreousness but significantly lower grain thickness, grain width, and TKW than the other clusters. The blue and green clusters showed no substantial differences in most quality traits, although the green cluster had longer grains and slightly lower moisture content.
Based on these findings, we defined the clusters as follows: the red cluster as Level 0 (poor growth; predominantly immature), the blue cluster as Level 1 (suboptimal growth; slightly high moisture and short grains), and the green cluster as Level 2 (optimal growth).

Temporal changes in vegetation indices and variable selection using Lasso regression

Figure 5A illustrates the temporal variations in the VIs for each growth category in the dataset expanded through circular buffer augmentation. From April 23 to May 24, Level 0 consistently exhibited lower NDVI, NDRE, and GNDVI values than Levels 1 and 2, with the largest differences on April 23 and May 10. By June 1, the indices for Levels 1 and 2 had declined, reflecting the loss of chlorophyll associated with senescence and maturation. In contrast, Level 0 maintained higher index values on the final date.

The relative importance of the variables, based on the Lasso regression coefficients, is presented in Fig. 5B. For Level 0, the most critical predictors were GNDVI, NDVI, and NDRE on May 24. For Level 1, GNDVI on June 1, together with GNDVI and NDVI on May 10, had high coefficients. For Level 2, GNDVI (on dates from April 23 to May 24) and NDVI (on May 24) were identified as the primary indices. Based on these results, we prioritized six features as the final explanatory variables for the subsequent machine learning models: NDVI, GNDVI, and NDRE on May 24 and on June 1.

Machine learning model validation and field-scale growth prediction

Table 1 summarizes the predictive performance of the RF, XGBoost, and SVM models. Across all evaluation metrics, the performance ranking was XGBoost > RF > SVM. Both XGBoost and RF demonstrated high predictive capability, with accuracy exceeding 0.9, whereas SVM yielded slightly lower values.

Learning curves (Fig. 6) were used to assess whether model performance could improve with additional data or had reached a plateau. XGBoost and RF exhibited narrower gaps between training and validation accuracy than SVM, suggesting greater resilience to overfitting and data insufficiency. As the training dataset grew, the RF model eventually surpassed the XGBoost model, indicating better scalability for this dataset.

Table 1 Prediction accuracy for each machine learning model

Metric       RF      XGBoost   SVM
Accuracy     0.903   0.912     0.869
Precision    0.901   0.916     0.872
Recall       0.903   0.912     0.869
F1           0.902   0.914     0.870

Values are expressed on a 0–1 scale. Abbreviations: RF, Random Forest; XGBoost, eXtreme Gradient Boosting; SVM, Support Vector Machine.

To verify the stability of feature importance and decision-making patterns, we compared SHAP summary plots across the three algorithms, RF, XGBoost, and SVM (Fig. 7; Supplementary Figs. S3, S4). The models agreed closely on the most influential spectral indices. Across all three classifiers, NDVI on June 1 consistently emerged as the dominant predictor for distinguishing growth levels, followed by GNDVI and NDRE on May 24.

The directional effects of these features were stable across model architectures. For Level 0, lower values of these indices yielded positive SHAP values, increasing the probability of a poor-growth classification. Conversely, for Level 2, higher values contributed positively to the predicted probability, consistent with expected physiological trends. The magnitude and distribution of SHAP values varied slightly: the SVM model displayed a more discrete distribution than the continuous spread observed for the tree-based ensembles (RF and XGBoost). Nevertheless, the overall feature rankings and the separation between high and low feature values remained remarkably consistent.
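The consistency of feature rankings across models was assessed visually from the SHAP summary plots, but it could also be quantified. One option, sketched below under the assumption of no tied importances, has two steps. First, rank the features by mean absolute SHAP value within each model. Second, compare the rankings with Spearman's rank correlation. The SHAP values are invented for illustration and are not taken from Fig. 7 or the supplementary figures.

```python
def mean_abs_shap(shap_rows):
    """Global importance: mean |SHAP| of each feature over all observations."""
    n = len(shap_rows)
    return [sum(abs(row[j]) for row in shap_rows) / n for j in range(len(shap_rows[0]))]


def rank_desc(values):
    """Rank 1 = most important feature (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda j: values[j], reverse=True)
    ranks = [0] * len(values)
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks


def spearman_no_ties(a, b):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)), valid without ties."""
    ra, rb = rank_desc(a), rank_desc(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical SHAP values (rows = observations) for two models.
features = ["NDVI_0601", "GNDVI_0524", "NDRE_0524", "NDVI_0524"]
rf_shap = [[0.30, -0.12, 0.08, 0.02], [-0.26, 0.10, -0.07, -0.03]]
svm_shap = [[0.20, -0.09, 0.03, 0.05], [-0.22, 0.07, -0.02, 0.04]]
imp_rf, imp_svm = mean_abs_shap(rf_shap), mean_abs_shap(svm_shap)
print("RF ranking:", [features[j] for j in sorted(range(len(features)), key=lambda j: -imp_rf[j])])
print("Spearman rho (RF vs SVM):", spearman_no_ties(imp_rf, imp_svm))
```

A coefficient of 1 indicates identical rankings. Here the two hypothetical models agree on the top two features and swap the last two, giving ρ = 0.8.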
Based on the learning curve analysis and its high predictive accuracy, the RF model was selected to generate growth-level maps for the entire study site. Figure 8 illustrates the resulting spatial distribution on a systematic hexagonal grid in QGIS. Inter-field variation was markedly greater than intra-field heterogeneity. The two fields on the western side were classified exclusively as Levels 1 and 2, suggesting favorable growth conditions across these sectors. In contrast, the eastern fields exhibited substantial developmental problems, with poor-growth (Level 0) zones estimated to cover more than 50% of certain areas.

Discussion

In this study, a reliable classification model for food-grade barley growth was developed by integrating UAV-derived multispectral imagery with ground-truth quality parameters. The methodology followed a four-stage analytical workflow:
- First, cluster analysis of grain quality traits established an objective basis for defining three distinct growth levels, bridging the gap between remote sensing spectral patterns and agronomic reality.
- Second, Lasso regression identified the VIs recorded from late May to early June as the most influential predictors, streamlining feature selection.
- Third, a comparative assessment of machine learning algorithms revealed that the RF model delivered the most balanced performance in terms of accuracy and generalizability, a finding further supported by the learning curve and SHAP analyses.
- Finally, applying this predictive model to field-scale mapping visualized the spatial heterogeneity of the barley fields, highlighting the model's capacity to delineate inter-field disparities in crop maturation.

These findings underscore the potential of combining regularized variable selection with ensemble learning to enhance precision management in barley production.
PCA of the barley samples revealed that TKW, grain thickness, and grain width were the primary drivers of phenotypic variance. These traits loaded strongly on PC1, which together with PC2 captured more than 76% of the total variation. These results align with established findings that morphological traits such as grain weight and size are stable, highly heritable characteristics that fundamentally determine yield formation (Paire et al., 2024; Sakamoto & Matsuoka, 2004). They also support the physiological concept that final grain weight is determined jointly by grain volume (potential size) and the efficiency of endosperm starch accumulation during grain filling (Sakamoto & Matsuoka, 2004).

Furthermore, the Level 0 cluster was characterized by high vitreousness together with low TKW and reduced grain dimensions. Minimizing vitreousness is critical for food-grade barley, as previous research has shown that high vitreousness renders grain unsuitable for human consumption (Okiyama et al., 2021). In durum wheat, vitreousness typically indicates high protein content and superior milling quality. In barley, by contrast, it reflects a failure of endosperm modification due to immaturity or terminal stress. The steely appearance of Level 0 grains suggests that they failed to complete the developmental transition from a proteinaceous matrix to a starchy endosperm (Lachutta & Jankowski, 2024).

Significant differences in the VIs were observed on April 23 and May 10. These indices serve as indirect proxies for chlorophyll concentration and vegetative biomass during barley ripening. The divergence in spectral indices observed on June 1 served as a key biophysical indicator of maturation quality: during this period, Level 0 maintained elevated NDVI, NDRE, and GNDVI values, whereas Levels 1 and 2 exhibited a marked decline.
In barley, reaching optimal maturity (Levels 1 and 2) is accompanied by a decline in vegetation index values, reflecting natural chlorophyll degradation and canopy senescence. These physiological processes are essential for the efficient remobilization of nutrients into developing grains (Lekhana et al., 2025). Conversely, sustained greenness at Level 0 indicates a delayed-maturation phenotype in which plants fail to enter the terminal drying phase in step with the rest of the field. Late-season spectral data effectively identified zones of incomplete ripening where grain texture was likely compromised (Materazzi, 2025). Capturing such phenological inflection points is critical for improving the precision of grain quality mapping (Adak et al., 2021). Overall, our findings underscore the importance of monitoring terminal growth stages via remote sensing, consistent with previous recommendations. The XGBoost and RF models outperformed SVM, achieving classification accuracies above 0.9. Although the target comprised only three categorical classes, VI values varied substantially across sampling locations. This variability likely favors tree-based ensemble models, which can flexibly capture complex, nonlinear relationships. These findings agree with those of Deng et al. (2025), who reported that ensemble models effectively handle the high-dimensional interactions inherent in agricultural datasets. Moreover, learning curve analysis indicated that the RF model outperformed XGBoost in this study. Although XGBoost often surpasses RF, the opposite trend was observed here. This discrepancy likely arose from the substantial variance in growth levels and the pronounced spectral fluctuations across sites. Gradient boosting algorithms fit each successive tree to the errors (loss gradients) of the current ensemble, so poorly predicted or misclassified instances receive increasing emphasis in later iterations (Chen & Guestrin, 2016).
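The residual-fitting behavior described above can be made concrete with a toy example. The sketch assumes squared-error regression with one-split trees (stumps) on a single invented feature. XGBoost itself uses a regularized second-order objective and, for multiclass problems, a softmax loss, so this illustrates the general principle rather than XGBoost's implementation. The point at x = 2 carries a deliberately noisy label.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Least-squares stump on one feature: returns (threshold, left_value, right_value)."""
    pairs = sorted(zip(xs, residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split between tied feature values
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        lv, rv = mean(left), mean(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, lv, rv)
    _, threshold, lv, rv = best
    return threshold, lv, rv


def gradient_boost(xs, ys, n_rounds, learning_rate=0.5):
    """Squared-loss gradient boosting: every round fits a stump to the current
    residuals y - F(x), i.e. the negative gradient of 0.5 * (y - F(x))**2."""
    pred = [mean(ys)] * len(ys)  # F_0: the constant minimizing squared loss
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        t, lv, rv = fit_stump(xs, residuals)
        pred = [p + learning_rate * (lv if x <= t else rv) for x, p in zip(xs, pred)]
    return pred


xs = list(range(10))
ys = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]  # invented data; x = 2 has a noisy label

pred = gradient_boost(xs, ys, n_rounds=1)
residuals = [y - p for y, p in zip(ys, pred)]
print([round(r, 2) for r in residuals])
# [-0.4, -0.4, 0.6, -0.4, -0.4, 0.2, 0.2, 0.2, 0.2, 0.2]
focus = max(xs, key=lambda i: abs(residuals[i]))
print("largest residual after one round at x =", focus)  # x = 2
```

After the first stump captures the main step at x = 4.5, the noisy point holds the largest residual. The next stump is fit to that residual vector, so later rounds are pulled toward the label noise. Bagging-based RF averages independently grown trees and has no such sequential dependence.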
Consequently, these models may be prone to overfitting and reduced generalization when applied to datasets containing substantial noise or outliers (H. Zhao et al., 2025). In contrast, RF averages many trees grown on bootstrap samples, making it comparatively robust to data variability and outliers. These properties likely supported its superior generalization, particularly given the limited sample size. SHAP analysis consistently identified NDVI on June 1 as the dominant predictor across all evaluated algorithms. This agrees with Adak et al. (2021), who demonstrated in maize that late-season VIs associated with post-anthesis senescence are critical determinants of grain filling and final yield. Similarly, a study of yield prediction factors for winter wheat in Kansas, USA, found that NDVI during the terminal growth stages was the most significant predictor of productivity (Maranhão et al., 2025). Furthermore, recent studies on barley have emphasized the diagnostic importance of the VI curve, particularly from its peak to the onset of senescence (Y. Zhao et al., 2025). This consistency with prior research suggests that our models relied primarily on senescence timing to differentiate growth levels in barley. This physiological phase is critical for accurately predicting flowering and maturation. The flag-leaf stage marks the completion of canopy development, whereas the period from flowering to maturity strongly influences grain number and filling efficiency (Al-Ajlouni et al., 2016; Ugarte et al., 2007). Consequently, we suggest that late-season NDVI serves as a reliable proxy for barley growth performance, as it captures both grain-filling efficiency and stress-induced senescence.
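SHAP attributes a prediction to individual features using Shapley values from cooperative game theory (Lundberg & Lee, 2017). With only a few features, these values can be computed exactly by enumerating every feature subset, which makes the definition easy to inspect. The sketch below assumes that a feature "absent" from a subset takes its value from a single baseline vector. This is a simplification: the SHAP library's tree explainer, typically used with RF models, estimates these expectations differently. The model, feature names, and numbers are hypothetical and do not reproduce the study's results.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction by enumerating every coalition.
    Features outside a coalition are replaced by the baseline value (assumption)."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


# Hypothetical score model over NDVI on three dates, with one interaction term.
features = ["NDVI_Apr23", "NDVI_May10", "NDVI_Jun1"]


def toy_model(z):
    return 0.2 * z[0] + 0.5 * z[1] + 3.0 * z[2] + 1.0 * z[1] * z[2]


x = [0.70, 0.80, 0.60]         # one sampling location (invented)
baseline = [0.72, 0.78, 0.35]  # e.g. a dataset mean (invented)

phi = shapley_values(toy_model, x, baseline)
for name, contrib in zip(features, phi):
    print(f"{name}: {contrib:+.4f}")
# Efficiency property: attributions sum to f(x) - f(baseline).
print(round(sum(phi), 6), round(toy_model(x) - toy_model(baseline), 6))
```

The last line checks the efficiency property: attributions sum to the gap between a prediction and the reference prediction. This is what allows summary plots such as Figure 7 to be read additively. The interaction term is split between the two NDVI dates that share it.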
Spatial mapping with the RF model in QGIS revealed that inter-field variability in barley growth was markedly more pronounced than intra-field heterogeneity. A hexagonal grid system was used to capture these patterns. Hexagonal tessellation is widely recognized as representing adjacency relationships more effectively than conventional square grids, minimizing geometric distortion of continuous ecological gradients while reducing edge effects and sampling bias (Littidej et al., 2025). Unlike square cells, each hexagonal cell has six neighbors that share an edge with it and whose centroids are equidistant from its own. Consequently, mapping growth status at this resolution was well suited to delineating localized areas of poor development, specifically those classified as Level 0. The western fields were uniformly classified as Levels 1 and 2, whereas the eastern fields exhibited pronounced spatial variation. Notably, over 50% of the area of one eastern field was classified as Level 0. These predominantly Level 0 areas corresponded to locations where flood-induced lodging had been observed since late April (Supplemental Fig. S5). These findings suggest that the RF model can effectively characterize crop growth status at the field scale. For example, identifying Level 0 zones before harvest could enable selective harvesting, preventing contamination of high-quality lots with inferior grain. Furthermore, the association between early-season lodging and Level 0 classification suggests that this model may serve as a diagnostic tool for drainage management. Visualizing these spatial patterns could help identify chronically flooded areas, thereby informing targeted infrastructure improvements or variable-rate interventions aimed at alleviating environmental stress during the maturation period. This study has several limitations. First, the analysis was based on a single-year dataset and does not account for inter-annual meteorological variability, including variations in precipitation and temperature.
To develop a more adaptable and robust framework, longitudinal data collection and the integration of environmental variables into predictive models are essential. Multi-year validation is also required to rigorously assess model reliability across diverse growing seasons. Second, previous research has shown that machine learning-based yield predictions often lack direct causal links to agronomic interventions, such as fertilization regimes, making the underlying drivers difficult to interpret (Kakimoto et al., 2022). Recent advances in causal machine learning offer ways to quantify uncertainty and clarify complex cause-and-effect relationships (Tanaka & Yokoyama, 2023). Therefore, future research should incorporate causal frameworks to improve interpretability and enhance the practical utility of remote sensing in precision agriculture.
Conclusion
This study demonstrated that grain quality metrics provide a sound basis for classifying barley growth for food-grade production. Furthermore, combining UAV-derived multispectral imagery with machine learning models enabled highly accurate prediction of these growth levels. By capturing preharvest conditions via remote sensing, this framework enables field-scale assessment of crop status and offers the potential for site-specific harvest scheduling based on localized maturation patterns. Building on these models, future work should incorporate longitudinal multi-year datasets and meteorological variables to improve generalizability across seasons. Such advances would enhance the model's adaptability to inter-annual climatic fluctuations and support more resilient and precise management strategies for food-grade barley production.
Declarations
Competing interests
The authors have no relevant financial or non-financial interests to disclose.
Funding
The authors received no specific funding for this work.
Author Contributions
K.M. and O.W. conceived and designed this study. K.M. collected the data, analyzed and interpreted the results, and drafted the manuscript. O.W. and R.Y. supported the statistical analyses. All authors read and approved the final manuscript.
Acknowledgements
The authors would like to express their sincere gratitude to the farmers and the staff of the Industrial Promotion Division at the Miyada Village Office, Japan, for their significant cooperation in conducting this study.
Data Availability
The datasets generated and/or analyzed during the current study are not publicly available to protect the privacy of the participating farmers but are available from the corresponding author on reasonable request.
References
Adak, A. et al. Temporal Vegetation Indices and Plant Height from Remotely Sensed Imagery Can Predict Grain Yield and Flowering Time Breeding Value in Maize via Machine Learning Regression. Remote Sens. 13 (11), 2141. https://doi.org/10.3390/rs13112141 (2021).
Al-Ajlouni, Z. et al. Impact of Pre-Anthesis Water Deficit on Yield and Yield Components in Barley (Hordeum vulgare L.) Plants Grown under Controlled Conditions. Agronomy 6 (2), 33. https://doi.org/10.3390/agronomy6020033 (2016).
Ali, M. M., Al-Ani, A., Eamus, D. & Tan, D. K. Y. Leaf nitrogen determination using non-destructive techniques–A review. J. Plant Nutr. 40 (7), 928–953. https://doi.org/10.1080/01904167.2016.1143954 (2017).
Aoe, S. et al. Effects of high β-glucan barley on visceral fat obesity in Japanese individuals: A randomized, double-blind study. Nutrition 42, 1–6. https://doi.org/10.1016/j.nut.2017.05.002 (2017).
Atkinson, F. S., Brand-Miller, J. C., Foster-Powell, K., Buyken, A. E. & Goletzke, J. International tables of glycemic index and glycemic load values 2021: a systematic review. Am. J. Clin. Nutr. 114 (5), 1625–1632. https://doi.org/10.1093/ajcn/nqab233 (2021).
Benos, L. et al. Machine Learning in Agriculture: A Comprehensive Updated Review. Sensors 21 (11), 3758. https://doi.org/10.3390/s21113758 (2021).
Birch, C. P. D., Oom, S. P. & Beecham, J. A. Rectangular and hexagonal grids used for observation, experiment and simulation in ecology. Ecol. Model. 206 (3–4). https://doi.org/10.1016/j.ecolmodel.2007.03.041 (2007).
Chen, T. & Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), 785–794. https://doi.org/10.1145/2939672.2939785 (2016).
Cossani, C. M., Palta, J. & Sadras, V. O. Genetic yield gain between 1942 and 2013 and associated changes in phenology, yield components and root traits of Australian barley. Plant Soil 480 (1), 151–163. https://doi.org/10.1007/s11104-022-05570-7 (2022).
Deng, L. et al. Sorghum yield prediction using UAV multispectral imaging and stacking ensemble learning in arid regions. Front. Plant Sci. 16. https://doi.org/10.3389/fpls.2025.1636015 (2025).
Duffková, R., Poláková, L., Lukas, V. & Fučík, P. The Effect of Controlled Tile Drainage on Growth and Grain Yield of Spring Barley as Detected by UAV Images, Yield Map and Soil Moisture Content. Remote Sens. 14 (19), 4959. https://doi.org/10.3390/rs14194959 (2022).
Ganeva, D. et al. Remotely Sensed Phenotypic Traits for Heritability Estimates and Grain Yield Prediction of Barley Using Multispectral Imaging from UAVs. Sensors 23 (11), 5008. https://doi.org/10.3390/s23115008 (2023).
Habiyaremye, C. et al. Effect of Nitrogen and Seeding Rate on β-Glucan, Protein, and Grain Yield of Naked Food Barley in No-Till Cropping Systems in the Palouse Region of the Pacific Northwest. Front. Sustain. Food Syst. 5. https://doi.org/10.3389/fsufs.2021.663445 (2021).
Herzig, P. et al. Evaluation of RGB and Multispectral Unmanned Aerial Vehicle (UAV) Imagery for High-Throughput Phenotyping and Yield Prediction in Barley Breeding. Remote Sens. 13 (14), 2670. https://doi.org/10.3390/rs13142670 (2021).
Hossain, A. et al. Agricultural Land Degradation: Processes and Problems Undermining Future Food Security. In Environment, Climate, Plant and Vegetation Growth (eds Fahad, S., Hasanuzzaman, M., Alam, M., Ullah, H., Saeed, M., Ali Khan, I. & Adnan, M.) 17–61 (Springer International Publishing). https://doi.org/10.1007/978-3-030-49732-3_2 (2020).
Hu, J. et al. Estimation of wheat tiller density using remote sensing data and machine learning methods. Front. Plant Sci. 13. https://doi.org/10.3389/fpls.2022.1075856 (2022).
Kakimoto, S., Mieno, T., Tanaka, T. S. T. & Bullock, D. S. Causal forest approach for site-specific input management via on-farm precision experimentation. Comput. Electron. Agric. 199, 107164. https://doi.org/10.1016/j.compag.2022.107164 (2022).
Kochevenko, A. et al. Identification of QTL hot spots for malting quality in two elite breeding lines with distinct tolerance to abiotic stress. BMC Plant Biol. 18 (1), 106. https://doi.org/10.1186/s12870-018-1323-4 (2018).
Lachutta, K. & Jankowski, K. J. The Quality of Winter Wheat Grain by Different Sowing Strategies and Nitrogen Fertilizer Rates: A Case Study in Northeastern Poland. Agriculture 14 (4), 552. https://doi.org/10.3390/agriculture14040552 (2024).
Lekhana, M. V. et al. Effect of terminal heat stress on stay-green and senescence process could explain genetic variation in grain yield and nutritional profile in wheat (Triticum aestivum L). Plant Growth Regul. 105 (5), 1545–1557. https://doi.org/10.1007/s10725-025-01345-z (2025).
Li, Y., Wang, C., Zhu, J., Wang, Q. & Liu, P. Classification of Nitrogen-Efficient Wheat Varieties Based on UAV Hyperspectral Remote Sensing. Plants 14 (13), 1908. https://doi.org/10.3390/plants14131908 (2025).
Littidej, P., Pumhirunroj, B. & Slack, D. An Alternative Model for Predicting Rubber Yield in High-Density Plots in Ecological Areas Adjacent to the Mekong River Using Forest-Based Classification and Regression on a Hexagonal Grid. IEEE Access 13, 200028–200053. https://doi.org/10.1109/ACCESS.2025.3636231 (2025).
Lundberg, S. & Lee, S. I. A Unified Approach to Interpreting Model Predictions. https://doi.org/10.48550/arXiv.1705.07874 (2017).
Maranhão, R. L. A., Caldas, M. M., Kastens, J., Watson, J. & Lollato, R. P. Assessing NDVI, Climate, and Management to Predict Winter Wheat Yields at Field Scale in Kansas, USA. Remote Sens. 17 (20), 3500. https://doi.org/10.3390/rs17203500 (2025).
Materazzi, F. Sowing, Monitoring, Detecting: A Possible Solution to Improve the Visibility of Cropmarks in Cultivated Fields. J. Imaging 11 (3), 71. https://doi.org/10.3390/jimaging11030071 (2025).
Mio, K. et al. A single administration of barley β-glucan and arabinoxylan extracts reduces blood glucose levels at the second meal via intestinal fermentation. Biosci. Biotechnol. Biochem. 87 (1), 99–107. https://doi.org/10.1093/bbb/zbac171 (2023).
Miyamoto, J. et al. Barley β-glucan improves metabolic condition via short-chain fatty acids produced by gut microbial fermentation in high fat diet fed mice. PLOS ONE 13 (4), e0196579. https://doi.org/10.1371/journal.pone.0196579 (2018).
Mohr, F. & van Rijn, J. N. Learning curves for decision making in supervised machine learning: a survey. Mach. Learn. 113 (11). https://doi.org/10.1007/s10994-024-06619-7 (2024).
Morishita, M. & Ishitsuka, N. Estimation of soil moisture distribution in soybean field using UAV. J. Japanese Agricultural Syst. Soc. 36 (4), 55–61. https://doi.org/10.14962/jass.36.4_55 (2020).
Morishita, M. & Ishitsuka, N. Estimation of soil properties distribution using UAV observation and machine learning. J. Japanese Agricultural Syst. Soc. 37 (2), 21–28. https://doi.org/10.14962/jass.37.2_21 (2021).
Okiyama, T. et al. Factors of Fluctuation in Glassy Grain Rate and β-Glucan Content and their Control by Fertilizing Technology in Barley Cultivar Shunrai for Barley Rice. Japanese J. Crop Sci. 90 (2), 194–205. https://doi.org/10.1626/jcs.90.194 (2021).
Omotoso, A. B., Letsoalo, S., Olagunju, K. O., Tshwene, C. S. & Omotayo, A. O. Climate change and variability in sub-Saharan Africa: A systematic review of trends and impacts on agriculture. J. Clean. Prod. 414, 137487. https://doi.org/10.1016/j.jclepro.2023.137487 (2023).
Paire, L., McCabe, C. & McCabe, T. Multi-model genome-wide association study on key organic naked barley agronomic, phenological, diseases, and grain quality traits. Euphytica 220 (7), 118. https://doi.org/10.1007/s10681-024-03374-7 (2024).
Perich, G. et al. Pixel-based yield mapping and prediction from Sentinel-2 using spectral indices and neural networks. Field Crops Res. 292, 108824. https://doi.org/10.1016/j.fcr.2023.108824 (2023).
Powers, D. M. W. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. Int. J. Mach. Learn. Technol. 2 (1), 37–64. https://doi.org/10.48550/arXiv.2010.16061 (2011).
Sakamoto, T. & Matsuoka, M. Generating high-yielding varieties by genetic manipulation of plant architecture. Curr. Opin. Biotechnol. 15 (2), 144–147. https://doi.org/10.1016/j.copbio.2004.02.003 (2004).
Sokolova, M. & Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 45 (4), 427–437. https://doi.org/10.1016/j.ipm.2009.03.002 (2009).
Teku, D. Navigating climate uncertainty: a comprehensive review of climatic variabilities and extreme events on environmental, socio-economic, and livelihood dimensions in Ethiopia with adaptation strategies. All Earth 37 (1), 1–30. https://doi.org/10.1080/27669645.2025.2514418 (2025).
Ugarte, C., Calderini, D. F. & Slafer, G. A. Grain weight and grain number responsiveness to pre-anthesis temperature in wheat, barley and triticale. Field Crops Res. 100 (2), 240–248. https://doi.org/10.1016/j.fcr.2006.07.010 (2007).
Yan, J. et al. Estimation of regional-scale maize plant nitrogen content based on multi-source remote sensing data. Front. Plant Sci. 16. https://doi.org/10.3389/fpls.2025.1669170 (2025).
Yasuda, S. Establishment and characteristics of barley varieties. Breed. Res. 11 (4), 137–143. https://doi.org/10.1270/jsbbr.11.137 (2009).
Zhao, H., Liu, W., Wang, Y. & Wu, L. Comparative analysis of algorithmic approaches in ensemble learning: bagging vs. boosting. Sci. Rep. 15 (1), 34218. https://doi.org/10.1038/s41598-025-15971-0 (2025).
Zhao, Y., Jiang, R., Brider, J., Chapman, S. & Potgieter, A. Characterizing wheat and barley growth and phenology using multi-spectral remote sensing for site-specific precision agriculture. in silico Plants 7 (2). https://doi.org/10.1093/insilicoplants/diaf013 (2025).
Supplementary Files
SupplementalFigures.docx
Figure legends
Figure 1. Study site. Red dots indicate the locations where barley ears were sampled for analysis.
Figure 2. Framework used to build a model predicting the growth of food-grade barley.
Figure 3. Extending data with circular buffers in QGIS. Each pixel within a circular buffer was assumed to have uniform barley quality characteristics.
Figure 4. Construction of growth prediction models and barley quality values by growth level. (A) PCA score plot illustrating the clustering of samples based on quality traits. Colors correspond to growth levels 0 (red), 1 (blue), and 2 (green). Ellipses indicate the 95% confidence intervals for each group. (B, C) Factor loadings of barley quality traits on the first (PC1) and second (PC2) principal components. (D) Distribution of barley quality traits across the three growth levels. Box plots display the median and interquartile range for length, width, thickness, thousand kernel weight (TKW), moisture, and vitreousness.
Figure 5. VIs across growth levels and regression coefficients estimated via Lasso regression. (A) Distributions of vegetation indices (GNDVI, NDRE, NDVI) by growth level across four observation dates. Violin plots show the probability density of the data, with internal box plots indicating the median and interquartile range. (B) Regression coefficients for each variable estimated using a multinomial logistic regression model with Lasso regularization. The magnitude of a coefficient represents the importance of the variable for classifying each growth level. In both panels, red, blue, and green correspond to growth levels 0, 1, and 2, respectively.
Figure 6. Learning curves of the machine learning models. The x-axis represents the number of training samples (training set size), and the y-axis indicates model performance. “Train” and “Test” denote the results for the training and validation datasets, respectively. Abbreviations: RF, Random Forest; XGBoost, eXtreme Gradient Boosting; SVM, Support Vector Machine.
Figure 7. SHAP summary plots illustrating feature importance for the RF model across growth levels. The panels correspond to growth levels 0, 1, and 2. Features are ranked by importance on the y-axis. The x-axis represents the SHAP value, which indicates the effect of each feature on the model's predicted probability. The color gradient denotes the magnitude of the feature value: lighter colors (green/yellow) represent higher values, and darker colors (purple/blue) represent lower values.
Figure 8. Growth levels predicted across the entire study site by the RF model. Red, blue, and green grid cells correspond to predicted growth levels 0, 1, and 2, respectively.
Author affiliations
Kento Mio (corresponding author), Rongling Ye, and Osamu Watanabe: Shinshu University.
© Research Square 2026 | ISSN 2693-5015 (online)
18:38:08","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":3143391,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementalFigures.docx","url":"https://assets-eu.researchsquare.com/files/rs-9011430/v1/5f1d41d55185dbf5d20399e0.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Classification and Prediction of Growth Conditions in Food Barley Fields Using UAV Multispectral Images and Machine Learning Approaches","fulltext":[{"header":"Introduction","content":"\u003cp\u003eSustainable agricultural production requires real-time monitoring of growth throughout the cultivation period. However, in recent years, the escalating uncertainty driven by extreme climate variability has exceeded the capacity of traditional, experience-based management in mitigating adverse impacts (Hossain et al., \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Omotoso et al., \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Teku, \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). Barley (\u003cem\u003eHordeum vulgare\u003c/em\u003e) is the fourth most widely produced cereal worldwide (Yasuda, \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e2009\u003c/span\u003e). Although primarily utilized worldwide for animal feed and malting, the crop occupies a distinct role in East Asian food cultures, particularly in Japan and South Korea, where it is a staple in dishes such as barley-mixed rice and various processed foodstuffs. 
Owing to its superior water-soluble dietary fiber content relative to rice, barley is recognized as a medium-glycemic index (GI) food (Atkinson et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2021\u003c/span\u003e) that suppresses postprandial hyperglycemia and optimizes the intestinal environment through gut microbiota fermentation (Aoe et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Mio et al., \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Miyamoto et al., \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). Consequently, consumer demand for this grain as a functional, health-promoting food is steadily rising. The production and distribution of food-grade cultivars requires consistently high yields and physical characteristics, such as grain thickness, width, and moisture content, to ensure processing suitability and functional efficacy. Crop yield and quality are shaped by a complex interplay of endogenous and exogenous factors, including climatic conditions and soil responsiveness to nitrogen fertilization. However, these parameters are typically assessed post-harvest or during processing, a timeline that precludes any active intervention that can mitigate degradation risks during the growing season. Given that inter-annual fluctuations in these traits are intensifying due to recent climatic shifts, there is an urgent need to establish real-time growth monitoring methodologies throughout the cultivation period at the field level to facilitate timely cultivation management.\u003c/p\u003e \u003cp\u003eConventional methods for monitoring crops are labor-intensive, time-consuming, and often rely on destructive sampling (Ali et al., \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). These limitations hinder the large-scale assessment of spatiotemporal dynamics across large scales. 
Remote sensing (RS) technologies using Unmanned Aerial Vehicles (UAVs) and satellites have emerged as effective tools for addressing this limitation. RS enables high-frequency, non-destructive data acquisition over broad areas, facilitating the precise evaluation of crop status and disease risks. A substantial body of research has demonstrated the effectiveness of RS technologies in modeling barley growth and yield. UAV-based multispectral and RGB imagery, as well as integrated UAV\u0026ndash;satellite frameworks, have been widely applied to estimate leaf area index and predict grain productivity with high accuracy, with reported R\u0026sup2; values frequently exceeding 0.80 (Duffkov\u0026aacute; et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Ganeva et al., \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Herzig et al., \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Perich et al., \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2023\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eDespite these advances, high yield prediction accuracy does not necessarily translate into reliable assessment of food-grade quality traits. Substantial evidence suggests that barley yield and quality are not consistently positively correlated. For example, genetic analyses have shown that quality-related quantitative trait loci (QTLs) often act independently of yield-related QTLs (Kochevenko et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2018\u003c/span\u003e), while increased nitrogen fertilization may enhance yield but reduce grain density and test weight (Habiyaremye et al., \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). 
Consequently, predictive models that rely solely on yield data are subject to high uncertainty when estimating the specific quality parameters required for food-grade standards.\u003c/p\u003e \u003cp\u003eTo address these uncertainties, machine learning (ML) models capable of capturing complex interactions between environmental variables and crop outcomes have been increasingly adopted. Because ML algorithms do not assume linear relationships, they can outperform linear models when many interacting, non-linear agricultural factors are involved. Several studies coupling spectral data from UAVs or satellites with ML have successfully estimated tiller density, nitrogen use efficiency, and nitrogen concentration (Hu et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Li et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2025\u003c/span\u003e; Yan et al., \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). Although challenges regarding model interpretability persist, ML remains a powerful tool for constructing highly accurate prediction models from multivariate UAV-derived data, provided that sufficient data are available.\u003c/p\u003e \u003cp\u003eThis study aimed to classify barley growth into food-grade-oriented categories and to develop a reliable predictive model by integrating UAV-based RS data with machine learning, linking quality parameters and phenology-specific vegetation indices to capture complex relationships between growth and grain quality. 
This approach is expected to provide a framework for accurate prediction of key quality traits, thereby supporting targeted barley management.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eStudy Site and Data Acquisition\u003c/h2\u003e \u003cp\u003eThe study was conducted across eight double-cropped barley-soybean fields located in Miyada Village, Kamiina District, Nagano Prefecture, Japan (35\u0026deg;45\u0026rsquo;26\u0026rdquo; N; 137\u0026deg;55\u0026rsquo;59\u0026rdquo; E) (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). To monitor the crop during its ripening and maturation phases, aerial surveys were conducted on April 23, May 10, May 24, and June 1, 2024. All flights were performed at approximately 10:00 AM using a multispectral camera integrated into an Unmanned Aerial Vehicle (UAV; Mavic 3 Multispectral, DJI Inc.). The UAV operated at an altitude of 100 m with 80% forward and side overlap. To ensure temporal data consistency, the automated missions followed identical trajectories and imaging parameters on each survey date. The acquired multispectral images were processed using Pix4Dmapper (Pix4D S.A.) to generate high-resolution orthomosaics. For radiometric calibration, raw digital numbers were converted into reflectance maps for four spectral bands: green (G), red (R), red edge (RE), and near-infrared (NIR). This calibration involved capturing a standardized reflectance panel (Quantomics Co., Ltd.) immediately before each flight to account for variations in ambient light. On June 1, 2024, barley was harvested from 46 sampling points across the experimental fields, following a 15 m square grid design. Approximately five plants were collected from each designated location. 
To align these physical samples with the remote sensing datasets, 5 m\u003csup\u003e2\u003c/sup\u003e regions of interest (ROIs) were delineated on the vegetation index maps at the coordinates of each sampling site. Mean pixel values within these ROIs were then extracted to establish a direct correspondence with the quality parameters detailed in subsequent sections. Prior to analysis, the harvested samples were dried in a forced-air oven at 60\u0026deg;C for 72 hours to stabilize moisture and halt biological activity.\u003c/p\u003e \u003cp\u003eTo capture spatial heterogeneity within the fields, orthomosaic images were partitioned into a systematic hexagonal grid using QGIS (version 3.34, QGIS Development Team). Each hexagonal cell had a side length of 5 m. A hexagonal tessellation was selected to mitigate sampling bias and because, unlike in a rectangular grid, the centroids of all adjacent cells are equidistant (Birch et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2007\u003c/span\u003e). For each polygon, mean reflectance values were extracted across all spectral bands to represent local canopy characteristics. 
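\u003c/p\u003e \u003cp\u003eThe three indices defined below in Eqs. (1) to (3) are all normalized band differences. Once mean reflectances have been extracted for each polygon or ROI, they can therefore be computed in a few vectorized lines. The following Python sketch is illustrative only, since the analysis itself was run in R. It assumes reflectances on a 0 to 1 scale. The function names and the values for the two cells are hypothetical, not study data.\u003c/p\u003e

```python
import numpy as np


def normalized_difference(a, b):
    # Generic (a - b) / (a + b) shared by NDVI, GNDVI and NDRE;
    # returns NaN where a + b == 0 (for example, masked pixels)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(a + b != 0, (a - b) / (a + b), np.nan)


def vegetation_indices(green, red, red_edge, nir):
    # Eqs. (1)-(3) applied to per-cell mean reflectances
    return {
        'NDVI': normalized_difference(nir, red),
        'GNDVI': normalized_difference(nir, green),
        'NDRE': normalized_difference(nir, red_edge),
    }


# Hypothetical mean reflectances for two hexagonal cells (not study data)
cells = vegetation_indices(green=[0.08, 0.10], red=[0.05, 0.12],
                           red_edge=[0.20, 0.22], nir=[0.45, 0.30])
for name, values in cells.items():
    print(name, np.round(values, 3))
```

\u003cp\u003eThe function returns NaN when both bands are zero. This avoids a division error without producing a misleading index value.\u003c/p\u003e \u003cp\u003e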
The multispectral sensor recorded data across four discrete wavebands with the following center wavelengths and full-width at half-maximum values: G: 560\u0026thinsp;\u0026plusmn;\u0026thinsp;16 nm, R: 650\u0026thinsp;\u0026plusmn;\u0026thinsp;16 nm, RE: 730\u0026thinsp;\u0026plusmn;\u0026thinsp;16 nm, NIR: 860\u0026thinsp;\u0026plusmn;\u0026thinsp;16 nm.\u003c/p\u003e \u003cp\u003eThe following VIs were calculated using spatially averaged reflectance data to evaluate the growth status and physiological conditions of the barley:\u003c/p\u003e \u003cp\u003eNormalized Difference Vegetation Index (NDVI):\u003cdiv id=\"Equ1\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$NDVI=\\frac{NIR-R}{NIR+R}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eGreen Normalized Difference Vegetation Index (GNDVI):\u003cdiv id=\"Equ2\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$$GNDVI=\\frac{NIR-G}{NIR+G}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eNormalized Difference Red Edge Index (NDRE):\u003cdiv id=\"Equ3\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ3\" name=\"EquationSource\"\u003e\n$$NDRE=\\frac{NIR-RE}{NIR+RE}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e3\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eAnalysis of Barley Grains\u003c/h3\u003e\n\u003cp\u003eFollowing a 72-hour stabilization period at 60\u0026deg;C, the harvested samples underwent comprehensive quality analysis. To evaluate the morphological characteristics, 30 kernels were randomly selected from each sampling location to measure the length, width, and thickness using digital calipers. 
Thousand-kernel weight (TKW) was determined by weighing 100 kernels in triplicate, and the mean of these measurements was multiplied by ten to obtain the final TKW. Grain vitreousness was assessed using a specialized cracker (RN-840; Kett Electric Laboratory, Tokyo, Japan). Kernels were bisected cross-sectionally, and the internal structure was visually inspected to determine the percentage of vitreous grains per sample. To quantify the moisture content (\u003cem\u003eMC\u003c/em\u003e, %), the barley was pulverized for 60 seconds using a commercial mill, and the resulting flour was dried at 120\u0026deg;C for two hours until a constant mass was achieved. The mass was recorded before and after drying using a precision electronic balance, and MC was calculated using the following gravimetric equation:\u003cdiv id=\"Equ4\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ4\" name=\"EquationSource\"\u003e\n$$MC=\\frac{{W}_{w}-{W}_{d}}{{W}_{w}}\\times 100$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e4\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({W}_{w}\\)\u003c/span\u003e\u003c/span\u003e is the wet weight of the sample before drying and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({W}_{d}\\)\u003c/span\u003e\u003c/span\u003e is the dry weight after reaching a constant weight.\u003c/p\u003e\n\u003ch3\u003eData Analysis and Model Development\u003c/h3\u003e\n\u003cp\u003eThe development of the predictive model for food-grade barley growth followed a three-stage approach: (i) Characterization and Classification\u0026mdash;quantification of grain quality metrics and the subsequent categorization of growth levels through cluster analysis; (ii) Dataset configuration and feature selection\u0026mdash;construction of a machine learning dataset, supplemented by 
data augmentation and the identification of key vegetation indices via regularization; (iii) Model Implementation and Evaluation\u0026mdash;deployment of multiple ML algorithms and a comparative assessment of their predictive performance. The complete analytical workflow is illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. All statistical computations, data processing, and predictive modeling were conducted in R (version 4.5.0; R Core Team).\u003c/p\u003e\n\u003ch3\u003eCharacterization and Classification of Food-Grade Barley Growth Levels\u003c/h3\u003e\n\u003cp\u003eA multivariate approach was employed to define discrete growth categories based on the six quality parameters measured at each sampling site. A distance matrix was first constructed using the vegdist function of the \u0026lsquo;vegan\u0026rsquo; package. To ensure compatibility with Euclidean geometry for subsequent ordination, a quasi-Euclidean transformation was applied using the quasieuclid function of the \u0026lsquo;ade4\u0026rsquo; package. Principal Component Analysis (PCA) was then performed using the dudi.pca function (\u0026lsquo;ade4\u0026rsquo;) to reduce the dimensionality of the quality traits. k-means clustering was applied to the scores of the first two principal components (PC1 and PC2), and the optimal number of clusters (\u003cem\u003ek\u003c/em\u003e) was determined using the elbow method, which evaluates the within-cluster sum of squares across a range of \u003cem\u003ek\u003c/em\u003e values. The results indicated that a three-cluster configuration (\u003cem\u003ek\u003c/em\u003e\u0026thinsp;=\u0026thinsp;3) provided the most stable and representative partition of the data (Supplementary Fig. \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e). 
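\u003c/p\u003e \u003cp\u003eThe ordination and clustering steps can be sketched as follows. This Python version is a simplified, hypothetical stand-in for the R workflow, and it differs from that workflow in three ways. First, it runs PCA on standardized traits via singular value decomposition rather than on a quasi-Euclidean distance matrix. Second, it uses plain Lloyd k-means with several random starts. Third, it applies both steps to synthetic data containing three well-separated groups.\u003c/p\u003e

```python
import numpy as np


def pca_scores(X, n_components=2):
    # Standardize each quality trait, then project onto the leading principal axes
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # proportion of variance per component
    return Z @ Vt[:n_components].T, explained


def kmeans(X, k, n_init=25, n_iter=100, seed=0):
    # Lloyd's algorithm with several random starts; keeps the lowest within-cluster SS
    rng = np.random.default_rng(seed)
    best_labels, best_wss = None, np.inf
    for _start in range(n_init):
        centres = X[rng.choice(len(X), size=k, replace=False)]
        for _step in range(n_iter):
            d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centres[j] for j in range(k)])
            if np.allclose(new, centres):
                break
            centres = new
        wss = sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
                  for j in range(k) if np.any(labels == j))
        if wss < best_wss:
            best_labels, best_wss = labels, wss
    return best_labels, best_wss


def elbow_curve(scores, k_max=6):
    # Within-cluster sum of squares for k = 1..k_max; the elbow is judged by eye
    return [kmeans(scores, k)[1] for k in range(1, k_max + 1)]


# Synthetic stand-in: 6 quality traits at 45 hypothetical points (3 groups of 15)
rng = np.random.default_rng(1)
group_means = np.array([[-2.0] * 6, [0.0] * 6, [2.0] * 6])
X = np.vstack([m + rng.normal(scale=0.3, size=(15, 6)) for m in group_means])
scores, explained = pca_scores(X)
wss = elbow_curve(scores)
labels, _ = kmeans(scores, 3)
print(np.round(explained[:2], 3), np.round(wss, 1), np.bincount(labels))
```

\u003cp\u003eThe elbow criterion described above amounts to plotting wss against k and looking for the point where the decrease levels off. For this synthetic example the curve is expected to flatten after k = 3. With real quality traits the elbow is usually less sharp.\u003c/p\u003e \u003cp\u003e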
Based on these multi-dimensional profiles, three distinct physiological growth categories were defined to serve as ground-truth labels for subsequent modeling:\u003c/p\u003e \u003cp\u003eLevel 0 (Poor Growth): Identified by a high frequency of immature grains and insufficient grain filling, representing areas with significant developmental delay.\u003c/p\u003e \u003cp\u003eLevel 1 (Suboptimal Growth): Characterized by marginally elevated moisture content and suboptimal grain morphometry, specifically shorter grains than Level 2.\u003c/p\u003e \u003cp\u003eLevel 2 (Optimal Growth): Distinguished by superior grain quality, including standard grain dimensions and complete physiological maturation.\u003c/p\u003e \u003cp\u003eThe specific quality characteristics associated with each level are described in the Results section. These categorical levels were used as the target variables for supervised machine learning.\u003c/p\u003e\n\u003ch3\u003eData Augmentation and Spatial Correspondence\u003c/h3\u003e\n\u003cp\u003eTo construct a reliable machine learning dataset, data augmentation was performed following the methodologies established by Morishita and Ishitsuka (\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2020\u003c/span\u003e, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2021\u003c/span\u003e), which assume that barley quality characteristics remain relatively uniform over short distances. Using QGIS, a 1.5 m radius circular buffer was generated around the precise GPS coordinates of each ground-truth sampling point (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). 
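\u003c/p\u003e \u003cp\u003eA direct count shows how many pixels a 1.5 m buffer contains at the 0.25 m resolution used for augmentation. The sketch makes two assumptions: the buffer centre coincides with a pixel centre, and a pixel is kept when its centre falls inside the circle. QGIS may intersect buffer edges with the raster grid differently, so this is an order-of-magnitude check rather than a reproduction of the reported count.\u003c/p\u003e

```python
import math


def pixels_in_buffer(radius_m=1.5, pixel_m=0.25):
    # Count pixel centres inside a circular buffer, assuming the buffer centre
    # sits on a pixel centre (offsets are integer multiples of pixel_m)
    r = radius_m / pixel_m                  # radius in pixel units (6.0 by default)
    n = int(math.floor(r))
    return sum(1 for i in range(-n, n + 1) for j in range(-n, n + 1)
               if i * i + j * j <= r * r)


per_point = pixels_in_buffer()                  # 113 pixel centres per buffer
area_estimate = math.pi * 1.5**2 / 0.25**2      # about 113.1 pixels by area
total = 46 * per_point                          # 5198 for 46 sampling points
print(per_point, round(area_estimate, 1), total)
```

\u003cp\u003eWith about 113 pixels per point across 46 points, the estimate is 5,198 instances. This is the same order as the 5,309 instances reported for the augmented dataset. The remaining gap plausibly reflects how buffer edges and grid alignment were handled.\u003c/p\u003e \u003cp\u003e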
Previous studies have demonstrated that circular buffers of 1.0 to 1.5 m effectively expand agricultural datasets (Morishita \u0026amp; Ishitsuka, \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2020\u003c/span\u003e, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Because this study used a similar multispectral sensor operating at the same flight altitude (100 m), a 1.5 m radius was deemed appropriate. UAV-derived multispectral images were resampled to a spatial resolution of 0.25 m per pixel, and three vegetation indices (NDVI, GNDVI, and NDRE) were calculated for each pixel within each buffer. Each pixel was then labeled with the growth category (level 0, 1, or 2) assigned to its central sampling point. This augmentation expanded the initial ground-truth set to 5,309 instances, yielding a larger dataset that reflects within-buffer spectral variability for machine learning training.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eFeature Selection via Lasso Multinomial Logistic Regression\u003c/h2\u003e \u003cp\u003eGiven the multi-temporal nature of the UAV surveys, the initial dataset was high-dimensional and prone to multicollinearity among vegetation indices across observation dates. To identify the most influential predictors for each growth level while reducing the risk of overfitting, we applied a multinomial logistic regression model with Lasso (L1) regularization using the \u0026lsquo;glmnet\u0026rsquo; package. Lasso performs automated variable selection by minimizing the negative log-likelihood plus an L1 penalty term, which shrinks the coefficients of less informative variables to zero and thereby isolates the most critical spectral features. 
The objective function \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(J(\\beta)\\)\u003c/span\u003e\u003c/span\u003e to be minimized is defined as:\u003cdiv id=\"Equ5\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ5\" name=\"EquationSource\"\u003e\n$$J\\left(\\beta\\right)=-\\left[\\frac{1}{N}\\sum_{i=1}^{N}{\\ell}_{i}\\left(\\beta\\right)\\right]+\\lambda\\sum_{j=1}^{p}\\left|{\\beta}_{j}\\right|$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e5\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\ell}_{i}(\\beta)\\)\u003c/span\u003e\u003c/span\u003e is the log-likelihood of the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(i\\)\u003c/span\u003e\u003c/span\u003eth observation under the multinomial model, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(N\\)\u003c/span\u003e\u003c/span\u003e is the total number of observations, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\\)\u003c/span\u003e\u003c/span\u003e is the number of predictors (the vegetation indices (VIs) across multiple survey dates), \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\beta\\)\u003c/span\u003e\u003c/span\u003e denotes the regression coefficients, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\lambda\\)\u003c/span\u003e\u003c/span\u003e is the regularization parameter that controls model sparsity. 
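\u003c/p\u003e \u003cp\u003eA short proximal-gradient loop is enough to minimize Eq. (5), and it shows what the penalty does. Each iteration takes a gradient step on the mean negative log-likelihood and then soft-thresholds the coefficients by step times lambda, which sets weak predictors exactly to zero. This is a hypothetical Python sketch, not the glmnet implementation: glmnet uses cyclical coordinate descent along a lambda path. The step size, the iteration count, and the synthetic data are all assumptions.\u003c/p\u003e

```python
import numpy as np


def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)           # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def objective(X, y, B, b, lam):
    # J(beta) of Eq. (5): mean negative multinomial log-likelihood + lam * sum |beta|
    P = softmax(X @ B + b)
    nll = -np.mean(np.log(P[np.arange(len(y)), y]))
    return nll + lam * np.abs(B).sum()


def lasso_multinomial(X, y, lam, n_classes, step=0.1, n_iter=2000):
    # Proximal gradient (ISTA): gradient step on the smooth part, then soft-thresholding.
    # X: (N, p) standardized predictors; y: integer levels 0..K-1.
    # The intercepts b are left unpenalized, as is usual for the Lasso.
    N, p = X.shape
    Y = np.eye(n_classes)[y]                        # one-hot targets (N, K)
    B = np.zeros((p, n_classes))
    b = np.zeros(n_classes)
    for _ in range(n_iter):
        R = softmax(X @ B + b) - Y                  # residuals of the softmax model
        B = B - step * (X.T @ R) / N
        b = b - step * R.mean(axis=0)
        B = np.sign(B) * np.maximum(np.abs(B) - step * lam, 0.0)   # L1 proximal step
    return B, b


# Synthetic example: 6 candidate predictors, only the first two carry signal
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)
W = np.array([[2.0, 0.0, -2.0], [0.0, 2.0, -2.0]])
y = np.argmax(X[:, :2] @ W + rng.normal(scale=0.5, size=(300, 3)), axis=1)

B, b = lasso_multinomial(X, y, lam=0.05, n_classes=3)
importance = np.abs(B).max(axis=1)       # largest |coefficient| of each predictor over levels
print(np.round(importance, 3))
```

\u003cp\u003eRanking predictors by their largest absolute coefficient over the three levels is one way to produce a top-six list of the kind described next. With a very large lambda, every coefficient in this sketch is shrunk to zero, which is the sparsity behaviour the penalty is meant to provide.\u003c/p\u003e \u003cp\u003e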
The optimal value of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\lambda\\)\u003c/span\u003e\u003c/span\u003e was determined via 10-fold cross-validation as \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\lambda}_{min}\\)\u003c/span\u003e\u003c/span\u003e, the value that minimized the mean cross-validated error. From the resulting sparse model, the six VIs with the largest absolute coefficients across the three growth levels were identified. These prioritized indices were used as the final explanatory variables for the subsequent comparative evaluation of the machine learning models.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eMachine Learning Model Construction and Evaluation\u003c/h3\u003e\n\u003cp\u003eTo predict the growth categories, three machine learning algorithms were implemented and their classification performance was compared: Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Support Vector Machine (SVM). The growth levels (levels 0, 1, and 2) served as categorical target variables, and the top six vegetation indices identified via Lasso regularization served as explanatory variables.\u003c/p\u003e \u003cp\u003eThe augmented dataset was partitioned into training (80%) and validation (20%) sets. To ensure model stability and tune the hyperparameters, 5-fold cross-validation was applied within the training set. For the SVM model, features were scaled before training so that differences in the ranges of the spectral indices would not bias the support vector optimization. 
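\u003c/p\u003e \u003cp\u003eThe partitioning and scaling steps can be sketched as follows. This hypothetical Python illustration does three things. It draws a random 80/20 split of row indices. It builds 5 tuning folds from the training rows only. It standardizes features using statistics from the fitting rows, so validation data never influence the scaling. The feature matrix is random and only matches the shape of the augmented dataset (5,309 rows by 6 VIs).\u003c/p\u003e

```python
import numpy as np


def train_validation_split(n, train_frac=0.8, seed=0):
    # Random 80/20 partition of row indices
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(round(train_frac * n))
    return idx[:n_train], idx[n_train:]


def kfold_splits(train_idx, k=5, seed=0):
    # k roughly equal folds drawn from the training rows only (hyperparameter tuning)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(train_idx), k)
    for i in range(k):
        fit_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield fit_idx, folds[i]


def standardize(X_fit, X_apply):
    # z-scores from the mean and SD of the fitting rows only, so statistics
    # of held-out rows never leak into SVM training
    mu = X_fit.mean(axis=0)
    sd = X_fit.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)
    return (X_fit - mu) / sd, (X_apply - mu) / sd


# Hypothetical feature matrix: 5,309 rows x 6 selected VIs (random values)
rng = np.random.default_rng(42)
X = rng.uniform(0.1, 0.9, size=(5309, 6))
train_idx, val_idx = train_validation_split(len(X))
print(len(train_idx), len(val_idx))                 # 4247 1062
X_train_z, X_val_z = standardize(X[train_idx], X[val_idx])
fold_sizes = [len(tune_idx) for _, tune_idx in kfold_splits(train_idx)]
print(fold_sizes)                                   # [850, 850, 849, 849, 849]
```

\u003cp\u003eEvery pixel in a 1.5 m buffer inherits the label of its sampling point. A purely random pixel-level split can therefore put neighbouring, near-identical pixels in both sets. A stricter alternative is to split the 46 sampling-point IDs with the same function and then expand each side to its pixels, which better estimates performance at unseen locations. This is offered as a design option only.\u003c/p\u003e \u003cp\u003e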
The predictive performance was evaluated within a multiclass classification framework using four metrics widely used in remote sensing and machine learning: accuracy, precision, recall, and the F1-score (Powers, \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2011\u003c/span\u003e; Sokolova \u0026amp; Lapalme, \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e2009\u003c/span\u003e). For a given growth level \u003cem\u003ek\u003c/em\u003e, the following counts were derived from the confusion matrix: true positives (\u003cem\u003eTP\u003c/em\u003e\u003csub\u003e\u003cem\u003ek\u003c/em\u003e\u003c/sub\u003e), instances of level \u003cem\u003ek\u003c/em\u003e correctly classified as \u003cem\u003ek\u003c/em\u003e; false positives (\u003cem\u003eFP\u003c/em\u003e\u003csub\u003e\u003cem\u003ek\u003c/em\u003e\u003c/sub\u003e), instances of another level incorrectly classified as \u003cem\u003ek\u003c/em\u003e; false negatives (\u003cem\u003eFN\u003c/em\u003e\u003csub\u003e\u003cem\u003ek\u003c/em\u003e\u003c/sub\u003e), instances of level \u003cem\u003ek\u003c/em\u003e incorrectly classified as another level; and true negatives (\u003cem\u003eTN\u003c/em\u003e\u003csub\u003e\u003cem\u003ek\u003c/em\u003e\u003c/sub\u003e), instances of another level correctly classified as not \u003cem\u003ek\u003c/em\u003e. 
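\u003c/p\u003e \u003cp\u003eGiven the per-level counts just defined, the metrics in Eqs. (6) to (9), which follow, can be computed directly from a confusion matrix. The Python sketch below uses a made-up 3 by 3 matrix, not the results of this study. It also reports the support-weighted mean recall, which always equals overall accuracy: the sum over k of (n_k/N)(TP_k/n_k) reduces to the sum of TP_k over N. If the recall values in Table 1 are support-weighted averages, this identity would explain why they coincide with accuracy for every model.\u003c/p\u003e

```python
import numpy as np


def per_class_counts(cm):
    # cm[i, j] counts observations of true level i that were predicted as level j
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp       # predicted as k, truly another level
    fn = cm.sum(axis=1) - tp       # truly k, predicted as another level
    tn = cm.sum() - tp - fp - fn   # neither truly k nor predicted as k
    return tp, fp, fn, tn


def classification_metrics(cm):
    tp, fp, fn, tn = per_class_counts(cm)
    precision = tp / (tp + fp)                          # Eq. (7)
    recall = tp / (tp + fn)                             # Eq. (8)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (9)
    support = tp + fn                                   # true instances of each level
    return {
        'accuracy': tp.sum() / support.sum(),           # Eq. (6)
        'precision': precision,
        'recall': recall,
        'f1': f1,
        'weighted_recall': float(np.sum(support / support.sum() * recall)),
    }


# Hypothetical confusion matrix for levels 0, 1, 2 (rows = true, columns = predicted)
cm = [[50, 5, 5],
      [4, 80, 16],
      [2, 10, 128]]
m = classification_metrics(cm)
print(round(m['accuracy'], 3), round(m['weighted_recall'], 3))   # 0.86 0.86
print(np.round(m['f1'], 3))
```
\u003cp\u003e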
Based on these counts, the following evaluation metrics were calculated to quantify the predictive performance of the models across the three growth levels:\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Taba\" border=\"1\"\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(Accuracy=\\frac{\\sum_{k=1}^{n}{TP}_{k}}{N}\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003e(6)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({Precision}_{k}=\\frac{{TP}_{k}}{{TP}_{k}+{FP}_{k}}\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e(7)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({Recall}_{k}=\\frac{{TP}_{k}}{{TP}_{k}+{FN}_{k}}\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e(8)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({F1}_{k}=2\\times\\frac{{Precision}_{k}\\times{Recall}_{k}}{{Precision}_{k}+{Recall}_{k}}\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e(9)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eAccuracy is the proportion of correctly classified instances across all growth levels, where \u003cem\u003en\u003c/em\u003e is the number of classes (three: levels 0, 1, and 2) and \u003cem\u003eN\u003c/em\u003e is the total number of observations in the validation set. Precision is the proportion of instances predicted as a given growth level that truly belong to it, whereas recall is the proportion of true instances of that level that the model correctly identifies. The F1-score, the harmonic mean of precision and recall, provides a balanced measure when the two trade off against each other for a given class. All metrics were calculated for each growth level to allow a comprehensive comparison of the algorithms.\u003c/p\u003e\n\u003ch3\u003eLearning Curve Analysis\u003c/h3\u003e\n\u003cp\u003eLearning curve analyses were conducted to evaluate the influence of training data volume on classification performance and model stability. Learning curves are a standard diagnostic for monitoring the training progression and generalization of machine learning and deep learning models (Mohr \u0026amp; van Rijn, \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). The training dataset was subsampled in 10% increments; at each increment, a random subset was drawn from the training pool to retrain the RF, XGBoost, and SVM models. By calculating accuracy on both the training and validation sets at each step, we evaluated each model\u0026rsquo;s susceptibility to overfitting and its sensitivity to training data volume. 
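\u003c/p\u003e \u003cp\u003eThe stepwise subsampling procedure can be sketched as below. To keep the example dependency-free, a nearest-centroid classifier stands in for RF, XGBoost, and SVM, and the data are synthetic. Following the description above, the sketch draws a fresh random subset at each 10% step. The seed and the size of the validation set are arbitrary choices.\u003c/p\u003e

```python
import numpy as np


def fit_nearest_centroid(X, y):
    # Illustrative stand-in classifier; the study itself used RF, XGBoost and SVM
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])


def predict_nearest_centroid(model, X):
    classes, centroids = model
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]


def learning_curve(X_tr, y_tr, X_va, y_va, fractions, seed=0):
    # At each fraction, draw a fresh random subset of the training pool, refit,
    # and record accuracy on that subset and on the fixed validation set
    rng = np.random.default_rng(seed)
    results = []
    for f in fractions:
        n = max(1, int(round(f * len(y_tr))))
        idx = rng.choice(len(y_tr), size=n, replace=False)
        model = fit_nearest_centroid(X_tr[idx], y_tr[idx])
        train_acc = np.mean(predict_nearest_centroid(model, X_tr[idx]) == y_tr[idx])
        val_acc = np.mean(predict_nearest_centroid(model, X_va) == y_va)
        results.append((float(f), float(train_acc), float(val_acc)))
    return results


# Synthetic 3-level data with 6 features (not the study data)
rng = np.random.default_rng(1)
level_means = np.array([[0.0] * 6, [1.0] * 6, [2.0] * 6])
y = rng.integers(0, 3, size=1000)
X = level_means[y] + rng.normal(size=(1000, 6))
fractions = np.round(np.arange(0.1, 1.01, 0.1), 1)        # 10 %, 20 %, ..., 100 %
curve = learning_curve(X[:800], y[:800], X[800:], y[800:], fractions)
for f, tr, va in curve:
    print(f'{f:.0%} of training pool: train={tr:.3f}, validation={va:.3f}')
```

\u003cp\u003eThe analysis looks for a plateau: a narrowing gap between training and validation accuracy as the fraction grows, together with a flattening validation curve.\u003c/p\u003e \u003cp\u003e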
This analysis allowed us to determine whether the augmented dataset was sufficient to reach a performance plateau or whether further data acquisition would be required to improve predictive stability.\u003c/p\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eModel Interpretability via SHAP Analysis\u003c/h2\u003e \u003cp\u003eIn contrast to traditional statistical methods, machine learning models such as RF, XGBoost, and SVM are often considered \u0026ldquo;black boxes,\u0026rdquo; which makes the logic of their predictions difficult to interpret (Benos et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). To address this limitation and ensure agronomic interpretability, we employed the SHAP (SHapley Additive exPlanations) framework (Lundberg \u0026amp; Lee, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). SHAP is a game-theoretic approach that assigns each feature an importance value for a specific prediction, providing a unified measure of feature contributions that satisfies the properties of local accuracy, missingness, and consistency. We applied the Kernel SHAP method and visualized the results using the \u0026lsquo;shapviz\u0026rsquo; package in R. Because Kernel SHAP is model-agnostic, it enables a consistent comparison of feature importance across algorithms. SHAP summary plots were used to visualize both the overall importance and the direction of effect of each vegetation index on the classification of barley growth levels, allowing us to assess whether each model\u0026rsquo;s decisions were consistent with established physiological principles of crop development.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eField-wide Spatial Prediction and Mapping\u003c/h2\u003e \u003cp\u003eTo visualize the spatial distribution of growth levels, a systematic hexagonal grid (5 m side length) was generated across the study area using QGIS. 
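\u003c/p\u003e \u003cp\u003eThe geometry of the 5 m hexagonal grid can be checked numerically. Each cell covers (3√3/2)s² ≈ 64.95 m², and all six neighbouring centroids lie √3·s ≈ 8.66 m away. That equal spacing is the property cited in the Methods for preferring hexagons over squares. The Python sketch assumes flat-topped hexagons in a projected, metre-based coordinate system; the orientation and parameters of the QGIS grid tool may differ.\u003c/p\u003e

```python
import math


def hexagon_area(side_m):
    # Area of a regular hexagon with side s: (3 * sqrt(3) / 2) * s**2
    return 3 * math.sqrt(3) / 2 * side_m ** 2


def hex_centres(n_cols, n_rows, side_m):
    # Flat-topped layout: columns 1.5 s apart, rows sqrt(3) s apart,
    # odd columns shifted up by half a row
    dx, dy = 1.5 * side_m, math.sqrt(3) * side_m
    return [(c * dx, r * dy + (dy / 2 if c % 2 else 0.0))
            for c in range(n_cols) for r in range(n_rows)]


side = 5.0
centres = hex_centres(4, 4, side)
print(round(hexagon_area(side), 2))        # 64.95 (m2 per cell)

# Distances from an interior centre (column 1, row 1) to every other centre
cx, cy = centres[1 * 4 + 1]
dists = sorted(math.dist((cx, cy), p) for p in centres if p != (cx, cy))
print([round(d, 2) for d in dists[:6]])    # six equal distances of sqrt(3) * s, about 8.66 m
```
\u003cp\u003e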
The mean values of the VIs selected via Lasso regularization were extracted for each hexagonal cell. The trained machine learning models were then applied to these spatial datasets to predict and map the growth category (level 0, 1, or 2) of each grid cell across the experimental fields.\u003c/p\u003e \u003c/div\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eClassification of barley samples based on quality traits\u003c/h2\u003e \u003cp\u003eThe classification results derived from PCA and k-means clustering of the barley quality metrics (Supplementary Fig. S2) are illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eA. PC1 and PC2 accounted for 60.47% and 16.05% of the total variance, respectively, together capturing 76.52% of the variance in the dataset. The loadings showed that TKW, grain thickness, and grain width were associated with the negative direction of PC1, whereas grain length loaded strongly in the positive direction of PC2 (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eB, C). Figure\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eD compares barley quality traits among the three identified clusters. The red cluster was characterized by high vitreousness but significantly lower grain thickness, grain width, and TKW than the other clusters. Although the blue and green clusters did not differ substantially in most quality traits, the green cluster had longer grains and slightly lower moisture content. 
Based on these findings, we defined the clusters as follows: the red cluster as Level 0 (poor growth; predominantly immature), the blue cluster as Level 1 (suboptimal growth; slightly higher moisture and shorter grains), and the green cluster as Level 2 (optimal growth).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eTemporal changes in vegetation indices and variable selection using Lasso regression\u003c/h2\u003e \u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eA illustrates the temporal variations in the VIs for each growth category in the dataset expanded through circular buffer augmentation. From April 23 to May 24, Level 0 consistently exhibited lower NDVI, NDRE, and GNDVI values than Levels 1 and 2, with the largest differences observed on April 23 and May 10. By June 1, the indices for Levels 1 and 2 had declined, reflecting the loss of chlorophyll associated with senescence and maturation, whereas Level 0 retained higher index values than Levels 1 and 2 on this final date. The relative importance of the variables, based on the Lasso regression coefficients, is presented in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eB. For Level 0, the most important predictors were GNDVI, NDVI, and NDRE on May 24. For Level 1, GNDVI on June 1, together with GNDVI and NDVI on May 10, had high coefficients. For Level 2, GNDVI (April 23 to May 24) and NDVI (May 24) were identified as the primary indices. 
Based on these results, we selected six features (NDVI, GNDVI, and NDRE on May 24 and June 1) as the final explanatory variables for the subsequent machine learning models.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eMachine learning model validation and field-scale growth prediction\u003c/h2\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e summarizes the predictive performance of the RF, XGBoost, and SVM models. Across all evaluation metrics, the performance ranking was XGBoost\u0026thinsp;\u0026gt;\u0026thinsp;RF\u0026thinsp;\u0026gt;\u0026thinsp;SVM. Both XGBoost and RF showed high predictive capability, with accuracy exceeding 0.9, whereas SVM performed slightly worse (accuracy 0.869). Learning curves (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e) were used to assess whether model performance could improve with additional data or had reached a plateau. XGBoost and RF exhibited narrower gaps between training and validation accuracy than SVM, suggesting greater robustness to overfitting and to limited training data. 
As the training set grew, RF eventually outperformed XGBoost, indicating better scalability on this dataset.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eClassification performance of each machine learning model\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMetric\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRF\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eXGBoost\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.903\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.912\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.869\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.901\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.916\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.872\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.903\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.912\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.869\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eF1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.902\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.914\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.870\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"4\"\u003eEach value is a score between 0 and 1.\u003c/td\u003e\u003c/tr\u003e \u003ctr\u003e\u003ctd colspan=\"4\"\u003eAbbreviations: RF, Random Forest; XGBoost, eXtreme Gradient Boosting; SVM, Support Vector Machine.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eTo verify the stability of feature importance and decision-making patterns, we compared SHAP summary plots across the three algorithms, RF, XGBoost, and SVM (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e; Supplementary Figs. S3 and S4). The models agreed closely on the most influential spectral indices. 
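\u003c/p\u003e \u003cp\u003eSHAP summary plots are built from Shapley values, which attribute an individual prediction to its input features by averaging each feature's marginal contribution over all coalitions of the remaining features (Lundberg and Lee, 2017). As a minimal sketch under simplifying assumptions, the standard-library Python code below computes exact Shapley values for a hypothetical three-feature scoring function by enumerating every coalition and replacing absent features with a single baseline vector. SHAP implementations typically average over a background dataset and use model-specific algorithms, such as TreeSHAP for tree ensembles, so this illustrates the underlying quantity rather than the analysis behind the SHAP figures; the model, coefficients, feature names, and values are all invented.\u003c/p\u003e

```python
from itertools import combinations
from math import exp, factorial

def toy_model(v):
    # Hypothetical class score (logistic of a linear term plus one interaction);
    # the coefficients are invented for illustration only.
    z = 4.0 * v['vi_1'] - 2.0 * v['vi_2'] + 1.5 * v['vi_1'] * v['vi_3'] - 1.0
    return 1.0 / (1.0 + exp(-z))

def shapley_values(model, x, baseline):
    # Exact Shapley values: a feature absent from a coalition takes its baseline value.
    names = list(x)
    n = len(names)

    def coalition_value(coalition):
        v = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(v)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for each coalition S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (coalition_value(s | {i}) - coalition_value(s))
        phi[i] = total
    return phi

x = {'vi_1': 0.82, 'vi_2': 0.55, 'vi_3': 0.40}
baseline = {'vi_1': 0.60, 'vi_2': 0.50, 'vi_3': 0.35}
phi = shapley_values(toy_model, x, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f'{name}: {value:+.4f}')
print('sum of contributions:', round(sum(phi.values()), 6))
print('f(x) - f(baseline):  ', round(toy_model(x) - toy_model(baseline), 6))
```

\u003cp\u003eThe last two printed values agree because Shapley values are additive: the contributions sum to the difference between the model output for the sample and for the baseline. Exact enumeration grows exponentially with the number of features, which remains tractable for six features but not for large feature sets.\u003c/p\u003e \u003cp\u003e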
Across all three classifiers, NDVI on June 1 emerged as the dominant predictor for distinguishing growth levels, followed by GNDVI and NDRE on May 24. The directional effects of these features were stable across model architectures. For Level 0, lower values of these indices yielded positive SHAP values, increasing the probability of a poor growth classification. Conversely, for Level 2, higher values contributed positively to the predicted probability, consistent with expected physiological trends. Although the magnitude and distribution of SHAP values varied slightly among models, with SVM showing a more discrete distribution than the continuous spread of the tree-based ensembles (RF and XGBoost), the overall feature rankings and the separation between high and low feature values remained consistent.\u003c/p\u003e \u003cp\u003eBased on the learning curve analysis and its high predictive accuracy, the RF model was selected to generate growth-level maps for the entire study site. Figure\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e shows the resulting spatial distribution, displayed on a systematic hexagonal grid in QGIS. Inter-field variation was markedly greater than intra-field heterogeneity. The two western fields were classified exclusively as Levels 1 and 2, suggesting favorable growth conditions throughout. In contrast, the eastern fields showed substantial developmental problems, with Level 0 (poor growth) zones estimated to cover more than 50% of some areas.\u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eIn this study, a reliable classification model for food-grade barley growth was developed by integrating UAV-derived multispectral imagery with ground-truth quality parameters. The methodology followed a four-stage analytical workflow. 
First, cluster analysis of grain quality traits provided an objective basis for defining three distinct growth levels, linking spectral patterns to agronomic reality. Second, Lasso regression identified the VIs recorded from late May to early June as the most influential predictors, streamlining feature selection. Third, a comparison of machine learning algorithms showed that the RF model offered the best balance between accuracy and generalizability, a finding supported by the learning curve and SHAP analyses. Finally, applying this model to field-scale mapping visualized the spatial heterogeneity of the barley fields, demonstrating its capacity to delineate inter-field differences in crop maturation. These findings highlight the potential of combining regularized variable selection with ensemble learning to enhance precision management in barley production.\u003c/p\u003e \u003cp\u003ePCA of the barley samples showed that TKW, grain thickness, and grain width were the primary drivers of phenotypic variance, collectively accounting for more than 70% of the variation captured by the first two principal components. These results agree with previous findings that morphological traits such as grain weight and size are stable, highly heritable characteristics that fundamentally determine yield formation (Paire et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Sakamoto \u0026amp; Matsuoka, \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e2004\u003c/span\u003e). 
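\u003c/p\u003e \u003cp\u003eThe three growth levels were derived by clustering grain quality traits. To illustrate that clustering step only, the following standard-library Python sketch standardizes a small table of hypothetical trait values (TKW, grain thickness, grain width, and vitreousness; all numbers invented) and groups the samples with Lloyd's k-means algorithm. It is a simplification under stated assumptions, not the authors' code: it clusters the standardized traits directly without the PCA step, and it initializes centroids from the first k rows so that the result is reproducible.\u003c/p\u003e

```python
from math import sqrt

def standardize(rows):
    # z-score each column so traits in different units contribute comparably.
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def nearest(point, centroids):
    # Index of the centroid with the smallest squared Euclidean distance.
    return min(range(len(centroids)), key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])))

def kmeans(points, k, max_iter=100):
    # Lloyd's algorithm; the first k rows serve as initial centroids (deterministic).
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(col) for col in zip(*members)]
    return labels, centroids

# Hypothetical samples: TKW (g), thickness (mm), width (mm), vitreousness (%).
rows = [
    [28.0, 2.3, 3.2, 60.0], [36.0, 2.7, 3.6, 25.0], [44.0, 3.0, 3.9, 10.0],
    [29.0, 2.4, 3.3, 55.0], [35.0, 2.6, 3.5, 30.0], [45.0, 3.1, 4.0, 8.0],
    [27.5, 2.2, 3.1, 65.0], [37.0, 2.7, 3.7, 22.0], [43.0, 3.0, 3.9, 12.0],
]
labels, _ = kmeans(standardize(rows), k=3)
print(labels)
```

\u003cp\u003eThe printed labels are [0, 1, 2, 0, 1, 2, 0, 1, 2]. Cluster indices from k-means are arbitrary, so mapping them to Levels 0, 1, and 2 requires inspecting each cluster's trait profile, for example by identifying the cluster with high vitreousness and low TKW as Level 0.\u003c/p\u003e \u003cp\u003e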
From a morphological perspective, the prominence of these size traits supports the physiological concept that final grain weight is determined jointly by grain volume (potential size) and the efficiency of endosperm starch accumulation during grain filling (Sakamoto \u0026amp; Matsuoka, \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e2004\u003c/span\u003e). Furthermore, the Level 0 cluster was characterized by high vitreousness, low TKW, and reduced grain dimensions. Minimizing vitreousness is critical for food-grade barley, because previous research has shown that highly vitreous grain is unsuitable for human consumption (Okiyama et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Whereas vitreousness in durum wheat typically indicates high protein content and superior milling quality, in barley it reflects a failure of endosperm modification due to immaturity or terminal stress. The vitreous appearance of Level 0 grains suggests that they did not complete the developmental transition from the proteinaceous matrix to the starchy endosperm (Lachutta \u0026amp; Jankowski, \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eSignificant differences in the VIs were observed between April 23 and May 10; these indices serve as indirect proxies for chlorophyll concentration and vegetative biomass during barley ripening. The divergence in spectral indices observed on June 1 served as a key biophysical indicator of maturation quality: on this date, Level 0 maintained elevated NDVI, NDRE, and GNDVI values, whereas Levels 1 and 2 showed a marked decline.\u003c/p\u003e \u003cp\u003eIn barley, reaching optimal maturity (Levels 1 and 2) is accompanied by a decline in VI values, indicative of natural chlorophyll degradation and canopy senescence. 
These physiological processes are essential for the efficient remobilization of nutrients into the developing grain (Lekhana et al., \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). Conversely, sustained greenness at Level 0 indicates a delayed maturation phenotype, in which plants fail to enter the terminal drying phase in synchrony with the rest of the field. Late-season spectral data effectively identified zones of incomplete ripening, where grain texture was likely compromised (Materazzi, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). Capturing such phenological inflection points is critical for improving the precision of grain quality mapping (Adak et al., \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Overall, our findings underscore the importance of monitoring terminal growth stages via remote sensing, consistent with these previous studies.\u003c/p\u003e \u003cp\u003eThe XGBoost and RF models outperformed SVM, achieving classification accuracies above 0.9. Although growth levels comprised only three categorical classes, the VIs varied substantially across sampling locations. This variability likely favors tree-based ensemble models, which can flexibly capture complex, nonlinear relationships. These findings agree with those of Deng et al. (\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2025\u003c/span\u003e), who confirmed that ensemble models effectively handle the high-dimensional interactions inherent in agricultural datasets. Moreover, the learning curve analysis indicated that RF ultimately outperformed XGBoost in this study. Although XGBoost often surpasses RF, the opposite trend was observed here. This discrepancy likely arose from the substantial variance in growth levels and the marked spectral fluctuations across sites. 
Gradient boosting algorithms fit each successive tree to the loss gradients of the current ensemble, so poorly predicted instances, which have the largest gradients, dominate the fitting of later trees (Chen \u0026amp; Guestrin, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2016\u003c/span\u003e). Consequently, these models may be prone to overfitting and reduced generalization on datasets with substantial noise or outliers (H. Zhao et al., \u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). In contrast, RF averages many trees grown on bootstrap samples, which makes it comparatively robust to data variability and outliers. These attributes likely supported its superior generalization, particularly given the limited sample size, and together with the learning curves identify RF as the most suitable model for this analysis.\u003c/p\u003e \u003cp\u003eSHAP analysis identified NDVI on June 1 as the dominant predictor across all evaluated algorithms. This agrees with Adak et al. (\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2021\u003c/span\u003e), who showed in maize that late-season VIs associated with post-anthesis senescence are critical determinants of grain filling and final yield. Similarly, a study of yield predictors for winter wheat in Kansas, USA, found that NDVI during the terminal growth stages was the most significant predictor of productivity (Maranh\u0026atilde;o et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). Furthermore, recent studies on barley have emphasized the diagnostic importance of the VI curve, particularly from its peak to the onset of senescence (Y. Zhao et al., \u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). This consistency with prior research suggests that our models relied on senescence timing as the primary biophysical signal for differentiating growth levels in barley. 
This physiological phase is critical for accurately predicting flowering and maturation. The flag-leaf stage marks the completion of canopy development, whereas the period from flowering to maturity directly determines grain number and filling efficiency (Al-Ajlouni et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Ugarte et al., \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e2007\u003c/span\u003e). Consequently, we suggest that late-season NDVI serves as a reliable proxy for barley growth performance, as it captures both grain-filling efficiency and stress-induced senescence.\u003c/p\u003e \u003cp\u003eSpatial mapping using the RF model in QGIS revealed that inter-field variability in barley growth was markedly more pronounced than intra-field heterogeneity. To capture these patterns, a hexagonal grid system was used. Hexagonal tessellation is widely recognized for representing adjacency relationships more effectively than conventional square grids, minimizing geometric distortion of continuous ecological gradients while reducing edge effects and sampling bias (Littidej et al., \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). In a square grid, diagonal neighbors lie farther away than edge-sharing neighbors, whereas each hexagonal cell has six neighbors whose centers are all equidistant from its own. Consequently, mapping growth status on this grid is well suited to delineating localized areas of poor development, specifically those classified as Level 0. The western fields were uniformly classified as Levels 1 and 2, whereas the eastern fields showed pronounced spatial variation. Notably, one eastern field was estimated to have more than 50% of its area classified as Level 0. These high-proportion Level 0 areas corresponded to locations where flood-induced lodging had been observed since late April (Supplementary Fig. S5). These findings suggest that the RF model can effectively characterize crop growth status at the field scale. 
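\u003c/p\u003e \u003cp\u003eThe equidistance property of hexagonal cells, and the way cell counts translate into area shares, can be checked with a short calculation. The standard-library Python sketch below places pointy-top hexagons in axial coordinates, confirms that all six neighbor centers lie at the same distance, contrasts this with the two neighbor distances of a square grid, and computes the share of cells per predicted growth level. The coordinate convention, function names, and per-cell predictions are assumptions for illustration and do not reproduce the QGIS workflow used in this study.\u003c/p\u003e

```python
from collections import Counter
from math import sqrt

# Axial-coordinate offsets (q, r) of the six neighbors of a hexagonal cell.
HEX_NEIGHBORS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]
# Offsets of the eight neighbors of a square cell (edge plus diagonal neighbors).
SQUARE_NEIGHBORS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

def hex_center(q, r, size=1.0):
    # Center of a pointy-top hexagon; size is the center-to-vertex distance.
    return (size * sqrt(3) * (q + r / 2), size * 1.5 * r)

def hex_neighbor_distances(size=1.0):
    x0, y0 = hex_center(0, 0, size)
    centers = [hex_center(q, r, size) for q, r in HEX_NEIGHBORS]
    return [sqrt((x - x0) ** 2 + (y - y0) ** 2) for x, y in centers]

def square_neighbor_distances(cell=1.0):
    return [cell * sqrt(dx * dx + dy * dy) for dx, dy in SQUARE_NEIGHBORS]

def level_shares(cell_levels):
    # Fraction of cells per predicted growth level; this equals the area
    # fraction only when every cell has the same area (no clipped edge cells).
    counts = Counter(cell_levels)
    total = sum(counts.values())
    return {level: counts[level] / total for level in sorted(counts)}

print(sorted({round(d, 6) for d in hex_neighbor_distances()}))     # [1.732051]
print(sorted({round(d, 6) for d in square_neighbor_distances()}))  # [1.0, 1.414214]
field = [0, 0, 1, 0, 2, 0, 1, 0, 0, 1]  # hypothetical per-cell predicted levels
print(level_shares(field))  # {0: 0.6, 1: 0.3, 2: 0.1}
```

\u003cp\u003eBecause all cells in a regular hexagonal grid have the same area, the share of cells classified as Level 0 approximates the share of field area at Level 0; cells clipped by field boundaries would instead require area weighting.\u003c/p\u003e \u003cp\u003e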
In practice, identifying Level 0 zones before harvest could enable selective harvesting, preventing contamination of high-quality lots with inferior grain. Furthermore, the correspondence between early-season lodging and Level 0 classification suggests that this model may serve as a diagnostic tool for drainage management. Visualizing these spatial patterns could help identify areas prone to chronic flooding, informing targeted infrastructure improvements or variable-rate interventions aimed at alleviating environmental stress during the maturation period.\u003c/p\u003e \u003cp\u003eThis study has several limitations. First, the analysis was based on a single-year dataset and does not account for inter-annual meteorological variability, including variations in precipitation and temperature. Developing a more adaptable and robust framework will require longitudinal data collection and the integration of environmental variables into the predictive models, and multi-year validation is needed to rigorously assess model reliability across growing seasons. Second, previous research has shown that machine learning-based yield predictions often lack direct causal links to agronomic interventions, such as fertilization regimes, making the underlying drivers difficult to interpret (Kakimoto et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Recent advances in causal machine learning offer ways to better quantify uncertainty and clarify complex cause-and-effect relationships (Tanaka \u0026amp; Yokoyama, 2023). Therefore, future research should incorporate causal frameworks to improve interpretability and enhance the practical utility of remote sensing in precision agriculture.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThis study demonstrated that integrating grain quality metrics provides a sound basis for classifying barley growth for food-grade production. 
Furthermore, combining UAV-derived multispectral imagery with machine learning models enabled highly accurate predictions of these growth levels. By capturing preharvest conditions via remote sensing, this framework enables field-scale assessment of crop status and could support site-specific harvest scheduling based on localized maturation patterns. Building on these models, future efforts should incorporate longitudinal multi-year datasets and meteorological variables to improve generalizability across seasons. Such advancements will enhance the model's adaptability to inter-annual climatic fluctuations and support more resilient and precise management strategies for food-grade barley production.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003cstrong\u003eCompeting interests\u003c/strong\u003e \u003cp\u003eThe authors have no relevant financial or non-financial interests to disclose.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e \u003cp\u003eThe authors received no specific funding for this work.\u003c/p\u003e\u003ch2\u003eAuthor Contributions\u003c/h2\u003e\u003cp\u003eK.M. and O.W. conceived and designed the study. K.M. collected the data, analyzed and interpreted the results, and drafted the manuscript. O.W. and R.Y. supported the statistical analyses. 
All authors read and approved the final manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgement\u003c/h2\u003e\u003cp\u003eThe authors would like to express their sincere gratitude to the farmers and the staff of the Industrial Promotion Division at the Miyada Village Office in Japan for their significant cooperation in conducting this study.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe datasets generated and/or analyzed during the current study are not publicly available due to the privacy of the participating farmers but are available from the corresponding author on reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAdak, A. et al. Temporal Vegetation Indices and Plant Height from Remotely Sensed Imagery Can Predict Grain Yield and Flowering Time Breeding Value in Maize via Machine Learning Regression. \u003cem\u003eRemote Sens.\u003c/em\u003e \u003cb\u003e13\u003c/b\u003e (11), 2141. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/rs13112141\u003c/span\u003e\u003cspan address=\"10.3390/rs13112141\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAl-Ajlouni, Z. et al. Impact of Pre-Anthesis Water Deficit on Yield and Yield Components in Barley (Hordeum vulgare L.) Plants Grown under Controlled Conditions. \u003cem\u003eAgronomy\u003c/em\u003e \u003cb\u003e6\u003c/b\u003e (2), 33. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/agronomy6020033\u003c/span\u003e\u003cspan address=\"10.3390/agronomy6020033\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAli, M. M., Al-Ani, A., Eamus, D. \u0026amp; Tan, D. K. Y. 
Leaf nitrogen determination using non-destructive techniques\u0026ndash;A review. \u003cem\u003eJ. Plant Nutr.\u003c/em\u003e \u003cb\u003e40\u003c/b\u003e (7), 928\u0026ndash;953. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1080/01904167.2016.1143954\u003c/span\u003e\u003cspan address=\"10.1080/01904167.2016.1143954\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAoe, S. et al. Effects of high β-glucan barley on visceral fat obesity in Japanese individuals: A randomized, double-blind study. \u003cem\u003eNutrition\u003c/em\u003e \u003cb\u003e42\u003c/b\u003e, 1\u0026ndash;6. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.nut.2017.05.002\u003c/span\u003e\u003cspan address=\"10.1016/j.nut.2017.05.002\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAtkinson, F. S., Brand-Miller, J. C., Foster-Powell, K., Buyken, A. E. \u0026amp; Goletzke, J. International tables of glycemic index and glycemic load values 2021: a systematic review. \u003cem\u003eAm. J. Clin. Nutr.\u003c/em\u003e \u003cb\u003e114\u003c/b\u003e (5), 1625\u0026ndash;1632. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/ajcn/nqab233\u003c/span\u003e\u003cspan address=\"10.1093/ajcn/nqab233\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBenos, L. et al. Machine Learning in Agriculture: A Comprehensive Updated Review. \u003cem\u003eSensors\u003c/em\u003e \u003cb\u003e21\u003c/b\u003e (11), 3758. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/s21113758\u003c/span\u003e\u003cspan address=\"10.3390/s21113758\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBirch, C. P. D., Oom, S. P. \u0026amp; Beecham, J. A. Rectangular and hexagonal grids used for observation, experiment and simulation in ecology. \u003cem\u003eEcol. Model.\u003c/em\u003e \u003cb\u003e206\u003c/b\u003e (3\u0026ndash;4). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ecolmodel.2007.03.041\u003c/span\u003e\u003cspan address=\"10.1016/j.ecolmodel.2007.03.041\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2007).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen, T. \u0026amp; Guestrin, C. XGBoost: A Scalable Tree Boosting System. \u003cem\u003eProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD \u0026rsquo;16\u003c/em\u003e, 785\u0026ndash;794. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1145/2939672.2939785\u003c/span\u003e\u003cspan address=\"10.1145/2939672.2939785\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCossani, C. M., Palta, J. \u0026amp; Sadras, V. O. Genetic yield gain between 1942 and 2013 and associated changes in phenology, yield components and root traits of Australian barley. \u003cem\u003ePlant Soil\u003c/em\u003e \u003cb\u003e480\u003c/b\u003e (1), 151\u0026ndash;163. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s11104-022-05570-7\u003c/span\u003e\u003cspan address=\"10.1007/s11104-022-05570-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDeng, L. et al. Sorghum yield prediction using UAV multispectral imaging and stacking ensemble learning in arid regions. \u003cem\u003eFrontiers in Plant Science\u003c/em\u003e, \u003cem\u003e16\u003c/em\u003e. (2025). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fpls.2025.1636015\u003c/span\u003e\u003cspan address=\"10.3389/fpls.2025.1636015\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDuffkov\u0026aacute;, R., Pol\u0026aacute;kov\u0026aacute;, L., Lukas, V. \u0026amp; Fuč\u0026iacute;k, P. The Effect of Controlled Tile Drainage on Growth and Grain Yield of Spring Barley as Detected by UAV Images, Yield Map and Soil Moisture Content. \u003cem\u003eRemote Sens.\u003c/em\u003e \u003cb\u003e14\u003c/b\u003e (19), 4959. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/rs14194959\u003c/span\u003e\u003cspan address=\"10.3390/rs14194959\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGaneva, D. et al. Remotely Sensed Phenotypic Traits for Heritability Estimates and Grain Yield Prediction of Barley Using Multispectral Imaging from UAVs. \u003cem\u003eSensors\u003c/em\u003e \u003cb\u003e23\u003c/b\u003e (11), 5008. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/s23115008\u003c/span\u003e\u003cspan address=\"10.3390/s23115008\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHabiyaremye, C. et al. Effect of Nitrogen and Seeding Rate on β-Glucan, Protein, and Grain Yield of Naked Food Barley in No-Till Cropping Systems in the Palouse Region of the Pacific Northwest. \u003cem\u003eFrontiers in Sustainable Food Systems\u003c/em\u003e, \u003cem\u003e5\u003c/em\u003e. (2021). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fsufs.2021.663445\u003c/span\u003e\u003cspan address=\"10.3389/fsufs.2021.663445\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHerzig, P. et al. Evaluation of RGB and Multispectral Unmanned Aerial Vehicle (UAV) Imagery for High-Throughput Phenotyping and Yield Prediction in Barley Breeding. \u003cem\u003eRemote Sens.\u003c/em\u003e \u003cb\u003e13\u003c/b\u003e (14), 2670. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/rs13142670\u003c/span\u003e\u003cspan address=\"10.3390/rs13142670\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHossain, A. et al. Agricultural Land Degradation: Processes and Problems Undermining Future Food Security. In S. Fahad, M. Hasanuzzaman, M. Alam, H. Ullah, M. Saeed, I. Ali Khan, \u0026amp; M. Adnan (Eds.), \u003cem\u003eEnvironment, Climate, Plant and Vegetation Growth\u003c/em\u003e (pp. 17\u0026ndash;61). Springer International Publishing. (2020). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/978-3-030-49732-3_2\u003c/span\u003e\u003cspan address=\"10.1007/978-3-030-49732-3_2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHu, J. et al. Estimation of wheat tiller density using remote sensing data and machine learning methods. \u003cem\u003eFrontiers in Plant Science\u003c/em\u003e, \u003cem\u003e13\u003c/em\u003e. (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fpls.2022.1075856\u003c/span\u003e\u003cspan address=\"10.3389/fpls.2022.1075856\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKakimoto, S., Mieno, T., Tanaka, T. S. T. \u0026amp; Bullock, D. S. Causal forest approach for site-specific input management via on-farm precision experimentation. \u003cem\u003eComput. Electron. Agric.\u003c/em\u003e \u003cb\u003e199\u003c/b\u003e, 107164. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.compag.2022.107164\u003c/span\u003e\u003cspan address=\"10.1016/j.compag.2022.107164\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKochevenko, A. et al. Identification of QTL hot spots for malting quality in two elite breeding lines with distinct tolerance to abiotic stress. \u003cem\u003eBMC Plant Biol.\u003c/em\u003e \u003cb\u003e18\u003c/b\u003e (1), 106. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s12870-018-1323-4\u003c/span\u003e\u003cspan address=\"10.1186/s12870-018-1323-4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLachutta, K. \u0026amp; Jankowski, K. J. The Quality of Winter Wheat Grain by Different Sowing Strategies and Nitrogen Fertilizer Rates: A Case Study in Northeastern Poland. \u003cem\u003eAgriculture\u003c/em\u003e \u003cb\u003e14\u003c/b\u003e (4), 552. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/agriculture14040552\u003c/span\u003e\u003cspan address=\"10.3390/agriculture14040552\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLekhana, M. V. et al. Effect of terminal heat stress on stay-green and senescence process could explain genetic variation in grain yield and nutritional profile in wheat (Triticum aestivum L). \u003cem\u003ePlant. Growth Regul.\u003c/em\u003e \u003cb\u003e105\u003c/b\u003e (5), 1545\u0026ndash;1557. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s10725-025-01345-z\u003c/span\u003e\u003cspan address=\"10.1007/s10725-025-01345-z\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, Y., Wang, C., Zhu, J., Wang, Q. \u0026amp; Liu, P. Classification of Nitrogen-Efficient Wheat Varieties Based on UAV Hyperspectral Remote Sensing. \u003cem\u003ePlants\u003c/em\u003e, \u003cem\u003e14\u003c/em\u003e(13), 1908. (2025). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/plants14131908\u003c/span\u003e\u003cspan address=\"10.3390/plants14131908\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLittidej, P., Pumhirunroj, B. \u0026amp; Slack, D. An Alternative Model for Predicting Rubber Yield in High-Density Plots in Ecological Areas Adjacent to the Mekong River Using Forest-Based Classification and Regression on a Hexagonal Grid. \u003cem\u003eIEEE Access.\u003c/em\u003e \u003cb\u003e13\u003c/b\u003e, 200028\u0026ndash;200053. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/ACCESS.2025.3636231\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2025.3636231\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLundberg, S. \u0026amp; Lee, S. I. \u003cem\u003eA Unified Approach to Interpreting Model Predictions\u003c/em\u003e. (2017). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.1705.07874\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.1705.07874\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMaranh\u0026atilde;o, R. L. A., Caldas, M. M., Kastens, J., Watson, J. \u0026amp; Lollato, R. P. Assessing NDVI, Climate, and Management to Predict Winter Wheat Yields at Field Scale in Kansas, USA. \u003cem\u003eRemote Sens.\u003c/em\u003e \u003cb\u003e17\u003c/b\u003e (20), 3500. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/rs17203500\u003c/span\u003e\u003cspan address=\"10.3390/rs17203500\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMaterazzi, F. Sowing, Monitoring, Detecting: A Possible Solution to Improve the Visibility of Cropmarks in Cultivated Fields. \u003cem\u003eJ. Imaging\u003c/em\u003e. \u003cb\u003e11\u003c/b\u003e (3), 71. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/jimaging11030071\u003c/span\u003e\u003cspan address=\"10.3390/jimaging11030071\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMio, K. et al. A single administration of barley β-glucan and arabinoxylan extracts reduces blood glucose levels at the second meal via intestinal fermentation. \u003cem\u003eBiosci. Biotechnol. Biochem.\u003c/em\u003e \u003cb\u003e87\u003c/b\u003e (1), 99\u0026ndash;107. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/bbb/zbac171\u003c/span\u003e\u003cspan address=\"10.1093/bbb/zbac171\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMiyamoto, J. et al. Barley β-glucan improves metabolic condition via short-chain fatty acids produced by gut microbial fermentation in high fat diet fed mice. \u003cem\u003ePLOS ONE\u003c/em\u003e. \u003cb\u003e13\u003c/b\u003e (4), e0196579. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1371/journal.pone.0196579\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0196579\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMohr, F. \u0026amp; van Rijn, J. N. Learning curves for decision making in supervised machine learning: a survey. \u003cem\u003eMach. Learn.\u003c/em\u003e \u003cb\u003e113\u003c/b\u003e (11). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s10994-024-06619-7\u003c/span\u003e\u003cspan address=\"10.1007/s10994-024-06619-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMorishita, M. \u0026amp; Ishitsuka, N. Estimation of soil moisture distribution in soybean field using UAV. \u003cem\u003eJ. Japanese Agricultural Syst. Soc.\u003c/em\u003e \u003cb\u003e36\u003c/b\u003e (4), 55\u0026ndash;61. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.14962/jass.36.4_55\u003c/span\u003e\u003cspan address=\"10.14962/jass.36.4_55\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMorishita, M. \u0026amp; Ishitsuka, N. Estimation of soil properties distribution using UAV observation and machine learning. \u003cem\u003eJ. Japanese Agricultural Syst. Soc.\u003c/em\u003e \u003cb\u003e37\u003c/b\u003e (2), 21\u0026ndash;28. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.14962/jass.37.2_21\u003c/span\u003e\u003cspan address=\"10.14962/jass.37.2_21\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOkiyama, T. et al. 
Factors of Fluctuation in Glassy Grain Rate and β-Glucan Content and their Control by Fertilizing Technology in Barley Cultivar Shunrai for Barley Rice. \u003cem\u003eJapanese J. Crop Sci.\u003c/em\u003e \u003cb\u003e90\u003c/b\u003e (2), 194\u0026ndash;205. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1626/jcs.90.194\u003c/span\u003e\u003cspan address=\"10.1626/jcs.90.194\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOmotoso, A. B., Letsoalo, S., Olagunju, K. O., Tshwene, C. S. \u0026amp; Omotayo, A. O. Climate change and variability in sub-Saharan Africa: A systematic review of trends and impacts on agriculture. \u003cem\u003eJ. Clean. Prod.\u003c/em\u003e \u003cb\u003e414\u003c/b\u003e, 137487. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.jclepro.2023.137487\u003c/span\u003e\u003cspan address=\"10.1016/j.jclepro.2023.137487\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePaire, L., McCabe, C. \u0026amp; McCabe, T. Multi-model genome-wide association study on key organic naked barley agronomic, phenological, diseases, and grain quality traits. \u003cem\u003eEuphytica\u003c/em\u003e \u003cb\u003e220\u003c/b\u003e (7), 118. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s10681-024-03374-7\u003c/span\u003e\u003cspan address=\"10.1007/s10681-024-03374-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePerich, G. et al. Pixel-based yield mapping and prediction from Sentinel-2 using spectral indices and neural networks. 
\u003cem\u003eField Crops Res.\u003c/em\u003e \u003cb\u003e292\u003c/b\u003e, 108824. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.fcr.2023.108824\u003c/span\u003e\u003cspan address=\"10.1016/j.fcr.2023.108824\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePowers, D. M. W. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. \u003cem\u003eInt. J. Mach. Learn. Technol.\u003c/em\u003e \u003cb\u003e2\u003c/b\u003e (1), 37\u0026ndash;64. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2010.16061\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2010.16061\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2011).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSakamoto, T. \u0026amp; Matsuoka, M. Generating high-yielding varieties by genetic manipulation of plant architecture. \u003cem\u003eCurr. Opin. Biotechnol.\u003c/em\u003e \u003cb\u003e15\u003c/b\u003e (2), 144\u0026ndash;147. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.copbio.2004.02.003\u003c/span\u003e\u003cspan address=\"10.1016/j.copbio.2004.02.003\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2004).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSokolova, M. \u0026amp; Lapalme, G. A systematic analysis of performance measures for classification tasks. \u003cem\u003eInf. Process. Manag.\u003c/em\u003e \u003cb\u003e45\u003c/b\u003e (4), 427\u0026ndash;437. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ipm.2009.03.002\u003c/span\u003e\u003cspan address=\"10.1016/j.ipm.2009.03.002\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2009).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTeku, D. Navigating climate uncertainty: a comprehensive review of climatic variabilities and extreme events on environmental, socio-economic, and livelihood dimensions in Ethiopia with adaptation strategies. \u003cem\u003eAll Earth\u003c/em\u003e \u003cb\u003e37\u003c/b\u003e (1), 1\u0026ndash;30. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1080/27669645.2025.2514418\u003c/span\u003e\u003cspan address=\"10.1080/27669645.2025.2514418\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUgarte, C., Calderini, D. F. \u0026amp; Slafer, G. A. Grain weight and grain number responsiveness to pre-anthesis temperature in wheat, barley and triticale. \u003cem\u003eField Crops Res.\u003c/em\u003e \u003cb\u003e100\u003c/b\u003e (2), 240\u0026ndash;248. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.fcr.2006.07.010\u003c/span\u003e\u003cspan address=\"10.1016/j.fcr.2006.07.010\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2007).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYan, J. et al. Estimation of regional-scale maize plant nitrogen content based on multi-source remote sensing data. \u003cem\u003eFront. 
Plant Sci.\u003c/em\u003e \u003cb\u003e16\u003c/b\u003e \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fpls.2025.1669170\u003c/span\u003e\u003cspan address=\"10.3389/fpls.2025.1669170\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYasuda, S. Establishment and characteristics of barley varieties. \u003cem\u003eBreed. Res.\u003c/em\u003e \u003cb\u003e11\u003c/b\u003e (4), 137\u0026ndash;143. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1270/jsbbr.11.137\u003c/span\u003e\u003cspan address=\"10.1270/jsbbr.11.137\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2009).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhao, H., Liu, W., Wang, Y. \u0026amp; Wu, L. Comparative analysis of algorithmic approaches in ensemble learning: bagging vs. boosting. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e15\u003c/b\u003e (1), 34218. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41598-025-15971-0\u003c/span\u003e\u003cspan address=\"10.1038/s41598-025-15971-0\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhao, Y., Jiang, R., Brider, J., Chapman, S. \u0026amp; Potgieter, A. Characterizing wheat and barley growth and phenology using multi-spectral remote sensing for site-specific precision agriculture. \u003cem\u003ein silico Plants\u003c/em\u003e \u003cb\u003e7\u003c/b\u003e (2). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/insilicoplants/diaf013\u003c/span\u003e\u003cspan address=\"10.1093/insilicoplants/diaf013\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2025).\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Barley, UAV, Machine learning, Predict model framework, Spatial prediction","lastPublishedDoi":"10.21203/rs.3.rs-9011430/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9011430/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"(150 to 250 words) Spatial variability in barley maturation complicates preharvest classification for food-grade production and accurate conventional field-based assessments. Therefore, we classified barley growth into food-grade–oriented categories and developed a predictive framework integrating UAV-based multispectral imagery with machine learning. Grain quality metrics were first analyzed using principal component analysis and k-means clustering to define three physiologically distinct growth levels. Multi-temporal vegetation indices (NDVI, GNDVI, and NDRE) were extracted from UAV imagery, and key predictors were selected using Lasso regularization. Comparisons of Random Forest (RF), XGBoost, and Support Vector Machine models indicated that tree-based ensembles achieved high classification accuracy (\u003e0.9), with late-season NDVI identified as the most influential predictor. Spatial mapping using the RF model revealed pronounced inter-field variability, identifying zones of incomplete maturation associated with lodging. 
Overall, integrating grain quality traits with UAV-based spectral monitoring allows accurate field-scale classification of food-grade barley growth and informs site-specific harvest management.","manuscriptTitle":"Classification and Prediction of Growth Conditions in Food Barley Fields Using UAV Multispectral Images and Machine Learning Approaches","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-04-10 16:58:26","doi":"10.21203/rs.3.rs-9011430/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"d0fda75f-4501-4c9b-91ee-14c5948fd3fa","owner":[],"postedDate":"April 10th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":65945972,"name":"Biological sciences/Computational biology and bioinformatics"},{"id":65945973,"name":"Biological sciences/Ecology"},{"id":65945974,"name":"Earth and environmental sciences/Ecology"},{"id":65945975,"name":"Earth and environmental sciences/Environmental sciences"},{"id":65945976,"name":"Physical sciences/Mathematics and computing"},{"id":65945977,"name":"Biological sciences/Plant sciences"}],"tags":[],"updatedAt":"2026-04-27T08:28:26+00:00","versionOfRecord":[],"versionCreatedAt":"2026-04-10 16:58:26","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9011430","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9011430","identity":"rs-9011430","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}