Analysis of Leaf cover on Raspberry Fruits Based on Hyperspectral Techniques Combined with Machine Learning Models

Zhujun Chen, Juan Wang, Ruiqian Xi, Zhenhui Ren

Research Square. This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-4607290/v1 This work is licensed under a CC BY 4.0 License. Version 1 posted.

Abstract

The aim of this study is to explore the potential application of hyperspectral technology in detecting the problem of fruit cover in the orchard. Three types of hyperspectral data were collected using a hyperspectral instrument to cover raspberry fruits with leaves. Machine learning models were used to classify and regress covered and uncovered fruits.
The results show that hyperspectral technology can effectively differentiate fruits under different cover conditions, with spectral intensity data performing better in addressing cover issues. Random forest (RF) and multilayer perceptron (MLP) models demonstrated high accuracy in classification analysis, with MLP achieving a ROC AUC value of 0.99 on full-band data. Regression analysis also revealed a significant correlation between degree of coverage and spectral features, highlighting in particular the high explanatory power of light intensity data in predicting degree of coverage. This study not only confirms the application value of hyperspectral technology in precision agriculture, but also provides new technical support for intelligent orchard management and automated harvesting. Future research will focus on improving the generalisation ability of the models, integrating multi-source data to further improve the accuracy of coverage detection, and exploring the development of real-time monitoring and automatic control systems to achieve comprehensive intelligence in orchard management.

Subject areas: Physical sciences/Optics and photonics/Optical techniques/Spectroscopy; Biological sciences/Plant sciences/Plant ecology; Physical sciences/Mathematics and computing/Information technology

Keywords: Hyperspectral technology, Fruit prediction, Machine learning models, Precision agriculture, MLP, RF

1. Introduction

Raspberries (Rubus idaeus) are popular berries known for their delicious taste and high nutritional value (Natasa Kljajic, J. Subic, & Sredojević, 2017). Their high levels of vitamin C, fibre, antioxidants and other nutrients give raspberries significant nutritional and functional value (Mannino et al., 2022).
As a result, raspberries are in high demand on the market, not only for direct consumption, but also for the production of jams, juices, frozen foods and more(De Santis, Carbone, Garzoli, Laghezza, & Turchetti, 2022 ). Given their nutritional composition and market demand, raspberry cultivation has significant economic value and research importance in current agricultural practices, as evidenced by the increasing area and yield of raspberries(Tan, Dai, Lu, & Shi, 2020 ;Yan et al., 2020 ). As agricultural practices evolve towards more efficient and sustainable methods, the emergence of intelligent orchard management, including yield estimation, growth monitoring and fruit drop prediction, along with harvesting mechanisation to increase efficiency, is becoming critical (Zhu, Chen, Zheng, Peng, & Chen, 2024 ). A critical aspect of orchard management is the effective identification of all raspberry fruit in the orchard; however, the phenomenon of fruit being covered by leaves affects yield estimation, harvesting efficiency and overall crop quality. Therefore, research into the problem of leaf-covered raspberry fruit is of paramount importance in order to improve the efficiency and accuracy of raspberry fruit collection. Currently, machine vision technology is commonly used to identify covered fruit(Chen et al., 2022 ). With the advancement of deep learning, machine vision technology in the field of object detection has matured and can meet the current demands of smart orchard management(Dai et al., 2022 ;Wei, Wu, Ge, Yao, & Bai, 2022 ). In recent years, the development of hyperspectral analysis technology has provided many new management requirements and solutions for smart orchard management(Zhujun, Juan, Xuan, Yuhong, & Zhenhui, 2023 ). Hyperspectral analysis is expected to play a crucial role in the future smart orchard management, so it is of great importance to explore its various functional areas. 
Currently, there have been some studies on the identification of covered fruits. These studies mainly focus on using machine vision technology to identify covered fruits and improving the accuracy of identifying covered fruits by upgrading data acquisition equipment(Mirbod, Choi, Heinemann, Marini, & He, 2023 ) or optimising algorithms(Kang & Chen, 2020 ). However, there are limitations to machine vision technology for identifying leaf-covered fruits, including the effect of cover angle on the accuracy of identifying covered fruits, the inability to identify small fruits that are completely covered by leaves, and the potential consumption of significant computational resources and time in processing image information compared to spectral data processing(E., D., & R., 2009). In addition, machine vision processing needs to take into account factors such as lighting, angles and canopy cover, adding to the complexity of the processing. Hyperspectral technology is considered a potentially powerful tool for improving orchard management practices(Barbedo, 2023 ;Chen et al., 2024 ;Zhujun et al., 2023 ). Unlike traditional spectral methods, hyperspectral analysis technology provides a comprehensive and detailed examination of the electromagnetic spectrum, allowing precise identification and quantification of biochemical and physiological changes in plants(Barbedo, 2023 ;Chang, 2022 ). The versatility and expanding applications of hyperspectral technology allow researchers to efficiently collect diverse and relevant data for intelligent orchard management. By integrating multiple data streams, hyperspectral analysis simplifies the information acquisition process, reduces equipment diversity, and reduces the complexity of processing heterogeneous data sets. Hyperspectral analysis technology can capture spectral information from both leaf-covered fruits and uncovered fruits in the orchard. 
By analysing and processing this information, characteristic information of the fruit can be extracted, enabling the prediction and detection of leaf-covered raspberry fruit. This study is the first to use hyperspectral technology to identify leaf-covered raspberry fruits. By using the comprehensive spectral information provided by hyperspectral analysis, researchers can distinguish fruits that are completely covered by leaves, thereby supporting more accurate yield estimation, harvest planning and crop management strategies. This study aims to address the shortcomings of existing methods for assessing leaf-covered fruit by fully exploiting the capabilities of hyperspectral technology. Through careful data pre-processing, feature extraction and classification modelling, this study aims to elucidate the spectral characteristics of fruit development affected by leaf canopy, and to evaluate the effectiveness of hyperspectral analysis in characterising this phenomenon. By gaining insight into the impact of canopy cover on intelligent orchard management and automated harvesting, this study aims to provide valuable tools and insights to optimise the production process and improve crop quality in raspberry cultivation. This study uses hyperspectral data analysis technology to collect two types of spectral data: leaf-covered raspberry fruit and fruit without leaf cover, including light intensity data and reflectance data. Classification is performed using models such as Random forest (RF), Logistic regression (LR) and multilayer perceptrons (MLP), and model performance is evaluated by tuning grid search parameters. The results indicate that the hyperspectral analysis technology is effective in distinguishing fruits with and without leaf cover. 
In addition, we conducted a correlation analysis of the distance covered by leaves on raspberry fruits and found a significant effect of leaf cover on spectral features at different distances, with the highest correlation coefficient observed at 0 cm. Furthermore, we compared the applicability of spectral intensity data and reflectance data in studying leaf-covered raspberry fruits, with the results indicating that spectral intensity data is more suitable for addressing leaf cover issues. The innovation of this study is the first application of hyperspectral technology to the detection of raspberry leaf shading problems, an approach that is groundbreaking in the field of agriculture. Compared to traditional machine vision techniques, hyperspectral technology is capable of capturing detailed spectral information over a wider range of the electromagnetic spectrum, offering new possibilities for the precise identification and quantification of biochemical and physiological changes in plants. By utilising the data collected by the hyperspectral instrument and combining it with an advanced machine learning model, this study not only effectively distinguishes fruits under different coverage distances, but also shows the superiority of light intensity data in dealing with the coverage problem. Among them, the multilayer perceptron (MLP) model achieves a ROC AUC value of 0.99 on full-band data, a result that excels in classification analysis. The approach in this study provides new perspectives and methods for precision agriculture and orchard management by simplifying the information acquisition process, reducing the diversity of equipment requirements, and reducing the complexity of handling heterogeneous datasets. In addition, this study provides new technical support for smart orchard management and automated harvesting, which is expected to significantly improve the efficiency and accuracy of leaf-covered fruit detection. 2. Materials and methods 2.1. 
Data collection This study focused on two varieties of raspberries, Polka and Hirtz, grown at the Xinyun Family Farm in Baoding City, Hebei Province, China (38.84265881177145, 115.42011291742594). The raspberries were grown in a greenhouse and spectral data were collected from May to September 2023. Data were collected under adequate sunlight and good fruit growth conditions. All spectral data were collected from raspberry plants (Rubus idaeus) while they were growing; no fruit was picked during the process. The optpsky ATP2400 spectrometer (Fig. 1) was used for hyperspectral data collection. The spectrometer had 2048 spectral bands from 191.37 to 1106.92 nm with a sampling interval of 0.36 nm. The data collected included light intensity, absorbance and reflectance data. During data collection, the probe of the hyperspectral instrument was placed close to the measured leaves. Spectral data were first measured for leaves without fruit and leaves with fruit. For covered fruits, spectral data were also collected at four leaf-to-fruit distances: 0 cm, 2 cm, 4 cm and 6 cm. To minimise errors in the leaf-to-fruit distance, the distance was measured directly and the fruit was supported by hand to hold it in position. In total, spectral data were collected for five scenarios: leaves with no fruit behind them, and leaves covering fruit at distances of 0 cm, 2 cm, 4 cm and 6 cm (Fig. 2). A total of 6294 sets of light intensity data, 6293 sets of reflectance data and 6292 sets of absorbance data were collected for the five scenarios. Each measurement consisted of one set of hyperspectral data, and by changing the measurement type option, all three types of spectral data were measured on the same target at the same time. This ensured consistency in the measurement targets and reduced variation between the study objects, thereby minimising the influence of measurement error. 2.2. 
Data pre-processing Data pre-processing plays a crucial role in various scientific fields, especially in agricultural research(Diwu, Bian, Wang, & Liu, 2019 ). The spectral data were collected from leaf-covered and uncovered raspberry plants. All data processing procedures were performed using Python 3.11.5 software within Anaconda (v23.7.4) and custom Python scripts to ensure high quality data sets for subsequent classification and regression analyses. Initially, raw spectral data was loaded from Excel files. Due to environmental factors, instrument performance and interference during data transmission, the raw data collected often contained missing values. In the early stages of data pre-processing, the missing values in the data set were checked. It was found that the amount of missing data in the spectral intensity and reflectance datasets was minimal and limited to specific rows. Therefore, a manual deletion approach was used to remove missing values to ensure data integrity. However, the absorbance dataset had a higher rate of missing values. To address this issue, the K-Nearest Neighbours (KNN) interpolation method was used to fill in the missing values. However, even after interpolation, significant uncertainty remained in the absorbance data, which could potentially affect the accuracy of the final analysis. Baseline correction and smoothing techniques were used to reduce noise in the spectral data(Y., X., H., & W., 2023). Specifically, wavelet baseline correction and the Savitzky-Golay (SG) filter(Press & Teukolsky, 1990 ) were applied to improve the signal-to-noise ratio and ensure the stability of the analysis(Bertinetto & Vuorinen, 2014 ). Following baseline correction and smoothing, scatter correction was applied to standardise spectral intensities between samples. 
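The smoothing and scatter-correction steps described above can be illustrated with a short sketch. This is a minimal example on synthetic data, not the authors' script: the function name `smooth_and_snv`, the window length and the polynomial order are assumptions chosen for illustration, and the wavelet baseline correction is omitted.

```python
import numpy as np
from scipy.signal import savgol_filter


def smooth_and_snv(spectra, window_length=11, polyorder=2):
    """Savitzky-Golay smoothing followed by SNV, applied per spectrum.

    spectra: array of shape (n_samples, n_bands).
    window_length and polyorder are illustrative values, not the paper's.
    """
    spectra = np.asarray(spectra, dtype=float)
    # Smooth each spectrum along the wavelength axis.
    smoothed = savgol_filter(spectra, window_length, polyorder, axis=1)
    # SNV: centre each spectrum on its own mean, scale by its own std.
    mean = smoothed.mean(axis=1, keepdims=True)
    std = smoothed.std(axis=1, ddof=1, keepdims=True)
    return (smoothed - mean) / std


# Synthetic demo: the same curve recorded with different gain and offset.
rng = np.random.default_rng(0)
wavelengths = np.linspace(400, 900, 200)
base = np.exp(-((wavelengths - 750) / 40) ** 2)
raw = np.vstack([1.0 * base + 0.1, 2.5 * base + 0.4])
raw += rng.normal(scale=0.01, size=raw.shape)

corrected = smooth_and_snv(raw)
```

After correction, each row has zero mean and unit standard deviation, so the two synthetic spectra nearly coincide even though their raw gain and offset differ. This is the kind of multiplicative and additive variation that SNV is meant to remove.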
The standard normal variate (SNV)(Barnes, Dhanoa, & Lister, 1989 ) correction was chosen for its ability to eliminate variation caused by scatter effects, thereby concentrating and standardising the data for comparative analysis. Feature extraction was then performed to identify significant spectral features associated with fruit cover. Peak detection algorithms were used to extract feature peaks and valleys from the pre-processed spectra, which were used for subsequent classification and regression model development. The motivation for selecting hyperspectral technology for this study is its ability to capture detailed spectral information over a wide range of wavelengths(Zhu et al., 2020 ). The pre-processing pipeline outlined in this section provides a critical foundation for subsequent analyses to investigate the effect of canopy cover on fruit spectral characteristics. By using advanced hyperspectral technology and tailored preprocessing techniques, we aim to provide valuable insights to guide agricultural practices and contribute to the optimisation of fruit quality assessment methods. 2.3. Classification analysis The dataset utilised in this study consists of hyperspectral data of raspberry fruits covered and uncovered by leaves, with the aim of classifying the two scenarios of covered and uncovered raspberry fruits. The dataset consists of several samples, each with a set of spectral features that form a feature matrix. The target variable "label" indicates whether the sample is covered or not. After pre-processing, the dataset was strictly divided into training and test sets at a ratio of 70% (training set) and 30% (test set) to ensure effective generalisation of the model to unknown data. This partitioning method helps to evaluate the performance of the model in practical applications. To mitigate the influence of different feature value ranges on model training, feature scaling was performed using the StandardScaler method. 
This method transforms the values of each feature into a distribution with a mean of 0 and a standard deviation of 1, which can speed up model convergence and improve stability and predictive performance. Three different classification models were applied to analyse the dataset: RF Classifier, LR and MLP Classifier.

2.3.1. RF Classifier

The Random Forest (RF) classifier (Genuer, Poggi, & Tuleau-Malot, 2010; Wang, Li, Song, & Rong, 2019) is known for its robustness against overfitting. In this study, the RF classifier (Breiman, 2001) was used to analyse the hyperspectral data of covered and uncovered raspberry fruits in order to identify and classify the two scenarios. This method, which operates by aggregating predictions from multiple decision trees (Jiang et al., 2007), was optimized by tuning parameters such as the number of trees (n_estimators) and the maximum tree depth (max_depth), thereby enhancing its effectiveness in identifying fruit coverage.

2.3.2. LR Classifier

LR (Genkin, Lewis, & Madigan, 2007; Komarek, Moore, Committee, Calvet, & Nichol, 2004), a standard algorithm for binary classification (Feng et al., 2022; Mondal et al., 2019), was employed to assess the likelihood of raspberry fruits being covered based on their hyperspectral data. By estimating the probability of a sample belonging to a specific category, LR provides a clear interpretation of the data. In this research, model complexity was managed and hyperparameters were optimized using GridSearchCV, enhancing the model's predictive accuracy for identifying the canopy status of the fruits, which is crucial for precision agriculture and fruit quality management.

2.3.3. MLP Classifier

The MLP (Lyu, Wang, Luo, Shuai, & Huang, 2021), a type of feed-forward artificial neural network, was utilized in this study for classifying raspberry fruits based on their hyperspectral data.
The MLP's structure, which includes input, hidden, and output layers with non-linear activation functions, allows it to learn complex relationships between input features and outputs (Fig. 3 ). This model was optimized by adjusting parameters such as the number of hidden layers and iterations, using GridSearchCV to enhance classification accuracy for leaf-covered and uncovered fruits. The MLP's strength lies in its non-linear modeling, making it adept at handling intricate datasets. It was instrumental in identifying patterns in spectral features related to fruit coverage, offering valuable insights for precision agriculture. The optimized MLP model aids in improving the efficiency of fruit quality assessment and provides decision support for orchard management. Performance was evaluated using metrics like ROC AUC, accuracy, and specificity. 2.4. Regression analysis method for distance parameters In this study, regression analysis of spectral data of fruits covered by leaves at different distances (0 cm, 2 cm, 4 cm, 6 cm) was performed to establish the feature matrix and labelled the coverage distances. To validate the model prediction ability, the dataset was divided into 70% training set and 30% test set, and the features were normalised using StandardScaler to improve the model training efficiency and prediction performance. 2.4.1. Linear regression analysis Linear regression(Arjo, 2009 ;Marill, 2004 ), a foundational statistical approach(Frees, 2009 ;Maindonald & Braun, 2010 ), was used in this study to model the linear relationship between spectral features and the distance of fruit coverage. The model predicts the covered distance by minimizing the sum of squared errors, identifying a linear equation that correlates spectral data with the leaf cover status. 
This analysis helps identify key spectral features, supports orchard management, and serves as a baseline that indicates whether more complex models are needed to capture non-linear relationships in the data.

2.4.2. Ridge regression (RR) analysis

Ridge regression, also known as Tikhonov regularisation, is a linear regression method designed to deal with multicollinearity (Golub, Hansen, & O'Leary, 1999; Sonnberger, 1989). It adds an L2 penalty term to the loss function (Naozumi, Lundberg, & Su-In, 2019; Sen, Tomal, & Yan, 2022), which stabilizes parameter estimates and reduces model complexity. The model parameters are solved by minimizing the loss function, typically using a closed-form solution. In this study, ridge regression was applied to the spectral data of leaf-covered raspberries to predict the distance between fruit and leaves, mitigating the multicollinearity common in spectral data. Hyperparameter tuning via GridSearchCV optimized model performance, enhancing predictive accuracy and offering insights into the relationship between spectral data and canopy coverage for orchard management.

2.4.3. Lasso regression analysis

Lasso regression (Tibshirani & Tibshirani, 1996; Zhang et al., 2021) also addresses multicollinearity, and incorporates an L1 penalty for feature selection that shrinks insignificant coefficients to zero (Guo et al., 2022; Tsarouchi, Vlachopoulos, Karahaliou, Vassiou, & Costaridou, 2020). It minimizes a loss function with an added penalty term, resulting in a sparse solution where only relevant features contribute to the model (LI, HU, & ZHAO, 2022). This method was used to analyze spectral data of leaf-covered fruits and predict the leaf-fruit distance. Lasso's sparsity identifies key spectral features for distance prediction, enhancing model interpretability and accuracy.
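For reference, the ridge and lasso estimators described above can be written in standard notation. The symbols are assumptions introduced here for illustration and do not appear in the original: X is the n × p matrix of spectral features, y the vector of leaf-to-fruit distances, β the coefficient vector, and α ≥ 0 the regularisation strength. Scaling conventions for the loss term vary between libraries.

$$\hat{\beta}_{\text{ridge}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \alpha \lVert \beta \rVert_2^2 = \left(X^{\top}X + \alpha I\right)^{-1} X^{\top} y$$

$$\hat{\beta}_{\text{lasso}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \alpha \lVert \beta \rVert_1$$

The closed form for ridge holds for α > 0 and ignores the intercept. The lasso problem has no closed-form solution in general, and it is the L1 term that sets some coefficients exactly to zero.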
GridSearchCV optimized the regularization parameter, balancing model simplicity with performance. 2.4.4. RF regression analysis RF regression(Sheridan & RP, 2013;Svetnik, 2003 ) enhances model predictions by averaging the outcomes of multiple decision trees, capitalizing on "collective intelligence" to improve performance over single models. It introduces randomness by selecting features randomly during tree node splits, increasing model diversity and reducing overfitting. RF optimizes predictions by minimizing the mean squared error (MSE)(Svetnik, 2003 ;Vu et al., 2019 ) and is adept at handling non-linear relationships and feature importance assessment. In this study, RF was applied to predict the distance between fruit and covered leaf, with hyperparameter tuning via GridSearchCV to enhance predictive accuracy by adjusting the number of trees and tree depth parameters. 2.4.5. MLP regression analysis The Multilayer Perceptron (MLP), a neural network for regression and classification(Golnaraghi, Zangenehmadar, Moselhi, & Alkass, 2019 ), was employed in this study using the MLPRegressor (MLPR) from Scikit-learn. MLP leverages its non-linear modeling to analyze complex spectral data, with hyperparameter tuning via GridSearchCV to optimize the network's structure and training process, enhancing prediction accuracy and generalization. Model performance is evaluated using metrics like the Coefficient of Determination (R 2 ), MSE, and Pearson's correlation coefficient. 3. Results and discussion 3.1. Results of data pre-processing This study involved rigorous preprocessing of spectral data—intensity, absorbance, and reflectance—to ensure quality for analysis. Steps included handling missing values, baseline correction, smoothing, scatter correction, and feature extraction. From the distribution of missing values in the raw data collected for the three categories (Fig. 
4), it can be seen that the amount of missing data in the spectral intensity and reflectance data is small and confined to particular rows, so these rows were deleted manually to ensure data integrity. However, the absorbance data set has a high rate of missing values. Absorbance data are derived from how strongly the measured substance absorbs light at specific wavelengths. Absorbance (Charnley, 2023; Platt & Stutz, 2008) is calculated using the formula:

$$A(\lambda) = -\log_{10}\left(\frac{I_{\text{transmitted}}}{I_{\text{incident}}}\right)$$ (1)

In the above equation, A(λ) is the absorbance at wavelength λ, I_transmitted is the transmitted light intensity, and I_incident is the incident light intensity (Castanié, 2013). The absorbance data contained a large number of nulls. A likely explanation is that leaf cover changes the distribution and intensity of the incident light, so that the spectral information received at the fruit surface differs markedly from that of uncovered areas. This variation can fall outside the dynamic range of the sensor, making it impossible to measure absorbance accurately. The large number of missing values compromises the integrity of the data and could skew model learning and results (Khan et al., 2022). As a result, absorbance data may be less suitable for analyzing leaf-covered fruits, where data precision is vital. In contrast, light intensity and reflectance data can be extracted and used more reliably, and may better reflect the spectral characteristics of fruits under cover conditions.
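A small worked example of Eq. (1) shows one way such nulls can arise. This is a sketch under an assumption, not a description of the instrument's firmware: it supposes that a reading at or below zero (for instance after dark-signal subtraction) leaves the logarithm undefined and is therefore stored as missing. The function name and the intensity values are hypothetical.

```python
import math


def absorbance(i_transmitted, i_incident):
    """A = -log10(I_transmitted / I_incident); None when undefined."""
    if i_incident <= 0 or i_transmitted <= 0:
        # The logarithm is undefined for a non-positive ratio, so the
        # value is reported as missing rather than as a number.
        return None
    return -math.log10(i_transmitted / i_incident)


# Hypothetical (transmitted, incident) intensity pairs in arbitrary counts.
readings = [(500.0, 1000.0), (10.0, 1000.0), (0.0, 1000.0), (-3.0, 1000.0)]
values = [absorbance(t, i) for t, i in readings]
```

Halving the intensity gives an absorbance of about 0.30, and a hundredfold drop gives 2.0. The last two readings produce no value at all, which is consistent with heavily shaded measurements yielding more gaps in absorbance than in raw intensity.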
Raw light intensity spectra (Fig. 5) and reflectance spectra (Fig. 6) were averaged separately for leaves with fruit behind them and for leaves with no fruit behind them. The collected spectra span the band between roughly 200 and 1100 nm. This band covers the region from UV to NIR and is an important interval in plant spectral analysis, providing detailed information on the biochemical composition and physical structure of the fruit (Galvez-Sola et al., 2015; Ling, Goodin, Raynor, & Joern, 2019). Two major peaks can be seen in Fig. 5(a). The highest peak, shown in detail in Fig. 5(b), lies at about 700 to 800 nm, which is consistent with the absorption properties of plant pigments (e.g. chlorophyll) in this band. A second, less pronounced peak, shown in Fig. 5(c), is located between 500 and 600 nm and may be related to the absorption properties of carotenoids, which also absorb strongly in this band. These peaks provide important information about the spectral characteristics of the fruit. Comparing the 'no cover' curve with the curves for the different cover distances shows that the 'no cover' curve differs from the others. The difference is hard to see in the full-band plot, but it is clear in the detail plots (Figs. 5b and 5c). This suggests that the spectral features reflect the effect of shading, despite the fact that shading conditions change the path and intensity of light propagation. From the reflectance spectrograms in Fig. 6(a) it can be seen that the effective spectral range is approximately between 400 and 900 nm. Below 400 nm and above 900 nm, the reflectance data show strong noise, and the light intensity spectra in Fig. 5 show very little signal in the same regions. This may be because the instrument is relatively insensitive in these regions, which lowers the signal-to-noise ratio, or because the samples have no significant reflectance or absorption features in the ultraviolet or near-infrared region, so that the measured signal is mainly noise. In the interval 400–900 nm the curve is relatively stable, indicating that the spectral data in this band range has a good signal-to-noise ratio and is suitable for further analysis. In the 600–800 nm interval (Fig. 6b) there is a peak in the reflectance curve which corresponds to the position of the trough in the light intensity spectrum. This peak may be due to the light absorption properties of some chemical components in the sample at these wavelengths. In the visible region, the absorption properties of chlorophyll and other pigments affect the reflectance, resulting in a characteristic peak in the spectrogram. Therefore, the usable range was extracted and the strongly noisy regions were removed before pre-processing. Data pre-processing techniques, including outlier and missing value correction, baseline adjustment, and smoothing with an SG filter, were applied to the hyperspectral dataset to enhance analysis accuracy. The wavelet transform method was used for baseline correction, and the SNV method addressed scatter effects. The above pre-processing steps resulted in two sets of spectrograms (Figs. 7 and 8) showing the data before and after pre-processing, respectively. The spectrograms before pre-processing show the noise and irregularities in the raw data, while the spectrograms after pre-processing show clearer and more stable spectral features.
Particularly in the 500–900 nm band, the pre-processed spectrograms show clearer spectral features, indicating that the pre-processing effectively improves the usability and analytical value of the data.

3.2. Feature extraction process

After data pre-processing, feature extraction is a key part of the analysis process in this study. It aims to identify key information closely related to sample characteristics from complex spectral data (Labory, Njomgue-Fotso, & Bottini, 2024; Sachar & Kumar, 2021). Peak detection was first performed on the pre-processed data using the find_peaks function, which identifies significant local maxima in the spectra. Optimal peak selection was achieved by setting the peak detection parameters, including the minimum height, minimum distance and relative height thresholds. Post-extraction, SpectrumAlignment ensured spectral comparability by aligning peaks to a reference spectrum, creating a superspectrum for standard analysis. Feature selection and filtering retained significant features for analysis, using a ratio-based approach to exclude less frequent features and focus on those most relevant for classification or regression. The results of the feature extraction are shown in Figs. 9 and 10, which clearly show the peaks and troughs of each feature parameter point. These peaks reveal key information in the hyperspectral data and provide a solid basis for hyperspectral analysis. Finally, the wavelength values (in nm) obtained by the ratio-based feature filtering are: 554.51, 592.519, 600.895, 714.059, 721.601, 745.837, 755.462, 770.258, 805.974, 820.44, 825.522, 831.014, 836.071, 839.016, 857.846, 869.062, 889.26, 903.97, 591.121, 596.71, 718.498, 723.372, 751.529, 760.695, 775.024, 805.548, 817.045, 822.559, 827.636, 832.279, 854.511, 867.404, 881.452, 900.303. These extracted wavelengths cover 554.51 to 903.97 nm, spanning the visible to near-infrared range.
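The peak and trough detection step described above can be sketched with `scipy.signal.find_peaks`. This is an illustrative example on a synthetic two-peak spectrum: the `height` and `distance` thresholds are assumed values rather than the settings used in the study, and the alignment and ratio-based filtering steps are not reproduced.

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic spectrum with two Gaussian peaks (0.5 nm spacing).
wavelengths = np.linspace(500, 900, 801)
spectrum = (
    1.0 * np.exp(-((wavelengths - 600) / 15) ** 2)
    + 0.6 * np.exp(-((wavelengths - 760) / 20) ** 2)
)

# Peaks: local maxima at least 0.1 high and at least 20 samples apart.
peak_idx, _ = find_peaks(spectrum, height=0.1, distance=20)

# Troughs: local maxima of the negated spectrum.
valley_idx, _ = find_peaks(-spectrum, distance=20)

peak_wavelengths = wavelengths[peak_idx]
valley_wavelengths = wavelengths[valley_idx]
```

Negating the signal is a common way to reuse a peak finder for troughs. On real spectra, a prominence threshold is often added so that small noise ripples are not reported as features.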
The use of these features provides a deeper understanding of the differences between samples and greatly simplifies the model, providing important support for sample classification and regression analysis.

3.3. Results of classification analysis

This study classified leaf-covered raspberry fruit from hyperspectral data using three machine learning (ML) models: RF, logistic regression (LR), and MLP. Classification on selected feature bands was compared with classification on full-band data, and model performance was evaluated using ROC AUC, accuracy, sensitivity, and specificity. The results are detailed in Table 1.

Table 1 Classification Model Evaluation Results

| Data | Evaluation criteria | RF Train | RF Test | LR Train | LR Test | MLP Train | MLP Test |
|---|---|---|---|---|---|---|---|
| Intensity characteristic wavelength | ROC AUC | 1.00 | 0.90 | 0.86 | 0.84 | 0.94 | 0.92 |
| | Accuracy | 0.98 | 0.84 | 0.83 | 0.80 | 0.93 | 0.90 |
| | Sensitivity | 0.97 | 0.79 | 0.82 | 0.80 | 0.91 | 0.87 |
| | Specificity | 0.99 | 0.83 | 0.80 | 0.78 | 0.93 | 0.91 |
| Intensity all wavelength | ROC AUC | 1.00 | 0.87 | 0.96 | 0.94 | 0.99 | 0.95 |
| | Accuracy | 0.99 | 0.80 | 0.94 | 0.92 | 0.97 | 0.94 |
| | Sensitivity | 0.99 | 0.79 | 0.92 | 0.91 | 0.96 | 0.92 |
| | Specificity | 1.00 | 0.80 | 0.91 | 0.90 | 0.96 | 0.96 |
| Reflectance selection data | ROC AUC | 1.00 | 0.94 | 0.91 | 0.89 | 0.97 | 0.95 |
| | Accuracy | 0.99 | 0.89 | 0.85 | 0.82 | 0.91 | 0.89 |
| | Sensitivity | 0.98 | 0.80 | 0.74 | 0.69 | 0.96 | 0.93 |
| | Specificity | 1.00 | 0.93 | 0.90 | 0.89 | 0.89 | 0.87 |

3.3.1. Classification Performance of Spectral Light Intensity Data

Classification performance on the light intensity data was evaluated using the area under the ROC curve. The ROC curves for the feature-band spectral data (Fig. 11) and the full-band spectral data (Fig. 12) summarise the performance of the three models. In feature band classification, the RF model excelled on the training set with a perfect ROC AUC of 1.00, accuracy of 0.98, sensitivity of 0.97, and specificity of 0.99, indicating high accuracy in distinguishing training samples.
However, its performance on the test set declined to an ROC AUC of 0.90, accuracy of 0.84, sensitivity of 0.79, and specificity of 0.83, which suggests overfitting and a need for improved generalizability. LR and MLP also performed well on the training set, although with lower metrics than RF. On the test set, LR scored below RF (ROC AUC of 0.84, accuracy of 0.80), while MLP generalized best, with an ROC AUC of 0.92, accuracy of 0.90, sensitivity of 0.87, and specificity of 0.91.

On full-band data, all models showed improved training performance, with RF and MLP reaching ROC AUCs of 1.00 and 0.99 and accuracies of 0.99 and 0.97, respectively. This indicates the benefit of the richer information in full-band data for pattern recognition. However, RF's test-set performance dropped to an ROC AUC of 0.87 and accuracy of 0.80. LR and MLP remained stable on the test set, with ROC AUCs of 0.94 and 0.95 and sensitivities and specificities of 0.90 or higher, demonstrating their robustness across datasets.

Selected feature bands improve training efficiency and reduce computational demands, which is especially beneficial for high-dimensional data: lower complexity reduces the risk of overfitting and can improve generalizability. Although full-band data offers more information, it lengthens training and prediction, and its test-set gains were uneven: LR and MLP improved (ROC AUC from 0.84 to 0.94 and from 0.92 to 0.95), whereas RF declined (from 0.90 to 0.87). RF's better performance with selected feature bands is likely due to the higher dimensionality and increased model complexity of the full-band data. LR was stable between training and test sets on both datasets. The MLP sustained high performance on full-band data, particularly on the test set, indicating strong generalization and classification accuracy. The feature band approach is beneficial for efficiency and computational resource usage, particularly with high data dimensionality.
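For reference, the evaluation criteria reported in Table 1 can be computed as in the following sketch. It uses NumPy only and toy labels and scores. The label coding (1 = leaf-covered, 0 = uncovered), the 0.5 decision threshold and the function names are assumptions made for illustration, not details taken from the study.

```python
import numpy as np


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a positive outranks a negative (ties count half)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive compared with every negative
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (pos.size * neg.size))


# Toy example: 4 covered (1) and 4 uncovered (0) fruits with classifier scores.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1])
y_pred = (scores >= 0.5).astype(int)  # assumed decision threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
accuracy = float(np.mean(y_true == y_pred))
auc = roc_auc(y_true, scores)
print(sens, spec, accuracy, auc)
```

Accuracy, sensitivity and specificity depend on the chosen threshold, whereas ROC AUC summarises ranking quality over all thresholds, which is why a model can show a high ROC AUC alongside a noticeably lower sensitivity.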
Full-band data, while more informative, is worth its extra cost only when computational resources are ample. Future research should explore feature selection and dimensionality reduction to optimize model performance and cost. In practice, selected feature bands are preferable for a balance between speed and accuracy.

3.3.2. Classification performance of reflectance data

The ROC curves for the reflectance spectral data (Fig. 13) show strong classification performance of all models on the training set. RF achieved a perfect ROC AUC of 1.00 together with an accuracy of 0.99, sensitivity of 0.98, and specificity of 1.00, indicating strong classification and cover identification. Despite a slight dip on the test set, RF maintained an ROC AUC of 0.94, accuracy of 0.89, sensitivity of 0.80, and specificity of 0.93, demonstrating robust generalization. LR and MLP also performed well on the training set, with ROC AUCs of 0.91 and 0.97 and accuracies of 0.85 and 0.91. On the test set, LR had an ROC AUC of 0.89 and accuracy of 0.82, while MLP had an ROC AUC of 0.95 and accuracy of 0.89. Specificity remained high for both models (0.89 and 0.87), although the test sensitivity of LR was lower, at 0.69. Reflectance spectral data offered the models richer information during training, leading to near-perfect performance, likely because reflectance captures the chemical and physical properties of the samples, which aids classification. Whereas performance on the light intensity data declined slightly on the test set, the reflectance data maintained high accuracy and generalization, suggesting its reliability in practical applications, especially with ample computational resources. Hyperspectral technology effectively distinguished between leaf-covered and uncovered fruits, with high accuracy and recall, demonstrating its efficacy in addressing coverage issues.

3.4. Results of regression analysis

Regression analysis revealed significant correlations between spectral characteristics and the extent of fruit cover, with light intensity spectra showing a stronger predictive correlation than reflectance data for raspberry fruit coverage. Several regression models were used to predict the leaf-fruit distance: linear regression (LR), ridge regression (RR), Lasso, RF, and the MLP regressor (MLPR). Model performance was evaluated using R², Pearson correlation coefficients, and p-values, as detailed in Table 2.

Table 2 Regression Model Evaluation Results

| Data | Model | R² | Pearson correlation | P-value |
|---|---|---|---|---|
| Intensity extraction | LR | 0.11 | 0.33 | \(3.11\times {10}^{-39}\) |
| | RR | 0.34 | 0.63 | \(2.54\times {10}^{-40}\) |
| | Lasso | 0.70 | 0.64 | \(2.49\times {10}^{-41}\) |
| | RF | 0.12 | 0.36 | \(9.40\times {10}^{-48}\) |
| | MLPR | 0.67 | 0.75 | \(4.91\times {10}^{-76}\) |
| Intensity all | LR | \(-4.53\times {10}^{12}\) | 0.33 | 0.59 |
| | RR | 0.65 | 0.89 | \(3.00\times {10}^{-141}\) |
| | Lasso | 0.60 | 0.85 | \(1.19\times {10}^{-117}\) |
| | RF | 0.19 | 0.73 | \(4.02\times {10}^{-70}\) |
| | MLPR | 0.84 | 0.95 | \(1.50\times {10}^{-178}\) |
| Reflectance extraction | LR | \(-6.97\times {10}^{8}\) | 0.01 | 0.57 |
| | RR | 0.05 | 0.23 | \(8.15\times {10}^{-19}\) |
| | Lasso | 0.04 | 0.21 | \(3.35\times {10}^{-15}\) |
| | RF | 0.14 | 0.37 | \(8.71\times {10}^{-49}\) |
| | MLPR | 0.39 | 0.69 | \(6.30\times {10}^{-88}\) |

3.4.1. Results of Regression Analysis of Light Intensity Data

Linear regression (LR). For the extracted light intensity data, the R² value was 0.1065, indicating that the model explained approximately 10.65% of the variability. The Pearson correlation coefficient was 0.3282, indicating a weak-to-moderate positive correlation. The p-value was \(3.11\times {10}^{-39}\), far below 0.05, indicating that the correlation between predicted and measured values is statistically significant. However, for the full-band light intensity data, the R² value was extremely negative (\(-4.53\times {10}^{12}\)). A negative R² means that the model predicts worse than simply using the mean of the measured values, and a value of this magnitude is unreasonable. It may be due to errors in data pre-processing or model implementation.
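To make the two regression criteria concrete, the sketch below computes R² and the Pearson correlation coefficient with NumPy on toy values. The distances (0, 2, 4 and 6 cm) mirror the four cover distances discussed in this section, but the predictions are invented for illustration and do not reproduce the study's results. The second example only shows one way a strongly negative R² can coexist with a positive correlation. It is not a diagnosis of the anomalous values in Table 2.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def pearson_r(y_true, y_pred):
    """Pearson correlation between measured and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


# Measured leaf-fruit distances in cm (toy values).
y_true = np.array([0.0, 2.0, 4.0, 6.0])

# Case 1: predictions close to the measurements.
pred_good = np.array([0.5, 1.5, 4.5, 5.5])

# Case 2: predictions perfectly correlated with the measurements but badly
# scaled and shifted, as can happen when a linear model is numerically unstable.
pred_bad = 10.0 * y_true + 100.0

print(r2(y_true, pred_good), pearson_r(y_true, pred_good))
print(r2(y_true, pred_bad), pearson_r(y_true, pred_bad))
```

Because the Pearson coefficient ignores scale and offset while R² penalises them, the two criteria can disagree sharply, so it is useful to report both, as Table 2 does.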
Ridge regression (RR). On the extracted light intensity data, the R² value improves to 0.34 with a Pearson correlation coefficient of 0.6327, and the p-value is still very small (\(2.54\times {10}^{-40}\)), indicating that the ridge regression model fits better than the linear regression model while the correlation remains significant. On the full-band light intensity data, the R² value increased to 0.65 and the Pearson correlation coefficient was as high as 0.89, which shows that the model built from the full-band light intensity data predicts better.

Lasso regression. For the extracted light intensity data and the full-band data, the R² values were 0.70 and 0.60 and the Pearson correlation coefficients were 0.64 and 0.85, with p-values much less than 0.05. Lasso therefore explained the most variability on the extracted data (the highest R² for that dataset in Table 2), and its predictions were significantly correlated with the measurements on both data types.

RF and MLPR. On the extracted light intensity data, RF and MLPR had R² values of 0.12 and 0.67 and Pearson correlation coefficients of 0.36 and 0.75, with very small p-values. Both models showed significant predictive ability, but their R² values were lower than that of the Lasso regression model.

In the regression analyses of the extracted light intensity data and the full-band data, scatter plots provide an intuitive way of observing the fit of the different regression models at each distance. The regression scatter plots for the extracted light intensity data are shown in Fig. 14 and those for the full-band data in Fig. 15.

Figure 14 shows the scatter distribution at four distances, plotting predicted against actual values for each regression model. The MLP Regressor performs best at 0 cm and 4 cm, with a narrower distribution of points aligned along the diagonal, signifying high prediction accuracy. It makes effective use of light intensity data for predictions at these distances.
The Lasso Regressor excels at 2 cm, demonstrating linearity and low error in medium-distance predictions. At 6 cm, the fits of all models are mediocre, with widely scattered points that lie off the diagonal, suggesting decreased accuracy. This decline in fit quality is attributed to the increasing complexity of the relationship between light intensity features and fruit cover at greater distances. As distance grows, model fit degrades, the scatter widens, and prediction error rises, underscoring the difficulty of prediction at longer distances.

Figure 15 shows the results of the regression analyses of the full-band data at different distances. As with the extracted light intensity data, the models perform differently at different distances. RR and Lasso fit well at close distances, but their accuracy declines with increasing distance. At 6 cm, Lasso performed best, yet all models struggled at longer distances. The broad scatter and large prediction errors show how difficult it is for the models to predict fruit cover precisely at greater distances.

Analysis of Figs. 14 and 15 indicated that the Lasso and MLP regressors had superior fit and accuracy at specific distances in the light intensity data, with performance diminishing at greater distances. This insight is important for understanding the relationship between spectra and cover distance and for selecting suitable models. Future studies should explore model optimization for better accuracy across varying distances and consider additional influential factors, such as environmental conditions and data collection techniques, to improve the practical utility of the models.

3.4.2. Reflectance data regression analysis results

For the reflectance extraction data, the R² value of the LR model was \(-6.97\times {10}^{8}\), which is anomalous and requires further checking of the accuracy of data processing and model implementation. The R² values for the RR and Lasso regression models were 0.05 and 0.04, respectively, indicating limited explanatory power. The RF and MLP regression models performed relatively better on the reflectance data, with R² values of 0.14 and 0.39 (Table 2), showing some predictive ability.

Figure 16 displays scatter distributions at four distances, where each point represents the relationship between the predicted and actual values of a regression model at a particular distance. Although the regression performance on reflectance data is weaker than on intensity data, the MLPR model shows some predictive value at 2 cm.

The regression analysis of the extracted light intensity data indicates a correlation between spectral features and fruit cover distance, with Lasso regression capturing this relationship effectively, as shown by its R² and Pearson correlation values. However, the anomalous R² values for LR on the full-band light intensity data call for a review of data processing and model development. Reflectance data, though less explanatory, hints at the potential utility of random forest and MLP regression models in predicting fruit cover, suggesting either a subtler relationship or the need for more complex models to discern it.

In summary, the results of the regression analyses in this study reveal the complex relationship between spectral characteristics and leaf cover distance for raspberry fruit, and highlight the advantages and disadvantages of different regression models in explaining and predicting this relationship. Future research can explore more advanced models and algorithms to improve the accuracy and explanatory power of the prediction.
At the same time, due attention must be paid to the accuracy of data processing and model implementation to avoid anomalous results.

3.5. Discussion

This study successfully detected leaf-covered fruits using hyperspectral data and ML models, addressing a key need in precision agriculture for automated, intelligent orchard management. The dataset included light intensity and reflectance spectral data, collected under specific conditions, to capture the spectral characteristics of fruits at various cover levels.

Hyperspectral technology was chosen for predicting fruit cover because of its particular advantages. Hyperspectral data sets of this kind typically have lower dimensionality than the image data used in traditional machine vision, which allows easier data processing and simpler, more efficient calculations that can run in real time for precision agriculture. In addition, hyperspectral technology captures detailed spectral information over a wider range of wavelengths, providing richer data to characterise the biochemical and physiological status of fruit.

In this study, light intensity data and reflectance data showed different strengths in different tasks. Reflectance data showed clear advantages in classification. Combined with the Random Forest algorithm, it gave highly accurate classification results in which the ROC AUC values were close to perfect and the accuracy and specificity were also very high. This indicates that the reflectance data effectively capture the spectral differences between covered and uncovered fruits, providing strong support for accurate classification. In the regression analysis, the light intensity data showed its advantages.
Combined with the Lasso regression model, the light intensity data had high explanatory power in predicting the degree of cover, and both the R² value and the Pearson correlation coefficient showed a significant correlation. This suggests that light intensity data has good sensitivity and accuracy in reflecting the degree of fruit cover.

The advantage of combining machine learning with hyperspectral data is the ability to handle complex non-linear relationships and extract useful information from them. This approach not only improves the accuracy of detecting leaf-covered fruit, but is also well suited to production practice. It is not yet widely used in practical production, mainly because hyperspectral equipment is not widely available and research and development of related technologies is still in its infancy. However, with continuing technological progress and cost reduction, hyperspectral technology is expected to be widely used in smart agriculture in the near future. Many research and application cases already demonstrate the potential of hyperspectral technology in smart agriculture, providing a solid foundation for future research and application.

4. Conclusion

In this study, hyperspectral technology was, to our knowledge, systematically applied for the first time to analyse the leaf-covered fruit problem in depth. By collecting hyperspectral data under different conditions and applying machine learning models for classification and regression analysis, leaf-covered and uncovered raspberry fruits were successfully distinguished. The results show that hyperspectral technology can effectively extract the spectral features of fruits and accurately predict the canopy status of fruits through appropriate data processing and model optimisation.
In addition, we found that spectral intensity data have better applicability in solving the canopy cover problem, which provides new technical support and methodological direction for orchard management. This study not only confirms the potential of hyperspectral technology in agriculture, but also provides new perspectives for fruit quality assessment and orchard monitoring. By comparing different spectral data types and machine learning models, we provide clear methodological directions for future research and applications. In particular, this study found that combining light intensity data with multilayer perceptron models can achieve better performance in canopy detection, which provides strong technical support for the development of precision agriculture. Future research can explore the following aspects in depth: first, the generalisation ability and accuracy of the model can be improved by collecting more diverse orchard data. Secondly, the combination of other sensor data, such as multispectral imaging and thermal imaging data, will be explored to further improve the accuracy of canopy detection. In addition, the research could be extended to other types of fruit trees and fruit to validate the effectiveness of hyperspectral technology in a wider range of agricultural applications. Finally, real-time monitoring and automated control systems will be developed to apply hyperspectral technology to practical orchard management, in order to achieve automated and intelligent orchard management. Despite the results of this study, there are still some limitations. For example, the limited sample size in the dataset may affect the generalisability of the model. In addition, the reflectance data faced some challenges in pre-processing and model training, which need to be further optimised. Future research can be improved and deepened in the following areas: first, the robustness and accuracy of the model can be improved by increasing the sample size and diversity. 
Secondly, more advanced data processing techniques should be explored to better handle reflectance data. Finally, canopy detection under different environmental conditions should be investigated to improve the adaptability of the model in practical applications. In conclusion, this study provides new insights and methods for the application of hyperspectral technology in orchard management and lays the foundation for the future development of precision agriculture. Continued refinement of methods and techniques is expected to lead to greater efficiency and sophistication in orchard management and fruit quality assessment.

Declarations

Authors’ contributions

C-ZJ was mainly responsible for the execution of the experiments and the organisation and processing of the data. X-RQ provided extensive help and advice during the experimental process. R-ZH and W-J provided valuable help and insights during the data analysis and interpretation phase. All authors were involved in drafting and revising the manuscript and read and approved the final version for publication.

Funding

The authors are grateful for the financial support from the Hebei Provincial Department of Science and Technology (Grant number: 20326338D).

Data availability statement

Data supporting the results of this study are available in the article (and/or its supplementary materials). Due to privacy constraints and the proprietary nature of the data, other data related to this study are not publicly available; they are stored in Excel spreadsheets on the authors' personal computers. Requests for access to these datasets should be directed to Zhenhui Ren and will be considered on a case-by-case basis with the author's approval.
Although these data are not publicly available at this time, the authors anticipate that they will be considered for inclusion in a public repository after 2025, once the first author and the co-authors have completed their related academic work.

Competing Interests Statement

Prof. Zhenhui Ren's work has been funded by Hebei Provincial Department of Science and Technology. Dr Zhujun Chen, Dr Juan Wang and Dr Ruiqian Xi declare no potential conflict of interest.

References

Arjo, D. (2009). Statistical Models: Theory and Practice. Technometrics , 48(2), 315 Barbedo, J. G. A. (2023). A review on the combination of deep learning techniques with proximal hyperspectral images in agriculture. Computers and Electronics in Agriculture , 210(0168-1699), 107920. doi: https://doi.org/10.1016/j.compag.2023.107920 Barnes, R. J., Dhanoa, M. S., & Lister, S. J. (1989). Standard Normal Variate Transformation and De-Trending of Near-Infrared Diffuse Reflectance Spectra. Applied Spectroscopy , 43(5), 772-777. doi: 10.1366/0003702894202201 Bertinetto, C. G., & Vuorinen, T. (2014). Automatic Baseline Recognition for the Correction of Large Sets of Spectra Using Continuous Wavelet Transform and Iterative Fitting. Applied Spectroscopy , 68(2), 155-164. doi: 10.1366/13-07018 Breiman, L. (2001). Random Forests. Machine Learning , 45(1), 5-32. doi: 10.1023/A:1010933404324 Castanié, F. (2013). Spectral analysis: parametric and non-parametric digital methods : John Wiley & Sons. Chang, C. (2022). Advances in Hyperspectral Image Processing Techniques - introduce . Charnley, S. B. (2023). Absorption Spectroscopy. Springer eBooks , 0(2023), 40-41. doi: 10.1007/978-3-662-65093-6_9 Chen, J., Zhang, H., Wang, Z., Wu, J., Luo, T., Wang, H.,... Long, T. (2022). An image restoration and detection method for picking robot based on convolutional auto-encoder. Computers and Electronics in Agriculture , 196(0168-1699), 106896.
doi: https://doi.org/10.1016/j.compag.2022.106896 Chen, R., Liu, W., Yang, H., Jin, X., Yang, G., Zhou, Y.,... Feng, H. (2024). A novel framework to assess apple leaf nitrogen content: Fusion of hyperspectral reflectance and phenology information through deep learning. Computers and Electronics in Agriculture , 219(0168-1699), 108816. doi: https://doi.org/10.1016/j.compag.2024.108816 Dai, F., Wang, F., Yang, D., Lin, S., Chen, X., Lan, Y.,... Deng, X. (2022). Detection Method of Citrus Psyllids With Field High-Definition Camera Based on Improved Cascade Region-Based Convolution Neural Networks. Frontiers in Plant Science , 12(Jan 24), 816272. doi: 10.3389/fpls.2021.816272 De Santis, D., Carbone, K., Garzoli, S., Laghezza, M. V., & Turchetti, G. (2022). Bioactivity and Chemical Profile of Rubus idaeus L. Leaves Steam-Distillation Extract. [Journal Article]. Foods , 11(10), 1455. doi: 10.3390/foods11101455 Diwu, P. Y., Bian, X. H., Wang, Z. F., & Liu, W. (2019). Study on the Selection of Spectral Preprocessing Methods. SPECTROSCOPY AND SPECTRAL ANALYSIS , 39(9), 2800-2806. doi: 10.3964/j.issn.1000-0593(2019)09-2800-07 E., C., D., Z., & R., R. (2009). Neurofuzzy prediction for gaze control. Canadian Journal of Electrical and Computer Engineering , 34(1/2), 15-20. doi: 10.1109/CJECE.2009.5291203 Frees, E. W. (2009). Multiple Linear Regression – IN. Cambridge University Press eBooks , 0(2009), 70-106. doi: 10.1017/cbo9780511814372.004 Galvez-Sola, L., Garcia-Sanchez, F., Perez-Perez, J. G., Gimeno, V., Navarro, J. M., Moral, R.,... Nieves, M. (2015). Rapid estimation of nutritional elements on citrus leaves by near infrared reflectance spectroscopy. [Journal Article]. Front Plant Sci , 6(Jul 23), 571. doi: 10.3389/fpls.2015.00571 Genkin, A., Lewis, D. D., & Madigan, D. (2007). Large-Scale Bayesian Logistic Regression for Text Categorization. Technometrics , 49(3), 291-304. doi: 10.1198/004017007000000245 Genuer, R., Poggi, J., & Tuleau-Malot, C. (2010). 
Variable selection using random forests. Pattern Recognition Letters , 31(14), 2225-2236. doi: https://doi.org/10.1016/j.patrec.2010.03.014 Golnaraghi, S., Zangenehmadar, Z., Moselhi, O., & Alkass, S. (2019). Application of Artificial Neural Network(s) in Predicting Formwork Labour Productivity. Advances in Civil Engineering , 2019(PT.1), 1-11 Golub, G. H., Hansen, P. C., & O'Leary, D. P. (1999). Tikhonov Regularization and Total Least Squares. SIAM Journal on Matrix Analysis and Applications , 21(1), 185-194. doi: 10.1137/S0895479897326432 Guo, L., Du, S., Gao, S., Zhao, R., Huang, G., Jin, F.,... Zhang, L. (2022). Delta-Radiomics Based on Dynamic Contrast-Enhanced MRI Predicts Pathologic Complete Response in Breast Cancer Patients Treated with Neoadjuvant Chemotherapy. Cancers , 14(14), 3515. doi: 10.3390/cancers14143515 Jiang, P., Wu, H., Wei, J., Sang, F., Sun, X.,... Lu, Z. (2007). RF-DYMHC: detecting the yeast meiotic recombination hotspots and coldspots by random forest model using gapped dinucleotide composition features. Nucleic acids research , 35(Web Server issue), W47-W51. doi: 10.1093/nar/gkm217 Kang, H., & Chen, C. (2020). Fruit detection, segmentation and 3D visualisation of environments in apple orchards. Computers and Electronics in Agriculture , 171(0168-1699), 105302. doi: 10.1016/j.compag.2020.105302 Khan, W., Zaki, N., Ahmad, A., Masud, M. M., Ali, L., Ali, N.,... Ahmed, L. A. (2022). Mixed Data Imputation Using Generative Adversarial Networks. IEEE Access , 10(2169-3536), 124475-124490. doi: 10.1109/access.2022.3218067 Komarek, P., Moore, A., Committee, A., Calvet, A., & Nichol. (2004). Logistic regression for data mining and high-dimensional classification ., Carnegie Mellon University. Retrieved from Available from Labory, J., Njomgue-Fotso, E., & Bottini, S. (2024). 
Benchmarking feature selection and feature extraction methods to improve the performances of machine-learning algorithms for patient classification using metabolomics biomedical data. Computational and Structural Biotechnology Journal , 23(Mar 19), 1274-1287. doi: https://doi.org/10.1016/j.csbj.2024.03.016 LI, M., HU, H., & ZHAO, L. (2022). Key factors affecting carbon prices from a time-varying perspective. Environmental Science and Pollution Research , 29(43), 65144-65160. doi: 10.1007/s11356-022-20376-x Ling, B., Goodin, D. G., Raynor, E. J., & Joern, A. (2019). Hyperspectral Analysis of Leaf Pigments and Nutritional Elements in Tallgrass Prairie Vegetation. Frontiers in plant science , 10(Feb 25), 142. doi: 10.3389/fpls.2019.00142 Lyu, Z., Wang, Z., Luo, F., Shuai, J., & Huang, Y. (2021). Protein Secondary Structure Prediction With a Reductive Deep Learning Method. [Journal Article]. Front Bioeng Biotechnol , 9(2296-4185), 687426. doi: 10.3389/fbioe.2021.687426 Maindonald, J., & Braun, W. J. (2010). Multiple linear regression. In J. Maindonald & W. J. Braun (Eds.), (170-216). Cambridge: Cambridge University Press. (Reprinted. Mannino, G., Serio, G., Gaglio, R., Busetta, G., La Rosa, L., Lauria, A.,... Gentile, C. (2022). Phytochemical Profile and Antioxidant, Antiproliferative, and Antimicrobial Properties of Rubus idaeus Seed Powder. FOODS , 11(17), 2605 Marill, K. A. (2004). Advanced Statistics:Linear Regression,Part I: Simple Linear Regression. Academic Emergency Medicine , 11(1069-6563), 87-93. doi: 10.1197/j.aem.2003.09.005 Mirbod, O., Choi, D., Heinemann, P. H., Marini, R. P., & He, L. (2023). On-tree apple fruit size estimation using stereo vision with deep learning-based occlusion handling. Biosystems Engineering , 226(1537-5110), 27-42. doi: 10.1016/j.biosystemseng.2022.12.008 Naozumi, H., Lundberg, S. M., & Su-In, L. (2019). AIControl: replacing matched control experiments with machine learning improves ChIP-seq peak identification. 
Nucleic Acids Research, 10(47), 58
Natasa Kljajic, J. Subic, & Sredojević, Z. (2017). Profitability of Raspberry Production on Holdings in the Territory of Arilje. Ekonomika Poljoprivrede (1979), 1(64), 57-68. doi: 10.5937/ekopolj1701057k
Platt, U., & Stutz, J. (2008). Differential Absorption Spectroscopy. In U. Platt & J. Stutz (Eds.), Differential Optical Absorption Spectroscopy: Principles and Applications (pp. 135-174). Berlin, Heidelberg: Springer Berlin Heidelberg.
Press, W. H., & Teukolsky, S. A. (1990). Savitzky-Golay Smoothing Filters. Computers in Physics, 4(6), 669-672. doi: 10.1063/1.4822961
Sachar, S., & Kumar, A. (2021). Survey of feature extraction and classification techniques to identify plant through leaves. Expert Systems with Applications, 167(4), 114181. doi: 10.1016/j.eswa.2020.114181
Sen, P. B., Tomal, J. H., & Yan, Y. (2022). A Novel Algorithm for Feature Selection Using Penalized Regression with Applications to Single-Cell RNA Sequencing Data. Biology (Basel), 11(10), 1495. doi: 10.3390/biology11101495
Sheridan, R. P. (2013). Using Random Forest To Model the Domain Applicability of Another Random Forest Model. J Chem Inf Model, 11(53), 2837-2850. doi: 10.1021/ci400482e
Sonnberger, R. B. H. (1989). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, by D. A. Belsley; E. Kuh; R. E. Welsch. Journal of Applied Econometrics, 4(1), 97-99
Svetnik, V. (2003). Random forest: a classification and regression tool for compound classification and QSAR modeling. Journal of Chemical Information & Computer Sciences, 43(6), 1947-1958. doi: 10.1021/ci034160g
Tan, C. H., Dai, H. P., Lu, J., & Shi, W. (2020). Raspberry production in greenhouse in Northeast China.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288. doi: 10.1111/j.2517-6161.1996.tb02080.x
Tsarouchi, M. I., Vlachopoulos, G. F., Karahaliou, A. N., Vassiou, K. G., & Costaridou, L. I. (2020). Multi-parametric MRI lesion heterogeneity biomarkers for breast cancer diagnosis. Physica Medica, 80(2), 101-110
Vu, B. N., Sánchez, O., Bi, J., Xiao, Q., Hansel, N. N., Checkley, W., ... Liu, Y. (2019). Developing an Advanced PM2.5 Exposure Model in Lima, Peru. Remote Sensing, 11, 614.
Wang, Y., Li, Y., Song, Y., & Rong, X. (2019). Facial Expression Recognition Based on Random Forest and Convolutional Neural Network. Information (Basel), 10(12), 375. doi: 10.3390/info10120375
Wei, X., Wu, L., Ge, D., Yao, M., & Bai, Y. (2022). Prediction of the Maturity of Greenhouse Grapes Based on Imaging Technology. Plant Phenomics, 2022(Mar 30), 9753427. doi: 10.34133/2022/9753427
Y., L., X., W., H., Y., & W., D. (2023). Pattern-Coupled Baseline Correction Method for Near-Infrared Spectroscopy Multivariate Modeling. IEEE Transactions on Instrumentation and Measurement, 72(1557-9662), 1-9. doi: 10.1109/TIM.2023.3265101
Yan, G., Zhang, J., Jiang, M., Gao, X., Yang, H., ... Li, L. (2020). Identification of Known and Novel MicroRNAs in Raspberry Organs Through High-Throughput Sequencing. Front Plant Sci, 11, 728. doi: 10.3389/fpls.2020.00728
Zhang, L., Zhang, K., Liu, S., Zhang, R., Yang, Y., Wang, Q., ... Wang, J. (2021). Identification of a ceRNA Network in Lung Adenocarcinoma Based on Integration Analysis of Tumor-Associated Macrophage Signature Genes. Frontiers in Cell and Developmental Biology, 9, 629941. doi: 10.3389/fcell.2021.629941
Zhu, M., Huang, D., Hu, X., Tong, W., Han, B., Tian, J., ... Luo, H. (2020). Application of hyperspectral technology in detection of agricultural products and food: A Review. Food Science & Nutrition, 8(10), 5206-5214. doi: 10.1002/fsn3.1852
Zhu, X., Chen, F., Zheng, Y., Peng, X., & Chen, C. (2024). An efficient method for detecting Camellia oleifera fruit under complex orchard environment. Scientia Horticulturae, 330(0304-4238), 113091. doi: 10.1016/j.scienta.2024.113091
Zhujun, C., Juan, W., Xuan, L., Yuhong, G., & Zhenhui, R. (2023). The Application of Optical Nondestructive Testing for Fresh Berry Fruits. Food Engineering Reviews, 16(2024), 85-115. doi: 10.1007/s12393-023-09353-3
Additional Declarations
No competing interests reported.
Supplementary Files
GraphicalAbstract.jpg
© Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-4607290","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":326321683,"identity":"1420c5c7-077e-4e03-bb16-da732af8928b","order_by":0,"name":"Zhujun Chen","email":"","orcid":"","institution":"Hebei Agricultural University","correspondingAuthor":false,"prefix":"","firstName":"Zhujun","middleName":"","lastName":"Chen","suffix":""},{"id":326321684,"identity":"b2d0f6d8-7ce1-443a-a3bf-f843fb315bf8","order_by":1,"name":"Juan Wang","email":"","orcid":"","institution":"Hebei Agricultural University","correspondingAuthor":false,"prefix":"","firstName":"Juan","middleName":"","lastName":"Wang","suffix":""},{"id":326321685,"identity":"13ed50d5-99d2-484a-a791-828910bb8d13","order_by":2,"name":"Ruiqian Xi","email":"","orcid":"","institution":"Hebei Agricultural University","correspondingAuthor":false,"prefix":"","firstName":"Ruiqian","middleName":"","lastName":"Xi","suffix":""},{"id":326321686,"identity":"2fa108da-3d13-48c4-8d52-61a109b17615","order_by":3,"name":"Zhenhui Ren","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAsklEQVRIiWNgGAWjYJACZoYKGzk29vYDpGg5k2bMx3MmgQQtjC2HE+dJOBgQp9y8vffw58KGtPQ2CYYEhh8V2whrkTlzLsF45g6b3DbpxgOMPWduE9YiIZFjkMx7Ji23TeZAAjNjGzFa5N8YHOZtO5zOJpFgQKQWCR7DZqCWBBK08OQYM/OcSTNsAwbyQeL8wn7G+DNPhY28fHv7wQc/KojQggIOkKh+FIyCUTAKRgEuAACAKDg3bl41tgAAAABJRU5ErkJggg==","orcid":"","institution":"Hebei Agricultural
University","correspondingAuthor":true,"prefix":"","firstName":"Zhenhui","middleName":"","lastName":"Ren","suffix":""}],"badges":[],"createdAt":"2024-06-19 16:53:09","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4607290/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4607290/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":60299383,"identity":"57622637-db21-45d0-9f75-2f1b73fca9a9","added_by":"auto","created_at":"2024-07-15 10:28:45","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":166585,"visible":true,"origin":"","legend":"\u003cp\u003eOptpsky ATP2400 Spectrometer\u003c/p\u003e","description":"","filename":"fig.1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4607290/v1/5aad6b2e7822bf9f3ae3fafa.jpg"},{"id":60298192,"identity":"54a32143-a4e6-4f90-8d75-57a8ea6c9222","added_by":"auto","created_at":"2024-07-15 10:12:44","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":119956,"visible":true,"origin":"","legend":"\u003cp\u003eMeasurement of distance\u003c/p\u003e","description":"","filename":"fig.2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4607290/v1/b1c2f353da264f9534112028.jpg"},{"id":60299382,"identity":"aacbdaf5-a144-42a5-aa52-10f789e403b5","added_by":"auto","created_at":"2024-07-15 10:28:44","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":6420739,"visible":true,"origin":"","legend":"\u003cp\u003eMLP Principle Diagram\u003c/p\u003e","description":"","filename":"fig.3.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4607290/v1/3335c28bb96442b2941dc4e0.jpg"},{"id":60298196,"identity":"f086941c-4e1c-4853-b525-4106d1a072ec","added_by":"auto","created_at":"2024-07-15 10:12:44","extension":"jpg","order_by":4,"title":"Figure 
4","display":"","copyAsset":false,"role":"figure","size":1939015,"visible":true,"origin":"","legend":"\u003cp\u003eMissing Value Distribution\u003c/p\u003e","description":"","filename":"fig.4.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4607290/v1/f8ad509ca08b0ac70d936d51.jpg"},{"id":60298194,"identity":"86a83528-669f-43b5-bb9b-23f800d56fd4","added_by":"auto","created_at":"2024-07-15 10:12:44","extension":"jpg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":2036711,"visible":true,"origin":"","legend":"\u003cp\u003eAverage Intensity Data\u003c/p\u003e","description":"","filename":"fig.5.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4607290/v1/51db1a75ea9d8acb0198153d.jpg"},{"id":60298652,"identity":"4b4c990a-a5cf-4956-835d-d79de97b995b","added_by":"auto","created_at":"2024-07-15 10:20:44","extension":"jpg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":1880747,"visible":true,"origin":"","legend":"\u003cp\u003eAverage Reflectance Data\u003c/p\u003e","description":"","filename":"fig.6.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4607290/v1/e35d4ff825399a1c3132cd15.jpg"},{"id":60298656,"identity":"bf7a2503-6abd-472d-b5a2-06eb0b66034a","added_by":"auto","created_at":"2024-07-15 10:20:45","extension":"jpg","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":5120562,"visible":true,"origin":"","legend":"\u003cp\u003ePreprocessed Optical Intensity Data\u003c/p\u003e","description":"","filename":"fig.7.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4607290/v1/601f0c3181a0cff8714b192d.jpg"},{"id":60298207,"identity":"f9ac3269-57f7-4e3b-a32e-4711f60104c2","added_by":"auto","created_at":"2024-07-15 10:12:45","extension":"jpg","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":3544356,"visible":true,"origin":"","legend":"\u003cp\u003ePreprocessed Optical Reflectance 
Data\u003c/p\u003e","description":"","filename":"fig.8.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4607290/v1/2fe9758ed535b2a11caa5cb0.jpg"},{"id":60298209,"identity":"d23ee025-f1e7-4ae4-8b07-eb98e5d9e94d","added_by":"auto","created_at":"2024-07-15 10:12:45","extension":"jpg","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":1027703,"visible":true,"origin":"","legend":"\u003cp\u003ePeak Detection Result\u003c/p\u003e","description":"","filename":"fig.9.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4607290/v1/fe2c1c6f509421a2a17eceba.jpg"},{"id":60299831,"identity":"68822158-c6b5-41ae-a53e-edd083eb7a21","added_by":"auto","created_at":"2024-07-15 10:36:45","extension":"jpg","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":1030693,"visible":true,"origin":"","legend":"\u003cp\u003eValley Detection Result\u003c/p\u003e","description":"","filename":"fig.10.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4607290/v1/5a036b6ca8ae11aef8a6021b.jpg"},{"id":60298206,"identity":"e14276d0-6de7-411f-adc8-5c2230e37e14","added_by":"auto","created_at":"2024-07-15 10:12:45","extension":"jpg","order_by":11,"title":"Figure 11","display":"","copyAsset":false,"role":"figure","size":1258145,"visible":true,"origin":"","legend":"\u003cp\u003eROC curve of optical intensity extraction data\u003c/p\u003e","description":"","filename":"fig.11.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4607290/v1/84af2501d7f64378ee86a3d7.jpg"},{"id":60298199,"identity":"67c240af-d05f-4680-a60c-03fa3dce48ce","added_by":"auto","created_at":"2024-07-15 10:12:44","extension":"jpg","order_by":12,"title":"Figure 12","display":"","copyAsset":false,"role":"figure","size":1106386,"visible":true,"origin":"","legend":"\u003cp\u003eROC curve of optical intensity full 
data\u003c/p\u003e","description":"","filename":"fig.12.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4607290/v1/44de29ae36091e8eb6c0684b.jpg"},{"id":60298204,"identity":"29a5cae7-5321-462c-af31-a094e910b9be","added_by":"auto","created_at":"2024-07-15 10:12:45","extension":"jpg","order_by":13,"title":"Figure 13","display":"","copyAsset":false,"role":"figure","size":1183272,"visible":true,"origin":"","legend":"\u003cp\u003eROC curve of optical reflectance extraction data\u003c/p\u003e","description":"","filename":"fig.13.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4607290/v1/57d861281a9c84c7ef281b08.jpg"},{"id":60298654,"identity":"5abd193e-fc48-4ea9-adad-76517fad2d2f","added_by":"auto","created_at":"2024-07-15 10:20:44","extension":"jpg","order_by":14,"title":"Figure 14","display":"","copyAsset":false,"role":"figure","size":3795398,"visible":true,"origin":"","legend":"\u003cp\u003eRegression scatter plots for extracted intensity data\u003c/p\u003e","description":"","filename":"fig.14.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4607290/v1/5d075e34e6dc4a4f49d6efa6.jpg"},{"id":60298200,"identity":"23fabc25-49dd-42ae-b859-7df2e2d42c17","added_by":"auto","created_at":"2024-07-15 10:12:45","extension":"jpg","order_by":15,"title":"Figure 15","display":"","copyAsset":false,"role":"figure","size":3275589,"visible":true,"origin":"","legend":"\u003cp\u003eRegression scatter plots for full intensity data\u003c/p\u003e","description":"","filename":"fig.15.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4607290/v1/b831a9184dc6289a7525047c.jpg"},{"id":60298205,"identity":"250e154f-34b5-4b69-9fe2-9ac2f673dcf1","added_by":"auto","created_at":"2024-07-15 10:12:45","extension":"jpg","order_by":16,"title":"Figure 16","display":"","copyAsset":false,"role":"figure","size":3364054,"visible":true,"origin":"","legend":"\u003cp\u003eRegression scatter plots for extracted Reflectance 
data\u003c/p\u003e","description":"","filename":"fig.16.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4607290/v1/6a7e8522410297c98450d812.jpg"},{"id":100370496,"identity":"eab0ef4c-d416-43e8-9fe6-a562b7f3d0c5","added_by":"auto","created_at":"2026-01-16 08:06:08","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":38345988,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4607290/v1/3b374614-528d-460d-9cb7-a9ab8de01130.pdf"},{"id":60299381,"identity":"3912c288-83fb-4a87-9e92-14ac2180e3f7","added_by":"auto","created_at":"2024-07-15 10:28:44","extension":"jpg","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":432877,"visible":true,"origin":"","legend":"","description":"","filename":"GraphicalAbstract.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4607290/v1/0da6dc3e14aa2bc279797405.jpg"}],"financialInterests":"No competing interests reported.","formattedTitle":"Analysis of Leaf cover on Raspberry Fruits Based on Hyperspectral Techniques Combined with Machine Learning Models","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eRaspberries (Rubus idaeus) are popular berries known for their delicious taste and high nutritional value(Natasa Kljajic, J. Subic, \u0026amp; Sredojević, 2017). Their high levels of vitamin C, fibre, antioxidants and other nutrients give raspberries significant nutritional and functional value(Mannino et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). As a result, raspberries are in high demand on the market, not only for direct consumption, but also for the production of jams, juices, frozen foods and more(De Santis, Carbone, Garzoli, Laghezza, \u0026amp; Turchetti, \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). 
Given their nutritional composition and market demand, raspberry cultivation has significant economic value and research importance in current agricultural practices, as evidenced by the increasing area and yield of raspberries(Tan, Dai, Lu, \u0026amp; Shi, \u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e2020\u003c/span\u003e;Yan et al., \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). As agricultural practices evolve towards more efficient and sustainable methods, the emergence of intelligent orchard management, including yield estimation, growth monitoring and fruit drop prediction, along with harvesting mechanisation to increase efficiency, is becoming critical (Zhu, Chen, Zheng, Peng, \u0026amp; Chen, \u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). A critical aspect of orchard management is the effective identification of all raspberry fruit in the orchard; however, the phenomenon of fruit being covered by leaves affects yield estimation, harvesting efficiency and overall crop quality. Therefore, research into the problem of leaf-covered raspberry fruit is of paramount importance in order to improve the efficiency and accuracy of raspberry fruit collection.\u003c/p\u003e \u003cp\u003eCurrently, machine vision technology is commonly used to identify covered fruit(Chen et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). With the advancement of deep learning, machine vision technology in the field of object detection has matured and can meet the current demands of smart orchard management(Dai et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2022\u003c/span\u003e;Wei, Wu, Ge, Yao, \u0026amp; Bai, \u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). 
In recent years, the development of hyperspectral analysis technology has provided many new management requirements and solutions for smart orchard management(Zhujun, Juan, Xuan, Yuhong, \u0026amp; Zhenhui, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Hyperspectral analysis is expected to play a crucial role in the future smart orchard management, so it is of great importance to explore its various functional areas. Currently, there have been some studies on the identification of covered fruits. These studies mainly focus on using machine vision technology to identify covered fruits and improving the accuracy of identifying covered fruits by upgrading data acquisition equipment(Mirbod, Choi, Heinemann, Marini, \u0026amp; He, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) or optimising algorithms(Kang \u0026amp; Chen, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). However, there are limitations to machine vision technology for identifying leaf-covered fruits, including the effect of cover angle on the accuracy of identifying covered fruits, the inability to identify small fruits that are completely covered by leaves, and the potential consumption of significant computational resources and time in processing image information compared to spectral data processing(E., D., \u0026amp; R., 2009). In addition, machine vision processing needs to take into account factors such as lighting, angles and canopy cover, adding to the complexity of the processing.\u003c/p\u003e \u003cp\u003eHyperspectral technology is considered a potentially powerful tool for improving orchard management practices(Barbedo, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2023\u003c/span\u003e;Chen et al., \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2024\u003c/span\u003e;Zhujun et al., \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). 
Unlike traditional spectral methods, hyperspectral analysis technology provides a comprehensive and detailed examination of the electromagnetic spectrum, allowing precise identification and quantification of biochemical and physiological changes in plants(Barbedo, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2023\u003c/span\u003e;Chang, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). The versatility and expanding applications of hyperspectral technology allow researchers to efficiently collect diverse and relevant data for intelligent orchard management. By integrating multiple data streams, hyperspectral analysis simplifies the information acquisition process, reduces equipment diversity, and reduces the complexity of processing heterogeneous data sets.\u003c/p\u003e \u003cp\u003eHyperspectral analysis technology can capture spectral information from both leaf-covered fruits and uncovered fruits in the orchard. By analysing and processing this information, characteristic information of the fruit can be extracted, enabling the prediction and detection of leaf-covered raspberry fruit. This study is the first to use hyperspectral technology to identify leaf-covered raspberry fruits. By using the comprehensive spectral information provided by hyperspectral analysis, researchers can distinguish fruits that are completely covered by leaves, thereby supporting more accurate yield estimation, harvest planning and crop management strategies.\u003c/p\u003e \u003cp\u003eThis study aims to address the shortcomings of existing methods for assessing leaf-covered fruit by fully exploiting the capabilities of hyperspectral technology. Through careful data pre-processing, feature extraction and classification modelling, this study aims to elucidate the spectral characteristics of fruit development affected by leaf canopy, and to evaluate the effectiveness of hyperspectral analysis in characterising this phenomenon. 
By gaining insight into the impact of canopy cover on intelligent orchard management and automated harvesting, this study aims to provide valuable tools and insights to optimise the production process and improve crop quality in raspberry cultivation.\u003c/p\u003e \u003cp\u003eThis study uses hyperspectral data analysis technology to collect two types of spectral data: leaf-covered raspberry fruit and fruit without leaf cover, including light intensity data and reflectance data. Classification is performed using models such as Random forest (RF), Logistic regression (LR) and multilayer perceptrons (MLP), and model performance is evaluated by tuning grid search parameters. The results indicate that the hyperspectral analysis technology is effective in distinguishing fruits with and without leaf cover. In addition, we conducted a correlation analysis of the distance covered by leaves on raspberry fruits and found a significant effect of leaf cover on spectral features at different distances, with the highest correlation coefficient observed at 0 cm. Furthermore, we compared the applicability of spectral intensity data and reflectance data in studying leaf-covered raspberry fruits, with the results indicating that spectral intensity data is more suitable for addressing leaf cover issues.\u003c/p\u003e \u003cp\u003eThe innovation of this study is the first application of hyperspectral technology to the detection of raspberry leaf shading problems, an approach that is groundbreaking in the field of agriculture. Compared to traditional machine vision techniques, hyperspectral technology is capable of capturing detailed spectral information over a wider range of the electromagnetic spectrum, offering new possibilities for the precise identification and quantification of biochemical and physiological changes in plants. 
By utilising the data collected by the hyperspectral instrument and combining it with advanced machine learning models, this study not only effectively distinguishes fruits under different coverage distances, but also shows the superiority of light intensity data in dealing with the coverage problem. Among the models tested, the multilayer perceptron (MLP) achieves a ROC AUC value of 0.99 on full-band data, a strong result in classification analysis. The approach in this study provides new perspectives and methods for precision agriculture and orchard management by simplifying the information acquisition process, reducing the diversity of equipment requirements, and reducing the complexity of handling heterogeneous datasets. In addition, this study provides new technical support for smart orchard management and automated harvesting, which is expected to significantly improve the efficiency and accuracy of leaf-covered fruit detection.\u003c/p\u003e"},{"header":"2. Materials and methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1. Data collection\u003c/h2\u003e \u003cp\u003eThis study focused on two varieties of raspberries, Polka and Hirtz, grown at the Xinyun Family Farm in Baoding City, Hebei Province, China (38.84265881177145, 115.42011291742594). The raspberries were grown in a greenhouse and spectral data were collected from May to September 2023. Data were collected under adequate sunlight and good fruit growth conditions. All spectral data were collected from raspberry plants (Rubus idaeus) while they were growing; no fruit was picked in the process. The Optosky ATP2400 spectrometer (Fig.\u0026nbsp;\u003cspan refid=\"Fig17\" class=\"InternalRef\"\u003e1\u003c/span\u003e) was used for hyperspectral data collection. The spectrometer had 2048 spectral bands from 191.37 to 1106.92 nm with a sampling interval of 0.36 nm.
The data collected included light intensity, absorbance and reflectance data.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eDuring data collection, the probe of the hyperspectral instrument was placed close to the measured leaves. Spectral data were first measured for leaves without fruit and leaves with fruit. For situations with covered fruits, spectral data were also collected for four different leaf-to-fruit distances: 0 cm, 2 cm, 4 cm and 6 cm. To minimise errors, the leaf-to-fruit distance was measured and the fruits were manually supported to hold that distance. Five scenarios of spectral data were collected in total: leaves without covered fruits, and covered fruits at distances of 0 cm, 2 cm, 4 cm and 6 cm (Fig.\u0026nbsp;\u003cspan refid=\"Fig18\" class=\"InternalRef\"\u003e2\u003c/span\u003e). A total of 6294 sets of light intensity data, 6293 sets of reflectance data and 6292 sets of absorbance data were collected for the five scenarios. Each measurement consisted of one set of hyperspectral data, and by changing the measurement-type option, all three types of spectral data were measured simultaneously. This ensured consistency in the measurement targets and reduced variation between the study objects, thereby reducing the influence of measurement error.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2. Data pre-processing\u003c/h2\u003e \u003cp\u003eData pre-processing plays a crucial role in various scientific fields, especially in agricultural research (Diwu, Bian, Wang, \u0026amp; Liu, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). The spectral data were collected from leaf-covered and uncovered raspberry plants.
All data processing procedures were performed using Python 3.11.5 software within Anaconda (v23.7.4) and custom Python scripts to ensure high quality data sets for subsequent classification and regression analyses.\u003c/p\u003e \u003cp\u003eInitially, raw spectral data was loaded from Excel files. Due to environmental factors, instrument performance and interference during data transmission, the raw data collected often contained missing values. In the early stages of data pre-processing, the missing values in the data set were checked. It was found that the amount of missing data in the spectral intensity and reflectance datasets was minimal and limited to specific rows. Therefore, a manual deletion approach was used to remove missing values to ensure data integrity. However, the absorbance dataset had a higher rate of missing values. To address this issue, the K-Nearest Neighbours (KNN) interpolation method was used to fill in the missing values. However, even after interpolation, significant uncertainty remained in the absorbance data, which could potentially affect the accuracy of the final analysis.\u003c/p\u003e \u003cp\u003eBaseline correction and smoothing techniques were used to reduce noise in the spectral data(Y., X., H., \u0026amp; W., 2023). Specifically, wavelet baseline correction and the Savitzky-Golay (SG) filter(Press \u0026amp; Teukolsky, \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e1990\u003c/span\u003e) were applied to improve the signal-to-noise ratio and ensure the stability of the analysis(Bertinetto \u0026amp; Vuorinen, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). Following baseline correction and smoothing, scatter correction was applied to standardise spectral intensities between samples. 
The standard normal variate (SNV)(Barnes, Dhanoa, \u0026amp; Lister, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e1989\u003c/span\u003e) correction was chosen for its ability to eliminate variation caused by scatter effects, thereby concentrating and standardising the data for comparative analysis. Feature extraction was then performed to identify significant spectral features associated with fruit cover. Peak detection algorithms were used to extract feature peaks and valleys from the pre-processed spectra, which were used for subsequent classification and regression model development.\u003c/p\u003e \u003cp\u003eThe motivation for selecting hyperspectral technology for this study is its ability to capture detailed spectral information over a wide range of wavelengths(Zhu et al., \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). The pre-processing pipeline outlined in this section provides a critical foundation for subsequent analyses to investigate the effect of canopy cover on fruit spectral characteristics. By using advanced hyperspectral technology and tailored preprocessing techniques, we aim to provide valuable insights to guide agricultural practices and contribute to the optimisation of fruit quality assessment methods.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.3. Classification analysis\u003c/h2\u003e \u003cp\u003eThe dataset utilised in this study consists of hyperspectral data of raspberry fruits covered and uncovered by leaves, with the aim of classifying the two scenarios of covered and uncovered raspberry fruits. The dataset consists of several samples, each with a set of spectral features that form a feature matrix. The target variable \"label\" indicates whether the sample is covered or not. 
After pre-processing, the dataset was split into a training set (70%) and a test set (30%) so that the model's generalisation to unseen data could be assessed. This partitioning method helps to evaluate the performance of the model in practical applications. To mitigate the influence of different feature value ranges on model training, feature scaling was performed using the StandardScaler method. This method transforms the values of each feature into a distribution with a mean of 0 and a standard deviation of 1, which can accelerate model convergence and improve stability and predictive performance.\u003c/p\u003e \u003cp\u003eThree different classification models were applied to analyse the dataset: RF Classifier, LR and MLP Classifier.\u003c/p\u003e \u003cdiv id=\"Sec6\" class=\"Section3\"\u003e \u003ch2\u003e2.3.1. RF Classifier\u003c/h2\u003e \u003cp\u003eThe Random Forest (RF) classifier (Genuer, Poggi, \u0026amp; Tuleau-Malot, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2010\u003c/span\u003e; Wang, Li, Song, \u0026amp; Rong, \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e2019\u003c/span\u003e) is known for its robustness against overfitting. In this study, the RF classifier (Breiman, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2001\u003c/span\u003e) was used to analyse the hyperspectral data of covered and uncovered raspberry fruits in order to identify and classify the two different scenarios. This method, which operates by aggregating predictions from multiple decision trees (Jiang et al., \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2007\u003c/span\u003e), was optimized by tuning parameters such as the number of trees (n_estimators) and the maximum depth of trees (max_depth), thereby enhancing its effectiveness in identifying fruit coverage.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section3\"\u003e \u003ch2\u003e2.3.2.
LR Classifier\u003c/h2\u003e \u003cp\u003eLR (Genkin, Lewis, \u0026amp; Madigan, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2007\u003c/span\u003e; Komarek, Moore, Committee, Calvet, \u0026amp; Nichol, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2004\u003c/span\u003e), a standard algorithm for binary classification (Feng et al., 2022; Mondal et al., 2019), was employed to assess the likelihood of raspberry fruits being covered based on their hyperspectral data. By estimating the probability of a sample belonging to a specific category, LR provides a clear interpretation of the data. In this research, model complexity was controlled and hyperparameters were optimized using GridSearchCV to improve the model's predictive accuracy in identifying the canopy status of the fruits, which is crucial for precision agriculture and fruit quality management.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section3\"\u003e \u003ch2\u003e2.3.3. MLP Classifier\u003c/h2\u003e \u003cp\u003eMLP (Lyu, Wang, Luo, Shuai, \u0026amp; Huang, \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2021\u003c/span\u003e), a type of feed-forward artificial neural network, was utilized in this study for classifying raspberry fruits based on their hyperspectral data. The MLP's structure, which includes input, hidden, and output layers with non-linear activation functions, allows it to learn complex relationships between input features and outputs (Fig.\u0026nbsp;\u003cspan refid=\"Fig19\" class=\"InternalRef\"\u003e3\u003c/span\u003e). This model was optimized by adjusting parameters such as the number of hidden layers and iterations, using GridSearchCV to enhance classification accuracy for leaf-covered and uncovered fruits.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe MLP's strength lies in its non-linear modeling, making it adept at handling intricate datasets.
It was instrumental in identifying patterns in spectral features related to fruit coverage, offering valuable insights for precision agriculture. The optimized MLP model aids in improving the efficiency of fruit quality assessment and provides decision support for orchard management. Performance was evaluated using metrics such as ROC AUC, accuracy, and specificity.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e2.4. Regression analysis method for distance parameters\u003c/h2\u003e \u003cp\u003eIn this study, regression analysis was performed on spectral data of fruits covered by leaves at different distances (0 cm, 2 cm, 4 cm, 6 cm); the spectral features formed the feature matrix and the coverage distances served as labels. To validate the model prediction ability, the dataset was divided into a 70% training set and a 30% test set, and the features were normalised using StandardScaler to improve the model training efficiency and prediction performance.\u003c/p\u003e \u003cdiv id=\"Sec10\" class=\"Section3\"\u003e \u003ch2\u003e2.4.1. Linear regression analysis\u003c/h2\u003e \u003cp\u003eLinear regression (Arjo, \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2009\u003c/span\u003e; Marill, \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2004\u003c/span\u003e), a foundational statistical approach (Frees, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2009\u003c/span\u003e; Maindonald \u0026amp; Braun, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2010\u003c/span\u003e), was used in this study to model the linear relationship between spectral features and the distance of fruit coverage. The model is fitted by minimizing the sum of squared errors, yielding a linear equation that relates the spectral data to the covered distance.
This analysis aids in understanding key spectral features and is fundamental for orchard management; it can also indicate whether more complex models are needed to capture non-linear relationships in the data.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section3\"\u003e \u003ch2\u003e2.4.2. Ridge regression (RR) analysis\u003c/h2\u003e \u003cp\u003eRidge regression, also known as Tikhonov regularisation, is a linear regression method specifically designed to deal with the problem of multicollinearity (Golub, Hansen, \u0026amp; O'Leary, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e1999\u003c/span\u003e; Sonnberger, 1989). It combats multicollinearity by adding an L2 penalty term to the loss function (Naozumi, Lundberg, \u0026amp; Su-In, \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Sen, Tomal, \u0026amp; Yan, \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e2022\u003c/span\u003e) and was applied here to analyze spectral data of leaf-covered raspberries. This method stabilizes parameter estimates and reduces model complexity. The model parameters are obtained by minimizing the penalized loss function, typically using a closed-form solution. In this study, ridge regression predicted the distance between fruit and leaves, mitigating the multicollinearity common in spectral data. Hyperparameter tuning via GridSearchCV optimized model performance, enhancing predictive accuracy and offering insights into the relationship between spectral data and canopy coverage for orchard management.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section3\"\u003e \u003ch2\u003e2.4.3. 
Lasso regression analysis\u003c/h2\u003e \u003cp\u003eLasso regression (Tibshirani \u0026amp; Tibshirani, \u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e1996\u003c/span\u003e; Zhang et al., \u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e2021\u003c/span\u003e), which also addresses multicollinearity in data, incorporates an L1 penalty for feature selection, shrinking insignificant coefficients to zero (Guo et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Tsarouchi, Vlachopoulos, Karahaliou, Vassiou, \u0026amp; Costaridou, \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). It minimizes a loss function with an added penalty term, resulting in a sparse solution where only relevant features contribute to the model (LI, HU, \u0026amp; ZHAO, 2022). This method was used to analyze spectral data of leaf-covered fruits, predicting the leaf-fruit distance. Lasso's sparsity identifies key spectral features for distance prediction, enhancing model interpretability and accuracy. GridSearchCV optimized the regularization parameter, balancing model simplicity with performance.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section3\"\u003e \u003ch2\u003e2.4.4. RF regression analysis\u003c/h2\u003e \u003cp\u003eRF regression (Sheridan \u0026amp; RP, 2013; Svetnik, \u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e2003\u003c/span\u003e) enhances model predictions by averaging the outcomes of multiple decision trees, capitalizing on \"collective intelligence\" to improve performance over single models. It introduces randomness by selecting features randomly during tree node splits, increasing model diversity and reducing overfitting. 
RF optimizes predictions by minimizing the mean squared error (MSE) (Svetnik, \u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e2003\u003c/span\u003e; Vu et al., \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e2019\u003c/span\u003e) and is adept at handling non-linear relationships and assessing feature importance.\u003c/p\u003e \u003cp\u003eIn this study, RF was applied to predict the distance between the fruit and the covering leaf, with hyperparameter tuning via GridSearchCV to enhance predictive accuracy by adjusting the number of trees and the tree depth.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section3\"\u003e \u003ch2\u003e2.4.5. MLP regression analysis\u003c/h2\u003e \u003cp\u003eThe Multilayer Perceptron (MLP), a neural network for regression and classification (Golnaraghi, Zangenehmadar, Moselhi, \u0026amp; Alkass, \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2019\u003c/span\u003e), was employed in this study using the MLPRegressor (MLPR) from Scikit-learn. MLP leverages its non-linear modeling to analyze complex spectral data, with hyperparameter tuning via GridSearchCV to optimize the network's structure and training process, enhancing prediction accuracy and generalization. Model performance is evaluated using metrics such as the coefficient of determination (R\u003csup\u003e2\u003c/sup\u003e), MSE, and Pearson's correlation coefficient.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"3. Results and discussion","content":"\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e3.1. Results of data pre-processing\u003c/h2\u003e \u003cp\u003eThis study involved rigorous preprocessing of the three types of spectral data (intensity, absorbance, and reflectance) to ensure quality for analysis. Steps included handling missing values, baseline correction, smoothing, scatter correction, and feature extraction. 
The distribution of missing values in the raw data collected for the three categories (Fig.\u0026nbsp;\u003cspan refid=\"Fig20\" class=\"InternalRef\"\u003e4\u003c/span\u003e) shows that little of the light intensity and reflectance data is missing and that the gaps are confined to particular rows, so the affected rows were deleted manually to ensure data integrity. However, the absorbance data set has a high rate of missing values. Absorbance data are obtained from the absorption properties of the principal substance at specific wavelengths of light. Absorbance (Charnley, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Platt \u0026amp; Stutz, \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e2008\u003c/span\u003e) is calculated using the formula:\u003cdiv id=\"Equ1\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$A\\left(\\lambda\\right)=-\\log_{10}\\left(\\frac{I_{\\text{transmitted}}}{I_{\\text{incident}}}\\right)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eIn the above equation, A(\u0026lambda;) is the absorbance at wavelength \u0026lambda;, I\u003csub\u003etransmitted\u003c/sub\u003e is the transmitted light intensity, and I\u003csub\u003eincident\u003c/sub\u003e is the incident light intensity (Castani\u0026eacute;, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2013\u003c/span\u003e). However, the absorbance data showed a large number of nulls. A likely explanation is that leaf cover changes the distribution and intensity of the incident light, so the spectral information received at the surface of the fruit differs significantly from that of the uncovered area. 
This variation can fall outside the dynamic range of the sensor, making it impossible to measure absorbance accurately. The large number of missing values compromises the integrity of the data and could skew model learning and results (Khan et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). As a result, absorbance data may be less suitable for analyzing leaf-covered fruits where data precision is vital. In contrast, light intensity and reflectance data are more reliably extracted, may better reflect the spectral characteristics of fruits under cover conditions, and yield useful information more easily.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eRaw spectral data of light intensity (Fig.\u0026nbsp;\u003cspan refid=\"Fig21\" class=\"InternalRef\"\u003e5\u003c/span\u003e) as well as reflectance data (Fig.\u0026nbsp;\u003cspan refid=\"Fig22\" class=\"InternalRef\"\u003e6\u003c/span\u003e) were averaged separately for the case of a leaf with a covered fruit behind it and the case of a leaf with no fruit behind it. The collected spectra span the band between 200 and 1100 nanometres (nm). This band covers the region from UV to NIR and is an important interval in plant spectral analysis, providing detailed information on the biochemical composition and physical structure of the fruit (Galvez-Sola et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Ling, Goodin, Raynor, \u0026amp; Joern, \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). 
Two major peaks are observed in Fig.\u0026nbsp;\u003cspan refid=\"Fig21\" class=\"InternalRef\"\u003e5\u003c/span\u003e(a). The highest peak, detailed in Fig.\u0026nbsp;\u003cspan refid=\"Fig21\" class=\"InternalRef\"\u003e5\u003c/span\u003e(b), occurs in the range of about 700 to 800 nm, which is consistent with the absorption properties of plant pigments (e.g. chlorophyll), which absorb strongly in this band. A second, less pronounced peak, detailed in Fig.\u0026nbsp;\u003cspan refid=\"Fig21\" class=\"InternalRef\"\u003e5\u003c/span\u003e(c), is located between 500 and 600 nm, which may be related to the absorption properties of carotenoids, which also absorb strongly in this band. These peaks provide important information about the spectral characteristics of the fruit. When the spectral curve of the 'no cover' condition was compared with those of the different cover distances, the 'no cover' curve differed from the other curves. The difference is not obvious across the full band, but it can be clearly seen in the detail plots (shown in Figs.\u0026nbsp;\u003cspan refid=\"Fig21\" class=\"InternalRef\"\u003e5\u003c/span\u003eb and \u003cspan refid=\"Fig21\" class=\"InternalRef\"\u003e5\u003c/span\u003ec). This suggests that the spectral features reflect the effect of shading, even though shading conditions change the path and intensity of light propagation.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFrom the reflectance spectrograms in Fig.\u0026nbsp;\u003cspan refid=\"Fig22\" class=\"InternalRef\"\u003e6\u003c/span\u003e(a) it can be seen that the effective spectral range is approximately between 400 and 900 nm. 
Below 400 nm and above 900 nm, the spectral data show strong noise, and the light intensity spectra in Fig.\u0026nbsp;\u003cspan refid=\"Fig21\" class=\"InternalRef\"\u003e5\u003c/span\u003e show little signal in the same regions. This may be because the spectroscopic instrument has relatively low sensitivity in these regions, which lowers the signal-to-noise ratio, or because the samples have no significant reflectance or absorption features in the ultraviolet or near-infrared region, so that the measured signal is mainly noise. In the interval 400\u0026ndash;900 nm the curve is relatively stable, indicating that the spectral data in this band have a good signal-to-noise ratio and are suitable for further analysis. In the 600\u0026ndash;800 nm interval (Fig.\u0026nbsp;\u003cspan refid=\"Fig22\" class=\"InternalRef\"\u003e6\u003c/span\u003eb) there is a peak in the reflectance curve which corresponds to the position of the trough in the light intensity spectrum. This peak may be due to the light absorption properties of some chemical components in the sample at these wavelengths. In the visible region, the absorption properties of chlorophyll and other pigments affect the reflectance, resulting in a characteristic peak in the spectrogram. Therefore, the usable range was extracted and the strongly noisy regions were removed before the data were pre-processed.\u003c/p\u003e \u003cp\u003eData pre-processing techniques, including outlier and missing value correction, baseline adjustment, and smoothing with a Savitzky-Golay (SG) filter, were applied to the hyperspectral dataset to enhance analysis accuracy. The wavelet transform method was used for baseline correction, and the standard normal variate (SNV) method addressed scatter effects. 
The above pre-processing steps resulted in two sets of spectrograms (Figs.\u0026nbsp;\u003cspan refid=\"Fig23\" class=\"InternalRef\"\u003e7\u003c/span\u003e and \u003cspan refid=\"Fig24\" class=\"InternalRef\"\u003e8\u003c/span\u003e) showing the data before and after pre-processing, respectively. The spectrograms before pre-processing show the noise and irregularities in the raw data, while the spectrograms after pre-processing show clearer and more stable spectral features. The improvement is most visible in the 500\u0026ndash;900 nm band, indicating that the pre-processing effectively improves the usability and analytical value of the data.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003e3.2. Feature extraction process\u003c/h2\u003e \u003cp\u003eAfter data pre-processing, feature extraction is a key part of the analysis process in this study. It aims to identify key information closely related to sample characteristics from complex spectral data (Labory, Njomgue-Fotso, \u0026amp; Bottini, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Sachar \u0026amp; Kumar, \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Peak detection was first performed on the pre-processed data using the find_peaks function, which identifies significant local maxima in the spectra. Peak selection was tuned by setting the peak detection parameters, including the minimum height, minimum distance and relative height thresholds.\u003c/p\u003e \u003cp\u003ePost-extraction, SpectrumAlignment ensured spectral comparability by aligning peaks to a reference spectrum, creating a superspectrum for standard analysis. 
Feature selection and filtering retained significant features for analysis, using a ratio-based approach to exclude less frequent features and focus on those most relevant for classification or regression.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe results of the feature extraction are shown in Figs.\u0026nbsp;\u003cspan refid=\"Fig25\" class=\"InternalRef\"\u003e9\u003c/span\u003e and \u003cspan refid=\"Fig26\" class=\"InternalRef\"\u003e10\u003c/span\u003e, which clearly show the peaks and troughs of each feature parameter point. These peaks reveal key information in the hyperspectral data and provide a solid basis for hyperspectral analysis. The wavelengths (in nm) retained by the filter-ratio-based feature extraction method are: 554.51, 592.519, 600.895, 714.059, 721.601, 745.837, 755.462, 770.258, 805.974, 820.44, 825.522, 831.014, 836.071, 839.016, 857.846, 869.062, 889.26, 903.97, 591.121, 596.71, 718.498, 723.372, 751.529, 760.695, 775.024, 805.548, 817.045, 822.559, 827.636, 832.279, 854.511, 867.404, 881.452, 900.303. These wavelengths cover 554.51\u0026ndash;903.97 nm, spanning the visible to near-infrared spectral range. The use of these features provides a deeper understanding of the differences between samples and greatly simplifies the model, providing important support for sample classification and regression analysis.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003e3.3. Results of classification analysis\u003c/h2\u003e \u003cp\u003eThis study conducted a hyperspectral analysis to classify leaf-covered raspberry fruit using three machine learning (ML) models: RF, LR, and MLP. 
It compared feature band selection to full band data for classification, evaluating model performance through metrics like ROC AUC, accuracy, sensitivity, and specificity, with results detailed in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eClassification Model Evaluation Results\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"8\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eData\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEvaluation criteria\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRF Train\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRF Test\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eLR Train\u003c/p\u003e 
\u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eLR Test\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eMLP Train\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eMLP Test\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003eIntensity characteristic wavelength\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eROC AUC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.92\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.9\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e 
\u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSensitivity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.79\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.82\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSpecificity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.78\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.91\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003eIntensity all wavelengths\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eROC AUC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e 
\u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.80\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.92\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSensitivity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.79\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.92\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.92\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSpecificity\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.80\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.90\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003eReflectance selection data\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eROC AUC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.85\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.82\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSensitivity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.74\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.69\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSpecificity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.90\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cdiv id=\"Sec19\" class=\"Section3\"\u003e \u003ch2\u003e3.3.1. 
Classification Performance of Spectral Light Intensity Data\u003c/h2\u003e \u003cp\u003eThe classification performance of the light intensity data was evaluated using the area under the ROC curve. The ROC curves for the feature band spectral data (Fig.\u0026nbsp;\u003cspan refid=\"Fig27\" class=\"InternalRef\"\u003e11\u003c/span\u003e) and the full band spectral data (Fig.\u0026nbsp;\u003cspan refid=\"Fig28\" class=\"InternalRef\"\u003e12\u003c/span\u003e) summarise the performance of the three models.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eIn feature band classification, the RF model excelled on the training set with a perfect ROC AUC of 1.00, accuracy of 0.98, sensitivity of 0.97, and specificity of 0.99, indicating high accuracy in distinguishing training samples. However, its performance on the test set declined to an ROC AUC of 0.90, accuracy of 0.84, sensitivity of 0.79, and specificity of 0.83, indicating a need for improved generalizability. LR and MLP also performed well on the training set but with slightly lower metrics than RF. On the test set, LR scored slightly below RF (ROC AUC of 0.84 and accuracy of 0.80), while MLP demonstrated better generalization with a high ROC AUC of 0.92, accuracy of 0.90, sensitivity of 0.87, and specificity of 0.91.\u003c/p\u003e \u003cp\u003eOn full-band data, all models showed improved training performance, with RF and MLP reaching ROC AUCs of 1.00 and 0.99, and accuracies of 0.99 and 0.97, respectively. This indicates the benefit of richer information in full-band data for pattern recognition. However, RF's test set performance dropped to an ROC AUC of 0.87 and accuracy of 0.80. 
LR and MLP remained stable on the test set, with ROC AUCs of 0.94 and 0.95 and high sensitivities and specificities, demonstrating their robustness across datasets.\u003c/p\u003e \u003cp\u003eSelected feature bands enhance model training efficiency and reduce computational demands. This is especially beneficial for high-dimensional data, where lower complexity reduces the risk of overfitting and can improve model generalizability. Although full-band data offer more information, they may prolong training and prediction times without significantly surpassing the classification performance of selected feature bands on test sets.\u003c/p\u003e \u003cp\u003eRF showed better performance with selected feature bands than with full-band data, likely due to the latter's higher dimensionality and increased model complexity. LR exhibited stable performance across both datasets, with less significant improvement in the full band. Conversely, MLP sustained high performance on full-band data, particularly on the test set, indicating strong generalization and classification accuracy.\u003c/p\u003e \u003cp\u003eThe feature band approach is beneficial for efficiency and computational resource usage, particularly with high data dimensionality. Full-band data, while more informative, is worthwhile mainly when computational resources are ample. Future research should explore feature selection and dimensionality reduction for optimized model performance and cost. In practice, selected feature bands are preferable for a balance between speed and accuracy.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section3\"\u003e \u003ch2\u003e3.3.2. 
Classification performance of reflectance data\u003c/h2\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe ROC curves for the reflectance spectral data (Fig.\u0026nbsp;\u003cspan refid=\"Fig29\" class=\"InternalRef\"\u003e13\u003c/span\u003e) show strong classification performance for all models on the training set, with RF achieving a perfect ROC AUC of 1.00 and high accuracy, sensitivity, and specificity, indicating strong classification and canopy identification. Despite a slight dip in performance on the test set, RF maintained a high ROC AUC of 0.94 with accuracy of 0.89, sensitivity of 0.80, and specificity of 0.93, demonstrating robust generalization.\u003c/p\u003e \u003cp\u003eLR and MLP also performed well on the training set, with ROC AUCs of 0.91 and 0.97 and accuracies of 0.85 and 0.91. On the test set, LR had an ROC AUC of 0.89 and accuracy of 0.82, while MLP had an ROC AUC of 0.95 and accuracy of 0.89. MLP's test sensitivity and specificity remained high (0.93 and 0.87), whereas LR's test sensitivity fell to 0.69 with a specificity of 0.89.\u003c/p\u003e \u003cp\u003eReflectance spectral data offered the models richer information during training, leading to near-perfect performance, likely because reflectance captures the chemical and physical properties of the samples, which aids classification. Whereas the light intensity data showed a slight decline in performance on the test set, the reflectance data maintained high accuracy and generalization, suggesting its reliability in practical applications, especially with ample computational resources. Hyperspectral technology effectively distinguished between leaf-covered and uncovered fruits, with high accuracy and sensitivity, proving its efficacy in addressing coverage issues.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003e3.4. 
Results of regression analysis\u003c/h2\u003e \u003cp\u003eRegression analysis revealed significant correlations between spectral characteristics and the extent of fruit cover, with light intensity spectra demonstrating a stronger predictive correlation than reflectance data for raspberry fruit coverage. Multiple regression models were employed to forecast the leaf-fruit distance, with model performance evaluated using R\u003csup\u003e2\u003c/sup\u003e, Pearson correlation coefficients, and p-values, as detailed in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eRegression Model Evaluation Results\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eData\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eR\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePearson correlation\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e 
\u003cp\u003eP-value\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"4\" rowspan=\"5\"\u003e \u003cp\u003eIntensity extraction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.11\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.33\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e3.11\u0026times;10\u0026thinsp;\u003csup\u003e\u0026minus;\u0026thinsp;39\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.34\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.63\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e2.54\u0026times;10\u003csup\u003e\u0026minus;\u0026thinsp;40\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLasso\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.70\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e2.49\u0026times;10\u003csup\u003e\u0026minus;\u0026thinsp;41\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c4\"\u003e \u003cp\u003e0.36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e9.40\u0026times;10\u003csup\u003e\u0026minus;\u0026thinsp;48\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMLPR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.67\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.75\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e4.91\u0026times;10\u003csup\u003e\u0026minus;\u0026thinsp;76\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"4\" rowspan=\"5\"\u003e \u003cp\u003eIntensity all\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e-4.53\u0026times;10\u003csup\u003e12\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.33\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.59\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.65\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e3.00\u0026times;10\u003csup\u003e\u0026minus;\u0026thinsp;141\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLasso\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c3\"\u003e \u003cp\u003e0.60\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.85\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1.19\u0026times;10\u003csup\u003e\u0026minus;\u0026thinsp;117\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.73\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e4.02\u0026times;10\u003csup\u003e\u0026minus;\u0026thinsp;70\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMLPR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1.50\u0026times;10\u003csup\u003e\u0026minus;\u0026thinsp;178\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"4\" rowspan=\"5\"\u003e \u003cp\u003eReflectance extraction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e-6.97\u0026times;10\u003csup\u003e8\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.57\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c2\"\u003e \u003cp\u003eRR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.23\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e8.15\u0026times;10\u003csup\u003e\u0026minus;\u0026thinsp;19\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLasso\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e3.35\u0026times;10\u003csup\u003e\u0026minus;\u0026thinsp;15\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.37\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e8.71\u0026times;10\u003csup\u003e\u0026minus;\u0026thinsp;49\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMLPR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.39\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.69\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e6.30\u0026times;10\u003csup\u003e\u0026minus;\u0026thinsp;88\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e 
\u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cdiv id=\"Sec22\" class=\"Section3\"\u003e \u003ch2\u003e3.4.1. Results of Regression Analysis of Light Intensity Data\u003c/h2\u003e \u003cp\u003eFor linear regression (LR) on the light intensity extracted data, the R\u003csup\u003e2\u003c/sup\u003e value was 0.1065, indicating that the model explained approximately 10.65% of the variability. The Pearson correlation coefficient was 0.3282, indicating a moderate positive correlation. The p-value was \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(3.11\\times {10}^{-39}\\)\u003c/span\u003e\u003c/span\u003e, which is much less than 0.05, indicating that the predictive power of the linear regression model is statistically significant. However, for the full light intensity data, the R\u003csup\u003e2\u003c/sup\u003e value was strongly negative (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(-4.53\\times {10}^{12}\\)\u003c/span\u003e\u003c/span\u003e), which is unreasonable and may be due to errors in data pre-processing or model implementation.\u003c/p\u003e \u003cp\u003eFor ridge regression (RR) on the extracted light intensity data, the R\u003csup\u003e2\u003c/sup\u003e value improved to 0.34, with a Pearson correlation coefficient of 0.6327 and a p-value that is still very small (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(2.54\\times {10}^{-40}\\)\u003c/span\u003e\u003c/span\u003e), indicating that the ridge regression model fits better than the linear regression model and that its predictive power remains significant.\u003c/p\u003e \u003cp\u003eOn the full-band light intensity data, the R\u003csup\u003e2\u003c/sup\u003e value of the ridge regression model increased to 0.65 and the Pearson correlation coefficient reached 0.89, which shows that the model built from the full-band light intensity data has stronger predictive ability.\u003c/p\u003e \u003cp\u003eFor Lasso regression on the light intensity extracted data and the full-band data, the R\u003csup\u003e2\u003c/sup\u003e values were 0.70 and 
0.60, and the Pearson correlation coefficients were 0.64 and 0.85, respectively, with p-values much less than 0.05, indicating that the Lasso regression model explained the variability and correlation of both types of data well and that its predictive ability was highly significant.\u003c/p\u003e \u003cp\u003eRF and MLPR had R\u003csup\u003e2\u003c/sup\u003e values of 0.12 and 0.67 and Pearson correlation coefficients of 0.36 and 0.75, respectively, on the light intensity extraction data, with very small p-values. Both models therefore had significant predictive ability, but their explanatory power was weaker than that of the Lasso regression model.\u003c/p\u003e \u003cp\u003eIn the regression analyses of the light intensity extraction data and the full-band data, scatter plots provide an intuitive way of observing the fit of the different regression models at each distance. The regression scatter plots are shown for the light intensity extracted data (Fig.\u0026nbsp;\u003cspan refid=\"Fig30\" class=\"InternalRef\"\u003e14\u003c/span\u003e) and the full-band data (Fig.\u0026nbsp;\u003cspan refid=\"Fig31\" class=\"InternalRef\"\u003e15\u003c/span\u003e):\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig30\" class=\"InternalRef\"\u003e14\u003c/span\u003e illustrates the scatter distribution across four distances, showing the predicted versus actual values for each regression model. The MLP regressor performs best at 0 cm and 4 cm, with a narrower distribution of points aligned along the diagonal, signifying high prediction accuracy. It effectively utilizes light intensity data for short-distance predictions. The Lasso regressor excels at 2 cm, demonstrating linearity and low error in medium-distance predictions. At 6 cm, the fits of all models are average, with scattered points that lie off the diagonal, suggesting decreased accuracy. 
This decline in fit quality is attributed to the increasing complexity of the relationship between light intensity features and fruit canopy at greater distances, which challenges the models' predictive accuracy. As distance grows, model fit degrades, the scatter distribution widens, and prediction error rises, underscoring the prediction challenge at longer distances.\u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig31\" class=\"InternalRef\"\u003e15\u003c/span\u003e shows the results of the regression analyses of the full-band data at different distances. As with the light intensity extraction data, the models differ in performance across distances.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe RR and Lasso regressors fit well at close distances, but their accuracy declines with increasing distance. At 6 cm, Lasso performed the best, yet all models struggled at longer distances. The broad scatter distribution and large prediction errors indicate that the models struggle to predict fruit cover precisely at greater distances.\u003c/p\u003e \u003cp\u003eAnalysis of Figs.\u0026nbsp;\u003cspan refid=\"Fig30\" class=\"InternalRef\"\u003e14\u003c/span\u003e and \u003cspan refid=\"Fig31\" class=\"InternalRef\"\u003e15\u003c/span\u003e indicated that the Lasso and MLP regressors had superior fit and accuracy at specific distances in light intensity data, with performance diminishing at greater distances. This insight is important for understanding the relationship between spectra and leaf-fruit distance and for selecting suitable models for precise predictions. Future studies should explore model optimization for enhanced accuracy across varying distances and consider additional influential factors, such as environmental conditions and data collection techniques, to boost the model's practical utility.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec23\" class=\"Section3\"\u003e \u003ch2\u003e3.4.2. 
Reflectance data regression analysis results\u003c/h2\u003e \u003cp\u003eFor the reflectance extraction data, the R\u003csup\u003e2\u003c/sup\u003e value of the linear regression (LR) model was \u0026minus;6.97\u0026times;10\u003csup\u003e8\u003c/sup\u003e, which is anomalous and requires further checking of the accuracy of data processing and model implementation. The R\u003csup\u003e2\u003c/sup\u003e values for the RR and Lasso regression models were 0.05 and 0.04, respectively, indicating limited explanatory power. However, the RF and MLP regression models performed relatively well on the reflectance data, with R\u003csup\u003e2\u003c/sup\u003e values of 0.14 and 0.40, respectively, showing some predictive ability.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig32\" class=\"InternalRef\"\u003e16\u003c/span\u003e displays scatter distributions across four distances, where each point represents the relationship between the predicted and actual values of a regression model at a particular distance. While the regression performance on reflectance data is not as strong as on intensity data, the MLPR model shows some predictive value at 2 cm. The regression analysis of the light intensity extraction data indicates a correlation between spectral features and leaf-fruit distance, with Lasso regression effectively capturing this relationship, as evidenced by high R\u003csup\u003e2\u003c/sup\u003e and Pearson correlation values. However, the anomalous R\u003csup\u003e2\u003c/sup\u003e values in the full light intensity data necessitate a review of data processing and model development. 
Reflectance data, though less explanatory, hints at the potential utility of random forest and MLP regression models in predicting fruit cover, suggesting either a subtler relationship or the need for more complex models to discern it.\u003c/p\u003e \u003cp\u003eIn summary, the results of the regression analyses in this study reveal the complex relationship between spectral characteristics and the distance between the covering leaf and the raspberry fruit, and highlight the advantages and disadvantages of different regression models in explaining and predicting this relationship. Future research can further explore more advanced models and algorithms to improve the accuracy and explanatory power of the prediction. At the same time, due attention must be paid to the accuracy of data processing and model implementation to avoid anomalous results.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec24\" class=\"Section2\"\u003e \u003ch2\u003e3.5. Discussion\u003c/h2\u003e \u003cp\u003eThis study successfully detected leaf-covered fruits using hyperspectral data and machine learning models, addressing a key need in precision agriculture for automated, intelligent orchard management. The dataset included light intensity and reflectance spectral data, collected under specific conditions, to capture the spectral characteristics of fruits at various cover levels.\u003c/p\u003e \u003cp\u003eHyperspectral technology was chosen for predicting fruit cover because of its unique advantages. Hyperspectral data sets typically have lower data dimensions compared to traditional machine vision technologies, which allows for easier data processing and simpler, more efficient calculations that can be achieved in real time for precision agriculture. 
In addition, hyperspectral technology is able to capture detailed spectral information over a wider range of wavelengths, providing richer data on the biochemical and physiological status of fruit.\u003c/p\u003e \u003cp\u003eIn this study, light intensity data and reflectance data each showed advantages in different tasks, and reflectance data showed clear advantages in classification. By combining reflectance data with the Random Forest algorithm, we achieved highly accurate classification results in which the ROC AUC values were close to perfect, and the accuracy and specificity were also very high. This indicates that the reflectance data can effectively reflect the spectral differences between covered and uncovered fruits, providing strong support for accurate classification. In the regression analysis, the light intensity data showed its advantages. Combined with the Lasso regression model, the light intensity data had high explanatory power in predicting the degree of canopy cover, and both the R\u003csup\u003e2\u003c/sup\u003e value and the Pearson correlation coefficient showed significant correlation. This suggests that light intensity data has good sensitivity and accuracy in reflecting the degree of fruit canopy.\u003c/p\u003e \u003cp\u003eThe advantage of combining machine learning with hyperspectral data is its ability to deal with complex non-linear relationships and extract useful information from them. This approach not only improves the accuracy of leaf-covered fruit detection, but is also more applicable to production and practice. It is not yet widely used in practical production, mainly because hyperspectral equipment is not widely available and because research and development of related technologies is still in its infancy. 
However, with the continuous advancement of technology and cost reduction, it is expected that hyperspectral technology will be widely used in smart agriculture in the near future to achieve comprehensive smart agricultural management. There are already many research and application cases demonstrating the potential of hyperspectral technology in smart agriculture, providing a solid foundation for future research and application.\u003c/p\u003e \u003c/div\u003e"},{"header":"4. Conclusion","content":"\u003cp\u003eIn this study, hyperspectral technology was systematically applied for the first time to provide an in-depth analysis of the leaf-covered fruit problem. By collecting hyperspectral data under different conditions and applying machine learning models for classification and regression analysis, leaf-covered and uncovered raspberry fruits were successfully distinguished. The results show that hyperspectral technology can effectively extract the spectral features of fruits and accurately predict the canopy status of fruits through appropriate data processing and model optimisation. In addition, we found that spectral intensity data have better applicability in solving the canopy cover problem, which provides new technical support and methodological direction for orchard management.\u003c/p\u003e\n\u003cp\u003eThis study not only confirms the potential of hyperspectral technology in agriculture, but also provides new perspectives for fruit quality assessment and orchard monitoring. By comparing different spectral data types and machine learning models, we provide clear methodological directions for future research and applications. 
In particular, this study found that combining light intensity data with multilayer perceptron models can achieve better performance in canopy detection, which provides strong technical support for the development of precision agriculture.\u003c/p\u003e\n\u003cp\u003eFuture research can explore the following aspects in depth: first, the generalisation ability and accuracy of the model can be improved by collecting more diverse orchard data. Secondly, the combination of other sensor data, such as multispectral imaging and thermal imaging data, will be explored to further improve the accuracy of canopy detection. In addition, the research could be extended to other types of fruit trees and fruit to validate the effectiveness of hyperspectral technology in a wider range of agricultural applications. Finally, real-time monitoring and automated control systems will be developed to apply hyperspectral technology to practical orchard management, in order to achieve automated and intelligent orchard management.\u003c/p\u003e\n\u003cp\u003eDespite the results of this study, there are still some limitations. For example, the limited sample size in the dataset may affect the generalisability of the model. In addition, the reflectance data faced some challenges in pre-processing and model training, which need to be further optimised. Future research can be improved and deepened in the following areas: first, the robustness and accuracy of the model can be improved by increasing the sample size and diversity. Secondly, more advanced data processing techniques should be explored to better handle reflectance data. 
Finally, the problem of canopy detection under different environmental conditions should be investigated to improve the adaptability of the model in practical applications.\u003c/p\u003e\n\u003cp\u003eIn conclusion, this study provides new insights and methods for the application of hyperspectral technology in orchard management and lays the foundation for the future development of precision agriculture. Continued refinement of methods and techniques is expected to lead to greater efficiency and sophistication in orchard management and fruit quality assessment.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAuthors’ contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eC-ZJ was mainly responsible for the execution of the experiments and the organisation and processing of the data. X-RQ provided extensive help and advice during the experimental process. R-ZH and W-J provided valuable help and insights during the data analysis and interpretation phase. All authors were involved in drafting and revising the manuscript and read and approved the final version for publication.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors are grateful for the financial support from the Hebei Provincial Department of Science and Technology (Grant number: 20326338D).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eData supporting the results of this study are available in the article (and/or its supplementary materials). Due to privacy constraints and the proprietary nature of the data, other data related to this study are not publicly available as they are stored in Excel spreadsheets on the authors' personal computers. Requests for access to these datasets should be directed to Zhenhui Ren and will be considered on a case-by-case basis with the author's approval. 
Although these data are not publicly available at this time, the authors anticipate that they will be considered for inclusion in a public repository after 2025, once the co-authors have completed their relevant studies and the first author has completed the related academic assignments.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting Interests Statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eProf. Zhenhui Ren's work has been funded by the Hebei Provincial Department of Science and Technology. Dr Zhujun Chen, Dr Juan Wang and Dr Ruiqian Xi declare no potential conflict of interest.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n \u003cli\u003eArjo, D. (2009). Statistical Models: Theory and Practice. \u003cem\u003eTechnometrics\u003c/em\u003e, 48(2), 315\u003c/li\u003e\n \u003cli\u003eBarbedo, J. G. A. (2023). A review on the combination of deep learning techniques with proximal hyperspectral images in agriculture. \u003cem\u003eComputers and Electronics in Agriculture\u003c/em\u003e, 210, 107920. doi: 10.1016/j.compag.2023.107920\u003c/li\u003e\n \u003cli\u003eBarnes, R. J., Dhanoa, M. S., \u0026amp; Lister, S. J. (1989). Standard Normal Variate Transformation and De-Trending of Near-Infrared Diffuse Reflectance Spectra. \u003cem\u003eApplied Spectroscopy\u003c/em\u003e, 43(5), 772-777. doi: 10.1366/0003702894202201\u003c/li\u003e\n \u003cli\u003eBertinetto, C. G., \u0026amp; Vuorinen, T. (2014). Automatic Baseline Recognition for the Correction of Large Sets of Spectra Using Continuous Wavelet Transform and Iterative Fitting. \u003cem\u003eApplied Spectroscopy\u003c/em\u003e, 68(2), 155-164. doi: 10.1366/13-07018\u003c/li\u003e\n \u003cli\u003eBreiman, L. (2001). Random Forests. \u003cem\u003eMachine Learning\u003c/em\u003e, 45(1), 5-32. doi: 10.1023/A:1010933404324\u003c/li\u003e\n \u003cli\u003eCastani\u0026eacute;, F. (2013). 
\u003cem\u003eSpectral analysis: parametric and non-parametric digital methods\u003c/em\u003e. John Wiley \u0026amp; Sons.\u003c/li\u003e\n \u003cli\u003eChang, C. (2022). \u003cem\u003eAdvances in Hyperspectral Image Processing Techniques - introduce\u003c/em\u003e.\u003c/li\u003e\n \u003cli\u003eCharnley, S. B. (2023). Absorption Spectroscopy. \u003cem\u003eSpringer eBooks\u003c/em\u003e, 0(2023), 40-41. doi: 10.1007/978-3-662-65093-6_9\u003c/li\u003e\n \u003cli\u003eChen, J., Zhang, H., Wang, Z., Wu, J., Luo, T., Wang, H.,... Long, T. (2022). An image restoration and detection method for picking robot based on convolutional auto-encoder. \u003cem\u003eComputers and Electronics in Agriculture\u003c/em\u003e, 196, 106896. doi: 10.1016/j.compag.2022.106896\u003c/li\u003e\n \u003cli\u003eChen, R., Liu, W., Yang, H., Jin, X., Yang, G., Zhou, Y.,... Feng, H. (2024). A novel framework to assess apple leaf nitrogen content: Fusion of hyperspectral reflectance and phenology information through deep learning. \u003cem\u003eComputers and Electronics in Agriculture\u003c/em\u003e, 219, 108816. doi: 10.1016/j.compag.2024.108816\u003c/li\u003e\n \u003cli\u003eDai, F., Wang, F., Yang, D., Lin, S., Chen, X., Lan, Y.,... Deng, X. (2022). Detection Method of Citrus Psyllids With Field High-Definition Camera Based on Improved Cascade Region-Based Convolution Neural Networks. \u003cem\u003eFrontiers in Plant Science\u003c/em\u003e, 12(Jan 24), 816272. doi: 10.3389/fpls.2021.816272\u003c/li\u003e\n \u003cli\u003eDe Santis, D., Carbone, K., Garzoli, S., Laghezza, M. V., \u0026amp; Turchetti, G. (2022). Bioactivity and Chemical Profile of Rubus idaeus L. Leaves Steam-Distillation Extract. \u003cem\u003eFoods\u003c/em\u003e, 11(10), 1455. doi: 10.3390/foods11101455\u003c/li\u003e\n \u003cli\u003eDiwu, P. Y., Bian, X. H., Wang, Z. F., \u0026amp; Liu, W. (2019). 
Study on the Selection of Spectral Preprocessing Methods. \u003cem\u003eSpectroscopy and Spectral Analysis\u003c/em\u003e, 39(9), 2800-2806. doi: 10.3964/j.issn.1000-0593(2019)09-2800-07\u003c/li\u003e\n \u003cli\u003eE., C., D., Z., \u0026amp; R., R. (2009). Neurofuzzy prediction for gaze control. \u003cem\u003eCanadian Journal of Electrical and Computer Engineering\u003c/em\u003e, 34(1/2), 15-20. doi: 10.1109/CJECE.2009.5291203\u003c/li\u003e\n \u003cli\u003eFrees, E. W. (2009). Multiple Linear Regression\u0026nbsp;\u0026ndash;\u0026nbsp;IN. \u003cem\u003eCambridge University Press eBooks\u003c/em\u003e, 0(2009), 70-106. doi: 10.1017/cbo9780511814372.004\u003c/li\u003e\n \u003cli\u003eGalvez-Sola, L., Garcia-Sanchez, F., Perez-Perez, J. G., Gimeno, V., Navarro, J. M., Moral, R.,... Nieves, M. (2015). Rapid estimation of nutritional elements on citrus leaves by near infrared reflectance spectroscopy. \u003cem\u003eFrontiers in Plant Science\u003c/em\u003e, 6(Jul 23), 571. doi: 10.3389/fpls.2015.00571\u003c/li\u003e\n \u003cli\u003eGenkin, A., Lewis, D. D., \u0026amp; Madigan, D. (2007). Large-Scale Bayesian Logistic Regression for Text Categorization. \u003cem\u003eTechnometrics\u003c/em\u003e, 49(3), 291-304. doi: 10.1198/004017007000000245\u003c/li\u003e\n \u003cli\u003eGenuer, R., Poggi, J., \u0026amp; Tuleau-Malot, C. (2010). Variable selection using random forests. \u003cem\u003ePattern Recognition Letters\u003c/em\u003e, 31(14), 2225-2236. doi: 10.1016/j.patrec.2010.03.014\u003c/li\u003e\n \u003cli\u003eGolnaraghi, S., Zangenehmadar, Z., Moselhi, O., \u0026amp; Alkass, S. (2019). Application of Artificial Neural Network(s) in Predicting Formwork Labour Productivity. \u003cem\u003eAdvances in Civil Engineering\u003c/em\u003e, 2019(PT.1), 1-11\u003c/li\u003e\n \u003cli\u003eGolub, G. H., Hansen, P. C., \u0026amp; O\u0026apos;Leary, D. P. (1999). Tikhonov Regularization and Total Least Squares. 
\u003cem\u003eSIAM Journal on Matrix Analysis and Applications\u003c/em\u003e, 21(1), 185-194. doi: 10.1137/S0895479897326432\u003c/li\u003e\n \u003cli\u003eGuo, L., Du, S., Gao, S., Zhao, R., Huang, G., Jin, F.,... Zhang, L. (2022). Delta-Radiomics Based on Dynamic Contrast-Enhanced MRI Predicts Pathologic Complete Response in Breast Cancer Patients Treated with Neoadjuvant Chemotherapy. \u003cem\u003eCancers\u003c/em\u003e, 14(14), 3515. doi: 10.3390/cancers14143515\u003c/li\u003e\n \u003cli\u003eJiang, P., Wu, H., Wei, J., Sang, F., Sun, X.,... Lu, Z. (2007). RF-DYMHC: detecting the yeast meiotic recombination hotspots and coldspots by random forest model using gapped dinucleotide composition features. \u003cem\u003eNucleic Acids Research\u003c/em\u003e, 35(Web Server issue), W47-W51. doi: 10.1093/nar/gkm217\u003c/li\u003e\n \u003cli\u003eKang, H., \u0026amp; Chen, C. (2020). Fruit detection, segmentation and 3D visualisation of environments in apple orchards. \u003cem\u003eComputers and Electronics in Agriculture\u003c/em\u003e, 171(0168-1699), 105302. doi: 10.1016/j.compag.2020.105302\u003c/li\u003e\n \u003cli\u003eKhan, W., Zaki, N., Ahmad, A., Masud, M. M., Ali, L., Ali, N.,... Ahmed, L. A. (2022). Mixed Data Imputation Using Generative Adversarial Networks. \u003cem\u003eIEEE Access\u003c/em\u003e, 10(2169-3536), 124475-124490. doi: 10.1109/access.2022.3218067\u003c/li\u003e\n \u003cli\u003eKomarek, P., Moore, A., Committee, A., Calvet, A., \u0026amp; Nichol. (2004). \u003cem\u003eLogistic regression for data mining and high-dimensional classification\u003c/em\u003e. Carnegie Mellon University.\u003c/li\u003e\n \u003cli\u003eLabory, J., Njomgue-Fotso, E., \u0026amp; Bottini, S. (2024). Benchmarking feature selection and feature extraction methods to improve the performances of machine-learning algorithms for patient classification using metabolomics biomedical data. 
\u003cem\u003eComputational and Structural Biotechnology Journal\u003c/em\u003e, 23(Mar 19), 1274-1287. doi: https://doi.org/10.1016/j.csbj.2024.03.016\u003c/li\u003e\n \u003cli\u003eLi, M., Hu, H., \u0026amp; Zhao, L. (2022). Key factors affecting carbon prices from a time-varying perspective. \u003cem\u003eEnvironmental Science and Pollution Research\u003c/em\u003e, 29(43), 65144-65160. doi: 10.1007/s11356-022-20376-x\u003c/li\u003e\n \u003cli\u003eLing, B., Goodin, D. G., Raynor, E. J., \u0026amp; Joern, A. (2019). Hyperspectral Analysis of Leaf Pigments and Nutritional Elements in Tallgrass Prairie Vegetation. \u003cem\u003eFrontiers in Plant Science\u003c/em\u003e, 10(Feb 25), 142. doi: 10.3389/fpls.2019.00142\u003c/li\u003e\n \u003cli\u003eLyu, Z., Wang, Z., Luo, F., Shuai, J., \u0026amp; Huang, Y. (2021). Protein Secondary Structure Prediction With a Reductive Deep Learning Method. \u003cem\u003eFrontiers in Bioengineering and Biotechnology\u003c/em\u003e, 9(2296-4185), 687426. doi: 10.3389/fbioe.2021.687426\u003c/li\u003e\n \u003cli\u003eMaindonald, J., \u0026amp; Braun, W. J. (2010). Multiple linear regression. In J. Maindonald \u0026amp; W. J. Braun (Eds.), (pp. 170-216). Cambridge: Cambridge University Press.\u003c/li\u003e\n \u003cli\u003eMannino, G., Serio, G., Gaglio, R., Busetta, G., La Rosa, L., Lauria, A.,... Gentile, C. (2022). Phytochemical Profile and Antioxidant, Antiproliferative, and Antimicrobial Properties of Rubus idaeus Seed Powder. \u003cem\u003eFoods\u003c/em\u003e, 11(17), 2605\u003c/li\u003e\n \u003cli\u003eMarill, K. A. (2004). Advanced Statistics: Linear Regression, Part I: Simple Linear Regression. \u003cem\u003eAcademic Emergency Medicine\u003c/em\u003e, 11(1069-6563), 87-93. doi: 10.1197/j.aem.2003.09.005\u003c/li\u003e\n \u003cli\u003eMirbod, O., Choi, D., Heinemann, P. H., Marini, R. P., \u0026amp; He, L. (2023). 
On-tree apple fruit size estimation using stereo vision with deep learning-based occlusion handling. \u003cem\u003eBiosystems Engineering\u003c/em\u003e, 226(1537-5110), 27-42. doi: 10.1016/j.biosystemseng.2022.12.008\u003c/li\u003e\n \u003cli\u003eNaozumi, H., Lundberg, S. M., \u0026amp; Su-In, L. (2019). AIControl: replacing matched control experiments with machine learning improves ChIP-seq peak identification. \u003cem\u003eNucleic Acids Research\u003c/em\u003e, 10(47), 58\u003c/li\u003e\n \u003cli\u003eNatasa Kljajic, J. Subic, \u0026amp; Sredojević, Z. (2017). Profitability of Raspberry Production on Holdings in the Territory of Arilje. \u003cem\u003eEkonomika Poljoprivrede (1979)\u003c/em\u003e, 1(64), 57-68. doi: 10.5937/ekopolj1701057k\u003c/li\u003e\n \u003cli\u003ePlatt, U., \u0026amp; Stutz, J. (2008). Differential Absorption Spectroscopy. In U. Platt \u0026amp; J. Stutz (Eds.), \u003cem\u003eDifferential Optical Absorption Spectroscopy: Principles and Applications\u003c/em\u003e (pp. 135-174). Berlin, Heidelberg: Springer Berlin Heidelberg.\u003c/li\u003e\n \u003cli\u003ePress, W. H., \u0026amp; Teukolsky, S. A. (1990). Savitzky‐Golay Smoothing Filters. \u003cem\u003eComputers in Physics\u003c/em\u003e, 4(6), 669-672. doi: 10.1063/1.4822961\u003c/li\u003e\n \u003cli\u003eSachar, S., \u0026amp; Kumar, A. (2021). Survey of feature extraction and classification techniques to identify plant through leaves. \u003cem\u003eExpert Systems with Applications\u003c/em\u003e, 167(4), 114181. doi: 10.1016/j.eswa.2020.114181\u003c/li\u003e\n \u003cli\u003eSen, P. B., Tomal, J. H., \u0026amp; Yan, Y. (2022). A Novel Algorithm for Feature Selection Using Penalized Regression with Applications to Single-Cell RNA Sequencing Data. \u003cem\u003eBiology (Basel)\u003c/em\u003e, 11(10), 1495. doi: 10.3390/biology11101495\u003c/li\u003e\n \u003cli\u003eSheridan, \u0026amp; RP. (2013). 
Using Random Forest To Model the Domain Applicability of Another Random Forest Model. \u003cem\u003eJournal of Chemical Information and Modeling\u003c/em\u003e, 11(53), 2837-2850. doi: https://doi.org/10.1021/ci400482e\u003c/li\u003e\n \u003cli\u003eSonnberger, R. B. H. (1989). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, by D. A. Belsley; E. Kuh; R. E. Welsch. \u003cem\u003eJournal of Applied Econometrics\u003c/em\u003e, 4(1), 97-99\u003c/li\u003e\n \u003cli\u003eSvetnik, V. (2003). Random forest: a classification and regression tool for compound classification and QSAR modeling. \u003cem\u003eJournal of Chemical Information \u0026amp; Computer Sciences\u003c/em\u003e, 43(6), 1947-1958. doi: https://doi.org/10.1021/ci034160g\u003c/li\u003e\n \u003cli\u003eTan, C. H., Dai, H. P., Lu, J., \u0026amp; Shi, W. (2020). \u003cem\u003eRaspberry production in greenhouse in Northeast China\u003c/em\u003e.\u003c/li\u003e\n \u003cli\u003eTibshirani, R., \u0026amp; Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. \u003cem\u003eJournal of the Royal Statistical Society: Series B (Methodological)\u003c/em\u003e, 58(1), 267-288. doi: https://doi.org/10.1111/j.2517-6161.1996.tb02080.x\u003c/li\u003e\n \u003cli\u003eTsarouchi, M. I., Vlachopoulos, G. F., Karahaliou, A. N., Vassiou, K. G., \u0026amp; Costaridou, L. I. (2020). Multi-parametric MRI lesion heterogeneity biomarkers for breast cancer diagnosis. \u003cem\u003ePhysica Medica\u003c/em\u003e, 80(2), 101-110\u003c/li\u003e\n \u003cli\u003eVu, B. N., S\u0026aacute;nchez, O., Bi, J., Xiao, Q., Hansel, N. N., Checkley, W.,... Liu, Y. (2019). Developing an Advanced PM2.5 Exposure Model in Lima, Peru. \u003cem\u003eRemote Sensing\u003c/em\u003e, 11, 614.\u003c/li\u003e\n \u003cli\u003eWang, Y., Li, Y., Song, Y., \u0026amp; Rong, X. (2019). Facial Expression Recognition Based on Random Forest and Convolutional Neural Network. \u003cem\u003eInformation (Basel)\u003c/em\u003e, 10(12), 375. 
doi: 10.3390/info10120375\u003c/li\u003e\n \u003cli\u003eWei, X., Wu, L., Ge, D., Yao, M., \u0026amp; Bai, Y. (2022). Prediction of the Maturity of Greenhouse Grapes Based on Imaging Technology. \u003cem\u003ePlant Phenomics\u003c/em\u003e, 2022(Mar 30), 9753427. doi: 10.34133/2022/9753427\u003c/li\u003e\n \u003cli\u003eY., L., X., W., H., Y., \u0026amp; W., D. (2023). Pattern-Coupled Baseline Correction Method for Near-Infrared Spectroscopy Multivariate Modeling. \u003cem\u003eIEEE Transactions on Instrumentation and Measurement\u003c/em\u003e, 72(1557-9662), 1-9. doi: 10.1109/TIM.2023.3265101\u003c/li\u003e\n \u003cli\u003eYan, G., Zhang, J., Jiang, M., Gao, X., Yang, H.,... Li, L. (2020). Identification of Known and Novel MicroRNAs in Raspberry Organs Through High-Throughput Sequencing. \u003cem\u003eFrontiers in Plant Science\u003c/em\u003e, 11, 728. doi: 10.3389/fpls.2020.00728\u003c/li\u003e\n \u003cli\u003eZhang, L., Zhang, K., Liu, S., Zhang, R., Yang, Y., Wang, Q.,... Wang, J. (2021). Identification of a ceRNA Network in Lung Adenocarcinoma Based on Integration Analysis of Tumor-Associated Macrophage Signature Genes. \u003cem\u003eFrontiers in Cell and Developmental Biology\u003c/em\u003e, 9, 629941. doi: 10.3389/fcell.2021.629941\u003c/li\u003e\n \u003cli\u003eZhu, M., Huang, D., Hu, X., Tong, W., Han, B., Tian, J.,... Luo, H. (2020). Application of hyperspectral technology in detection of agricultural products and food: A Review. \u003cem\u003eFood Science \u0026amp; Nutrition\u003c/em\u003e, 8(10), 5206-5214. doi: https://doi.org/10.1002/fsn3.1852\u003c/li\u003e\n \u003cli\u003eZhu, X., Chen, F., Zheng, Y., Peng, X., \u0026amp; Chen, C. (2024). An efficient method for detecting Camellia oleifera fruit under complex orchard environment. \u003cem\u003eScientia Horticulturae\u003c/em\u003e, 330(0304-4238), 113091. 
doi: https://doi.org/10.1016/j.scienta.2024.113091\u003c/li\u003e\n \u003cli\u003eZhujun, C., Juan, W., Xuan, L., Yuhong, G., \u0026amp; Zhenhui, R. (2023). The Application of Optical Nondestructive Testing for Fresh Berry Fruits. \u003cem\u003eFood Engineering Reviews\u003c/em\u003e, 16(2024), 85-115. doi: 10.1007/s12393-023-09353-3\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Hyperspectral technology, Fruit prediction, Machine learning models, Precision agriculture, MLP, RF","lastPublishedDoi":"10.21203/rs.3.rs-4607290/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4607290/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThe aim of this study is to explore the potential application of hyperspectral technology in detecting the problem of fruit cover in the orchard. Three types of hyperspectral data were collected using a hyperspectral instrument to cover raspberry fruits with leaves. Machine learning models were used to classify and regress covered and uncovered fruits. The results show that hyperspectral technology can effectively differentiate fruits under different cover conditions, with spectral intensity data performing better in addressing cover issues. Random forest (RF) and multilayer perceptron (MLP) models demonstrated high accuracy in classification analysis, with MLP achieving a ROC AUC value of 0.99 on full-band data. Regression analysis also revealed a significant correlation between degree of coverage and spectral features, highlighting in particular the high explanatory power of light intensity data in predicting degree of coverage. This study not only confirms the application value of hyperspectral technology in precision agriculture, but also provides new technical support for intelligent orchard management and automated harvesting. 
Future research will focus on improving the generalisation ability of the models, integrating multi-source data to further improve the accuracy of coverage detection, and exploring the development of real-time monitoring and automatic control systems to achieve comprehensive intelligence in orchard management.\u003c/p\u003e","manuscriptTitle":"Analysis of Leaf cover on Raspberry Fruits Based on Hyperspectral Techniques Combined with Machine Learning Models","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-07-15 10:12:39","doi":"10.21203/rs.3.rs-4607290/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"5fdd9e6e-64fe-412e-ba32-512a0288a4f4","owner":[],"postedDate":"July 15th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":34533326,"name":"Physical sciences/Optics and photonics/Optical techniques/Spectroscopy"},{"id":34533327,"name":"Biological sciences/Plant sciences/Plant ecology"},{"id":34533328,"name":"Physical sciences/Mathematics and computing/Information technology"}],"tags":[],"updatedAt":"2026-01-14T13:25:07+00:00","versionOfRecord":[],"versionCreatedAt":"2024-07-15 10:12:39","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4607290","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4607290","identity":"rs-4607290","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}