Quantitative analysis of camellia oil in blending vegetable oil based on Raman spectroscopy and deep learning models | Research Square

Research Article

YuanBo Huang, Hua Zhao, Qin Luo, Yulin Xu, Shi Yin, Yongjun Hu

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-6594745/v1

This work is licensed under a CC BY 4.0 License

Status: Under Review

Version 1 posted 9

Abstract

There is an urgent need for a fast and accurate method to quantify the true content of high-value vegetable oils in vegetable blended oils. In this study, Raman spectroscopy is combined with three deep learning models to identify the camellia oil content in rapeseed-corn-camellia oil blends. All three deep learning models demonstrate superior predictive capabilities compared to traditional machine learning models.
Notably, the improved CNN-GRU-MHA model shows the best performance in quantitatively predicting the camellia oil content, with R²p and RMSEP values of 0.9981 and 0.3714, respectively. The results indicate that the proposed method provides a promising analytical approach for authenticity detection of blended oils.

Keywords: Raman spectra; Vegetable blended oils; Deep learning models

1. Introduction

Vegetable blended oil is made by mixing two or more pure vegetable oils in specified ratios [1]. Because different vegetable oils differ in fatty acid composition, blended oils can offer a more balanced nutritional profile than single vegetable oils [2]. Typically, vegetable blended oils are made by mixing a large amount of low-priced pure vegetable oil with a small amount of high-priced pure vegetable oil [3]. Camellia oil is rich in oleic acid, vitamin E, polyphenols and other bioactive substances with health-promoting properties. Long-term consumption of camellia oil has been reported to help alleviate high blood pressure and cardiovascular disease, earning it the name "Oriental olive oil" [4, 5]. However, some unscrupulous manufacturers use false advertising to overstate the proportion of camellia oil in rapeseed-corn-camellia blends for illegal profit. Accurately determining the content of specific high-value vegetable oils in blended oils is therefore crucial for protecting consumers' rights and interests. Conventional analytical methods such as gas chromatography [6], mass spectrometry [7, 8], gas chromatography-mass spectrometry [9], synchronous fluorescence spectrometry [10, 11] and two-dimensional correlation spectroscopy [12, 13] have been used to determine the types and ratios of fatty acids and thereby identify vegetable oils.
However, these methods have their own limitations, such as complex sample handling, cumbersome and time-consuming procedures, and the need for specialized laboratory facilities [14]. In contrast, Raman spectroscopy consumes no chemical reagents and requires no complex sample preparation, making it a simple and rapid detection method and a promising technique for identifying vegetable oils [15–17]. Although traditional machine learning models have had some success in identifying vegetable oils [18], they often rely on manual feature extraction and feature selection when dealing with complex, multivariate spectral data. This approach not only increases the workload but also tends to produce inadequate or inaccurate features [19]. Deep learning models, by contrast, can extract features from spectral data automatically, which can substantially improve model accuracy and robustness [20]. However, deep learning models may still face problems of feature selection and generalization when dealing with complex spectral data [21]. Attention mechanisms offer an effective solution to this issue: they dynamically adjust how much weight the model gives to different parts of the input, improving feature extraction and discrimination on complex spectral data [22]. In this study, the spectral dataset was first visualized using principal component analysis (PCA) and uniform manifold approximation and projection (UMAP). Three spectral variable selection methods, namely the genetic algorithm (GA), uninformative variable elimination (UVE) and the whale optimization algorithm (WOA), were then used to identify the spectral variables relevant to camellia oil, and extreme learning machine (ELM) models were constructed from the selected variables.
Finally, three deep learning models, namely a one-dimensional convolutional neural network (1D-CNN), ConvNeXt with Efficient Channel Attention (ConvNeXt-ECA), and a convolutional neural network with a gated recurrent unit and multi-head attention (CNN-GRU-MHA), were developed to quantify the camellia oil content in ternary vegetable oil blends.

2. Experiments and methods

2.1 Sample preparation

Three pure oils, namely extra virgin camellia oil, rapeseed oil and corn oil, were purchased from Tmall (Hangzhou, China). All oils carried traceable labels complying with the Chinese national standard for pure oils. Blends were prepared stepwise, varying extra virgin camellia oil in 5% (v/v) increments and the other two oils in 10% (v/v) increments. A market survey indicated that the camellia oil content of commercial camellia blended oils is generally low, so the maximum concentration of extra virgin camellia oil in this study was limited to 30%. The content of each of the other two oils ranged from 0% to 100%, and a total of 68 blended oil samples with different concentration ratios were prepared. Three samples were prepared for each concentration and 10 Raman spectra were acquired from each sample, giving 30 spectra per concentration and 2040 Raman spectra in total. During preparation, the three pure oils were pipetted into a beaker according to their volume ratios, to a total volume of 10 mL. The beakers were washed with deionized water and dried before use, and each mixture was thoroughly mixed using an ultrasonic oscillator (JM-10D-28, Jiemeng Ultrasonics Co., Ltd., China). Before Raman spectra were acquired, 0.1 mL of the well-mixed sample was drawn with a pipette (Eppendorf Research plus, Eppendorf AG, Germany) and dropped onto a multiwell plate. Each sample was stored in a cool, dark place until spectral acquisition.
2.2 Raman measurements

Raman spectra of all samples were acquired with a Raman spectrometer (RK785-I, BW&TEK, USA). The laser excitation wavelength was 785 nm, the laser power was 3.0 mW, the signal-to-noise ratio was 300:1, the single-spectrum acquisition time was 3 s, and three accumulations were used to obtain spectra with a good signal-to-noise ratio. Spectral data in the Raman shift range of 1000–1800 cm⁻¹ were extracted for subsequent analysis.

2.3 Spectra pre-processing

Raman signals are often affected by various interfering factors; preprocessing removes background and noise and so increases the accuracy and reliability of spectral analysis. The original Raman spectra were preprocessed in the following steps:
1. Smoothing filtering: the Savitzky-Golay (SG) filter was applied to the original Raman spectra to reduce noise.
2. Standardization: the standard normal variate (SNV) transformation was used to minimize spectral intensity variations caused by sample inhomogeneity.
3. Baseline correction: the baseline was subtracted using asymmetric least squares (ALS) smoothing [24].
4. Normalization: to improve prediction accuracy, all spectra were scaled to the range [0, 1] by min-max normalization.
5. Data averaging and expansion: randomly selected sets of five spectra from each concentration were averaged, giving six averaged spectra per concentration and 408 spectra in total across the 68 concentrations. To meet the sample size requirements of the deep learning models, the dataset was then expanded by adding Poisson noise and frequency-domain perturbations to the averaged spectra, giving 2040 expanded spectra in total.

2.4 Visualization and analysis of spectral dataset (PCA, UMAP)

Principal component analysis and uniform manifold approximation and projection are both dimensionality reduction methods.
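Returning briefly to Section 2.3, the preprocessing and data-expansion steps can be sketched with NumPy and SciPy as below. This is a minimal illustration under stated assumptions, not the authors' code: the SG window and polynomial order, the ALS parameters (`lam`, `p`, `n_iter`), the Poisson count scale, the form of the frequency-domain perturbation (a small random gain on each Fourier coefficient), the choice of keeping each averaged spectrum plus four perturbed copies (picked only because 408 × 5 = 2040), the 401-point grid and the synthetic spectra are all assumptions. The ALS baseline follows the widely used Eilers and Boelens formulation.

```python
import numpy as np
from scipy import sparse
from scipy.signal import savgol_filter
from scipy.sparse.linalg import spsolve


def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline (Eilers and Boelens)."""
    n = y.size
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))  # second differences
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        z = spsolve((sparse.diags(w) + penalty).tocsc(), w * y)
        w = np.where(y > z, p, 1.0 - p)  # points above the baseline (peaks) get a small weight
    return z


def preprocess(spectrum, window=11, polyorder=3):
    s = savgol_filter(spectrum, window_length=window, polyorder=polyorder)  # 1. SG smoothing
    s = (s - s.mean()) / s.std()                                            # 2. SNV
    s = s - als_baseline(s)                                                 # 3. ALS baseline removal
    return (s - s.min()) / (s.max() - s.min())                              # 4. min-max to [0, 1]


def average_groups(spectra, group_size=5, rng=None):
    """5a. Randomly group the spectra of one concentration and average each group."""
    rng = np.random.default_rng(rng)
    n_groups = len(spectra) // group_size
    idx = rng.permutation(len(spectra))[: n_groups * group_size]
    return spectra[idx.reshape(n_groups, group_size)].mean(axis=1)


def augment(spectrum, n_copies=4, counts=1000.0, eps=0.01, rng=None):
    """5b. Hypothetical expansion: Poisson noise plus a random gain on each Fourier coefficient."""
    rng = np.random.default_rng(rng)
    out = [spectrum]
    for _ in range(n_copies):
        noisy = rng.poisson(spectrum * counts) / counts   # shot noise on a [0, 1] spectrum
        F = np.fft.rfft(noisy)
        F *= 1.0 + eps * rng.standard_normal(F.shape)     # frequency-domain perturbation
        out.append(np.fft.irfft(F, n=spectrum.size))
    return np.stack(out)


# Synthetic stand-in for the 30 raw spectra of one concentration (401 points, 1000-1800 cm^-1)
rng = np.random.default_rng(0)
shift = np.linspace(1000, 1800, 401)
bands = np.exp(-((shift - 1641) / 10) ** 2) + 0.6 * np.exp(-((shift - 1440) / 12) ** 2)
raw = bands + 0.2 * (shift - 1000) / 800 + 0.02 * rng.standard_normal((30, shift.size))

pre = np.array([preprocess(s) for s in raw])                   # (30, 401)
avg = average_groups(pre, rng=rng)                             # (6, 401)
expanded = np.concatenate([augment(a, rng=rng) for a in avg])  # (30, 401)
```

Per concentration this turns 30 raw spectra into 6 averaged and 30 expanded spectra, which over 68 concentrations matches the 408 and 2040 counts given in the text.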
PCA is a widely used linear dimensionality reduction technique that transforms the original variables into new variables that are linear combinations of the originals [23]. These new variables, the principal components, capture as much of the variation in the data as possible while retaining the essential information. In practice, the principal component scores can be plotted in two or three dimensions for sample classification. UMAP, in contrast, is a more recent nonlinear dimensionality reduction method that preserves both local and global structure by constructing a graph representation of the high-dimensional data and then optimizing the positions of the data points in a low-dimensional space. It is commonly used for clustering and visualizing high-dimensional datasets [24].

2.5 Model of ELM

Extreme learning machines (ELMs) are machine learning models based on feedforward neural networks and are suitable for nonlinear regression and classification problems. An ELM is a single-hidden-layer feedforward network whose hidden-node parameters are generated randomly and then fixed, so that only the output weights of the linear output layer need to be determined, and these are obtained analytically [25]. Compared with other traditional machine learning models, ELM is fast to train and scales well computationally. In this study, four ELM-based models (ELM, GA-ELM, UVE-ELM and WOA-ELM) were used to predict the concentration of camellia oil in blended oil.

2.6 1D-CNN

Convolutional neural networks (CNNs) are representative deep learning models: feed-forward neural networks with a deep structure built around convolution operations [26]. In this study, a 1D-CNN quantitative model was constructed from the 2040 Raman spectra to predict the camellia oil content of the ternary vegetable oil blends. It comprises an input layer, four 1D convolutional layers, a batch normalization layer, two pooling layers, a flatten layer, a fully connected layer, a dropout layer and an output layer.
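To make the role of each layer type concrete, the following NumPy sketch runs one untrained forward pass through a stack with the same layer types (four Conv1D layers, one batch normalization layer, two max-pooling layers, flatten, dense, dropout and a single regression output). It is illustrative only: the filter counts, kernel sizes, layer ordering, the 401-point input length and the random weights are assumptions, and training (MSE loss with Nadam) is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1d(x, w, b):
    """'Valid' 1-D convolution as used in deep learning (cross-correlation).
    x: (length, in_ch), w: (kernel, in_ch, out_ch), b: (out_ch,)"""
    k = w.shape[0]
    windows = np.stack([x[i:i + k] for i in range(x.shape[0] - k + 1)])  # (L-k+1, k, in_ch)
    return np.einsum("lki,kio->lo", windows, w) + b


def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)


def batch_norm(x, gamma, beta, mean, var, eps=1e-3):
    """Inference-mode batch normalization using stored per-channel statistics."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def max_pool(x, size=2):
    n = x.shape[0] // size
    return x[: n * size].reshape(n, size, -1).max(axis=1)


def dropout(x, rate, training):
    if not training:
        return x  # dropout is only active during training
    return x * (rng.random(x.shape) >= rate) / (1.0 - rate)


def conv_params(kernel, c_in, c_out):
    return rng.normal(0.0, np.sqrt(2.0 / (kernel * c_in)), (kernel, c_in, c_out)), np.zeros(c_out)


# Hypothetical layer sizes; the paper does not report filter counts or kernel sizes.
convs = [conv_params(7, 1, 16), conv_params(5, 16, 16), conv_params(5, 16, 32), conv_params(3, 32, 32)]
spectrum = rng.random((401, 1))                       # one preprocessed spectrum, 1 channel

h = leaky_relu(conv1d(spectrum, *convs[0]))           # (395, 16)
h = leaky_relu(conv1d(h, *convs[1]))                  # (391, 16)
h = batch_norm(h, 1.0, 0.0, mean=np.zeros(16), var=np.ones(16))
h = max_pool(h)                                       # (195, 16)
h = leaky_relu(conv1d(h, *convs[2]))                  # (191, 32)
h = leaky_relu(conv1d(h, *convs[3]))                  # (189, 32)
h = max_pool(h)                                       # (94, 32)
flat = h.reshape(-1)                                  # flatten -> (3008,)
W_fc = rng.normal(0.0, np.sqrt(2.0 / flat.size), (flat.size, 64))
hidden = dropout(leaky_relu(flat @ W_fc), rate=0.3, training=False)
W_out = rng.normal(0.0, np.sqrt(1.0 / 64), (64, 1))
prediction = (hidden @ W_out).item()                  # untrained estimate of camellia oil content
```

The shape comments show how pooling halves the spectral axis while the convolutions widen the channel axis before flattening.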
Figure 1 (a) illustrates the structure of the 1D-CNN model. The Raman spectra in the range 1000–1800 cm⁻¹ are fed into the model through the input layer. The 1D convolutional layers extract features from the input data, while the batch normalization layer improves model stability and accelerates convergence. The max pooling layers downsample the features extracted by the convolutional layers, reducing the dimensionality of their outputs. The flatten layer converts the multidimensional feature maps into a one-dimensional vector, and the fully connected layer connects every element of this vector to each of its neurons. The dropout layer is used to prevent overfitting. The model uses mean squared error (MSE) as the loss function and the Nadam optimizer with a learning rate of 0.0001.

2.7 ConvNeXt-ECA

In deep learning, and especially in deep convolutional neural networks, attention mechanisms have become a key technique for improving model performance. Among them, the Efficient Channel Attention (ECA) module is a channel attention mechanism that captures inter-channel dependencies through a one-dimensional convolution [27]. Unlike channel attention modules such as squeeze-and-excitation, the ECA module avoids dimensionality reduction and re-expansion, resulting in a more efficient and lightweight design. In this study, a ConvNeXt-ECA model was developed in which an ECA module is added after each convolutional layer to achieve local cross-channel interaction on the spectral features. After global average pooling and flattening, the model outputs the predicted camellia oil content. The structure of this model is shown in Fig. 1 (b).

2.8 CNN-GRU-MHA

A Raman spectrum is an ordered sequence of intensities along the Raman shift axis and can therefore be treated as sequence data, much like a time series.
Recurrent neural networks (RNNs) are widely used for time series processing but often suffer from vanishing and exploding gradients on long sequences. Gated recurrent units (GRUs) use gating mechanisms to handle long sequences effectively at modest computational cost [28]. In addition, 1D-CNNs are comparable to RNNs on some sequence tasks but are more computationally efficient. A 1D-CNN can therefore serve as a front end that extracts features from the spectra and shortens the input sequence before it is fed into the GRU layer. To further enhance performance, this study adds a multi-head attention (MHA) mechanism after the GRU layer, which lets the model focus on key regions of the Raman spectra and capture relationships between different parts of the data. On this basis, the CNN-GRU-MHA model shown in Fig. 1 (c) was developed. A convolutional layer, a pooling layer and a reshaping layer extract features and reshape the data before it enters the GRU layer, which has 128 output dimensions. The GRU output is then passed to the MHA mechanism, followed by layer normalization. The test results show that this model combines the efficient feature extraction of the 1D-CNN, the sequential feature processing of the GRU and the global feature capture of MHA, giving excellent predictive performance.
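The GRU, multi-head attention and layer normalization stage of CNN-GRU-MHA can be illustrated with a NumPy forward pass. This is a sketch under stated assumptions, not the authors' implementation: the input is a hypothetical feature sequence standing in for the reshaped CNN output, all weights are random, the number of heads (4) and the residual connection before layer normalization are assumptions, and only the GRU width of 128 comes from the text. The GRU step uses the standard update gate, reset gate and candidate state.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def gru(seq, units):
    """One GRU layer over seq (T, d_in); returns all hidden states (T, units)."""
    W = rng.normal(0.0, 0.1, (3, seq.shape[1], units))  # input weights: update, reset, candidate
    U = rng.normal(0.0, 0.1, (3, units, units))         # recurrent weights
    b = np.zeros((3, units))
    h, states = np.zeros(units), []
    for x in seq:
        z = sigmoid(x @ W[0] + h @ U[0] + b[0])             # update gate
        r = sigmoid(x @ W[1] + h @ U[1] + b[1])             # reset gate
        h_cand = np.tanh(x @ W[2] + (r * h) @ U[2] + b[2])  # candidate state
        h = (1.0 - z) * h + z * h_cand                      # blend previous and candidate state
        states.append(h)
    return np.stack(states)


def multi_head_self_attention(H, n_heads):
    """Scaled dot-product self-attention split across n_heads heads."""
    T, d = H.shape
    dh = d // n_heads
    Wq, Wk, Wv, Wo = (rng.normal(0.0, d ** -0.5, (d, d)) for _ in range(4))

    def split(M):
        return M.reshape(T, n_heads, dh).transpose(1, 0, 2)  # (heads, T, dh)

    Q, K, V = split(H @ Wq), split(H @ Wk), split(H @ Wv)
    weights = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, T, T), rows sum to 1
    context = (weights @ V).transpose(1, 0, 2).reshape(T, d)   # concatenate the heads
    return context @ Wo, weights


def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)


features = rng.random((48, 32))                 # hypothetical reshaped CNN output: 48 steps x 32 channels
H = gru(features, units=128)                    # 128 output dimensions, as in the text
attn_out, attn_weights = multi_head_self_attention(H, n_heads=4)
Y = layer_norm(H + attn_out)                    # residual + layer normalization (residual assumed)
```

Each attention head produces a full 48 × 48 weight matrix, which is what allows distant regions of the spectrum to influence one another directly rather than only through the recurrent state.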
2.9 Evaluation of the quantitative analysis model

In quantitative analysis, the coefficient of determination (R²) and the root mean square error (RMSE) are commonly used to evaluate the performance of a fitted regression model [29]. RMSE and R² are calculated as in Eqs. (1) and (2):

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2} \tag{1}$$

$$R^2=1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2} \tag{2}$$

where n is the number of samples, \(y_i\) and \(\hat{y}_i\) are the actual and predicted camellia oil concentrations of the i-th sample, and \(\bar{y}\) is the mean of the actual concentrations. In general, a lower RMSE and an R² closer to 1 indicate better predictive performance. RMSEC and R²c denote these metrics on the calibration (training) set, and RMSEP and R²p on the prediction (test) set.

3. Results and discussion

3.1 Raman spectroscopy analysis

The pretreated Raman spectra of the three pure oils (rapeseed oil, camellia oil and corn oil) are shown in Fig. 2. The characteristic peak near 1062 cm⁻¹ mainly corresponds to the C-C stretching vibration of fatty acids. The peak near 1255 cm⁻¹ reflects the =C-H bending vibration of unsaturated olefinic bonds. The peak near 1288 cm⁻¹ is related to the C-H bending vibration of methylene groups. The peak near 1427 cm⁻¹ corresponds to the -CH₂ scissoring vibration. The peak near 1641 cm⁻¹ represents the cis C=C stretching vibration of olefinic RHC=CHR, and the peak near 1732 cm⁻¹ corresponds to the C=O stretching vibration of the ester carbonyl group [30]. As Fig. 2 shows, the spectral shapes and characteristic peak positions of the three vegetable oils are approximately the same; they differ mainly in the intensities of these peaks.
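Because the three oils differ mainly in band intensities rather than band positions, a simple way to compare them is to read off the intensity near each assigned band. The helper below is a hypothetical illustration (the function name, the ±5 cm⁻¹ search window, the 2 cm⁻¹ grid and the synthetic spectrum are assumptions); it takes the maximum of a preprocessed spectrum within a small window around each nominal Raman shift listed above.

```python
import numpy as np

# Nominal band positions (cm^-1) and assignments from Section 3.1
BANDS = {
    1062: "C-C stretching",
    1255: "=C-H bending",
    1288: "CH2 bending",
    1427: "CH2 scissoring",
    1641: "cis C=C stretching",
    1732: "C=O stretching (ester)",
}


def band_intensities(shift, spectrum, half_window=5.0):
    """Return the maximum intensity within +/- half_window cm^-1 of each nominal band."""
    out = {}
    for center in BANDS:
        mask = np.abs(shift - center) <= half_window
        out[center] = float(spectrum[mask].max())
    return out


shift = np.linspace(1000, 1800, 401)  # assumed 2 cm^-1 grid
spectrum = 0.8 * np.exp(-((shift - 1641) / 8) ** 2) + 0.3 * np.exp(-((shift - 1427) / 8) ** 2)
peaks = band_intensities(shift, spectrum)
```

Comparing such per-band values across oils, or across blends with increasing camellia oil content, is a quick check on the intensity trends discussed in this section before any multivariate model is built.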
Raman spectra reflect the vibrations of molecular bonds, and the bonds responsible for these peaks are present in the unsaturated fatty acids of vegetable oils. The positions and intensities of the characteristic Raman peaks are therefore related to the types and proportions of unsaturated fatty acids in the oils. It can be inferred that the three vegetable oils have similar fatty acid compositions, but differences in the content of certain fatty acids lead to differences in the intensities of the same Raman peaks [31]. Figure 3 shows the Raman spectra of binary camellia-corn oil blends at different camellia oil concentrations. As the camellia oil content increases from 0% to 30%, the intensities of the characteristic peaks around 1288 cm⁻¹ and 1641 cm⁻¹, attributed to oleic acid in camellia oil, increase accordingly. This provides a useful basis for identifying camellia oil in vegetable oil blends. Figure 4 shows the Raman spectra of binary rapeseed-corn oil blends. As Fig. 4 shows, the intensity of the characteristic peak at 1515 cm⁻¹, which reflects the β-carotene content [30], increases with the rapeseed oil content. The β-carotene content of rapeseed oil was found to be higher than that of corn oil, so blends with more rapeseed oil contain more β-carotene and show a stronger peak at this Raman shift. However, the Raman spectra alone could not accurately distinguish rapeseed oil from corn oil, or blends containing different proportions of the two. PCA and UMAP were therefore used to visualize and analyze the three pure oils.
PCA was applied to the Raman spectra of the three pure oils; the variance contributions of principal components 1, 2 and 3 were 92.71%, 3.26% and 0.57%, respectively. The first two principal components together account for more than 95% of the variance and the first three for more than 96%, indicating that PCA captured most of the information in the vegetable oil data. The first three principal components were used to visualize the spectral data in two and three dimensions, as shown in Fig. 5 (a) and Fig. 5 (b). However, the three pure oils could not be clearly distinguished. To address this, the nonlinear UMAP algorithm was used, which separated the three pure oils well in both two and three dimensions, as shown in Fig. 5 (c) and Fig. 5 (d). Both the PCA and UMAP plots show some spread among spectra of the same oil. Although corn oil and rapeseed oil are similar in composition and Raman spectra, in the UMAP embedding the differences between the two oils are considerably larger than the differences within each oil, and rapeseed oil shows the largest within-oil spread. Visualization thus reveals the similarities and differences between the pure oils, which helps in understanding their behavior in blended oils and in building accurate and reliable quantitative models. For this reason, the three pure oils were visualized before the blended oils were analyzed quantitatively.
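The variance contributions quoted above are PCA explained-variance ratios. A minimal NumPy sketch of how these ratios, and the scores used for the 2-D and 3-D plots, follow from a singular value decomposition of a mean-centred spectra matrix (rows are spectra, columns are Raman shifts) is shown below. The three synthetic "oils" are purely illustrative and do not reproduce the reported percentages; UMAP is not reimplemented here.

```python
import numpy as np


def pca(X, n_components=3):
    """PCA via SVD of the mean-centred data matrix."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)                  # fraction of total variance per component
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates for 2-D/3-D plots
    return scores, explained[:n_components], Vt[:n_components]


rng = np.random.default_rng(1)
shift = np.linspace(1000, 1800, 401)
band = np.exp(-((shift - 1641) / 10) ** 2)
base = band + np.exp(-((shift - 1440) / 12) ** 2)
# Three hypothetical oils that differ only in the relative intensity of the 1641 cm^-1 band
oils = [base + a * band for a in (0.0, 0.2, 0.4)]
X = np.vstack([o + 0.005 * rng.standard_normal((30, shift.size)) for o in oils])

scores, ratio, loadings = pca(X)
print("explained (%):", np.round(100 * ratio, 2), "cumulative (%):", np.round(100 * ratio.cumsum(), 2))
```

When one intensity difference dominates, as here, PC1 carries most of the variance; the reported 92.71% for PC1 is consistent with the observation that the oils differ mainly in peak intensity.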
3.2 Quantitative analysis models of ELM and its combination with variable selection

In this study, ELM, GA-ELM, UVE-ELM and WOA-ELM models were applied to the 408 averaged Raman spectra to quantify the camellia oil concentration in vegetable blended oils; the results are summarized in Table 1. Four spectra from each concentration were randomly selected for the training set and the remaining two for the test set, giving 272 training spectra and 136 test spectra. GA is an optimization method based on the theory of biological evolution that simulates competitive selection among biological species [32]. After 100 iterations, the GA selected 76 spectral variables (Fig. 6 (a)). As shown in Fig. 6 (b), the selected variables are mainly located between the individual Raman peaks. An ELM model built from these 76 variables gave R²p and RMSEP values of 0.9463 and 1.8679, respectively.

Table 1 Different variable selection algorithms combined with four ELM models.

Quantitative model   PC   Variables   RMSEC (training)   RMSEP (test)   R²c (training)   R²p (test)
ELM                  10   400         2.5861             2.7425         0.9054           0.8896
GA-ELM               7    76          1.7926             1.8679         0.9531           0.9463
UVE-ELM              9    49          0.9865             1.0592         0.9754           0.9718
WOA-ELM              4    21          1.4551             1.8034         0.9647           0.9572

UVE (uninformative variable elimination) removes variables that carry no more information than random noise [33]. It eliminates uninformative variables by computing a stability (reliability) metric for each variable and comparing it with a threshold; in total, 49 spectral variables were selected (Fig. 6 (c)). The selected variables were mainly concentrated at the vibrational peaks of the camellia oil Raman spectrum, as shown in Fig. 6 (d).
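The UVE procedure described above can be sketched as follows, using the original UVE formulation: random variables of negligible magnitude are appended to the spectra, PLS regression coefficients are computed in leave-one-out fashion, each variable's reliability is the mean of its coefficients divided by their standard deviation, and real variables whose |reliability| does not exceed the largest value among the artificial noise variables are discarded. The paper does not report its UVE settings, so the NIPALS PLS1 routine, the number of latent variables, the 1e-10 noise scale and the function names are assumptions.

```python
import numpy as np


def pls1_coefficients(X, y, n_comp):
    """Regression coefficients of a PLS1 model (NIPALS) fitted to mean-centred data."""
    Xr = X - X.mean(axis=0)
    yr = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        c = (yr @ t) / tt
        Xr = Xr - np.outer(t, p)  # deflate X
        yr = yr - c * t           # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)


def uve_select(X, y, n_comp=3, noise_scale=1e-10, seed=0):
    """Return indices of the real variables kept by uninformative variable elimination."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    X_aug = np.hstack([X, noise_scale * rng.random((n, m))])  # append m artificial noise variables
    coefs = np.array([
        pls1_coefficients(np.delete(X_aug, i, axis=0), np.delete(y, i), n_comp)
        for i in range(n)                                     # leave-one-out PLS models
    ])
    reliability = coefs.mean(axis=0) / coefs.std(axis=0)
    cutoff = np.abs(reliability[m:]).max()                    # largest |reliability| among noise variables
    return np.flatnonzero(np.abs(reliability[:m]) > cutoff)


# Synthetic check: only the first two of 20 variables carry information about y
rng = np.random.default_rng(42)
X = rng.standard_normal((100, 20))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.standard_normal(100)
selected = uve_select(X, y, n_comp=2)
```

Because the cutoff is taken from variables that are known to be pure noise, the threshold adapts to the data instead of being fixed by hand.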
The UVE-ELM model gave R²p and RMSEP values of 0.9718 and 1.0592, respectively, an improvement in predictive ability over GA-ELM. The whale optimization algorithm is a nature-inspired metaheuristic that searches for the optimal solution by mimicking the feeding behavior of humpback whales in the ocean [34]. As shown in Fig. 6 (e), after the population size, shrinking encircling coefficient, spiral update parameter and number of iterations were set, WOA selected 21 spectral variables. The resulting model gave R²p and RMSEP values of 0.9572 and 1.8034 (Fig. 6 (f)). Figure 7 shows the calibration and prediction results of the four quantitative models. Compared with the full-spectrum ELM model, GA variable selection increased R²p by 0.0567, because GA effectively reduces the dimensionality of the spectral data and thereby simplifies model construction [32]. UVE variable selection increased R²p by a further 0.0255 relative to GA-ELM, because reducing the number of variables removes collinear or irrelevant information, which in turn improves model performance [35]; moreover, most of the UVE-selected variables lie near the main molecular vibration Raman peaks of camellia oil. In contrast, the ELM model built on the WOA-selected variables had an R²p 0.0146 lower than that of UVE-ELM, because removing more variables also discarded valuable information, which prevented the model from reaching higher prediction accuracy. Although the ELM models can predict the camellia oil concentration, their predictions are not accurate enough. Therefore, three deep learning models were designed to predict the camellia oil content of blended oils more accurately.
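For reference, an ELM regressor of the kind described in Section 2.5, together with the RMSE and R² of Eqs. (1) and (2), can be sketched in a few lines of NumPy. The sigmoid activation, the number of hidden nodes and the Moore-Penrose pseudoinverse for the output weights are textbook ELM choices assumed here, because the paper does not list its ELM settings; the data are synthetic. In GA-ELM, UVE-ELM and WOA-ELM the same model would simply be trained on the selected columns only.

```python
import numpy as np


class ELMRegressor:
    """Single-hidden-layer ELM: random fixed hidden layer, least-squares output weights."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))  # sigmoid activations

    def fit(self, X, y):
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))  # random input weights (fixed)
        self.b = self.rng.standard_normal(self.n_hidden)                 # random biases (fixed)
        self.beta = np.linalg.pinv(self._hidden(X)) @ y                  # analytic output weights
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))  # Eq. (1)


def r2(y, y_hat):
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2))  # Eq. (2)


rng = np.random.default_rng(0)
X = rng.random((272, 21))             # e.g. 272 training spectra x 21 selected variables (synthetic)
y = 30 * X[:, :5].mean(axis=1)        # synthetic "camellia oil content" in % (0-30)
model = ELMRegressor(n_hidden=100).fit(X, y)
y_fit = model.predict(X)
print("RMSEC:", rmse(y, y_fit), "R2c:", r2(y, y_fit))
```

Only `beta` is learned, which is why ELM trains in a single linear solve; the price is that prediction quality depends on the random hidden layer and on which input variables are supplied.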
3.3 Deep learning quantitative analysis models

In this study, three one-dimensional deep learning models, 1D-CNN, ConvNeXt-ECA and CNN-GRU-MHA, were constructed to quantify camellia oil in blended vegetable oils using 2040 spectra. To evaluate the generalization ability of the models, 60% of the data were randomly assigned to the training set, 20% to the validation set and the remaining 20% to the test set. The training set was used to fit the model parameters, the validation set to tune the hyperparameters and optimize the network structure, and the test set to evaluate generalization. All models used the MSE loss function and the Leaky ReLU activation function; ConvNeXt-ECA and CNN-GRU-MHA used the Adam optimizer with a learning rate of 0.0001, and the 1D-CNN used the Nadam optimizer. Table 2 presents the average results of 10 runs of each model, and Fig. 8 shows the prediction results of the three deep learning models.

Table 2 Prediction results of three deep learning models, 1D-CNN, ConvNeXt-ECA, and CNN-GRU-MHA.

Quantitative model   RMSEC (training)   RMSEP (test)   R²c (training)   R²p (test)
1D-CNN               0.5430             0.5845         0.9938           0.9912
ConvNeXt-ECA         0.4632             0.5217         0.9964           0.9957
CNN-GRU-MHA          0.3326             0.3714         0.9986           0.9981

All three models showed excellent predictive ability, with R²p above 0.99 for 1D-CNN, above 0.995 for ConvNeXt-ECA and above 0.998 for CNN-GRU-MHA. Although the 1D-CNN is less accurate than the other two models in predicting camellia oil concentration, it still outperforms the traditional machine learning models, as shown in Fig. 9. The ConvNeXt-ECA model extends the convolutional neural network architecture with the ECA attention mechanism, which focuses on the important features of the Raman spectra and thereby enhances feature extraction.
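The ECA block that ConvNeXt-ECA inserts after each convolutional layer (Section 2.7) reduces to three steps on a (length, channels) feature map: global average pooling over the spectral axis, a 1-D convolution across the resulting channel descriptor with a small odd kernel size k, and a sigmoid gate that rescales each channel. The NumPy sketch below shows only this forward pass, with random values standing in for the learned kernel. The kernel-size rule (log2(C) + b) / γ rounded to an odd integer with γ = 2 and b = 1 is the default from the original ECA design; the function names and shapes are illustrative assumptions.

```python
import numpy as np


def eca_kernel_size(channels, gamma=2, b=1):
    t = int(abs((np.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1  # force an odd kernel size


def eca(feature_map, kernel=None, rng=None):
    """Efficient Channel Attention on a (length, channels) feature map."""
    _, channels = feature_map.shape
    k = eca_kernel_size(channels)
    if kernel is None:
        kernel = np.random.default_rng(rng).normal(0.0, 0.5, k)  # stands in for learned weights
    descriptor = feature_map.mean(axis=0)                        # global average pooling -> (channels,)
    padded = np.pad(descriptor, k // 2)                          # zero 'same' padding on the channel axis
    mixed = np.array([padded[c:c + k] @ kernel for c in range(channels)])  # local cross-channel interaction
    gate = 1.0 / (1.0 + np.exp(-mixed))                          # sigmoid channel weights in (0, 1)
    return feature_map * gate, gate


x = np.random.default_rng(0).random((189, 32))  # hypothetical 189-point, 32-channel feature map
y, weights = eca(x, rng=1)
```

Because each channel weight depends only on its k neighbours in the channel descriptor, the block adds just k parameters per layer and involves no dimensionality reduction.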
CNN-GRU-MHA showed the best prediction performance, probably because it not only extracts features efficiently through its convolutional and pooling layers but also models the spectral sequence more accurately through the sequential processing capability of the GRU. In addition, the multi-head attention mechanism enables the model to capture long-range dependencies and complex correlations within the data. By contrast, the 1D-CNN and ConvNeXt-ECA models do not explicitly model sequential dependencies and therefore cannot fully capture the sequential information in Raman spectra. CNN-GRU-MHA therefore outperforms the other two models.

4. Conclusion

In this study, three one-dimensional deep learning models were designed for the quantitative analysis of camellia oil in three-component blends. All three deep learning models showed better prediction performance than the traditional machine learning models, and the improved models successfully quantified the camellia oil concentration in blended vegetable oils. Among the three, CNN-GRU-MHA performed best, with R²p above 0.998 and RMSEP below 0.372. It is worth noting, however, that not all commercial blended oils are mixtures of three vegetable oils. Nevertheless, this study presents an approach that combines attention mechanisms with deep learning models to quantify high-value oils in blended oils, and demonstrates the potential of such models for vegetable oil identification.

Declarations

CRediT authorship contribution statement

Yuanbo Huang: Investigation, Conceptualization, Methodology, Formal analysis, Writing – original draft. Hua Zhao: Data curation, Formal analysis.
Qin Luo: Formal analysis. Yulin Xu: Formal analysis. Shi Yin: Formal analysis. Yongjun Hu: Funding acquisition, Project administration, Supervision, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgments

This work was financially supported by National Natural Science Foundation of China (NSFC) (Grant nos. 22473045 and 22073031), the Natural Science Foundation of Guangdong Province (Grant No. 2021A15150123).

References

[1] X. Wu, X. Zhang, Z. Du, D. Yang, B. Xu, R. Ma, H. Luo, H. Liu, Y. Zhang, Raman spectroscopy combined with multiple one-dimensional deep learning models for simultaneous quantification of multiple components in blended olive oil, Food Chem., 431 (2024) 137109.
[2] V. Ramya, K.P. Shyam, B. Kadalmani, Determination of Mono-Oil Proportion in Blended Edible Vegetable Oil (BEVO) with Identical Fatty Acid Profile: a Case Study on Coconut-Palm Kernel Oil Discrimination, Food Analytical Methods, 15 (2022) 1407-1417.
[3] J. Zhu, Y. Rong, X. Jiang, H. Qian, X. Yu, Q. Chen, Raman spectroscopy coupled with metaheuristics-based variable selection models: A method for rapid determination of extra virgin olive oil content in vegetable blend oils, J. Food Compos. Anal., 123 (2023) 105503.
[4] L. Gao, L. Jin, Q. Liu, K. Zhao, L. Lin, J. Zheng, C. Li, B. Chen, Y. Shen, Recent advances in the extraction, composition analysis and bioactivity of Camellia (Camellia oleifera Abel.) oil, Trends Food Sci. Technol., 143 (2024) 104211.
[5] Y. Shang, L. Bao, H. Bi, S. Guan, J. Xu, Y. Gu, C. Zhao, Authenticity Discrimination and Adulteration Level Detection of Camellia Seed Oil via Hyperspectral Imaging Technology, Food Analytical Methods, 17 (2024) 450-463.
[6] W. Mu, Y. Zhao, Z. Wang, Y. He, C. Yang, J. Wang, Combining lipase enzymatic techniques and antioxidants on the flavor of structured lipids (SLs) prepared from goat butter and coconut oil, Food Bioscience, 60 (2024) 104332.
[7] G. Zeng, Z. Wang, Y. Hou, B. Ding, L. Wang, W. Chen, J. Li, J. Xie, Identification of Soybean Origin via TAGs Profile Analysis Using MALDI-TOF/MS, Food Analytical Methods, 17 (2024) 766-772.
[8] H.G.T.H. Jayatunga, H.D. Weerathunge, H.P.P.S. Somasiri, K.R.R. Mahanama, Use of Process-Based Marker Compounds to Identify Different Coconut Oils, Food Analytical Methods, 17 (2024) 96-104.
[9] C. Gong, X.-T. Lu, S.-D. Zhang, K. Xiao, X. Xu, Detection of lard adulteration in 3 kinds of vegetable oils by liquid chromatography–mass spectrometry with porous graphite carbon column, Anal. Sci., 40 (2024) 1289-1299.
[10] Y.-H. Liu, P.-P. Wu, Q. Liu, H.-D. Luo, S.-H. Cao, G.-C. Lin, D.-S. Fu, X.-D. Zhong, Y.-Q. Li, A Simple Fluorescence Spectroscopic Approach for Simultaneous and Rapid Detection of Four Polycyclic Aromatic Hydrocarbons (PAH4) in Vegetable Oils, Food Analytical Methods, 9 (2016) 3209-3217.
[11] K. Wójcicki, I. Khmelinskii, M. Sikorski, F. Caponio, V.M. Paradiso, C. Summo, A. Pasqualone, E. Sikorska, Spectroscopic techniques and chemometrics in analysis of blends of extra virgin with refined and mild deodorized olive oils, Eur. J. Lipid Sci. Technol., 117 (2015) 92-102.
[12] Y. Liu, L. Yao, Z. Xia, Y. Gao, Z. Gong, Geographical discrimination and adulteration analysis for edible oils using two-dimensional correlation spectroscopy and convolutional neural networks (CNNs), Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 246 (2021) 118973.
[13] J. Qiu, H.-Y. Hou, N.T. Huyen, I.-S. Yang, X.-B. Chen, Raman Spectroscopy and 2DCOS Analysis of Unsaturated Fatty Acid in Edible Vegetable Oils, Applied Sciences, 9 (2019) 2807.
[14] J. Kuang, N. Luo, Z. Hao, J. Xu, X. He, J. Shi, NI-Raman spectroscopy combined with BP-Adaboost neural network for adulteration detection of soybean oil in camellia oil, Journal of Food Measurement and Characterization, 16 (2022) 3208-3215.
[15] C. Lörchner, C. Fauhl-Hassek, M.A. Glomb, V. Baeten, J.A. Fernández Pierna, S. Esslinger, Comparison of Spectroscopic Techniques Using the Adulteration of Pumpkin Seed Oil as Example, Food Analytical Methods, 17 (2024) 332-347.
[16] G. Jiménez-Hernández, F. Ortega-Gavilán, M.G. Bagur-González, A. González-Casado, Discrimination/Classification of Edible Vegetable Oils from Raman Spatially Solved Fingerprints Obtained on Portable Instrumentation, Foods, 13 (2024) 183.
[17] Z. Lu, H. Yu, Y. Yin, Y. Yuan, H. Liang, F. Li, Z. Li, Determination of the Acid and Peroxide Values of Vegetable Oils by Raman Spectroscopy with Competitive Adaptive Reweighted Sampling (CARS) and Back Propagation Neural Network (BPNN), Anal. Lett., 57 2289-2306.
[18] Y.-K. Li, W.-C. Jiao, B.-W. Han, M. Jia, D.-M. Wang, H.-M. Liu, L.-X. Hou, Detection of counterfeit sesame oil based on Raman spectroscopy and chemometric analysis, LWT, 185 (2023) 115131.
[19] S.Y.-S.S. Adade, H. Lin, S.A. Haruna, N.A.N. Johnson, A.O. Barimah, Z. Afang, Z. Chen, J.-N. Ekumah, W. Fuyun, H. Li, Q. Chen, Multicomponent prediction of Sudan dye adulteration in crude palm oil using SERS – Based bimetallic nanoflower combined with genetic algorithm, J. Food Compos. Anal., 125 (2024) 105768.
[20] M. Wu, M. Li, B. Fan, Y. Sun, L. Tong, F. Wang, L. Li, A rapid and low-cost method for detection of nine kinds of vegetable oil adulteration based on 3-D fluorescence spectroscopy, LWT, 188 (2023) 115419.
[21] A.-Q. Chen, H.-L. Wu, T. Wang, X.-Z. Wang, H.-B. Sun, R.-Q. Yu, Intelligent analysis of excitation-emission matrix fluorescence fingerprint to identify and quantify adulteration in camellia oil based on machine learning, Talanta, 251 (2023) 123733.
[22] X. Xin, X. Tian, C. Chen, C. Chen, K. Li, X. Ma, L. Zhao, X. Lv, A method for accurate identification of Uyghur medicinal components based on Raman spectroscopy and multi-label deep learning, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 315 (2024) 124251.
[23] D. Cao, F. Shi, J. Sheng, J. Zhu, H. Yin, S. Qin, J. Yao, L. Zhu, J. Lu, X. Wang, Machine learning–driven SERS analysis platform for rapid and accurate detection of precancerous lesions of gastric cancer, Microchim. Acta, 191 (2024) 415.
[24] M.N. Mohamad Asri, R. Verma, N.A. Mahat, N.A.M. Nor, W.N.S. Mat Desa, D. Ismail, Discrimination and source correspondence of black gel inks using Raman spectroscopy and chemometric analysis with UMAP and PLS-DA, Chemometrics Intellig. Lab. Syst., 225 (2022) 104557.
[25] H. Li, M. Mehedi Hassan, J. Wang, W. Wei, M. Zou, Q. Ouyang, Q. Chen, Investigation of nonlinear relationship of surface enhanced Raman scattering signal for robust prediction of thiabendazole in apple, Food Chem., 339 (2021) 127843.
[26] Z. Zhang, H. Li, L. Huang, H. Wang, H. Niu, Z. Yang, M. Wang, Rapid identification and quantitative analysis of malachite green in fish via SERS and 1D convolutional neural network, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 320 (2024) 124655.
[27] Z. Kang, J. Liu, C. Ma, C. Chen, X. Lv, C. Chen, Early screening of cervical cancer based on tissue Raman spectroscopy combined with deep learning algorithms, Photodiagnosis and Photodynamic Therapy, 42 (2023) 103557.
[28] Z. Chen, X. Dong, C. Liu, S. Wang, S. Dong, Q. Huang, Rapid detection of residual chlorpyrifos and pyrimethanil on fruit surface by surface-enhanced Raman spectroscopy integrated with deep learning approach, Scientific Reports, 13 (2023) 19855.
[29] Y. Luo, W. Su, M.F. Rabbi, Q. Wan, D. Xu, Z. Wang, S. Liu, X. Xu, J. Wu, Quantitative analysis of microplastics in water environments based on Raman spectroscopy and convolutional neural network, Sci. Total Environ., 926 (2024) 171925.
[30] M. Saleem, N. Ahmad, R. Ullah, Z. Ali, S. Mahmood, H.
Ali, Raman Spectroscopy–Based Characterization of Canola Oil, Food Analytical Methods, 13 (2020) 1292-1303. F. Huang, Y. Li, H. Guo, J. Xu, Z. Chen, J. Zhang, Y. Wang, Identification of waste cooking oil and vegetable oil via Raman spectroscopy, J. Raman Spectrosc., 47 (2016) 860-864. B. Bilgin, C. Yanik, H. Torun, M.C. Onbasli, Genetic Algorithm-Driven Surface-Enhanced Raman Spectroscopy Substrate Optimization, Nanomaterials, 11 (2021) 2905. H. Li, W. Sheng, M.M. Hassan, W. Geng, Q. Chen, Quantification of antibiotics in food by octahedral gold-silver nanocages-based SERS sensor coupling multivariate calibration, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 320 (2024) 124595. X. Bian, R. Zhang, P. Liu, Y. Xiang, S. Wang, X. Tan, Near infrared spectroscopic variable selection by a novel swarm intelligence algorithm for rapid quantification of high order edible blend oil, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 284 (2023) 121788. H. Pan, W. Ahmad, T. Jiao, A. Zhu, Q. Ouyang, Q. Chen, Label-free Au NRs-based SERS coupled with chemometrics for rapid quantitative detection of thiabendazole residues in citrus, Food Chem., 375 (2022) 131681. Additional Declarations No competing interests reported. Editorial decision: Revision requested 07 Sep, 2025 Reviews received at journal 06 Sep, 2025 Reviews received at journal 04 Sep, 2025 Reviewers agreed at journal 20 Aug, 2025 Reviewers agreed at journal 18 Aug, 2025 Reviewers invited by journal 13 May, 2025 Editor assigned by journal 06 May, 2025 Submission checks completed at journal 06 May, 2025 First submitted to journal 05 May, 2025 
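Steps 2 to 5 of the preprocessing chain in Section 2.3 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code:

- The function names are hypothetical.
- The ALS parameters (`lam`, `p`, `n_iter`) are illustrative, because the paper does not report them.
- A dense linear solver is used for brevity.
- Savitzky-Golay smoothing is left out; SciPy provides it as `scipy.signal.savgol_filter`.
- The Poisson-noise and frequency-domain augmentation is not reproduced.

```python
import numpy as np


def snv(spectrum):
    """Standard normal variate: centre a spectrum and divide by its standard deviation."""
    return (spectrum - spectrum.mean()) / spectrum.std()


def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline estimate (dense solver, fine for ~1000 points).

    lam sets the smoothness and p the asymmetry; both defaults are illustrative assumptions.
    """
    n = len(y)
    d = np.diff(np.eye(n), n=2, axis=0)      # (n - 2, n) second-difference operator
    penalty = lam * d.T @ d
    w = np.ones(n)
    for _ in range(n_iter):
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the current baseline (peaks) get weight p, points below get 1 - p.
        w = np.where(y > z, p, 1.0 - p)
    return z


def min_max(spectrum):
    """Scale a spectrum linearly to the range [0, 1]."""
    return (spectrum - spectrum.min()) / (spectrum.max() - spectrum.min())


def preprocess(spectrum):
    """SNV, then ALS baseline subtraction, then min-max scaling (SG smoothing omitted)."""
    x = snv(np.asarray(spectrum, dtype=float))
    x = x - als_baseline(x)
    return min_max(x)


def grouped_averages(spectra, group_size=5, seed=0):
    """Shuffle the replicate spectra of one concentration and average disjoint groups."""
    spectra = np.asarray(spectra, dtype=float)
    if len(spectra) % group_size:
        raise ValueError("number of spectra must be divisible by group_size")
    order = np.random.default_rng(seed).permutation(len(spectra))
    groups = spectra[order].reshape(-1, group_size, spectra.shape[1])
    return groups.mean(axis=1)
```

With 30 replicate spectra per concentration and groups of five, `grouped_averages` returns six averaged spectra per concentration. Over the 68 concentrations this gives the 408 averaged spectra reported, and a fivefold augmentation then yields 2040.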
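The PCA projection used for visualization in Section 2.4 can be illustrated with a short NumPy sketch. The function name is hypothetical and the number of retained components is an assumption. Scores are taken from the singular value decomposition of the mean-centred data matrix, with one row per spectrum and one column per Raman shift.

```python
import numpy as np


def pca_scores(X, n_components=3):
    """Principal component scores and explained-variance ratios via SVD."""
    Xc = X - X.mean(axis=0)                    # centre each Raman shift variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)            # fraction of total variance per component
    return scores, explained[:n_components]
```

Plotting the first two or three score columns gives views of the kind shown in Fig. 5(a) and (b). UMAP requires a dedicated library and is not sketched here.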
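The ELM training rule in Section 2.5 reduces to a fixed random projection followed by a single least-squares solve. The NumPy sketch below makes assumptions the paper does not report: a sigmoid activation, standard-normal random input weights, and an arbitrary hidden-layer size. The class name is hypothetical. For the GA-ELM, UVE-ELM, and WOA-ELM variants, the input matrix would simply be restricted to the selected spectral variables before fitting.

```python
import numpy as np


class ELMRegressor:
    """Hypothetical minimal extreme learning machine for regression."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of a fixed random affine projection of the inputs.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # Input weights and hidden biases are drawn once and never trained.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # Output weights: least-squares solution via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

Because only `beta` is solved for, training costs one pseudoinverse. This is the source of the speed advantage that Section 2.5 attributes to ELM.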
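The ECA operation in Section 2.7 has four steps: global average pooling, a 1D convolution across channels, a sigmoid gate, and channel-wise rescaling. It can be illustrated with a NumPy forward pass. The sketch makes these assumptions:

- The feature map has a (channels, length) layout.
- The kernel size follows the adaptive rule with γ = 2 and b = 1 from the original ECA-Net paper; the paper here does not state the kernel size it used.
- The function names are hypothetical.
- In a trained network the kernel weights would be learned rather than supplied.

```python
import math

import numpy as np


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive kernel size: |(log2(C) + b) / gamma| rounded to an odd integer."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca_forward(features, weights):
    """Efficient channel attention on a (channels, length) feature map.

    features: output of a 1D convolutional layer, shape (C, L).
    weights:  kernel of odd length k, shared by all channels.
    """
    # Squeeze: global average pooling over the spectral axis, one value per channel.
    pooled = features.mean(axis=1)
    # Local cross-channel interaction: zero-padded 1D convolution along the channel axis
    # (kernel reversed so the operation matches the cross-correlation used in CNN layers).
    scores = np.convolve(pooled, weights[::-1], mode="same")
    # Sigmoid gate, then rescale every channel of the input.
    gates = 1.0 / (1.0 + np.exp(-scores))
    return features * gates[:, None]
```

No dimensionality reduction appears anywhere in this path. That is the contrast with squeeze-and-excitation style attention that Section 2.7 draws.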
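Equations (1) and (2) in Section 2.9 translate directly into code. The plain-Python sketch below uses hypothetical function names and made-up example values, not data from this study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, Eq. (1)."""
    n = len(y_true)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination, Eq. (2)."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical camellia oil contents (% v/v) and predictions, for illustration only.
actual = [0, 5, 10, 15]
predicted = [1, 5, 9, 15]
print(round(rmse(actual, predicted), 4))       # 0.7071
print(round(r_squared(actual, predicted), 4))  # 0.984
```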
© Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-6594745","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":456027344,"identity":"6a73fd1a-05b9-4ff2-b5a1-0e50f619d5b3","order_by":0,"name":"YuanBo Huang","email":"","orcid":"","institution":"South China Normal University","correspondingAuthor":false,"prefix":"","firstName":"YuanBo","middleName":"","lastName":"Huang","suffix":""},{"id":456027345,"identity":"fe91ca06-0c64-40d9-a851-e2634fd016ad","order_by":1,"name":"Hua Zhao","email":"","orcid":"","institution":"South China Normal University","correspondingAuthor":false,"prefix":"","firstName":"Hua","middleName":"","lastName":"Zhao","suffix":""},{"id":456027347,"identity":"e7002dc6-6937-4926-99f3-9ee3af5c013f","order_by":2,"name":"Qin Luo","email":"","orcid":"","institution":"South China Normal University","correspondingAuthor":false,"prefix":"","firstName":"Qin","middleName":"","lastName":"Luo","suffix":""},{"id":456027348,"identity":"ff418724-9b8d-4d17-b9d6-17d7339ca06b","order_by":3,"name":"Yulin Xu","email":"","orcid":"","institution":"South China Normal 
University","correspondingAuthor":false,"prefix":"","firstName":"Yulin","middleName":"","lastName":"Xu","suffix":""},{"id":456027350,"identity":"de2400e9-b16d-4229-a2e8-5f9ca39fad9c","order_by":4,"name":"Shi Yin","email":"","orcid":"","institution":"South China Normal University","correspondingAuthor":false,"prefix":"","firstName":"Shi","middleName":"","lastName":"Yin","suffix":""},{"id":456027351,"identity":"f01fb5ed-7e41-49cc-bd2d-40d8a430bb5d","order_by":5,"name":"Yongjun Hu","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAsklEQVRIiWNgGAWjYFCCMwwMH6BMCaK1MM4gUQsPAzMPSVoMDp49Jm3zxy7P4ADzwds8DHZ5hLUcOJcmnduWXGxwgC3ZmochuZgILWfMpHMbDiRuOMBjJs3DcCCxgSgtFn9AWvi/kaCFgQ1sCxtxWiQPnDG27G1LTpx5mM3Yco5BMmEtfDfOGN748ccuse9488MbbyrsCGtRuHEAymIGu5OQeiCQ7ydo6igYBaNgFIx4AADqvz7Aa0UbrAAAAABJRU5ErkJggg==","orcid":"","institution":"South China Normal University","correspondingAuthor":true,"prefix":"","firstName":"Yongjun","middleName":"","lastName":"Hu","suffix":""}],"badges":[],"createdAt":"2025-05-05 13:08:33","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6594745/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6594745/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":82804118,"identity":"99b115a5-9ff2-4113-b1f3-84eacd065827","added_by":"auto","created_at":"2025-05-15 12:03:07","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":86859,"visible":true,"origin":"","legend":"\u003cp\u003eStructure diagram of the three deep learning models.\u003c/p\u003e","description":"","filename":"1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6594745/v1/4f46bf5c4f1f1a6ac78b2d79.jpg"},{"id":82805207,"identity":"c0940258-77cf-493f-8d0f-ae4aa49df25b","added_by":"auto","created_at":"2025-05-15 12:19:07","extension":"jpg","order_by":2,"title":"Figure 
2","display":"","copyAsset":false,"role":"figure","size":61231,"visible":true,"origin":"","legend":"\u003cp\u003eRaman spectra of three pure vegetable oils.\u003c/p\u003e","description":"","filename":"2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6594745/v1/9a78a5715e217e2ec57486e4.jpg"},{"id":82804119,"identity":"b72983ae-4500-4738-ae76-15c5195e68ac","added_by":"auto","created_at":"2025-05-15 12:03:07","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":43852,"visible":true,"origin":"","legend":"\u003cp\u003eRaman spectra of camellia oil concentration variations in camellia-corn oil mixtures.\u003c/p\u003e","description":"","filename":"3.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6594745/v1/ca3d54e68fbc71551606d17c.jpg"},{"id":82804120,"identity":"80892927-b7ba-4848-b77c-a8eb7f03867f","added_by":"auto","created_at":"2025-05-15 12:03:07","extension":"jpg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":51887,"visible":true,"origin":"","legend":"\u003cp\u003eRaman spectra of binary blended oils of rapeseed oil and corn oil.\u003c/p\u003e","description":"","filename":"4.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6594745/v1/f49aea356556c964886be3f8.jpg"},{"id":82804121,"identity":"273b2e92-dcb1-4be2-9b30-7f1a28a66377","added_by":"auto","created_at":"2025-05-15 12:03:07","extension":"jpg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":95118,"visible":true,"origin":"","legend":"\u003cp\u003ePCA and UMAP downscaling results for three pure oil spectral datasets, PCA distribution in 2D space (a), PCA distribution in 3D space (b), UMAP distribution in 2D space (c), and UMAP distribution in 3D space 
(d).\u003c/p\u003e","description":"","filename":"5.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6594745/v1/116077e843890f6e4b22a4cd.jpg"},{"id":82805028,"identity":"09357cb7-eb8e-4a0e-8db4-a5cdd8dd1c76","added_by":"auto","created_at":"2025-05-15 12:11:07","extension":"jpg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":93097,"visible":true,"origin":"","legend":"\u003cp\u003eVariable selection process of GA algorithm (a), distribution of GA-selected variables in the spectrum (b), variable selection process of UVE algorithm (c), distribution of UVE-selected variables in the spectrum (d), variable selection process of WOA algorithm (e), and distribution of WOA-selected variables in the spectrum (f).\u003c/p\u003e","description":"","filename":"6.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6594745/v1/188801438c107872ecd88149.jpg"},{"id":82804133,"identity":"8c6430f9-7d18-4ddc-a7a6-ddcf644228bd","added_by":"auto","created_at":"2025-05-15 12:03:07","extension":"jpg","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":79246,"visible":true,"origin":"","legend":"\u003cp\u003eCalibration and prediction results for ELM model (a), GA-ELM model (b), UVE-ELM model (c), and WOA-ELM model (d).\u003c/p\u003e","description":"","filename":"7.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6594745/v1/ab0aac46b0d55759f9d2773b.jpg"},{"id":82804127,"identity":"9f5277fa-bb60-4268-b2dc-3c26cb2976bc","added_by":"auto","created_at":"2025-05-15 12:03:07","extension":"jpg","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":55010,"visible":true,"origin":"","legend":"\u003cp\u003ePrediction results for 1D-CNN model (a), ConvNeXt-ECA model (b), CNN-GRU-MHA model 
(c).\u003c/p\u003e","description":"","filename":"8.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6594745/v1/ad745b578f114b9fd71addc2.jpg"},{"id":82806399,"identity":"1bf385cc-3889-4836-88bd-7058e6c14dcd","added_by":"auto","created_at":"2025-05-15 12:27:07","extension":"jpg","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":43217,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of the quantitative performance of multiple models.\u003c/p\u003e","description":"","filename":"9.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6594745/v1/726146fdb4fd3bda2d8ba178.jpg"},{"id":82806564,"identity":"2a55a536-d4cb-4096-b947-ed9c4ee514c5","added_by":"auto","created_at":"2025-05-15 12:35:13","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1381049,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6594745/v1/71ba002e-4b91-41e2-af84-53b77c55ba86.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Quantitative analysis of camellia oil in blending vegetable oil based on Raman spectroscopy and deep learning models","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eVegetable blended oil is a vegetable oil made by mixing two or more pure vegetable oils in a certain ratio[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. Compared to single vegetable oils, blended oils provide a more balanced nutritional value due to the varying fatty acid content of different vegetable oils[\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. Generally, vegetable blended oils are made by mixing a large amount of low-priced pure vegetable oils with a small amount of high-priced pure vegetable oils[\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. 
Camellia oil is rich in oleic acid, vitamin E, polyphenols, and other bioactive substances with health-promoting properties. Long-term consumption of camellia oil may help alleviate hypertension and cardiovascular disease, which has earned it the name \"Oriental olive oil\" [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. However, some unscrupulous manufacturers use false advertising to overstate the proportion of camellia oil in rapeseed-corn-camellia blends and make illegal profits. Accurately determining the content of specific high-value vegetable oils in blended oils is therefore crucial for protecting consumers' rights and interests.\u003c/p\u003e \u003cp\u003eConventional analytical methods such as gas chromatography[\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e], mass spectrometry[\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e], gas chromatography-mass spectrometry[\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e], synchronous fluorescence spectroscopy[\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e], and two-dimensional correlation spectroscopy[\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e] have been used to determine the types and ratios of fatty acids and thereby identify vegetable oils. 
However, these methods have limitations: sample handling is complex, the procedures are cumbersome and time-consuming, and specialized laboratory facilities are required[\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. In contrast, Raman spectroscopy consumes no chemical reagents and needs no complex sample preparation, which makes it a simple, rapid, and promising technique for identifying vegetable oils[\u003cspan additionalcitationids=\"CR16\" citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eAlthough traditional machine learning models have had some success in identifying vegetable oils[\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e], they often rely on manual feature extraction and feature selection when dealing with complex, multivariate spectral data. This not only increases the workload but also tends to produce inadequate or inaccurate features[\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. Deep learning models, in contrast, can extract features from spectral data automatically, which can substantially improve model accuracy and robustness[\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]. However, deep learning models may still struggle with feature selection and generalization when dealing with complex spectral data[\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. Attention mechanisms provide an effective solution to this issue. 
By dynamically adjusting how much weight the model gives to different parts of the input data, the attention mechanism improves the model's ability to extract and discriminate features in complex spectral data[\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eIn this study, the spectral dataset was first visualized using principal component analysis (PCA) and uniform manifold approximation and projection (UMAP). Three spectral variable selection methods, namely the genetic algorithm (GA), uninformative variable elimination (UVE), and the whale optimization algorithm (WOA), were then used to identify the spectral variables relevant to camellia oil, and extreme learning machine (ELM) models were constructed from the selected variables. Finally, three deep learning models were developed to quantify the camellia oil content of ternary vegetable oil blends: a one-dimensional convolutional neural network (1D-CNN), ConvNeXt with efficient channel attention (ConvNeXt-ECA), and a convolutional neural network with a gated recurrent unit and multi-head attention (CNN-GRU-MHA).\u003c/p\u003e"},{"header":"2. Experiments and methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Sample preparation\u003c/h2\u003e \u003cp\u003eThree pure oils (extra virgin camellia oil, rapeseed oil, and corn oil) were purchased from Tmall (Hangzhou, China). All oils carried traceable labels complying with the Chinese national standard for pure oils. The blends were prepared in volume increments of 5% for extra virgin camellia oil and 10% for the other two oils. A market survey indicated that the camellia oil content of commercial camellia blended oils is generally low, so the maximum concentration of extra virgin camellia oil in this study was limited to 30%. 
The contents of the other two oils ranged from 0 to 100%, and a total of 68 blends with different concentration ratios were prepared. Three samples were prepared for each concentration ratio and 10 Raman spectra were acquired from each sample, giving 30 spectra per concentration and 2040 (68 \u0026times; 30) Raman spectra in total.\u003c/p\u003e \u003cp\u003eDuring preparation, the three pure oils were pipetted into a beaker according to their volume ratios to a total volume of 10 mL. The beaker had been washed with deionized water and dried beforehand, and the mixture was thoroughly mixed using an ultrasonic oscillator (JM-10D-28, Jiemeng Ultrasonics Co., Ltd., China). Before Raman spectra were acquired, 0.1 mL of the well-mixed sample was drawn with a pipette (Eppendorf Research plus, Eppendorf AG, Germany) and dropped onto a multiwell plate. All samples were stored in the dark at low temperature before spectral acquisition.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2 Raman measurements\u003c/h2\u003e \u003cp\u003eRaman spectra of all samples were acquired with a Raman spectrometer (RK785-I, BW\u0026amp;TEK, USA). The laser excitation wavelength was 785 nm, the laser power was set to 3.0 mW, and the signal-to-noise ratio was 300:1. Each spectrum was acquired with an integration time of 3 s and 3 accumulations to obtain a good signal-to-noise ratio. 
Spectral data in the Raman shift range of 1000\u0026ndash;1800 cm\u003csup\u003e\u0026minus;\u0026thinsp;1\u003c/sup\u003e were extracted for subsequent analysis.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.3 Spectral preprocessing\u003c/h2\u003e \u003cp\u003eRaman signals are often affected by interfering factors such as background signals and noise; preprocessing suppresses these effects and increases the accuracy and reliability of spectral analysis. The raw Raman spectra were preprocessed in the following steps: 1. \u003cb\u003eSmoothing\u003c/b\u003e: Savitzky-Golay (SG) filtering was applied to smooth the raw spectra and reduce noise; 2. \u003cb\u003eStandardization\u003c/b\u003e: the standard normal variate (SNV) transformation was used to minimize intensity variations caused by sample inhomogeneity; 3. \u003cb\u003eBaseline correction\u003c/b\u003e: the baseline was subtracted using asymmetric least squares (ALS) smoothing[\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]; 4. \u003cb\u003eNormalization\u003c/b\u003e: to improve prediction accuracy, all spectra were scaled to the range [0, 1] by min-max normalization; 5. \u003cb\u003eData averaging and expansion\u003c/b\u003e: for each concentration, randomly selected groups of five spectra were averaged, giving six averaged spectra per concentration and 408 spectra across the 68 concentrations. 
To meet the sample size requirements of the deep learning models, the dataset was expanded fivefold by adding Poisson noise and frequency-domain perturbations to the averaged spectra, yielding 2040 (408 \u0026times; 5) spectra.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e2.4 Visualization and analysis of the spectral dataset (PCA, UMAP)\u003c/h2\u003e \u003cp\u003ePrincipal component analysis (PCA) and uniform manifold approximation and projection (UMAP) are both dimensionality reduction methods. PCA is a widely used linear technique that transforms the original variables into new variables that are linear combinations of the originals[\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]. These new variables, the principal components, are ordered so that each captures as much of the remaining variance in the data as possible, and the projections of the samples onto them are called principal component scores. In practice, the first two or three scores can be plotted to visualize sample groupings. In contrast, UMAP is a more recent nonlinear method that aims to preserve both local and global structure by constructing a graph representation of the high-dimensional data and optimizing the layout of the data points in a low-dimensional space. It is commonly used for clustering and visualizing high-dimensional datasets[\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e2.5 ELM model\u003c/h2\u003e \u003cp\u003eExtreme learning machines (ELMs) are machine learning models based on single-hidden-layer feedforward neural networks and are suitable for nonlinear regression and classification problems. 
In an ELM, the input weights and biases of the hidden nodes are generated randomly and kept fixed, and only the weights of the linear output layer are determined, analytically, by a least-squares solution[\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]. Compared with other traditional machine learning models, ELM trains quickly and scales well computationally. In this study, four ELM models (ELM, GA-ELM, UVE-ELM, and WOA-ELM) were used to predict the camellia oil concentration in the blended oils.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e2.6 1D-CNN\u003c/h2\u003e \u003cp\u003eThe convolutional neural network (CNN) is a representative deep learning model: a feed-forward neural network with a deep structure built on convolution operations[\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]. In this study, a 1D-CNN quantitative model was constructed from the 2040 Raman spectra to predict the camellia oil content of the ternary vegetable oil blends. It comprises an input layer, four 1D convolutional layers, a batch normalization layer, two pooling layers, a flatten layer, a fully connected layer, a dropout layer, and an output layer. Figure\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e(a) illustrates the structure of the 1D-CNN model.\u003c/p\u003e \u003cp\u003eRaman spectra in the range of 1000\u0026ndash;1800 cm\u003csup\u003e\u0026minus;\u0026thinsp;1\u003c/sup\u003e are fed into the 1D-CNN through the input layer. The 1D convolutional layers extract features from the input data, while the batch normalization layer improves training stability and accelerates convergence. The max pooling layers downsample the feature maps produced by the convolutional layers, reducing their dimensionality. 
The flatten layer converts the multidimensional feature maps into a one-dimensional vector, and the fully connected layer connects every element of this vector to each of its neurons. The dropout layer is used to prevent overfitting. The model uses the mean squared error (MSE) loss function, the Nadam optimizer, and a learning rate of 0.0001.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e2.7 ConvNeXt-ECA\u003c/h2\u003e \u003cp\u003eIn deep learning, and in deep convolutional neural networks in particular, attention mechanisms have become a key technique for improving model performance. The efficient channel attention (ECA) module is a channel attention mechanism that captures inter-channel dependencies through a one-dimensional convolution[\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. Compared with conventional channel attention such as the squeeze-and-excitation block, the ECA module avoids reducing and then re-expanding the channel dimension, which makes it more efficient and lightweight. In this study, a ConvNeXt-ECA model was developed in which an ECA module follows each convolutional layer to achieve local cross-channel interaction and fusion of the spectral features. After global average pooling and flattening, the model outputs the predicted camellia oil content. The structure of this model is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e(b).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e2.8 CNN-GRU-MHA\u003c/h2\u003e \u003cp\u003eA Raman spectrum is an ordered sequence of intensities indexed by Raman shift, so it can be processed in the same way as time series data. Recurrent neural networks (RNNs) are widely used for sequential data but often suffer from vanishing and exploding gradients on long sequences. 
Gated recurrent units (GRUs) mitigate these problems through their gating mechanism and handle long sequences effectively at a relatively low computational cost[\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]. In addition, 1D-CNNs can perform comparably to RNNs on certain sequence tasks while being more computationally efficient. A 1D-CNN can therefore serve as a front end that extracts features from the spectral data and shortens the input sequence before it is fed into the GRU layer. To further improve performance, a multi-head attention (MHA) mechanism is introduced after the GRU layer, enabling the model to focus on key regions of the Raman spectra and to capture relationships between different parts of the data.\u003c/p\u003e \u003cp\u003eBased on these considerations, a CNN-GRU-MHA model was developed, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e(c). A convolutional layer, a pooling layer, and a reshaping layer first extract features and reshape the data before passing it to a GRU layer with 128 output dimensions. The GRU output is then passed to the MHA mechanism, followed by layer normalization. 
The test results show that this model effectively combines the efficient feature extraction of the 1D-CNN, the sequential feature processing of the GRU, and the global feature capture of MHA, resulting in excellent predictive performance.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e2.9 Evaluation of the quantitative analysis model\u003c/h2\u003e \u003cp\u003eIn quantitative analysis, the coefficient of determination (R\u003csup\u003e2\u003c/sup\u003e) and root mean square error (RMSE) are commonly used to evaluate the performance of a fitted regression model[\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]. RMSE and R\u0026sup2; are calculated using Eqs. (1) and (2):\u003cdiv id=\"Equ1\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$\\mathrm{RMSE}=\\sqrt{\\frac{1}{n}\\sum_{i=1}^{n}\\left(y_{i}-\\hat{y}_{i}\\right)^{2}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ2\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$$R^{2}=1-\\frac{\\sum_{i=1}^{n}\\left(y_{i}-\\hat{y}_{i}\\right)^{2}}{\\sum_{i=1}^{n}\\left(y_{i}-\\bar{y}\\right)^{2}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere n is the number of samples, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(y_{i}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\hat{y}_{i}\\)\u003c/span\u003e\u003c/span\u003e are the actual and predicted camellia oil concentrations of the i-th sample, respectively, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\bar{y}\\)\u003c/span\u003e\u003c/span\u003e is the mean of the actual values. 
Generally, a lower RMSE and an \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(R^{2}\\)\u003c/span\u003e\u003c/span\u003e value closer to 1 indicate better predictive performance of the regression model. In the following, the subscripts C and P denote values computed on the calibration (training) and prediction (test) sets, respectively.\u003c/p\u003e \u003c/div\u003e"},{"header":"3. Results and discussion","content":"\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Raman spectroscopy analysis\u003c/h2\u003e \u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e shows the preprocessed Raman spectra of the three pure oils: rapeseed oil, camellia oil, and corn oil. The characteristic peak near 1062 cm⁻\u0026sup1; mainly corresponds to the C-C stretching vibration of fatty acids. The peak near 1255 cm⁻\u0026sup1; reflects the =\u0026thinsp;C-H bending vibration of carbon atoms in unsaturated olefinic bonds, and the peak near 1288 cm⁻\u0026sup1; is related to the C-H bending vibration of methylene groups. The peak near 1427 cm⁻\u0026sup1; corresponds to the -CH₂ scissoring vibration. The peak near 1641 cm⁻\u0026sup1; represents the cis C\u0026thinsp;=\u0026thinsp;C stretching vibration of olefinic RHC\u0026thinsp;=\u0026thinsp;CHR, and the peak near 1732 cm⁻\u0026sup1; corresponds to the C\u0026thinsp;=\u0026thinsp;O stretching vibration of the ester carbonyl group[\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eAs shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, the spectral profiles and characteristic peak positions of the three vegetable oils are nearly identical; the oils differ mainly in the intensities of these peaks. 
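Because the three oils share peak positions and differ mainly in peak intensity, a natural first comparison is an intensity ratio at each assigned band. The helper below is a hypothetical sketch, not the authors' procedure. The ±5 cm⁻¹ search window and the synthetic spectra are assumptions chosen for illustration, and intensities are assumed to be baseline-corrected and positive at each band.

```python
# Band assignments listed in Section 3.1 (approximate Raman shifts in cm^-1).
BAND_ASSIGNMENTS = {
    1062: "C-C stretching of fatty acid chains",
    1255: "=C-H bending of unsaturated olefinic bonds",
    1288: "C-H bending of methylene groups",
    1427: "-CH2 scissoring",
    1641: "cis C=C stretching (RHC=CHR)",
    1732: "C=O stretching of the ester carbonyl",
}


def intensity_near(shifts, intensities, target, window=5.0):
    """Largest intensity within +/- window cm^-1 of a target Raman shift."""
    values = [i for s, i in zip(shifts, intensities) if abs(s - target) <= window]
    if not values:
        raise ValueError(f"no data points within {window} cm^-1 of {target}")
    return max(values)


def band_ratios(shifts, spectrum_a, spectrum_b, window=5.0):
    """Intensity ratio A/B at each assigned band.

    Assumes both spectra share the same shift axis and have strictly
    positive intensities near every band (no zero-division guard).
    """
    return {
        band: intensity_near(shifts, spectrum_a, band, window)
        / intensity_near(shifts, spectrum_b, band, window)
        for band in BAND_ASSIGNMENTS
    }


if __name__ == "__main__":
    # Synthetic spectra: flat baseline, spectrum A stronger only near 1641 cm^-1.
    shifts = list(range(1000, 1801))
    spec_b = [1.0 for _ in shifts]
    spec_a = [1.5 if abs(s - 1641) <= 2 else 1.0 for s in shifts]
    for band, ratio in band_ratios(shifts, spec_a, spec_b).items():
        print(band, BAND_ASSIGNMENTS[band], round(ratio, 2))
```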
Raman spectra reflect the vibrations of molecular bonds, and the bonds responsible for these peaks are present in the unsaturated fatty acids of vegetable oils. The positions and intensities of the characteristic Raman peaks are therefore related to the types and proportions of unsaturated fatty acids in the oils. This suggests that the three oils have broadly similar fatty acid compositions, but that differences in the content of certain fatty acids make the same Raman peaks differ in intensity from one oil to another[\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e]. Figure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e shows the Raman spectra of binary camellia oil and corn oil blends at different camellia oil concentrations. As the camellia oil content increases from 0 to 30%, the intensities of the characteristic peaks near 1288 cm⁻\u0026sup1; and 1641 cm⁻\u0026sup1;, attributed to oleic acid in camellia oil, increase accordingly. This provides a useful basis for identifying camellia oil in vegetable oil blends.\u003c/p\u003e \u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e shows the Raman spectra of binary rapeseed oil and corn oil blends. As Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e shows, the intensity of the characteristic peak at 1515 cm⁻\u0026sup1;, which reflects the β-carotene content[\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e], increases with the proportion of rapeseed oil. The β-carotene content of rapeseed oil was found to be higher than that of corn oil. 
Therefore, as the proportion of rapeseed oil increases, the blend contains more β-carotene, producing a stronger peak at this Raman shift. However, the Raman spectra alone could not reliably distinguish rapeseed oil from corn oil, or blends of the two oils at different concentrations. PCA and UMAP were therefore used to visualize and analyze the three pure oils.\u003c/p\u003e \u003cp\u003ePCA was applied to the Raman spectral datasets of the three pure oils, and the variance contributions of principal components 1, 2, and 3 were 92.71%, 3.26%, and 0.57%, respectively. The first two principal components thus account for 95.97% of the total variance and the first three for 96.54%, indicating that PCA captured most of the information in the vegetable oil data. The first three principal components were used to visualize the spectral data in two and three dimensions, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e(a) and Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e(b). However, the three pure oils could not be clearly separated in these plots. The nonlinear UMAP algorithm was therefore applied, and it clearly separated the three pure oils in both two and three dimensions, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e(c) and Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e(d).\u003c/p\u003e \u003cp\u003eThe PCA and UMAP plots also show some variation among the Raman spectra of the same oil. 
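Within-oil variation of this kind can be weighed against between-oil separation with a simple ratio on the embedding coordinates. The ratio is the smallest distance between oil centroids divided by the largest mean within-oil spread, and a value well above 1 means that differences between oils dominate differences within each oil. This is a hypothetical sketch, not a metric used in the study, and the 2-D points are made up.

```python
import math
from statistics import mean


def centroid(points):
    """Mean position of a list of 2-D points."""
    return (mean(p[0] for p in points), mean(p[1] for p in points))


def within_spread(points):
    """Mean distance from each point to its own class centroid."""
    c = centroid(points)
    return mean(math.dist(p, c) for p in points)


def separation_ratio(groups):
    """Smallest between-centroid distance divided by largest within-class spread."""
    cents = {name: centroid(pts) for name, pts in groups.items()}
    names = list(cents)
    between = min(
        math.dist(cents[a], cents[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    )
    within = max(within_spread(pts) for pts in groups.values())
    return between / within


if __name__ == "__main__":
    # Hypothetical 2-D embedding coordinates for three oils (not study data).
    embedding = {
        "rapeseed": [(0, 0), (1, 0), (0, 1), (1, 1)],
        "corn": [(10, 0), (11, 0), (10, 1), (11, 1)],
        "camellia": [(5, 10), (6, 10), (5, 11), (6, 11)],
    }
    print(round(separation_ratio(embedding), 2))
```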
Although corn oil and rapeseed oil are similar in composition and Raman spectra, in the UMAP embedding the differences between them are considerably larger than the differences within each oil, and rapeseed oil shows the largest within-oil variation. Visualization thus reveals the similarities and differences between the pure oils, which helps explain their behavior in blended oils and supports the development of accurate and reliable quantitative models. It is therefore useful to visualize the three pure oils before the quantitative analysis of blended oils.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e3.2 Quantitative analysis models of ELM and its combination with variable selection\u003c/h2\u003e \u003cp\u003eIn this study, four machine learning models (ELM, GA-ELM, UVE-ELM, and WOA-ELM) were applied to 408 Raman spectra to quantify the camellia oil concentration in blended vegetable oils; the results are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. For each blend concentration, four of the six spectra were randomly selected for the training set and the remaining two were used as the test set, giving 272 training spectra and 136 test spectra. The genetic algorithm (GA) is an optimization method based on the theory of biological evolution that simulates competitive natural selection among individuals[\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e]. After 100 iterations, the GA selected 76 spectral variables (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e(a)). 
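Genetic-algorithm variable selection is commonly implemented by encoding each candidate subset as a binary mask over the spectral variables. The sketch below shows that idea under stated assumptions:

- The population size, crossover and mutation rates, and the toy fitness function are hypothetical.
- In the study, the fitness would presumably be tied to ELM prediction error, which is not reproduced here.
- Only the 100-generation budget mirrors the 100 iterations mentioned in the text.

```python
import random


def _tournament(pop, fitness, rng):
    """Binary tournament: return the fitter of two randomly drawn chromosomes."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def ga_select(n_vars, fitness, pop_size=20, generations=100,
              crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Binary GA for variable selection.

    Each chromosome is a 0/1 mask over spectral variables (1 = keep).
    Returns the best mask and the best fitness recorded at each generation.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1 = _tournament(pop, fitness, rng)
            p2 = _tournament(pop, fitness, rng)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_vars)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history


if __name__ == "__main__":
    informative = {2, 5, 7}  # hypothetical "useful" variable indices

    def toy_fitness(mask):
        # Stand-in for a model score: reward useful variables and
        # lightly penalise every other selected variable.
        gain = sum(mask[i] for i in informative)
        cost = sum(g for i, g in enumerate(mask) if i not in informative)
        return gain - 0.1 * cost

    mask, hist = ga_select(10, toy_fitness)
    print([i for i, g in enumerate(mask) if g], hist[-1])
```

Keeping the best mask unchanged in every generation (elitism) guarantees that the recorded best fitness never decreases.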
As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e(b), the selected variables are mainly located between the individual Raman peaks. An ELM model built on these 76 variables achieved R\u003csup\u003e2\u003c/sup\u003eP and RMSEP values of 0.9463 and 1.8679, respectively.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003ePrediction results of the ELM model alone and combined with the GA, UVE, and WOA variable selection algorithms.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eQuantitative Model\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003ePC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eVariables\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e 
\u003cp\u003eRMSE\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c7\" namest=\"c6\"\u003e \u003cp\u003eR\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRMSEC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eRMSEP\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eR\u003csup\u003e2\u003c/sup\u003ec\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eR\u003csup\u003e2\u003c/sup\u003ep\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eELM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e400\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e2.5861\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e2.7425\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.9054\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.8896\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGA-ELM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e76\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1.7926\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.8679\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" 
char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.9531\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.9463\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUVE-ELM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e49\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.9865\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.0592\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.9754\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.9718\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWOA-ELM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1.4551\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.8034\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.9647\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.9572\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eIn addition, UVE is an uninformative variable elimination method used to improve the configuration problem in the dataset[\u003cspan citationid=\"CR33\" 
class=\"CitationRef\"\u003e33\u003c/span\u003e]. The UVE eliminates uninformative variables by calculating stability metrics and setting thresholds, and a total of 49 spectral variables were selected (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e(c)). The selected spectral variables were mainly concentrated at the vibrational peaks of the Raman spectra of camellia oil, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e(d). Ultimately, the values of R\u003csup\u003e2\u003c/sup\u003eP and RMSEP for the UVE-ELM model were 0.9718 and 1.0592 respectively with improved predictive ability compared to GA-ELM. Additionally, the whale optimization algorithm is a new nature-inspired meta-heuristic algorithm, which is implemented to find the optimal solution by mimicking the feeding behavior of humpback whales in the ocean[\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e]. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e(e), the WOA algorithm selected 21 spectral variables by setting the population size, shrinking envelope coefficient, spiral update parameter, and iteration number. 
Finally, the WOA-ELM model achieved R\u003csup\u003e2\u003c/sup\u003eP and RMSEP values of 0.9572 and 1.8034, respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e(f)).\u003c/p\u003e \u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e shows the calibration and prediction results of the four quantitative models. Compared with the ELM model without variable selection, GA variable selection increased R\u003csup\u003e2\u003c/sup\u003eP by 0.0567 (from 0.8896 to 0.9463), because GA effectively reduces the dimensionality of the spectral data and thereby simplifies model construction[\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e]. UVE variable selection increased R\u003csup\u003e2\u003c/sup\u003eP by a further 0.0255 relative to GA-ELM (to 0.9718), because reducing the number of variables removes collinear or irrelevant information, which in turn improves model performance[\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eMoreover, most of the UVE-selected variables were concentrated near the main molecular vibrational Raman peaks of camellia oil. In contrast, the R\u003csup\u003e2\u003c/sup\u003eP of the WOA-ELM model was 0.0146 lower than that of UVE-ELM (0.9572 versus 0.9718). This is because WOA retained fewer variables (21, versus 49 for UVE) and thus discarded some valuable information, which kept the model from achieving higher prediction accuracy. Although the ELM-based models can predict the camellia oil concentration, their accuracy is still not sufficient. 
Therefore, three deep learning models were designed to predict the camellia oil content in blended oils more accurately.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e3.3 Deep learning quantitative analysis models\u003c/h2\u003e \u003cp\u003eIn this study, three one-dimensional deep learning models (1D-CNN, ConvNeXt-ECA, and CNN-GRU-MHA) were constructed to quantify camellia oil in blended vegetable oils using 2040 spectra. To evaluate generalization, 60% of the data were randomly assigned to the training set, 20% to the validation set, and the remaining 20% to the test set. The training set was used to fit the model parameters, the validation set to tune the hyperparameters and network structure, and the test set to evaluate the generalization ability of the models. All models were trained with the mean squared error (MSE) loss function and the Leaky ReLU activation function. The ConvNeXt-ECA and CNN-GRU-MHA models used the Adam optimizer with a learning rate of 0.0001, and the 1D-CNN model used the Nadam optimizer. 
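The 60/20/20 partition of the 2040 spectra can be reproduced with a seeded shuffle. The sketch below assumes a simple random split over all spectra (the text does not say whether it was stratified by concentration) and an arbitrary seed. With 2040 spectra it yields 1224 training, 408 validation, and 408 test spectra.

```python
import random


def split_indices(n, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle sample indices and cut them into train/validation/test subsets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(train_frac * n)
    n_val = round(val_frac * n)
    # Whatever remains after training and validation becomes the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


if __name__ == "__main__":
    train, val, test = split_indices(2040)
    print(len(train), len(val), len(test))  # 1224 408 408
```

Fixing the seed makes the split reproducible. The text does not state whether the split was re-drawn between the 10 runs that are averaged in Table 2.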
Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e reports the prediction results of each model averaged over 10 runs, and Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e shows the predictions of the three deep learning models.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003ePrediction results of the three deep learning models (1D-CNN, ConvNeXt-ECA, and CNN-GRU-MHA), averaged over 10 runs.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eQuantitative Model\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e \u003cp\u003eRMSE\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e \u003cp\u003eR\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRMSEC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRMSEP\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e 
\u003cp\u003eR\u003csup\u003e2\u003c/sup\u003ec\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eR\u003csup\u003e2\u003c/sup\u003ep\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1D-CNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.5430\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.5845\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.9938\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.9912\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConvNeXt-ECA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.4632\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.5217\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.9964\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.9957\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCNN-GRU-MHA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.3326\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.3714\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.9986\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.9981\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e 
\u003c/p\u003e \u003cp\u003eAll three models showed excellent predictive ability, with R\u003csup\u003e2\u003c/sup\u003ep values of 0.9912 for 1D-CNN, 0.9957 for ConvNeXt-ECA, and 0.9981 for CNN-GRU-MHA. Although 1D-CNN is less accurate than the other two models in predicting camellia oil concentration, it still outperforms the traditional machine learning models, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003e. The ConvNeXt-ECA model extends the convolutional neural network architecture with the efficient channel attention (ECA) mechanism, which emphasizes the most important features of the Raman spectral data and thereby enhances feature extraction. The CNN-GRU-MHA model performs better still. This is probably because it not only extracts features efficiently through its convolutional and pooling layers but also models the spectral sequence more accurately through the GRU. In addition, the multi-head attention mechanism enables the model to capture long-range dependencies and complex correlations across the spectrum. By contrast, the 1D-CNN and ConvNeXt-ECA models do not explicitly model sequential dependencies and may fail to capture some of the ordering information along the Raman shift axis. This likely explains why CNN-GRU-MHA achieves the best prediction performance of the three models.\u003c/p\u003e \u003c/div\u003e"},{"header":"4. Conclusion","content":"\u003cp\u003eIn this study, three one-dimensional deep learning models were designed for the quantitative analysis of camellia oil in three-component blends. The results show that all three deep learning models outperform the traditional machine learning models in prediction performance. 
In addition, the improved models successfully quantified the camellia oil concentration in blended vegetable oils. Among the three deep learning models, CNN-GRU-MHA showed the best prediction performance, with R\u003csup\u003e2\u003c/sup\u003ep above 0.998 and RMSEP below 0.372. It should be noted, however, that commercial blended oils are not always mixtures of exactly three vegetable oils, so the method still needs to be validated on other blend compositions. Nevertheless, this study provides an approach that combines attention mechanisms with deep learning models for quantifying high-value oils in blended oils. It also demonstrates the potential of attention-based deep learning models for vegetable oil identification.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eCRediT authorship contribution statement\u003c/h2\u003e\n\u003cp\u003e\u003cstrong\u003eYuanbo Huang:\u0026nbsp;\u003c/strong\u003e Investigation, Conceptualization, Methodology, Formal analysis, Writing\u0026nbsp;\u0026ndash;\u0026nbsp;original draft. \u003cstrong\u003eHua Zhao:\u0026nbsp;\u003c/strong\u003eData curation, Formal analysis. \u003cstrong\u003eQin Luo:\u0026nbsp;\u003c/strong\u003eFormal analysis. \u003cstrong\u003eYulin Xu:\u0026nbsp;\u003c/strong\u003eFormal analysis. \u003cstrong\u003eShi Yin:\u0026nbsp;\u003c/strong\u003eFormal analysis. 
\u003cstrong\u003eYongjun Hu:\u0026nbsp;\u003c/strong\u003eFunding acquisition, Project administration, Supervision, Writing\u0026nbsp;\u0026ndash;\u0026nbsp;review \u0026amp; editing.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDeclaration of Competing Interest\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eData will be made available on request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgments\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work was financially supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 22473045 and 22073031) and the Natural Science Foundation of Guangdong Province (Grant No. 2021A15150123).\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eX. Wu, X. Zhang, Z. Du, D. Yang, B. Xu, R. Ma, H. Luo, H. Liu, Y. Zhang, Raman spectroscopy combined with multiple one-dimensional deep learning models for simultaneous quantification of multiple components in blended olive oil, Food Chem., 431 (2024) 137109.\u003c/li\u003e\n\u003cli\u003eV. Ramya, K.P. Shyam, B. Kadalmani, Determination of Mono-Oil Proportion in Blended Edible Vegetable Oil (BEVO) with Identical Fatty Acid Profile: a Case Study on Coconut-Palm Kernel Oil Discrimination, Food Analytical Methods, 15 (2022) 1407-1417.\u003c/li\u003e\n\u003cli\u003eJ. Zhu, Y. Rong, X. Jiang, H. Qian, X. Yu, Q. Chen, Raman spectroscopy coupled with metaheuristics-based variable selection models: A method for rapid determination of extra virgin olive oil content in vegetable blend oils, J. Food Compos. Anal., 123 (2023) 105503.\u003c/li\u003e\n\u003cli\u003eL. Gao, L. Jin, Q. Liu, K. Zhao, L. Lin, J. Zheng, C. Li, B. Chen, Y. 
Shen, Recent advances in the extraction, composition analysis and bioactivity of Camellia (Camellia oleifera Abel.) oil, Trends Food Sci. Technol., 143 (2024) 104211.\u003c/li\u003e\n\u003cli\u003eY. Shang, L. Bao, H. Bi, S. Guan, J. Xu, Y. Gu, C. Zhao, Authenticity Discrimination and Adulteration Level Detection of Camellia Seed Oil via Hyperspectral Imaging Technology, Food Analytical Methods, 17 (2024) 450-463.\u003c/li\u003e\n\u003cli\u003eW. Mu, Y. Zhao, Z. Wang, Y. He, C. Yang, J. Wang, Combining lipase enzymatic techniques and antioxidants on the flavor of structured lipids (SLs) prepared from goat butter and coconut oil, Food Bioscience, 60 (2024) 104332.\u003c/li\u003e\n\u003cli\u003eG. Zeng, Z. Wang, Y. Hou, B. Ding, L. Wang, W. Chen, J. Li, J. Xie, Identification of Soybean Origin via TAGs Profile Analysis Using MALDI-TOF/MS, Food Analytical Methods, 17 (2024) 766-772.\u003c/li\u003e\n\u003cli\u003eH.G.T.H. Jayatunga, H.D. Weerathunge, H.P.P.S. Somasiri, K.R.R. Mahanama, Use of Process-Based Marker Compounds to Identify Different Coconut Oils, Food Analytical Methods, 17 (2024) 96-104.\u003c/li\u003e\n\u003cli\u003eC. Gong, X.-T. Lu, S.-D. Zhang, K. Xiao, X. Xu, Detection of lard adulteration in 3 kinds of vegetable oils by liquid chromatography\u0026ndash;mass spectrometry with porous graphite carbon column, Anal. Sci., 40 (2024) 1289-1299.\u003c/li\u003e\n\u003cli\u003eY.-H. Liu, P.-P. Wu, Q. Liu, H.-D. Luo, S.-H. Cao, G.-C. Lin, D.-S. Fu, X.-D. Zhong, Y.-Q. Li, A Simple Fluorescence Spectroscopic Approach for Simultaneous and Rapid Detection of Four Polycyclic Aromatic Hydrocarbons (PAH4) in Vegetable Oils, Food Analytical Methods, 9 (2016) 3209-3217.\u003c/li\u003e\n\u003cli\u003eK. W\u0026oacute;jcicki, I. Khmelinskii, M. Sikorski, F. Caponio, V.M. Paradiso, C. Summo, A. Pasqualone, E. Sikorska, Spectroscopic techniques and chemometrics in analysis of blends of extra virgin with refined and mild deodorized olive oils, Eur. J. Lipid Sci. 
Technol., 117 (2015) 92-102.\u003c/li\u003e\n\u003cli\u003eY. Liu, L. Yao, Z. Xia, Y. Gao, Z. Gong, Geographical discrimination and adulteration analysis for edible oils using two-dimensional correlation spectroscopy and convolutional neural networks (CNNs), Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 246 (2021) 118973.\u003c/li\u003e\n\u003cli\u003eJ. Qiu, H.-Y. Hou, N.T. Huyen, I.-S. Yang, X.-B. Chen, Raman Spectroscopy and 2DCOS Analysis of Unsaturated Fatty Acid in Edible Vegetable Oils, Applied Sciences, 9 (2019) 2807.\u003c/li\u003e\n\u003cli\u003eJ. Kuang, N. Luo, Z. Hao, J. Xu, X. He, J. Shi, NI-Raman spectroscopy combined with BP-Adaboost neural network for adulteration detection of soybean oil in camellia oil, Journal of Food Measurement and Characterization, 16 (2022) 3208-3215.\u003c/li\u003e\n\u003cli\u003eC. L\u0026ouml;rchner, C. Fauhl-Hassek, M.A. Glomb, V. Baeten, J.A. Fern\u0026aacute;ndez Pierna, S. Esslinger, Comparison of Spectroscopic Techniques Using the Adulteration of Pumpkin Seed Oil as Example, Food Analytical Methods, 17 (2024) 332-347.\u003c/li\u003e\n\u003cli\u003eG. Jim\u0026eacute;nez-Hern\u0026aacute;ndez, F. Ortega-Gavil\u0026aacute;n, M.G. Bagur-Gonz\u0026aacute;lez, A. Gonz\u0026aacute;lez-Casado, Discrimination/Classification of Edible Vegetable Oils from Raman Spatially Solved Fingerprints Obtained on Portable Instrumentation, Foods, 13 (2024) 183.\u003c/li\u003e\n\u003cli\u003eZ. Lu, H. Yu, Y. Yin, Y. Yuan, H. Liang, F. Li, Z. Li, Determination of the Acid and Peroxide Values of Vegetable Oils by Raman Spectroscopy with Competitive Adaptive Reweighted Sampling (CARS) and Back Propagation Neural Network (BPNN), Anal. Lett., 57 2289-2306.\u003c/li\u003e\n\u003cli\u003eY.-K. Li, W.-C. Jiao, B.-W. Han, M. Jia, D.-M. Wang, H.-M. Liu, L.-X. Hou, Detection of counterfeit sesame oil based on Raman spectroscopy and chemometric analysis, LWT, 185 (2023) 115131.\u003c/li\u003e\n\u003cli\u003eS.Y.-S.S. Adade, H. 
Lin, S.A. Haruna, N.A.N. Johnson, A.O. Barimah, Z. Afang, Z. Chen, J.-N. Ekumah, W. Fuyun, H. Li, Q. Chen, Multicomponent prediction of Sudan dye adulteration in crude palm oil using SERS-based bimetallic nanoflower combined with genetic algorithm, J. Food Compos. Anal., 125 (2024) 105768.
M. Wu, M. Li, B. Fan, Y. Sun, L. Tong, F. Wang, L. Li, A rapid and low-cost method for detection of nine kinds of vegetable oil adulteration based on 3-D fluorescence spectroscopy, LWT, 188 (2023) 115419.
A.-Q. Chen, H.-L. Wu, T. Wang, X.-Z. Wang, H.-B. Sun, R.-Q. Yu, Intelligent analysis of excitation-emission matrix fluorescence fingerprint to identify and quantify adulteration in camellia oil based on machine learning, Talanta, 251 (2023) 123733.
X. Xin, X. Tian, C. Chen, C. Chen, K. Li, X. Ma, L. Zhao, X. Lv, A method for accurate identification of Uyghur medicinal components based on Raman spectroscopy and multi-label deep learning, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 315 (2024) 124251.
D. Cao, F. Shi, J. Sheng, J. Zhu, H. Yin, S. Qin, J. Yao, L. Zhu, J. Lu, X. Wang, Machine learning–driven SERS analysis platform for rapid and accurate detection of precancerous lesions of gastric cancer, Microchim. Acta, 191 (2024) 415.
M.N. Mohamad Asri, R. Verma, N.A. Mahat, N.A.M. Nor, W.N.S. Mat Desa, D. Ismail, Discrimination and source correspondence of black gel inks using Raman spectroscopy and chemometric analysis with UMAP and PLS-DA, Chemometrics Intellig. Lab. Syst., 225 (2022) 104557.
H. Li, M. Mehedi Hassan, J. Wang, W. Wei, M. Zou, Q. Ouyang, Q. Chen, Investigation of nonlinear relationship of surface enhanced Raman scattering signal for robust prediction of thiabendazole in apple, Food Chem., 339 (2021) 127843.
Z. Zhang, H. Li, L. Huang, H. Wang, H. Niu, Z. Yang, M. Wang, Rapid identification and quantitative analysis of malachite green in fish via SERS and 1D convolutional neural network, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 320 (2024) 124655.
Z. Kang, J. Liu, C. Ma, C. Chen, X. Lv, C. Chen, Early screening of cervical cancer based on tissue Raman spectroscopy combined with deep learning algorithms, Photodiagnosis and Photodynamic Therapy, 42 (2023) 103557.
Z. Chen, X. Dong, C. Liu, S. Wang, S. Dong, Q. Huang, Rapid detection of residual chlorpyrifos and pyrimethanil on fruit surface by surface-enhanced Raman spectroscopy integrated with deep learning approach, Scientific Reports, 13 (2023) 19855.
Y. Luo, W. Su, M.F. Rabbi, Q. Wan, D. Xu, Z. Wang, S. Liu, X. Xu, J. Wu, Quantitative analysis of microplastics in water environments based on Raman spectroscopy and convolutional neural network, Sci. Total Environ., 926 (2024) 171925.
M. Saleem, N. Ahmad, R. Ullah, Z. Ali, S. Mahmood, H. Ali, Raman Spectroscopy–Based Characterization of Canola Oil, Food Analytical Methods, 13 (2020) 1292-1303.
F. Huang, Y. Li, H. Guo, J. Xu, Z. Chen, J. Zhang, Y. Wang, Identification of waste cooking oil and vegetable oil via Raman spectroscopy, J. Raman Spectrosc., 47 (2016) 860-864.
B. Bilgin, C. Yanik, H. Torun, M.C. Onbasli, Genetic Algorithm-Driven Surface-Enhanced Raman Spectroscopy Substrate Optimization, Nanomaterials, 11 (2021) 2905.
H. Li, W. Sheng, M.M. Hassan, W. Geng, Q. Chen, Quantification of antibiotics in food by octahedral gold-silver nanocages-based SERS sensor coupling multivariate calibration, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 320 (2024) 124595.
X. Bian, R. Zhang, P. Liu, Y. Xiang, S. Wang, X. Tan, Near infrared spectroscopic variable selection by a novel swarm intelligence algorithm for rapid quantification of high order edible blend oil, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 284 (2023) 121788.
H. Pan, W. Ahmad, T. Jiao, A. Zhu, Q. Ouyang, Q. Chen, Label-free Au NRs-based SERS coupled with chemometrics for rapid quantitative detection of thiabendazole residues in citrus, Food Chem., 375 (2022) 131681.
Submitted to Food Analytical Methods; version 1 posted May 15th, 2025.